macyang - MATH 4432 Statistical Machine Learning
An Introduction to Statistical Learning
Assume the model $Y=f(X)+\epsilon$. We can predict $Y$ using $\hat Y=\hat f(X)$, and the prediction error splits into two parts:
reducible error: $\hat f$ is not a perfect estimate of $f$; this part of the error can be reduced by estimating $f$ more accurately
irreducible error: even if $\hat f=f$ so that $\hat Y=f(X)$, $Y$ still depends on $\epsilon$, so an error $|\hat Y-Y|=|\epsilon|$ remains
we assume that $\epsilon$ is independent of $X$, with $\mathbb{E}[\epsilon]=0$ (and hence $(\mathbb{E}[\epsilon])^2=0$)
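Because $\mathbb{E}[\epsilon]=0$, the second moment of the noise equals its variance; this is the step used below when $\mathbb{E}[\epsilon^2]$ is replaced by $\mathrm{Var}(\epsilon)$:
$$ \mathbb{E}[\epsilon^2]=\mathrm{Var}(\epsilon)+(\mathbb{E}[\epsilon])^2=\mathrm{Var}(\epsilon) $$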
$$
\begin{aligned}
\mathbb{E}[(\hat Y-Y)^2]
&=\mathbb{E}[(\hat f(X)-f(X)-\epsilon)^2]\\
&=\mathbb{E}[(\hat f(X)-f(X))^2+\epsilon^2-2\epsilon(\hat f(X)-f(X))]\\
&=\mathbb{E}[(\hat f(X)-f(X))^2]+\mathbb{E}[\epsilon^2]-2\,\mathbb{E}[\epsilon]\,\mathbb{E}[\hat f(X)-f(X)]\\
&=\mathbb{E}[(\hat f(X)-f(X))^2]+\mathrm{Var}(\epsilon)+(\mathbb{E}[\epsilon])^2\\
&=\underbrace{\mathbb{E}[(\hat f(X)-f(X))^2]}_{\text{reducible}}+\underbrace{\mathrm{Var}(\epsilon)}_{\text{irreducible}}
\end{aligned}
$$
Keep in mind that the irreducible error will always provide an upper bound on the accuracy of our prediction for $Y$. This bound is almost always unknown in practice. The focus of this course is on techniques for estimating $f$ with the aim of minimizing the reducible error.
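As a quick numerical check of the decomposition above, the sketch below simulates data from $Y=f(X)+\epsilon$ and compares the empirical MSE with the reducible term plus $\mathrm{Var}(\epsilon)$. The particular $f$, the deliberately imperfect $\hat f$, and the noise level $\sigma=0.5$ are illustrative assumptions, not from the notes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative choices (assumptions for this sketch only):
# a true regression function f, a deliberately imperfect estimate f_hat,
# and noise standard deviation sigma, so Var(eps) = sigma^2 = 0.25.
f = lambda x: x ** 2
f_hat = lambda x: 1.5 * x
sigma = 0.5

n = 1_000_000
x = rng.uniform(-1.0, 1.0, size=n)
eps = rng.normal(0.0, sigma, size=n)   # independent of x, mean zero
y = f(x) + eps                         # the model Y = f(X) + eps

mse = np.mean((f_hat(x) - y) ** 2)             # estimate of E[(Y_hat - Y)^2]
reducible = np.mean((f_hat(x) - f(x)) ** 2)    # estimate of E[(f_hat(X) - f(X))^2]
irreducible = sigma ** 2                       # Var(eps)

print(f"MSE                     : {mse:.4f}")
print(f"reducible + irreducible : {reducible + irreducible:.4f}")
```

Swapping in a better estimate (e.g. `f_hat = f`) drives the reducible term toward zero, but the MSE never falls below $\mathrm{Var}(\epsilon)$, which is the bound on prediction accuracy referred to above.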