macyang - MATH 4432 Statistical Machine Learning
An Introduction to Statistical Learning
Assume the model $Y=f(X)+\epsilon$. We can predict $Y$ using $\hat Y=\hat f(X)$, and the prediction error splits into two parts:
reducible error: $\hat f$ is not a perfect estimate of $f$; this part of the error can be reduced by estimating $f$ more accurately
irreducible error: even if $\hat f=f$ so that $\hat Y=f(X)$, $Y$ still depends on $\epsilon$, so an error $|\hat Y-Y|=|\epsilon|$ remains
we assume that $\epsilon$ is independent of $X$, with $\mathbb{E}[\epsilon]=0$ (and hence $(\mathbb{E}[\epsilon])^2=0$)
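Because $\mathbb{E}[\epsilon]=0$, the second moment of the noise equals its variance; this is the step used below when $\mathbb{E}[\epsilon^2]$ is replaced by $\mathrm{Var}(\epsilon)$:
$$ \mathbb{E}[\epsilon^2]=\mathrm{Var}(\epsilon)+(\mathbb{E}[\epsilon])^2=\mathrm{Var}(\epsilon) $$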
$$
\begin{aligned}
\mathbb{E}[(\hat Y-Y)^2]
&=\mathbb{E}[(\hat f(X)-f(X)-\epsilon)^2]\\
&=\mathbb{E}[(\hat f(X)-f(X))^2+\epsilon^2-2\epsilon(\hat f(X)-f(X))]\\
&=\mathbb{E}[(\hat f(X)-f(X))^2]+\mathbb{E}[\epsilon^2]-2\,\mathbb{E}[\epsilon]\,\mathbb{E}[\hat f(X)-f(X)]\\
&=\mathbb{E}[(\hat f(X)-f(X))^2]+\mathrm{Var}(\epsilon)+(\mathbb{E}[\epsilon])^2\\
&=\underbrace{\mathbb{E}[(\hat f(X)-f(X))^2]}_{\text{reducible}}+\underbrace{\mathrm{Var}(\epsilon)}_{\text{irreducible}}
\end{aligned}
$$
Keep in mind that the irreducible error will always provide an upper bound on the accuracy of our prediction for $Y$. This bound is almost always unknown in practice. The focus of this course is on techniques for estimating $f$ with the aim of minimizing the reducible error.
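As a quick numerical check of the decomposition above, the sketch below simulates data from $Y=f(X)+\epsilon$ and compares the empirical MSE with the reducible term plus $\mathrm{Var}(\epsilon)$. The particular $f$, the deliberately imperfect $\hat f$, and the noise level $\sigma=0.5$ are illustrative assumptions, not from the notes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative choices (assumptions for this sketch only):
# a true regression function f, a deliberately imperfect estimate f_hat,
# and noise standard deviation sigma, so Var(eps) = sigma^2 = 0.25.
f = lambda x: x ** 2
f_hat = lambda x: 1.5 * x
sigma = 0.5

n = 1_000_000
x = rng.uniform(-1.0, 1.0, size=n)
eps = rng.normal(0.0, sigma, size=n)   # independent of x, mean zero
y = f(x) + eps                         # the model Y = f(X) + eps

mse = np.mean((f_hat(x) - y) ** 2)             # estimate of E[(Y_hat - Y)^2]
reducible = np.mean((f_hat(x) - f(x)) ** 2)    # estimate of E[(f_hat(X) - f(X))^2]
irreducible = sigma ** 2                       # Var(eps)

print(f"MSE                     : {mse:.4f}")
print(f"reducible + irreducible : {reducible + irreducible:.4f}")
```

Swapping in a better estimate (e.g. `f_hat = f`) drives the reducible term toward zero, but the MSE never falls below $\mathrm{Var}(\epsilon)$, which is the bound on prediction accuracy referred to above.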