Within this very nice piece, Rob drops this bomb of mathematical knowledge:
It is not necessary to actually fit $n$ separate models when computing the CV statistic for linear models.

Say what?
Here is a broader excerpt and the method itself (after the jump).
While cross-validation can be computationally expensive in general, it is very easy and fast to compute LOOCV for linear models. A linear model can be written as
\[
\mathbf{Y} = \mathbf{X}\mbox{\boldmath$\beta$} + \mathbf{e}.
\]
Then
\[
\hat{\mbox{\boldmath$\beta$}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}
\]
and the fitted values can be calculated using
\[
\mathbf{\hat{Y}} = \mathbf{X}\hat{\mbox{\boldmath$\beta$}} = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y} = \mathbf{H}\mathbf{Y},
\]
where $\mathbf{H} = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'$ is known as the “hat-matrix” because it is used to compute $\mathbf{\hat{Y}}$ (“Y-hat”).

If the diagonal values of $\mathbf{H}$ are denoted by $h_1,\dots,h_n$, then the cross-validation statistic can be computed using
\[
\text{CV} = \frac{1}{n}\sum_{i=1}^n [e_{i}/(1-h_{i})]^2,
\]
where $e_i$ is the residual obtained from fitting the model to all $n$ observations. See Christensen’s book Plane Answers to Complex Questions for a proof. Thus, it is not necessary to actually fit $n$ separate models when computing the CV statistic for linear models. This remarkable result allows cross-validation to be used while only fitting the model once to all available observations.
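
To convince myself, here is a minimal sketch in Python with NumPy that compares the one-fit formula against refitting the model $n$ times. It is not from Rob's post: the function names `loocv_shortcut` and `loocv_brute_force`, the simulated data, and the coefficients are all my own assumptions, chosen only for illustration.

```python
import numpy as np


def loocv_shortcut(X, y):
    """LOOCV mean squared error from a single least-squares fit."""
    # Hat matrix H = X (X'X)^{-1} X'; its diagonal holds the h_i values.
    H = X @ np.linalg.solve(X.T @ X, X.T)
    h = np.diag(H)
    residuals = y - H @ y  # e_i from the fit on all n observations
    return np.mean((residuals / (1.0 - h)) ** 2)


def loocv_brute_force(X, y):
    """LOOCV mean squared error by refitting the model n times."""
    n = len(y)
    squared_errors = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i  # drop observation i
        beta, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        squared_errors[i] = (y[i] - X[i] @ beta) ** 2
    return squared_errors.mean()


# Simulated data (hypothetical): an intercept plus two predictors.
rng = np.random.default_rng(0)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=0.3, size=n)

cv_fast = loocv_shortcut(X, y)
cv_slow = loocv_brute_force(X, y)
print(cv_fast, cv_slow)
```

The two printed numbers should agree up to floating-point error. Forming the full $n \times n$ hat matrix is wasteful for large $n$ since only its diagonal is needed, but it keeps the sketch close to the formulas above.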