DarkMatter in Cyberspace

R Squared Statistic and Correlation in Linear Regression


This note proves Exercise 7 in Section 3.7 of An Introduction to Statistical Learning.

For \(n\) observations \((x_i, y_i),\ i = 1, \dots, n\), let:

$$\begin{equation}\begin{aligned} ss_{xx} &\equiv \sum_{i=1}^n (x_i - \bar x)^2 \\ &= \sum_{i=1}^n x_i^2 - 2 \bar x \sum_{i=1}^n x_i + \sum_{i=1}^n \bar x^2 \\ &= \sum_{i=1}^n x_i^2 - 2n\bar x^2 + n \bar x^2 \\ &= \sum_{i=1}^n x_i^2 - n\bar x^2 \end{aligned}\end{equation} \tag{1}\label{eq1} $$

Substituting \(x\) with \(y\), we get:

$$ ss_{yy} \equiv \sum_{i=1}^n (y_i - \bar y)^2 = \sum_{i=1}^n y_i^2 - n\bar y^2 \tag{2}\label{eq2} $$

And:

$$\begin{equation}\begin{aligned} ss_{xy} &\equiv \sum_{i=1}^n (x_i - \bar x) (y_i - \bar y) \\ &= \sum_{i=1}^n (x_i y_i - \bar x y_i - x_i \bar y + \bar x \bar y) \\ &= \sum_{i=1}^n x_i y_i - n \bar x \bar y - n \bar x \bar y + n \bar x \bar y \\ &= \sum_{i=1}^n x_i y_i - n\bar x\bar y \end{aligned}\end{equation} \tag{3}\label{eq3} $$

For correlation between \(X\) and \(Y\), also denoted as \(Cor(X, Y)\):

$$ r \equiv \frac{\sum_{i=1}^n(x_i - \bar x)(y_i - \bar y)} {\sqrt{\sum_{i=1}^n(x_i - \bar x)^2} \sqrt{\sum_{i=1}^n(y_i - \bar y)^2}} = \frac{ss_{xy}}{\sqrt{ss_{xx}ss_{yy}}} \tag{4} \label{eq4} $$
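As a quick numeric sanity check of the shortcut formulas (1)–(3) and the correlation in eq (4), here is a small sketch using hypothetical random data; the result should match numpy's built-in `corrcoef`:

```python
import numpy as np

# Hypothetical sample data; any (x, y) pairs would work.
rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = 2.0 + 3.0 * x + rng.normal(size=50)

n = len(x)
xbar, ybar = x.mean(), y.mean()

# Shortcut forms from eqs (1)-(3).
ss_xx = (x**2).sum() - n * xbar**2
ss_yy = (y**2).sum() - n * ybar**2
ss_xy = (x * y).sum() - n * xbar * ybar

# Eq (4): correlation coefficient.
r = ss_xy / np.sqrt(ss_xx * ss_yy)

assert np.isclose(r, np.corrcoef(x, y)[0, 1])
```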

Here \(\bar x = \frac{\sum_{i=1}^n x_i}n, \bar y = \frac{\sum_{i=1}^n y_i}n\).

See Correlation Coefficient and Least Squares Fitting for detailed reasonings.

For the target regression function \(y = a + bx\), \(a\) and \(b\) should be the values that minimize \(RSS\), where

$$ RSS \equiv \sum_{i=1}^n(y_i - \hat y_i)^2 = \sum_{i=1}^n [y_i - (a + b x_i)]^2 \tag{5}\label{eq5} $$

So we have:

$$ \frac{\partial (RSS)}{\partial a} = -2 \sum_{i=1}^n [y_i - (a + b x_i)] = 0 \\ \frac{\partial (RSS)}{\partial b} = -2 \sum_{i=1}^n [y_i - (a + b x_i)] x_i = 0 $$

These lead to:

$$ \begin{align} na + b\sum_{i=1}^n x_i = \sum_{i=1}^n y_i \tag{6} \label{eq6}\\ a \sum_{i=1}^n x_i + b \sum_{i=1}^n x_i^2 = \sum_{i=1}^n x_i y_i \tag{7}\label{eq7} \end{align}$$
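Eqs (6) and (7) are the normal equations: a 2×2 linear system in \(a\) and \(b\). As a sketch with hypothetical data, solving that system numerically agrees with numpy's least-squares fit:

```python
import numpy as np

# Hypothetical data, for illustration only.
rng = np.random.default_rng(1)
x = rng.normal(size=30)
y = 1.0 - 0.5 * x + rng.normal(scale=0.1, size=30)
n = len(x)

# Normal equations (6) and (7) written as A @ [a, b] = rhs.
A = np.array([[n, x.sum()],
              [x.sum(), (x**2).sum()]])
rhs = np.array([y.sum(), (x * y).sum()])
a, b = np.linalg.solve(A, rhs)

# Compare with numpy's degree-1 least-squares fit (slope first).
b_ref, a_ref = np.polyfit(x, y, 1)
assert np.isclose(a, a_ref) and np.isclose(b, b_ref)
```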

Since \(\sum_{i=1}^n x_i = n \bar x\), eq\(\eqref{eq7}\) can be written as:

$$ an \bar x + b \sum_{i=1}^n x_i^2 = \sum_{i=1}^n x_i y_i \tag{8}\label{eq8} $$

From eq\(\eqref{eq6}\) we have:

$$a = \bar y - b \bar x \tag{9}\label{eq9}$$

Substituting eqs \(\eqref{eq1}\), \(\eqref{eq3}\) and \(\eqref{eq9}\) into eq\(\eqref{eq8}\), we get:

$$ (\bar y - b \bar x) n \bar x + b \sum_{i=1}^n x_i^2 = \sum_{i=1}^n x_i y_i \\ n \bar x \bar y - bn \bar x^2 + b \sum_{i=1}^n x_i^2 = \sum_{i=1}^n x_i y_i \\ (\sum_{i=1}^n x_i^2 - n \bar x^2) b = \sum_{i=1}^n x_i y_i - n \bar x \bar y \\ \therefore b = \frac{\sum_{i=1}^n x_i y_i - n \bar x \bar y}{\sum_{i=1}^n x_i^2 - n \bar x^2} = \frac{ss_{xy}}{ss_{xx}} \tag{10}\label{eq10} $$
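The closed forms in eqs (9) and (10) can be checked numerically; with hypothetical data, they reproduce numpy's fitted coefficients:

```python
import numpy as np

# Hypothetical data, for illustration only.
rng = np.random.default_rng(2)
x = rng.normal(size=40)
y = -1.0 + 0.8 * x + rng.normal(scale=0.2, size=40)
n = len(x)
xbar, ybar = x.mean(), y.mean()

ss_xx = (x**2).sum() - n * xbar**2
ss_xy = (x * y).sum() - n * xbar * ybar

b = ss_xy / ss_xx      # eq (10): slope
a = ybar - b * xbar    # eq (9): intercept

# np.polyfit returns the slope first for degree 1.
b_ref, a_ref = np.polyfit(x, y, 1)
assert np.isclose(b, b_ref) and np.isclose(a, a_ref)
```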

Substituting eqs \(\eqref{eq9}\) and \(\eqref{eq10}\) into eq\(\eqref{eq5}\), we get:

$$\begin{equation}\begin{aligned} RSS &= \sum_{i=1}^n [y_i - (a + b x_i)]^2 \\ &= \sum_{i=1}^n (y_i - \bar y + b \bar x - b x_i)^2 \\ &= \sum_{i=1}^n [(y_i - \bar y) - b (x_i - \bar x)]^2 \\ &= \sum_{i=1}^n (y_i - \bar y)^2 + b^2 \sum_{i=1}^n (x_i - \bar x)^2 -2b \sum_{i=1}^n (x_i - \bar x) (y_i - \bar y) \\ &= ss_{yy} + b^2 ss_{xx} -2bss_{xy} \\ &= ss_{yy} + \frac{ss_{xy}^2}{ss_{xx}^2}ss_{xx} -2 \frac{ss_{xy}}{ss_{xx}}ss_{xy} \\ &= ss_{yy} - \frac{ss_{xy}^2}{ss_{xx}} \end{aligned}\end{equation} \tag{11}\label{eq11} $$

Substituting eq\(\eqref{eq11}\) into equation (3.17) of An Introduction to Statistical Learning, and noting that \(TSS \equiv \sum_{i=1}^n (y_i - \bar y)^2 = ss_{yy}\), we have:

$$ R^2 = \frac{TSS - RSS}{TSS} = \frac{ss_{yy} - ss_{yy} + \frac{ss_{xy}^2}{ss_{xx}}}{ss_{yy}} = \frac{ss_{xy}^2}{ss_{xx} ss_{yy}} $$

Comparing with eq\(\eqref{eq4}\), we get \(R^2 = r^2 = Cor(X, Y)^2\).
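Finally, the identity \(R^2 = Cor(X, Y)^2\) can be verified end to end on hypothetical data, computing \(R^2\) from its \((TSS - RSS)/TSS\) definition:

```python
import numpy as np

# Hypothetical data, for illustration only.
rng = np.random.default_rng(3)
x = rng.normal(size=60)
y = 0.5 + 1.5 * x + rng.normal(size=60)

# Fit y = a + b*x (np.polyfit returns slope first).
b, a = np.polyfit(x, y, 1)
y_hat = a + b * x

rss = ((y - y_hat)**2).sum()      # residual sum of squares
tss = ((y - y.mean())**2).sum()   # total sum of squares
r2 = (tss - rss) / tss            # R^2 statistic

r = np.corrcoef(x, y)[0, 1]       # sample correlation
assert np.isclose(r2, r**2)
```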

Here \(RSS\), \(R^2\), \(TSS\) and \(Cor(X, Y)\) are defined in Equations (3.16) ~ (3.18) of An Introduction to Statistical Learning. \(ss\) and \(r\) are defined in Correlation Coefficient and Least Squares Fitting.

Other references:

  • Relationship between R2 and correlation coefficient


Published

Apr 11, 2018

Last Updated

Apr 11, 2018

Category

Tech

Tags

  • linear regression
  • statistics
