Total sum of squares formula #2373

Closed
vadim0x60 opened this issue Feb 12, 2024 · 3 comments
Labels: bug / fix (Something isn't working), help wanted (Extra attention is needed)

Comments

@vadim0x60

In `_r2_score_compute` (r2.py:80), the total sum of squares is defined as

```python
mean_obs = sum_obs / num_obs
tss = sum_squared_obs - sum_obs * mean_obs
```

i.e.

$$\text{TSS}= \sum_{i=1}^{n}y_{i}^2 - (\sum_{i=1}^{n}y_{i})\bar{y}$$

which is different from the usual formula

$$\mathrm{TSS}=\sum_{i=1}^{n}\left(y_{i}-\bar{y}\right)^2$$

Am I missing something? Could this be a mistake?

vadim0x60 added the bug / fix and help wanted labels on Feb 12, 2024

Hi! Thanks for your contribution! Great first issue!

@SkafteNicki
Member

Hi @vadim0x60, thanks for opening this issue.
The two formulations are equal:

$$TSS = \sum_{i=1}^n (y_i - \bar{y})^2$$

Substituting the definition of the mean

$$\bar{y} = \frac{\sum_{i=1}^{n} y_i}{n}$$

into the expression, we get

$$TSS = \sum_{i=1}^n \left( y_i - \frac{1}{n}\sum_{j=1}^{n} y_j \right)^2$$

then expand the squared term

$$TSS = \sum_{i=1}^{n} \left( y_i^2 - \frac{2y_i}{n} \sum_{j=1}^{n} y_j + \frac{1}{n^2} \left( \sum_{j=1}^{n} y_j \right)^2 \right)$$

Next, distributing the summation across the terms (the last term does not depend on $i$, so summing it $n$ times turns $\frac{1}{n^2}$ into $\frac{1}{n}$):

$$TSS = \sum_{i=1}^{n} y_i^2 - \frac{2}{n}\left(\sum_{i=1}^{n} y_i\right)^2 + \frac{1}{n}\left(\sum_{i=1}^{n} y_i\right)^2$$

And finally, simplifying the expression:

$$TSS = \sum_{i=1}^{n} y_i^2 - \frac{1}{n}(\sum_{i=1}^{n} y_i)^2$$

the last term we can rewrite again by using the definition of the average (in reverse order)

$$TSS = \sum_{i=1}^{n} y_i^2 - \frac{n^2 \times \bar{y}^2}{n}$$

expand again

$$TSS = \sum_{i=1}^{n} y_i^2 - \frac{n^2 \times \bar{y} \times \bar{y}}{n}$$

split

$$TSS = \sum_{i=1}^{n} y_i^2 - \frac{n^2 \times \bar{y}}{n} \times \bar{y}$$

simplify

$$TSS = \sum_{i=1}^{n} y_i^2 - n \times \bar{y} \times \bar{y}$$

again use the definition of the average to convert into the expression we want

$$TSS = \sum_{i=1}^{n} y_i^2 - \left( \sum_{i=1}^{n} y_i \right) \bar{y}$$
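The identity can also be checked numerically. The snippet below is a minimal sketch in plain Python, not the library's code: the function names `tss_centered` and `tss_expanded` and the sample data are made up for illustration, while `sum_obs`, `sum_squared_obs`, `num_obs` and `mean_obs` follow the two lines quoted from r2.py.

```python
import math


def tss_centered(ys):
    """Textbook form: sum of squared deviations from the mean."""
    mean_obs = sum(ys) / len(ys)
    return sum((y - mean_obs) ** 2 for y in ys)


def tss_expanded(ys):
    """Expanded form, mirroring the two lines quoted from r2.py."""
    sum_obs = sum(ys)
    sum_squared_obs = sum(y * y for y in ys)
    num_obs = len(ys)
    mean_obs = sum_obs / num_obs
    return sum_squared_obs - sum_obs * mean_obs


# Illustrative data (an assumption, not taken from the issue).
ys = [1.0, 2.0, 4.0, 7.0]
print(tss_centered(ys), tss_expanded(ys))
print(math.isclose(tss_centered(ys), tss_expanded(ys)))
```

Both functions should agree up to floating-point rounding for any non-empty list of observations.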

Hopefully this convinces you that the two expressions are equal. The reason we use one over the other is kind of lost on me right now, but if I remember correctly, one expression is a bit more numerically stable than the other, which is why we chose it.
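One practical property of the expanded form, offered as an observation rather than as the confirmed rationale: it needs only three running totals (`sum_obs`, `sum_squared_obs`, `num_obs`), so it can be evaluated after seeing the data batch by batch, without knowing the mean in advance. The sketch below assumes this usage pattern; `update` and `tss_from_state` are hypothetical helper names, not functions from the library.

```python
import math


def update(state, batch):
    """Fold one batch into the running totals (hypothetical helper)."""
    sum_obs, sum_squared_obs, num_obs = state
    return (
        sum_obs + sum(batch),
        sum_squared_obs + sum(y * y for y in batch),
        num_obs + len(batch),
    )


def tss_from_state(state):
    """Apply the expanded TSS formula to the accumulated totals."""
    sum_obs, sum_squared_obs, num_obs = state
    mean_obs = sum_obs / num_obs
    return sum_squared_obs - sum_obs * mean_obs


# Illustrative batches (an assumption, not taken from the issue).
state = (0.0, 0.0, 0)
for batch in ([1.0, 2.0], [4.0, 7.0]):
    state = update(state, batch)

streamed_tss = tss_from_state(state)
print(streamed_tss)
```

The centered form, by contrast, needs the overall mean before any squared deviation can be computed, which requires either a second pass over the data or an incremental update scheme.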

@vadim0x60
Author

Thanks for taking the time to lay this out!


2 participants