# Does mean centering or feature scaling affect a Principal Component Analysis?

Let us consider whether centering or scaling the variables matters for Principal Component Analysis (PCA) when the PCA is computed from the covariance matrix, i.e., when the k principal components are the eigenvectors of the covariance matrix that correspond to the k largest eigenvalues.

### 1. Mean centering does not affect the covariance matrix
Here, the rationale is: if the covariance is the same whether the variables are centered or not, the result of the PCA will be the same.

Let’s assume we have the 2 variables **x** and **y**. Then the covariance between the attributes is calculated as

$$
\sigma_{xy} = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})
$$

Let us write the centered variables as

$$
x' = x - \bar{x} \quad \text{and} \quad y' = y - \bar{y}
$$

The centered covariance would then be calculated as follows:

$$
\sigma'_{xy} = \frac{1}{n-1} \sum_{i=1}^{n} (x'_i - \bar{x}')(y'_i - \bar{y}')
$$

But since x̄' = 0 and ȳ' = 0 after centering, we have

$$
\sigma'_{xy} = \frac{1}{n-1} \sum_{i=1}^{n} x'_i \, y'_i
$$

which is our original covariance if we substitute the original terms back in:

$$
\sigma'_{xy} = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = \sigma_{xy}
$$

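Even centering only one variable, e.g., **x**, wouldn’t affect the covariance:

$$
\begin{aligned}
\sigma'_{xy} &= \frac{1}{n-1} \sum_{i=1}^{n} (x'_i - \bar{x}')(y_i - \bar{y}) \\
&= \frac{1}{n-1} \sum_{i=1}^{n} (x'_i - 0)(y_i - \bar{y}) \\
&= \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = \sigma_{xy}
\end{aligned}
$$
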
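The argument in this section can also be checked numerically. The following is a minimal sketch in plain Python; the helper names `sample_cov` and `center` and the toy data are illustrative assumptions, not part of the derivation above.

```python
import math


def sample_cov(x, y):
    """Sample covariance with the 1/(n-1) normalization."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)


def center(values):
    """Subtract the mean from every element."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


# Toy data, made up for illustration
x = [2.0, 4.0, 6.0, 8.0]
y = [1.0, 3.0, 2.0, 6.0]

cov_raw = sample_cov(x, y)                    # no centering
cov_both = sample_cov(center(x), center(y))   # both variables centered
cov_x_only = sample_cov(center(x), y)         # only x centered

print(cov_raw, cov_both, cov_x_only)
print(math.isclose(cov_raw, cov_both), math.isclose(cov_raw, cov_x_only))
```

All three values should agree up to floating-point rounding, because the covariance formula subtracts the means anyway.
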
### 2. Scaling of variables does affect the covariance matrix

If one variable is scaled, e.g., from pounds into kilograms (1 pound = 0.453592 kg), it does affect the covariance and therefore influences the results of a PCA.

Let *c* be the scaling factor for *x*.

Given that the “original” covariance is calculated as

$$
\sigma_{xy} = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})
$$

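the covariance after scaling would be calculated as:

$$
\begin{aligned}
\sigma'_{xy} &= \frac{1}{n-1} \sum_{i=1}^{n} (c \cdot x_i - c \cdot \bar{x})(y_i - \bar{y}) \\
&= \frac{c}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = c \cdot \sigma_{xy}
\end{aligned}
$$
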
Therefore, scaling one attribute by the constant *c* rescales the covariance to *c · σ<sub>xy</sub>*. So, if we scale x from pounds to kilograms, the covariance between x and y is multiplied by 0.453592, i.e., it shrinks to about 45% of its original value.

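To make the effect on the PCA itself concrete, here is a small sketch in plain Python for the two-variable case. The helper names and toy data are assumptions for illustration, and `first_pc_angle` relies on the closed-form orientation of the leading eigenvector of a symmetric 2x2 matrix, so it does not generalize beyond two variables.

```python
import math


def sample_cov(x, y):
    """Sample covariance with the 1/(n-1) normalization."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)


def first_pc_angle(x, y):
    """Angle in degrees between the x-axis and the first principal component.

    Closed form for the leading eigenvector of the 2x2 covariance matrix
    [[s_xx, s_xy], [s_xy, s_yy]]; valid for two variables only.
    """
    s_xx = sample_cov(x, x)
    s_yy = sample_cov(y, y)
    s_xy = sample_cov(x, y)
    return math.degrees(0.5 * math.atan2(2.0 * s_xy, s_xx - s_yy))


c = 0.453592                     # pounds -> kilograms
x_lb = [2.0, 4.0, 6.0, 8.0]      # toy values, treated as pounds
y = [1.0, 3.0, 2.0, 6.0]
x_kg = [c * xi for xi in x_lb]   # the same measurements in kilograms

cov_lb = sample_cov(x_lb, y)
cov_kg = sample_cov(x_kg, y)
angle_lb = first_pc_angle(x_lb, y)
angle_kg = first_pc_angle(x_kg, y)

print(cov_kg, c * cov_lb)   # covariance is rescaled by c
print(angle_lb, angle_kg)   # direction of the first principal component
```

The covariance changes by the factor *c*, while the variance of x changes by *c*², so the leading eigenvector of the covariance matrix points in a different direction after the unit change.
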
### 3. Standardizing affects the covariance

Standardization of features will have an effect on the outcome of a PCA (assuming that the variables are originally not standardized). This is because we are dividing the covariance between every pair of variables by the product of the standard deviations of that pair, so the covariance matrix of the standardized variables is the correlation matrix of the original ones.

The equation for standardization of a variable is written as

$$
z_i = \frac{x_i - \bar{x}}{\sigma_x}
$$

The “original” covariance:

$$
\sigma_{xy} = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})
$$

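And after standardizing both variables:

$$
z_{x,i} = \frac{x_i - \bar{x}}{\sigma_x}, \quad z_{y,i} = \frac{y_i - \bar{y}}{\sigma_y}
$$

$$
\begin{aligned}
\sigma'_{xy} &= \frac{1}{n-1} \sum_{i=1}^{n} (z_{x,i} - 0)(z_{y,i} - 0) \\
&= \frac{1}{n-1} \sum_{i=1}^{n} \frac{x_i - \bar{x}}{\sigma_x} \cdot \frac{y_i - \bar{y}}{\sigma_y} \\
&= \frac{1}{(n-1) \, \sigma_x \sigma_y} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = \frac{\sigma_{xy}}{\sigma_x \sigma_y}
\end{aligned}
$$
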
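A quick numerical check of this relationship, again as a sketch in plain Python. The helper names and toy data are illustrative, and the code assumes that σ<sub>x</sub> and σ<sub>y</sub> are sample standard deviations (with n − 1 in the denominator), matching the covariance formula used here.

```python
import math


def sample_cov(x, y):
    """Sample covariance with the 1/(n-1) normalization."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)


def standardize(values):
    """z-scores using the sample standard deviation (n-1 in the denominator)."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / std for v in values]


# Toy data, made up for illustration
x = [2.0, 4.0, 6.0, 8.0]
y = [1.0, 3.0, 2.0, 6.0]

cov_raw = sample_cov(x, y)
std_x = math.sqrt(sample_cov(x, x))   # sample standard deviation of x
std_y = math.sqrt(sample_cov(y, y))   # sample standard deviation of y

cov_std = sample_cov(standardize(x), standardize(y))

print(cov_raw)                                # covariance of the raw variables
print(cov_std, cov_raw / (std_x * std_y))     # these two should agree
```

After standardization each variable has unit variance, so the covariance between the standardized variables lies between -1 and 1 regardless of the original units.
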