-
Notifications
You must be signed in to change notification settings - Fork 3
/
Copy pathresiduals.Rmd
216 lines (108 loc) · 8.73 KB
/
residuals.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
---
title: "Overview of Residuals"
author: "ggResidpanel version `r packageVersion('ggResidpanel')`"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{Overview of Residuals}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
# Linear Models
The option `type` in the function `resid_panel`, `resid_compare`, and `resid_interact` offers the options of "response", "pearson", or "standardized" for linear models.
By linear models, this text is referring to models of the form: $Y = X\beta+\epsilon$ where $\epsilon \sim N(0,\sigma^2)$.
### Response
The "response" residuals are the raw residuals: $y_i-\hat{y_i}$. In this case, $\hat{y} = X\hat{\beta}$.
The `r` code used to obtain the raw residuals from a linear model is: `resid(model, type="response")`.
### Pearson
In the case of a linear model, a Pearson residual is the raw residual scaled by the estimate of the variance of the response variable: $\hat{\sigma}$. The formula is below:
$\frac{y_i-\hat{y_i}}{\sqrt{\hat{\sigma}^2}}$
Selecting the "pearson" option in the `resid` function does not actually return this result. Thus, the following code is implemented in order to retrieve the Pearson residuals: `resid(model, type = "response") / summary(model)$sigma`
http://support.sas.com/documentation/cdl/en/statug/68162/HTML/default/viewer.htm#statug_intromod_sect046.htm
https://v8doc.sas.com/sashtml/insight/chap39/sect56.htm
### Standardized
$\frac{y_i-\hat{y_i}}{\sqrt{\sigma^2(1-h_{ii})}}$ where $h_{ii}$ is the diagonal from the matrix $X(X'X)^{-1}X'$
This is what 'stdres(model)' reports for a linear model.
https://v8doc.sas.com/sashtml/insight/chap39/sect54.htm
# Linear Model
'Proc Reg' offers raw residuals and studentized residuals (internally).
'proc glm' offers raw, internally studentized, and externally studentized residuals.
https://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#statug_reg_sect034.htm
https://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#statug_glm_sect020.htm
http://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#statug_introreg_sect010.htm
SAS refers to the residuals as standardized for when $\sigma^2$ is known. SAS refers to the residuals as internally studentized when $s^2$ is used. They are called externally studentized when $s_{-i}^2$ is used.
Scaled (divided by standard deviation) are called pearson.
http://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#statug_intromod_a0000000355.htm
Not consistent though and sometimes calls the internally studentized, the standardized residual:
http://support.sas.com/documentation/cdl/en/sgug/59902/HTML/default/viewer.htm#fit_sect54.htm
*SUGGESTION: Only offer raw, pearson, and internally studentized residuals (what stdres does)*
#### Deviance Residuals
$\frac{y_i-\hat{y_i}}{\sqrt{s^2}}$ or $y_i-\hat{y_i}$
I found this in my 520 notes. The first is the scaled deviance. The un-scaled deviance is the raw residuals (page 130 of Dr. Kaiser's 520 notes).
The function 'resid(model, type="deviance")' reports the raw residuals.
These don't really make sense to use for linear models so might want to add in a warning if a person enters a linear model and request deviance residuals.
#### Studentized Residuals
$\frac{y_i-\hat{y_i}}{\sqrt{\sigma_{-i}^2(1-h_{ii})}}$ where $h_{ii}$ is the diagonal from the matrix $X(X'X)^{-1}X'$ and $\sigma_{-i}^2$ is the estimate of the variance of the model not including row $i$.
This is what 'studres(model)' reports for a linear model.
# Generalized Linear Model
'proc genmod' offers pearson residuals, deviance residuals, raw residuals, standardized pearson residuals, and standardized deviance residuals
https://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#statug_glm_sect020.htm
Deviance and standardized deviance:
http://support.sas.com/documentation/cdl/en/sgug/59902/HTML/default/viewer.htm#fit_sect55.htm
Pearson and standardized deviance:
http://support.sas.com/documentation/cdl/en/sgug/59902/HTML/default/viewer.htm#fit_sect56.htm
*SUGGESTION: provide raw residuals on ilink scale, deviance, pearson, standardized pearson, and standardized deviance.*
#### Raw Residuals
$y_i-\hat{\mu_i}$ where $\hat{\mu_i}$ is the prediction with the inverse link function applied.
This is what 'resid(model, type="response")' returns for a generalized linear. I checked this for Poisson, binomial, and Bernoulli.
#### Pearson Residuals.
Poisson:
$\frac{y_i-\hat{\lambda_i}}{\sqrt{\hat{\lambda_i}}}$
Binomial:
$\frac{y_i/m_i-\hat{\pi_i}}{\sqrt{\hat{\pi_i}(1-\hat{\pi_i})}}*\sqrt{m_i}$
Bernoulli:
$\frac{y_i-\hat{\pi_i}}{\sqrt{\hat{\pi_i}(1-\hat{\pi_i})}}$
This is what the 'resid(model, type="pearson")' returns for a generalized linear model. I checked this for Poisson, binomial, and Bernoulli.
#### Deviance Residuals
Poisson:
$sign(y_i-\hat{\lambda_i})\sqrt{2(y_i*log(y_i/\hat{\lambda_i})-(y_i-\hat{\lambda_i}))}$
Binomial:
$sign(y_i/m_i-\hat{\pi_i})\sqrt{2(y_ilog(y_i/(m_i\hat{\pi_i}))+(m_i-y_i)log((m_i-y_i)/(m_i-m_i\hat{\pi_i}))}$
Bernoulli:
$sign(y_i-\hat{\pi_i})\sqrt{-2(y_ilog(\hat{\pi_i})+(1-y_i)log(1-\hat\pi_i))}$
This is what the 'resid(model, type="deviance")' returns for a generalized linear model. I checked this for Poisson, binomial, and Bernoulli. I am not sure what is does for when $y_i=0$.
#### Standardized Residuals
For each type of residual, for a generalized linear model, SAS uses the formula below.
$\frac{r_{type}}{\phi\sqrt{(1-h_{ii})}}$ where $h_{ii}$ is the diagonal from the matrix $W^{1/2}X(X'WX)^{-1}X'W^{1/2}$.
For Poisson, binomial, and Bernoulli, $\phi$ is assumed to be one.
When I calculated this by hand using the 'hatvalues' function, it matched exactly what SAS returned for standardized Pearson and and standardized deviance residuals.
For Gamma and other continuous distributions, the dispersion parameter is estimated using the deviance/residual df. This matches what SAS returns for standardized if set 'scale=deviance' in the 'model' statement.
I have not found exactly how R is calculating it yet. What I have calculated so far that is closest to stdres is using the deviance/df.residual in place of $\phi$ for all distributions on the 'pearson' residuals (for Poisson and binomial). It is different for each distribution.
This was all tested in proc genmod.
'rstandard' calculates the standardized deviance residuals just like SAS.
# Mixed Linear Model
'proc mixed' reports raw (conditional and marginal) residuals, the pearson residuals, and the internally studentized (or standardized) residuals.
https://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#statug_mixed_sect015.htm
https://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#statug_mixed_sect024.htm
Also includes notes on Cook's D and leverage for mixed linear model. These both lead to acquiring matrices that are not simple (or even possible in some cases) to extract from an lmer object.
*SUGGESTION: include raw, pearson for now, and eventually internally studentized (standardized) residuals (r is set up to do conditional. Will have to code all by hand if want raw marginal.*
#### Raw Residuals
$y_i-\hat{y_i}|blups$
The 'resid(model, type="response")' gives the raw residuals where the predictions include the blups of the random effects.
#### Pearson Residuals
*The 'resid(model, type="pearson")' does not give the pearson residuals.* It gives the raw residuals conditional on the blups.
For the conditional residuals, SAS calculates it in the following way:
$\frac{y_i-\hat{y_i}|blups}{\sqrt{\sigma^2}}$ where $\sigma^2$ is the estimate of the residual variance.
# Mixed Generalized Linear Model
'proc glimmix' reports raw (marginal and conditional), pearson, and internally studentized (standardized) residuals.
https://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#statug_glimmix_a0000001413.htm
*SUGGESTION: Include raw, pearson, and deviance as given by the 'resid' function.
#### Raw Residuals
$y_i-\hat{\mu_i}|blups$
The 'resid(model, type="response")' gives the raw residuals on the ilink scale where the predictions include the blups of the random effects.
#### Pearson
$raw/sqrt(var(\mu|blups))$
I believe the 'resid(model, type="pearson")' gives the correct values, but I cannot get SAS to give me pearson residuals on ilink.
#### Deviance
I believe the 'resid(model, type="deviance")' gives the correct values, but I cannot get SAS to give me pearson residuals on ilink. Using the deviance functions for a generalized linear model gives the same results as using 'resid'
#### Scaled
SAS says to take raw residuals and divide by the estimate of the variance of the residuals. I cannot tell if this is just the sample standard deviation of the residuals.