-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathweek_8_lab.Rmd
185 lines (136 loc) · 5.51 KB
/
week_8_lab.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
---
title: "Hierarchical GLM"
author: "Monica Alexander"
date: "March 10 2022"
output:
pdf_document:
number_sections: false
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, message = FALSE, warning = FALSE)
```
```{r, message=FALSE, warning=FALSE}
library(tidyverse)
library(here)
# for bayes stuff
library(rstan)
library(coda)
library(bayesplot)
library(loo)
library(tidybayes)
library(ggplot2)
library(dplyr)
library(arm)
```
Please hand in Rmd, pdf, and stan files. Due next Wednesday because of delay in lecture.
# Lip cancer
Here is the lip cancer data as seen in the lecture.
- `observe.i` is observed deaths in each region
- `expect.i` is expected deaths, based on region-specific age distribution and national-level age-specific mortality rates.
```{r}
observe.i <- c(
5,13,18,5,10,18,29,10,15,22,4,11,10,22,13,14,17,21,25,6,11,21,13,5,19,18,14,17,3,10,
7,3,12,11,6,16,13,6,9,10,4,9,11,12,23,18,12,7,13,12,12,13,6,14,7,18,13,9,6,8,7,6,16,4,6,12,5,5,
17,5,7,2,9,7,6,12,13,17,5,5,6,12,10,16,10,16,15,18,6,12,6,8,33,15,14,18,25,14,2,73,13,14,6,20,8,
12,10,3,11,3,11,13,11,13,10,5,18,10,23,5,9,2,11,9,11,6,11,5,19,15,4,8,9,6,4,4,2,12,12,11,9,7,7,
8,12,11,23,7,16,46,9,18,12,13,14,14,3,9,15,6,13,13,12,8,11,5,9,8,22,9,2,10,6,10,12,9,11,32,5,11,
9,11,11,0,9,3,11,11,11,5,4,8,9,30,110)
expect.i <- c(
6.17,8.44,7.23,5.62,4.18,29.35,11.79,12.35,7.28,9.40,3.77,3.41,8.70,9.57,8.18,4.35,
4.91,10.66,16.99,2.94,3.07,5.50,6.47,4.85,9.85,6.95,5.74,5.70,2.22,3.46,4.40,4.05,5.74,6.36,5.13,
16.99,6.19,5.56,11.69,4.69,6.25,10.84,8.40,13.19,9.25,16.98,8.39,2.86,9.70,12.12,12.94,9.77,
10.34,5.09,3.29,17.19,5.42,11.39,8.33,4.97,7.14,6.74,17.01,5.80,4.84,12.00,4.50,4.39,16.35,6.02,
6.42,5.26,4.59,11.86,4.05,5.48,13.13,8.72,2.87,2.13,4.48,5.85,6.67,6.11,5.78,12.31,10.56,10.23,
2.52,6.22,14.29,5.71,37.93,7.81,9.86,11.61,18.52,12.28,5.41,61.96,8.55,12.07,4.29,19.42,8.25,
12.90,4.76,5.56,11.11,4.76,10.48,13.13,12.94,14.61,9.26,6.94,16.82,33.49,20.91,5.32,6.77,8.70,
12.94,16.07,8.87,7.79,14.60,5.10,24.42,17.78,4.04,7.84,9.89,8.45,5.06,4.49,6.25,9.16,12.37,8.40,
9.57,5.83,9.21,9.64,9.09,12.94,17.42,10.29,7.14,92.50,14.29,15.61,6.00,8.55,15.22,18.42,5.77,
18.37,13.16,7.69,14.61,15.85,12.77,7.41,14.86,6.94,5.66,9.88,102.16,7.63,5.13,7.58,8.00,12.82,
18.75,12.33,5.88,64.64,8.62,12.09,11.11,14.10,10.48,7.00,10.23,6.82,15.71,9.65,8.59,8.33,6.06,
12.31,8.91,50.10,288.00)
```
## Question 1
The `expect.i` indicates the expected number of lip cancer deaths for a particular area if the relative risk of that area is at average level.
## Question 2
Run three different models in Stan with three different set-up's for estimating $\theta_i$, that is the relative risk of lip cancer in each region:
1. $\theta_i$ is same in each region $= \theta$
2. $\theta_i$ is different in each region and modeled separately
3. $\theta_i$ is different in each region and modeled hierarchically
```{r}
stan_data <- list(N = length(observe.i),
offset = expect.i,
deaths = observe.i)
mod1 <- stan(data = stan_data,
file = "lip1.stan",
iter = 1000,
seed = 23)
```
```{r}
summary(mod1)$summary[c("theta"),]
```
```{r}
stan_data <- list(N = length(observe.i),
offset = expect.i,
deaths = observe.i)
mod2 <- stan(data = stan_data,
file = "lip2.stan",
iter = 1000,
seed = 23)
```
```{r}
stan_data <- list(N = length(observe.i),
offset = expect.i,
deaths = observe.i)
mod3 <- stan(data = stan_data,
file = "lip3.stan",
iter = 1000,
seed = 23)
```
```{r}
summary(mod3)$summary[c("theta[2]","mu","sigma_mu"),]
```
## Question 3
Make three plots (appropriately labeled and described) that illustrate the differences in estimated $\theta_i$'s across regions and the differences in $\theta$s across models.
```{r}
theta_mod1 <- summary(mod1)$summary[c(paste0('theta')),1]
```
```{r}
theta_mod2 <- c()
for (i in 1:length(expect.i)){
theta_mod2[i] <- summary(mod2)$summary[c(paste0("theta[",i,']')),1]
}
```
```{r}
theta_mod3 <- c()
for (i in 1:length(expect.i)){
theta_mod3[i] <- summary(mod3)$summary[c(paste0("theta[",i,']')),1]
}
```
```{r}
plot(density(observe.i/expect.i),main='density of SMRs v.s. estimated theta in model 1',xlab='SMRs')
abline(v=theta_mod1,col='red')
```
```{r}
plot(observe.i/expect.i,theta_mod2,main='SMR v.s. theta in model 2',xlab='SMRs',ylab='theta')
abline(0,1,col='red')
```
```{r}
plot(observe.i/expect.i,theta_mod3,main='SMR v.s. theta in model 3',xlab='SMRs',ylab='theta')
abline(0,1,col='red')
```
I feel model 2 preforms better as its predictions are closer to the actual values. Model3 can potentially do a better job if we add covariates. However, since we don't have the data of covariates in the assignment, model 3 doesn't have any advantages over model2.
## Question 4
Rerun model 3 (the hierarchical model), but also including an overdispersion parameter. Compare the two models and decide which is more appropriate.
```{r}
stan_data <- list(N = length(observe.i),
offset = expect.i,
deaths = observe.i)
mod4 <- stan(data = stan_data,
file = "lip4.stan",
iter = 1000,
seed = 23)
```
```{r}
summary(mod4)$summary[c("theta[2]","mu","sigma_mu"),]
```
I don't think adding a $\epsilon$ term is necessary, since it can be absorbed by alpha. SO I think the previous model in Q1 is better.