-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathoverview.qmd
613 lines (511 loc) · 24 KB
/
overview.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
---
title: "Overview"
bibliography: references.bib
format:
html:
df-print: kable
code-fold: show
code-summary: "Hide code"
code-overflow: wrap
toc-title: Page Contents
toc: true
toc-depth: 3
toc-location: right
number-sections: false
html-math-method: katex
smooth-scroll: true
editor: source
editor_options:
chunk_output_type: console
---
```{=html}
<style type="text/css">
body, td {
font-size: 13pt;
}
code.r{
font-size: 9pt;
}
pre {
font-size: 11pt
}
</style>
```
::: {.callout-note icon=false}
#### Supporting Information
This site provides a demonstrative workflow and supporting information for the publication:
**Modeling the 2014-2015 Vesicular Stomatitis Outbreak in the United States using an SEIR-SEI Approach**
Authors:
Humphreys, Pelzel-McCluskey, Shults, Velazquez Salinas, Bertram, McGregor, Cohnstaedt, Swanson, Scroggs, Fautt, Mooney, Peters, and Rodriguez
**Viruses** 2024, 16, 1315. [https://doi.org/10.3390/v16081315](https://doi.org/10.3390/v16081315)
:::
::: {.callout-tip icon=false}
#### External Resources
An interactive dashboard to demonstrate the described SEIR-SEI model is available:
[ShinyApps: https://geoepi.shinyapps.io/seir-vector/](https://geoepi.shinyapps.io/seir-vector/)
Code for this site and ShinyApp above are available:
[GitHub: https://github.com/geoepi/seir-vector](https://github.com/geoepi/seir-vector)
Large model objects, incidence data, and other archived information are available on the Open Science Framework: [Project OSF site: https://osf.io/vqgxs/](https://osf.io/vqgxs/). The R-function *get_data_osf()* is provided to download and read these data.
:::
## Workflow Synopsis
This script provides an overview of the methodologies and code presented in the publication above. Note that data, code, and website configuration files, as well as code for the [VSV ShinyApp](https://geoepi.shinyapps.io/seir-vector/), are available on the project's [GitHub site](https://github.com/geoepi/seir-vector) and [OSF Project site](https://osf.io/vqgxs/). Executed Stan model objects are approximately 200mb in size, code to download and read these objects is provided in the script.
For simplicity, this workflow will demonstrate analysis for outbreaks observed in 2014, however the workflow can be applied to 2015 outbreaks with minimal modification. The PAGE CONTENTS menu in the right upper corner can be used to navigate to specific sections.
## Parameters and Formula
#### Model Parameters
Brief descriptions of parameters used in the model. Note that additional information and cited sources for selected parameter values are provided in the associated publication.
\
\begin{array}{|c|l|}
\hline
\text{Parameter} & \text{Description} \\
\hline
N_h & \text{Total number of vertebrate hosts} \\
N_v & \text{Total number of insect vectors} \\
\upsilon_h & \text{Contact rate, bites sustained by host} \\
\upsilon_v & \text{Contact rate, vector biting rate} \\
\beta_h & \text{Vector to host transmission probability} \\
\beta_v & \text{Host to vector transmission probability} \\
\sigma_h & \text{Intrinsic Incubation Period (IIP)} \\
\sigma_v & \text{Extrinsic Incubation Period (EIP)} \\
\gamma_h & \text{Host removal rate (recovery or quarantine)} \\
\rho_h & \text{Host heterogeneous competency} \\
\rho_v & \text{Vector heterogeneous competency} \\
\mu_v & \text{Vector background mortality rate} \\
\kappa & \text{Observation bias (proportion observed)} \\
\hline
\end{array}
\
#### Model Specification
Mechanistic aspects of the model can can be diagrammatically represented as shown below. Note that in case of both the vertebrate hosts ($h$ subscript) and vectors ($v$ subscript), the term $\rho$ influences the number of exposed individuals (**E**) that proceed to the infectiousness compartment (**I**).
![](images/diagram.png){ width=75% height=auto }
**More formally, the model is given as: **
$$\begin{align}
\lambda_h &= \upsilon_h \cdot \beta_h \cdot \frac{I_v}{N_v} \\
\lambda_v &= \upsilon_v \cdot \beta_v \cdot \frac{I_h}{N_h} \\
\frac{dS_h}{dt} &= -\lambda_h \cdot S_h \\
\frac{dE_h}{dt} &= \lambda_h \cdot S_h - \sigma_h \cdot E_h \\
\frac{dI_h}{dt} &= \sigma_h \cdot E_h \cdot \rho_h - \gamma_h \cdot I_h \\
\frac{dR_h}{dt} &= \gamma_h \cdot I_h \\
\frac{dS_v}{dt} &= \mu_v \cdot N_v - \lambda_v \cdot S_v - \mu_v \cdot S_v \\
\frac{dE_v}{dt} &= \lambda_v \cdot S_v - (\sigma_v + \mu_v) \cdot E_v \\
\frac{dI_v}{dt} &= \sigma_v \cdot E_v \cdot \rho_v - \mu_v \cdot I_v \\ \nonumber
\end{align}$$ {#eq-seirsei}
**in which the first two lines give the Force of Infection (FOI, $\lambda$'s) for hosts and vectors respectively** and $\rho$ parameters (range 0.00-1.00) potentially reduce the size of the exposed (**E**) populations used in calculating the number of infectious individuals (**I**).
## Model Preliminaries
Before constructing a Bayesian model in Stan ([Stan Installation](https://mc-stan.org/users/interfaces/)), we first visualize the data and construct an exploratory compartmental model using non-Bayesian Ordinary Differential Equations (ODE) and the **deSolve** Package.
### Needed Libraries
Load needed R-Packages and custom functions.
```{r warning=FALSE, message=FALSE}
library(here) # to manage directories
library(tidyverse) # data wrangling
library(deSolve) # for exploratory models
library(ggdist) # visualizations
library(rstan) # interface with Stan from R
rstan_options(auto_write = TRUE) # options
options(mc.cores = parallel::detectCores())
library(cmdstanr) # additional Stan options in R
check_cmdstan_toolchain(fix = TRUE, quiet = TRUE)
register_knitr_engine()
# custom functions (mostly plotting)
source(here("./R/utilities.R"))
source_dir("./R")
```
### Observed Incidence
Read observed incidence data as a data frame called *truth_df* and create a simple plot figure. Note that 2014 and 2015 data are available in the *vsv_ind_truth.rds* object.
##### Contents for *truth_df*
- date: Estimated date of disease onset (symptoms) as determined by examination at time of inspection.
- count: Number of individuals animals with Vesicular Stomatitis (VS)
- susc: Number of individual animals located with same herd or on same premises as confirmed animal. These values are assumed to reflect susceptible animals.
- year: Year of confirmed VS onset.
- doy: Integer 1-365 reflecting the day of year for the *date*.
```{r message=FALSE, warning=FALSE}
truth_df <- get_data_osf("vs_inc_truth") |>
filter(year == 2014) # change to 2015 to examine 2015 data
str(truth_df)
```
```{r fig.width=8, fig.height=6}
#| label: fig-incidence
#| fig-cap: "Observed VS incidence 2014."
#|
plot_incidence(truth_df)
```
### Exploratory Model
Before coding a Stan model, model parameters are through construction of a relatively simple Ordinary Differential Equations (ODE) model using the **deSolve** package. This model uses discrete, user specified values and does not incorporate uncertainty.
##### Initial Parameters
Parameters are describe in the Table at top of page. The specific values given in this example were arbitrarily selected from the ranges evaluated in the full model (presented later in script).
```{r}
initial_parameters <- c(
beta_h = 0.95,
gamma_h = 0.10,
sigma_h = 0.25,
sigma_v = 0.4,
rho_h = 0.3,
rho_v = 0.85,
beta_v = 0.85,
mu_v = 0.1,
upsilon_h = 0.6,
upsilon_v = 0.35
)
```
##### Initial Conditions
Initial population estimates and counts of exposed and infectious individuals.
```{r}
N_h <- 10000 # number of vertebrate hosts
Eh <- 0 # initial hosts exposed
Ih <- 1 # initial hosts infectious
N_v <- 300 * N_h # number of vectors per host
Ev <- 0 # initial exposed vectors
Iv <- 1 # initial infectious vectors
initial_conditions <- c(
Sh = N_h - Eh - Ih, # Susceptible hosts
Eh = Eh, # Exposed hosts
Ih = Ih, # Infectious hosts
Rh = 0, # Recovered hosts
Sv = N_v - Iv - Ev, # Susceptible vectors
Ev = Ev, # Exposed vectors
Iv = Iv # Infectious vectors
)
```
##### ODE Function
Results from this exploratory model can be examined more thoroughly on the [SEIR-SEI ShinyApp](https://geoepi.shinyapps.io/seir-vector/), which uses the same underlying function as given here.
```{r}
seirsei_ode <- function(t, state, parameters) {
with(as.list(c(state, parameters)), {
lambda_h = upsilon_h * beta_h * Iv / N_v
lambda_v = upsilon_v * beta_v * Ih / N_h
dSh = -lambda_h * Sh
dEh = lambda_h * Sh - sigma_h * Eh
dIh = sigma_h * Eh*rho_h - gamma_h * Ih
dRh = gamma_h * Ih
dSv = mu_v * N_v - lambda_v * Sv - mu_v * Sv
dEv = lambda_v * Sv - (sigma_v + mu_v) * Ev
dIv = sigma_v * Ev*rho_v - mu_v * Iv
return(list(c(dSh, dEh, dIh, dRh, dSv, dEv, dIv)))
})
}
```
##### Time Steps
```{r}
times <- seq(0, 365, by = 1)
```
##### Run Model
```{r}
out <- as.data.frame(
ode(y = initial_conditions, times = times, func = seirsei_ode, parms = initial_parameters)
)
```
#### Plot Results
```{r fig.width=8, fig.height=6}
#| label: fig-ode_dyn
#| fig-cap: "Host population dynamics from exploratory model."
#|
plot_ode_dynamics(out)
```
```{r fig.width=8, fig.height=6}
#| label: fig-comp_ii
#| fig-cap: "Comparison of incident infections. Line labeled Infected indicates the number of new indiduals added to the Exposed compartment per time step, whereas the line labeled Symptomatic refers to the number of new individuals added to the infectious compartment. The ratio of the sums for the Infected and Symptomatic curves is shown as Estimated rho in the plot. This reverse calculation demonstrates what the rho competency parameter is doing in the model. "
#|
compare_ei_incidence(out, gamma_h=initial_parameters["gamma_h"])
```
## Bayesian Model
Having demonstrated model parameter relationships using the ODE function above, a full Bayesian model is next constructed in Stan. The Bayesian approach allows for parameters to specified based on what is currently understood about VSV hosts, vectors, and epidemiology. This done through prior distribution specifications. The Bayesian approach also provides for better accounting of uncertainty in the model.
In addition to translating the SEIR-SEI equations (@eq-seirsei) to the Stan language, a likelihood function is needed as well as incorporation of an additional *observation error* term to help account for the difference between disease cases as observed and documented and the case number (theoretically) needed to propagate the disease, assuming the model's mechanistic parameters accurately reflect the reality of the VS epidemiology.
### Prior Model
Before incorporating observed data into the model, a prior model is coded and evaluated to ensure that assumptions about prior distributions are reasonable.
#### Prior model code
```stan
functions {
// SEIR-SEI function
real[] seirsei(real t, real[] y, real[] theta,
real[] x_r, int[] x_i) {
real N_h = x_i[1]; // parameters are described in table above
real beta_h = theta[1];
real gamma_h = theta[2];
real sigma_h = theta[3];
real sigma_v = theta[4];
real rho_h = theta[5];
real rho_v = theta[6];
real beta_v = theta[7];
real mu_v = theta[8];
real upsilon_h = theta[9];
real upsilon_v = theta[10];
real i_0 = theta[11]; // initial conditions
real e_0 = theta[12];
real ev_0 = theta[13];
real iv_0 = theta[14];
real N_v = theta[15];
// initial compartment sizes
real init[7] = {N_h - i_0 - e_0, e_0, i_0, 0, N_v - iv_0 - ev_0, ev_0, iv_0};
real Sh = y[1] + init[1]; // compartments
real Eh = y[2] + init[2];
real Ih = y[3] + init[3];
real Rh = y[4] + init[4];
real Sv = y[5] + init[5];
real Ev = y[6] + init[6];
real Iv = y[7] + init[7];
// FORCE of Infection
real lambda_h = upsilon_h * beta_h * Iv / N_v;
real lambda_v = upsilon_v * beta_v * Ih / N_h;
// differential equations
real dS_dt = -lambda_h * Sh;
real dE_dt = lambda_h * Sh - sigma_h * Eh;
real dI_dt = sigma_h * Eh*rho_h - gamma_h * Ih;
real dR_dt = gamma_h * Ih;
real dSv_dt = mu_v*N_v - lambda_v*Sv - mu_v*Sv;
real dEv_dt = lambda_v*Sv - (sigma_v + mu_v)*Ev;
real dIv_dt = sigma_v * Ev*rho_v - mu_v*Iv;
return {dS_dt, dE_dt, dI_dt, dR_dt, dSv_dt, dEv_dt, dIv_dt};
}
}
data {
int<lower=1> n_days; // number of days
real t0; // initial step
real ts[n_days]; // time steps
int N_h; // host population size
int N_v; // vector population size
}
transformed data {
real x_r[0]; // data array
int x_i[1] = { N_h }; // array with host population size
}
parameters {
real<lower=0, upper=1> beta_h; // parameters, see table above
real<lower=0> gamma_h;
real<lower=0> sigma_h;
real<lower=0> sigma_v;
real<lower=0, upper=1> rho_h;
real<lower=0, upper=1> rho_v;
real<lower=0, upper=1> beta_v;
real<lower=0> mu_v;
real<lower=0> upsilon_h;
real<lower=0> upsilon_v;
real<lower=0> phi_inv;
real<lower=0> i_0;
real<lower=0> e_0;
real<lower=0> ev_0;
real<lower=0> iv_0;
}
transformed parameters{
real y[n_days, 7]; // outputs, state variables
real incidence[n_days - 1]; // incidence
real phi = 1. / phi_inv; // dispersian parameter for NegBinomial
real theta[15] = {beta_h, gamma_h, sigma_h, sigma_v, rho_h, rho_v, beta_v, mu_v, upsilon_h, upsilon_v, i_0, e_0, ev_0, iv_0, N_v};
// solve ODEs using Runge-Kutta 45 integration
y = integrate_ode_rk45(seirsei, rep_array(0.0, 7), t0, ts, theta, x_r, x_i);
// daily incidence
for (i in 1:n_days-1){
incidence[i] = -(y[i+1, 2] - y[i, 2] + y[i+1, 1] - y[i, 1]) + 1e-8;
}
}
model { // model priors
beta_h ~ beta(10, 2);
gamma_h ~ lognormal(log(0.13), 0.1);
sigma_h ~ lognormal(log(0.12), 0.1);
sigma_v ~ lognormal(log(0.4), 0.1);
rho_h ~ beta(5, 10);
rho_v ~ beta(10, 2);
beta_v ~ beta(10, 2);
mu_v ~ lognormal(log(0.1), 0.1);
upsilon_h ~ lognormal(log(0.6), 0.1);
upsilon_v ~ lognormal(log(0.3), 0.1);
phi_inv ~ exponential(2);
i_0 ~ lognormal(0, 0.1);
e_0 ~ lognormal(0, 0.1);
ev_0 ~ lognormal(0, 0.1);
iv_0 ~ lognormal(0, 0.1);
}
generated quantities {
real duration = 1 / gamma_h; // disease duration
real incub_h = 1 / sigma_h; // intrinsic incubation period
real incub_v = 1/ sigma_v; // extrinsic incubation period
real pred_infected[n_days-1]; // predicted incidence
pred_infected = neg_binomial_2_rng(incidence, phi); // sample from NegBinomial
}
```
#### NegativeBinomial Sampling
Note that the Stan code above includes a negative binomial distribution that samples from estimated incidence. The SEIR-SEI ODE equations estimate prevalence but not incidence. Therefore, the code first estimates incidence (new infections each day) using the susceptible (**S**) and exposed (**E**) compartments. A small value (1e-8) is included to prevent incidence from being negative on the initial iteration. The code as shown in the `transformed parameters` block is,
```stan
for (i in 1:n_days-1){
incidence[i] = -(y[i+1, 2] - y[i, 2] + y[i+1, 1] - y[i, 1]) + 1e-8;
}
```
Later in the `generated quantities` block, incidence is used in sampling from the NgBinomial,
```stan
real pred_infected[n_days-1];
pred_infected = neg_binomial_2_rng(incidence, phi);
```
Consdering that incidence is estimated from the daily change ($\Delta \text{I}_{t}$) in infections (virus exposures), and this quantity is used to sample from the NegBinomial, the notation with prior distributions couldbe shown as,
$$\begin{align} \nonumber
\text{Y}_{t}|\theta &\sim \text{NegBinomial}(\text{Y}|\Delta\text{I}_{t}, \phi) \\ \nonumber
\theta &= \{\upsilon_h,\upsilon_v, \beta_h, \beta_v, \sigma_h, \sigma_v, \gamma_h, \mu, \rho \} \\ \nonumber
\upsilon_h &\sim \text{Lognormal}(log(0.25), 0.1) \\ \nonumber
\upsilon_v &\sim \text{Lognormal}(log(0.15), 0.1) \\ \nonumber
\beta_h &\sim \text{Beta}(10, 1) \\ \nonumber
\beta_v &\sim \text{Beta}(1, 10) \\
\sigma_h &\sim \text{Lognormal}(log(0.15), 0.1) \\ \nonumber
\sigma_v &\sim \text{Lognormal}(log(0.3), 0.1) \\ \nonumber
\rho_h &\sim \text{Beta}(10, 100) \\ \nonumber
\rho_v &\sim \text{Beta}(5, 10) \\ \nonumber
\gamma_h &\sim \text{Lognormal}(log(0.085), 0.1) \\ \nonumber
\mu &\sim \text{Lognormal}(log(0.1), 0.1) \\ \nonumber
{1}/{\phi} &\sim \text{Exponential}(2) \\ \nonumber
\end{align}$${#eq-negbin}
#### Prior model setup
Organize data needed to run the prior model.
```{r}
# total host population
N = 14160
# time steps
n_days = 365
t = seq(0, n_days, by = 1)
t0 = 0
t = t[-1]
# orgainize data for Stan
data_seir = list(n_days = n_days,
t0 = t0,
ts = t,
N_h = N,
N_v = 300*N)
str(data_seir)
# number of MCMC steps
niter = 2000
# number of chains, only a short run for prior model
num_chains <- 2
# unique initial values for each chain
init_list <- vector("list", num_chains)
for (i in 1:num_chains) {
init_list[[i]] <- list(beta_h = rbeta(1, 10, 1),
gamma_h = rlnorm(1, log(0.15), 0.1),
sigma_h = rlnorm(1, log(0.17), 0.1),
sigma_v = rlnorm(1, log(0.3), 0.1),
beta_v = rbeta(1, 15, 100),
mu_v = rlnorm(1, log(0.1), 0.1),
upsilon_h = rlnorm(1, log(0.25), 0.1),
upsilon_v = rlnorm(1, log(0.15), 0.1),
i_0 = runif(1, 1, 2),
e_0 = runif(1, 1, 2),
ev_0 = runif(1, 1, 2),
iv_0 = runif(1, 1, 2))
}
```
#### Read stan model
The Stan code above is saved as a text file. It is read in to memory.
```{r}
seirsei.st = stan_model("inst/stan/vsv_prior.stan")
```
#### Run prior model
Data, model code, initial values are run. This model has been verified to run, however to reduce run time, a saved copy is loaded below.
```{r eval=FALSE}
prior_model <- sampling(seirsei.st,
data = data_seir,
iter = niter,
chains = num_chains,
cores = parallel::detectCores(),
warmup = 500,
save_warmup = FALSE,
init = init_list,
seed = 1976)
```
load saved model
```{r}
prior_model <- get_data_osf("prior_model")
class(prior_model)
```
#### Check results
**Parameter Distributions**
Parameter distributions are plotted with medians as a visual check. In addition to parameters specified in prior distributions, the Extrinsic Incubation Period (*incub_v*), Intrinsic Incubation Period (*incub_h*), and basic reproduction number (R0_h) are shown in @fig-prior_params. The EIP and IIP are respectively calculated as the reciprocal of the estimated $\sigma_v$ and $\sigma_h$ parameters. The reproduction number ($R_0$) is calculated as $\beta_h \cdot \upsilon_h/\gamma_h$.
```{r fig.width=8, fig.height=10}
#| label: fig-prior_params
#| fig-cap: "Distributions from modeled priors. Dashed lines show the median value for each parameter."
#|
plot_params_posteriors(prior_model)
```
**Prior Predictive Distributions**
The prior model does not have information (data) about the number of observed number of VS cases, therefore predictions here (@fig-prior_traj) are based only on incidence as estimated from the SEIR-SEI ODE system and the prior distributions intended to capture mechanistic aspects of VS epidemiology. As a prior predictive check, it's assumed that predictions will have considerable uncertainty, which is the case in @fig-prior_traj, however it is also case that the median counts across the random draws are near the truth.
```{r fig.width=6, fig.height=8}
#| label: fig-prior_traj
#| fig-cap: "Random draws from predictive distribution. Colored lines show sampled trajectories. Height of black points indicate observed daily incidence."
#|
plot_sample_trajectories(prior_model, truth_df=truth_df, n_draws = 100)
```
To visualize the prior predictions differently, credible intervals can plotted along with the estimated median prediction (@fig-prior_ci).
```{r fig.width=8, fig.height=6}
#| label: fig-prior_ci
#| fig-cap: "Figure displays summary credible intervals (blue shading) and median prediction from the prior predictive model. Height of red points indicate observed daily incidence."
#|
plot_pred_incidence(prior_model, my_truth = truth_df)
```
### Full Bayesian Model
Although parameter values from the prior model are reasonable based on current understanding of VS epidmiology, draws from the prior predictions show far too much uncertainty to have confidence in the model.
To improve the model, the Stan code is updated to include the following elements:
**Observed Incidence**
The `data` block shown in the code is modfied to take observed incidence as an input. The observed incidence is referred to as *cases* in the code.
```stan
data {
...
int cases[n_days];
}
```
The `model` block is revised to include the observed cases when sampling from the NegBinomial.
```stan
model {
...
cases[1:(n_days-1)] ~ neg_binomial_2(incidence, phi);
}
```
**Observation Bias**
In addition to incorporting observed incidence, the model a new model parameter is added to account for underreporting and other possible errors in the observations. The bias parameter is designated as $\kappa$ and used to reduce incidence as estimated by the SEIR-SEI system.
The new $\kappa$ parameter in the `transformed parameters` block,
```stan
transformed parameters{
...
for (i in 1:n_days-1){
incidence[i] = -(y[i+1, 2] - y[i, 2] + y[i+1, 1] - y[i, 1])*kappa;
}
}
```
and the $\kappa$ prior specification is added to the `model block`.
```stan
model {
...
kappa ~ beta(15, 100);
}
```
With the addition of $\kappa$, the previously described sampling distribution (@eq-negbin) becomes,
$$\begin{align} \nonumber
\text{Y}_{t}|\theta &\sim \text{NegBinomial}(\text{Y}|\Delta\text{I}_{t}\bold{\kappa}, \phi) \\ \nonumber
\end{align}$$
**Pointwise Log-Likelihood**
Lastly, code is added to the `generated quantities` block to facilitate some checks after the model is run.
```stan
generated quantities{
...
real log_lik[n_days-1];
for (i in 1:(n_days - 1)) {
log_lik[i] = neg_binomial_2_lpmf(cases[i] | incidence[i], phi);
}
}
```
The Stan code for the full Bayesian model, with the revisions decribed above can be viewed here: [Full SEIR-SEI Model](https://github.com/geoepi/seir-vector/blob/main/inst/stan/vsv_full_model.stan)
#### Full Bayesian model results
Download the executed model object for 2014 VS Outbreaks
```{r eval=TRUE, echo=TRUE}
fit_2014 <- get_data_osf("fit_2014")
```
@fig-post_traj provides posterior predictive checks for the full model for comparison to the prior model @fig-prior_traj.
```{r fig.width=6, fig.height=8, eval=TRUE}
#| label: fig-post_traj
#| fig-cap: "Random draws from posterior predictive distribution. Colored lines show sampled trajectories. Height of red circles indicate observed daily incidence."
#|
plot_sample_trajectories(fit_2014, truth_df=truth_df, n_draws = 100, ymax_prob = 1)
```
To visualize the prior predictions differently, (@fig-post_ci) displays credible intervals with the estimated median as shown for the prior model (@fig-prior_ci).
```{r fig.width=8, fig.height=6}
#| label: fig-post_ci
#| fig-cap: "Figure displays summary credible intervals (blue shading) and median prediction from the prior predictive model. Height of red points indicate observed daily incidence."
#|
plot_pred_incidence(fit_2014, my_truth = truth_df)
```
::: {.callout-tip icon=false}
#### Diagnostics
Please see the [Diagnostics tab](https://geoepi.github.io/seir-vector/diagnostics.html) on this website for additional model checks.
:::