---
title: "Assignment 4"
author: "Wenjie Tu"
date: "Spring Semester 2022"
output:
  html_document:
    toc: true
    toc_depth: 5
    toc_float: true
subtitle: Bayesian Data Analysis and Models of Behavior
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
Sys.setenv(LANG = "en")  # use English messages
rm(list = ls())          # clear the workspace
```
The goal of this assignment is to verify the implicit assumption in Pedersen et al. (2017) that the DDM yields more precise estimates of the learning rate parameters than the softmax choice rule (i.e., logistic regression) typically used in studies of reward learning.
$~$
### Part 1
$Q_i(t)$ is obtained by adding the learning-rate-weighted reward prediction error (PE) to the reward expectation from the previous trial:
$$
Q_i(t)=Q_i(t-1)+\eta
\underbrace{[\text{Reward}_i(t-1)-Q_i(t-1)]}_\text{prediction error (PE)}
$$
* $Q_i(t)$: the reward value expectation for the chosen option $i$ on trial $t$.
* $\eta$: the learning rate parameter.
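As a quick illustration, here is a minimal R sketch of this update rule on a made-up reward sequence (the learning rate and rewards below are illustrative values, not estimates from the data):

```{r}
# Minimal sketch of the Q-learning update for one option (hypothetical values)
eta_demo <- 0.3                     # assumed learning rate
rewards  <- c(1, 0, 1, 1, 0)        # hypothetical binary rewards
Q <- numeric(length(rewards) + 1)   # Q[1] = 0 is the initial expectation
for (t in seq_along(rewards)) {
  pe <- rewards[t] - Q[t]           # prediction error
  Q[t + 1] <- Q[t] + eta_demo * pe  # learning-rate-weighted update
}
round(Q, 3)
```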
<!-- The process of choosing between options can be described by the softmax choice rule: -->
<!-- $$ -->
<!-- p_i(t)=\frac{\exp(\beta(t)\times V_i(t))} -->
<!-- {\sum_{j=1}^{n}\exp(\beta(t)\times V_j(t))} -->
<!-- $$ -->
<!-- * $p_i(t)$: the probability that a decision maker will choose one option $i$ among all options $j$. -->
<!-- * $\beta$: the parameter that governs the sensitivity to rewards and the exploration-exploitation trade-off. -->
<!-- The DDM calculates the likelihood of the reaction time (RT) as a choice $x$ with the Wiener first-passage time (WFPT) distribution: -->
<!-- $$ -->
<!-- RT(x)\sim \text{WFPT}[a, T_\text{er},z,\nu(t)] -->
<!-- $$ -->
<!-- * The WFPT returns the probability that $x$ is chosen with the observed RT. -->
<!-- * $T_\text{er}$: the nondecision time. -->
<!-- * $z$: starting point. -->
<!-- * $a$: boundary separation (trial-independent free parameters). -->
<!-- * $\nu(t)$: drift rate. -->
The drift rate $\nu(t)$ varies from trial to trial as a function of the difference in the expected rewards, multiplied by a scaling parameter $m$, which can capture differences in the ability to use knowledge of the reward probabilities:
$$
\nu(t)=[Q_\text{upper}(t)-Q_\text{lower}(t)]\times m
$$
* $Q_\text{upper}(t)$ and $Q_\text{lower}(t)$ represent the reward expectations for the two response options.
The process of choosing between options can be described by the sigmoid (inverse-logit, `ilogit`) function:
$$
p_i(t)=\frac{1}{1+\exp\{-(\beta(t)+\nu(t))\}}
$$
* $p_i(t)$: the probability that the decision maker chooses the upper-boundary option $i$ on trial $t$.
* $\beta$: the parameter that governs the sensitivity to rewards and the exploration-exploitation trade-off (held constant across trials in the implementation below).
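A minimal numeric sketch of this choice rule, using made-up Q-values and parameters rather than estimates from the data:

```{r}
# Sketch: choice probability from the Q-value difference (hypothetical values)
q_upper <- 0.7; q_lower <- 0.4          # assumed reward expectations
m_demo <- 3                             # assumed drift rate scaling parameter
beta_demo <- 0.1                        # assumed sensitivity parameter
v_demo <- (q_upper - q_lower) * m_demo  # drift rate
1 / (1 + exp(-(beta_demo + v_demo)))    # equals plogis(beta_demo + v_demo)
```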
#### Summary table
```{r, warning=FALSE, message=FALSE}
library(runjags)
library(coda)
library(rjags)
library(ggplot2)
library(HDInterval)
library(knitr)
```
```{r}
# Read in data
dat <- read.csv("./data/data.csv")
# Look at the structure of the data
str(dat)
```
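For orientation, the columns this script relies on (as inferred from the JAGS data list assembled below) are: `ord_sbj` (subject index), `med` (medication condition, mapping onto the two groups), `iter` (trial number within subject), `rt` (response time, whose sign encodes the chosen option), `choice`/`nonchoice` (chosen and unchosen option), `value` (obtained reward), `pair` (stimulus pair, 1 to 3), `valence` (indexes the negative vs. positive learning rate), and `not_cond1`/`not_cond2` (the stimulus pairs not presented on the current trial).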
```{r}
modelString <- "model {
  # Priors for the group-level learning rate parameter eta
  eta_g_mu ~ dunif(0, 1)
  eta_g_sig ~ dunif(0.001, 5)
  # Priors for the drift rate scaling parameter m
  m_g_mu ~ dunif(1, 19)
  m_g_sig ~ dunif(0.001, 5)
  # Priors for the sensitivity parameter beta
  beta_g_mu ~ dunif(0, 1)
  beta_g_sig ~ dunif(0.001, 5)

  # Transform sigmas into precisions
  eta_g_tau <- 1 / eta_g_sig^2
  m_g_tau <- 1 / m_g_sig^2
  beta_g_tau <- 1 / beta_g_sig^2

  # Group parameters
  for (g in 1:G) {
    # Dual learning rates (v = 1: negative, v = 2: positive)
    for (v in 1:2) {
      eta_g[g, v] ~ dnorm(eta_g_mu, eta_g_tau)T(0, 1)  # truncated to [0, 1]
      eta_sig[g, v] ~ dunif(0.001, 5)
      eta_tau[g, v] <- 1 / eta_sig[g, v]^2
    }
    m_g[g] ~ dnorm(m_g_mu, m_g_tau)
    m_sig[g] ~ dunif(0.001, 5)
    m_tau[g] <- 1 / m_sig[g]^2
    beta_g[g] ~ dnorm(beta_g_mu, beta_g_tau)
    beta_sig[g] ~ dunif(0.001, 5)
    beta_tau[g] <- 1 / beta_sig[g]^2
  }

  # Subject parameters
  for (g in 1:G) {
    for (s in 1:S) {
      for (v in 1:2) {
        eta[g, s, v] ~ dnorm(eta_g[g, v], eta_tau[g, v])T(0, 1)  # truncated to [0, 1]
      }
      beta[g, s] ~ dnorm(beta_g[g], beta_tau[g])
      m[g, s] ~ dnorm(m_g[g], m_tau[g])
    }
  }

  # Loop over trials for each group and subject
  for (g in 1:G) {
    for (s in 1:S) {
      # Assign starting values to q[group, trial, stimulus pair, option].
      # 'first' is a two-dimensional array identifying the first trial for each subject in each group.
      for (stim_pair in 1:3) {
        q[g, first[s, g], stim_pair, 1] <- 0
        q[g, first[s, g], stim_pair, 2] <- 0
      }
      # Run through trials.
      # 'last' is a two-dimensional array identifying the last trial for each subject in each group.
      for (trial in (first[s, g]):(last[s, g]-1)) {
        # Calculate the drift rate as the q-value difference multiplied by m
        v[trial] <- ( q[g, trial, pair[trial], 1] - q[g, trial, pair[trial], 2] ) * m[g, s]
        pr[trial] <- ilogit(beta[g, s] + v[trial])
        rt_binary[trial] ~ dbern(pr[trial])
        rt_predict[trial] ~ dbern(pr[trial])
        # Update q-values for the next trial.
        # 'pair' identifies the stimulus pair in the current trial; 'not1' and 'not2' identify the other stimulus pairs.
        q[g, trial+1, pair[trial], choice[trial]] <- q[g, trial, pair[trial], choice[trial]] +
          eta[g, s, valence[trial]] * (value[trial] - q[g, trial, pair[trial], choice[trial]])
        q[g, trial+1, pair[trial], nonchoice[trial]] <- q[g, trial, pair[trial], nonchoice[trial]]
        q[g, trial+1, not1[trial], 1] <- q[g, trial, not1[trial], 1]
        q[g, trial+1, not1[trial], 2] <- q[g, trial, not1[trial], 2]
        q[g, trial+1, not2[trial], 1] <- q[g, trial, not2[trial], 1]
        q[g, trial+1, not2[trial], 2] <- q[g, trial, not2[trial], 2]
      }
      # Q-values are not updated in the last trial
      for (trial in last[s, g]) {
        v[trial] <- (q[g, trial, pair[trial], 1] - q[g, trial, pair[trial], 2]) * m[g, s]
        # Note: unlike the paper (where beta varies with t via a power function),
        # beta is constant across trials in this implementation
        pr[trial] <- ilogit(beta[g, s] + v[trial])
        rt_binary[trial] ~ dbern(pr[trial])
        rt_predict[trial] ~ dbern(pr[trial])
      }
    }
  }
}"
writeLines(modelString, con="./models/logistic.txt")
```
```{r}
# Code the binary choice by the sign of the RT (rt >= 0 -> 1, else 0)
dat$rt_binary <- ifelse(dat$rt >= 0, 1, 0)
niters <- 120  # max trial number for each subject (true max = 120)
dat <- dat[dat$iter <= niters, ]
nsubs <- 17    # max number of subjects (true max = 17)
dat <- dat[dat$ord_sbj <= nsubs, ]
dat$rownum <- 1:nrow(dat)  # global row index used to locate trials
# Row index of each subject's first/last trial in each medication condition
first <- aggregate(rownum ~ ord_sbj + med, data = dat, min)
first <- matrix(first$rownum, ncol = 2, nrow = nsubs)
last <- aggregate(rownum ~ ord_sbj + med, data = dat, max)
last <- matrix(last$rownum, ncol = 2, nrow = nsubs)
dat.jags <- dump.format(list(choice = dat$choice,
                             nonchoice = dat$nonchoice,
                             value = dat$value,
                             rt_binary = dat$rt_binary,
                             pair = dat$pair,
                             valence = dat$valence,
                             S = length(unique(dat$ord_sbj)),
                             first = as.matrix(first),
                             last = as.matrix(last),
                             not1 = dat$not_cond1,
                             not2 = dat$not_cond2,
                             G = 2))
inits1 <- dump.format(list(.RNG.name="base::Super-Duper", .RNG.seed=99999 ))
inits2 <- dump.format(list(.RNG.name="base::Wichmann-Hill", .RNG.seed=1234 ))
inits3 <- dump.format(list(.RNG.name="base::Mersenne-Twister", .RNG.seed=6666 ))
n.burnin <- 40000
n.sample <- 2000
n.thin <- 5
# Tell JAGS which latent variables to monitor
params <- c("eta_g", "eta", "m_g", "m", "beta_g", "beta")
```
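A quick optional sanity check on the `first`/`last` index matrices; the last line assumes every subject contributes exactly `niters` contiguous trials in each medication condition:

```{r}
# Each matrix should be nsubs x 2 (subjects x medication conditions)
dim(first); dim(last)
# TRUE if every subject/condition block spans exactly niters rows (assumption)
all(last - first + 1 == niters)
```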
```{r}
# Run the function that fits the models using JAGS
start.time <- Sys.time()
results.log <- run.jags(model = "./models/logistic.txt",
                        monitor = params,
                        data = dat.jags,
                        inits = c(inits1, inits2, inits3),
                        plots = FALSE,
                        n.chains = 3,
                        burnin = n.burnin,
                        sample = n.sample,
                        thin = n.thin,
                        method = "parallel")
end.time <- Sys.time()
```
```{r}
end.time - start.time
```
```{r}
kable(summary(results.log), digits=6, caption="Summary results for the logistic regression")
```
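The `run.jags` summary above already reports the potential scale reduction factor (psrf). As an optional cross-check, the `coda` package loaded earlier can compute the Gelman-Rubin diagnostic for the group-level nodes; the `grep` pattern below assumes the group-level monitors are the only ones whose names contain `"_g"`, as is the case here:

```{r}
# Gelman-Rubin diagnostic for the group-level parameters only
mcmc.log <- as.mcmc.list(results.log)
grp <- grep("_g", coda::varnames(mcmc.log), value = TRUE)
gelman.diag(mcmc.log[, grp], multivariate = FALSE)
```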
#### Density plot for learning rates
```{r}
chains.log <- data.frame(rbind(results.log$mcmc[[1]], results.log$mcmc[[2]], results.log$mcmc[[3]]))
# Select the columns for the group-level learning rate parameters:
# eta_g[1,1], eta_g[2,1], eta_g[1,2], eta_g[2,2]
# (group 1 = OFF, group 2 = ON; valence 1 = negative, valence 2 = positive)
chains.log <- chains.log[, 1:4]
column.names <- c("OFF.negative", "ON.negative", "OFF.positive", "ON.positive")
colnames(chains.log) <- column.names
```
```{r}
d.plot <- data.frame(
  eta_g = c(chains.log$OFF.negative,
            chains.log$ON.negative,
            chains.log$OFF.positive,
            chains.log$ON.positive),
  Group = rep(column.names, each=nrow(chains.log))
)
```
```{r, out.width='100%'}
ggplot(d.plot, aes(x=eta_g, y=..density.., color=Group, fill=Group)) +
geom_density(alpha=0.2) +
labs(title="Density plot of four group-level learning rate parameters", x=expression(eta[g]), y="Density") +
theme_minimal()
```
#### Density plot for differences
```{r, fig.show='hold', out.width='50%'}
# Histogram
ggplot(chains.log, aes(x=ON.positive - OFF.positive, y=..density..)) +
geom_histogram(color=3, fill=3, alpha=0.2, bins=30) +
geom_vline(xintercept=hdi(chains.log$ON.positive - chains.log$OFF.positive), color=2, lty="longdash") +
labs(title="Positive learning rate (ON-OFF)", x=expression(eta^"+" ~ "(ON-OFF)"), y="Density") +
theme_minimal()
# Density plot
ggplot(chains.log, aes(x=ON.positive - OFF.positive, y=..density..)) +
geom_density(color=3, fill=3, alpha=0.2) +
geom_vline(xintercept=hdi(chains.log$ON.positive - chains.log$OFF.positive), color=2, lty="longdash") +
labs(title="Positive learning rate (ON-OFF)", x=expression(eta^"+" ~ "(ON-OFF)"), y="Density") +
theme_minimal()
```
```{r, fig.show='hold', out.width='50%'}
# Histogram
ggplot(chains.log, aes(x=ON.negative - OFF.negative, y=..density..)) +
geom_histogram(color=4, fill=4, alpha=0.2, bins=30) +
geom_vline(xintercept=hdi(chains.log$ON.negative - chains.log$OFF.negative), color=2, lty="longdash") +
labs(title="Negative learning rate (ON-OFF)", x=expression(eta^"-" ~ "(ON-OFF)"), y="Density") +
theme_minimal()
# Density plot
ggplot(chains.log, aes(x=ON.negative - OFF.negative, y=..density..)) +
geom_density(color=4, fill=4, alpha=0.2) +
geom_vline(xintercept=hdi(chains.log$ON.negative - chains.log$OFF.negative), color=2, lty="longdash") +
labs(title="Negative learning rate (ON-OFF)", x=expression(eta^"-" ~ "(ON-OFF)"), y="Density") +
theme_minimal()
```
#### Interpretation
The posterior distribution of the ON vs. OFF medication difference in the positive learning rate for subjects with attention-deficit/hyperactivity disorder (ADHD) is consistent with the one shown in the paper: both are left-skewed. The posterior distribution of the ON vs. OFF difference in the negative learning rate, however, does not match the paper well in terms of skewness. This is probably due to randomness in the sampler: because we fix only the random number generators and seeds but do not supply initial values for the model parameters, JAGS initializes the chains automatically, so the results are not exactly reproducible. Furthermore, the 95% HDI for the ON vs. OFF difference in the negative learning rate is wider than the one reported in the paper.
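For reference, the 95% HDIs marked by the dashed lines above can also be printed directly:

```{r}
# 95% HDIs of the ON-OFF differences (HDInterval::hdi, credMass = 0.95)
hdi(chains.log$ON.positive - chains.log$OFF.positive)  # positive learning rate
hdi(chains.log$ON.negative - chains.log$OFF.negative)  # negative learning rate
```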
$~$
### Part 2
As a parameter-recovery check, we first simulate one posterior predictive dataset of binary choices from the fitted model (via the monitored `rt_predict` node), then refit the same logistic model to the simulated choices and compare the recovered group-level learning rates with the original estimates.
#### Summary table
```{r}
# Draw a single posterior predictive sample of the binary choices
model.sim <- extend.jags(results.log,
                         drop.monitor = results.log$monitor,
                         add.monitor = c("rt_predict"),
                         sample = 1,
                         adapt = 0,
                         burnin = 0,
                         summarise = FALSE,
                         method = "parallel")
```
```{r}
# Re-use the original predictors but replace the observed choices with the
# simulated choices drawn above
newdat.jags <- dump.format(list(choice = dat$choice,
                                nonchoice = dat$nonchoice,
                                value = dat$value,
                                rt_binary = as.integer(model.sim$mcmc[[1]]),
                                pair = dat$pair,
                                valence = dat$valence,
                                S = length(unique(dat$ord_sbj)),
                                first = as.matrix(first),
                                last = as.matrix(last),
                                not1 = dat$not_cond1,
                                not2 = dat$not_cond2,
                                G = 2))
n.burnin <- 40000
n.sample <- 2000
n.thin <- 5
# Tell JAGS which latent variables to monitor
params <- c("eta_g", "eta", "m_g", "m", "beta_g", "beta")
```
```{r}
start.time <- Sys.time()
results.rec <- run.jags(model = "./models/logistic.txt",
                        monitor = params,
                        data = newdat.jags,
                        inits = c(inits1, inits2, inits3),
                        plots = FALSE,
                        n.chains = 3,
                        burnin = n.burnin,
                        sample = n.sample,
                        thin = n.thin,
                        method = "parallel")
end.time <- Sys.time()
```
```{r}
end.time - start.time
```
```{r}
kable(summary(results.rec), digits=6, caption="Summary results for the logistic regression (simulated data)")
```
#### Boxplots
```{r}
chains.rec <- data.frame(rbind(results.rec$mcmc[[1]], results.rec$mcmc[[2]], results.rec$mcmc[[3]]))
# Select the columns for the group-level learning rate parameters
# (same ordering as in Part 1)
chains.rec <- chains.rec[, 1:4]
column.names <- c("OFF.negative", "ON.negative", "OFF.positive", "ON.positive")
colnames(chains.rec) <- column.names
```
```{r}
d.recovery <- data.frame(rbind(chains.log, chains.rec),
                         Parameters = rep(c("Logistic", "Recovered"), each=nrow(chains.log)))
```
```{r, out.width='100%'}
ggplot(d.recovery, aes(x=Parameters, y=OFF.negative, color=Parameters)) + geom_boxplot() +
labs(title="Negative learning rate for the OFF medication group", x=expression(eta^"-"~"(OFF)")) + theme_minimal()
```
```{r, out.width='100%'}
ggplot(d.recovery, aes(x=Parameters, y=ON.negative, color=Parameters)) + geom_boxplot() +
labs(title="Negative learning rate for the ON medication group", x=expression(eta^"-"~"(ON)")) + theme_minimal()
```
```{r, out.width='100%'}
ggplot(d.recovery, aes(x=Parameters, y=OFF.positive, color=Parameters)) + geom_boxplot() +
labs(title="Positive learning rate for the OFF medication group", x=expression(eta^"+"~"(OFF)")) + theme_minimal()
```
```{r, out.width='100%'}
ggplot(d.recovery, aes(x=Parameters, y=ON.positive, color=Parameters)) + geom_boxplot() +
labs(title="Positive learning rate for the ON medication group", x=expression(eta^"+"~"(ON)")) + theme_minimal()
```
#### Interpretation
We present four boxplots of the posterior chains for the four group-level learning rates (ON-positive, ON-negative, OFF-positive, OFF-negative), comparing the logistic regression results from Part 1 with the parameter recovery results. There are clear differences between the recovered and the original estimates for all four group-level learning rates; the discrepancy is largest for the positive learning rate in the ON medication group. The poor recovery under the softmax/logistic choice rule suggests that the decision-making process is better characterized by the DDM, consistent with the assumption that the DDM outperforms the reward learning model with a softmax choice rule in terms of learning rate estimation.
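One simple way to quantify the visual impression from the boxplots is to tabulate the posterior means side by side:

```{r}
# Posterior means of the group-level learning rates: original fit vs. recovery
kable(data.frame(parameter = column.names,
                 logistic  = colMeans(chains.log),
                 recovered = colMeans(chains.rec)),
      digits = 4, caption = "Group-level learning rates: original vs. recovered")
```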