In this practical you will practice with fitting multiple (Bayesian) single subject models for intensive longitudinal data. Specifically AR(1) and AR(2) models, an AR(1) model with a covariate, a VAR(1) model and two models that account for measurement error. You will fit models for the data of two participants to get an impression of individual differences in dynamics.

It is assumed that you have some basic experience with R for this, and some experience with Bayesian models is advisable as well. You can copy the lines of codes you find in this file directly to R or an R script (so no need to type everything).

Installing and loading packages and other needed software

The analyses depend on three R-packages: rjags, for fitting the Bayesian models; foreign, for loading an SPSS data file; coda, for running summary stats and plots on the Bayesian analysis results.

In order for these analyses to run on your computer, you will also need to download and install JAGS, because rjags depends on this program.

#install(foreign)  ###run these codes to install these packages
#install(rjags)
#install(coda)
library(foreign)  ##run these codes to load the packages for usage in R
library(rjags)
library(coda)

Loading the data

We will work with the two spss files, one for each of two female participants: Jane and Mary (Names ficional, data courtesy of Schuurman, Houtveen and Hamaker, 2015). The data consists of daily diary measurements, for 91 days for Jane and 105 days for Mary. The data contains various measures on mood, positive and negative affect, and symptoms that are thought to be related to the women’s periods. Load the packages with help from package foreign:

##Run these codes to load the data and store it in objects "mydata_Jane"  and "mydata_Mary""

mydata_Jane= read.spss("dat2.sav", use.value.labels = FALSE, to.data.frame = FALSE, trim.factor.names = FALSE,
                  trim_values = TRUE, reencode = NA, use.missings = TRUE)  # This may return a warning about a "record type 7", but this warning can be ignored.

mydata_Mary= read.spss("dat4.sav", use.value.labels = FALSE, to.data.frame = FALSE, trim.factor.names = FALSE,
                  trim_values = TRUE, reencode = NA, use.missings = TRUE)  # This may return a warning about a "record type 7", but this warning can be ignored.

Note that the variable names in the files are in Dutch, but we will translate the relevant variables on the go when we use them.

We will start by looking at both women’s rating (scale 1-100 low-high) of their daily overall mood. Make a time series plot for the mood of both participants, and inspect the autocorrelations and partial autocorrelations for the data of each women. You can use functions ts.plot(), acf(), and pacf() for this respectively. Throughout this workshop, we have prepared this code for you for Jane, but not for Mary, so try to adapt the code so you can do the same things for Mary. Compare what you see for the two women.

###Extracting mood from Jane's dataset##
dat_Jane=data.frame(mydata_Jane$scale)
dat_Jane=as.matrix(dat_Jane)
colnames(dat_Jane) = c("mood")

###Extracting mood from Mary's dataset##
dat_Mary=data.frame(mydata_Mary$scale)
dat_Mary=as.matrix(dat_Jane)
colnames(dat_Mary) = c("mood")

###Exploring the data for Jane: Time series plot, autocorrelation of the data, and partial autocorrelations of the data.
ts.plot(dat_Jane,col="mediumseagreen",lwd=5, ylab= "mood")
acf(dat_Jane,na.action=na.pass)
pacf(dat_Jane,na.action=na.pass)
mean(dat_Jane, na.rm=TRUE)
var(dat_Jane, na.rm=TRUE)

###Exploring the data for Mary: Time series plot, autocorrelation of the data, and partial autocorrelations of the data.

Basic AR models

Based on the autocorrelations and partial autocorrelations for the different lags of the women’s data, what order of autoregressive model would you fit for each of them (how many lags would you like to include)?

We can easily fit n=1 time series models with package arima which is native in R (you do not need to install a package for this). Fit at least an order 1, and an order 2 autoregressive model for both women using package arima. ARIMA uses a state space modeling framework which combines maximum likelihood estimation and the kalman filter. Use ?arima in R to learn more. Later we will fit the same models with Bayesian estimation techniques. Interpret and compare the results for the two women.

###AR(1) and AR(2) model fitted for Jane in arima##
arima(dat_Jane, c(1,0,0))
arima(dat_Jane, c(2,0,0))

###AR(1) and AR(2) model fitted for Mary in arima##

Bayesian n=1 AR1 Analysis

Now we will fit the same model in a Bayesian way in JAGS. This is the JAGS code for the model (everything in between parentheses), which we will supply to jags for the analysis. Take a look at the model specification, including the priors that were specified. The priors specified are intended to be uninformative.

AR1model <-"
model{
   

#####model######
    for (t in 2:nt) ###model for timepoint 2 onwards
    {
        y[t]~dnorm(muy[t],Ipre)  ###The dependent variable is assumed to be normally distributed. muy[t] is the expected value at each time point, Ipre is innovation precision (the inverse of the innovation variance)
        muy[t]<- mu + phi*z[t-1] ### mu is the mean, phi is the autoregressive coefficient, and z[t-1] is the centered mood score for timepoint t-1
        z[t] <- y[t]-mu      ##we create centered mood score z[t] here, such that mu in the previous equation reflects the mean for the participant, and not just the intercept
        
    }

####timepoint 1#######

    z[1] <- y[1]-mu  ###model for time point one.
    
    
            
####priors###########
       
    Ipre <- inverse(Ivar) ## innovation precision
    Ivar ~dunif(0,300)   ##innovation variance, a uniform prior ranging from 0 to 300

    phi~dunif(-1,1) ##autoregression coefficient, a uniform prior ranging from -1 to 1
    mu~dnorm(50,.001) ##mean, a normal prior with a mean of 50 and avery small precision of .001.
        
              

}

"
AR1model.txt<-textConnection(AR1model)

We will fit the model in JAGS via package rjags with three chains. For each chain, initial values for each parameter can be specified in lists as follows

#specify initial values for each of the three chains
ins = list(list( phi=.6, Ivar = 100,  mu=50),
           list( phi=.3, Ivar = 60,  mu=60),
           list( phi=-.2, Ivar = 80, mu = 70))  ###initial values for three chains

#specify the number of time points in the data (needed as data for the jags model)
nt=length(dat_Jane)

It is also possible to not specify any starting values, in which case the will be generated by jags based on the specified priors.

Next, we specify the model file, data to be used with the model file (for Jane), initial values, and the number of chains, for compiling the model in JAGS and rjags. We use a burnin of 20000 samples per chain. We take 20000 samples on which we will base our results.

###Specify modelfile, data, initial values, and number of chains for compiling the rjags model:
ins = list(list( phi=.6, Ivar = 100,  mu=50),
           list( phi=.3, Ivar = 60,  mu=60),
           list( phi=-.2, Ivar = 80, mu = 70))  ###initial values

jags_AR1 <- jags.model('AR1model.txt',
                   data = list('nt' = nt,
                               'y' = as.numeric(dat_Jane)
                               
                   ),
                   inits=ins,
                   n.chains = 3) ###compile and initialize jags AR model

update(jags_AR1, 2000) ####burn-in samples/iterations, to be discarded

samps_AR1=coda.samples(jags_AR1,
                   c('phi','mu','Ivar'),
                   2000) ### take the samplesiterations we will use for the results

The model has been fitted. Before showing the results, we’ll do some convergence checks, specifically, we check the density plots, trace plots, and the gelman-rubin statistics (plots and summary stats):

plot(samps_AR1,trace=FALSE) ##plot densities, should look smooth, unimodal, usually normal

gelman.diag(samps_AR1) ###plot gelman rubin stats Should be 1 or very very near 1 (smaller than 1.05)

plot(samps_AR1,density=FALSE) ##traceplots should look like a pile of different colored hairy catterpillars

The density plots are nice and smooth, trace plots look like fat catterpillars, gelman-rubin statistics are good. Convergence seems in order. The resulting parameter estimates:

summary(samps_AR1) ##results parameter estimates

Now, interpret the results; in the summary results you find the mean estimate for each parameter and the posterior standard deviation, and the quantiles for the posterior distribution. The first and last quantile in the results can be used to form the lower and upper bound of a 95% credible interval for each parameter.

Rerun the model with a larger number of burnin and iterations. Do the results change (if the model has converged, the changes should be very small)? Are the results similar to those from arima?

Now fit the Bayesian AR(1) model for Mary.

Bayesian n=1 AR2 Analysis

Next, fit the AR2 model for Jane and Mary with the following model code:

AR2model <-"model{
   

#####model######
    for (t in 3:nt) ###model for timepoint 3 onwards
    {
        y[t]~dnorm(muy[t],Ipre)  
        muy[t]<- mu + phi*z[t-1] + phi2*z[t-2] ## compared to the previous AR(1) code we have added the part phi2*z[t-2] where phi2 is the autoregression   coefficient for lag 2 and z[t-2] is the accompanying centered score for lag 2 at time point t-2.
        z[t] <- y[t]-mu     
        
    }

####timepoint 1#######

    z[2] <- y[2]-mu  ###model for time point two.
    z[1] <- y[1]-mu ###model for time point one.
    
            
####priors###########
       
    Ipre <- inverse(Ivar) ## innovation precision
    Ivar ~dunif(0,300)   ##innovation variance

    phi~dunif(-1,1) ##autoregression coefficient lag 1
    phi2~dunif(-1,1) ##autoregression coefficient lag 2, uniform prior ranging from -1 to 1.
    mu~dnorm(50,.001) ##mean
        
              

}

"
AR2model.txt<-textConnection(AR2model)
ins = list(list( phi=.6,phi2=.2, Ivar = 100,  mu=50),
           list( phi=.3,phi2=-.5, Ivar = 60,  mu=60),
           list( phi=-.2,phi2=0, Ivar = 80, mu = 70))  

jags_AR2 <- jags.model('AR2model.txt',
                   data = list('nt' = nt,
                               'y' = as.numeric(dat_Jane)
                               
                   ),
                   inits=ins,
                   n.chains = 3) 

update(jags_AR2, 3000) 

samps_AR2=coda.samples(jags_AR2,
                   c('phi','phi2', 'mu','Ivar'),
                   3000) 

plot(samps_AR2,trace=FALSE) 

gelman.diag(samps_AR2) 

plot(samps_AR2,density=FALSE) 

summary(samps_AR2) 

Has the model converged? If so, interpret the results; in the summary results you find the mean estimate for each parameter and the posterior standard deviation, and the quantiles for the posterior distribution. The first and last quantile in the results can be used to form the lower and upper bound of a 95% credible interval for each parameter.

Does an AR(1) model seem to suit Jane best, or an AR(2) model, based on these results? What about for Mary?

Bayesian AR1 model with a covariate

Next, lets add an exogenous covariate to the AR(1) model. We have information on when each women had her period - we will use this to predict each woman’s mood. For this we will load an external data file with the same mood data, and data on each women’s period (0 of not on period, 1 if on period). For the latter variable, for this example we have filled in scores where there were missing observations (because exogenous variables are not allowed to have missing observations in the analysis). An alternative way to deal with this is to make the covariate an endogenous variable (or impute for these observations in another way), but this is beyond the scope of this workshop.

Load the data for Mary and Jane, and explore their data with (for instance) a time series plot:

dat_predJane = read.table("Ar1pred_Jane.dat", header=TRUE, na.strings="*")
dat_predMary = read.table("Ar1pred_Mary.dat", header=TRUE, na.strings="*")
mood_J = dat_predJane[,1]
period_J =dat_predJane[,2]
mood_M = dat_predMary[,1]
period_M =dat_predMary[,2]

ts.plot(mood_J/10,col="mediumseagreen",lwd=5, ylab= "scores", ylim=c(0,10))
lines(period_J,col="pink",lwd=5)
lines(period_M,col="orange",lwd=5)
lines(mood_M/10,col="blue",lwd=5)

Take a look at the JAGS model code for the AR(1) model with an exogenous predictor. What are coefficients b1 and b2? Further, note that we are estimating an intercept rather than a mean for mood.

AR1predmodel <-"model{
   

#####model######
    for (t in 2:nt) ###model for timepoint 2 onwards
      {
        y[t]~dnorm(muy[t],Ipre)  ###Ipre is innovation precision
        muy[t]<- int + phi*y[t-1] + b1*x[t] + b2*x[t-1] ## mu is mean, phi is autoregressive parameters
        
    }

    #####priors###########
       
    Ipre <- inverse(Ivar) ## innovation precision
    Ivar ~dunif(0,500)   ##innovation variance
    phi~dunif(-1,1) ##autoregression coefficient
      b1~dnorm(0,.001) ##autoregression coefficient
    b2~dnorm(0,.001) ##autoregression coefficient   
    int~dnorm(50,.001) ##mean
        
              

}
"
AR1predmodel.txt<-textConnection(AR1predmodel)

Fit the model to the data for Jane and Mary, check the convergence, and interpret and compare the results.

####Jane#####
jags <- jags.model('AR1predmodel.txt',
                   data = list('nt' = length(mood_J),
                               'y' = as.numeric(mood_J),
                               'x' = as.numeric(period_J)
                               
                   ),
                   n.chains = 3) ###compile and initialize jags AR model

update(jags, 4000) ####burnin

samps=coda.samples(jags,
                   c('phi','b1', 'b2','int', 'Ivar'),
                   10000) ### take samples next to burnin


plot(samps,trace=FALSE) ##plot densities

gelman.diag(samps) ###plot gelman rubin stats

plot(samps,density=FALSE) ##traceplots

summary(samps) ##results parameters estimates 

Basic Bivariate VAR(1) Model

Next we will fit a multivariate, Vector Autoregressive model (of order 1); a VAR(1) model. We will fit the model for two variables, overall mood like before, and nervous tension.

Start by loading the data for both woman, and take a look at their time series data. Do the variables seem to sync up over time? What kind of effects would you expect nervous tension and mood to have on each other over time? What about concurrently?

datvar_Jane=data.frame(mydata_Jane$scale,mydata_Jane$nerveuzegespannenheid)
colnames(datvar_Jane) = c("mood", "nervoustension")

datvar_Mary=data.frame(mydata_Mary$scale,mydata_Mary$nerveuzegespannenheid)
colnames(datvar_Mary) = c("mood", "nervoustension")

mood_J = datvar_Jane[,1]
tense_J = datvar_Jane[,2]
ts.plot(mood_J/10,col="mediumseagreen",lwd=5, ylab= "scores", ylim=c(1,10))
lines(tense_J,col="royalblue3",lwd=5)

Now, take a look at the JAGS model code for the bivariate VAR(1) model.

VAR1model <-"
model{
   

#####model######
    for (t in 2:nt) ###model for timepoint 2 onwards
    {
        y[t,1:2]~dmnorm(muy[t,1:2],Ipre[1:2,1:2])  
        muy[t,1]<- mu1 + phi11*z[t-1,1] + phi12*z[t-1,2] 
        muy[t,2]<- mu2 + phi22*z[t-1,2] + phi21*z[t-1,1] 
        z[t,1] <- y[t,1]-mu1    
        z[t,2] <- y[t,2]-mu2
    }

####timepoint 1#######

    z[1,1] <- y[1,1]-mu1  ###model for time point two.
    z[1,2] <- y[1,2]-mu2 ###model for time point one.
    
            
####priors###########
       
    Ipre <- inverse(Ivar) ## innovation precision
    Ivar[1,1] ~dunif(0,300)   ##innovation variance
    Ivar[2,2] ~dunif(0,300)   ##innovation variance
    Ivar[1,2] <- Icor*sqrt(Ivar[1,1])*sqrt(Ivar[2,2]) ##innovation covariance for y1 and y2
    Ivar[2,1] <- Ivar[1,2]
    Icor~dunif(-1,1)  ###innovation correlation for y1 and y2

    phi11~dnorm(0,.001) 
    phi12~dnorm(0,.001) 
    phi21~dnorm(0,.001) 
    phi22~dnorm(0,.001) 
    
    mu1~dnorm(50,.001) ##mean
    mu2~dnorm(3,.001) ##mean
        
              

}
"
VAR1model.txt<-textConnection(VAR1model)

Fit the models for Mary and Jane.

jags <- jags.model('VAR1model.txt',
                   data = list('nt' = nt,
                               'y' = datvar_Jane
                               
                   ),
                   n.chains = 3) ###compile and initialize jags AR model

update(jags, 2000) ####burnin

samps=coda.samples(jags,
                   c('phi11','phi22', 'mu1','phi12','phi21', 'mu2', 'Ivar', 'Icor'),
                   3000) ### take samples next to burnin

plot(samps,trace=FALSE) ##plot densities

#gelman.diag(samps) ###plot gelman rubin stats

plot(samps,density=FALSE) ##traceplots

summary(samps) ##results parameters estimates 

Evaluate the convergence of the model. Draw a path model based on the results for Jane and for Mary, leaving out paths for which the sign of the effect is unclear based on the 95% credible intervals (see quantiles).

AR(1) Model that accounts for measurement error

The previous models did not account for measurement error, but it is possible to account for measurement error in dynamic models, even with only one indicator variable (see e.g., Schuurman, Houtveen & Hamaker, 2015; Schuurman & Hamaker, in press). The dynamic errors (innovations) and measurement errors can be distinguished because the dynamic errors are carried over through the dynamic process, while the measurement errors are unique to a single time point. The model AR(1) model with measurement error can be specified in JAGS as follows:

MEAR1model <-"
model{
   

#####model######
    for (t in 2:nt)  ###model for timepoint 2 onwards
    {
        y[t]~dnorm(muy[t],Epre)  ###Epre is measurement error precision (the inverse of the measurement error variance)
        muy[t]<- mu + ytilde[t]  ##mu is mean, ytilde is the deviation from the mean (the centered score)
        ytilde[t] ~ dnorm(muytilde[t], Ipre) ### Ipre is innovation precision
        muytilde[t] <- phi*ytilde[t-1]       ##phi is autoregression coefficient
        
    }

####timepoint 1#######

    ytilde[1] <- y[1]-mu
            
####priors###########
       
    Epre <- inverse(Evar) ##Measurement error precision
    Evar ~dunif(0,300)   ## measurement error variance
    Ipre <- inverse(Ivar) ## innovation precision
    Ivar ~dunif(0,300) ##innovation variance
    
    phi~dunif(-1,1) ##autoregression coefficient
    mu~dnorm(50,.001) ##mean
        
              

}
"
MEAR1model.txt<-textConnection(MEAR1model)

Fit the model for Jane and Mary.

ins = list(list( phi=.6, Ivar = 30, Evar = 20, mu=50),
           list( phi=.3, Ivar = 60, Evar = 40, mu=70),
           list( phi=-.2, Ivar = 20, Evar = 30, mu = 80))  ###initial values ARWN Jags

nt=length(scale)
jags <- jags.model('MEAR1model.txt',
                   data = list('nt' = length(as.numeric(mydata_Jane$scale)),
                               'y' = as.numeric(mydata_Jane$scale)
                   ),
                   inits=ins,
                   n.chains = 3) ###compile and initialize jags ARWN model

update(jags, 4000) ####burnin

samps=coda.samples(jags,
                   c('phi','mu','Ivar', 'Evar'),
                   4000) ###samples taken next to burnin

##dic.samps=dic.samples(jags,
##                  20000) take samples fort the DIC
plot(samps,trace=FALSE) ###plot densities
#gelman.plot(samps) ##plot gelman rubin statistics
#gelman.diag(samps) ## table gelman rubin statistics
plot(samps,density=FALSE) ###traceplots

summary(samps) ##results parameter estimates

Be sure to check the convergence of the model; the MEAR model can be tricky to estimate, and needs plenty of iterations. If the model has not converged yet, increase the number of burning and samples to keep.

Compare the results for Jane and Mary’s MEAR(1) model to the results for their AR(1) model. What are the main differences?

Latent AR(1) model or Dynamic Factor Model

Finally, we will model a latent variable, Postive Affect, with an AR(1) process. Both Jane and Mary 8 provided daily evaluations of 8 positive affect items, namely “attentive”, “strong”, “interested”, “enthausiastic”, “proud”, “determined”, “i