How to Posterior
So you’ve made the PMwG sampler work and now you have a sampled object. What’s next? This is where the fun happens: checking, inference, plots and posteriors. In this short blog post, I’ll show several user-contributed functions that produce plots and output, a great starting point for analysing your posterior model estimates.
To start, we’ll go through some checks to make sure the sampler ran as expected. Next, we’ll analyse the posterior parameter and random-effect (and covariance) estimates. Finally, I’ll provide code to create posterior predictive data so that you can compare the model’s predictions to the real data.
Checking
The first thing we should do with a PMwG object is check that it sampled correctly. For this, there are two main plots I look at - the parameter ‘chains’ and the subject likelihood ‘chains’. These are shown in the sampler documentation and are great ways to quickly visualise the sampling process.
Chain plots
From the parameter chains, we would expect to see stationarity: the chains should be flat, not trending up or down. If they are moving up or down, we may not yet have reached the posterior. Further, the chains should be thin and not fill all of the prior space. If they are quite wide, this could indicate that the parameter is not informed by the data; if so, you would likely also see very high acceptance rates. Often this happens when the likelihood isn’t specified correctly (for example, those pesky factors being read in the wrong way), so the likelihood returned on each step doesn’t change. This will be evident in the chain plot.
The second plot to look at is the subject likelihood plot. Here, similar to the parameter chains, we are looking for stationarity in the likelihood estimates for each subject (each line). The subject likelihoods will usually jump rapidly from a low value to a high one and then remain in that region. If the estimates are identical across all subjects, this can also indicate a problem in the likelihood function, similar to the one above.
Shown below is code and example outputs for the chain plots.
pmwg_chainPlots <- function(samples, subjectParPlot = FALSE, parameterPlot = TRUE, subjectLLPlot = TRUE){
  if (subjectParPlot){
    par(mfrow = c(2, ceiling(samples$n_pars / 2)))
    for (par in samples$par_names){
      matplot(t(samples$samples$alpha[par, , ]), type = "l", main = par, xlab = "samples", ylab = "Parameter Value")
    }
  }
  par(mfrow = c(1, 1))
  if (parameterPlot) matplot(t(samples$samples$theta_mu), type = "l", main = "Parameter chains", ylab = "Parameter Value", xlab = "samples")
  if (subjectLLPlot) matplot(t(samples$samples$subj_ll), type = "l", main = "Log-likelihood chains per subject", ylab = "LogLikelihood", xlab = "samples")
  if (sum(subjectParPlot, parameterPlot, subjectLLPlot) > 1) print("plots presented behind each other")
}
pmwg_chainPlots(sampled)
## [1] "plots presented behind each other"
From these outputs, we can see that the parameter plots show thin, stationary lines, and the likelihood plots show a rapid rise followed by stationary, separated lines. Both indicate that sampling worked as intended.
Parameter histograms
Secondly, it’s important to look at the posterior density of the parameter estimates. To do this, I use a function that plots a histogram of each parameter. Additionally, I overlay the prior density to see whether our priors were too restrictive or whether we haven’t learnt anything new (i.e. prior = posterior).
pmwg_parHist <- function(samples, bins = 30, prior = FALSE){
  if (!prior){
    chains <- as.array(as_mcmc(samples))
    mcmc_hist(chains)
  } else {
    theta <- t(samples$samples$theta_mu)
    theta <- as.data.frame(theta)
    long <- sum(samples$samples$stage == "sample")
    theta <- theta[(nrow(theta) - long + 1):nrow(theta), ]
    theta <- pivot_longer(theta, cols = everything(), names_to = "pars", values_to = "estimate")
    prior_mean <- samples$prior$theta_mu_mean
    prior_sd <- sqrt(diag(samples$prior$theta_mu_var)) # rnorm takes a standard deviation, not a variance
    priors <- NULL
    for (i in 1:samples$n_pars){
      priors <- c(priors, rnorm(n = long, mean = prior_mean[i], sd = prior_sd[i]))
    }
    priors <- as.data.frame(priors)
    y <- as.factor(samples$par_names)
    theta <- theta[order(factor(theta$pars, levels = y)), ]
    theta$prior <- priors$priors
    theta$pars <- as.factor(theta$pars)
    ggplot(theta, aes(estimate)) +
      geom_histogram(aes(y = ..density..), bins = bins) +
      geom_density(aes(prior)) +
      facet_wrap(~pars, scales = "free_y") +
      theme_bw()
  }
}
pmwg_parHist(sampled, bins=50, prior =T)
In this example, we can see that for t0 the prior may be too restrictive; for the other estimates, all seems as expected, with the posterior samples falling within a narrow range.
Posterior Estimates
Now we know the sampler worked as intended, we can look at the posterior parameter estimates. For this part, it is crucial to remember the parameter transformations. For example, in the LBA, all input parameters must be positive, so we take the exponent of the proposed values on each iteration in the likelihood function. This means that when we interpret posterior parameters, we should also take the exponent of the value. For example, t0 above is centered around -2, an impossible value, so we should take its exponent (exp(-2) ≈ 0.135), as this is what the model is actually using.
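As a quick illustration of this back-transformation (a minimal sketch; -2 is the approximate posterior centre read off the plot above):

```r
# t0 is estimated on the log scale, so exponentiate to recover the
# value (in seconds) that the model actually uses
t0_sampled <- -2
t0_natural <- exp(t0_sampled)
round(t0_natural, 3)  # 0.135
```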
Parameter histograms
From the previous section, we can also use this plot to see where the posterior parameters lie. As mentioned above, we need to take the exponent of these LBA values, so this time I’ll include the transformations in the function.
pmwg_parHist <- function(samples, bins = 30, prior = FALSE){
  if (!prior){
    chains <- as.array(as_mcmc(samples))
    mcmc_hist(chains)
  } else {
    theta <- exp(t(samples$samples$theta_mu)) ### exp here
    theta <- as.data.frame(theta)
    long <- sum(samples$samples$stage == "sample")
    theta <- theta[(nrow(theta) - long + 1):nrow(theta), ]
    theta <- pivot_longer(theta, cols = everything(), names_to = "pars", values_to = "estimate")
    prior_mean <- samples$prior$theta_mu_mean
    prior_sd <- sqrt(diag(samples$prior$theta_mu_var))
    priors <- NULL
    for (i in 1:samples$n_pars){
      ## exp here: draw on the sampling scale, then transform, giving the implied lognormal prior
      priors <- c(priors, exp(rnorm(n = long, mean = prior_mean[i], sd = prior_sd[i])))
    }
    priors <- as.data.frame(priors)
    y <- as.factor(samples$par_names)
    theta <- theta[order(factor(theta$pars, levels = y)), ]
    theta$prior <- priors$priors
    theta$pars <- as.factor(theta$pars)
    ggplot(theta, aes(estimate)) +
      geom_histogram(aes(y = ..density..), bins = bins) +
      geom_density(aes(prior)) +
      facet_wrap(~pars, scales = "free_y") +
      theme_bw()
  }
}
pmwg_parHist(sampled, bins=50, prior =T)
Here, we can see that t0 is just above 0, v1 is greater than v2 (correct > error drift rates) and there are three separable threshold values.
Output tables
Whilst the above plot is useful, we may wish for some more fine-grained analysis. For this, again looking at the group-level (theta) values, we can create some output tables.
First, we look at the posterior parameter means along with 90% credible intervals (the 5th to 95th percentiles).
qL = function(x) quantile(x, prob = .05, na.rm = TRUE) # lower quantile
qH = function(x) quantile(x, prob = .95, na.rm = TRUE) # upper quantile
tmp <- exp(apply(sampled$samples$theta_mu[,sampled$samples$stage=="sample"],1,mean))
lower <- exp(apply(sampled$samples$theta_mu[,sampled$samples$stage=="sample"],1,qL))
upper <- exp(apply(sampled$samples$theta_mu[,sampled$samples$stage=="sample"],1,qH))
tmp <- t(rbind(tmp,lower,upper))
kable(tmp)
|    |       tmp |     lower |     upper |
|----|-----------|-----------|-----------|
| b1 | 0.8832800 | 0.7672251 | 1.0082430 |
| b2 | 1.0173988 | 0.8724713 | 1.1971625 |
| b3 | 1.4128471 | 1.2075098 | 1.6497759 |
| A  | 1.5383791 | 1.3499831 | 1.7429653 |
| v1 | 3.3117580 | 2.7409280 | 3.9455836 |
| v2 | 1.1322673 | 0.9322809 | 1.3405226 |
| t0 | 0.1540344 | 0.1243581 | 0.1915416 |
This is a nice way to view the posterior parameter means and ranges. Next, we can look at the mean covariance structure, which we will transform into a correlation matrix.
cov <- apply(sampled$samples$theta_sig[,,sampled$samples$stage=="sample"], 1:2, mean)
colnames(cov) <- sampled$par_names
rownames(cov) <- sampled$par_names
cor <- cov2cor(cov) # transforms the covariance matrix to a correlation matrix
kable(cor)
|    | b1 | b2 | b3 | A | v1 | v2 | t0 |
|----|----|----|----|---|----|----|----|
| b1 | 1.0000000 | 0.2261599 | 0.0722276 | 0.0124526 | 0.2234376 | 0.1664706 | -0.2142574 |
| b2 | 0.2261599 | 1.0000000 | 0.1250495 | -0.0768053 | 0.1306218 | -0.0276819 | -0.0462468 |
| b3 | 0.0722276 | 0.1250495 | 1.0000000 | 0.0710326 | 0.1006157 | 0.0548435 | 0.1736373 |
| A  | 0.0124526 | -0.0768053 | 0.0710326 | 1.0000000 | 0.1164379 | 0.1028593 | 0.0674493 |
| v1 | 0.2234376 | 0.1306218 | 0.1006157 | 0.1164379 | 1.0000000 | 0.0823956 | 0.0483617 |
| v2 | 0.1664706 | -0.0276819 | 0.0548435 | 0.1028593 | 0.0823956 | 1.0000000 | -0.1596716 |
| t0 | -0.2142574 | -0.0462468 | 0.1736373 | 0.0674493 | 0.0483617 | -0.1596716 | 1.0000000 |
This is a good first summary, but now let’s plot this with corrplot.
corrplot(cor, method="color", type = "lower")
This covariance analysis looks at the correlations between parameter values. Here we can see that b1 and v1 are positively correlated, whereas t0 and b1 are negatively correlated. PMwG is great at dealing with models whose parameters are highly correlated, so it is important to check these kinds of outputs to see where correlations exist in the model and what could underpin them (or whether these parameters should NOT be correlated).
Inference
Importantly for psychological research, we might wish to do inference on the parameter estimates. Here, we can use any kind of classic inference test to look for differences between parameter values. We can also test this by checking whether the difference between posterior parameter estimates crosses zero.
First though, let’s see whether there’s a difference between the b parameters using a Bayesian ANOVA.
library(BayesFactor)
## Loading required package: coda
## ************
## Welcome to BayesFactor 0.9.12-4.2. If you have questions, please contact Richard Morey (richarddmorey@gmail.com).
##
## Type BFManual() to open the manual.
## ************
tmp <- as.data.frame(apply(sampled$samples$theta_mu[,sampled$samples$stage=="sample"],1,exp))
tmp <- tmp[,c(1:3)]
tmp <- tmp %>% pivot_longer(everything() , names_to = "pars", values_to = "estimate")
tmp <- as.data.frame(tmp)
tmp$pars<-as.factor(tmp$pars)
anovaBF(estimate ~ pars, data=tmp)
## Bayes factor analysis
## --------------
## [1] pars : 7.809663e+1090 ±0%
##
## Against denominator:
## Intercept only
## ---
## Bayes factor type: BFlinearModel, JZS
From this output, there is strong evidence for a difference between these conditions. Next we can plot these estimates to see the difference between b values visually.
plot.data <- tmp %>% group_by(pars) %>% summarise(avg = mean(estimate),
lower = qL(estimate),
upper = qH(estimate)) %>% ungroup()
ggplot(plot.data, aes(x=pars, y=avg)) +
geom_point(size=3) +
geom_errorbar(aes(ymin=lower,ymax=upper),width=0.2) +
theme_bw()
Another method for detecting a difference between parameters is to check whether the distribution of differences between posterior samples crosses 0. If it does not, one posterior estimate is reliably greater than the other. Let’s check this for b1 and b2. Here, we can set a frequentist-style criterion (say, 95%): if 95% of the sampled differences fall on one side of zero, we conclude that there is a difference between these parameters. Alternatively, we could run a separate model where these parameters were not separated and then use model comparison methods to compare the two models.
tmp <- as.data.frame(apply(sampled$samples$theta_mu[,sampled$samples$stage=="sample"],1,exp))
tmp <- tmp[,c(1:2)]
tmp$diff <- tmp$b1-tmp$b2
mean(tmp$diff>0)
## [1] 0.062
Here, only 6.2% of the sampled differences are greater than zero; equivalently, 93.8% fall below zero. As this does not reach our 95% criterion, there is no conclusive evidence that these parameters differ at the group level.
Posterior Predictive Data
Finally, an important part of post-model processing is generating posterior predictive data. Using this generated data, we can check whether the model’s predictions ‘fit’ the data. To generate posterior predictive data, we first randomly sample some individual random effects. We then use these in the log-likelihood function (using sample=TRUE) to generate data. For this part, it is important that your likelihood function is set up to generate data. If it is not, you can restructure your likelihood function and change the function below so that it still operates correctly.
pmwg_generatePosterior <- function(sampled, n){
  n.posterior = n # number of parameter samples drawn from the posterior distribution
  pp.data = list()
  S = sampled$n_subjects
  data = sampled$data
  sampled_stage = length(sampled$samples$stage[sampled$samples$stage == "sample"])
  for (s in 1:S) {
    cat(s, " ")
    iterations = round(seq(from = (sampled$samples$idx - sampled_stage), to = sampled$samples$idx, length.out = n.posterior))
    for (i in 1:length(iterations)) {
      x <- sampled$samples$alpha[, s, iterations[i]]
      names(x) <- sampled$par_names
      tmp = sampled$ll_func(x = x, data = data[as.integer(as.numeric(data$subject)) == s, ], sample = TRUE) ## change this call if your likelihood was not set up to generate data
      if (i == 1) {
        pp.data[[s]] = cbind(i, tmp)
      } else {
        pp.data[[s]] = rbind(pp.data[[s]], cbind(i, tmp))
      }
    }
  }
  return(pp.data)
}
pp.data <- pmwg_generatePosterior(sampled, 20) # 20 posterior draws per subject is plenty for a quick check
tmp = do.call(rbind, pp.data) # binds the subjects together
## i subject rt resp condition
## 1 1 1 0.8109699 1 1
## 2 1 1 1.3727252 1 2
## 3 1 1 1.0959211 2 3
## 4 1 1 0.4747878 1 1
## 5 1 1 0.9924096 1 2
## 6 1 1 2.8030712 1 3
Here you can see that I’ve generated 20 samples for each person; the column i shows the iteration. This does take a little time, but not too long (it depends on your likelihood). When writing your likelihood, we often try to make this step as safe as possible (rather than fast), as it is only called a handful of times.
Now we have posterior predictive data, we can plot this against the real data.
summ <- tmp %>% group_by(i,condition) %>%
summarise(MeanRT = mean(rt,na.rm=TRUE),
ACC = mean(resp),
qL = qL(rt),
qH = qH(rt),
qLA = qL(resp),
qHA = qH(resp))%>%
ungroup()
## `summarise()` has grouped output by 'i'. You can override using the `.groups` argument.
summ$source <- "model"
summD <- data %>% group_by(condition) %>%
summarise(MeanRT = mean(rt,na.rm=TRUE),
ACC = mean(resp),
qL = qL(rt),
qH = qH(rt),
qLA = qL(resp),
qHA = qH(resp))%>%
ungroup()
summD$source <- "data"
ggplot(summ, aes(x=condition, y=ACC, color=source, group=source))+geom_point()+geom_col(data=summD, aes(x=condition, y=ACC), alpha=0.2)+theme_bw()
ggplot(summ, aes(x=condition, y=MeanRT, color=source, group=source))+geom_point()+geom_col(data=summD, aes(x=condition, y=MeanRT), alpha=0.2)+theme_bw()
In these plots, I simply show the mean RT and accuracy for both the data (bars) and each posterior predictive iteration (20 dots). There are a variety of alternative methods for showing the descriptive adequacy of the model, such as Q-Q plots and plots across subjects; however, this is a quick way to check the fit of the model and a stepping stone to better analysis and graphs.
Model Comparison
Finally, an important part of post-modelling work is often model comparison. There are many methods of model comparison, each with advantages and disadvantages which I won’t go into here. I show two methods: DIC and IS2. If you have the time and computing resources, we recommend using IS2 with PMwG objects. This comes from the paper by Minh-Ngoc Tran and colleagues (Tran et al., 2020) on robustly estimating the marginal likelihood by importance sampling. An important caveat of this method, however, is that it relies heavily on the settings of the prior (which are input before sampling). If the prior is too restrictive, too broad or simply wrong, the estimates will be biased. More complex models are penalized more heavily when the prior is set incorrectly (or is uninformed), which is why it is crucial to check the posterior parameter estimates (with functions like pmwg_parHist above) before doing this kind of model comparison.
DIC
DIC (the Deviance Information Criterion) is a commonly used information criterion for comparing models. The function below takes a sampled PMwG object and returns the DIC value.
pmwg_DIC = function(sampled, pD = FALSE){
  nsubj = length(unique(sampled$data$subject))
  # the mean likelihood of the overall (sample-stage) model, separately for each subject
  mean.like <- apply(sampled$samples$subj_ll[, sampled$samples$stage == "sample"], 1, mean)
  # the mean of each parameter across iterations, keeping dimensions for parameters and subjects
  mean.params <- t(apply(sampled$samples$alpha[, , sampled$samples$stage == "sample"], 1:2, mean))
  # name mean.params here so it can be used by the log-likelihood function
  colnames(mean.params) <- sampled$par_names
  # log-likelihood for each subject using their mean parameter vector
  mean.params.like <- numeric(nrow(mean.params))
  data <- transform(sampled$data, subject = match(subject, unique(subject)))
  for (j in 1:nsubj) {
    mean.params.like[j] <- sampled$ll_func(mean.params[j, ], data = data[data$subject == j, ], sample = FALSE)
  }
  # effective number of parameters (stored separately so it does not clobber the pD argument)
  pD_value <- sum(-2 * mean.like + 2 * mean.params.like)
  # Deviance Information Criterion
  DIC <- sum(-4 * mean.like + 2 * mean.params.like)
  if (pD){
    return(c("DIC" = DIC, "effective parameters" = pD_value))
  } else {
    return(DIC)
  }
}
pmwg_DIC(sampled, pD = TRUE)
## DIC effective parameters
## 2135.809 124.878
A DIC value on its own is relatively meaningless unless compared against competing models, so we could fit another model in which the threshold did not vary between condition 1 and condition 2 and then check which of the models has the lower DIC.
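As a sketch of that comparison (the full-model value is taken from the output above; the reduced-model DIC is a made-up illustrative value, not a real fit):

```r
# DIC for the model above and for a hypothetical reduced model in which
# conditions 1 and 2 share a single threshold
dic_full    <- 2135.809
dic_reduced <- 2190.3   # hypothetical value for illustration only
# the model with the lower DIC is preferred
dic_full < dic_reduced  # TRUE here, so the full model would be retained
```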
IS2
Importance sampling squared (IS2) is another method of model comparison, and allows robust and unbiased estimation of the marginal likelihood. The script below will run IS2 on a sampled object once it is loaded in; however, it can take some time. Typically, we generate around 10,000 IS samples, with 250 particles. The IS2 method works by first constructing an importance distribution for the fixed (group-level) parameters, fit as a mixture of normal or Student-t distributions to the MCMC samples of those parameters. We then construct conditional proposals, called particles, for each subject. Each subject’s marginal likelihood is estimated unbiasedly from these particles and combined with the importance distribution. Because importance sampling is used at both levels, the procedure is importance sampling within importance sampling, hence ‘IS squared’.
IS2 script:
####### IS2 t-distribution Code #########
## set up environment and packages
rm(list=ls())
library(mvtnorm)
library(MCMCpack)
library(rtdists)
library(invgamma)
library(mixtools)
library(condMVNorm)
library(parallel)
load("sampled.Rdata")
cpus = 20
###### set up variables #####
# number of particles, samples, subjects, random effects etc
n_randeffect=sampled$n_pars
n_subjects = sampled$n_subjects
n_iter = length(sampled$samples$stage[sampled$samples$stage=="sample"])
length_draws = sampled$samples$idx # total number of stored samples
IS_samples = 10000 #number of importance samples
n_particles = 250 #number of particles
v_alpha = 2 # hyperparameter of the Huang and Wand (2013) prior
pars = sampled$par_names
# grab the sampled stage of PMwG
# store the random effects
alpha <- sampled$samples$alpha[,,sampled$samples$stage=="sample"]
# store the mu
theta <- sampled$samples$theta_mu[,sampled$samples$stage=="sample"]
# store the cholesky transformed sigma
sig <- sampled$samples$theta_sig[,,sampled$samples$stage=="sample"]
# a_half is used in calculating the Huang and Wand (2013) prior.
# a is a random sample from an inverse gamma which weights the inverse Wishart; the mixture of inverse Wisharts forms the prior on the covariance matrix
a_half <- log(sampled$samples$a_half[,sampled$samples$stage=="sample"])
unwind=function(x,reverse=FALSE) {
if (reverse) {
## if ((n*n+n)!=2*length(x)) stop("Wrong sizes in unwind.")
n=sqrt(2*length(x)+0.25)-0.5 ## Dim of matrix.
out=array(0,dim=c(n,n))
out[lower.tri(out,diag=TRUE)]=x
diag(out)=exp(diag(out))
out=out%*%t(out)
} else {
y=t(chol(x))
diag(y)=log(diag(y))
out=y[lower.tri(y,diag=TRUE)]
}
return(out)
}
robust_diwish = function (W, v, S)
{
if (!is.matrix(S))
S <- matrix(S)
if (nrow(S) != ncol(S)) {
stop("S not square in diwish().\n")
}
if (!is.matrix(W))
W <- matrix(W)
if (nrow(W) != ncol(W)) {
stop("W not square in diwish().\n")
}
if (nrow(S) != ncol(W)) {
stop("W and X of different dimensionality in diwish().\n")
}
if (v < nrow(S)) {
stop("v is less than the dimension of S in diwish().\n")
}
p <- nrow(S)
gammapart <- sum(lgamma((v + 1 - 1:p)/2))
ldenom <- gammapart + 0.5 * v * p * log(2) + 0.25 * p * (p -
1) * log(pi)
cholS <- chol(S)
# fail gracefully if W is not positive definite
cholW <- tryCatch(chol(W), error = function(e) NULL)
if (is.null(cholW)) return(1e-10)
halflogdetS <- sum(log(diag(cholS)))
halflogdetW <- sum(log(diag(cholW)))
invW <- chol2inv(cholW)
exptrace <- sum(S * invW)
lnum <- v * halflogdetS - (v + p + 1) * halflogdetW - 0.5 *
exptrace
lpdf <- lnum - ldenom
return(exp(lpdf))
}
#unwound sigma
pts2.unwound = apply(sig,3,unwind)
n.params<- nrow(pts2.unwound)+n_randeffect+n_randeffect
all_samples=array(dim=c(n_subjects,n.params,n_iter))
mu_tilde=array(dim = c(n_subjects,n.params))
sigma_tilde=array(dim = c(n_subjects,n.params,n.params))
for (j in 1:n_subjects){
all_samples[j,,] = rbind(alpha[,j,],theta[,],pts2.unwound[,])
# calculate the mean for re, mu and sigma
mu_tilde[j,] =apply(all_samples[j,,],1,mean)
# calculate the covariance matrix for random effects, mu and sigma
sigma_tilde[j,,] = cov(t(all_samples[j,,]))
}
X <- cbind(t(theta),t(pts2.unwound),t(a_half))
muX<-apply(X,2,mean)
sigmaX<-var(X)
# generates the IS proposals
prop_theta=mvtnorm::rmvt(IS_samples,sigma = sigmaX, df=1, delta=muX)
#prop_theta_compare=rmvnorm(IS_samples,muX,sigmaX)
### main functions
group_dist = function(random_effect = NULL, parameters, sample = FALSE, n_samples = NULL, n_randeffect){
param.theta.mu <- parameters[1:n_randeffect]
param.theta.sig.unwound <- parameters[(n_randeffect+1):(length(parameters)-n_randeffect)]
param.theta.sig2 <- unwind(param.theta.sig.unwound, reverse = TRUE)
if (sample){
return(mvtnorm::rmvnorm(n=n_samples, mean=param.theta.mu,sigma=param.theta.sig2))
}else{
logw_second<-mvtnorm::dmvnorm(random_effect, param.theta.mu,param.theta.sig2,log=TRUE)
return(logw_second)
}
}
prior_dist = function(parameters, prior_parameters = sampled$prior, n_randeffect){ # note: the prior is read from the global sampled object by default
param.theta.mu <- parameters[1:n_randeffect]
param.theta.sig.unwound <- parameters[(n_randeffect+1):(length(parameters)-n_randeffect)]
param.theta.sig2 <- unwind(param.theta.sig.unwound, reverse = TRUE)
param.a <- exp(parameters[((length(parameters)-n_randeffect)+1):(length(parameters))])
v_alpha=2
log_prior_mu=mvtnorm::dmvnorm(param.theta.mu, mean = prior_parameters$theta_mu_mean, sigma = prior_parameters$theta_mu_var, log =TRUE)
log_prior_sigma = log(robust_diwish(param.theta.sig2, v=v_alpha+ n_randeffect-1, S = 2*v_alpha*diag(1/param.a))) #exp of a-half -> positive only
log_prior_a = sum(invgamma::dinvgamma(param.a,scale = 0.5,shape=1,log=TRUE))
logw_den2 <- sum(log(1/param.a)) # jacobian determinant of transformation of log of the a-half
logw_den3 <- log(2^n_randeffect)+sum((n_randeffect:1+1)*log(diag(param.theta.sig2))) #jacobian determinant of cholesky factors of cov matrix
return(log_prior_mu + log_prior_sigma + log_prior_a + logw_den3 - logw_den2)
}
get_logp=function(prop_theta,data,n_subjects,n_particles,n_randeffect,mu_tilde,sigma_tilde,i, group_dist=group_dist){
#make an array for the density
logp=array(dim=c(n_particles,n_subjects))
# for each subject, get 1000 IS samples (particles) and find log weight of each
for (j in 1:n_subjects){
#generate the particles from the conditional MVnorm AND mix of group level proposals
wmix <- 0.95
n1=rbinom(n=1,size=n_particles,prob=wmix)
if (n1<2) n1=2
if (n1>(n_particles-2)) n1=n_particles-2 ## These just avoid degenerate arrays.
n2=n_particles-n1
# do conditional MVnorm based on the proposal distribution
conditional = condMVNorm::condMVN(mean=mu_tilde[j,],sigma=sigma_tilde[j,,],dependent.ind=1:n_randeffect,
given.ind=(n_randeffect+1):n.params,X.given=prop_theta[i,1:(n.params-n_randeffect)])
particles1 <- mvtnorm::rmvnorm(n1, conditional$condMean,conditional$condVar)
# mix of proposal params and conditional
particles2 <- group_dist(n_samples=n2, parameters = prop_theta[i,],sample=TRUE, n_randeffect=n_randeffect)
particles <- rbind(particles1,particles2)
for (k in 1:n_particles){
x <- particles[k,]
# name the parameter vector so the likelihood function can use it
names(x) <- sampled$par_names
# log-likelihood of this subject's data given particle k (first part of the equation 5 numerator)
logw_first = sampled$ll_func(x, data = data[as.numeric(factor(data$subject))==j,])
# second part of the equation 5 numerator: density of particle k under the group-level proposal
logw_second <- group_dist(random_effect = particles[k,], parameters = prop_theta[i,], sample = FALSE, n_randeffect = n_randeffect)
# denominator: mixture of the conditional density and the group-level proposal density
logw_third <- log(wmix*dmvnorm(particles[k,], conditional$condMean,conditional$condVar)+(1-wmix)*exp(logw_second))
# equation 5
logw = (logw_first+logw_second)-logw_third
# assign to the correct row/column
logp[k,j] = logw
}
}
# we centre the logw here before adding the max back on at the end; this avoids Inf and -Inf values
sub_max = apply(logp,2,max)
logw = t(t(logp) - sub_max)
w = exp(logw)
subj_logp = log(apply(w,2,mean))+sub_max #means
# sum the logp and return
if(is.nan(sum(subj_logp))){
return(1e-10)
}else{
return(sum(subj_logp))
}
}
compute_lw=function(prop_theta,data,n_subjects,n_particles,n_randeffect,mu_tilde,sigma_tilde,i, prior_dist=prior_dist, sampled=sampled){
logp.out <- get_logp(prop_theta,data,n_subjects,n_particles,n_randeffect,mu_tilde,sigma_tilde,i, group_dist=group_dist)
##do equation 10
logw_num <- logp.out[1]+prior_dist(parameters = prop_theta[i,], prior_parameters = sampled$prior, n_randeffect)
logw_den <- mvtnorm::dmvt(prop_theta[i,], delta=muX, sigma=sigmaX,df=1, log = TRUE) #density of proposed params under the means
logw <- logw_num-logw_den #this is the equation 10
return(c(logw))
# NOTE: the variance of the estimate (given by the logp function, currently commented out) is worth checking
}
##### make it work
#makes an array to store the IS samples
tmp<-array(dim=c(IS_samples))
#do the sampling
if (cpus>1){
tmp <- mclapply(X=1:IS_samples,mc.cores = cpus, FUN = compute_lw, prop_theta = prop_theta,data = data,n_subjects= n_subjects,n_particles = n_particles,
n_randeffect = n_randeffect,mu_tilde=mu_tilde,sigma_tilde = sigma_tilde, prior_dist=prior_dist, sampled=sampled)
} else{
for (i in 1:IS_samples){
cat(i)
tmp[i]<-compute_lw(prop_theta,data,n_subjects,n_particles, n_randeffect,mu_tilde,sigma_tilde,i,prior_dist=prior_dist, sampled=sampled)
}
}
# get the ML value
finished <- tmp
tmp<-unlist(tmp)
max.lw <- max(tmp)
mean.centred.lw <- mean(exp(tmp-max.lw)) #takes off the max and gets mean (avoids infs)
MLE <-log(mean.centred.lw)+max.lw #puts max back on to get the lw
This script returns the log marginal likelihood estimate (MLE) for a model. As with DIC, this is relatively meaningless without something to compare it to; the model with the higher marginal likelihood is preferred. Further, we can also compute Bayes factors from these estimates to quantify the weight of evidence for one model over another.
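For example, the Bayes factor between two models follows directly from their log marginal likelihood estimates (a sketch; the values below are hypothetical, standing in for two runs of the IS2 script):

```r
# hypothetical log marginal likelihood estimates from two competing models
mle_A <- -1050.2
mle_B <- -1062.7
# the log Bayes factor for model A over model B is the difference of the
# log marginal likelihoods; exponentiate to get the Bayes factor itself
log_BF <- mle_A - mle_B
exp(log_BF)  # a large value here would indicate strong evidence for model A
```

Working on the log scale until the final step avoids the numerical underflow that direct ratios of marginal likelihoods would cause.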
Conclusions
Ultimately, post-sampling analysis is up to the user, but hopefully this blog has provided some insight into what sort of analysis is typically done: ways of checking the sampling process, methods of posterior analysis, generating posterior predictive data, and some model comparison insights. For more info on the PMwG sampler, see the documentation, and for more useful functions, see the PMwG toolkit GitHub.
References
Tran, M. N., Scharth, M., Gunawan, D., Kohn, R., Brown, S. D., & Hawkins, G. E. (2020). Robustly estimating the marginal likelihood for cognitive models via importance sampling. Behavior Research Methods, 1-18.