Power Simulation in a Mixed Effects design using R

Chris Jungerius

In this notebook we’ll go through a quick example of setting up a power analysis, using data from an existing, highly-powered study to make credible parameter estimates. The code for setting up a simulation is inspired by/shamelessly stolen from a great tutorial about this topic by DeBruine & Barr (2021) and Lisa DeBruine’s appendix on its application for sensitivity analysis (DeBruine & Barr, 2020). The aim of this tutorial is two-fold:

  1. To demonstrate this approach for the most basic mixed model (using it only to deal with repeated measures - no nesting, no crossed random effects with stimuli types, etc.) which is a very common use of the technique for researchers in my immediate environment (visual attention research).
  2. To translate this approach to different languages - although I love R and encourage everyone to use it for statistical analysis, Python remains in use by a sizeable number of researchers, and I would also like to introduce Julia as an alternative.

Before we do anything, let’s import all the packages we will need:

library(tidyverse) # Data wrangling, plotting and general awesomeness
library(lmerTest) # Mixed modeling using lme4 with better support for tests
library(broom.mixed) # To make pretty tables
library(knitr) # To print those pretty tables
library(faux) # Much easier random effects simulation

set.seed(90059)

In this example, we will estimate the number of participants we need to replicate a simple and well-established experimental finding: the capture of attention by a colour singleton during visual search for a unique shape singleton. For this example, we are fortunate in that there are many studies of this effect on which to base our parameter estimates. One recent example is a highly-powered study by Kirsten Adam from the Serences lab, purpose-built for use in sensitivity analyses. First, let's import the data for our specific case from the Adam et al. (2021) study, which is freely available in an OSF repository, and take a look at it.

Note that when previous data doesn't exist (or even if it does, but you don't trust that it's sufficient to base your effect estimates on), there are alternative ways of determining such parameters, including formally determining a smallest effect size of interest (Lakens et al., 2018).
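
If you do go the smallest-effect-size-of-interest route, the only change to the workflow below is that you plug that value into the simulation in place of the model-based effect estimate. A minimal sketch (the 20 ms value is purely hypothetical, not derived from any data):

beta_1_sesoi <- 20 # hypothetical smallest effect of interest (ms), chosen a priori
# later, use this in place of the model-based estimate when setting
# the simulation's effect-size parameter (beta_1) below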

The data we chose are from Experiment 1c: variable colour singleton search. We are interested in the raw trial data, not the summary data (we are fitting a mixed model, after all, not an ANOVA), so we have to grab all the raw files and concatenate them.

# Read every raw data file from Experiment 1c and stack them into one dataframe.
# This assumes the Experiment_1c folder from the OSF repository is in the working directory.
df <- list.files(
    path = "./Experiment_1c",
    full.names = TRUE
) %>%
    lapply(
        read_csv,
        col_types = cols(
            gender = "c",   # read gender as character
            set_size = "f"  # read set size as a factor
        )
    ) %>%
    bind_rows()

Once it’s imported, we can take a look at our data, e.g., looking at subject means between the two conditions:

df %>%
    filter(
        acc == 1,
        set_size == 4
    ) %>%
    mutate(rt = rt * 1000) %>%
    ggplot(
        aes(
            x = distractor,
            y = rt,
            color = as.factor(subject),
            group = as.factor(subject)
        )
    ) +
    guides(color = "none") +
    stat_summary(
        fun.data = "mean_se",
        size = 1,
        linewidth = 1
    ) +
    stat_summary(
      fun = "mean",
      geom="line",
      linewidth=1
    )+
    theme_bw() +
    ggtitle("Reaction time by participant") +
    xlab("Colour singleton") +
    ylab("Reaction time (ms)") +
    theme(text = element_text(size = 20))

We can clearly see typical attentional capture effects in the data. Now that we have the data, let's model it:

d <- df %>%
    filter(
        acc == 1,
        set_size == 4
    ) %>%
    mutate(rt = rt * 1000)

# Our model is simple: RT is dependent on distractor presence, with a random slope and intercept for each subject. More complex models are left as an exercise to the reader.

m1 <- lmer(rt ~ distractor + (distractor | subject), data = d)

kable(tidy(m1))
|effect   |group    |term                                |    estimate| std.error| statistic|       df| p.value|
|:--------|:--------|:-----------------------------------|-----------:|---------:|---------:|--------:|-------:|
|fixed    |NA       |(Intercept)                         | 651.5000492|  16.66271| 39.099285| 22.99885| 0.0e+00|
|fixed    |NA       |distractorpresent                   |  30.8566430|   4.75268|  6.492473| 22.90957| 1.3e-06|
|ran_pars |subject  |sd__(Intercept)                     |  80.6315541|        NA|        NA|       NA|      NA|
|ran_pars |subject  |cor__(Intercept).distractorpresent  |   0.3498972|        NA|        NA|       NA|      NA|
|ran_pars |subject  |sd__distractorpresent               |  14.7695907|        NA|        NA|       NA|      NA|
|ran_pars |Residual |sd__Observation                     | 175.8140465|        NA|        NA|       NA|      NA|

The above model, rt ~ distractor + (distractor | subject), is our putative data-generating process: the parameters that we believe underlie the generation of the observed dependent variable, and the relationship between those parameters. The table above gives us parameter estimates for all fixed and random effects in the model. Now let's plug those parameters into a simulation!
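
Written out explicitly (in the notation of DeBruine & Barr, 2021), the data-generating process we are assuming for the reaction time of subject s on trial t is

$$
RT_{st} = \beta_0 + T_{0s} + (\beta_1 + T_{1s}) \, X_t + e_{st}
$$

where X_t codes distractor presence (0 = absent, 1 = present under treatment coding), the by-subject deviations (T_{0s}, T_{1s}) come from a bivariate normal distribution with standard deviations tau_0 and tau_1 and correlation rho, and the trial-level noise e_{st} comes from a normal distribution with standard deviation sigma. The simulation below builds up exactly these components.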

n_subj <- 10      # number of subjects
n_present <- 200  # number of trials with a singleton present
n_absent <- 200   # number of trials without a singleton
beta_0 <- 650     # grand mean
beta_1 <- 30      # effect of distractor presence
tau_0 <- 80       # by-subject random intercept sd
tau_1 <- 15       # by-subject random slope sd
rho <- 0.35       # correlation between intercept and slope
sigma <- 175      # residual (error) sd
# Generate a dataframe with one column per subject
dat_sim <- add_random(subj = n_subj) %>%
    # Each subject does both types of trials
    add_within("subj", singleton = c("absent", "present")) %>%
    # Because these trials are interchangeable, we expand the dataframe to account for trial numbers using the uncount trick
    add_between("singleton", trial_number=c(n_absent, n_present)) %>%
    uncount(trial_number) %>% 
    # Give each participant a random intercept and slope
    add_ranef("subj", T0s = tau_0, T1s = tau_1, .cors = rho) %>%
    # Account for residual variance with the sigma term - random noise on each trial
    mutate(sigma = rnorm(nrow(.), mean = 0, sd = sigma)) %>%
    # Contrast code singleton so we can do the little summation at the bottom
    add_contrast("singleton", "treatment", colnames = "s") %>%
    # Give each trial the right mix of Intercept, slope, random effects, and residual noise
    mutate(rt = beta_0 + T0s + (beta_1 + T1s) * s + sigma)

Data generated! Does it look like we’d expect?

dat_sim %>% 
    ggplot(
        aes(
            x = singleton,
            y = rt,
            color = subj,
            group = subj
        )
    ) +
    guides(color = "none") +
    stat_summary(
        fun.data = "mean_se",
        size = 1,
        linewidth = 1
    ) +
    stat_summary(
      fun = "mean",
      geom="line",
      linewidth=1
    )+
    theme_bw() +
  ggtitle("Reaction time by participant (simulated)")+
  xlab("Colour singleton")+
  ylab("Reaction time (ms)")+
  theme(text=element_text(size=20))

Looks comparable to the original data! Now let’s fit a model to see if we recover the parameters:

m_sim <- lmer(rt ~ singleton + (singleton | subj), dat_sim)
kable(tidy(m_sim))
|effect   |group    |term                                       |    estimate| std.error| statistic|       df|   p.value|
|:--------|:--------|:------------------------------------------|-----------:|---------:|---------:|--------:|---------:|
|fixed    |NA       |(Intercept)                                | 633.7980977| 21.010593| 30.165645| 9.001584| 0.0000000|
|fixed    |NA       |singleton.present-absent                   |  33.1292455|  7.275868|  4.553305| 9.000460| 0.0013792|
|ran_pars |subj     |sd__(Intercept)                            |  65.3130861|        NA|        NA|       NA|        NA|
|ran_pars |subj     |cor__(Intercept).singleton.present-absent  |   0.6406591|        NA|        NA|       NA|        NA|
|ran_pars |subj     |sd__singleton.present-absent               |  15.2341893|        NA|        NA|       NA|        NA|
|ran_pars |Residual |sd__Observation                            | 172.4244744|        NA|        NA|       NA|        NA|

The recovered estimates are in the right ballpark (with only 10 simulated subjects, the random-effect estimates, and the intercept-slope correlation in particular, are naturally noisy). To run many such simulations, we wrap the data generation in a function whose parameters we can vary:

my_sim_data <- function(
    n_subj = 5, # number of subjects
    n_absent = 200, # number of trials per condition
    n_present = 200,
    beta_0 = 650, # grand mean
    beta_1 = 30, # effect of distractor presence
    tau_0 = 80, # by-subject random intercept sd
    tau_1 = 15, # by-subject random slope sd
    rho = 0.35, # correlation between intercept and slope
    sigma = 175) {
    dat_sim <- add_random(subj = n_subj) %>%
    add_within("subj", singleton = c("absent", "present")) %>%
    add_between("singleton", trial_number=c(n_absent, n_present)) %>%
    uncount(trial_number) %>% 
    add_ranef("subj", T0s = tau_0, T1s = tau_1, .cors = rho) %>%
    mutate(sigma = rnorm(nrow(.), mean = 0, sd = sigma)) %>%
    add_contrast("singleton", "treatment", colnames = "s") %>%
    mutate(rt = beta_0 + T0s + (beta_1 + T1s) * s + sigma)

    dat_sim
}

The above function simulates data. The function below combines it with a model fit so we have a function that can be repeatedly called during our power analysis.

single_run <- function(filename = NULL, ...) {
    dat_sim <- my_sim_data(...)
      # run lmer and capture any warnings
  ww <- ""
  suppressMessages(suppressWarnings(
    mod_sim <- withCallingHandlers({
      lmer(rt ~ singleton + (singleton | subj),
           dat_sim, REML = FALSE)},
      warning = function(w) { ww <<- w$message }
    )
  ))
  
  # get results table and record any warnings
  sim_results <- broom.mixed::tidy(mod_sim) %>%
    mutate(warnings = ww)
  
  # add columns for the specified parameters
  params <- list(...)
  for (name in names(params)) {
    sim_results[name] <- params[name]
  }

  # append the results to a file if filename is set
  if (!is.null(filename)) {
    append <- file.exists(filename) # append if the file exists
    write_csv(sim_results, filename, append = append)
  }
  
  sim_results
}
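
Before launching thousands of runs, it is worth a quick sanity check that a single call produces sensible output (here with the function defaults plus 10 subjects):

single_run(n_subj = 10) %>%
    filter(effect == "fixed") %>%
    kable()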

Now let's run our first power analysis: we will run the simulation 1000 times with the parameters estimated above, and record how often the fixed-effect estimates reach significance:

nreps <- 1000

params <- crossing(
  rep        = 1:nreps,   # number of runs
  n_subj     = 10,        # number of subjects
  n_absent   = 150,       # number of trials per condition
  n_present  = 150,
  beta_0     = 650,       # grand mean
  beta_1     = 30,        # effect of distractor presence
  tau_0      = 80,        # by-subject random intercept sd
  tau_1      = 15,        # by-subject random slope sd
  rho        = 0.35,      # correlation between intercept and slope
  sigma      = 175        # residual (standard deviation)
) %>%
  select(-rep)
  
sims <- purrr::pmap_df(params, single_run, filename = NULL)

# calculate mean estimates and power for specified alpha
alpha <- 0.05
sims %>% 
  filter(effect == "fixed") %>%
  group_by(term) %>%
  summarize(
    mean_estimate = mean(estimate),
    mean_se = mean(std.error),
    power = mean(p.value < alpha),
    .groups = "drop"
  )
# A tibble: 2 × 4
  term                     mean_estimate mean_se power
  <chr>                            <dbl>   <dbl> <dbl>
1 (Intercept)                      650.    23.9  1    
2 singleton.present-absent          29.9    7.81 0.929

If we want to run our sensitivity analysis across a given parameter space, we’ll have to map the function single_run to generate data across this space, for example, over a varying number of participants:

filename1 <- "sens_faux.csv"
nreps <- 1000                # number of replications per parameter combo

params <- crossing(
  rep         = 1:nreps,     # repeats each combo nreps times
  n_subj      = seq(2, 15),  # number of subjects
  n_present   = 150,         # number of distractor present trials
  n_absent    = 150,         # number of distractor absent trials
  beta_0      = 650,         # Intercept
  beta_1      = 30,          # effect of distractor presence
  tau_0       = 80,          # by-subject random intercept sd
  tau_1       = 15,          # by-subject random slope sd
  rho         = 0.35,        # correlation between intercept and slope
  sigma       = 175          # residual (standard deviation)
) %>%
  select(-rep)

if (!file.exists(filename1)) {
  # run a simulation for each row of params
  # and save to a file on each rep
  sims1 <- purrr::pmap_df(params, single_run, filename = filename1)
}

Note that the above could also be run over other dimensions of our parameter space, e.g. for different estimates of the fixed effects, the amount of residual noise, or the number of trials per condition, simply by changing the params grid; a sketch of such a grid follows below.
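
For instance, a hypothetical grid that crosses sample size with two candidate effect sizes could look like this (the second beta_1 value and the output filename are assumptions for illustration, not taken from the data):

filename2 <- "sens_faux_beta.csv"   # hypothetical output file

params2 <- crossing(
  rep        = 1:nreps,      # repeats each combo nreps times
  n_subj     = seq(2, 15),   # number of subjects
  n_present  = 150,          # number of distractor present trials
  n_absent   = 150,          # number of distractor absent trials
  beta_0     = 650,          # intercept
  beta_1     = c(20, 30),    # two candidate effect sizes (20 is hypothetical)
  tau_0      = 80,           # by-subject random intercept sd
  tau_1      = 15,           # by-subject random slope sd
  rho        = 0.35,         # correlation between intercept and slope
  sigma      = 175           # residual sd
) %>%
  select(-rep)

# not run here, as it doubles the number of simulations:
# sims2 <- purrr::pmap_df(params2, single_run, filename = filename2)

How did we do? Let's take a look at our power curve.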

# read saved simulation data
# NB: col_types is set for warnings in case 
#     the first 1000 rows don't have any
ct <- cols(warnings = col_character(),
           # makes sure plots display in this order
           group = col_factor(ordered = TRUE),
           term = col_factor(ordered = TRUE))
sims1 <- read_csv(filename1, col_types = ct)

power1 <- sims1 %>% 
  filter(effect == "fixed", term == "singleton.present-absent") %>%
  group_by(n_subj) %>%
  summarise(
    mean_estimate = mean(estimate),
    mean_se = mean(std.error),
    power = mean(p.value < alpha),
    .groups = "drop"
  ) 

power1 %>%
  ggplot(aes(n_subj, power)) +
  geom_point() +
  geom_smooth(se = FALSE) +
  ylim(0, 1) +
  geom_hline(yintercept=0.8,linetype="dashed")+
  scale_x_continuous(name = "Number of participants") +
  ggtitle("Power for designs varying in sample size") +
  theme_bw()

Our power analysis indicates that, with the parameters established above, we need roughly 8 or more participants to detect the attentional capture effect with at least 80% power (the dashed line in the plot)!
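
Rather than eyeballing the curve, we can also pull the smallest sample size that clears a power threshold straight from the power1 summary table (the 0.8 threshold is a conventional choice, not something dictated by the data):

power1 %>%
  filter(power >= 0.8) %>%
  summarise(min_n_subj = min(n_subj))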

The code used above is specific to power analysis for mixed models, but the approach generalises to other methods too, of course! The code can easily be adapted to different model types (simply change the model definition in single_run and make sure to capture the right parameters), and even to Bayesian approaches (for a thorough example of power analysis with Bayesian methods and the excellent Bayesian regression package brms, see Kurz, 2021).
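
As a rough sketch of such an adaptation, here is a hypothetical variant of single_run that fits a logistic mixed model to trial accuracy instead of a linear model to reaction times. The my_sim_acc_data function, the acc outcome, and the simplified random-effects structure are all assumptions for illustration; you would need to write a data-generating function that simulates binary accuracy yourself.

single_run_acc <- function(...) {
  # hypothetical simulator returning a column of 0/1 accuracy values
  dat_sim <- my_sim_acc_data(...)
  # logistic mixed model via lme4's glmer (intercept-only random effect for simplicity)
  mod_sim <- glmer(
    acc ~ singleton + (1 | subj),
    data = dat_sim,
    family = binomial
  )
  broom.mixed::tidy(mod_sim)
}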

Even if the above code is spaghetti to you (I was originally planning on also converting it to Python/MATLAB, but there are only so many hours in the day; click here for a Python version or here for a Julia version), I hope you will take away the general approach from this tutorial: simulate data from a credible data-generating process, fit the model you actually plan to run, and repeat to see how often you detect the effect.

References

Adam, K. C. S., Patel, T., Rangan, N., & Serences, J. T. (2021). Classic Visual Search Effects in an Additional Singleton Task: An Open Dataset. Journal of Cognition, 4(1), 34. https://doi.org/10.5334/joc.182
DeBruine, L. M., & Barr, D. J. (2020). Appendix 1c: Sensitivity Analysis. https://debruine.github.io/lmem_sim/articles/appendix1c_sensitivity.html.
DeBruine, L. M., & Barr, D. J. (2021). Understanding Mixed-Effects Models Through Data Simulation. Advances in Methods and Practices in Psychological Science, 4(1), 2515245920965119. https://doi.org/10.1177/2515245920965119
Kurz, A. S. (2021). Bayesian power analysis: Part I. Prepare to reject H0 with simulation. In A. Solomon Kurz. https://solomonkurz.netlify.app/blog/bayesian-power-analysis-part-i/.
Lakens, D., Scheel, A. M., & Isager, P. M. (2018). Equivalence Testing for Psychological Research: A Tutorial. Advances in Methods and Practices in Psychological Science, 1(2), 259–269. https://doi.org/10.1177/2515245918770963