Power Simulation in a Mixed Effects design using Julia

Chris Jungerius

In this notebook we’ll go through a quick example of setting up a power analysis, using data from an existing, highly powered study to make credible parameter estimates. The code for setting up the simulation is inspired by/shamelessly stolen from a great tutorial on this topic by DeBruine & Barr (2021) and Lisa DeBruine’s appendix on its application to sensitivity analysis (DeBruine & Barr, 2020). The aim of this tutorial is two-fold:

  1. To demonstrate this approach for the most basic mixed model (using it only to deal with repeated measures - no nesting, no crossed random effects with stimulus types, etc.), which is a very common use of the technique for researchers in my immediate environment (visual attention research).
  2. To translate this approach to different languages - although I love R and encourage everyone to use it for statistical analysis, Python remains in use by a sizeable number of researchers, and I would also like to introduce Julia as an alternative.

Before we do anything, let’s import all the packages we will need:

using DataFrames
using DataFramesMeta
using CSV
using MixedModels
using Gadfly
using Statistics
using Distributions
using Random

Random.seed!(90059)
TaskLocalRNG()

In this example, we will estimate the number of participants we need to replicate a simple and well-established experimental finding: the capture of attention by a colour singleton during visual search for a unique shape singleton. We are fortunate in that there are many studies of this effect on which to base our parameter estimates. One recent example is a highly powered study by Kirsten Adam from the Serences lab, purpose-built to be used for sensitivity analyses. First let’s import the data for our specific case from the Adam et al. (2021) study, which is freely available in an OSF repository, and take a look at it.

Note that when previous data doesn’t exist (or even if it does, but you don’t trust that it’s sufficient to base your effect estimates on), there are alternative ways of determining such parameters, including formally determining a smallest effect size of interest (Lakens et al., 2018).

The data we chose are from Experiment 1c: variable colour singleton search. We are interested in the raw trial data, not the summary data (we are fitting a mixed model, after all, not an ANOVA), so we have to grab all the raw files and concatenate them.

all_files = readdir("Experiment_1c", join = true)
dfs = CSV.read.(all_files,DataFrame)
df = vcat(dfs...)
38400×50 DataFrame (display truncated)
Columns include experiment, subject, uniqueID, block, trial, trial_cumulative, age, gender, color_cond_blocking, color_history, shape_type, set_size, distractor, rt, acc, target_orient, response_key, response_made, iti, target_color, and per-item position, shape, and line columns.

Once it’s imported, we can take a look at our data, e.g. at subject mean RTs in the two distractor conditions:

@chain df begin
  @subset(
    :acc.==1, 
    :set_size.==4)
  @transform(
    :rt = :rt .* 1000,
    :subject = string.(:subject)
  )
  groupby([:subject, :distractor])
  @combine(
    :rt = mean(:rt))
  plot(
    layer(
    x=:distractor,
    y=:rt,
    color=:subject,
    Geom.point
    ),
    layer(
    x=:distractor,
    y=:rt,
    color=:subject,
    Geom.line
    ),
    Theme(key_position=:none)
  )
  end
[Plot: per-subject mean RTs for distractor absent vs. present, with one point per subject and condition, connected by lines.]

We can clearly see the typical attentional capture effect in the data. Now that we have the data, let’s model it:

d = @chain df begin
  @subset(
    :acc.==1,
    :set_size.==4)
  @transform(
    :rt = :rt.*1000,
    :subject = string.(:subject))
end

# Our model is simple: RT is dependent on distractor presence, with a random slope and intercept for each subject. More complex models are left as an exercise to the reader.

formula = @formula(rt ~ 1 + distractor + (1 + distractor | subject))

fm1 = fit(MixedModel, formula, d, REML=true)
Est. SE z p σ_subject
(Intercept) 651.5000 16.6627 39.10 <1e-99 80.6316
distractor: present 30.8566 4.7527 6.49 <1e-10 14.7696
Residual 175.8140
# check random effects
VarCorr(fm1)
Column Variance Std.Dev Corr.
subject (Intercept) 6501.4475 80.6316
distractor: present 218.1408 14.7696 +0.35
Residual 30910.5789 175.8140

We can also check whether the addition of the distractor term is even useful in describing the data: does it explain significantly more variance than the intercept-only model?

fm1 = fit(MixedModel, formula, d)
formula_null = @formula(rt ~ 1 + (1 + distractor | subject))
fm1_null = fit(MixedModel, formula_null, d)
lrt = MixedModels.likelihoodratiotest(fm1,fm1_null)
model-dof deviance χ² χ²-dof P(>χ²)
rt ~ 1 + (1 + distractor | subject) 5 120892
rt ~ 1 + distractor + (1 + distractor | subject) 6 120867 25 1 <1e-06

The above model, rt ~ 1 + distractor + (1 + distractor | subject), is our putative data generating process: the parameters that we believe underlie the generation of the observed dependent variable, and the relationship between those parameters. The table shown above gives us parameter estimates for all fixed and random effects in the model. Now let’s plug those parameters into a simulation!
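
Written out, the data generating process we are simulating below is

$$
\mathrm{RT}_{si} = \beta_0 + T_{0s} + (\beta_1 + T_{1s})\,X_i + e_{si},
\qquad
\begin{pmatrix} T_{0s} \\ T_{1s} \end{pmatrix}
\sim \mathcal{N}\!\left(
\begin{pmatrix} 0 \\ 0 \end{pmatrix},
\begin{pmatrix} \tau_0^2 & \rho\,\tau_0\tau_1 \\ \rho\,\tau_0\tau_1 & \tau_1^2 \end{pmatrix}
\right),
\qquad
e_{si} \sim \mathcal{N}(0, \sigma^2),
$$

where $X_i$ codes distractor presence (0 = absent, 1 = present), $T_{0s}$ and $T_{1s}$ are subject $s$’s random intercept and slope offsets, and $\beta_0$, $\beta_1$, $\tau_0$, $\tau_1$, $\rho$ and $\sigma$ correspond to the beta_0, beta_1, tau_0, tau_1, rho and sigma variables defined next.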

n_subj     = 10     # number of subjects
n_present  = 200    # number of distractor present trials
n_absent   = 200    # number of distractor absent
beta_0     = 650    # Intercept
beta_1     = 30     # effect of category
tau_0      = 80     # by-subject random intercept sd
tau_1      = 15     # by-subject random slope sd
rho        = 0.35   # correlation between intercept and slope
sigma      = 175    # residual noise
175

Generate trials with their fixed effects:

# simulate a sample of items
# total number of items = n_absent + n_present

items = @chain DataFrame(
  :item_id => range(1,n_absent+n_present),
  :category => [repeat(["absent"],n_absent)..., repeat(["present"], n_present)...]
) begin
  @rtransform :X_i = :category == "present" ? 1 : 0
end

400×3 DataFrame
375 rows omitted
Row item_id category X_i
Int64 String Int64
1 1 absent 0
2 2 absent 0
3 3 absent 0
4 4 absent 0
5 5 absent 0
6 6 absent 0
7 7 absent 0
8 8 absent 0
9 9 absent 0
10 10 absent 0
11 11 absent 0
12 12 absent 0
13 13 absent 0
389 389 present 1
390 390 present 1
391 391 present 1
392 392 present 1
393 393 present 1
394 394 present 1
395 395 present 1
396 396 present 1
397 397 present 1
398 398 present 1
399 399 present 1
400 400 present 1

And generate participants with their random intercepts and slopes:

#simulate a sample of subjects

#calculate random intercept / random slope covariance
covar = rho * tau_0 * tau_1

#put values into variance-covariance matrix
cov_mx = [tau_0^2 covar; covar tau_1^2]

#generate the by-subject random effects
dist = MvNormal([0, 0], cov_mx)
subject_rfx = rand(dist, n_subj)

# combine with subject IDs

subjects = DataFrame(
  :subj_id => string.(range(1,n_subj)),
  :T_0s => subject_rfx[1,:],
  :T_1s => subject_rfx[2,:]
)
10×3 DataFrame
Row subj_id T_0s T_1s
String Float64 Float64
1 1 -67.9967 -4.29486
2 2 105.907 -1.89717
3 3 73.7466 2.99368
4 4 156.603 14.0334
5 5 40.9051 -2.73222
6 6 34.1562 11.3813
7 7 -74.5877 -5.30155
8 8 -80.6553 21.5753
9 9 126.907 34.0272
10 10 57.3057 17.3225

Now combine and add residual noise to create a complete dataframe:

#cross items and subjects, add noise

dat_sim = @chain crossjoin(subjects,items) begin
  @rtransform @astable begin
    :e_si = rand(Normal(0, sigma))
    #calculate response variable
    :RT = beta_0 + :T_0s + (beta_1 + :T_1s) * :X_i + :e_si
    end
  @select(:subj_id, :item_id, :category, :X_i, :RT)
end
4000×5 DataFrame
3975 rows omitted
Row subj_id item_id category X_i RT
String Int64 String Int64 Float64
1 1 1 absent 0 516.535
2 1 2 absent 0 757.313
3 1 3 absent 0 544.622
4 1 4 absent 0 655.981
5 1 5 absent 0 691.763
6 1 6 absent 0 523.838
7 1 7 absent 0 510.876
8 1 8 absent 0 690.3
9 1 9 absent 0 792.548
10 1 10 absent 0 713.432
11 1 11 absent 0 825.375
12 1 12 absent 0 627.622
13 1 13 absent 0 667.263
3989 10 389 present 1 531.323
3990 10 390 present 1 800.412
3991 10 391 present 1 527.266
3992 10 392 present 1 787.222
3993 10 393 present 1 664.524
3994 10 394 present 1 701.211
3995 10 395 present 1 562.528
3996 10 396 present 1 677.546
3997 10 397 present 1 527.31
3998 10 398 present 1 686.81
3999 10 399 present 1 736.427
4000 10 400 present 1 667.098

Data generated! Does it look like we’d expect?

@chain dat_sim begin
  groupby([:subj_id, :category])
  @combine(
    :RT = mean(:RT))
  plot(
    layer(
    x=:category,
    y=:RT,
    color=:subj_id,
    Geom.point
    ),
    layer(
    x=:category,
    y=:RT,
    color=:subj_id,
    Geom.line
    ),
    Theme(key_position=:none)
  )
  end
[Plot: per-subject mean simulated RTs for category absent vs. present, with one point per subject and condition, connected by lines.]

Looks comparable to the original data! Now let’s fit a model to see if we recover the parameters:

simformula = @formula(RT ~ 1 + category + (1 + category | subj_id))
fm2 = fit(MixedModel, simformula, dat_sim, REML=true)
Est. SE z p σ_subj_id
(Intercept) 689.6254 25.3951 27.16 <1e-99 79.3484
category: present 37.5555 7.1469 5.25 <1e-06 14.3157
Residual 174.8844
# check random effects
VarCorr(fm2)
Column Variance Std.Dev Corr.
subj_id (Intercept) 6296.1711 79.3484
category: present 204.9394 14.3157 +0.81
Residual 30584.5494 174.8844

And as a quick test: does the full model explain the data significantly better than an intercept only model?

fm2 = fit(MixedModel, simformula, dat_sim)
simformula_null = @formula(RT ~ 1 + (category | subj_id))
fm2_null = fit(MixedModel, simformula_null, dat_sim)
lrt_sim = MixedModels.likelihoodratiotest(fm2,fm2_null)
model-dof deviance χ² χ²-dof P(>χ²)
RT ~ 1 + (1 + category | subj_id) 5 52724
RT ~ 1 + category + (1 + category | subj_id) 6 52710 14 1 0.0002

Great, our simulation works - we recover our ground truth for the different coefficients (allowing for differences due to noise and limited sample size). Now for a power analysis, we’d put the above in functions and run the code many times for a given combination of parameters. See below:

function my_sim_data(;
  n_subj     = 5,       # number of subjects
  n_present  = 200,     # number of distractor present trials
  n_absent   = 200,     # number of distractor absent
  beta_0     = 650,     # Intercept
  beta_1     = 30,      # effect of category
  tau_0      = 80,      # by-subject random intercept sd
  tau_1      = 15,      # by-subject random slope sd
  rho        = 0.35,    # correlation between intercept and slope
  sigma      = 175      # residual noise
)
  
  # simulate a sample of items
  # total number of items = n_absent + n_present

  items = @chain DataFrame(
    :item_id => range(1,n_absent+n_present),
    :category => [repeat(["absent"],n_absent)..., repeat(["present"], n_present)...]
  ) begin
    @rtransform :X_i = :category == "present" ? 1 : 0
  end

  #simulate a sample of subjects

  #calculate random intercept / random slope covariance
  covar = rho * tau_0 * tau_1

  #put values into variance-covariance matrix
  cov_mx = [tau_0^2 covar; covar tau_1^2]

  #generate the by-subject random effects
  dist = MvNormal([0, 0], cov_mx)
  subject_rfx = rand(dist, n_subj)

  # combine with subject IDs

  subjects = DataFrame(
    :subj_id => string.(range(1,n_subj)),
    :T_0s => subject_rfx[1,:],
    :T_1s => subject_rfx[2,:]
  )

  dat_sim = @chain crossjoin(subjects,items) begin
    @rtransform @astable begin
      :e_si = rand(Normal(0, sigma))
      #calculate response variable
      :RT = beta_0 + :T_0s + (beta_1 + :T_1s) * :X_i + :e_si
      end
    @select(:subj_id, :item_id, :category, :X_i, :RT)
  end
  
  dat_sim
end
my_sim_data (generic function with 1 method)

The above function simulates data. The function below combines it with a model fit so we have a function that can be repeatedly called during our power analysis.

function single_run(filename = nothing, args...; kwargs...)
  
  dat_sim = my_sim_data(;kwargs...)
  simformula = @formula(RT ~ 1 + category + (1 + category | subj_id))
  simformula_null = @formula(RT ~ 1 + (category | subj_id))
  fm = fit(MixedModel, simformula, dat_sim) 
  fm_null = fit(MixedModel, simformula_null, dat_sim)
  lrt = MixedModels.likelihoodratiotest(fm,fm_null)
  sim_results = DataFrame(coeftable(fm))
  # The Wald z p-values reported for the fixed effects are uncorrected and
  # therefore anti-conservative - we use the best (reasonable) alternative
  # available to us, the likelihood ratio test against the intercept-only
  # model, to get a better estimate (the intercept p-value is left as-is,
  # since it isn't of interest here).
  sim_results[!,"Pr(>|z|)"][2] = lrt.pvalues[1]
  if !isempty(kwargs)
    sim_results = crossjoin(sim_results,DataFrame(kwargs...))
  end
  
  if !isnothing(filename)
    append = isfile(filename)
    CSV.write(filename, sim_results, append = append)
  end  
  sim_results
end
single_run (generic function with 2 methods)
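
As a quick sanity check, we can call it once. A minimal sketch (the keyword arguments are simply forwarded to my_sim_data, and the values here are illustrative):

res = single_run(; n_subj = 12)  # one simulated experiment; no filename, so nothing is written to disk
res[res.Name .== "category: present", ["Coef.", "Pr(>|z|)"]]  # effect estimate and its LRT-based p-value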

Now let’s run our sensitivity analysis - we will run our simulation many times (1000+ times, which we can show here because of Julia’s blazing speed) for each combination of parameters, and record how often the fixed effect estimates reach significance:

nreps = 1000

params = allcombinations(
  DataFrame,
  :rep        => 1:nreps, # number of runs
  :n_subj     => 8,       # number of subjects
  :n_present  => 150,     # number of distractor present trials
  :n_absent   => 150,     # number of distractor absent
  :beta_0     => 650,     # Intercept
  :beta_1     => 30,      # effect of category
  :tau_0      => 80,      # by-subject random intercept sd
  :tau_1      => 15,      # by-subject random slope sd
  :rho        => 0.35,    # correlation between intercept and slope
  :sigma      => 175      # residual (standard deviation)
)

select!(params,Not("rep"))

alpha = .05

sims = vcat([single_run(;r...) for r in Tables.rowtable(params)]...)

@chain sims begin
  @select(:Name, $"Pr(>|z|)")
  @transform :p = $"Pr(>|z|)" .< alpha
  groupby(:Name)
  @combine :power = mean(:p)
end
2×2 DataFrame
Row Name power
String Float64
1 (Intercept) 1.0
2 category: present 0.874

If we want to run our sensitivity analysis across a given parameter space, we’ll have to map the function single_run to generate data across this space, for example, over a varying number of participants:

filename1 = "sens_jl.csv"

nreps = 1000

params = allcombinations(
  DataFrame,
  :rep        => 1:nreps, # number of runs
  :n_subj     => 2:15,    # number of subjects
  :n_present  => 150,     # number of distractor present trials
  :n_absent   => 150,     # number of distractor absent
  :beta_0     => 650,     # Intercept
  :beta_1     => 30,      # effect of category
  :tau_0      => 80,      # by-subject random intercept sd
  :tau_1      => 15,      # by-subject random slope sd
  :rho        => 0.35,    # correlation between intercept and slope
  :sigma      => 175      # residual (standard deviation)
)

select!(params,Not("rep"))

alpha = 0.05

if !isfile(filename1)
  sims = vcat([single_run(filename1; r...) for r in Tables.rowtable(params)]...)
end

Note that the above could obviously also be run over other dimensions of our parameter space, e.g. for different estimates of the fixed effects, the amount of noise, the number of trials, etc., by changing the params grid; a sketch follows below.
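
For instance, a minimal sketch of a grid that additionally varies the effect size (the beta_1 values here are illustrative assumptions, not estimates from the data):

params_fx = allcombinations(
  DataFrame,
  :rep        => 1:nreps,      # number of runs
  :n_subj     => 2:15,         # number of subjects
  :n_present  => 150,          # number of distractor present trials
  :n_absent   => 150,          # number of distractor absent
  :beta_0     => 650,          # Intercept
  :beta_1     => [15, 30, 45], # hypothetical small / observed / large effect
  :tau_0      => 80,           # by-subject random intercept sd
  :tau_1      => 15,           # by-subject random slope sd
  :rho        => 0.35,         # correlation between intercept and slope
  :sigma      => 175           # residual noise
)
select!(params_fx, Not("rep"))

How did we do? Let’s take a look at our power curve.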

sims1 = CSV.read("sens_jl.csv", DataFrame)

power1 = @chain sims1 begin
  @subset :Name .== "category: present"
  @transform :p = $"Pr(>|z|)" .< alpha
  groupby([:Name, :n_subj])
  @combine begin
  :mean_estimate = mean($"Coef.")
  :mean_se = mean($"Std. Error")
  :power = mean(:p)  
  end
end
14×5 DataFrame
Row Name n_subj mean_estimate mean_se power
String31 Int64 Float64 Float64 Float64
1 category: present 2 30.2635 17.9024 0.205
2 category: present 3 29.526 14.6008 0.356
3 category: present 4 29.9583 12.6264 0.5
4 category: present 5 30.1647 11.1628 0.647
5 category: present 6 29.9987 10.2055 0.743
6 category: present 7 30.1957 9.33679 0.823
7 category: present 8 29.9215 8.80461 0.874
8 category: present 9 30.7667 8.2574 0.924
9 category: present 10 30.071 7.75013 0.956
10 category: present 11 29.4491 7.43248 0.945
11 category: present 12 30.256 7.10302 0.971
12 category: present 13 29.7838 6.819 0.983
13 category: present 14 29.8746 6.62874 0.983
14 category: present 15 29.8489 6.42804 0.997
plot(
  power1,
  layer(
    x=:n_subj,
    y=:power,
    Geom.smooth,
    yintercept=[0.8],
    Geom.hline
  ),
  layer(
    x=:n_subj,
    y=:power,
    Geom.point
  )
)
[Plot: power as a function of n_subj, with a point for each simulated sample size, a smoothed curve through them, and a horizontal reference line at power = 0.8.]

Our power analysis has determined that, with the parameters established above, we need roughly 8 or more participants to detect the distractor effect with at least 80% power!

The code used above is specific to power analysis for mixed models, but the approach generalises to other methods too, of course! The above code can easily be adapted to different model types (simply change the model definition in single_run and make sure to capture the right parameters), and even to Bayesian approaches. (For a thorough example of doing power analysis with Bayesian methods and the awesome Bayesian regression package brms, see Kurz, 2021.)
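
To give a flavour, here is a minimal sketch of what the model-fitting step inside single_run could look like for a binary outcome such as trial accuracy; the acc column and the dat_sim_acc data frame are hypothetical stand-ins for whatever your simulation generates:

# hypothetical variant: a logistic mixed model (GLMM) for accuracy
accformula = @formula(acc ~ 1 + category + (1 + category | subj_id))
fm_acc = fit(MixedModel, accformula, dat_sim_acc, Bernoulli())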

Even if the above code is spaghetti to you (perhaps you prefer R? Or Python?), I hope you will take away the core recipe from this tutorial: use existing data (or theory) to make credible parameter estimates, simulate many datasets from the assumed data generating process, fit your intended model to each, and record how often the effect of interest reaches significance.

References

Adam, K. C. S., Patel, T., Rangan, N., & Serences, J. T. (2021). Classic Visual Search Effects in an Additional Singleton Task: An Open Dataset. Journal of Cognition, 4(1), 34. https://doi.org/10.5334/joc.182
DeBruine, L. M., & Barr, D. J. (2020). Appendix 1c: Sensitivity Analysis. https://debruine.github.io/lmem_sim/articles/appendix1c_sensitivity.html.
DeBruine, L. M., & Barr, D. J. (2021). Understanding Mixed-Effects Models Through Data Simulation. Advances in Methods and Practices in Psychological Science, 4(1), 2515245920965119. https://doi.org/10.1177/2515245920965119
Kurz, A. S. (2021). Bayesian power analysis: Part I. Prepare to reject H₀ with simulation. https://solomonkurz.netlify.app/blog/bayesian-power-analysis-part-i/
Lakens, D., Scheel, A. M., & Isager, P. M. (2018). Equivalence Testing for Psychological Research: A Tutorial. Advances in Methods and Practices in Psychological Science, 1(2), 259–269. https://doi.org/10.1177/2515245918770963