On The Use of Meta-Analysis Techniques for Multi-Lab Experiments

There has been growing concern across different branches of science regarding the replicability of scientific findings. Recent research has shown that results from a large fraction of experimental studies cannot be replicated by follow-up studies, a phenomenon known as the replicability crisis. In response, multi-lab studies have grown substantially in popularity, both to confirm that an original scientific claim can withstand rigorous follow-up testing and to test the validity of published, controversial scientific findings. As a recent example of the latter kind, Many Labs 4 (Klein et al., 2019) was a large-scale, pre-registered study designed to replicate an experiment by Greenberg et al. (1994) testing a claim regarding terror management theory, a theory in psychology holding that a person’s awareness of their own mortality alters their behavior in a myriad of ways. In this replication study, the original experiment was repeated across 21 labs, with some labs even involving the original authors of Greenberg et al. (1994) in their experiment. Despite strict protocols being followed, the results of the original study could not be replicated. Chatard et al. (2020) disputed this finding, claiming that by imposing a strong restriction on the studies to be included in the analysis, and by performing a meta-analysis on this subset of studies, the original result is replicated. This rebuttal leads to several key questions. First, are meta-analytic techniques, which estimate an overall effect size by aggregating effect sizes across studies, as effective as or more effective than linear mixed models, which use individual responses rather than effect sizes, when analyzing data from multi-lab studies?
Second, since both mixed-model and meta-analysis approaches rely on estimating the variability in effect sizes across labs, and since estimation of this parameter is typically unreliable, are sensitivity analysis approaches, which consider a range of potential values of this between-lab variance, preferable when analyzing multi-lab data? Finally, is the finding from Many Labs 4 accurate, or does the rebuttal by Chatard et al. (2020) hold significant merit? We aim to answer all of these questions in this dissertation. First, we perform an extensive literature review on random-effects meta-analysis methods and conduct an extensive simulation to evaluate the best practices for performing meta-analysis when using individual participant data (IPD) instead of aggregate data from multi-lab studies. We then compare the best-performing meta-analysis estimators to those obtained from a mixed model. Overall, we consider a total of 5,760 combinations of estimators and multi-lab experimental settings for meta-analysis, and another 80 for mixed models. We find that the meta-analysis and linear mixed model approaches yield similar results, with meta-analysis methods performing slightly better. Additionally, we find that both estimation methods suffer from the same significant pitfall: estimation of across-lab variability in treatment effects is often inconsistent and unreliable when the number of labs included in the study is small. Second, to overcome issues with estimation of across-lab variability, we develop a sensitivity analysis approach for determining the significance of effect size estimates for both meta-analysis and mixed-model estimators. These methods allow researchers to consider a range of different values for the across-lab variance and help them determine for which values the estimate of the effect size is statistically significant.
While effective, these approaches can be misleading when the claimed or estimated across-lab variance is much lower than the actual across-lab variance, which can occur when sample sizes and/or the number of labs under study are small. Finally, we apply our methods to re-analyze the data from Many Labs 4 (Klein et al., 2019). Our analysis corroborates the original finding of Klein et al. (2019) that the original experiment cannot be replicated with any kind of consistency. We also identify issues with the analysis in the rebuttal by Chatard et al. (2020) that would lead them to obtain inaccurate results.
Many Labs 4, Meta-analysis, Mixed model, Sensitivity analysis
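
The sensitivity analysis idea described in the abstract, fixing the across-lab variance at a range of candidate values and checking where the pooled effect remains statistically significant, can be illustrated with a minimal sketch. This is not the dissertation's procedure: it assumes a standard inverse-variance random-effects estimator with a normal (Wald) test, and the function names, lab-level effects, and variances below are hypothetical values chosen only for illustration.

```python
import math


def pooled_effect(effects, variances, tau2):
    """Inverse-variance pooled estimate for an assumed across-lab variance tau2.

    effects   : per-lab effect size estimates
    variances : per-lab sampling variances of those estimates
    tau2      : across-lab variance, treated as known rather than estimated
    """
    weights = [1.0 / (v + tau2) for v in variances]
    total = sum(weights)
    mu = sum(w * y for w, y in zip(weights, effects)) / total
    se = math.sqrt(1.0 / total)
    z = mu / se
    # Two-sided p-value under a normal approximation (an assumption here).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return mu, se, p


def sensitivity_table(effects, variances, tau2_grid, alpha=0.05):
    """Evaluate the pooled effect at each candidate tau2 in the grid."""
    rows = []
    for tau2 in tau2_grid:
        mu, se, p = pooled_effect(effects, variances, tau2)
        rows.append({"tau2": tau2, "mu": mu, "se": se, "p": p,
                     "significant": p < alpha})
    return rows


# Hypothetical lab-level data, not taken from Many Labs 4.
effects = [0.30, 0.10, 0.25, -0.05, 0.20]
variances = [0.02, 0.03, 0.025, 0.04, 0.02]
tau2_grid = [0.0, 0.01, 0.05, 0.1, 0.5]

for row in sensitivity_table(effects, variances, tau2_grid):
    print(f"tau2={row['tau2']:.2f}  mu={row['mu']:.3f}  "
          f"se={row['se']:.3f}  p={row['p']:.4f}  sig={row['significant']}")
```

Because every weight shrinks as the assumed across-lab variance grows, the standard error of the pooled estimate increases along the grid, which is why a conclusion of significance can depend heavily on the variance value a researcher is willing to assume.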