
Improving Statistical Misconceptions

Background:

When it comes to psychological research, use of p-values (or null hypothesis significance testing) continues to dominate the published literature. In light of high rates of misunderstanding, some have deemed these misconceptions "impervious to correction" [1], while others have advocated for alternative methods of inference, such as confidence intervals or Bayes factors [2,3]. Little work, however, has explored whether these misconceptions can be systematically improved, or what benefits instructional interventions might offer.

Research Question:

Are statistical misconceptions truly 'impervious' to change? Or is it possible to improve how well individuals draw inferences from statistical indices (e.g., p-values)? How effective are instructional training materials at improving statistical literacy in the short term (i.e., 1 week later) and the long run (i.e., 8 weeks later)?

Current Study:

First, we measured baseline misconception rates among online learners (N = 2,320) for p-value, confidence interval, and Bayes factor interpretations, and found support for improvements in accuracy over an 8-week massive open online course (MOOC). Next, we investigated the effects of additional instructional training on learning for users in the experimental group (n = 2,028) versus those in the control group (n = 2,133).

Publication Preprint:

OSF Project Page:

Research Approach

Design:

  • Scale development and validation:
     

    • Through extensive pilot tests, we first developed a 14-item True/False scale to measure misconception rates for p-values, confidence intervals, and Bayes factors (ω = .78), implemented in phase 1 of the study.
       

    • In phase 2, this scale was refined to 9 True/False items, measuring specific p-value fallacies (ω = .76):

      • Inverse probability fallacy ("IP")

      • Replication fallacy ("R")

      • Effect size fallacy ("ES")

      • Clinical or practical significance fallacy ("CPS")

      • Correct p-value definition ("correct")
         

    • In both phase 1 and phase 2, two versions of the scales (alternate phrasings) were implemented, alternating between pretest and post-tests, and counterbalanced across participants. Participants were given "True", "False", and "I don't know" response options.

  • Scale implementation:
     

    • Longitudinal (phase 1)

    • A/B Testing (phase 2)

      • Individuals were randomly assigned to experimental or control conditions.

      • Experimental group received added instructional training (extra assignment) in week 1 of the MOOC.
         

    • In both phase 1 and phase 2, items were administered once at pretest (to establish baseline misconception rates), again at post-test 1 (to assess immediate learning), and a final time at post-test 2 (to assess retained learning). Phase 2 design is displayed below (see Fig 1).

Coursera_study2_design.PNG

Fig 1. Phase 2 Scale Implementation across the 8-week MOOC. 
Scale items were administered to online users in the form of pop quizzes at three measurement periods (pretest, post-test 1, and post-test 2). Post-test 1 items were staggered across weeks 1 to 4 so that each occurred immediately after the relevant learning module. Subset 1 items measured inverse probability (IP) and replication (R) fallacies. Subset 2 items measured effect size (ES) and clinical or practical significance (CPS) fallacies, as well as the correct p-value definition. 'Lag' represents the time elapsed between any two measurement points.

Analyses:

  • Binary logistic regression analyses

  • Odds ratios

  • Linear mixed models (LMM)

Results

Fig 2. Baseline Accuracy Rates. 
Barplots display phase 1 accuracy rates at pretest for each of the p-value fallacies and statistical concepts (i.e., confidence intervals and Bayes factors). Accuracy is computed in two ways to account for "I don't know" responses: coded as incorrect (striped bars) vs. omitted (solid bars). Baseline rates corroborated past work [4]. Users also systematically scored better on statements about non-significant p-values than on statements about significant p-values.

Coursera_study1_baselineresults.PNG
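The two accuracy codings described in the Fig 2 caption can be sketched as follows. This is a minimal illustration with hypothetical responses; the item and answer key are invented for the example:

```python
# Hypothetical responses to one True/False item whose keyed answer is "False".
responses = ["False", "True", "I don't know", "False", "False",
             "I don't know", "True", "False"]
correct_answer = "False"

n_total = len(responses)
n_correct = sum(r == correct_answer for r in responses)
n_idk = responses.count("I don't know")

# Coding 1: "I don't know" counted as incorrect (striped bars in Fig 2).
acc_idk_incorrect = n_correct / n_total

# Coding 2: "I don't know" responses omitted from the denominator (solid bars).
acc_idk_omitted = n_correct / (n_total - n_idk)
```

Because omitting "I don't know" responses shrinks the denominator, the second coding can only yield an accuracy greater than or equal to the first, which is why the solid bars sit at or above the striped bars.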

Fig 3. Rates of Improvement. 
Graphs display mean improvement rates: accuracy (i.e., the proportion of learners who answered correctly; y-axis) is plotted across the 8 weeks of the course (x-axis). Solid lines represent immediate learning. Dashed lines represent retained learning.

Visual comparison between the TOP plot (control group) and the BOTTOM plot (experimental group) demonstrates the benefits of added instructional training on improvements in learning. Steeper rates of improvement for those who received added training in week 1 are especially evident from pretest to post-test 1 (significant interaction between time and condition). These findings support the efficacy of instructional interventions and highlight the value of explicitly clarifying statistical misconceptions.

Coursera_study2_exptresults.PNG

Key Insights

  • It is possible to effectively improve statistical misconceptions.

  • Online instructional platforms can serve as effective tools for eliciting positive short-term and long-term effects (i.e., immediate and retained learning).

  • When teaching statistical concepts, there is value in explicitly clarifying specific fallacies.

  • Emphasis should be placed on both how and how not to draw inferences from statistical data.
     

  • Practical recommendation: Learners should undergo regular training that directly addresses sources of misunderstanding. 


References

  1. Haller, H., & Krauss, S. (2002). Misinterpretations of significance: A problem students share with their teachers. Methods of Psychological Research, 7(1), 1-20.

  2. Amrhein, V., Greenland, S., & McShane, B. (2019). Scientists rise up against statistical significance. Nature, 567(7748), 305-307.

  3. Wagenmakers, E.-J., Wetzels, R., Borsboom, D., & van der Maas, H. L. J. (2011). Why psychologists must change the way they analyze their data: The case of psi: Comment on Bem (2011). Journal of Personality and Social Psychology, 100(3), 426-432.

  4. Badenes-Ribera, L., Frías-Navarro, D., Monterde-i-Bort, H., & Pascual-Soler, M. (2015). Interpretation of the p value: A national survey study in academic psychologists from Spain. Psicothema, 27(3), 290-295. https://doi.org/10.7334/psicothema2014.283

© 2022 by Arianne Herrera-Bennett, PhD
