Arbitrary P-value for binary statements?

Binary null-hypothesis significance testing with an arbitrary P-value cutoff is still the standard way to report scientific findings, largely for lack of established alternatives. In an Opinion article in Trends in Ecology & Evolution, Muff et al. (2022) suggest a different language of evidence that allows a more nuanced approach. This language has previously been presented in medical statistics and is consistent with practices in international research networks such as the Intergovernmental Panel on Climate Change.

We all know the definition of the P-value: the probability of observing a data summary (e.g., an average) that is at least as extreme as the one actually observed, given that the null hypothesis (H0) is true.
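To make the definition concrete, here is a minimal Python sketch for the simplest case, a z-test on a sample mean. It assumes normally distributed data with a known standard deviation, a choice made only for illustration; other tests use other null distributions, but the definition is the same. `z_test_p_value` uses the closed form, while `simulated_p_value` applies the definition literally by generating many data sets under H0 and counting how often their average is at least as extreme as the observed one. Both function names are hypothetical.

```python
import math
import random


def z_test_p_value(sample_mean, mu0, sigma, n):
    """Two-sided P-value for H0: true mean == mu0 (normal data, known sigma)."""
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    # For a standard normal Z, P(|Z| >= |z|) = erfc(|z| / sqrt(2))
    return math.erfc(abs(z) / math.sqrt(2))


def simulated_p_value(sample_mean, mu0, sigma, n, n_sim=10_000, seed=1):
    """Same P-value by brute force: generate data sets under H0 and count how
    often their average is at least as far from mu0 as the observed average."""
    rng = random.Random(seed)
    observed_distance = abs(sample_mean - mu0)
    hits = 0
    for _ in range(n_sim):
        sim_mean = sum(rng.gauss(mu0, sigma) for _ in range(n)) / n
        if abs(sim_mean - mu0) >= observed_distance:
            hits += 1
    return hits / n_sim


# Hypothetical example: 16 observations with mean 0.5, H0: mu = 0, sigma = 1
print(z_test_p_value(0.5, 0.0, 1.0, 16))
print(simulated_p_value(0.5, 0.0, 1.0, 16))
```

For 16 observations with mean 0.5, σ = 1 and H0: μ = 0, the z statistic is 2 and both functions give a P-value of about 0.046 (the simulated one only approximately).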

We rely on P-values and Null Hypothesis Significance Testing (NHST) to assess the statistical significance of effects, and in doing so we often mistake statistical significance for scientific importance.
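The gap between statistical significance and scientific importance is easy to demonstrate, because a P-value depends on sample size as well as on effect size. The sketch below compares two hypothetical two-group studies using a normal approximation (with only 10 units per group a t-test would be more appropriate and would give a somewhat larger P). The scenarios and helper names are illustrative assumptions.

```python
import math


def two_sided_p(estimate, standard_error):
    """Two-sided P-value for H0: true effect == 0 (normal approximation)."""
    z = estimate / standard_error
    return math.erfc(abs(z) / math.sqrt(2))


def diff_in_means_se(sd, n_per_group):
    """Standard error of a difference between two group means
    (equal SD and equal group size assumed)."""
    return sd * math.sqrt(2 / n_per_group)


# Hypothetical scenarios; effects are in units of the outcome's SD
p_tiny_effect_huge_study = two_sided_p(0.01, diff_in_means_se(1.0, 500_000))
p_large_effect_small_study = two_sided_p(0.5, diff_in_means_se(1.0, 10))

print(f"effect 0.01 SD, n = 500,000 per group: P = {p_tiny_effect_huge_study:.1e}")
print(f"effect 0.50 SD, n = 10 per group:      P = {p_large_effect_small_study:.2f}")
```

The negligible effect of 0.01 standard deviations in the huge study gives P well below 0.001, while the fifty-times larger effect in the small study gives P of about 0.26.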

The problems arise from the way NHST uses the P-value to make binary decisions (significant versus nonsignificant, effect versus no effect) based on a sharp, arbitrary cutoff, typically P = 0.05 (although some have recently argued for lower thresholds).
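How arbitrary a sharp cutoff is becomes obvious when two nearly identical results fall on either side of it. In this sketch (normal approximation, hypothetical helper names), z statistics of 1.95 and 1.97 carry practically the same evidence, yet a binary rule at P = 0.05 labels them in opposite ways.

```python
import math

ALPHA = 0.05  # conventional and arbitrary significance cutoff


def two_sided_p_from_z(z):
    """Two-sided P-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def binary_verdict(p, alpha=ALPHA):
    """The NHST-style yes/no label discussed in the text."""
    return "significant" if p < alpha else "not significant"


# Two hypothetical, practically identical results
for z in (1.95, 1.97):
    p = two_sided_p_from_z(z)
    print(f"z = {z:.2f}, P = {p:.4f} -> {binary_verdict(p)}")
```

The two P-values are about 0.051 and 0.049, a difference that says nothing meaningful about the underlying effect.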

The proposal

  • eliminate binary decision-making from most scientific papers
  • regard P-values as continuous measures of statistical evidence
  • replace the notion of significance with the more appropriate notion of evidence. Most prominently, the idea of (accumulated) evidence is the central concept behind meta-analyses.
  • in a meta-analysis, existing effect sizes and their standard errors are integrated into a new estimate with its own uncertainty
  • have authors attempt to interpret the biological meaning and implications of their quantitative findings
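A minimal sketch of what reporting in the language of evidence could look like: the P-value is kept as a continuous number, translated into graded wording, and reported together with the effect size and its confidence interval. The bands and labels below are an assumption, modelled on the graded wording from medical statistics that Muff et al. refer to; check them against the paper's own figure before relying on them. The boundaries are meant as soft guidance, not as new cutoffs. `evidence_label` and `report` are hypothetical helpers, and the P-value uses a normal approximation.

```python
import math

# Assumed gradation (soft boundaries, not new cutoffs): upper P bound -> wording
EVIDENCE_BANDS = [
    (0.001, "very strong evidence"),
    (0.01, "strong evidence"),
    (0.05, "moderate evidence"),
    (0.1, "weak evidence"),
    (1.0, "little or no evidence"),
]


def evidence_label(p):
    """Translate a P-value into graded wording instead of a yes/no verdict."""
    for upper, label in EVIDENCE_BANDS:
        if p < upper:
            return label
    return EVIDENCE_BANDS[-1][1]  # only reached for p == 1.0


def report(estimate, se, effect_name):
    """Describe an effect with its estimate, 95% CI, P-value and evidence wording."""
    z = estimate / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided, normal approximation
    lo, hi = estimate - 1.96 * se, estimate + 1.96 * se
    return (f"{effect_name} was {estimate:.2f} (95% CI {lo:.2f} to {hi:.2f}); "
            f"{evidence_label(p)} against H0 of no effect (P = {p:.3f})")


print(report(0.30, 0.12, "The effect"))
```

For example, `report(0.30, 0.12, "The effect")` describes a P-value of about 0.012 as moderate evidence, next to the estimate and an interval of roughly 0.06 to 0.54, leaving the biological interpretation of that range to the authors.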

Next questions

  • Is NHST still useful? NHST and similar methods of asymmetric binary decision-making might be justified when decisions must be based on only one or a few studies. Even then, the underlying studies must meet very high quality standards.
  • How do we make decisions based on evidence? First, we need to conduct meta-analyses. The responsibility for making decisions may then be returned to the ‘practical decision makers’.
  • How will we measure scientific success in the future, once publishing ‘significant’ results is no longer the goal and science is instead seen as a joint effort to accumulate knowledge?
  • How do we break our habits?
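The meta-analytic step that both the proposal and the decision-making question rely on, combining existing effect sizes and their standard errors into a new estimate with its own uncertainty, can be sketched with the standard inverse-variance (fixed-effect) method. This assumes all studies estimate one common true effect; a random-effects model, which adds between-study variance, is often more realistic in ecology. The study values and helper names are made up for illustration.

```python
import math


def two_sided_p(estimate, se):
    """Two-sided P-value for H0: true effect == 0 (normal approximation)."""
    return math.erfc(abs(estimate / se) / math.sqrt(2))


def fixed_effect_pool(estimates, standard_errors):
    """Inverse-variance weighted (fixed-effect) meta-analysis.
    Returns the pooled estimate and its standard error."""
    if not estimates or len(estimates) != len(standard_errors):
        raise ValueError("need one standard error per estimate")
    weights = [1.0 / se ** 2 for se in standard_errors]  # precise studies count more
    total = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, estimates)) / total
    pooled_se = math.sqrt(1.0 / total)
    return pooled, pooled_se


# Hypothetical studies: three small studies, none "significant" on its own
estimates = [0.20, 0.25, 0.15]
ses = [0.12, 0.14, 0.10]

for b, s in zip(estimates, ses):
    print(f"study: estimate {b:.2f}, SE {s:.2f}, P = {two_sided_p(b, s):.3f}")

pooled, pooled_se = fixed_effect_pool(estimates, ses)
pooled_p = two_sided_p(pooled, pooled_se)
print(f"pooled: estimate {pooled:.3f}, SE {pooled_se:.3f}, P = {pooled_p:.3f}")
```

Each hypothetical study alone gives P between about 0.07 and 0.13, but the pooled estimate of about 0.19 (SE ≈ 0.067) gives P ≈ 0.005. Evidence accumulates across studies instead of being discarded study by study.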

Source: Rewriting results sections in the language of evidence, Muff et al. 2022
