Opinion: Never use the word ‘significant’ in a scientific paper

LETTER TO THE EDITOR

Published: 25 September 2014

Advances in Regenerative Biology 2014. © 2014 Harvey Motulsky. This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 International (CC BY 4.0) License (http://creativecommons.org/licenses/by/4.0/), allowing third parties to copy and redistribute the material in any medium or format and to remix, transform, and build upon the material for any purpose, even commercially, provided the original work is properly cited and states its license.

Citation: Advances in Regenerative Biology 2014, 1: 25155 - http://dx.doi.org/10.3402/arb.v1.25155


The word ‘significant’ is widely used in scientific papers. A PubMed search for ‘significant’ in June 2014 yielded more than two million results! But the word ‘significant’ has several meanings, and so is often misunderstood. There is a simple way to avoid this confusion: Avoid the word ‘significant’ and use alternative wording.

Case 1. ‘Scientifically significant’ or ‘clinically significant’

In some cases, the term ‘significant’ is used to indicate that a difference or association is large enough to matter. Often, the phrases ‘clinically significant’ or ‘scientifically significant’ are used. But it is too easy to mix up this use of the term with the statistical use of ‘significant’.

My recommendation to authors is to use other words to describe their opinion about whether or not a difference or effect is large enough to matter, and ideally to include some scientific or clinical context. Examples:

  1. The 83% decrease in disease incidence in the vaccinated group is large, and likely to have a substantial impact on public health.
  2. Six hours after the injection, the fasting insulin level was as high as typically seen in diabetics.
  3. The change in the numbers of beta-adrenergic receptors is trivial (<10%), so is unlikely to have any physiological impact.

Case 2. Comparing the fits of two models

There are several approaches to compare the fits of alternative models, but the most common method is to use the extra sum-of-squares F-test. If the P-value is larger than a preset threshold value, accept the simpler model. If the P-value is smaller than the threshold value, choose the more complex model. The word ‘significant’ is often used to describe this comparison in a phrase such as ‘the two-phase model fit the data significantly better than the one-phase model’. But that word implies (to some) that the difference between the fits of the two models is substantial, or that the two models make vastly different predictions. Not so. With many data points, a model may fit ‘significantly better’ than the alternative model even when the difference in goodness-of-fit (usually assessed by the sum of squared residuals) between the two models is tiny.

My recommendation is to simply state what you did without using the term ‘significant’. Examples:

  1. We fit both a one-phase and a two-phase model to the data and compared the goodness-of-fit (sum-of-squares) using the extra sum-of-squares F-test. The P-value (0.0342) is less than our preset threshold of 0.05, so we used the two-phase model as a basis for interpreting our data and designing future experiments.
  2. The two-phase model fit the data only slightly better than the one-phase model did (the sum-of-squares was about 10% smaller). The P-value from the extra sum-of-squares F-test was large (0.65), so we used the simpler one-phase model when interpreting the data.
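The model comparison described above can be sketched in Python. This is an illustrative example only: the synthetic decay data, model forms, starting parameters, and 0.05 threshold are all assumptions made for the sketch, not values from the letter.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import f


def one_phase(t, span, k):
    # Single exponential decay
    return span * np.exp(-k * t)


def two_phase(t, span1, k1, span2, k2):
    # Sum of two exponential decays
    return span1 * np.exp(-k1 * t) + span2 * np.exp(-k2 * t)


# Synthetic biphasic decay data with a little noise
rng = np.random.default_rng(0)
t = np.linspace(0, 10, 40)
y = 5 * np.exp(-1.2 * t) + 2 * np.exp(-0.2 * t) + rng.normal(0, 0.1, t.size)

# Fit both models by least squares
p1, _ = curve_fit(one_phase, t, y, p0=[5, 0.5])
p2, _ = curve_fit(two_phase, t, y, p0=[4, 1.0, 2, 0.1])

# Sum of squared residuals and degrees of freedom for each fit
ss1 = np.sum((y - one_phase(t, *p1)) ** 2)
ss2 = np.sum((y - two_phase(t, *p2)) ** 2)
df1 = t.size - 2  # one-phase model has 2 parameters
df2 = t.size - 4  # two-phase model has 4 parameters

# Extra sum-of-squares F-test: does the drop in SS justify the extra parameters?
F = ((ss1 - ss2) / (df1 - df2)) / (ss2 / df2)
p_value = f.sf(F, df1 - df2, df2)

# State the decision without the word 'significant':
if p_value < 0.05:
    print(f"P = {p_value:.4g} < 0.05, so we used the two-phase model.")
else:
    print(f"P = {p_value:.4g} >= 0.05, so we kept the simpler one-phase model.")
```

Note that the conclusion is stated as a decision against a preset threshold, exactly as the worked examples above phrase it, with no verdict of ‘significance’.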

Case 3. Comparing treatment groups

When comparing two treatments or two genotypes, scientists often compute a P-value, and then report whether or not the difference is statistically significant. This leads to many misunderstandings.

The whole point of statistical hypothesis testing and deciding upon ‘significance’ is to help you make a decision based on a single finding. This is helpful, for example, in choosing between models (see above), in quality control, and in some clinical studies. In these cases, you can show the P-value, the preset threshold value, and state the decision you made based on whether the P-value is greater or less than the threshold. The word ‘significant’ is not needed or helpful.

In basic science, it is rare to base a decision on one finding, so the whole concept of statistical hypothesis testing is not helpful, and can be misleading. A treatment effect can be statistically significant yet scientifically trivial (because of large sample size and/or little variability). And, another treatment effect can be scientifically huge but not statistically significant (because of small sample size and/or a lot of variability).
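Both of these mismatches between statistical and scientific importance can be demonstrated with a quick simulation (the group sizes, effect sizes, and measurements below are invented for illustration): a trivially small effect measured in a huge sample, and a large effect measured in a tiny, noisy sample.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# A tiny effect (0.03 units, scientifically trivial) in 100,000 subjects per group:
control_big = rng.normal(0.0, 1.0, 100_000)
treated_big = rng.normal(0.03, 1.0, 100_000)
p_tiny_effect = stats.ttest_ind(control_big, treated_big, equal_var=False).pvalue

# A large effect (difference of about 5 units) in only three noisy values per group:
control_small = np.array([1.0, 5.0, 2.0])
treated_small = np.array([8.0, 3.0, 12.0])
p_big_effect = stats.ttest_ind(control_small, treated_small, equal_var=False).pvalue

# The trivial effect yields a small P-value ('statistically significant'),
# while the much larger effect yields a large one.
print(f"tiny effect, huge n:  P = {p_tiny_effect:.4g}")
print(f"large effect, tiny n: P = {p_big_effect:.4g}")
```

The simulation makes the asymmetry concrete: the P-value reflects sample size and variability as much as it reflects the size of the effect.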

When comparing treatments, I suggest that you avoid the word ‘significant’ and instead report the important result (usually a difference, ratio, or correlation coefficient) along with its confidence interval, which quantifies how precisely you know that value. You may also wish to report a P-value. There is no need to comment on whether the P-value is greater or less than a threshold, and no need to use the misleading term ‘significant’. Instead, comment on the size of the effect and its precision in a scientific context. Examples:

  1. Renal denervation lowered the mean blood pressure by 15.6 mm Hg with a 95% confidence interval ranging from 12.1 to 19.0 mm Hg. Because the drop in blood pressure is substantial with a reasonably narrow confidence interval, we conclude that renal denervation worked as a therapy for hypertension.
  2. Pretreatment with steroids shifted the EC50 of the agonist by a factor of 2.5 (95% CI: 2.1 to 2.9; P=0.0023). A two to threefold shift is large enough to anticipate a substantial physiological effect.
  3. Among six rats, the correlation between thyroid hormone concentration and T cell count was substantial (r=0.8335; P=0.0393). However, because the 95% confidence interval is very wide (0.0677–0.9813), we can only conclude with 95% confidence that there is a positive correlation, but we do not have any evidence about how strong the correlation really is.
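This reporting style, effect size plus confidence interval, is straightforward to compute. Below is a sketch in Python with hypothetical blood-pressure measurements (invented numbers, not those from example 1); the interval uses the standard Welch unequal-variance formula, which is one common choice, not the only one.

```python
import numpy as np
from scipy import stats

# Hypothetical drops in blood pressure (mm Hg) for two groups
control = np.array([2.1, -0.5, 3.3, 1.2, 0.8, 2.6, -1.1, 1.9])
treated = np.array([14.2, 17.9, 12.5, 16.4, 18.8, 13.1, 15.6, 16.0])

diff = treated.mean() - control.mean()

# Welch (unequal-variance) standard error and degrees of freedom
v_c, v_t = control.var(ddof=1), treated.var(ddof=1)
n_c, n_t = control.size, treated.size
se = np.sqrt(v_c / n_c + v_t / n_t)
df = (v_c / n_c + v_t / n_t) ** 2 / (
    (v_c / n_c) ** 2 / (n_c - 1) + (v_t / n_t) ** 2 / (n_t - 1)
)

# Report the effect with its 95% confidence interval, not a significance verdict
t_crit = stats.t.ppf(0.975, df)
lo, hi = diff - t_crit * se, diff + t_crit * se
print(f"Mean additional drop: {diff:.1f} mm Hg (95% CI {lo:.1f} to {hi:.1f})")
```

The printed sentence is the whole report: the reader sees how big the effect is and how precisely it is known, and can judge its importance in context.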

Discussion

The often misunderstood word ‘significant’ is not needed to present scientific conclusions clearly. I suggest you never use that word.

Harvey Motulsky
CEO and Founder, GraphPad Software, Inc.
La Jolla, USA
hmotulsky@graphpad.com
