Comment by simsla a day ago
This relates to one of my biggest pet peeves.
People interpret "statistically significant" to mean "notable"/"meaningful", as in: I detected a difference, and statistics say it matters. That's the wrong way to think about it.
Significance testing only tells you how unlikely a difference this large would be if there were no real effect. With a certain degree of confidence, you can say "the difference exists as measured".
Whether the measured difference is significant in the sense of "meaningful" is a value judgement that we / stakeholders should impose on top of that, usually based on the magnitude of the measured difference, not the statistical significance.
It sounds obvious, but this is one of the most common fallacies I observe in industry and a lot of science.
For example: "This intervention causes an uplift in [metric] with p<0.001. High statistical significance! The uplift: 0.000001%." Meaningful? Probably not.
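To make that concrete, here's a rough sketch with made-up numbers (a two-proportion z-test, a hypothetical 5% baseline conversion rate, and a much less extreme uplift than my hyperbolic 0.000001%):

    # Tiny absolute uplift + huge sample size => tiny p-value.
    # All numbers are made up for illustration (two-proportion z-test).
    from math import sqrt
    from scipy.stats import norm

    baseline = 0.05          # 5% conversion in control (hypothetical)
    uplift   = 0.0001        # +0.01 percentage points in treatment (hypothetical)
    n        = 200_000_000   # users per arm (hypothetical web-scale experiment)

    p1, p2 = baseline, baseline + uplift
    pooled = (p1 + p2) / 2
    se = sqrt(2 * pooled * (1 - pooled) / n)   # standard error of the difference
    z = uplift / se
    p_value = 2 * norm.sf(z)                   # two-sided p-value

    print(f"z = {z:.2f}, p = {p_value:.2g}")   # ~4e-6, i.e. p << 0.001

A 0.01 percentage-point uplift on a 5% baseline comes out "highly significant" here, but whether it's worth acting on is a separate question.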
You're spot on that significant ≠ meaningful effect. But I'd push back slightly on the example. A very low p-value doesn't always imply a meaningful effect, but it's not independent of effect size either. A p-value comes from a test statistic that's basically:
(effect size) / (noise / sqrt(n))
Note that a bigger test statistic means a smaller p-value.
So very low p-values usually come from bigger effects or from very large sample sizes (n). That's why you can technically get p<0.001 with a microscopic effect, but only if you have astronomical sample sizes. In most empirical studies, though, p<0.001 does suggest a sizeable effect, because there are practical limits on sample size.
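Here's a quick sketch of that trade-off, assuming a known noise level of 1 and a two-sided z-test (both simplifications I'm making up for illustration):

    # p-value from the test statistic z = effect / (noise / sqrt(n)).
    # For a fixed effect, p shrinks as n grows; at realistic n, p < 0.001
    # needs a non-trivial effect. Numbers are illustrative only.
    from math import sqrt
    from scipy.stats import norm

    noise = 1.0  # standard deviation of the outcome (assumed known)

    def p_value(effect, n):
        z = effect / (noise / sqrt(n))
        return 2 * norm.sf(abs(z))

    for effect, n in [(0.5, 100), (0.001, 100), (0.001, 1_000_000), (0.001, 100_000_000)]:
        print(f"effect={effect:<6} n={n:<12,} p={p_value(effect, n):.3g}")

With n=100, an effect of 0.5 standard deviations already gives p<0.001, while an effect of 0.001 needs on the order of a hundred million samples before it gets there.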