kqr a day ago

This has been known ever since the beginning of frequentist hypothesis testing. Fisher warned us not to place too much emphasis on the p-value he asked us to calculate, specifically because it is mainly a measure of sample size, not clinical significance.

  • ants_everywhere a day ago

    Yes, the whole thing has been a bit of a tragedy IMO. A minor tragedy, all things considered, but a tragedy nonetheless.

    One interesting thing to keep in mind is that Ronald Fisher did most of his work before the publication of Kolmogorov's probability axioms (1933). There's a real sense in which the statistics used in social sciences diverged from mathematics before the rise of modern statistics.

    So there's a lot of tradition going back to the 19th century that's misguided, wrong, or maybe just not best practice.

energy123 a day ago

It's not; that would be quite the misunderstanding of statistical power.

A large N means that small real effects can plausibly be detected as statistically significant.

It doesn't mean that a larger proportion of measurements is falsely identified as statistically significant. That will still happen at a rate of 5%, or whatever your alpha is, unless your null is misspecified.
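
To make this concrete, here's a minimal simulation (a sketch: the sample sizes, the 0.05-sd effect, and the one-sample t-test are all just illustrative choices). Under a true null the rejection rate hugs alpha at every N; what grows with N is the power to detect the small real effect.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    alpha, trials = 0.05, 2000

    for n in (20, 200, 2000, 20000):
        false_pos = 0  # null true: mean is exactly 0
        detected = 0   # null false: small real effect of 0.05 sd
        for _ in range(trials):
            if stats.ttest_1samp(rng.normal(0.0, 1.0, n), 0.0).pvalue < alpha:
                false_pos += 1
            if stats.ttest_1samp(rng.normal(0.05, 1.0, n), 0.0).pvalue < alpha:
                detected += 1
        print(f"n={n:>6}  false positive rate={false_pos / trials:.3f}"
              f"  power={detected / trials:.3f}")

The false-positive column should sit near 0.05 for every n; only the power column moves.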

  • ants_everywhere a day ago

    It's standard to set the null hypothesis to be a measure zero set (e.g. mu = 0 or mu1 = mu2). So under any continuous prior the null hypothesis has probability 0, and the only remaining question is whether your measurement is good enough to detect that it's false.

    But even though you know a priori that the true value won't be exactly 0.000... out to infinitely many decimal places, you don't know a priori whether your measurement is any good, or whether you're measuring the right thing.
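
    As a toy illustration of that point (all numbers invented): give the data a true mean of 0.001, so the point null mu = 0 is false, but only just. Whether a t-test notices is then purely a question of sample size.

        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(1)
        for n in (1_000, 100_000, 10_000_000):
            sample = rng.normal(0.001, 1.0, n)  # true mean 0.001, not 0
            p = stats.ttest_1samp(sample, 0.0).pvalue
            print(f"n={n:>10}  p={p:.4f}")

    By n = 10^7 the trivial 0.001-sd effect will typically come out 'significant'; nothing about its importance changed, only the resolution of the measurement.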

    • energy123 14 hours ago

      A probability of zero doesn't mean impossible: measure-zero events can still occur. That's a very big difference. And hypothesis tests aren't estimating the probability of the null being true; they're controlling the probability of rejecting the null when the null is true.

      • ants_everywhere 3 hours ago

        It's less of a big difference than it might seem, because it takes infinitely long to specify a real number to infinite precision. Think of trying to tell whether you hit the exact center of a bullseye: you eventually get down to the quantum mechanical scale, where the idea of an atom being at the exact center isn't even well defined.

        In a finite or countable number of trials you will, almost surely, never see a measure zero event.

        > they're controlling the probability of rejecting the null when the null is true.

        Right, but the null hypothesis is usually false, so that's a strange quantity to condition on. It's a proxy for the thing you actually want: the probability of your hypothesis being true given the data. These are some of the reasons many statisticians consider the tradition of null hypothesis significance testing a mistake.
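
        As a toy contrast between the two quantities (a sketch; the 50/50 prior and the simple alternative mu = 1 are invented for illustration): observe z = 2, compute the p-value, then compute the posterior probability of the null.

            from scipy import stats

            z = 2.0                         # observed z-statistic
            p_value = 2 * stats.norm.sf(z)  # two-sided p, ~0.046

            # Posterior for H0: mu = 0 against a simple alternative
            # H1: mu = 1, with 50/50 prior odds.
            like_h0 = stats.norm.pdf(z, loc=0.0)
            like_h1 = stats.norm.pdf(z, loc=1.0)
            post_h0 = like_h0 / (like_h0 + like_h1)

            print(f"p-value    = {p_value:.3f}")  # ~0.046: 'significant'
            print(f"P(H0|data) = {post_h0:.3f}")  # ~0.182: far from ruled out

        The same data that rejects at the 0.05 level still leaves the null with nearly a 1-in-5 posterior probability under this (admittedly cartoonish) prior.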