lern_too_spel 6 months ago

If you mod by anything other than a power of two, it won't be uniform. https://lemire.me/blog/2019/06/06/nearly-divisionless-random...

  • np_tedious 6 months ago

    That article is mostly about speed. The following seems like the one thing that might be relevant:

    > Naively, you could take the random integer and compute the remainder of the division by the size of the interval. It works because the remainder of the division by D is always smaller than D. Yet it introduces a statistical bias

    That's all it says. Is the point here just that 2^31 % 17 is not zero, so 1, 2, 3 potentially come up slightly more often than 15, 16? If so, this is not terribly important.
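
    A quick back-of-the-envelope sketch (plain Python; the 2^31 range and divisor 17 are just the numbers from the example above):

      # how often each residue shows up when x runs over [0, 2**31) and we take x % 17
      N, D = 2**31, 17
      base, extra = divmod(N, D)
      # residues 0..extra-1 appear base+1 times, residues extra..D-1 appear base times
      print(base, extra)   # 126322567 9
      print(1 / base)      # relative overrepresentation of the low residues, ~8e-9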

    • lern_too_spel 6 months ago

      > If so, this is not terribly important

      It is not uniformly random, which is the whole point.

      > That article is mostly about speed

      The article is about how to actually achieve uniform randomness at high speed. Just doing a mod is faster, but it does not satisfy the uniformity requirement.
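
      For reference, a minimal Python sketch of the rejection idea the article speeds up (the classic unbiased approach, not Lemire's nearly-divisionless version, which avoids most of the division work):

        import random

        def bounded(s):
            # accept only draws below the largest multiple of s that fits in 2**32,
            # so every residue 0..s-1 is reachable from exactly the same number of draws
            limit = (2**32 // s) * s
            while True:
                x = random.getrandbits(32)
                if x < limit:
                    return x % s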

      • np_tedious 6 months ago

        If your number of A/B testing combinations (cohorts) is fewer than 100, then yeah, this passes for being uniform.

        • lern_too_spel 5 months ago

          It doesn't, mathematically. It might be good enough for some cases, but it is not good enough for cases that actually require uniformity.

s1mplicissimus 6 months ago

In addition to the other excellent comments: they will become non-uniform once you start deleting records. That will break any hopes you might have had of modulo and percentages being reliable partitions, because the "holes" in your ID space could be maximally bad for whatever use case you thought up.
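
A contrived sketch of that failure mode in Python (the deletion pattern is invented purely to make the skew obvious):

  from collections import Counter

  # IDs 0..9999, cohort assignment by id % 2
  ids = set(range(10_000))

  # hypothetical cleanup: an imported batch whose IDs happened to all be even gets deleted
  ids -= set(range(0, 10_000, 2))

  print(Counter(i % 2 for i in ids))   # Counter({1: 5000}) -- cohort 0 is now empty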