Comment by tjbai 5 days ago

6 replies

I agree that there's an exploration-exploitation tradeoff, but for what you specifically suggest wouldn't you presumably just normalize by sample size? You wouldn't allocate based off total conversions, but rather a percentage.

jvans 5 days ago

Imagine a scenario where option B does 10x better than option A during the morning hours but 2x worse the rest of the day. If you start the multi-armed bandit in the morning, it could converge to option B quickly, and B would then dominate the rest of the day even though it performs worse then.

Or, in the above scenario, option B performs a lot better than option A but only while the sale is running; otherwise option B performs worse.

  • hinkley 5 days ago

    One of the problems we caught only once or twice: the mix of mobile versus desktop traffic shifts with time of day, and what works on mobile may work worse on desktop.

    We weren’t at the level of hacking our users, just looking at changes that affect response time and resource utilization, and figuring out why a change actually seems to have made things worse instead of better. It’s easy for people to misread graphs, especially if the graphs use Lying with Statistics anti-patterns.

sweezyjeezy 5 days ago

Yes, but here's an exaggerated version: say we were to sample for a week at 50/50 when the base conversion rate was 4%, then sample at 25/75 for a week with the base conversion rate bumped up to 8% due to a sale.

Assuming equal traffic in both weeks, the average observed rate for the first variant is 5.3%; for the second it is 6.4%. Generally the favoured variant's average will shift faster because we are sampling it more.
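To make the arithmetic above concrete, here is a minimal sketch that pools both weeks for each variant. The figure of 10,000 visitors per week is an assumption (the comment gives no traffic numbers), and both variants are assumed to convert at exactly the base rate, i.e. there is no real difference between them.

```python
# Assumption: the same total traffic in each week; the thread gives no counts.
WEEKLY_VISITORS = 10_000

# Each entry: (traffic share of variant 1, traffic share of variant 2, base rate)
weeks = [
    (0.50, 0.50, 0.04),  # week 1: 50/50 split, 4% base conversion rate
    (0.25, 0.75, 0.08),  # week 2: 25/75 split, 8% base rate during the sale
]


def pooled_rate(variant_index):
    """Total conversions divided by total visitors across both weeks."""
    visitors = 0.0
    conversions = 0.0
    for week in weeks:
        n = WEEKLY_VISITORS * week[variant_index]
        visitors += n
        conversions += n * week[2]  # both variants convert at the base rate
    return conversions / visitors


rate_1 = pooled_rate(0)
rate_2 = pooled_rate(1)
print(f"variant 1: {rate_1:.1%}, variant 2: {rate_2:.1%}")
# prints "variant 1: 5.3%, variant 2: 6.4%"
```

The gap appears only because the second variant received more of its traffic during the high-converting sale week.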

  • [removed] 5 days ago
    [deleted]
  • necovek 5 days ago

    Uhm, this still sounds like just bad math.

    While this effect is non-obvious, anyone analyzing the results should be aware of it and should only compare weighted averages, or compare within distinct time periods.

    And therein lies the largest problem with A/B testing: it's mostly done by people who don't understand the mathematical subtleties, so they misinterpret results in either direction.
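The per-period comparison suggested above could be sketched roughly as follows. The counts are hypothetical, derived from the 50/50 then 25/75 example under an assumed 10,000 visitors per week, and the function names are illustrative only. Each period's rate is computed separately and then weighted by that period's total traffic, so the allocation split no longer leaks into the comparison.

```python
# Hypothetical counts for two identical variants, assuming 10,000 visitors
# per week. Layout: period -> variant -> (visitors, conversions)
data = {
    "week 1 (no sale)": {"A": (5000, 200), "B": (5000, 200)},
    "week 2 (sale)": {"A": (2500, 200), "B": (7500, 600)},
}


def per_period_rates(data):
    """Conversion rate of each variant within each period."""
    return {
        period: {v: conv / n for v, (n, conv) in variants.items()}
        for period, variants in data.items()
    }


def period_weighted_rate(data, variant):
    """Average the variant's per-period rates, weighting each period by its
    total traffic across all variants rather than by the variant's own share."""
    rates = per_period_rates(data)
    total_weight = 0.0
    weighted_sum = 0.0
    for period, variants in data.items():
        weight = sum(n for n, _ in variants.values())
        weighted_sum += weight * rates[period][variant]
        total_weight += weight
    return weighted_sum / total_weight


rates = per_period_rates(data)
adj_a = period_weighted_rate(data, "A")
adj_b = period_weighted_rate(data, "B")
print(rates)
print(f"A: {adj_a:.1%}, B: {adj_b:.1%}")
```

Within each week the two variants show the same rate, and the period-weighted averages agree, which is what one would expect when there is no real difference between them.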

    • sweezyjeezy 5 days ago

      Agreed, and articles like this don't help. That's the only point I was trying to make really.