iforgot22 6 days ago

I don't like how this dismisses the old approach as "statistics are hard for most people to understand." This algo beats A/B testing in terms of maximizing how many visitors get the best feature. But is that really a big enough concern IRL that people are interested in optimizing it every time? Every little dynamic lever adds complexity to a system.
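[For concreteness, here is a minimal sketch of the kind of "algo" being compared against fixed group sizes: epsilon-greedy, one common bandit strategy. This is an assumption, since the thread doesn't name the algorithm, and all names here are illustrative. A fixed A/B split assigns every visitor at random no matter what the data says; the bandit shifts traffic toward the current winner as results accumulate.]

    import random

    # Epsilon-greedy sketch (an assumption, not necessarily the article's
    # exact algorithm). A fixed A/B test would instead call
    # random.choice(list(arms)) for every visitor, regardless of results.

    arms = {"A": {"shows": 0, "clicks": 0},
            "B": {"shows": 0, "clicks": 0}}
    EPSILON = 0.1  # fraction of traffic reserved for exploration

    def choose_arm():
        # Explore a random arm EPSILON of the time...
        if random.random() < EPSILON:
            return random.choice(list(arms))
        # ...otherwise exploit the best observed click rate so far
        # (max(..., 1) avoids dividing by zero for brand-new arms).
        return max(arms, key=lambda a: arms[a]["clicks"] / max(arms[a]["shows"], 1))

    def record(arm, clicked):
        # Update counts after each impression.
        arms[arm]["shows"] += 1
        arms[arm]["clicks"] += int(clicked)

[The tradeoff described above is visible here: the current winner gets roughly 1 - EPSILON of the traffic, at the cost of one more moving part in the system.]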

recursivecaveat 5 days ago

Indeed, perhaps we should applaud people for choosing statistical tools that are relatively easy to use and interpret, rather than deride them for not stepping up to a lathe they didn't really need, one we admit has lots of sharp edges.

rerdavies 6 days ago

I think you missed the point. It's not about which visitors get the best feature. It's about how to get people to PUSH THE BUTTON!!!!! Which is kind of the opposite of the best feature. The goal is to make people do something they don't want to do.

Figuring out best features is a completely different problem.

  • iforgot22 6 days ago

    I didn't say it was the best for the user. Really the article misses this by comparing a new UI feature to a life-saving drug, but it doesn't matter. The point is, whatever metric you're targeting, do you use this algo or fixed group sizes?

randomcatuser 6 days ago

Yeah, basically. The idea is that somehow this is the data-optimal way of determining which one is the best (rather than splitting your data 50/50 and wasting a lot of samples on the loser when you already know).

The caveats (perhaps not mentioned in the article) are:

- Perhaps you have many metrics you need to track/analyze (CTR, conversion, rates on different metrics), so you can't strictly do bandit!
- As someone mentioned below, sometimes the situation is dynamic (so having evenly sized groups helps with capturing this effect)
- Maybe some other ones I can't think of?

But you can imagine this kind of auto-testing being useful... imagine an AI continually pushing new variants, and the system just continually learning which one is the best.
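[A self-contained toy of that loop, again assuming an epsilon-greedy bandit; the "true" click rates below are invented purely for the simulation. A new variant arrives mid-run, and if its observed rate wins out, traffic drifts toward it.]

    import random

    EPSILON = 0.1
    arms = {"A": [0, 0], "B": [0, 0]}        # name -> [shows, clicks]
    true_rate = {"A": 0.05, "B": 0.07}       # hidden from the bandit

    def choose_arm():
        # Explore a random arm EPSILON of the time, else exploit the best
        # observed click rate (max(..., 1) guards against divide-by-zero).
        if random.random() < EPSILON:
            return random.choice(list(arms))
        return max(arms, key=lambda n: arms[n][1] / max(arms[n][0], 1))

    for visitor in range(10_000):
        if visitor == 5_000:                 # a new variant is pushed mid-run
            arms["C"], true_rate["C"] = [0, 0], 0.09
        arm = choose_arm()
        arms[arm][0] += 1
        arms[arm][1] += random.random() < true_rate[arm]

    print(arms)  # traffic should have drifted toward "C" by the end

[Note the cold-start behavior: a brand-new arm has no data, so only the EPSILON exploration slice reaches it until its observed rate earns it the exploit pick.]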

  • cle 6 days ago

    It still misses the biggest challenge though--defining "best", and ensuring you're actually measuring it and not something else.

    It's useful as long as your definition is good enough and your measurements and randomizations aren't biased. Are you monitoring this over time to ensure that it continues to hold? If you don't, you risk your MAB converging on something very different from what you would consider "the best".

    When it converges on the right thing, it's better. When it converges on the wrong thing, it's worse. Which will it do? What's the magnitude of the upside vs downside?

    • desert_rue 5 days ago

      Are you saying that it may do something like improve click-the-button conversion but lead to less sales overall?

  • iforgot22 6 days ago

    Facebook or YouTube might already be using an algo like this, or AI, to push variants, but for each billion-user product there are probably thousands of smaller products that don't need something this automated.