timr 6 days ago

Just a warning to anyone thinking about implementing it: it doesn't really matter. The blog author addresses this obliquely (saying that the simplest thing is best most of the time), but doesn't make it explicit.

In my experience, obsessing over the best decision strategy is the biggest honeypot for engineers implementing MAB. Epsilon-greedy is very easy to implement, and you probably don't need anything more. Thompson sampling is a pain in the butt for not much gain.
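
For concreteness, this is roughly all epsilon-greedy takes (a minimal Python sketch; the function and variable names are mine, not from the post):

  import random

  def epsilon_greedy(counts, rewards, epsilon=0.1):
      # With probability epsilon, explore a uniformly random arm.
      if random.random() < epsilon:
          return random.randrange(len(counts))
      # Otherwise exploit the arm with the best observed mean reward.
      # (Arms with no pulls default to 0; in practice, try each arm once first.)
      means = [r / c if c > 0 else 0.0 for r, c in zip(rewards, counts)]
      return max(range(len(means)), key=means.__getitem__)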

  • blagie 6 days ago

    "Easy to implement" is a good reason to use bubble sort too.

    In a normal universe, you just import a different library, so both are the same amount of work to implement.

    Multi-armed bandits seem theoretically pretty, but they're rarely worth it. The complexity isn't in the numerical algorithm but in the state management.

    * Most A/B tests can be as simple as a client-side random() and a log file (see the sketch after this list).

    * A multi-armed bandit means you need an immediate feedback loop, which involves things like adding database columns, worrying about performance (each render now requires another database read), etc. Keep in mind the database now needs to store A/B test outcomes and use them for decision-making, and computing those outcomes is sometimes nontrivial (if it's anything beyond a click-through).

    * Long-term outcomes matter more than short-term ones. "Did we retain a customer?" is more important than "did we close one sale?"
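
    To illustrate the first point, a sketch of the random()-and-a-log-file version (the names and log path are illustrative, not from any particular tool):

      import json, random, time

      def assign_variant(variants=("A", "B")):
          # Simplest case: a coin flip at render time. For sticky
          # assignment, hash the user id instead.
          return random.choice(variants)

      def log_event(user_id, variant, event, path="ab_test.log"):
          # Append one JSON line per event; analyze offline later.
          with open(path, "a") as f:
              f.write(json.dumps({"ts": time.time(), "user": user_id,
                                  "variant": variant, "event": event}) + "\n")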

    In most systems, the benefits aren't worth the complexity. Multiple A/B tests also add testing complexity. You want to test three layouts? And three user flows? Now you have nine cases which need to be tested. Add two color schemes? 18 cases. Add three font options? 54 cases. The exponential growth in test cases is not fun. Fire-and-forget sounds great, but in practice it's fire-and-maintain exponential complexity.

    And those conversion differences are usually small enough that being on the wrong side of a single A/B test isn't expensive.

    Run the test. Analyze the data. Pick the outcome. Kill the other code path. Perhaps re-analyze the data a year later with different, longer-term metrics. Repeat. That's the right level of complexity most of the time.

    If you do step up to a multi-armed bandit, importing a different library ain't bad.

    • ted_dunning 5 days ago

      Multi-armed bandit approaches do not imply an immediate feedback loop. They also do the best you can do with delayed feedback or with episodic adjustment.

      So if you are doing A/B tests, it is quite reasonable to use Thompson sampling at fixed intervals to adjust the traffic proportions. If your response variable is not time-invariant, this is actually best practice.
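
      As a sketch (my names; it assumes binary outcomes accumulated per batch): draw from each arm's Beta posterior many times, and use the fraction of draws in which each arm wins as its traffic share for the next interval.

        import random

        def thompson_proportions(successes, failures, draws=10000):
            # Monte Carlo estimate of P(arm is best) under Beta(s+1, f+1)
            # posteriors; use the result as the next interval's traffic split.
            wins = [0] * len(successes)
            for _ in range(draws):
                samples = [random.betavariate(s + 1, f + 1)
                           for s, f in zip(successes, failures)]
                wins[samples.index(max(samples))] += 1
            return [w / draws for w in wins]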

      • orasis 2 days ago

        Having significant experience with bandits in production, I strongly recommend using them only for immediate feedback. If the rewards are at all disconnected from the action, you likely won't be happy with the results.

    • bartread 5 days ago

      Sorry, but bubble sort is a terrible example here. You implement a more difficult sorting algorithm, like quicksort, because the benefits versus bubble sort are in many cases huge. I.e., the juice is worth the squeeze.

      Whereas the comment you're responding to is rightly pointing out that for most orgs, the marginal gains of using an approach more complex than epsilon-greedy probably aren't worth it. I.e., the juice isn't worth the squeeze.

      • blagie 5 days ago

        You can use the FFT, if you prefer. There's no reason not to use optimized numerical code, since it's just a different import.

        The difference in performance is smaller and the difference in complexity is much greater. Optimized FFTs are... hairy. But now that someone has written them, they're free.

      • zeroCalories 5 days ago

        Bubble sort is a great example, because it can outperform quicksort when the input is small.

    • timr 5 days ago

      You've either missed the point of what I wrote, or you're arguing with someone else.

      I'm talking about the difference between epsilon-greedy and more complex optimization schemes within the context of implementing MAB. You're making arguments about A/B testing vs. MAB.

  • orasis 2 days ago

    Thompson sampling is trivial to implement, especially with binary rewards. ChatGPT can do it reliably from scratch.
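
    For the skeptical, the whole thing fits in a few lines (a sketch assuming Beta(1,1) priors and rewards in {0, 1}):

      import random

      class ThompsonSampler:
          # One Beta posterior per arm; alpha counts successes, beta failures.
          def __init__(self, n_arms):
              self.alpha = [1] * n_arms
              self.beta = [1] * n_arms

          def select(self):
              # Sample a plausible mean for each arm; play the best sample.
              samples = [random.betavariate(a, b)
                         for a, b in zip(self.alpha, self.beta)]
              return samples.index(max(samples))

          def update(self, arm, reward):
              # reward must be 0 or 1
              self.alpha[arm] += reward
              self.beta[arm] += 1 - reward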