Comment by blagie
"Easy to implement" is a good reason to use bubble sort too.
In a normal universe, you just import a different library, so both are the same amount of work to implement.
Multiarmed bandit seems theoretically pretty, but it's rarely worth it. The complexity isn't the numerical algorithm but state management.
* Most AB tests can be as simple as a client-side random() and a log file.
* Multiarmed bandit means you need an immediate feedback loop, which involves things like adding database columns, worrying about performance (since each render requires another database read), etc. Keep in mind the database needs to now store AB test outcomes and use those for decision-making, and computing those is sometimes nontrivial (if it's anything beyond a click-through).
* Long-term outcomes matter more than short-term. "Did we retain a customer" is more important than "did we close one sale."
In most systems, the benefits aren't worth the complexity. Multiple AB tests also add testing complexity. You want to test three layouts? And three user flows? Now, you have nine cases which need to be tested. Add two color schemes? 18 cases. Add 3 font options? 54 cases. The exponential growth in testing is not fun. Fire-and-forget seems great, but in practice, it's fire-and-maintain-exponential complexity.
And those conversion differences are usually small enough that being on the wrong side of a single AB test isn't expensive.
Run the test. Analyze the data. Pick the outcome. Kill the other code path. Perhaps re-analyze the data a year later with different, longer-term metrics. Repeat. That's the right level of complexity most of the time.
If you step up to multiarm, importing a different library ain't bad.
Multi-armed bandit approaches do not imply an immediate feedback loop. They do the best you can do with delayed feedback or with episodic adjustment as well.
So if you are doing A/B tests, it is quite reasonable to use Thompson sampling at fixed intervals to adjust the proportions. If your response variable is not time invariant, this is actually best practice.