Comment by LPisGood
Comment by LPisGood 6 days ago
Multi-arm bandit does beat A/B testing in the sense that standard A/B testing does not seek to maximize reward during the testing period, MAB does. MAB also generalizes better to testing many things than A/B testing.
This is a double-edged sword. There are often cases in real-world systems where the "reward" the MAB maximizes is biased by eligibility issues, system caching, bugs, etc. If this happens, your MAB has the potential to converge on the worst possible experience for your users, something a static treatment allocation won't do.