Comment by LPisGood
> you can't have your cake and eat it too
I disagree. There is a vast array of literature on solving the MAB problem that may as well be grouped into a bin called “how to optimally strike a balance between having one’s cake and eating it too.”
The optimization techniques for solving the MAB problem seek to maximize reward by striking the right balance between exploration and exploitation. In other words, these techniques attempt to determine the optimal trade-off between exploring whether another option is better and exploiting the option currently predicted to be best.
There is a strong reason this literature doesn’t start and end with: “just do A/B testing, there is no better approach.”
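To make the exploration/exploitation balance concrete, here is a minimal sketch of epsilon-greedy, one of the simplest bandit strategies. It is not the procedure from the post or anything claimed to be optimal; the function name, the `true_rates` input, and the default `epsilon` are all made up for illustration.

```python
import random


def epsilon_greedy(true_rates, n_rounds=10_000, epsilon=0.1, seed=0):
    """Simulate epsilon-greedy on Bernoulli arms (hypothetical helper).

    true_rates: assumed per-arm success probabilities, unknown to the strategy.
    Returns (pulls per arm, total reward collected).
    """
    rng = random.Random(seed)
    n_arms = len(true_rates)
    pulls = [0] * n_arms
    wins = [0] * n_arms
    total_reward = 0

    def estimate(arm):
        # Unpulled arms get an infinite estimate so each is tried at least once.
        return wins[arm] / pulls[arm] if pulls[arm] else float("inf")

    for _ in range(n_rounds):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the best observed success rate so far.
            arm = max(range(n_arms), key=estimate)
        reward = 1 if rng.random() < true_rates[arm] else 0
        pulls[arm] += 1
        wins[arm] += reward
        total_reward += reward

    return pulls, total_reward
```

Calling `epsilon_greedy([0.05, 0.5])` sends most traffic to the second arm while still spending roughly `epsilon` of the rounds on random exploration. A fixed 50/50 A/B split would instead keep half the traffic on the worse arm for the whole run.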
I'm not talking about the literature -- I'm talking about the extremely simplistic and sub-optimal procedure described in the post.
If you want to get sophisticated, MAB properly done is essentially just A/B testing with optimal strategies for deciding when to end individual A/B tests, or balancing tests optimally for a limited number of trials. But again, it doesn't "beat" A/B testing -- it is A/B testing in that sense.
And that's what I mean. You can't magically increase your reward while simultaneously getting statistically significant results. Either your results are significant to a desired level or not, and there's no getting around the number of samples you need to achieve that.
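As a rough illustration of the sample-size point, here is a sketch using the standard normal-approximation formula for comparing two conversion rates. The function name and the example rates are hypothetical, and this is the textbook fixed-sample calculation, not something taken from the post.

```python
import math
from statistics import NormalDist


def samples_per_arm(p_a, p_b, alpha=0.05, power=0.8):
    """Approximate per-arm sample size for a two-sided two-proportion z-test.

    p_a, p_b: assumed true conversion rates of the two variants (hypothetical inputs).
    alpha: significance level; power: desired probability of detecting the difference.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    variance = p_a * (1 - p_a) + p_b * (1 - p_b)
    n = (z_alpha + z_power) ** 2 * variance / (p_a - p_b) ** 2
    return math.ceil(n)
```

For example, `samples_per_arm(0.10, 0.12)` comes out in the high thousands per arm, and the requirement grows quickly as the gap between the two rates shrinks or as the significance level is tightened.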