Comment by cle
This is a double-edged sword. There are often cases in real-world systems where the "reward" the MAB maximizes is biased by eligibility issues, system caching, bugs, etc. If this happens, your MAB has the potential to converge on the worst possible experience for your users, something a static treatment allocation won't do.
I haven’t seen these particular shortcomings before, but I certainly agree that if your data is bad, this ML approach will also be bad.
Can you share some more details about your experiences with those particular types of failures?