Comment by isoprophlex
Comment by isoprophlex 6 days ago
As one of the comments below the article states, the probabilistic alternative to epsilon-greedy is worth exploring ad well. Take the "bayesian bandit", which is not much more complex but a lot more powerful.
If you crave more bandits: https://jamesrledoux.com/algorithms/bandit-algorithms-epsilo...
Just a warning to those people who are potentially implementing it: it doesn't really matter. The blog author addresses this, obliquely (says that the simplest thing is best most of the time), but doesn't make it explicit.
In my experience, obsessing on the best decision strategy is the biggest honeypot for engineers implementing MAB. Epsilon-greedy is very easy to implement and you probably don't need anything more. Thompson sampling is a pain in the butt, for not much gain.