timr 6 days ago

Just a warning to anyone thinking about implementing it: it doesn't really matter. The blog author addresses this obliquely (saying that the simplest thing is best most of the time), but doesn't make it explicit.

In my experience, obsessing over the best decision strategy is the biggest honeypot for engineers implementing MAB. Epsilon-greedy is very easy to implement, and you probably don't need anything more. Thompson sampling is a pain in the butt for not much gain.
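
For concreteness, this is roughly all epsilon-greedy takes (a minimal Python sketch; the function and variable names are mine, not from the post):

  import random

  def epsilon_greedy(counts, rewards, epsilon=0.1):
      # With probability epsilon, explore a uniformly random arm.
      if random.random() < epsilon:
          return random.randrange(len(counts))
      # Otherwise exploit the arm with the best observed mean reward.
      # (Arms with no pulls default to 0; in practice, try each arm once first.)
      means = [r / c if c > 0 else 0.0 for r, c in zip(rewards, counts)]
      return max(range(len(means)), key=means.__getitem__)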

  • blagie 6 days ago

    "Easy to implement" is a good reason to use bubble sort too.

    In a normal universe, you just import a different library, so both are the same amount of work to implement.

    Multi-armed bandits seem theoretically pretty, but they're rarely worth it. The complexity isn't in the numerical algorithm but in the state management.

    * Most A/B tests can be as simple as a client-side random() and a log file (see the sketch after this list).

    * A multi-armed bandit means you need an immediate feedback loop, which involves things like adding database columns, worrying about performance (each render now requires another database read), etc. Keep in mind the database now needs to store A/B test outcomes and use them for decision-making, and computing those outcomes is sometimes nontrivial (if it's anything beyond a click-through).

    * Long-term outcomes matter more than short-term ones. "Did we retain a customer?" is more important than "did we close one sale?"
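
    To illustrate the first point, a sketch of the random()-and-a-log-file version (the names and log path are illustrative, not from any particular tool):

      import json, random, time

      def assign_variant(variants=("A", "B")):
          # Simplest case: a coin flip at render time. For sticky
          # assignment, hash the user id instead.
          return random.choice(variants)

      def log_event(user_id, variant, event, path="ab_test.log"):
          # Append one JSON line per event; analyze offline later.
          with open(path, "a") as f:
              f.write(json.dumps({"ts": time.time(), "user": user_id,
                                  "variant": variant, "event": event}) + "\n")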

    In most systems, the benefits aren't worth the complexity. Multiple A/B tests also add testing complexity. You want to test three layouts? And three user flows? Now you have nine cases which need to be tested. Add two color schemes? 18 cases. Add three font options? 54 cases. The exponential growth in test cases is not fun. Fire-and-forget sounds great, but in practice it's fire-and-maintain exponential complexity.

    And those conversion differences are usually small enough that being on the wrong side of a single A/B test isn't expensive.

    Run the test. Analyze the data. Pick the outcome. Kill the other code path. Perhaps re-analyze the data a year later with different, longer-term metrics. Repeat. That's the right level of complexity most of the time.

    If you do step up to a multi-armed bandit, importing a different library ain't bad.

    • ted_dunning 5 days ago

      Multi-armed bandit approaches do not imply an immediate feedback loop. They also do the best you can do with delayed feedback or with episodic adjustment.

      So if you are doing A/B tests, it is quite reasonable to use Thompson sampling at fixed intervals to adjust the traffic proportions. If your response variable is not time-invariant, this is actually best practice.
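
      As a sketch (my names; it assumes binary outcomes accumulated per batch): draw from each arm's Beta posterior many times, and use the fraction of draws in which each arm wins as its traffic share for the next interval.

        import random

        def thompson_proportions(successes, failures, draws=10000):
            # Monte Carlo estimate of P(arm is best) under Beta(s+1, f+1)
            # posteriors; use the result as the next interval's traffic split.
            wins = [0] * len(successes)
            for _ in range(draws):
                samples = [random.betavariate(s + 1, f + 1)
                           for s, f in zip(successes, failures)]
                wins[samples.index(max(samples))] += 1
            return [w / draws for w in wins]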

      • orasis 2 days ago

        Having significant experience with bandits in production, I strongly recommend using them only for immediate feedback. If the rewards are at all disconnected from the action, you likely won't be happy with the results.

    • bartread 5 days ago

      Sorry, but bubble sort is a terrible example here. You implement a more difficult sorting algorithm, like quicksort, because the benefits versus bubble sort are in many cases huge. I.e., the juice is worth the squeeze.

      Whereas the comment you're responding to is rightly pointing out that for most orgs, the marginal gains of using an approach more complex than epsilon-greedy probably aren't worth it. I.e., the juice isn't worth the squeeze.

      • blagie 5 days ago

        You can use the FFT, if you prefer. There's no reason not to use optimized numerical code, since it's just a different import.

        The difference in performance is smaller and the difference in complexity is much greater. Optimized FFTs are... hairy. But now that someone has written them, they're free.

      • zeroCalories 5 days ago

        Bubble sort is a great example, because it can outperform quicksort when the input is small.

    • timr 5 days ago

      You've either missed the point of what I wrote, or you're arguing with someone else.

      I'm talking about the difference between epsilon-greedy and more complex optimization schemes within the context of implementing MAB. You're making arguments about A/B testing vs. MAB.

  • orasis 2 days ago

    Thompson sampling is trivial to implement, especially with binary rewards. ChatGPT can do it reliably from scratch.
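
    For the skeptical, the whole thing fits in a few lines (a sketch assuming Beta(1,1) priors and rewards in {0, 1}):

      import random

      class ThompsonSampler:
          # One Beta posterior per arm; alpha counts successes, beta failures.
          def __init__(self, n_arms):
              self.alpha = [1] * n_arms
              self.beta = [1] * n_arms

          def select(self):
              # Sample a plausible mean for each arm; play the best sample.
              samples = [random.betavariate(a, b)
                         for a, b in zip(self.alpha, self.beta)]
              return samples.index(max(samples))

          def update(self, arm, reward):
              # reward must be 0 or 1
              self.alpha[arm] += reward
              self.beta[arm] += 1 - reward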