Comment by sarpdag 5 days ago

I really like the multi-armed bandit approach, but it struggles with common scenarios involving delayed rewards or multiple success criteria, such as testing e-commerce search with number of orders and GMV guardrails.

For simple, immediate-feedback cases like button clicks, the specific implementation becomes less critical.
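For the immediate-feedback case, here is a minimal sketch of Beta-Bernoulli Thompson sampling. It is one common bandit algorithm, and nothing in the thread says which one the commenters use. The function names and click rates are hypothetical, and every click is assumed to be observed as soon as the variant is shown.

```python
import random

def thompson_pick(successes, failures, rng):
    """Return the index of the arm with the highest Beta posterior sample.

    Uses a uniform Beta(1, 1) prior per arm (an assumption for this sketch).
    """
    samples = [rng.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)]
    return max(range(len(samples)), key=samples.__getitem__)

def simulate(true_rates, rounds=5000, seed=0):
    """Simulate button-click traffic with hypothetical true click rates."""
    rng = random.Random(seed)
    k = len(true_rates)
    successes = [0] * k
    failures = [0] * k
    pulls = [0] * k
    for _ in range(rounds):
        arm = thompson_pick(successes, failures, rng)
        pulls[arm] += 1
        # Immediate feedback: the click (or lack of one) is known right away,
        # so the posterior for this arm is updated in the same round.
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return pulls

if __name__ == "__main__":
    # Hypothetical click rates of 2%, 5% and 15%.
    print(simulate([0.02, 0.05, 0.15]))
```

Because each outcome is folded into the posterior immediately, traffic shifts toward the best-performing variant quickly, which is part of why the exact implementation matters less in this setting.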

orasis 2 days ago

It’s best for immediate rewards. If you have delayed rewards, there is a paper on sampling from the “delay distribution” that solves this.
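To illustrate why a delay model helps, here is a rough heuristic sketch. It is not the method from the paper mentioned above. It assumes an exponential delay distribution with a hypothetical 24-hour mean, and it counts each not-yet-converted impression as only a fractional failure, weighted by the probability that a conversion would already have shown up by now. All names and numbers are hypothetical.

```python
import math
import random

def exp_delay_cdf(elapsed_hours, mean_delay_hours=24.0):
    """Assumed delay model: P(conversion already observed | it will convert)."""
    return 1.0 - math.exp(-elapsed_hours / mean_delay_hours)

def delayed_thompson_pick(arms, now, rng, mean_delay_hours=24.0):
    """Pick an arm by Thompson sampling with delay-discounted failures.

    arms: list of dicts with 'conversions' (int) and 'pending' (list of start
    times, in hours, of impressions that have not converted yet).
    This weighting is a heuristic assumption, not an exact posterior.
    """
    best_arm, best_sample = 0, -1.0
    for i, arm in enumerate(arms):
        # A fresh impression barely counts as a failure; an old one with no
        # conversion counts almost fully.
        eff_failures = sum(
            exp_delay_cdf(now - t, mean_delay_hours) for t in arm["pending"]
        )
        sample = rng.betavariate(arm["conversions"] + 1, eff_failures + 1)
        if sample > best_sample:
            best_arm, best_sample = i, sample
    return best_arm
```

A naive bandit would treat both arms below identically (10 conversions, 100 non-converted impressions each), while this sketch favors the arm whose impressions are too recent to have converted yet:

```python
rng = random.Random(0)
arms = [
    {"conversions": 10, "pending": [199.0] * 100},  # shown 1 hour before now
    {"conversions": 10, "pending": [0.0] * 100},    # shown 200 hours before now
]
print(delayed_thompson_pick(arms, now=200.0, rng=rng))
```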
