Comment by tjbai

From a purely technical definition of bias (the difference between the expected value of the estimator and the true value), MAB is not biased: "changing the experiment parameters" just dynamically allocates a different sample size to each of the estimators, so each estimator still converges to the correct value as long as its group keeps receiving some traffic.

You are correct that this setup can potentially mislead you, but that is because you might end up with high-variance estimators for the groups that have received little traffic. So, you might see some early promising results for experiment group A and greedily assign all the requests to that group, even though those early results do not guarantee that A is actually better than B.
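To make the greedy failure mode concrete, here is a minimal simulation sketch. Everything in it is an assumption made for illustration: the conversion rates (10% for A, 12% for B, so B is truly better), the warm-up size, and the function names `run_greedy` and `fraction_stuck_on_worse` are all hypothetical, and the purely greedy rule is just the simplest policy to demonstrate the point, not what a real MAB system would use.

```python
import random


def run_greedy(p_a, p_b, warmup, total, rng):
    """Simulate `total` requests split between groups A and B.

    Each group first receives `warmup` requests. After that, every request
    goes to whichever group has the higher observed conversion rate so far
    (ties go to A).
    """
    probs = {"A": p_a, "B": p_b}
    wins = {"A": 0, "B": 0}
    pulls = {"A": 0, "B": 0}

    def pull(arm):
        pulls[arm] += 1
        wins[arm] += rng.random() < probs[arm]  # True counts as 1

    # Warm-up: a small, fixed sample for each group.
    for arm in ("A", "B"):
        for _ in range(warmup):
            pull(arm)

    # Greedy phase: no deliberate exploration at all.
    for _ in range(total - 2 * warmup):
        rate_a = wins["A"] / pulls["A"]
        rate_b = wins["B"] / pulls["B"]
        pull("A" if rate_a >= rate_b else "B")

    return pulls


def fraction_stuck_on_worse(trials=500, seed=0):
    """Fraction of simulated experiments in which the truly worse group (A)
    ends up receiving the majority of the traffic."""
    rng = random.Random(seed)
    stuck = 0
    for _ in range(trials):
        pulls = run_greedy(p_a=0.10, p_b=0.12, warmup=20, total=500, rng=rng)
        stuck += pulls["A"] > pulls["B"]
    return stuck / trials


if __name__ == "__main__":
    print(fraction_stuck_on_worse())
```

With a warm-up this small, a noticeable share of runs send most of the traffic to A even though B converts better. The estimate for the neglected group stays frozen at its noisy early value, which is the high-variance problem described above.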

This is the famous exploration-exploitation dilemma: should you maximize conversions by sending everyone to group A, or keep collecting data from group B?