Comment by MichaelDickens 5 days ago
# for each lever,
# calculate the expectation of reward.
# This is the number of trials of the lever divided by the total reward
# given by that lever.
# choose the lever with the greatest expectation of reward.
If I'm not mistaken, this pseudocode has a bug that will result in choosing the expected worst option rather than the expected best option. I believe it should read "total reward given by the lever divided by the number of trials of that lever".
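To make the difference concrete, here is a minimal Python sketch of the corrected greedy step. The names `expected_reward` and `choose_lever`, the tallies, and the choice to score an untried lever as 0.0 are all my own assumptions for illustration; none of them come from the original post.

```python
def expected_reward(total_reward, trials):
    """Average reward per pull: total reward divided by number of trials.

    Assumption: a lever that has never been pulled is scored 0.0.
    """
    return total_reward / trials if trials > 0 else 0.0


def choose_lever(total_rewards, trial_counts):
    """Return the index of the lever with the greatest average reward so far."""
    averages = [expected_reward(r, n) for r, n in zip(total_rewards, trial_counts)]
    return max(range(len(averages)), key=averages.__getitem__)


# Hypothetical tallies: lever 0 paid 2.0 over 10 pulls, lever 1 paid 9.0 over 10 pulls.
total_rewards = [2.0, 9.0]
trial_counts = [10, 10]
best = choose_lever(total_rewards, trial_counts)
```

With these tallies the averages are 0.2 and 0.9, so lever 1 is chosen. The inverted ratio from the quoted comment (trials divided by reward) would give 5.0 and roughly 1.11, and would pick lever 0, the worse one.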
Correct, that's why I don't trust reading code comments.