Comment by ertdfgcvb
Isn't that the point of testing (not to maximize reward, but rather to wait and collect data)? It sounds like maximizing reward during the experiment period can bias the results.
The great thing is that you can do both.