Comment by ertdfgcvb 3 months ago

Isn't that the point of testing (not to maximize reward, but rather to wait and collect data)? It sounds like maximizing reward during the experiment period could bias the results.