Comment by ertdfgcvb 5 days ago

Isn't that the point of testing: not to maximize reward, but to wait and collect data? It sounds like maximizing reward during the experiment period could bias the results.