Comment by bee_rider
It is the weekend, so let’s anthropomorphize.
The idea of sub-optimal play to increase learning is interesting. There is a human social phenomenon of being annoyed at players who play games “too mechanically” or boringly, and of admiring players (even in an overly studied game like chess) with an “intuitive” style.
I wonder how AI training strategies would change if the number of games they were allowed to play were limited to the few thousand matches of a game that a person might play over the course of their life. And perhaps if their “rank” were evaluated over the course of their training, like it is for humans.
It's simpler than that: if you always play what you believe is optimal, you will never explore strategies that may in fact perform better.
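A toy illustration of that point, as a rough sketch rather than anything from an actual game-playing system: a two-option "bandit" where option 1 secretly pays more than option 0. The payoffs, the `run` function, and the 10% exploration rate are all made-up assumptions for the example. The purely greedy player tries option 0 first, sees that it beats its initial estimate of zero, and never looks at option 1 again; the player that occasionally picks at random (epsilon-greedy) eventually stumbles onto the better option.

```python
import random

def run(epsilon, payoffs=(0.5, 1.0), rounds=1000, seed=0):
    """Play a toy bandit; payoffs are hypothetical and deterministic."""
    rng = random.Random(seed)
    estimates = [0.0] * len(payoffs)  # current belief about each option
    counts = [0] * len(payoffs)       # how often each option was played
    total = 0.0
    for _ in range(rounds):
        if rng.random() < epsilon:
            # Explore: deliberately play something that may look sub-optimal.
            arm = rng.randrange(len(payoffs))
        else:
            # Exploit: play what is currently believed to be best
            # (ties go to the lowest-numbered option).
            arm = max(range(len(payoffs)), key=lambda a: estimates[a])
        reward = payoffs[arm]
        counts[arm] += 1
        # Incremental running average of the rewards seen for this option.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total += reward
    return total, counts

greedy_total, greedy_counts = run(epsilon=0.0)
explore_total, explore_counts = run(epsilon=0.1)
print("greedy: ", greedy_total, greedy_counts)
print("explore:", explore_total, explore_counts)
```

With `epsilon=0.0` the player ends up with counts of `[1000, 0]`: it never once tries the better option. The exploring player gives up a little reward on its random moves but comes out ahead overall.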