Comment by elchananHaas 16 hours ago
This is correct. I will add that sampling from the distribution thereafter is equivalent to on-policy learning.