Comment by apophis-ren a day ago
It's mentioned in the article, but for really long-horizon tasks it can be reasonable not to use a small discount factor.
For example, if you have very sparse rewards in a long-horizon task (say, a reward arrives 1000 timesteps after the action), then even a discount factor of 0.99 won't capture it: 0.99^1000 ≈ 4.3e-5.
Essentially, if your discount factor is too small for the environment, certain credit assignments become nearly impossible to learn.
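A quick sketch to make the numbers concrete (plain Python; the 1/(1 - gamma) effective-horizon rule of thumb is standard RL folklore, not something from the article):

    # Weight that a reward arriving k steps after an action
    # contributes to that action's discounted return.
    def discounted_weight(gamma: float, k: int) -> float:
        return gamma ** k

    print(discounted_weight(0.99, 1000))   # ~4.3e-05: the signal is effectively gone
    print(discounted_weight(0.999, 1000))  # ~0.37: still a usable learning signal

    # Rough effective horizon: 1 / (1 - gamma)
    print(1 / (1 - 0.99))    # ~100 steps
    print(1 / (1 - 0.999))   # ~1000 steps

So to credit an action for a reward ~1000 steps away, you'd want gamma on the order of 0.999, at the cost of higher-variance value estimates.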