Comment by apophis-ren, a day ago

This is mentioned in the article, but for really, really long-horizon tasks it can be reasonable *not* to want a small discount factor.

For example, if you have really sparse rewards in a long-horizon task (say, a reward that appears 1000 timesteps after the action), then even a discount factor of 0.99 won't capture it: 0.99^1000 ≈ 4.3e-5.
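A quick sanity check of that arithmetic, plus the usual rule of thumb that a discount factor gamma has an "effective horizon" of roughly 1/(1 - gamma):

```python
def discounted_weight(gamma: float, k: int) -> float:
    """Fraction of a reward arriving k steps after an action that
    survives discounting: the agent sees gamma**k of it."""
    return gamma ** k

# Reward 1000 steps away under gamma = 0.99: almost entirely discounted away.
print(discounted_weight(0.99, 1000))  # ~4.3e-05

# Rule-of-thumb effective horizon of gamma = 0.99: about 100 steps,
# far short of the 1000-step delay in the example.
print(1 / (1 - 0.99))  # ≈ 100
```

So for this example you'd need gamma much closer to 1 (e.g. 0.999, whose effective horizon is ~1000 steps) before the delayed reward carries meaningful weight.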

Essentially, if your discount factor is too small for the environment's reward delays, certain credit assignments become near impossible to learn.