Comment by isaacimagine a day ago
No mention of Decision Transformers or Trajectory Transformers? Both are offline approaches that tend to do very well at long-horizon tasks, as they bypass the credit assignment problem by virtue of having an attention mechanism.
Most RL researchers consider these approaches not to be "real RL", since they can't assign credit outside the context window and therefore can't learn infinite-horizon tasks. With 1M+ token context windows, perhaps this is less of an issue in practice? Curious to hear thoughts.
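For readers unfamiliar with the setup: a Decision Transformer reframes offline RL as supervised sequence modeling over (return-to-go, state, action) tokens. A minimal sketch of that data layout, with illustrative names not taken from any particular library:

```python
# Hedged sketch of the Decision Transformer input layout: interleave
# (return-to-go, state, action) triples and train a transformer with an
# ordinary supervised loss to predict each action token. Function and
# variable names here are illustrative assumptions, not a real API.

def returns_to_go(rewards, gamma=1.0):
    """Suffix sums of rewards: R_t = r_t + gamma * R_{t+1}."""
    rtg = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        rtg[t] = running
    return rtg

def to_token_sequence(states, actions, rewards):
    """Build the flat token sequence the transformer attends over.
    Credit assignment happens via attention inside this (finite)
    context window, not via TD backups."""
    seq = []
    for R, s, a in zip(returns_to_go(rewards), states, actions):
        seq.extend([("rtg", R), ("state", s), ("action", a)])
    return seq

# Toy 3-step trajectory with a sparse terminal reward.
states = [0, 1, 2]
actions = ["left", "right", "right"]
rewards = [0.0, 0.0, 1.0]

seq = to_token_sequence(states, actions, rewards)
print(seq[0])  # -> ('rtg', 1.0): the first token already carries the
               # trajectory's total return, so the terminal reward is
               # visible to every prediction inside the context window.
```

The point of the example: the sparse terminal reward is folded into the return-to-go token at every timestep, which is why credit assignment within the window reduces to sequence modeling, while anything outside the window remains out of reach.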
TFP cites Decision Transformers. Just using a transformer does not bypass the credit assignment problem. Transformers are an architecture for solving sequence modeling problems, including the credit assignment problem as it arises in RL. There have been many other such architectures.
The hardness of the credit assignment problem is a statement about data sparsity. Architecture choices do not "bypass" it.