Comment by highd
TFP cites decision transformers. Just using a transformer does not bypass the credit assignment problem. Transformers are an architecture for solving sequence modeling problems, such as the credit assignment problem as it arises in RL. There have been many other such architectures.
The hardness of the credit assignment problem is a statement about data sparsity. Architecture choices do not "bypass" it.
TFP: https://arxiv.org/abs/2506.04168
The DT citation [10] appears on a single line, in a paragraph listing prior work, as an "and more". Another paper that uses DTs [53] is cited in a similar way. The authors do not test or discuss DTs.
> hardness of the credit assignment ... data sparsity.
That is true, but not the point I'm making. "Bypassing credit assignment", in the context of long-horizon task modeling, is a statement about using attention to allocate long-horizon reward without a horizon-reducing discount, not about architecture choice.
To expand: if I have an environment with a key that unlocks a door thousands of steps later, Q-learning may not propagate the reward signal from opening the door back to the moment of picking up the key, because the discounting of future rewards shrinks that signal to almost nothing over such a horizon. A decision transformer, however, can attend to the moment of picking up the key while opening the door, which bypasses the problem of establishing this long-horizon causal connection (rough numbers in the sketch below).
(Of course, attention cannot assign reward if the moment the key was picked up is beyond the extent of the context window.)
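To put rough numbers on the discount point, here is a minimal sketch of my own (not from TFP or [10]); the discount factor, horizon length, and reward value are made-up illustrative choices:

```python
# Illustration only: how a discounted return attenuates a reward that arrives
# thousands of steps after the enabling action. All values below are
# assumptions chosen for the example, not taken from TFP or the DT paper.
gamma = 0.99      # a typical discount factor
horizon = 3000    # steps between picking up the key and opening the door
door_reward = 1.0

# Credit assigned to the key-pickup state by a pure discounted return:
credited = (gamma ** horizon) * door_reward
print(f"discounted credit after {horizon} steps: {credited:.1e}")  # ~8.1e-14

# A return-conditioned sequence model such as a decision transformer instead
# conditions on the undiscounted return-to-go and can attend from the
# door-opening token back to the key-pickup token, provided both tokens
# fall within the context window.
```

At this horizon the credited value is effectively zero, which is the failure mode the attention mechanism is meant to sidestep.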