Comment by euleriancon 10 hours ago

Diffusion LMs do seem to be able to get more out of the same data. In a world where we are already training transformer-based LLMs on all available text, diffusion LMs' ability to keep learning from a fixed dataset may let them outperform transformers.

https://arxiv.org/abs/2511.03276

nbardy 8 hours ago

There’s another paper that shows you can get the same effect by training an autoregressive model on fill-in-the-middle data.

So it’s more about the masked-modeling objective than diffusion.

albertzeyer 3 hours ago

Which paper is that?
