Comment by euleriancon 10 hours ago
Diffusion LMs do seem to be able to get more out of the same data. In a world where we are already training transformer-based LLMs on all available text, diffusion LMs' ability to keep learning from a fixed set of data may let them outperform transformers.
There’s another paper that shows you can get the same effect by training an autoregressive model on fill-in-the-middle data.
So it’s more about the masked modeling objective than diffusion.
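
For anyone who hasn't seen fill-in-the-middle (FIM) data, here is a rough sketch of the usual transformation: cut a document into prefix, middle, and suffix, then reorder it so an ordinary left-to-right model has to predict the middle given both sides. The sentinel strings `<PRE>`, `<SUF>`, `<MID>` and the character-level random split are my own assumptions for illustration, not taken from the paper mentioned above.

```python
import random

# Hypothetical sentinel strings; real tokenizers define their own special tokens.
PRE, SUF, MID = "<PRE>", "<SUF>", "<MID>"

def to_fim(text, rng):
    """Cut text at two random points and reorder it as prefix, suffix, middle."""
    i, j = sorted(rng.randint(0, len(text)) for _ in range(2))
    prefix, middle, suffix = text[:i], text[i:j], text[j:]
    # The model still trains left to right, but the middle now comes last,
    # so it is predicted with both the prefix and the suffix in context.
    return f"{PRE}{prefix}{SUF}{suffix}{MID}{middle}"

def from_fim(sample):
    """Invert to_fim (assumes the sentinels never occur in the text itself)."""
    rest = sample[len(PRE):]
    prefix, rest = rest.split(SUF, 1)
    suffix, middle = rest.split(MID, 1)
    return prefix + middle + suffix
```

Because the cut points are resampled each time, the same document yields many different training examples, which is roughly the sense in which this resembles a masked objective.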