Comment by alyxya
There's a difference between what a model is trained on and the inductive biases it uses to generalize. It isn't as simple as training natively on everything: every existing model generalizes well on some things and poorly on others, depending on its architecture, and the architectures of the world models I've seen don't appear capable of generalizing as universally as LLMs do.