Comment by omneity 20 hours ago

Two phenomena are at play: correct spellings tend to be the most common forms in aggregate in a large enough dataset, so there's a bias toward them, and the finetuning step (instruct SFT) helps the model home in on which of all the possible formulations it saw in pretraining it should actually use.
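A toy sketch of the first effect (the word list below is made up purely for illustration): tally the spelling variants of a word across a corpus, and the correct form dominates, so next-token statistics learned from that corpus lean toward it.

```python
from collections import Counter

# Hypothetical mini-corpus: in real web-scale data the correct
# spelling similarly outnumbers its misspelled variants.
corpus = (
    "definitely definitely definately definitely definately "
    "definitely definitely definitly definitely definitely"
).split()

counts = Counter(corpus)
total = sum(counts.values())
for variant, n in counts.most_common():
    # The most frequent variant dominates the aggregate statistics.
    print(f"{variant!r}: {n}/{total} = {n / total:.0%}")
```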

This is why LLMs can still channel typos or non-standard writing when you ask them to write in such a style, for example.