Comment by svachalek
This is an interesting paper, it postulates that the ability of an LLM to perform tasks correlates mostly to the number of layers it has, and that reasoning creates virtual layers in the context space. https://arxiv.org/abs/2412.02975