Comment by bdcravens
Text being a challenge is a symptom of the bigger problem: most people have a hard time thinking spatially, and so struggle to communicate their ideas (and that's before you add on modeling vocabulary like "extrude", "chamfer", etc)
LLMs struggle because I think there's a lot of work to be done with translating colloquial speech. For example, someone might describe a creating a tube is fairly ambiguous language, even though they can see it in their head: "Draw a circle and go up 100mm, 5mm thick" as opposed to "Place a circle on the XY plane, offset the circle by 5mm, and extrude 100mm in the z-plane"
I don't get the text obsession beyond LLMs being immensely useful that you might as well use LLM for <insert tasks here>. I believe that some things live in text, some in variable size n-dimensional array, or in fixed set of parameters, and so on - I mean, our brains don't run on text alone.