Is it? By far the majority of the code LLMs are trained on comes from Git repositories. So the idea that Stack Overflow question-and-answer sections with buggy code dominate the training sets seems unlikely. Perhaps I'm misunderstanding?
The questions it asks are usually domain-specific and pertain to the problem, like modeling or "where do I ideally get this data from".