Comment by impossiblefork a day ago

For programming it's okay; for maths it's almost okay. For things like stories and actually dealing with reality, the models aren't even close to okay.

I didn't understand how bad it was until this weekend, when I sat down and tried GPT-5, first without the thinking mode and then with it. It misunderstood sentences, generated crazy things, and lost track of everything. It was completely beyond how bad I thought it could possibly be.

I've fiddled with stories before because I saw that LLMs had trouble with them, but I did not understand that this was where we were in NLP. At first I couldn't fully believe it, because these models don't fail to follow instructions when the topic is programming.

This extends to analyzing discussions: the model simply misunderstands what people say. If you try this kind of thing, you realise the degree to which these things are just sequence models, with no ability to think, very short effective attention spans, and no ability to operate within a context. I experimented with stories set in established settings, and the model repeatedly generated things that were impossible in those settings.

When you do this kind of thing, their character as sequence models, ones that do not really integrate information across different sequences, becomes apparent.