Comment by chrsw
One vague definition I see tossed around a lot is "something that can replace almost any human knowledge/white-collar worker".
What does that mean in concrete terms? I'm not sure. Many of these models can already pass bar exams, but how many can be lawyers? Probably none. What's missing?
Tests designed for humans are not good at testing LLMs because the failure modes are different. Humans don't have eidetic memories, so they can't just ingest random facts and recall them at will. Memorizing the relevant facts and figures in a subject and recalling them on a test therefore shows at least some study and likely some overall understanding of the subject. For an LLM, not so much.
A driving test that shows a red octagon with white letters and asks what it means is a good indicator of whether a human driver will recognize it out in the real world, regardless of whether it is half covered by a tree, has graffiti on it, is hanging upside down, etc. We found that self-driving cars can't "generalize" like that and failed to recognize a stop sign when it was upside down (and had to be retrained). What is missing is a well-functioning "judgment" or "reasoning" component that can generalize knowledge, experience, and context. Some models can do something like that for a specific task, but nothing that works long term.
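To make that concrete, here's a minimal sketch of the kind of perturbation test I mean, in Python. `classify` is a hypothetical stand-in for whatever sign classifier is under evaluation, not any real API:

    # A human passes the driving test once and generalizes for free; a model
    # has to be checked explicitly under the perturbations it will meet in
    # the real world (occlusion, rotation, etc.).
    from PIL import Image, ImageDraw

    def classify(img: Image.Image) -> str:
        ...  # placeholder: call the model under test here

    def perturbations(img: Image.Image):
        yield "original", img
        yield "upside_down", img.rotate(180)
        occluded = img.copy()  # simulate the sign half covered by a tree
        ImageDraw.Draw(occluded).rectangle(
            (0, 0, occluded.width, occluded.height // 2), fill="black")
        yield "half_covered", occluded

    def passes_generalization_test(img: Image.Image, expected: str = "stop") -> bool:
        # The bar: the label must survive every perturbed view, not just the
        # canonical red octagon the test card shows.
        return all(classify(view) == expected for _, view in perturbations(img))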