Comment by zdc1
And, for better or worse, it feels like the errors are being "pushed down" into smaller, more subtle spaces.
I asked ChatGPT a question about a made-up character in a made-up work and it came back with "I don’t actually have a reliable answer for that". Perfect.
On the other hand, I can ask it about varnishing a piece of wood and it will give a lovely table with options, tradeoffs, and Good/OK/Bad ratings for each option, except that the ratings can be a little off the mark. Same thing when asking what cable thickness is required to carry 15 A in AU electrical work: depending on the journey and line of questioning, you would get either 2.5 mm² or 4 mm².
Not wrong enough to kill someone, but wrong enough that you're forced to use it as a research tool rather than a trusted expert/guru.
I asked ChatGPT, Gemini, Grok and DeepSeek to tell me about a contemporary Scottish indie band that hasn’t had a lot of press coverage. ChatGPT, Gemini and Grok all gave good answers based on the small amount of press coverage they have had.
DeepSeek, however, hallucinated a completely fictional band from 30 years ago, right down to album names, a hard-luck story about how they’d been shafted by the industry (and by whom), made-up names of the members, and even their supposed subsequent collaborations with contemporary pop artists.
I asked if it was telling the truth or making it up, and it doubled down quite aggressively, insisting it was telling the truth. The whole thing was very detailed and convincing, yet complete and utter bollocks.
I understand the differences in cost, parameter count, etc., but it was miles behind the other three. In fact, it wasn’t just behind; it was hurtling in the opposite direction while being incredibly plausible.