Comment by HEmanZ
What are you working on that they are so knowledgeable? Even the best models absolutely make stuff up, even to this day. I literally spend all day every day working with them (all the latest ChatGPT models) and it’s still 10-15% BS.
Earlier today I had ChatGPT 5.2 Thinking straight up make up an API after I pasted the full API spec to it, and it built its whole response around a public API that did not exist. And Claude CLI with Sonnet 4.5 made up the craziest reason why my curl command wasn’t working (that curl itself was bugged, rather than the obvious cause: it couldn’t resolve the domain name it tried to use) and almost went down a path of installing a bunch of garbage tools.
These are not ready to be unsupervised. Yet.
Just today I had Claude Opus 4.5 try to write to a fictional Mac user account on my computer during a coding session. It was pretty weird - the name was specific and unique enough that it was clearly bleed-through from training data. It wasn’t like “John Smith” or something.
That’s the kind of thing that on a large scale could be catastrophic.