HEmanZ 4 days ago

What are you working on that they are so knowledgeable? Even the best models absolutely make stuff up, even to this day. I literally spend all day every day working with them (all the latest ChatGPT models) and it’s still 10-15% BS.

I had ChatGPT 5.2 Thinking straight up make up an API after I pasted the full API spec to it earlier today, and it built its whole response around a public API that did not exist. And Claude CLI with Sonnet 4.5 made up the craziest reason why my curl command wasn’t working (that curl itself was bugged, not the obvious fact that it couldn’t resolve the domain name it tried to use) and almost went down a path of installing a bunch of garbage tools.

These are not ready to be unsupervised. Yet.

  • falkensmaize 4 days ago

    Just today I had Claude Opus 4.5 try to write to a fictional Mac user account on my computer during a coding session. It was pretty weird - the name was specific and unusual enough that it was clearly bleed-through from training data. It wasn’t like “John Smith” or something.

    That’s the kind of thing that on a large scale could be catastrophic.

  • simianwords 4 days ago

    for coding, if you have not hooked up your workflow to a test -> code feedback loop, then you are doing it incorrectly. i agree it doesn't get things right all the time but this loop is important to correct it.
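    The test -> code loop can be sketched roughly like this (a toy Python illustration; `run_tests`, `model_propose_fix`, and the `add` example are all hypothetical stand-ins, not any real harness or LLM API):

    ```python
    # Minimal sketch of a test -> code feedback loop.
    # `model_propose_fix` is a hypothetical stand-in for a real LLM call;
    # it is stubbed here so the loop shape is self-contained and runnable.

    def run_tests(code):
        """Run a toy 'test suite' against the candidate code string."""
        ns = {}
        exec(code, ns)
        try:
            assert ns["add"](2, 3) == 5
            return True, ""
        except AssertionError:
            return False, "FAILED: add(2, 3) != 5"

    def model_propose_fix(code, failure):
        # Stand-in for the model: in real use you would prompt the LLM
        # with the current code plus the failing test output.
        return "def add(a, b):\n    return a + b\n"

    code = "def add(a, b):\n    return a - b\n"  # buggy first draft
    for _ in range(3):
        ok, failure = run_tests(code)
        if ok:
            break
        code = model_propose_fix(code, failure)
    ```

    The point is that the model's output is never trusted directly: it only "lands" once the tests pass, so hallucinated code gets caught and fed back as a failure message.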

    for other things like normal question answering in the chatgpt window, it hasn't really said anything incorrect... very very few instances.

  • HEmanZ 4 days ago

    But maybe your point is that it isn’t gibberish, it’s “seems correct but isn’t”, which is honestly more dangerous.

    • simianwords 4 days ago

      you are incorrect. "seems correct but isn't" is fine as long as it is accurate often enough the rest of the time.

      "seems correct but isn't" is like the most common mode of humans getting things wrong.