HEmanZ 4 days ago

What are you working on that they are so knowledgeable? Even the best models absolutely make stuff up, even to this day. I literally spend all day every day working with them (all the latest ChatGPT models) and it’s still 10-15% BS.

I had ChatGPT 5.2 Thinking straight up make up an API after I pasted the full API spec to it earlier today, and it built its whole response around a public API that did not exist. And Claude CLI with Sonnet 4.5 made up the craziest reason why my curl command wasn’t working (that curl itself was bugged, not the obvious fact that it couldn’t resolve the domain name it tried to use) and almost went down a path of installing a bunch of garbage tools.

These are not ready to be unsupervised. Yet.

  • falkensmaize 4 days ago

    Just today I had Claude Opus 4.5 try to write to a fictional Mac user account on my computer during a coding session. It was pretty weird - the name was specific and unusual enough that it was clearly bleed-through from training data. It wasn’t like “John Smith” or something.

    That’s the kind of thing that on a large scale could be catastrophic.

  • simianwords 4 days ago

    for coding, if you have not hooked up your workflow to a test -> code feedback loop, then you are doing it incorrectly. i agree it doesn't get things right all the time but this loop is important to correct it.
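    The test -> code loop can be sketched roughly like this (a toy Python illustration; `run_tests`, `model_propose_fix`, and the `add` example are all hypothetical stand-ins, not any real harness or LLM API):

    ```python
    # Minimal sketch of a test -> code feedback loop.
    # `model_propose_fix` is a hypothetical stand-in for a real LLM call;
    # it is stubbed here so the loop shape is self-contained and runnable.

    def run_tests(code):
        """Run a toy 'test suite' against the candidate code string."""
        ns = {}
        exec(code, ns)
        try:
            assert ns["add"](2, 3) == 5
            return True, ""
        except AssertionError:
            return False, "FAILED: add(2, 3) != 5"

    def model_propose_fix(code, failure):
        # Stand-in for the model: in real use you would prompt the LLM
        # with the current code plus the failing test output.
        return "def add(a, b):\n    return a + b\n"

    code = "def add(a, b):\n    return a - b\n"  # buggy first draft
    for _ in range(3):
        ok, failure = run_tests(code)
        if ok:
            break
        code = model_propose_fix(code, failure)
    ```

    The point is that the model's output is never trusted directly: it only "lands" once the tests pass, so hallucinated code gets caught and fed back as a failure message.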

    for other things like normal question answering in the chatgpt window, it hasn't really said anything incorrect... very very few instances.

  • HEmanZ 4 days ago

    But maybe your point is that it isn’t gibberish, it’s “seems correct but isn’t”, which is honestly more dangerous.

    • simianwords 4 days ago

      you are incorrect. "seems correct but isn't" is fine as long as it is accurate often enough the rest of the time.

      "seems correct but isn't" is like the most common mode of humans getting things wrong.