tsoukase 4 days ago

If an LLM hallucinates on 1% of occasions and gives subpar output on another 5%, that kills its usefulness as a replacement for anyone. Imagine a support agent on the other end of the phone speaking gibberish 10 times a day (at a couple hundred exchanges per shift, a ~5% failure rate is roughly ten bad answers). Now imagine a doctor. Those people will never lose their jobs.

Bratmon 4 days ago

> Imagine a support agent on the other end of the phone speaking gibberish 10 times a day.

A massive improvement?

simianwords 4 days ago

But LLMs don't speak gibberish 10 times a day even now. In my usage, ChatGPT has not said one obviously strange thing since o3 came out.

  • HEmanZ 4 days ago

    What are you working on that they are so knowledgeable? Even the best models absolutely make stuff up, even to this day. I literally spend all day every day working with them (all the latest ChatGPT models) and it's still 10-15% BS.

    Earlier today I had ChatGPT 5.2 Thinking straight up make up an API after I pasted the full API spec to it, and build its whole response around a public API that did not exist. And the Claude CLI with Sonnet 4.5 made up the craziest reason why my curl command wasn't working (that curl itself was bugged, not the obvious one: it couldn't resolve the domain name it tried to use) and almost went down the path of installing a bunch of garbage tools.

    These are not ready to be unsupervised. Yet.

    • falkensmaize 4 days ago

      Just today I had Claude Opus 4.5 try to write to a fictional Mac user account on my computer during a coding session. It was pretty weird: the name was specific and unique enough that it was clearly bleed-through from training data. It wasn't like "John Smith" or something.

      That’s the kind of thing that on a large scale could be catastrophic.

    • simianwords 4 days ago

      For coding, if you haven't hooked your workflow up to a test -> code feedback loop, you're doing it wrong. I agree it doesn't get things right all the time, but this loop is what corrects it.
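
      A minimal sketch of that loop, assuming a pytest project (`ask_llm` is a hypothetical stand-in for whatever model call your tooling actually makes):

      ```python
      import subprocess

      def ask_llm(prompt: str) -> str:
          """Hypothetical stand-in: call whatever model or CLI you actually use."""
          raise NotImplementedError

      def fix_until_green(source_path: str, max_rounds: int = 5) -> bool:
          """Run the tests; on failure, feed the failures and the current
          code back to the model, write out its fix, and try again."""
          for _ in range(max_rounds):
              result = subprocess.run(
                  ["pytest", "-x", "--tb=short"],
                  capture_output=True, text=True,
              )
              if result.returncode == 0:
                  return True  # tests pass: the loop caught and fixed the errors
              with open(source_path) as f:
                  code = f.read()
              patched = ask_llm(
                  "These tests failed:\n" + result.stdout
                  + "\nCurrent code:\n" + code
                  + "\nReturn the corrected file, and nothing else."
              )
              with open(source_path, "w") as f:
                  f.write(patched)
          return False  # still red after max_rounds: escalate to a human
      ```

      The point is that the model never gets the last word; the test run does.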

      For other things, like ordinary question answering in the ChatGPT window, it hasn't really said anything incorrect. Very, very few instances.

    • HEmanZ 4 days ago

      But maybe your point is that it isn't gibberish, it's "seems correct but isn't," which is honestly more dangerous.

      • simianwords 4 days ago

        You are incorrect. "Seems correct but isn't" is fine as long as, the rest of the time, it is accurate at a high enough rate.

        "Seems correct but isn't" is also the most common way humans get things wrong.