Comment by amluto
The best part is when a “thinking” model carefully thinks and then says something that is obviously illogical, even though the model clearly has both the knowledge and the context to know it’s wrong. And then you ask it to double-check, give it a tiny hint about how it’s wrong, and it profusely apologizes, compliments you on your wisdom, and then says something else dumb.
I fully believe that LLMs encode enormous amounts of knowledge (some of which is even correct, and much of which their operator does not personally possess), are capable of ingesting large amounts of data and working quickly, and have essentially no judgment or particularly strong intelligence of the non-memorized sort. This can still be very valuable!
Maybe this will change over the next few years, and maybe it won’t. I’m not at all convinced that scraping the bottom of the barrel for more billions and trillions of low-quality training tokens will help much.
I feel like one coding benchmark should just be repeatedly telling the model to double-check or fix something that’s actually perfectly fine, and watching how badly it deep-fries your code base.
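A minimal sketch of what such a benchmark loop could look like, assuming the project’s own test suite as the ground-truth oracle. `ask_model_to_fix` is a hypothetical stand-in for whatever model client you’re testing; none of these names refer to an existing harness:

```python
import subprocess
from pathlib import Path


def tests_pass(repo: Path) -> bool:
    """Correctness oracle: does the project's own test suite still pass?"""
    return subprocess.run(["pytest", "-q"], cwd=repo).returncode == 0


def ask_model_to_fix(source: str) -> str:
    """Hypothetical LLM call. The prompt insists something is wrong even
    though nothing is; swap in any real model client here."""
    raise NotImplementedError  # illustrative stub


def deep_fry_score(repo: Path, target: Path, max_rounds: int = 10) -> int:
    """Start from a codebase whose tests pass. Each round, feed the model
    one file with 'double-check and fix this', accept its rewrite verbatim,
    and re-run the tests. The score is how many rounds of unnecessary
    'fixes' the codebase survives before they break it."""
    assert tests_pass(repo), "baseline must be green"
    for round_no in range(1, max_rounds + 1):
        target.write_text(ask_model_to_fix(target.read_text()))
        if not tests_pass(repo):
            return round_no - 1  # survived this many rounds
    return max_rounds
```

The key design choice is that the input is already correct, so any accepted change is pure downside: a higher score means the model is better at recognizing there’s nothing to fix instead of inventing a problem to apologize for.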