Comment by og

Terr_ 4 days ago

> If Alice had concluded that this occasional mistake NN calculator was 'not really performing algebra', then Bob would be well within his rights to ask Alice what on earth she was going on about.

No, your burden of proof here is totally bass-ackwards.

Bob's the one who asked for blind trust that his magical auto-learning black-box would be made to adhere to certain rules... but the rules and trust are broken. Bob's the one who has to start explaining the discrepancy, and whether the failure is (A) a fixable bug or (B) an unfixable limitation that can be reliably managed or (C) an unfixable problem with no good mitigation.

> It's not irrelevant, because this is an argument about whether the machine can be said to be reasoning or not.

Bringing up "b-b-but homo sapiens" is only "relevant" if you're equivocating the meaning of "reasoning", using it in a broad, philosophical, and kinda-unprovable sense.

In contrast, the "reasoning" we actually wish LLMs would do involves capabilities like algebra, syllogisms, deduction, and the CS-classic boolean satisfiability.

However the track-record of LLMs on such things is long and clear: They fake it, albeit impressively.

The LLM will finish the popular 2+2=_, and we're amazed, but when we twiddle the operands too far, it gives nonsense. It answers "All men are mortal. Socrates is a man. Therefore, Socrates is ______", but reword the situation enough and it breaks again.


og_kalu 4 days ago

>Bob's the one who asked for blind trust that his magical auto-learning black-box would be made to adhere to certain rules... but the rules and trust are broken.
This is the problem with analogies. Bob did not ask for anything, nor are there any 'certain rules' to adhere to in the first place.
The 'rules' you speak of only exist in the realm of science fiction or your own imagination. Nothing that is even remotely considered a general intelligence (whether you think that's just humans or include some of our animal friends) is an infallible logic automaton. It literally does not exist. Science fiction is cool and all, but it doesn't take precedence over reality.
>Bringing up "b-b-but homo sapiens" is only "relevant" if you're equivocating the meaning of "reasoning", using it in a broad, philosophical, and kinda-unprovable sense.
You mean the only sense that actually exists? Yes. It's also not 'unprovable' in the sense I'm asking about. Nobody has any issue answering this question for humans, rocks, bacteria, or a calculator. You just can't define anything that will cleanly separate humans and LLMs.
>In contrast, the "reasoning" we actually wish LLMs would do involves capabilities like algebra, syllogisms, deduction, and the CS-classic boolean satisfiability.
Yeah, and they're capable of doing all of those things. The best LLMs today are better than most humans at them, so again, what is Alice rambling about?
>The LLM will finish the popular 2+2=_, and we're amazed, but when we twiddle the operands too far, it gives nonsense.
Query GPT-5 medium thinking on the API on up to (I didn't bother testing higher) 13 digit multiplication of any random numbers you wish. Then watch it get it exactly right.
Weeks ago, I got Gemini 2.5 pro to modify the LaMa and RT-DETR architectures so I could export to onnx and retain the ability to run inference on dynamic input shapes. This was not a trivial exercise.
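Anyone who wants to check the multiplication claim can do so independently, since Python integers are arbitrary precision and give an exact reference product. The following is a minimal sketch, not a description of how the test above was actually run: `random_operand`, `make_prompt`, `parse_integer`, and `is_exact` are hypothetical helper names, the prompt wording is an assumption, and the step of sending the prompt to a model (with tools disabled) is deliberately left out.

```python
import random
import re
from typing import Optional


def random_operand(digits: int, rng: random.Random) -> int:
    """Return a random integer with exactly `digits` decimal digits."""
    return rng.randrange(10 ** (digits - 1), 10 ** digits)


def make_prompt(a: int, b: int) -> str:
    """Build the question text (wording is an assumption, adjust freely)."""
    return f"What is {a} * {b}? Reply with the number only."


def parse_integer(reply: str) -> Optional[int]:
    """Pull the last integer out of a reply, ignoring , and _ separators."""
    cleaned = re.sub(r"(?<=\d)[,_](?=\d)", "", reply)
    matches = re.findall(r"\d+", cleaned)
    return int(matches[-1]) if matches else None


def is_exact(a: int, b: int, reply: str) -> bool:
    """True only if the reply contains the exact product of a and b."""
    return parse_integer(reply) == a * b


if __name__ == "__main__":
    rng = random.Random()  # unseeded, so operands differ on every run
    a, b = random_operand(13, rng), random_operand(13, rng)
    print(make_prompt(a, b))
    # Paste the model's reply here by hand; no API call is made in this sketch.
    reply = input("model reply> ")
    print("exact" if is_exact(a, b, reply) else "wrong")
```

Fresh random operands matter here: a fixed, well-known pair could plausibly appear in training data, whereas two random 13-digit numbers almost certainly do not.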
>It answers "All men are mortal. Socrates is a man. Therefore, Socrates is ______", but reword the situation enough and it breaks again.
Do you actually have an example of a rewording that SOTA models fail at?

- Terr_ 4 days ago
  
  > Query GPT-5 medium thinking on the API on up to (I didn't bother testing higher) 13 digit multiplication of any random numbers you wish. Then watch it get it exactly right.
  I'm not sure if "on the API" here means "the LLM and nothing else." This is important because it's easy to overestimate the algorithm when you give it credit for work it didn't actually do.
  In general, human developers have taken steps to make the LLM transcribe the text you entered into a classically-made program, such as a calculator app, python, or Wolfram Alpha. Without that, the LLM would have to use its (admittedly strong) powers of probabilistic fakery [0].
  Why does it matter? Suppose I claimed I had taught a chicken to do square roots. Suspicious, you peer behind the curtain, and find that the chicken was trained to see symbols on a big screen and peck the matching keys on a pocket calculator. Wouldn't you call me a fraud for that?
  _____________
  Returning to the core argument:
  1. "Reasoning" that includes algebra, syllogisms, deduction, etc. involves certain processes for reaching an answer. Getting a "good" answer through another route (like an informed guess) is not equivalent.
  2. If an algorithm cannot do the algebra process, it is highly unlikely that it can do the others.
  3. If an algorithm has been caught faking the algebra process through other means, any "good" results for other forms of logic should be considered inherently suspect.
  4. LLMs are one of the algorithms in points 2 and 3.
  _____________
  [0] https://www.mindprison.cc/p/why-llms-dont-ask-for-calculator...
  
  
  og_kalu 3 days ago
  
  >I'm not sure if "on the API" here means "the LLM and nothing else." This is important because it's easy to overestimate the algorithm when you give it credit for work it didn't actually do.
  That's what I mean, yes. There is no tool use for what I mentioned.
  >1. "Reasoning" that includes algebra, syllogisms, deduction, etc. involves certain processes for reaching an answer. Getting a "good" answer through another route (like an informed guess) is not equivalent.
  Again, if you cannot confirm that these 'certain processes' are present when humans do it but not when LLMs do it, then your 'processes' might as well be made up.
  And unless you concede humans are also not performing 'true algebra' or 'true reasoning', then your position is not even logically consistent. You can't eat your cake and have it.
  

Comment by og_kalu