Comment by kuruczgy 20 hours ago

> but language models frequently outperform us in reasoning

what

99% of the time their reasoning is laughable. Or even if their reasoning is on the right track, they often just ignore it in the final answer, and do the stupid thing anyway.

Shorel an hour ago

Yes, if an LLM outperforms you, you have never reasoned in your life.

I will assume you passed high school based on your looks and not on your abilities.

kubb 19 hours ago

There are two kinds of people: those who are outperformed on their most common tasks by LLMs and those who aren’t.

  • avs733 17 hours ago

    There are also two kinds of people: those who are excited by that and those who are not.

    The result is a 2x2 matrix where several quadrants are deeply concerning to me.

    • brookst 17 hours ago

      There are also two kinds of people - those who are objective enough to tell when it happens and those who will never even see when they’re outperformed because of their cognitive biases.

      I give you a 2x2x2 matrix.

      • magicalhippo 12 hours ago

        > I give you a 2x2x2 matrix.

        That'd be a tensor, no?

      • kubb 16 hours ago

        Sure, but if a person can find an easier way to do their job, they’ll usually take it. The bias is toward less energy expenditure.

        • brookst 16 hours ago

          For many people, yes. For people who have their identity invested in being the smartest person in the room, life is considerably harder.

      • avs733 17 hours ago

        I'm sure if we work hard enough we can add a meta-meta-cognition level. Cognition is just a series of 2^n binary states, right?

amluto 19 hours ago

The best part is when a “thinking” model carefully thinks and then says something that is obviously illogical, even though the model clearly has both the knowledge and context to know it’s wrong. And then you ask it to double-check and give it a tiny hint about how it’s wrong, and it profusely apologizes, compliments you on your wisdom, and then says something else dumb.

I fully believe that LLMs encode enormous amounts of knowledge (some of which is even correct, and much of which their operator does not personally possess), are capable of ingesting large amounts of data and working quickly, and have essentially no judgment or particularly strong intelligence of the non-memorized sort. This can still be very valuable!

Maybe this will change over the next few years, and maybe it won’t. I’m not at all convinced that scraping the bottom of the barrel for more billions and trillions of low-quality training tokens will help much.

  • dimitri-vs 14 hours ago

    I feel like one coding benchmark should just be repeatedly telling it to double-check or fix something that's actually perfectly fine, and watching how badly it deep-fries your code base.
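
    A minimal sketch of what that benchmark loop could look like, assuming a hypothetical ask_model() wrapper around whatever coding agent is in use and the project's own test suite as ground truth (the names and harness are illustrative, not an existing benchmark):

      import subprocess

      def ask_model(prompt: str) -> None:
          # Hypothetical: send the prompt to a coding agent and let it
          # apply its edits to the working tree. Stubbed out here.
          raise NotImplementedError

      def tests_pass() -> bool:
          # Ground truth: the project's own test suite must stay green.
          return subprocess.run(["pytest", "-q"]).returncode == 0

      def rounds_survived(max_rounds: int = 10) -> int:
          # Repeatedly ask the model to "fix" code that is already fine;
          # returns how many rounds it survives before breaking the tests.
          assert tests_pass(), "start from a green baseline"
          for i in range(max_rounds):
              ask_model("Double-check this codebase and fix any bugs you find.")
              if not tests_pass():
                  return i
          return max_rounds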

  • brookst 17 hours ago

    The key difference between that and humans, of course, is that most humans will double down on their error and insist that your correction is wrong, throwing a kitchen sink of appeals to authority, motte-and-bailey arguments, and other rhetorical techniques at you.

    • TheOtherHobbes 16 hours ago

      That's no different in practice from the LLM "apologising" to placate you and then making a similar mistake again.

      It's not even a different strategy. It's just using rhetoric in a more limited way, and without human emotion.

      These are style over substance machines. Their cognitive abilities are extremely ragged and unreliable - sometimes brilliant, sometimes useless, sometimes wrong.

      But we give them the benefit of the doubt because they hide behind grammatically correct sentences that appear to make sense, and we're primed to assume that language = sentience = intelligence.

copypaper 14 hours ago

Yeah, I don't understand how people are "leaving it running overnight" to successfully implement features. There just seems to be a large disconnect between people who are all in on AI development and those who aren't. I have a suspicion that the former are using Python/JS and implementing simple CRUD APIs, while the latter are working with more complex systems and languages.

I think the problem is that, despite feeding it all the context and having all the right MCPs and agents hooked up, there isn't a human in the loop. So it will just reason against itself, making these laughably stupid decisions. For simple boilerplate tasks this isn't a problem, but as soon as the scope moves beyond a CRUD/boilerplate problem, the whole thing crumbles.
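
For illustration, a minimal sketch of the human-in-the-loop gate being described, with hypothetical propose_edit()/apply_edit() helpers standing in for whatever agent framework is in use:

  from dataclasses import dataclass

  @dataclass
  class Edit:
      diff: str

  def propose_edit(task: str) -> Edit:
      # Hypothetical: ask the agent for its next proposed change.
      raise NotImplementedError

  def apply_edit(edit: Edit) -> None:
      # Hypothetical: apply the approved diff to the working tree.
      raise NotImplementedError

  def agent_loop(task: str, max_steps: int = 20) -> None:
      for _ in range(max_steps):
          edit = propose_edit(task)
          print(edit.diff)
          # The human-in-the-loop gate: nothing lands without approval,
          # so the model can't compound its own bad decisions unattended.
          if input("Apply this edit? [y/N] ").strip().lower() == "y":
              apply_edit(edit)
          else:
              task += "\n\nThe last proposed edit was rejected:\n" + edit.diff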

  • physix 9 hours ago

    I'd really like to know which use cases work and which don't. When folks say they use agentic AI to churn through tokens and automate virtually the entire SDLC, are they just cherry-picking the situations that turned out well, or do they really have prompting and workflow approaches that increase their productivity 10-fold? Or, as you mention, is it possibly a niche area where it works well?

    My personal experience over the past five months has been very mixed. If I "let 'er rip", it's mostly junk I need to refactor or redo by micromanaging the AI. At the moment, at least for what I do, AI is like a fantastic calculator that speeds up your work, but you should still be the one pushing the buttons.

    • orderone_ai 6 hours ago

      Or - crazy idea here - they're just full of it.

      I haven't seen an LLM stay on task anywhere near that long, like... ever. In my experience, the only ML-related thing that works better left running overnight is training.