gcanyon a day ago

The "GPT-3 moment" framing is a bit hype-y, I think. GPT-3 eliminated the need for task-specific fine-tuning, but per the article, RL wouldn't replace LLM-style pretraining. So this is more of an incremental advance than the paradigm shift GPT-3 represented. That said, if it unlocks RL generalization, that would be huge.

The core claim that massive-scale RL will unlock generalization doesn't seem that surprising since we've seen the scaling hypothesis play out across ML. But "replication training" on software is interesting: learning by copying existing programs potentially unlocks a ton of complex training data with objective evaluation criteria.

To me, the big unanswered question is whether skills learned from replicating software would generalize to other reasoning tasks. That's a significant "if" - great if it works, pointless if it doesn't.

kevindamm a day ago

It's a very big "if" because other fields are comparatively underspecified. In most cases there's no equivalent to a compiler or interpreter (spreadsheets are the closest thing to a lingua franca most industries have).

It would "work" but I think it will need even more scrutiny by experts to confirm what's correct and what needs to be re-generated. Please please no vibe accounting.

  • jasim 6 hours ago

    Accounting, specifically book-keeping, really plays to the strengths of LLMs - pattern matching within a bounded context.

    The primary task in book-keeping is to classify transactions (from expense vouchers, bank transactions, sales and purchase invoices and so on) and slot them into the Chart of Accounts of the business.

    LLMs can already do this well without any domain- or business-specific context. For example, a fuel entry is obvious enough that they can match it to a similar-sounding account in the CoA.

    And for the others, where human discretion is required, we can add a line of instruction in the prompt, and that classification is permanently encoded. A large chunk of these kinds of entries are repetitive in nature, so each such custom instruction is a long-term automation.
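    That workflow can be sketched in a few lines. This is a hypothetical illustration only: `call_llm` stands in for whatever chat-completion API you use, and the chart of accounts, custom rule, and `fake_llm` stub below are all made up for the example.

    ```python
    # Hypothetical sketch of prompt-based transaction classification.
    # The chart of accounts and the custom instruction are illustrative,
    # not taken from any real bookkeeping system.

    CHART_OF_ACCOUNTS = ["Fuel Expense", "Office Supplies", "Sales Revenue", "Bank Charges"]

    CUSTOM_INSTRUCTIONS = [
        "Payments to 'Acme Couriers' go under 'Office Supplies', not 'Bank Charges'.",
    ]

    def classify(transaction_text: str, call_llm) -> str:
        """Ask the model to slot one transaction into the chart of accounts."""
        prompt = (
            "Classify this transaction into exactly one account.\n"
            f"Accounts: {', '.join(CHART_OF_ACCOUNTS)}\n"
            + "".join(f"Rule: {r}\n" for r in CUSTOM_INSTRUCTIONS)
            + f"Transaction: {transaction_text}\n"
            "Answer with the account name only."
        )
        answer = call_llm(prompt).strip()
        # Anything outside the CoA is routed to the human verifier.
        return answer if answer in CHART_OF_ACCOUNTS else "NEEDS_REVIEW"

    def fake_llm(prompt: str) -> str:
        # Keyword stub standing in for a real model call.
        return "Fuel Expense" if "Shell" in prompt else "Miscellaneous"

    print(classify("Shell station, card payment 52.10", fake_llm))  # Fuel Expense
    print(classify("Unknown vendor invoice 8812", fake_llm))        # NEEDS_REVIEW
    ```

    The fallback to `NEEDS_REVIEW` is the "manual verifier" part: the custom instructions automate the repetitive entries, and everything the model can't place cleanly lands back with a human.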

    You might not have been speaking about simple book-keeping, though. If so, I'm curious to learn more.

    • xnorswap 5 hours ago

      Audit is at the heart of accounting, and LLMs are the antithesis of an audit trail.

      • jasim 3 hours ago

        I'm sorry, I don't follow. Using an LLM to classify a transaction doesn't mean there's no audit trail for it. There should also be a manual verifier who's ultimately responsible for the entries, so that we don't abdicate responsibility to black boxes.

        • xnorswap 3 minutes ago

          If you mark data as "Processed by LLM", that in turn taints all inference from it.

          Requirements for a human in the loop devolve to ticking a box by someone who doesn't realise the responsibility they have been burdened with.

          Mark my words, some unfortunate soul will then be thrown under the bus once a major scandal arises from such use of LLMs.

          As an example, companies aren't supposed to use AI for hiring, they are supposed to have all decisions made by a human-in-the-loop. Inevitably this just means presenting a massive grid of outcomes to someone who never actually goes against the choices of the machine.

          The more junior the employee, the "better". They won't challenge the system, and they won't realise the liability they're setting themselves up with, and the company will more easily shove them under the proverbial bus if there ever is an issue.

          Hiring is too nebulous, too hard to get concrete data for, and its outcomes too hard to inspect, to properly check.

          Financial auditing, however, is the opposite of that. It's hard numbers. Inevitably, when discrepancies arise, people run around chasing other people to get all their numbers close enough to something that makes sense. There's enough human wiggle-room to get away with chaotic processes that still demand accountability.

          This is possibly the worst place you could put LLMs, if you care about actual outcomes:

          1. Mistakes aren't going to get noticed.

          2. If they are noticed, people aren't going to be empowered to actually challenge them, especially once they're used to the LLM doing the work.

          3. People will be held responsible for the LLM's mistakes, despite pressure to sign off (and the general sense of time-pressure in audit is already immense).

          4. It's a black box, so faults can't be easily diagnosed; the best you can do is try to re-prompt in a way that avoids the mistake.

  • cjblomqvist 21 hours ago

    > Please please no vibe accounting.

    Funny you mention it: there are multiple companies in Sweden working on AI/ML-based accounting. It's not so different from AI/ML-based automated driving.

    • kevindamm 19 hours ago

      I've seen some of those, but all of the ones I've looked at also had a panel of experts who could give the output a once-over (or re-work it) before sending it back to the client. I'd compare it more to cruise control or driver-assist than to automated driving.

mcbuilder a day ago

This article reads as complete hype. They just seem to offer an idea of "replication training," which is vague agentic distributed RL. Multi-agent distributed reinforcement learning algorithms have been in the literature for a while; I suggest studying what DeepMind is doing for the current state of the art in agentic distributed RL.

  • janalsncm 20 hours ago

    I didn’t think it was vague. Given an existing piece of software, write a detailed spec on what it does and then reward the model for matching its performance.

    The vague part is whether this will generalize to other non software domains.
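    The reward side of that setup is at least easy to sketch. A toy illustration, with the caveat that the article doesn't specify an exact reward function; the match-fraction metric and all names here are my own assumptions.

    ```python
    # Toy sketch of a replication-training reward: score the model's
    # program by how often it matches a reference program's behaviour
    # on held-out inputs. The metric is an assumption, not the
    # article's actual method.

    from typing import Callable, Iterable

    def replication_reward(
        candidate: Callable[[int], int],
        reference: Callable[[int], int],
        test_inputs: Iterable[int],
    ) -> float:
        """Fraction of test inputs where the candidate matches the reference."""
        inputs = list(test_inputs)
        matches = sum(candidate(x) == reference(x) for x in inputs)
        return matches / len(inputs)

    # Example: the reference computes a small checksum; one candidate
    # replicates it exactly, the other only approximately.
    reference = lambda x: x * x % 97
    exact = lambda x: x * x % 97
    sloppy = lambda x: x % 97

    print(replication_reward(exact, reference, range(100)))   # 1.0
    print(replication_reward(sloppy, reference, range(100)))  # less than 1.0
    ```

    The appeal is that the reward is fully objective: the reference program itself is the grader, which is exactly the "objective evaluation criteria" property the top comment points out.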

    • intrasight 12 hours ago

      > write a detailed spec on what it does

      A much harder task than writing said software

make3 14 hours ago

Arguably, LLMs with RL have already had their GPT-3 moment: DeepSeek R1 did so well that it wiped over a trillion dollars of stock value off big tech. If you see the GPT-3 moment as the moment when people definitively took notice, this was one of those moments.