motoboi 2 days ago

Models are not AGI. They are text generators forced to generate text in a form that a harness can act on to produce effects, like editing files or calling tools.

So the model won’t “understand” that you have a skill and decide to use it. The generation of the text that triggers skill usage is learned via Reinforcement Learning on human-generated examples and usage traces.

So why doesn’t the model use skills all the time? Because it’s a new thing; there aren’t enough training samples demonstrating that behavior.

They also cannot enforce that via RL, because skills are described in human language, which is ambiguous and informal. Force the model to always use skills via an RL policy and you’ll make it dumber.

So, right now, we are generating usage traces that will be used to train future models to get a better grasp of when to use skills and when not to. Just give it time.

AGENTS.md, on the other hand, is context. Models have been trained to follow context since the dawn of the thing.

vidarh 2 days ago

> AGENTS.md, on the other hand, is context. Models have been trained to follow context since the dawn of the thing.

The skills frontmatter ends up in context as well.

If AGENTS.md outperforms skills in a given agent, it is down to specifically how the skills frontmatter is extracted and injected into the context, because that is the only difference between the two approaches.

EDIT: I haven't tried to check this so this is pure speculation, but I suppose there is the possibility that some agents might use a smaller model to selectively decide what skills frontmatter to include in context for a bigger model. E.g. you could imagine Claude passing the prompt + skills frontmatter to Haiku to selectively decide what to include before passing to Sonnet or Opus. In that case, depending on approach, putting it directly in AGENTS.md might simply be a question of what information is prioritised in the output passed to the full model. (Again: this is pure speculation of a possible approach; though it is one I'd test if I were to pick up writing my own coding agent again)
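
For illustration only, a minimal sketch of that kind of pre-filter step. All names and the call to the small model are invented here, not taken from any real agent:

    # Hypothetical sketch: a cheap model picks which skills' frontmatter
    # to surface to the big model. Nothing here reflects a real agent's API.

    def call_small_model(prompt: str) -> str:
        """Stand-in for a call to a fast, cheap model (e.g. Haiku-class)."""
        raise NotImplementedError  # whatever API the agent actually uses

    def select_relevant_skills(user_prompt: str, skills: dict[str, str]) -> list[str]:
        """Ask the small model which skill descriptions look relevant.

        `skills` maps skill name -> frontmatter description; returns the
        subset of names worth including in the big model's context.
        """
        listing = "\n".join(f"- {name}: {desc}" for name, desc in skills.items())
        answer = call_small_model(
            "Which of these skills are relevant to the request below? "
            "Reply with a comma-separated list of names, or 'none'.\n\n"
            f"Skills:\n{listing}\n\nRequest:\n{user_prompt}"
        )
        names = [n.strip() for n in answer.split(",")]
        return [n for n in names if n in skills]

    def compose_context(user_prompt: str, skills: dict[str, str]) -> str:
        """Build the prompt actually sent to the full model."""
        chosen = select_relevant_skills(user_prompt, skills)
        header = "\n".join(f"[skill] {name}: {skills[name]}" for name in chosen)
        return f"{header}\n\n{user_prompt}" if header else user_prompt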

But really the overall point is that AGENTS.md vs. skills here is still entirely a question of what ends up in the "raw" context/prompt that gets passed to the full model, so this is just nuance to my original answer with respect to possible ways that raw prompt could be composed.

  • OJFord 2 days ago

    No, it's more than that: they didn't just put the skill's instructions directly in AGENTS.md, they put the whole index for the docs in there (the skill in this case being a docs lookup). So there's nothing to 'do': the skill output is already in context (or at least pointers to it, the index, if not the actual file contents), not just the frontmatter.

    Hence the submission's conclusion:

    > Our working theory [for why this performs better] comes down to three factors.

    > No decision point. With AGENTS.md, there's no moment where the agent must decide "should I look this up?" The information is already present.

    > Consistent availability. Skills load asynchronously and only when invoked. AGENTS.md content is in the system prompt for every turn.

    > No ordering issues. Skills create sequencing decisions (read docs first vs. explore project first). Passive context avoids this entirely.
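
    To make that concrete, the AGENTS.md presumably carries something like the fragment below (contents invented here purely for illustration): the docs index sits in every prompt, so there is no lookup step at all.

        # AGENTS.md (illustrative fragment, not the submission's actual file)
        ## Library documentation index
        - docs/getting-started.md: installation and basic setup
        - docs/api/client.md: Client class, auth options
        - docs/api/errors.md: error types and retry behaviour
        Read the relevant file before answering questions about the API.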

    • vidarh 2 days ago

      > No, it's more than that: they didn't just put the skill's instructions directly in AGENTS.md, they put the whole index for the docs in there (the skill in this case being a docs lookup). So there's nothing to 'do': the skill output is already in context (or at least pointers to it, the index, if not the actual file contents), not just the frontmatter.

      The point remains: That is still just down to how you compose the context/prompt that actually goes to the model.

      Nothing stops an agent from including logic to inline the full set of skills if the context is short enough. The point of skills is to provide a mechanism for managing context to reduce the need for summarization/compaction or explicit management, and so allow you to e.g. have a lot of them available.

      (And this kind of makes the article largely moot - it's slightly neat to know it might be better to just inline the skills if you have few enough that they won't seriously fill up your context, but the main value of skills comes when you have enough of them that this isn't the case)

      Conversely, nothing prevents the agent from using lossy processing with a smaller, faster model on AGENTS.md either, before passing it to the main model, e.g. if context is getting out of hand, or if the developer of a given agent thinks they have a way of making adherence better by transforming it.

      These are all tooling decisions, not features of the models.
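
      As a purely illustrative sketch of that kind of tooling decision (helper names invented): inline everything when the skills are small enough to fit a context budget, otherwise fall back to frontmatter-only and load bodies on demand.

          # Hypothetical sketch, not any real agent's logic.
          def compose_skill_context(skills, budget_tokens,
                                     estimate_tokens=lambda s: len(s) // 4):
              """`skills` is a list of dicts: {"name", "frontmatter", "body"}."""
              full = "\n\n".join(
                  f"# {s['name']}\n{s['frontmatter']}\n{s['body']}" for s in skills
              )
              if estimate_tokens(full) <= budget_tokens:
                  return full  # few/small skills: just inline all of them
              # too much to inline: surface frontmatter only, fetch bodies on demand
              return "\n\n".join(
                  f"# {s['name']}\n{s['frontmatter']}" for s in skills
              )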

      • OJFord 2 days ago

        However you compose the context for the skill, the model has to generate output like 'use skill docslookup(blah)' vs. just 'according to the docs in context' (or even 'read file blah.txt mentioned in context'), which training can affect.

        • vidarh 2 days ago

          This is assuming you make the model itself decide whether the skill is relevant, and that is one way of doing it, but there is no reason that needs to be the case.

          Of course training can affect it, but the point is that there is nothing about skills that needs to be different from just sending all the skill files as part of the context, because that is a valid way of implementing skills, though it loses the primary benefit of skills, namely the ability to have more documentation of how to do things than fits in context.

          Other options that also do not require the main model to know what to include range from direct string matching (e.g. against /<someskill>), via embeddings, to passing a question to a smaller model (e.g. "are any of these descriptions relevant to this prompt: ...").
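
          For illustration, rough sketches of the first two options (names invented; embed() stands in for whatever embedding model the agent uses):

              # Hypothetical sketches, not any real agent's implementation.

              def match_explicit(prompt, skill_names):
                  """Direct string matching, e.g. the user typed /someskill."""
                  return [n for n in skill_names if f"/{n}" in prompt]

              def match_by_embedding(prompt, skill_vecs, embed, threshold=0.75):
                  """Cosine similarity between the prompt and each skill description.

                  `skill_vecs` maps skill name -> precomputed embedding vector.
                  """
                  def cosine(a, b):
                      dot = sum(x * y for x, y in zip(a, b))
                      na = sum(x * x for x in a) ** 0.5
                      nb = sum(y * y for y in b) ** 0.5
                      return dot / (na * nb) if na and nb else 0.0
                  p = embed(prompt)
                  return [n for n, v in skill_vecs.items()
                          if cosine(p, v) >= threshold]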

    • seunosewa 2 days ago

      What if they used the same compressed documentation in the skill? That would be just fine too.

      • OJFord 2 days ago

        Sure, but it would be a trivial comparison then; this is really about context vs. tool-calling.

js8 2 days ago

> Models are not AGI.

How do you know? What if AGI can be implemented as a reasonably small set of logic rules, which implement what we call "epistemology" and "informal reasoning"? And this set of rules is just being run in a loop, producing better and better models of reality. It might even include RL, for all we know.

And what if LLMs already know all these rules? So they are AGI-complete without us knowing.

To borrow from Dennett, we understand LLMs from the physical stance (they are neural networks) and the design stance (they predict the next token of language), but do we understand them from the intentional stance, i.e. what rules do they employ when running chain-of-thought, for example?

  • blueprint 2 days ago

    It's very simple. The model itself doesn't know and can't verify it. It knows that it doesn't know. Do you deny that? Or do you think that a general intelligence would be in the habit of lying to people and concealing why? At the end of the day, that would be not only unintelligent, but hostile. So it's very simple. And there is such a thing as "the truth", and it can be verified by anyone repeatably in the requisite (fair, accurate) circumstances, and it's not based in word games.

    • js8 a day ago

      All I asked was for the OP to substantiate their claim that LLMs are not AGI. I am agnostic on that - either way seems plausible.

      I don't think there even is an agreed criterion of what AGI is. Current models can easily pass the Turing test (except some gotchas, but these don't really test intelligence).

    • coldtea 2 days ago

      None of the above are even remotely epistemologically sound.

      "Or do you think that a general intelligence would be in the habit of lying to people and concealing why?"

      First, why couldn't it? "At the end of the day, that would be not only unintelligent, but hostile" is hardly an argument against it. We ourselves are AGI, yet we take both unintelligent and hostile actions all the time. And who said it's unintelligent to begin with? As an AGI it might very well be in my intelligent self-interest to lie about it.

      Second, why is "knows it and can verify" a necessary condition? An AGI could very well not know it's one.

      > And there is such a thing as "the truth", and it can be verified by anyone repeatably in the requisite (fair, accurate) circumstances, and it's not based in word games.

      Epistemologically speaking, this is hardly the slam-dunk argument you think it is.

      • blueprint 2 days ago

        No, you missed some of my sentences. You have to take the whole picture together. And I was not making an argument to you to prove the existence of the truth. You are clearly bent on arguing against its existence, which tells me enough about you. We were talking about agents that operate in good faith that know that they are safe. When you're ready to have a discussion in good faith rather than attempting to find counterarguments, then you will find that what I said is verifiable. The question is not whether you think you can come up with a way to make an argument that sounds like it contradicts what I said.

        The question is not whether an AGI knows that it is an AGI. The question is whether it knows that it is not one. And you're missing the fact that there's no such thing as it here.

        If you go around acting hostile to good people, that's still not very intelligent. In fact, I would question if you have any concept of why you're doing it at all. Chances are you're doing it to run from yourself, not because you know what you're doing.

        Anyway, you're just speculating, and the fact of the matter is that you don't have to speculate. If you actually wanted to verify what I said, it would be very easy to do so. It's not a surprise that someone who doesn't want to know something will have deaf ears. So I'm not going to pretend that I stand a chance of convincing you when I already know that my argument is accurate.

        Don't be so sure that you meet the criteria for AGI.

        And as for my slam dunk: any attempt to argue against the existence of truth automatically validates your assumption of its existence. So don't make the mistake of assuming I had to argue about it. I was merely stating a fact.

themoose8 2 days ago

Indeed, they're not AGI. They're basically autocomplete on steroids.

They're very useful, but as we all know - they're far from infallible.

We're probably plateauing on the improvement of the core GPT technology. For these models and APIs to improve, it's things like Skills that need to be worked on and improved, to reduce the mistakes they make and produce better output.

So it's pretty disappointing to see that the 'Skills' feature set as implemented, as great a concept as it is, is pretty bogus compared to just front-loading the AGENTS.md file. This is not obvious, and it's valuable to know.

  • coldtea 2 days ago

    > Indeed, they're not AGI. They're basically autocomplete on steroids.

    This makes the assumption that AGI is not autocomplete on steroids, which even before LLMs was a very plausible suggested mechanism for what intelligence is.

  • whattheheckheck a day ago

    They haven't even released the full, complete retrain on the entire corpus of what they have in the training data. They have billions of chats detailing precisely a high-fidelity map of the inner workings of millions of people psychologically. The next one's gonna be a banger + the non-lobotomized one for the military.

baby 2 days ago

I was thinking about that these days and experimenting like so: a system prompt that asks the agent to load any skills that seem relevant early, and a user prompt that asks the agent to do that later, when a skill becomes relevant.
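
For example (wording invented here, not the exact prompts used):

    System prompt addition: "Before starting, read the frontmatter of every
    available skill and load any that look relevant to the task."

    User prompt addition, sent mid-task: "A skill now looks relevant to this
    step; load it before continuing."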