Comment by jstummbillig 3 days ago

> Obviously directly including context in something like a system prompt will put it in context 100% of the time.

How do you suppose skills get announced to the model? It's all in the context in some way. The interesting part here is that just (relatively naively) compressing stuff into AGENTS.md seems to work better than however skills are implemented.

cortesoft 3 days ago

Isn't the difference that a skill means you just have to add the script name and explanation to the context instead of the entire script plus the explanation?

  • majormajor 2 days ago

    Their non-skill-based "compressed index" does much the same thing: each line maps a directory path to the doc files it contains, just without the "skillification." They didn't load all those docs into context directly, just pointers.

    They also didn't bother with any more "explanation" beyond "here are paths for docs."

    But this straightforward "here are paths for docs" approach produced better results, and IMO that makes sense: the more extra abstractions you add, the more chances there are for a given prompt plus situational context to fail to connect with your desired skill.
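
    As a rough sketch (directory and file names invented for illustration), that kind of index is just lines like:

    ```
    docs/api: auth.md, rate-limits.md, webhooks.md
    docs/deploy: docker.md, kubernetes.md
    docs/style: naming.md, reviews.md
    ```

    One line per directory, and the model only opens a file when the task actually points it there.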

  • sevg 3 days ago

    You could put the name and explanation in CLAUDE.md/AGENTS.md, plus the path to the rest of the skill that Claude can read if needed.

    That seems roughly equivalent, to my unenlightened mind!
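
    For instance (skill name and path invented), CLAUDE.md could carry just:

    ```
    ## Skills
    - pdf-report: turns query results into a formatted PDF report.
      Full instructions: skills/pdf-report/SKILL.md
    ```

    Name, one-line explanation, and a pointer; Claude reads the rest only if needed.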

  • verdverm 3 days ago

    I like to think about it this way: you want to put some high-level, table-of-contents, SparkNotes-like stuff in the system prompt. This helps warm up the right pathways. In it, you also need to tell the model that there are more things it may need, depending on "context", reachable through filesystem traversal or search tools; the difference between the two is unimportant, other than that most things outside of coding typically don't do filesystem things the same way.
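
    For example (wording and paths invented), the system prompt stub might read:

    ```
    # Orientation
    This is a table of contents, not the library.
    - Billing rules: docs/billing/
    - Auth flows: docs/auth/
    Read or search these paths when a task touches them.
    ```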

    • imiric 2 days ago

      The amount of discussion and the number of "novel" text formats accomplishing the same thing since 2022 is insane. Nobody knows how to extract the most value out of this tech, yet everyone talks like they do. If these aren't signs of a bubble, I don't know what is.

      • stevenhuang 2 days ago

        It's a new technology under active development, so people are simply sharing what works for them at a given moment.

        > If these aren't signs of a bubble, I don't know what is.

        This conclusion is incoherent and doesn't follow from any of your premises.

        • imiric 2 days ago

          Sure it does. Many people are jumping on ideas and workflows proposed by influencer personalities and companies, without evaluating how valid or useful they actually are. TFA makes this clear by saying that they were "betting on skills" and only later determined that they get better performance from a different workflow.

          This is very similar to speculative valuations around the web in the late 90s, except this bubble is far larger, more mainstream and personal.

          The fact that this is a debate about which Markdown file to put prompt information in is wild. It ultimately all boils down to feeding context to the model, which hasn't fundamentally changed since 2022.

      • verdverm 2 days ago

        1. There is nothing novel in my text formats; I'm just deciding what content goes in which files

        2. I've actually done these things, seen the difference, and shared it with others

        Yes, there are a lot of unknowns and a lot of people speaking from ignorance, but it is a mistake, perhaps even bigotry by definition, to make such blanket, judgemental statements about people

jmathai 2 days ago

Skills have frontmatter which includes a name and description. The description is what determines whether the LLM finds the skill useful for the task at hand.
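
For example, a SKILL.md might open with frontmatter like this (the skill itself is made up):

```
---
name: changelog-writer
description: Drafts changelog entries from merged PRs. Use when asked to
  summarize recent changes or prepare release notes.
---
```

Only this frontmatter gets announced up front, so the description carries the whole burden of getting the skill picked.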

If your skill isn’t being used, it’s not as simple as complaining that “skills aren’t getting called”. You have to figure out how to get the skill invoked.

  • Spivak 2 days ago

    Sure, but then you're playing a very annoying and boring game of model-whispering against specific, ever-changing versions of models, while hoping the model responds correctly surrounded by who knows what user input.

    I really only think the game is worth playing when it's against a fixed version of a specific model. The amount of variance we observe between different releases of the same model is enough to require us to update our prompts and re-test. I don't envy anyone who has to try and find some median text that performs okay on every model.

    • bonesss 2 days ago

      About a year ago I made a ChatGPT- and Claude-based hobo RAG-alike solution for exploring legal cases, using document creation and LLMs to craft a rich context window for interrogation in the chat.

      Just maintaining a basic interaction framework, with consistent chat behaviours at startup, was a daily game of whack-a-mole where well-tested behaviours would shift and alter without rhyme or reason. “Model whispering” is right. Subjectively, it felt like I could feel Anthropic/OpenAI engineers twiddling dials on the other side.

      Writing code that executes the same every time has some minor benefits.