Comment by jgmedr 12 hours ago

3 replies

Our team has found success in treating skills more like reusable, semi-deterministic functions and less like fingers-crossed prompts for random edge cases.

For example, we have a /create-new-endpoint skill. It contains a detailed checklist of all the boilerplate tasks an engineer needs to do in addition to implementing the logic (e.g. updating the OpenAPI spec, adding integration tests, writing the endpoint boilerplate). The engineer invokes the skill manually from the CLI as a slash command, provides a JIRA ticket number, and engages in a brief design discussion. The LLM is consistently able to one-shot these tickets in a way that matches our existing application architecture.
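To make the checklist idea concrete, here is a rough sketch of the checklist as data, plus a check for which items a change set has not touched yet. It is not the actual skill: `ChecklistItem`, `missing_items`, and every path pattern are hypothetical, and only the three task names come from the comment above.

```python
from dataclasses import dataclass
from fnmatch import fnmatch


@dataclass(frozen=True)
class ChecklistItem:
    name: str
    path_pattern: str  # hypothetical glob; adjust to your repo layout


# The three tasks named in the comment; the patterns are made up for illustration.
CREATE_NEW_ENDPOINT_CHECKLIST = [
    ChecklistItem("update OpenAPI spec", "openapi/*.yaml"),
    ChecklistItem("add integration tests", "tests/integration/*"),
    ChecklistItem("endpoint boilerplate", "src/endpoints/*"),
]


def missing_items(changed_files, checklist=CREATE_NEW_ENDPOINT_CHECKLIST):
    """Return the names of checklist items with no matching changed file."""
    return [
        item.name
        for item in checklist
        if not any(fnmatch(path, item.path_pattern) for path in changed_files)
    ]
```

Calling `missing_items(["openapi/api.yaml", "src/endpoints/orders.py"])` would flag only the integration tests as outstanding.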

mooreds 11 hours ago

How do you test these skills for consistency over time, or is that not needed?

  • theshrike79 10 hours ago

    The same way you'd test a human following written instructions over time.

    Check the results.

  • pizzafeelsright 9 hours ago

    In my experience, if you break the skill down into a function, possibly paired with a validator in a separate stage, you get to roughly 99.9% deterministic.

    I have not yet tested this at scale, but give me six months.
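
One way to read the function-plus-validator idea from the last reply is as a small retry loop: run the skill, run a separate validator stage on the output, and feed any problems back in. This is only a sketch under assumptions. `run_skill_with_validator` and both callables are hypothetical stand-ins (the skill would wrap an LLM call, the validator could be a linter or test run), and it makes no claim about the 99.9% figure.

```python
from typing import Callable, List

# Hypothetical signatures: a skill takes a task plus validator feedback and
# returns some output; a validator returns a list of problems (empty = pass).
Skill = Callable[[str, List[str]], str]
Validator = Callable[[str], List[str]]


def run_skill_with_validator(
    skill: Skill,
    validate: Validator,
    task: str,
    max_attempts: int = 3,
) -> str:
    """Run the skill, check its output in a separate stage, retry on failure."""
    feedback: List[str] = []
    for _ in range(max_attempts):
        output = skill(task, feedback)
        problems = validate(output)
        if not problems:
            return output
        # Pass the validator's findings into the next attempt.
        feedback = problems
    raise RuntimeError(
        f"validator still failing after {max_attempts} attempts: {feedback}"
    )
```

Keeping the validator as a plain function means the same check can be rerun later against saved outputs, which is one way to "check the results" over time.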