Comment by jgmedr 12 hours ago

Our team has found success in treating skills more like reusable, semi-deterministic functions and less like fingers-crossed prompts for random edge cases.

For example, we have a /create-new-endpoint skill. It contains a detailed checklist of all the boilerplate tasks that an engineer needs to do in addition to implementing the logic (e.g. updating the OpenAPI spec, adding integration tests, writing endpoint boilerplate). The engineer invokes the skill manually from the CLI as a slash command, provides a Jira ticket number, and engages in a brief design discussion. The LLM is consistently able to one-shot these tickets in a way that matches our existing application architecture.
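
To make the "skill as a function" idea concrete, here is a minimal sketch of such a checklist modeled as plain data, so that completion can be checked mechanically. The names `ChecklistItem` and `EndpointSkill` and the ticket ID `PROJ-123` are hypothetical, made up for illustration; this is not the actual skill format described above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ChecklistItem:
    """One boilerplate task the skill expects to be completed."""
    name: str
    done: bool = False


def default_items() -> List[ChecklistItem]:
    # Task names taken from the examples in the comment above.
    return [
        ChecklistItem("update OpenAPI spec"),
        ChecklistItem("add integration tests"),
        ChecklistItem("add endpoint boilerplate"),
    ]


@dataclass
class EndpointSkill:
    """Hypothetical stand-in for a checklist-driven skill invocation."""
    ticket: str
    items: List[ChecklistItem] = field(default_factory=default_items)

    def mark_done(self, name: str) -> None:
        # Mark a single checklist item as completed; fail loudly on typos.
        for item in self.items:
            if item.name == name:
                item.done = True
                return
        raise KeyError(name)

    def remaining(self) -> List[str]:
        # Names of the tasks that have not been completed yet.
        return [item.name for item in self.items if not item.done]


skill = EndpointSkill(ticket="PROJ-123")  # hypothetical ticket ID
skill.mark_done("update OpenAPI spec")
print(skill.remaining())
```

The point of the sketch is that an explicit list of required tasks gives you something to check against, rather than relying on the prompt alone.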

mooreds 11 hours ago

How do you test these skills for consistency over time, or is that not needed?

theshrike79 10 hours ago

The same way you'd test a human following written instructions over time.
Check the results.

pizzafeelsright 9 hours ago

My experience has been that if the skill is broken down into a function, possibly paired with a validator in another stage, the results are about 99.9% deterministic.
I have not yet tested this at scale, but give me six months.
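
A rough sketch of the function-plus-validator pattern, assuming a generation step that can be retried and a separate validation stage that returns a list of problems. Everything here is hypothetical: `run_with_validator`, `validate_endpoint`, and `fake_generate` are illustrative names, and `fake_generate` is a stub that stands in for the LLM call and deliberately omits a required part on its first attempt.

```python
from typing import Callable, Dict, List

# Parts the validator requires in the generated output (assumed for illustration).
REQUIRED_PARTS = ("openapi_spec", "integration_tests", "handler")


def validate_endpoint(output: Dict[str, str]) -> List[str]:
    """Validator stage: return a list of problems, empty if the output passes."""
    return [f"missing {part}" for part in REQUIRED_PARTS if not output.get(part)]


def run_with_validator(
    generate: Callable[[int], Dict[str, str]],
    validate: Callable[[Dict[str, str]], List[str]],
    max_attempts: int = 3,
) -> Dict[str, str]:
    """Call generate until validate reports no problems or attempts run out."""
    errors: List[str] = []
    for attempt in range(1, max_attempts + 1):
        output = generate(attempt)
        errors = validate(output)
        if not errors:
            return output
    raise RuntimeError(f"validation failed after {max_attempts} attempts: {errors}")


def fake_generate(attempt: int) -> Dict[str, str]:
    """Stub for the non-deterministic step: forgets the tests on attempt 1."""
    output = {"openapi_spec": "stub spec", "handler": "stub handler"}
    if attempt >= 2:
        output["integration_tests"] = "stub tests"
    return output


result = run_with_validator(fake_generate, validate_endpoint)
print(sorted(result))
```

The generation step stays non-deterministic; the validator only makes the accepted output predictable by rejecting anything that misses a required part.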
