theshrike79 10 hours ago

The same way you'd test a human following written instructions over time.

Check the results.

pizzafeelsright 9 hours ago

My experience has been that if the skill is broken down into a function, possibly paired with a validator in another stage, you're at 99.9% deterministic.

I have not yet tested this at scale but give me six months.