Comment by tomp
For me as well… with constant human supervision. But if you try to build a business service, you need autonomy and exact rule following. We’re not there yet.
In my company, LLMs replaced something we used to use humans for. Turned out LLMs are better than humans at following rules.
If you need a way to perform complicated tasks with autonomy and exact rule following, your problem simply won't be solved right now.
Autonomy and rule following are at odds. Humans have the same problem. The solutions we use for ourselves work amazingly for LLMs (because they're trained on human data).
Examples: Give an LLM an effective identity (prompt engineering), a value system (Constitutional AI), make it think about these things before it acts (CoT + system prompt), have a more capable [more expensive / higher inference cost] agent review the LLM's work from time to time (multi-agent), have a more capable agent iterate on prompts to improve results in a test environment (EvoAgents), etc.
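To make the combination concrete, here is a minimal sketch of the first few techniques wired together: an identity and value system in the worker's system prompt, a CoT instruction, and a second agent that reviews the output before it is accepted. Everything here is illustrative — `call_llm` is a stub standing in for a real model API, and the prompts, rules, and retry policy are assumptions, not a reference implementation.

```python
# Sketch: identity prompt + value system + CoT, gated by a reviewer agent.
# `call_llm` is a deterministic stub so the control flow runs end to end;
# in practice it would wrap a real chat-completion API call.

WORKER_SYSTEM = (
    "You are a meticulous claims processor.\n"              # identity
    "Values: accuracy over speed; escalate when unsure.\n"  # value system
    "Think step by step before giving a final answer."      # CoT
)

REVIEWER_SYSTEM = (
    "You are a senior auditor. Check the worker's answer against the "
    "rules and reply APPROVE, or REJECT: <reason>."
)

def call_llm(system: str, user: str) -> str:
    """Stub standing in for a real (more capable/expensive) model call."""
    if "auditor" in system:
        # Reviewer: only approve drafts that actually show a final answer.
        return "APPROVE" if "final answer" in user.lower() else "REJECT: no answer"
    # Worker: emit visible reasoning, then a conclusion.
    return "Step 1: check the claim. Step 2: apply rule 4.2. Final answer: valid."

def process_with_review(task: str, max_retries: int = 2) -> str:
    """Worker drafts with CoT; the reviewer agent gates every draft."""
    for _ in range(max_retries + 1):
        draft = call_llm(WORKER_SYSTEM, task)
        verdict = call_llm(REVIEWER_SYSTEM, f"Task: {task}\nDraft: {draft}")
        if verdict.startswith("APPROVE"):
            return draft
    raise RuntimeError("Reviewer rejected all drafts; escalate to a human.")
```

The design point is that no single prompt is trusted to produce reliable behavior: the cheap worker is allowed to be fallible, and reliability comes from the supervising loop around it.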
We can't simply provide an off-the-shelf LLM with a paragraph or two and expect it to reliably fulfill an arbitrary task without supervision, any more than we can expect the same from a random nihilist going through an identity crisis. They both need identity, values, time to think, social support, etc. before they can be reliable workers.