Comment by verdverm
The core issue is likely not the LLM itself. Given good grounding context, clear instructions, and purposeful agents, a DAG of such agents should not produce results that are this consistently wrong.
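To make that concrete, here is a minimal sketch of what a small DAG of purposeful, grounded agents could look like; the three roles and the `ask_llm` helper are hypothetical stand-ins, not anything from the original story:

```python
def ask_llm(prompt: str) -> str:
    """Placeholder for a real LLM call (e.g. an HTTP request to a model API)."""
    return f"[model output for: {prompt[:40]}...]"

def retrieve_context(question: str) -> str:
    """Grounding step: pull relevant documents instead of relying on model recall."""
    return f"[retrieved documents relevant to: {question}]"

def draft_answer(question: str, context: str) -> str:
    """Drafting agent: answer strictly from the supplied context."""
    return ask_llm(
        "Using only the context below, answer the question.\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

def verify_answer(question: str, context: str, draft: str) -> str:
    """Verification agent: check the draft against the same grounding context."""
    return ask_llm(
        "Check this draft against the context and correct any unsupported claims.\n"
        f"Context:\n{context}\n\nQuestion: {question}\nDraft: {draft}"
    )

def answer(question: str) -> str:
    # Simple DAG: retrieve -> draft -> verify, each node with one narrow purpose.
    context = retrieve_context(question)
    draft = draft_answer(question, context)
    return verify_answer(question, context, draft)

print(answer("What changed in release 2.3?"))
```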
There are a lot of devils in the details here, and few of them show up in the story.