Comment by schmuhblaster

> But my bet is that the proposed program-of-thought is too specific

This is my impression as well, having worked with this type of stuff for the past two years. It works great for very well defined uses case and if user queries do not stray to far from what you optimized your framework/system prompt/agent for. However, once you move too far away from that, it quickly breaks down.

Nevertheless, as this problem has been bugging me for a while, I still haven't given up (although I probably should ;-). My latest attempt is a Prolog-based DSL (http://github.com/deepclause/deepclause.ai) that allows for part of the logic to be handled by LLMs again, so that it retains some of the features of pure LLM_based systems. As a side effect, this gives additional features such as graceful failures, auditability and increased (but not full) reproducibility.