Comment by whatpeoplewant 2 days ago

Cool demo. Running everything through a single LLM per request surfaces the real bottlenecks. A practical tweak is an agentic/multi-agent pattern: have a planner synthesize a stable schema+UI spec (an intermediate representation, or IR) once and cache it, then use small executor agents to call tools deterministically with constrained decoding. Run validation and rendering in parallel, stream partial UI, and route cheap requests to a local model. That setup cuts tokens and latency while keeping the UI stable across requests. You still avoid hand-written code, but the system converges on reusable plans instead of re-deriving them each time.
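
A minimal sketch of the plan-once/execute-many core, assuming a caller-supplied planner client and a simple tool registry (all names here, like `get_plan` and `fake_planner`, are illustrative, not any library's API):

```python
# Sketch of the plan-once, execute-many pattern described above.
import hashlib
import json
from concurrent.futures import ThreadPoolExecutor

PLAN_CACHE: dict[str, dict] = {}  # request-shape hash -> cached UI spec (IR)


def plan_key(request: dict) -> str:
    # Key on the *shape* of the request (its sorted field names), not its
    # values, so structurally identical requests reuse the same plan.
    shape = sorted(request.keys())
    return hashlib.sha256(json.dumps(shape).encode()).hexdigest()


def get_plan(request: dict, call_planner_llm) -> dict:
    key = plan_key(request)
    if key not in PLAN_CACHE:
        # One expensive planner call per request shape; the resulting
        # schema+UI spec (IR) is reused on every later request.
        PLAN_CACHE[key] = call_planner_llm(request)
    return PLAN_CACHE[key]


def execute(plan: dict, request: dict, tools: dict) -> dict:
    # Executors are deterministic: they run exactly the tool calls the plan
    # names, in parallel, with no free-form generation in the loop.
    with ThreadPoolExecutor() as pool:
        futures = {
            slot: pool.submit(tools[step["tool"]], request)
            for slot, step in plan["slots"].items()
        }
        return {slot: f.result() for slot, f in futures.items()}


# Toy demo: a fake planner returning a two-slot UI spec, plus two fake tools.
def fake_planner(req: dict) -> dict:
    return {"slots": {"title": {"tool": "fetch_title"},
                      "body": {"tool": "fetch_body"}}}

tools = {"fetch_title": lambda r: r["q"].title(),
         "fetch_body": lambda r: f"Results for {r['q']}"}

ui = execute(get_plan({"q": "coffee"}, fake_planner), {"q": "coffee"}, tools)
print(ui)  # later requests with the same shape skip the planner entirely
```

Constrained decoding and partial-UI streaming would slot into the executor and the renderer respectively; the point is just that only the first request per shape pays the planner's token cost.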