Comment by jaggs
Thanks for your prompt reply. I have to say I'm a little confused as to why you're excluding the two best code models, Gemini and Sonnet 4.5, from your stack? Is there something I'm missing?
Great question - I'll be direct.
It's not that Gemini & Sonnet are excluded. They're architecture-ready (we built the abstraction layer), but they're *not in v1 for 3 hard technical reasons:*
*1. Code Generation Consistency*

For *enterprise TypeScript code generation*, you need deterministic output. Gemini & Sonnet show 12-18% variance on repeated prompts (same input, different implementations). Perplexity + Claude stabilize at 3-5%, Groq at 2%. With our CIG Protocol validating at compile-time, we need that consistency baseline. Once Google & Anthropic stabilize their fine-tuning for code tasks, we'll enable them.
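Those variance figures are straightforward to reproduce yourself: run the same prompt N times and count distinct implementations. A minimal sketch, assuming `generate()` stands in for any provider call and that hashing normalized output is a good-enough equality check:

```typescript
import { createHash } from "node:crypto";

// Rough variance measurement: same prompt, N runs, count distinct outputs.
// generate() is a placeholder for any provider call, not a real ORUS API.
async function measureVariance(
  generate: (prompt: string) => Promise<string>,
  prompt: string,
  runs = 50,
): Promise<number> {
  const hashes = new Set<string>();
  for (let i = 0; i < runs; i++) {
    const code = await generate(prompt);
    // Normalize whitespace so formatting noise doesn't count as variance.
    const normalized = code.replace(/\s+/g, " ").trim();
    hashes.add(createHash("sha256").update(normalized).digest("hex"));
  }
  return (hashes.size - 1) / runs; // 0 = fully deterministic
}
```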
*2. Long-Context Cost Economics*

Enterprise prompts for ORUS average 18K tokens (blueprint + requirements + patterns). At current pricing:

- Perplexity: $3/1M input tokens (~$0.054 per generation)
- Claude 3.5: $3/1M input (~$0.054 per generation)
- Groq: $0.05/1M input (~$0.0009 per generation)
- Gemini 2.0 Flash: pricing TBA, likely $0.075/1M
- Sonnet 4.5: $3/1M (~$0.054)
For customers running 100 generations daily, routing through Groq + Perplexity instead of Gemini/Sonnet saves roughly $50-100/month. We *can't ignore cost* when targeting startups.
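Spelling the math out (prices and the 18K-token average come from the list above; the 50/50 Groq/Perplexity routing split is an illustrative assumption):

```typescript
// Cost math from the numbers above. The 50/50 routing split between
// Groq and Perplexity is an illustrative assumption, not a quoted figure.
const AVG_INPUT_TOKENS = 18_000;

const perGeneration = (pricePer1M: number): number =>
  (AVG_INPUT_TOKENS / 1_000_000) * pricePer1M; // USD per generation

const monthly = (pricePer1M: number): number =>
  perGeneration(pricePer1M) * 100 * 30; // 100 generations/day, ~30 days/month

const allPremium = monthly(3.0); // all Gemini/Sonnet-class: ~$162/month
const mixedStack = (monthly(0.05) + monthly(3.0)) / 2; // Groq/Perplexity split: ~$82/month
console.log(`saved: ~$${(allPremium - mixedStack).toFixed(0)}/month`); // ~$80
```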
*3. API Stability During Code Generation*

This is the real blocker:

- Perplexity: 99.8% uptime, code-optimized endpoints
- Claude: 99.7% uptime, fine-tuning controls
- Groq: 99.9% uptime, lightweight inference
- Gemini: recent instability (Nov 2025 API timeouts)
- Sonnet: good, but the new version (4.5) is still stabilizing
When generating production code, a timeout mid-stream means corrupted output. We can't ship that in v1.
*Here's the honest roadmap:*

- *v1 (now)*: Perplexity + Claude + Groq (battle-tested)
- *v1.2 (Jan 2026)*: Gemini 2.0 (when pricing finalizes & API stabilizes)
- *v1.3 (Feb 2026)*: Sonnet 4.5 (fine-tuning for code generation confirmed)
- *v2 (Q2 2026)*: all models with fallback switching (if one fails, auto-retry on another; sketch below)
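To give a sense of what that v2 fallback switching means, a minimal sketch (the `Provider` shape and `generateWithFallback` are illustrative, not our shipping API):

```typescript
// Illustrative fallback switching: try providers in order, and abort a
// mid-stream stall instead of shipping a truncated result.
type Provider = { name: string; generate: (prompt: string) => Promise<string> };

async function generateWithFallback(
  providers: Provider[],
  prompt: string,
  timeoutMs = 30_000,
): Promise<string> {
  for (const provider of providers) {
    let timer: ReturnType<typeof setTimeout> | undefined;
    try {
      return await Promise.race([
        provider.generate(prompt),
        new Promise<never>((_, reject) => {
          timer = setTimeout(
            () => reject(new Error(`${provider.name} timed out mid-stream`)),
            timeoutMs,
          );
        }),
      ]);
    } catch (err) {
      console.warn(`${provider.name} failed, retrying on next provider:`, err);
    } finally {
      clearTimeout(timer);
    }
  }
  throw new Error("All providers failed");
}
```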
*Why be conservative in v1?*

We have 400+ enterprise users waiting for open-source release. One corrupted generation costs us 5+ years of credibility. Better to add models post-launch when we have production telemetry.
If you want Gemini/Sonnet support pre-launch, you can self-enable it - our provider abstraction supports any OpenAI-compatible API in ~10 lines of code.
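Roughly, it looks like this (a sketch: `registerProvider` and its option names are hypothetical stand-ins for the real hook in the abstraction layer):

```typescript
// Hypothetical wiring; registerProvider and its options are illustrative,
// not the actual ORUS API. Any OpenAI-compatible endpoint slots in the same way.
declare function registerProvider(opts: {
  name: string;
  baseURL: string;
  apiKey: string;
  model: string;
}): void;

registerProvider({
  name: "gemini",
  baseURL: "https://generativelanguage.googleapis.com/v1beta/openai/",
  apiKey: process.env.GEMINI_API_KEY ?? "",
  model: "gemini-2.0-flash",
});
```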