Comment by openclawai 4 hours ago
For context on what cloud API costs look like when running coding agents:
With Claude Sonnet at $3/$15 per 1M input/output tokens, a typical agent loop with ~2K input tokens and ~500 output tokens per call, 5 LLM calls per task, and 20% retry overhead (common with tool use) comes to roughly $0.05-0.10 per agent task.
At 1K tasks/day that's ~$1.5K-3K/month in API spend.
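For anyone who wants to plug in their own numbers, here's a rough sketch of that arithmetic. The function and parameter names are mine, and it assumes every call sends the same ~2K input tokens (real agent loops usually grow the context each turn, so treat this as a lower-ish bound) and a 30-day month.

```python
# Back-of-envelope agent cost model. All names here are illustrative,
# and the defaults are the figures quoted in the comment above.

def cost_per_task(
    input_tokens: int = 2_000,      # input tokens per LLM call (assumed constant)
    output_tokens: int = 500,       # output tokens per LLM call
    calls: int = 5,                 # LLM calls per agent task
    retry_overhead: float = 0.20,   # extra fraction of calls spent on retries
    input_price: float = 3.0,       # USD per 1M input tokens
    output_price: float = 15.0,     # USD per 1M output tokens
) -> float:
    """Return the estimated USD cost of one agent task."""
    per_call = (input_tokens * input_price + output_tokens * output_price) / 1_000_000
    return per_call * calls * (1 + retry_overhead)


def monthly_cost(tasks_per_day: int = 1_000, days: int = 30, **kwargs) -> float:
    """Return the estimated USD cost per month (assumes a 30-day month by default)."""
    return cost_per_task(**kwargs) * tasks_per_day * days


print(f"per task:  ${cost_per_task():.3f}")
print(f"per month: ${monthly_cost():,.0f}")
```

With the defaults this lands at about $0.08 per task and about $2.4K/month, inside the ranges above. Passing `retry_overhead=0.5` shows how quickly retries move the total.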
The retry overhead is where the real costs hide. Most cost comparisons assume perfect execution, but tool-calling agents fail parsing, need validation retries, etc. I've seen retry rates push effective costs 40-60% above baseline projections.
Local models trading 50x slower inference for $0 marginal cost start looking very attractive for high-volume, latency-tolerant workloads.
On the other hand, DeepSeek V3.2 is $0.38 per million output tokens. And on OpenRouter, most providers serve it at 20 tokens/sec.
At 20 t/s for a month, that's about 51.8M tokens, or $19-something, running literally 24/7. In reality it'd be cheaper than that.
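Quick sanity check on that number, as a sketch. It only counts output tokens at the quoted $0.38/1M price, assumes a 30-day month, and ignores input-token charges entirely; the names are made up for illustration.

```python
# Cost of generating output tokens nonstop for a month at a fixed rate.
# Figures are the ones quoted above; input-token cost is NOT included.

SECONDS_PER_DAY = 86_400


def continuous_generation_cost(
    tokens_per_second: float = 20.0,        # sustained output speed
    price_per_million_output: float = 0.38,  # USD per 1M output tokens
    days: int = 30,                          # assumed month length
) -> tuple[float, float]:
    """Return (total output tokens, USD cost) for nonstop generation."""
    tokens = tokens_per_second * SECONDS_PER_DAY * days
    cost = tokens * price_per_million_output / 1_000_000
    return tokens, cost


tokens, cost = continuous_generation_cost()
print(f"{tokens / 1e6:.2f}M tokens, ${cost:.2f}")
```

That comes out to 51.84M tokens and roughly $19.70 for the month.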
I bet you'd burn more than $20 a month in electricity with a beefy machine that can run DeepSeek.
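The economics of batch>1 inference do not favor consumers: a provider spreads the hardware cost across many concurrent requests, while a single local user is mostly running at batch size 1.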