Comment by pton_xd 5 hours ago
As long as I have a Claude subscription, why do they care what harness I use to access their very profitable token inference business?
> Your subscription is supposed to use that free capacity. Hence the token costs are not that high, and hence you could buy it. But it needs careful management so that you don't overload the system. Claude Code sends telemetry that identifies the request as lower priority than API traffic (and probably informs queueing and caching decisions too). If your harness makes 10 parallel calls every time you query, and doesn't manage context as well as Claude Code, it overwhelms the system and degrades performance for others too. And if everyone just wants to use the subscription and there are no API takers, the subscription price isn't sustainable anyway. In a way you are relying on others' generosity for the cheap usage you get.
I understand what you mean, but outright removing the ability for other agents to use the Claude Code subscription is still really harsh.
If telemetry really is the reason (note: I doubt it is; I think the marketing/lock-in aspect matters more, but for the sake of discussion let's assume telemetry is in fact the reason),
then they could have simply coordinated with OpenCode or other agent providers. In fact, this is what OpenAI is doing: they recently announced a partnership/collaboration with OpenCode and are actively embracing it. I am sure that OpenCode and other agents could generate telemetry, or at least support such a feature if need be.
From what I have read on Twitter, people were purchasing Max subs and using them as a substitute for API keys for their startups. A typical scrappy startup story, but this has the same bursty nature as API usage in terms of concurrency and parallel requests. They used the OpenCode implementation. This is probably one of the triggers, because it screws up everything.
Telemetry is a reason, and it's also the stated reason. Marketing is plausible and likely part of the reason too, but lock-in would have meant this came way sooner than now. They would not even be offering an API if they really wanted to lock people in; that is not consistent with their other actions.
At the same time, the balance is delicate. If you get too many subscription users and not enough API users, the setup suddenly isn't profitable anymore, because there is less underused capacity available to direct subscription users to. This probably explains part of their stance too, and why they haven't done it until now. OpenAI never allowed it, and now that they do, they will make more changes to the auth setup, which Claude did not. (This episode tells you how duct-taped the whole system was at Anthropic: third-party harnesses used the auth key to generate a Claude Code token, and just used that to hit the API servers.)
Demand-aggregation allows the aggregator to extract the majority of the value. ChatGPT the app has the biggest presence, and therefore model improvements in Claude will only take you so far. Everyone is terrified of that. Cursor et al. have already demonstrated to model providers that it is possible to become commoditized. Therefore, almost all providers are seeking to push themselves closer to the user.
This kind of thing is pretty standard. Nobody wants to be a vendor on someone else's platform. Anthropic would likely not complain too much about you using z.ai in Claude Code. They would prefer that. They would prefer you use gpt-5.2-high in Claude Code. They would prefer you use llama-4-maverick in Claude Code.
Because regardless of how profitable inference is, if you're not the closest to the user, you're going to lose sooner or later.
I've seen dozens of "experts" all over the internet claim that Anthropic is "subsidizing costs" with the coding plans, despite no evidence whatsoever, and despite the fact that various sources from OpenAI, DeepSeek, and model inference providers have suggested the contrary: that inference is very profitable, with very high margins.
Because your subscription depends on the very API business.
Anthropic's COGS is the rent on x amount of H100s. The cost of a marginal query for them is almost zero until the batch fills up and they need a new cluster. So API clusters are usually built for peak load, with low utilization (unfilled batches) at any given time. Given that AI's peak demand is extremely spiky, they end up with low utilization numbers for the API side.
Your subscription is supposed to use that free capacity. Hence the token costs are not that high, and hence you could buy it. But it needs careful management so that you don't overload the system. Claude Code sends telemetry that identifies the request as lower priority than API traffic (and probably informs queueing and caching decisions too). If your harness makes 10 parallel calls every time you query, and doesn't manage context as well as Claude Code, it overwhelms the system and degrades performance for others too. And if everyone just wants to use the subscription and there are no API takers, the subscription price isn't sustainable anyway. In a way you are relying on others' generosity for the cheap usage you get.
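The peak-provisioning argument above can be sketched as a toy model. All numbers and names here (`CLUSTER_COST_PER_HOUR`, `CLUSTER_CAPACITY`, the load figures) are invented for illustration; none come from Anthropic:

```python
# Toy model: clusters are rented per hour regardless of utilization,
# so cost is driven by PEAK load, not average load.
CLUSTER_COST_PER_HOUR = 100.0   # fixed rent, paid whether batches fill or not
CLUSTER_CAPACITY = 1000         # requests/hour one cluster can batch

def clusters_needed(peak_load):
    # Capacity is provisioned for the peak; ceil division.
    return -(-peak_load // CLUSTER_CAPACITY)

def hourly_cost(loads):
    # Marginal requests are ~free until the peak forces a new cluster.
    return clusters_needed(max(loads)) * CLUSTER_COST_PER_HOUR

api_load = [900, 300, 200, 850]          # spiky API demand over four hours

# Well-behaved subscription traffic rides in the troughs:
smooth_subs = [l + 100 for l in api_load]  # peak 1000 -> still 1 cluster
# Bursty harness traffic (e.g. 10 parallel calls per query) raises the peak:
bursty_subs = [l + 400 for l in api_load]  # peak 1250 -> 2 clusters

print(hourly_cost(api_load))     # 100.0
print(hourly_cost(smooth_subs))  # 100.0  (subs rode on "free" capacity)
print(hourly_cost(bursty_subs))  # 200.0  (burst forced a new cluster)
```

The point of the sketch: the same number of extra requests costs nothing if it fills idle batch capacity, but doubles the bill the moment it raises the peak past a cluster boundary.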
It's reasonable for a company to unilaterally decide how it monetizes its extra capacity, and it's not unjustified to care. With a subscription you are not purchasing a promise of X tokens; for that you need the API.