Comment by pton_xd

Comment by pton_xd 5 hours ago

10 replies

As long as I have a Claude subscription, why do they care what harness I use to access their very profitable token inference business?

ankit219 4 hours ago

Because your subscription depends on the very API business.

Anthropic's cogs is rent of buying x amount of h100s. cost of a marginal query for them is almost zero until the batch fills up and they need a new cluster. So, API clusters are usually built for peak load with low utilization (filled batch) at any given time. Given AI's peak demand is extremely spiky they end up with low utilization numbers for API support.

Your subscription is supposed to use that free capacity. Hence, the token costs are not that high, hence you could buy that. But it needs careful management that you dont overload the system. There is a claude code telemetry which identifies the request as lower priority than API (and probably decide on queueing + caching too). If your harness makes 10 parallel calls everytime you query, and not manage context as well as claude code, its overwhelming the system, degrading the performance for others too. And if everyone just wants to use subscription and you have no api takers, the price of subscription is not sustainable anyway. In a way you are relying on others' generosity for the cheap usage you get.

Its reasonable for a company to unilaterally decide how they monetize their extra capacity, and its not unjustified to care. You are not purchasing the promise of X tokens with a subscription purchase for that you need api.

  • Imustaskforhelp 4 hours ago

    > Your subscription is supposed to use that free capacity. Hence, the token costs are not that high, hence you could buy that. But it needs careful management that you dont overload the system. There is a claude code telemetry which identifies the request as lower priority than API (and probably decide on queueing + caching too). If your harness makes 10 parallel calls everytime you query, and not manage context as well as claude code, its overwhelming the system, degrading the performance for others too. And if everyone just wants to use subscription and you have no api takers, the price of subscription is not sustainable anyway. In a way you are relying on others' generosity for the cheap usage you get.

    I understand what you mean but outright removing the ability for other agents to use the claude code subscription is still really harsh

    If telemetry really is a reason (Note: I doubt it is, I think the marketing/lock-ins aspect might matter more but for the sake of discussion, lets assume so that telemetry is in fact the reason)

    Then, they could've simply just worked with co-ordination with OpenCode or other agent providers. In fact this is what OpenAI is doing, they recently announced a partnership/collaboration with OpenCode and are actively embracing it in a way. I am sure that OpenCode and other agents could generate telemetry or atleast support such a feature if need be

    • ankit219 3 hours ago

      From what i have read on twitter. People were purchasing max subs and using it as a substitute for API keys for their startups. Typical scrappy startup story but this has the same bursty nature as API in temrs of concurrency and parallel requests. They used the Opencode implementation. This is probably one of the triggers because it screws up everything.

      Telemetry is a reason. And its also the mentioned reason. Marketing is a plausible thing and likely part of the reason too, but lock-in etc. would have meant this would have come way sooner than now. They would not even be offering an API in that case if they really want to lock people in. That is not consistent with other actions.

      At the same time, the balance is delicate. if you get too many subs users and not enough API users, then suddenly the setup is not profitable anymore. Because there is less underused capacity available to direct subs users to. This probably explains a part of their stance too, and why they havent done it till now. Openai never allowed it, and now when they do, they will make more changes to the auth setup which claude did not. (This episode tells you how duct taped whole system was at ant. They used the auth key to generate a claude code token, and just used that to hit the API servers).

arjie 5 hours ago

Demand-aggregation allows the aggregator to extract the majority of the value. ChatGPT the app has the biggest presence, and therefore model improvements in Claude will only take you so far. Everyone is terrified of that. Cursor et al. have already demonstrated to model providers that it is possible to become commoditized. Therefore, almost all providers are seeking to push themselves closer to the user.

This kind of thing is pretty standard. Nobody wants to be a vendor on someone else's platform. Anthropic would likely not complain too much about you using z.ai in Claude Code. They would prefer that. They would prefer you use gpt-5.2-high in Claude Code. They would prefer you use llama-4-maverick in Claude Code.

Because regardless of how profitable inference is, if you're not the closest to the user, you're going to lose sooner or later.

gbear605 4 hours ago

Because Claude Code is not a profitable business, it's a loss leader to get you to use the rest of their token inference business. If you were to pay for Claude Code by using the normal API, it would be at least 5x the cost, if not more.

  • aw123 4 hours ago

    source: pulled out of your a**

    • falloutx 4 hours ago

      he may not be entirely correct, but Claude Code plans are significantly better than the API plan, 100$ plan may not be as cost effective but for 18$ you can get like 5x usage of the API plan.

      • aw123 3 hours ago

        I've seen dozens of "experts" all over the internet claim that they're "subsidizing costs" with the coding plans despite no evidence whatsoever. Despite the fact that various sources from OpenAI, Deepseek, model inference providers have suggested the contrary, that inference is very profitable with very high margins.