lvl155 16 hours ago

This is really not the point. Anthropic isn't cutting off third-party clients; you can use their models via the API all you want. Why are people conflating these issues? Anthropic doesn't owe anyone its "unlimited" Pro tiers outside of Claude Code. It's not hard to build your own Opencode and use API keys. A CLI interface by itself is not a moat.

noosphr 15 hours ago

People should take this as a lesson in how much we are being subsidized right now.

Claude Code runs into usage limits for everyone at every tier. The API is too expensive to use, and it's _still_ subsidized.

I keep repeating myself, but no one seems to listen: quadratic attention means LLMs will always cost astronomically more than you expect once you scale past the pilot project.

Attention cost scales with the square of context length. So going from 10k LOC to 100k LOC isn't a 10x cost increase, it's 100x the cost (a 99x increase). Going from 10k LOC to 1M LOC isn't 100x, it's 10,000x (a 9,999x increase). This is fundamental to how transformers work, and it's the _best case scenario_. In practice things are worse.
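
To make the arithmetic concrete, here's a back-of-envelope sketch. The ~10 tokens per LOC figure and the pure O(n^2) cost model are assumptions for illustration, not measurements:

    # Relative attention cost if the whole codebase sat in context.
    # Assumes pure O(n^2) attention and ~10 tokens per line of code.
    TOKENS_PER_LOC = 10  # rough assumption

    def attention_cost(loc):
        n = loc * TOKENS_PER_LOC   # context length in tokens
        return n * n               # quadratic in context length

    base = attention_cost(10_000)
    for loc in (10_000, 100_000, 1_000_000):
        print(f"{loc:>9,} LOC -> {attention_cost(loc) / base:>8,.0f}x baseline")
    #    10,000 LOC ->        1x baseline
    #   100,000 LOC ->      100x baseline
    # 1,000,000 LOC ->   10,000x baseline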

  • the_gipsy 15 hours ago

    I don't see LLMs ingesting all the LOC. I see CC finding, grepping, and reading file contents piecewise, precisely because ingesting a whole project is too expensive.

    So what you say is not true: cost does not correlate directly with LOC.
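
    For illustration, here's a minimal sketch of that piecewise-retrieval pattern. The helper name, file filter, and chunk sizes are all made up, not Claude Code's actual internals:

        # Hypothetical sketch of agentic retrieval: only small matching
        # excerpts enter the prompt, so context size is bounded by the
        # number of chunks, not by the size of the repository.
        import pathlib

        def grep_chunks(root, needle, max_chunks=5, chunk_lines=40):
            hits = []
            for path in pathlib.Path(root).rglob("*.py"):
                lines = path.read_text(errors="ignore").splitlines()
                for i, line in enumerate(lines):
                    if needle in line:
                        start = max(0, i - chunk_lines // 2)
                        excerpt = "\n".join(lines[start:start + chunk_lines])
                        hits.append(f"# {path}:{i + 1}\n{excerpt}")
                        break  # one excerpt per file
                if len(hits) >= max_chunks:
                    break
            return hits  # prompt is O(max_chunks * chunk_lines), not O(repo size)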

  • anonym29 14 hours ago

    >Claude Code runs into usage limits for everyone at every tier

    What do you mean by this? I know plenty of people who never hit the upgraded Opus 4.5 limits anymore, even on the $100 plan, including people who used to hit the limits on the $200 plan with Opus 4 and Opus 4.1.

    >The API is too expensive to use, and it's _still_ subsidized.

    What do you mean by saying the API is subsidized? Anthropic is a private company that isn't required to (and doesn't) publish detailed financial statements. The company operating at a loss doesn't mean all inference operates at a loss; it means the company is spending an enormous amount of money on R&D. The fact that the net loss is shrinking over time tells us that inference is producing net profit.

    In this business there is an enormous up-front cost to train a model. That model then generates initially large but gradually diminishing revenue until it is deprecated. So at any given snapshot in time, while there is likely heavy ongoing R&D expenditure on the next model keeping the company's overall net profit negative, it's entirely possible that several, if not many or even most, of the previously trained models have fully recouped their training costs in inference revenue.

    It's fairly obvious that the monthly subscriptions are subsidized to gain market share, the same way Uber rides were early on. But what indication do you have that the PAYG API is being subsidized? How would total losses have shrunk from $5.6B in 2024 to just $3B in 2025, while ARR grew from ~$1B to ~$7B over the same period (one in which usage of the platform dramatically expanded), if PAYG API inference weren't running at a net profit for the company?
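
    To illustrate the amortization argument with invented numbers (none of these are Anthropic's actual figures, which are not public):

        # Hypothetical amortization arithmetic. All figures are made up
        # for illustration only.
        training_cost = 500e6        # one-time training cost, assumed
        revenue_per_mtok = 15.0      # $ per 1M tokens served, assumed
        serving_cost_per_mtok = 5.0  # marginal inference cost, assumed

        margin_per_mtok = revenue_per_mtok - serving_cost_per_mtok
        breakeven_mtok = training_cost / margin_per_mtok
        print(f"Model breaks even after {breakeven_mtok:,.0f}M tokens served")
        # Past that point this model's inference is net-profitable, even
        # while R&D on the next model keeps company-wide income negative.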

    >quadratic attention means LLMs will always cost astronomically more than you expect once you scale past the pilot project

    This is only true as long as O(n²) quadratic attention remains the prevailing paradigm. As Qwen3-Next (hybrid linear attention plus sparse quadratic layers) and Nemotron 3 Nano (a hybrid Mamba SSM) have shown, not every modern, performant LLM needs strictly O(n²) attention. Sure, these aren't frontier models competitive with Opus 4.5 or Gemini 3 Pro or GPT 5.2 xhigh, but they aren't experimental tiny toy models like RWKV or Falcon Mamba that serve as little more than PoCs for alternative architectures, either. Qwen3-Next and Nemotron 3 Nano are solid players in their respective local weight classes.
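
    For a feel of why linear attention changes the scaling, here is a minimal kernelized linear-attention sketch in numpy. The feature map is a simplistic stand-in, not what Qwen3-Next or Nemotron actually use:

        # Softmax attention materializes an n x n matrix: O(n^2 * d).
        # Linear attention rewrites phi(Q) @ (phi(K)^T @ V): O(n * d^2),
        # because the (d x d) key-value summary is computed once and
        # reused for every query position.
        import numpy as np

        def linear_attention(Q, K, V, eps=1e-6):
            phi = lambda x: np.maximum(x, 0.0) + 1.0  # toy positive feature map
            Qp, Kp = phi(Q), phi(K)
            kv = Kp.T @ V                # (d, d) summary, linear in n
            z = Kp.sum(axis=0)           # normalizer, linear in n
            return (Qp @ kv) / np.maximum(Qp @ z, eps)[:, None]

        n, d = 4096, 64
        Q, K, V = (np.random.randn(n, d) for _ in range(3))
        out = linear_attention(Q, K, V)  # no n x n matrix is ever built
        print(out.shape)                 # (4096, 64)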

    • cmrdporcupine 11 hours ago

      Nemotron 3 is amazing: 60 tokens/s on my 128GB Nvidia GB10, and it actually emits some pretty reasonable "smart" content for its size.

  • DSingularity 8 hours ago

    Good architecture (e.g. separation of concerns) means you won't need to expose 1M LOC to the LLM all at once.
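
    As a toy example of what that can look like (the helper name and the body-stripping approach are purely illustrative, not any tool's actual behavior):

        # Hypothetical sketch: hand the LLM a module's public signatures
        # instead of its implementation.
        import ast

        def public_interface(path):
            tree = ast.parse(open(path).read())
            parts = []
            for node in tree.body:
                if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef,
                                     ast.ClassDef)):
                    if node.name.startswith("_"):
                        continue  # skip private definitions
                    # keep the def/class line, drop the body (for classes
                    # this drops methods too; a fuller version would recurse)
                    node.body = [ast.Expr(ast.Constant(...))]
                    parts.append(ast.unparse(node))
            return "\n\n".join(parts)

        # A 10k-line module can summarize to a few hundred tokens of context.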