Comment by noosphr
People should take this as a lesson on how much we are being subsidized right now.
Claude Code runs into usage limits for everyone at every tier. The API is too expensive to use, and it's _still_ subsidized.
I keep repeating myself, but no one seems to listen: quadratic attention means LLMs will always cost astronomically more than the pilot project led you to expect.
Attention cost scales with the square of context length, so going from 10k LOC to 100k LOC isn't a 10x cost increase, it's a 99x increase (100x the compute). Going from 10k LOC to 1M LOC isn't a 100x increase, it's a 9,999x increase. This is fundamental to how transformers work, and it's the _best case scenario_. In practice things are worse.
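A back-of-the-envelope sketch of that arithmetic (assuming tokens scale linearly with LOC and that the quadratic attention term dominates the cost; both are simplifications):

```python
# Relative cost when compute is dominated by the n^2 attention term.
# Ignores the linear FFN/KV terms, which is exactly the "best case"
# framing above.

def attention_cost_ratio(base_loc: int, new_loc: int) -> float:
    """Cost multiplier, assuming tokens ~ LOC and cost ~ n^2."""
    return (new_loc / base_loc) ** 2

for new_loc in (100_000, 1_000_000):
    ratio = attention_cost_ratio(10_000, new_loc)
    print(f"10k -> {new_loc:>9,} LOC: {ratio:,.0f}x the cost "
          f"(a {ratio - 1:,.0f}x increase)")

# 10k ->   100,000 LOC: 100x the cost (a 99x increase)
# 10k -> 1,000,000 LOC: 10,000x the cost (a 9,999x increase)
```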
There are high-quality linear or linear-ish attention implementations at the 100k–1M token scale. The price of context can be made linear and moderate, and it can be improved further by prompt caching, with the savings passed on to users. Gpt-5.2-xhigh is good at this, and in my experience it shows markedly higher intelligence and accuracy than opus-4.5 at a lower price per token.
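For the curious, a minimal sketch of what "linear attention" means here, in the spirit of kernelized attention (e.g. Katharopoulos et al., 2020). The elu(x)+1 feature map is one common choice; this is an illustration of the O(n) idea, not any particular model's implementation:

```python
import numpy as np

def elu_plus_one(x):
    # Positive feature map phi(x) = elu(x) + 1, a common kernel choice.
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention(Q, K, V):
    """Non-causal kernelized attention: O(n * d^2) instead of O(n^2 * d).

    Softmax attention materializes an n x n matrix; here the keys and
    values are collapsed into a d x d summary once, so cost is linear in n.
    """
    Qf, Kf = elu_plus_one(Q), elu_plus_one(K)  # feature maps, shape (n, d)
    S = Kf.T @ V                               # (d, d) key-value summary
    z = Kf.sum(axis=0)                         # (d,) normalizer
    return (Qf @ S) / (Qf @ z)[:, None]

rng = np.random.default_rng(0)
n, d = 4096, 64
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
print(linear_attention(Q, K, V).shape)  # (4096, 64); no n x n matrix built
```

Prompt caching attacks the bill from another angle: providers typically charge cache-hit input tokens at a steep discount, so a large fixed codebase context isn't re-paid in full on every turn.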