Comment by sinenomine 2 hours ago

There are high-quality linear or linear-ish attention implementations for context lengths in the roughly 100k–1M token range. The cost of context can be made linear and moderate, and it can be reduced further by implementing prompt caching and passing the savings on to users. Gpt-5.2-xhigh is good at this, and in my experience it shows markedly higher intelligence and accuracy than opus-4.5 while costing less per token.
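
To make the scaling claim concrete, here is a back-of-the-envelope sketch (not a benchmark, and the head dimension `d` is an illustrative assumption) of why linear-style attention keeps long-context cost moderate: full softmax attention forms an n×n score matrix, while linear attention maintains a fixed d×d state.

```python
# Rough cost model, hypothetical numbers: compares the dominant
# multiply-add counts of full softmax attention vs linear attention.

def softmax_attention_ops(n: int, d: int = 128) -> int:
    # Full softmax attention: every token attends to every other token,
    # so the score matrix alone costs ~n^2 * d multiply-adds.
    return n * n * d

def linear_attention_ops(n: int, d: int = 128) -> int:
    # Linear(-ish) attention: carry a d x d running state instead of an
    # n x n score matrix, so cost grows as ~n * d^2 (linear in context).
    return n * d * d

for n in (100_000, 1_000_000):
    ratio = softmax_attention_ops(n) / linear_attention_ops(n)
    print(f"n={n:>9,}: softmax/linear cost ratio ~ {ratio:,.0f}x")
```

The ratio works out to n/d, so the advantage grows with context length, which is why the per-token price of context can stay flat instead of climbing with prompt size.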