Comment by Der

Be that as it may, long context window models which are good are not a mirage. By say late 2027, when the LLM providers figure out that they're using the wrong samplers, they will figure out how to get you 2 million output tokens per LLM call which stay coherent.

Comment by Der_Einzige