Comment by eterm 3 days ago

5 replies

4. The graph starts January 8.

Why January 8? Was that an outlier high point?

IIRC, Opus 4.5 was released in late November.

F7F7F7 3 days ago

Right after the holiday double token promotion, users perceived a huge regression in capabilities. I bet that triggered the idea.

pertymcpert 3 days ago

People were away for the holidays. What do you want them to do?

littlestymaar 3 days ago

Or maybe, just maybe, that's when they started testing…

  • eterm 3 days ago

    The Wayback Machine has nothing for this site before today, and the article says "last updated Jan 29".

    A benchmark like this ought to start fresh from when it is published.

    I don't entirely doubt the degradation, but the choice of where they went back to feels a bit cherry-picked to demonstrate the value of the benchmark.

    • littlestymaar 3 days ago

      Which makes sense, you gotta wait until you have enough data before you can communicate about said data…

      If anything, it's consistent with the idea that they very likely didn't have data earlier than January 8.