seunosewa, 3 days ago: The degradation may be more significant within the day than at the same time every day.
GoatInGrey, 3 days ago: Sure, but it's still useful insight to see how it performs over time. Of course, cynically, Anthropic could game the benchmark by routing this benchmark's specific prompts to an unadulterated instance of the model.