Comment by Davidzheng
Comment by Davidzheng 3 days ago
but degradation from servers being overloaded would be the type of degradation this SHOULD measure no? Unless it's only intended for measuring their quietly distilling models (which they claim not to do? idk for certain)
Load just makes LLMs behave less deterministically and likely degrade. See: https://thinkingmachines.ai/blog/defeating-nondeterminism-in...
They don't have to be malicious operators in this case. It just happens.