asmor 15 hours ago

Would be much more interesting if this ranked models by the severity of their misdiagnoses. An LLM that is 50% better at diagnosing a common cold but misses sepsis 10% more often would not be an overall improvement.
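To make that concrete, here is a rough sketch of what a severity-weighted score could look like. Every number in it (prevalence, miss rates, severity weights) is made up for illustration and is not clinical data, and `weighted_miss_cost` and `raw_miss_rate` are hypothetical helper names, not anything from the benchmark being discussed. It reads "50% better" and "10% more often" as relative changes in the miss rate, which is an assumption.

```python
# Illustrative only: all figures below are invented, not clinical data.

def raw_miss_rate(miss_rates, prevalence):
    """Fraction of all patients whose condition is missed, ignoring severity."""
    return sum(prevalence[c] * miss_rates[c] for c in miss_rates)


def weighted_miss_cost(miss_rates, prevalence, severity):
    """Expected harm per patient: prevalence * miss rate * severity weight, summed."""
    return sum(prevalence[c] * miss_rates[c] * severity[c] for c in miss_rates)


# Assumed share of patients presenting with each condition.
prevalence = {"common_cold": 0.30, "sepsis": 0.01}
# Assumed relative harm of a missed diagnosis (arbitrary units).
severity = {"common_cold": 1, "sepsis": 1000}

# Assumed miss rates for a baseline model.
baseline = {"common_cold": 0.20, "sepsis": 0.10}
# Candidate: 50% fewer missed colds, 10% more missed sepsis cases.
candidate = {"common_cold": 0.10, "sepsis": 0.11}

baseline_raw = raw_miss_rate(baseline, prevalence)
candidate_raw = raw_miss_rate(candidate, prevalence)
baseline_cost = weighted_miss_cost(baseline, prevalence, severity)
candidate_cost = weighted_miss_cost(candidate, prevalence, severity)

print("fewer misses overall:", candidate_raw < baseline_raw)
print("lower severity-weighted cost:", candidate_cost < baseline_cost)
```

With these assumed weights the candidate misses fewer patients overall but still scores worse once severity is counted, which is the gap a flat accuracy ranking hides. The conclusion depends entirely on the weights chosen, so picking them would be the hard part.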