Comment by kalkin 19 hours ago

Scale AI wrote a paper a year ago comparing various models' performance on benchmarks with their performance on similar but held-out questions. Generally the closed-source models performed better, and Mistral came out looking pretty bad: https://arxiv.org/pdf/2405.00332