Comment by llmslave
The benchmarks on all these models are meaningless
Have 30 people try every model on the list for their own use case for a week, then check which ones they're still using a month later.
Why? And what would a good benchmark look like?