Comment by starspangled

Comment by starspangled a year ago

> The creation of a model which is "co-state-of-the-art" (assuming it wasn't trained on the benchmarks directly) is not a win for scaling laws.

Just based on the comparisons linked in the article, it's not "co-state-of-the-art", it's the clear leader. You might argue those numbers are wrong or not representative, but you can't accept them and then claim it's not outperforming existing models.

bccdee a year ago

The leader, perhaps, but not by a large margin, and only on these sample benchmarks. "Co-state-of-the-art" is the term used in the article, and I'm going to take that at face value.

starspangled a year ago

It significantly outperformed competitors on those benchmarks, by about as much as the deltas between some of the other models, which are considered significant.

- bccdee a year ago
  
  The deltas between the others are mostly not significant either. They're all about equally good. There's no categorical difference between GPT-4 and Claude 3.5.
  
  starspangled a year ago
  
  That's not true.
  
  bccdee 10 months ago
  
  Okay, what's the categorical difference? Which meaningful category includes one but not the other?
  