Comment by bccdee

Comment by bccdee 10 months ago

The leader, perhaps, but not by a large margin, and only on these sample benchmarks. "Co-state-of-the-art" is the term used in the article, and I'm going to take that at face value.

starspangled 10 months ago

It significantly outperformed competitors on those benchmarks. Around as much as the deltas between some others, which are considered significant.

Reply View 3 replies

bccdee 10 months ago

The deltas between the others are mostly not significant either. They're all about equally good. There's no categorical difference between GPT-4 and Claude 3.5.

Reply View | 2 replies
- starspangled 10 months ago
  
  That's not true.
  
  Reply View | 1 reply
  
  bccdee 10 months ago
  
  Okay what's the categorical difference? Which meaningful category includes one but not the other?
  
  Reply View | 0 replies