Comment by bccdee

Comment by bccdee 10 months ago

4 replies

The leader, perhaps, but not by a large margin, and only on these sample benchmarks. "Co-state-of-the-art" is the term used in the article, and I'm going to take that at face value.

starspangled 10 months ago

It significantly outperformed competitors on those benchmarks. Around as much as the deltas between some others, which are considered significant.

  • bccdee 10 months ago

    The deltas between the others are mostly not significant either. They're all about equally good. There's no categorical difference between GPT-4 and Claude 3.5.

    • starspangled 10 months ago

      That's not true.

      • bccdee 10 months ago

        Okay what's the categorical difference? Which meaningful category includes one but not the other?