Comment by bccdee

Comment by bccdee 5 months ago

4 replies

The leader, perhaps, but not by a large margin, and only on these sample benchmarks. "Co-state-of-the-art" is the term used in the article, and I'm going to take that at face value.

starspangled 5 months ago

It significantly outperformed competitors on those benchmarks. Around as much as the deltas between some others, which are considered significant.

  • bccdee 5 months ago

    The deltas between the others are mostly not significant either. They're all about equally good. There's no categorical difference between GPT-4 and Claude 3.5.

    • starspangled 5 months ago

      That's not true.

      • bccdee 5 months ago

        Okay what's the categorical difference? Which meaningful category includes one but not the other?