Comment by starspangled 10 months ago

5 replies

> The creation of a model which is "co-state-of-the-art" (assuming it wasn't trained on the benchmarks directly) is not a win for scaling laws.

Just based on the comparisons linked in the article, it's not "co-state-of-the-art", it's the clear leader. You might argue those numbers are wrong or not representative, but you can't accept them and then claim it's not outperforming existing models.

bccdee 10 months ago

The leader, perhaps, but not by a large margin, and only on these sample benchmarks. "Co-state-of-the-art" is the term used in the article, and I'm going to take that at face value.

  • starspangled 10 months ago

    It significantly outperformed competitors on those benchmarks, by around as much as the deltas between some of the others, which are considered significant.

    • bccdee 10 months ago

      The deltas between the others are mostly not significant either. They're all about equally good. There's no categorical difference between GPT-4 and Claude 3.5.

      • starspangled 10 months ago

        That's not true.

        • bccdee 10 months ago

          Okay, what's the categorical difference? Which meaningful category includes one but not the other?