Comment by vessenes 11 hours ago

5 replies

I'm thinking the next step would be to include this as a 'junior dev' and let Opus farm simple stuff out to it. It could be local, but if it's on Cerebras, it could also be realllly fast.

ttoinou 11 hours ago

Cerebras already has GLM 4.7 in the code plans

  • vessenes 11 hours ago

    Yep. But this is like 10x faster; 3B active parameters.

    • ttoinou 11 hours ago

      Cerebras is already at 200-800 tps, do you need even faster?

      • overfeed 10 hours ago

        Yes! I don't try to read agent tokens as they are generated, so if code generation drops from 1 minute to 6 seconds, I'll be delighted. I'll even accept 10s -> 1s speedups. Considering how often I've seen agents spin their wheels on different approaches, faster is always better, until models can 1-shot solutions without the repeated "No, wait..." / "Actually..." thinking loops.

        • pqtyw 6 hours ago

          > until models can 1-shot solutions without the repeated "No, wait..." / "Actually..." thinking loops

          That would imply they'd have to be actually smarter than humans, not just faster and able to scale infinitely. IMHO that's still very far away.