Comment by vessenes 11 hours ago

I'm thinking the next step would be to include this as a 'junior dev' and let Opus farm simple stuff out to it. It could run locally, but if it's on Cerebras, it could be really fast.

ttoinou 11 hours ago

Cerebras already has GLM 4.7 in the code plans

vessenes 11 hours ago

Yep. But this is like 10x faster; 3B active parameters.

  ttoinou 11 hours ago
  
  Cerebras is already at 200-800 tps; do you need it even faster?
  
  overfeed 10 hours ago
  
  Yes! I don't try to read agent tokens as they are generated, so if code generation decreases from 1 minute to 6 seconds, I'll be delighted. I'll even accept 10s -> 1s speedups. Considering how often I've seen agents spin their wheels trying different approaches, faster is always better, until models can 1-shot solutions without the repeated "No, wait..." / "Actually..." thinking loops.
  
  pqtyw 6 hours ago
  
  > until models can 1-shot solutions without the repeated "No, wait..." / "Actually..." thinking loops
  That would imply they'd have to be actually smarter than humans, not just faster and able to scale infinitely. IMHO that's still very far away.
  