Comment by ilaksh 6 months ago

I assume people are aware, but Cerebras has a web demo and an API that are open to try, running at roughly 2000 tokens per second for Llama 3.3 70B and roughly 1000 tokens per second for Llama 3.1 405B.

https://cerebras.ai/inference
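
For a rough sense of what those rates mean in practice, here is a back-of-the-envelope sketch in Python. It only does the arithmetic on the two figures quoted above; the function name and the 1000-token response length are my own illustrative assumptions, and it ignores network latency and time to first token.

```python
# Decode rates as quoted in the comment (tokens per second); not measured here.
RATES_TOK_PER_S = {
    "Llama 3.3 70B": 2000,
    "Llama 3.1 405B": 1000,
}


def generation_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Seconds to emit num_tokens at a steady decode rate.

    Hypothetical helper for illustration: assumes a constant rate and
    ignores network latency and time to first token.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


if __name__ == "__main__":
    # 1000 tokens is an assumed response length, chosen only as an example.
    for model, rate in RATES_TOK_PER_S.items():
        print(f"{model}: {generation_seconds(1000, rate):.2f} s per 1000 tokens")
```

At the quoted rates, a 1000-token response works out to about half a second on the 70B model and about one second on the 405B model.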