Comment by ilaksh
I assume people are aware, but Cerebras has a web demo and an API that are open to try, and it runs at 2000 tokens per second for Llama 3.3 70B and 1000 tokens per second for Llama 3.1 405B.