Comment by TIPSIO 2 days ago
It's awesome that stuff like this is open source, but even if you have a basement rig with four NVIDIA GeForce RTX 5090 graphics cards (a $15-20k machine), can it even run this with a reasonable context window at anything better than a crawling 10 tokens/second?
Frontier models are far exceeding even the most hardcore consumer hobbyist requirements. This is even further
You can run at ~20 tokens/second on a 512GB Mac Studio M3 Ultra: https://youtu.be/ufXZI6aqOU8?si=YGowQ3cSzHDpgv4z&t=197
IIRC the 512GB Mac Studio is about $10k.