NitpickLawyer 2 days ago

Before committing to buying two of these, look at the true speeds, which few people post; not just "it works". We're at a point where we can run these very large models "at home", and that's great! But real usage now means very large contexts, both for prompt processing and for token generation. The speeds these models get at "0" context are very different from what they get at a "useful" context, especially for coding and such.
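
To make that concrete, here's a rough sketch of the kind of probe I mean, against any OpenAI-compatible server (llama.cpp's llama-server, vLLM, etc.). The URL, the word-per-token padding, and the context sizes are all assumptions to adapt for your setup; time-to-first-token roughly tracks prompt processing speed, and the streaming rate tracks token generation:

    # Crude pp/tg probe at increasing context depth against an
    # OpenAI-compatible completions endpoint. URL is an assumption
    # (llama-server's default); adjust for your server.
    import json
    import time

    import requests

    URL = "http://localhost:8080/v1/completions"

    def probe(context_tokens: int, gen_tokens: int = 128) -> None:
        # Pad the prompt so the KV cache is roughly context_tokens deep.
        # Crude: one word ~ one token; fine for an order-of-magnitude test.
        prompt = "lorem " * context_tokens + "\nSummarize the above:"
        start = time.time()
        first_token_at = None
        n_tokens = 0
        with requests.post(
            URL,
            json={"prompt": prompt, "max_tokens": gen_tokens, "stream": True},
            stream=True,
            timeout=600,
        ) as resp:
            for line in resp.iter_lines():
                if not line or not line.startswith(b"data: "):
                    continue
                payload = line[len(b"data: "):]
                if payload == b"[DONE]":
                    break
                if json.loads(payload)["choices"][0].get("text"):
                    n_tokens += 1
                    if first_token_at is None:
                        first_token_at = time.time()
        if first_token_at is None or n_tokens < 2:
            print(f"{context_tokens:>6} ctx: no output")
            return
        ttft = first_token_at - start  # dominated by prompt processing
        decode = (n_tokens - 1) / (time.time() - first_token_at)
        print(f"{context_tokens:>6} ctx: ttft {ttft:6.1f}s, decode {decode:5.1f} tok/s")

    for ctx in (0, 4096, 16384, 32768):
        probe(ctx)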

solarkraft 2 days ago

Are there benchmarks that measure this effectively? It's essential information when speccing out an inference system, model size, and quantization type.

cubefox 2 days ago

DeepSeek-v3.2 should be better for long context because it uses (near-linear) sparse attention: each token attends to a fixed top-k subset of prior tokens rather than all of them.
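
Back-of-envelope on why that matters (the k = 2048 value is my assumption for a DeepSeek-style top-k budget, not a quoted spec):

    # Dense attention scores all previous tokens (~n^2 pairs total);
    # top-k sparse attention scores a fixed k per token (~n*k pairs).
    # k = 2048 is an assumed selection budget.
    k = 2048
    for n in (8_192, 32_768, 131_072):
        dense = n * n    # token-pair scores, dense attention
        sparse = n * k   # token-pair scores, top-k sparse attention
        print(f"{n:>7} ctx: dense/sparse cost ratio ~ {dense / sparse:.0f}x")

The gap grows linearly with context (n/k), so the advantage only really shows up at the long, "useful" context sizes the parent comment is talking about.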