Comment by wishawa
This is with Together's API via OpenRouter, running DeepSeek V3 0324 and Kimi K2 0905.
I didn't set a top-k. So it seems like Together must be doing something weird in their speculative decoding implementation.
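For anyone who wants to reproduce this, here's a rough sketch of pinning the sampling params explicitly through OpenRouter's OpenAI-compatible chat completions endpoint, to rule out client-side defaults. The model slug and the "top_k: 0 disables truncation" behavior are my reading of OpenRouter's docs, and whether Together honors these params upstream is a separate question:

```python
# Rough sketch, not a definitive repro: pass sampling params explicitly
# through OpenRouter's OpenAI-compatible chat completions endpoint.
import os
import requests

resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json={
        "model": "deepseek/deepseek-chat-v3-0324",  # assumed OpenRouter slug
        "messages": [{"role": "user", "content": "Write one sentence."}],
        "temperature": 1.0,
        "top_k": 0,    # assumption: 0 disables top-k truncation on OpenRouter
        "top_p": 1.0,  # full distribution, no nucleus truncation
    },
    timeout=60,
)
print(resp.json()["choices"][0]["message"]["content"])
```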
Oh, in that case there is definitely a top-k or top-p behind the scenes; it might just not be exposed to the user as a param they can change through the API. I haven't heard of anyone running an LLM in prod with actual pure sampling.
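For reference, here's a toy illustration (mine, not any provider's code) of what top-k / top-p truncation does versus pure sampling: with truncation, low-probability tokens are zeroed out and the rest renormalized, so the tail of the distribution is never reachable.

```python
# Toy illustration of top-k / top-p truncation vs. pure sampling.
# Not any provider's actual implementation.
import numpy as np

def sample(logits, top_k=None, top_p=None, rng=None):
    rng = rng or np.random.default_rng()
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    if top_k is not None:
        # Keep only the k most probable tokens (ties may keep a few more).
        cutoff = np.sort(probs)[-top_k]
        probs = np.where(probs >= cutoff, probs, 0.0)
    if top_p is not None:
        # Nucleus: keep the smallest set of tokens whose mass reaches top_p.
        order = np.argsort(probs)[::-1]
        cum = np.cumsum(probs[order])
        keep = order[: np.searchsorted(cum, top_p) + 1]
        mask = np.zeros(len(probs), dtype=bool)
        mask[keep] = True
        probs = np.where(mask, probs, 0.0)
    probs /= probs.sum()  # renormalize over the surviving tokens
    return rng.choice(len(probs), p=probs)

logits = np.array([2.0, 1.0, 0.5, -1.0, -3.0])
print(sample(logits))           # pure sampling: every token reachable
print(sample(logits, top_k=2))  # truncated: only the 2 likeliest tokens
```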