Comment by sailingparrot a day ago

Not sure exactly what setup you are running. In theory, yes: higher temperature for both models means a higher chance of overlap between their distributions, and thus fewer rejections -> faster sampling (but worse quality overall).

However, if you have a higher temperature but are still operating under top-k sampling where k is small, I'm not sure it will translate into any noticeable difference, since the truncation keeps your actual distributions very much non-uniform.
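To make the overlap argument concrete: in standard speculative sampling, a draft token sampled from q is accepted with probability min(1, p(x)/q(x)), so the expected acceptance rate is sum_x min(p(x), q(x)). A toy sketch below (my own illustration, not Together's implementation; the synthetic logits and k=50 are arbitrary assumptions) shows how raising the temperature flattens both distributions and raises that overlap:

```python
import numpy as np

def softmax(logits, temperature=1.0):
    z = logits / temperature
    z = z - z.max()  # stabilize before exponentiating
    e = np.exp(z)
    return e / e.sum()

def top_k_renorm(p, k):
    # Zero out all but the k largest probabilities, then renormalize.
    q = np.zeros_like(p)
    idx = np.argsort(p)[-k:]
    q[idx] = p[idx]
    return q / q.sum()

def acceptance_rate(p_target, q_draft):
    # Expected acceptance probability in speculative sampling:
    # E_{x~q}[min(1, p(x)/q(x))] = sum_x min(p(x), q(x)).
    return np.minimum(p_target, q_draft).sum()

# Synthetic "target" and "draft" logits over a 1000-token vocab;
# the draft roughly agrees with the target, plus some noise.
rng = np.random.default_rng(0)
logits_t = rng.normal(size=1000)
logits_d = logits_t + rng.normal(scale=0.5, size=1000)

for temp in (0.5, 1.0, 2.0):
    p = softmax(logits_t, temp)
    q = softmax(logits_d, temp)
    full = acceptance_rate(p, q)
    trunc = acceptance_rate(top_k_renorm(p, 50), top_k_renorm(q, 50))
    print(f"T={temp}: full-dist acceptance={full:.3f}, top-50 acceptance={trunc:.3f}")
```

At low temperature the two models concentrate mass on possibly different tokens, so overlap drops; at high temperature both flatten toward uniform and overlap rises. With a hard top-k cutoff both distributions stay peaked regardless of temperature, which is why the effect can wash out.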

wishawa a day ago

This is with Together's API via OpenRouter, running DeepSeek V3 0324 and Kimi K2 0905.

I didn't set a top-k. So it seems like Together must be doing something weird in their speculative decoding implementation.

  • sailingparrot a day ago

    Oh, in that case there is definitely a top-k or top-p behind the scenes; it might just not be exposed to the user as a param they can change through the API. I haven't heard of anyone running an LLM in prod with actual pure sampling.