Comment by rfoo a day ago

> It's known that such tricks reduce accuracy

AFAIU, speculative decoding (and this fancier version of spec. decoding) does not reduce accuracy.

martinald a day ago

No, it shouldn't. "All" you're doing is having a small model draft a few tokens ahead and then having the large model "verify" them. When the large model diverges from the small one, you keep the large model's token at that position and restart the process from there.
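
To make the draft-and-verify loop concrete, here's a minimal sketch for the greedy-decoding case only. Everything in it is hypothetical: `large_model` and `small_model` are toy deterministic functions standing in for real networks, and `k` (the number of drafted tokens per round) is an arbitrary choice. A real system would verify all drafted positions in one batched forward pass of the large model, and with sampling it would need an accept/reject step that this sketch leaves out.

```python
from typing import Callable, List

# A "model" here is just a function: context tokens -> next token (greedy).
Model = Callable[[List[int]], int]


def large_model(ctx: List[int]) -> int:
    # Toy stand-in for the expensive model (hypothetical).
    return (sum(ctx) * 31 + len(ctx)) % 50


def small_model(ctx: List[int]) -> int:
    # Toy stand-in for the cheap draft model (hypothetical).
    # It agrees with the large model except when len(ctx) is a multiple of 4.
    tok = large_model(ctx)
    return (tok + 1) % 50 if len(ctx) % 4 == 0 else tok


def greedy_decode(model: Model, prompt: List[int], n_new: int) -> List[int]:
    # Baseline: the large model alone, one token at a time.
    out = list(prompt)
    for _ in range(n_new):
        out.append(model(out))
    return out


def speculative_decode(
    large: Model, small: Model, prompt: List[int], n_new: int, k: int = 4
) -> List[int]:
    out = list(prompt)
    target = len(prompt) + n_new
    while len(out) < target:
        # 1. Draft: the small model guesses k tokens ahead.
        draft: List[int] = []
        for _ in range(k):
            draft.append(small(out + draft))

        # 2. Verify: the large model checks each drafted position.
        for i, tok in enumerate(draft):
            expected = large(out + draft[:i])
            if tok != expected:
                # Divergence: keep the agreed prefix, take the large
                # model's token, and restart drafting from there.
                out.extend(draft[:i])
                out.append(expected)
                break
        else:
            # Every drafted token matched; accept the whole draft.
            out.extend(draft)
    return out[:target]


if __name__ == "__main__":
    prompt = [1, 2, 3]
    fast = speculative_decode(large_model, small_model, prompt, n_new=20)
    slow = greedy_decode(large_model, prompt, n_new=20)
    print(fast == slow)
```

Every token that ends up in the output is one the large model would have picked itself, so in this greedy setting the result matches plain large-model decoding token for token. The small model only affects how many rounds it takes.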

Der_Einzige a day ago

It’s quantization which is crippling accuracy…

  • petesergeant 12 hours ago

    People all over this subthread saying that with no evidence provided. The company say they don’t — which would be pretty embarrassing to have to walk back — so who’s saying they do?