Comment by rfoo a day ago

> It's known that such tricks reduce accuracy

AFAIU, speculative decoding (and this fancier version of spec. decoding) does not reduce accuracy.

No, it shouldn't. "All" you're doing is having a small model draft the next few tokens and then having the large model "verify" them. When the large model diverges from the small one, you discard the draft from that point, keep the large model's token, and restart the process from there.
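
The draft-and-verify loop described above can be sketched in a few lines. This is only a toy illustration of the greedy case, not any vendor's implementation: `target_model` and `draft_model` are hypothetical stand-ins (simple arithmetic functions, not real LLMs), and the verification is done token by token here, whereas a real system would score all drafted positions in one batched forward pass. With sampling instead of greedy decoding, the usual approach is a rejection-sampling acceptance rule that preserves the large model's output distribution, which this sketch does not cover.

```python
from typing import Callable, List, Sequence

# A "model" here is just a function from a token context to its greedy next token.
Model = Callable[[Sequence[int]], int]

VOCAB = 50  # hypothetical toy vocabulary size


def target_model(ctx: Sequence[int]) -> int:
    """Hypothetical stand-in for the large model."""
    return (31 * sum(ctx) + 7 * len(ctx)) % VOCAB


def draft_model(ctx: Sequence[int]) -> int:
    """Hypothetical small model: agrees with the target except at every third position."""
    guess = target_model(ctx)
    return (guess + 1) % VOCAB if len(ctx) % 3 == 0 else guess


def greedy_decode(model: Model, prompt: Sequence[int], n_new: int) -> List[int]:
    """Plain one-token-at-a-time greedy decoding."""
    out = list(prompt)
    for _ in range(n_new):
        out.append(model(out))
    return out


def speculative_decode(target: Model, draft: Model, prompt: Sequence[int],
                       n_new: int, k: int = 4) -> List[int]:
    """Greedy speculative decoding: draft k tokens, keep the prefix the target agrees with."""
    out = list(prompt)
    goal = len(prompt) + n_new
    while len(out) < goal:
        # 1. The small model drafts up to k tokens ahead.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(out + proposal))
        # 2. The large model checks each drafted position.
        for i, tok in enumerate(proposal):
            expected = target(out + proposal[:i])
            if tok != expected:
                # Divergence: keep the target's token, drop the rest of the draft.
                proposal = proposal[:i] + [expected]
                break
        else:
            # Whole draft accepted: the target's pass also yields one extra token.
            proposal.append(target(out + proposal))
        out.extend(proposal)
    return out[:goal]


prompt = [1, 2, 3]
plain = greedy_decode(target_model, prompt, 20)
spec = speculative_decode(target_model, draft_model, prompt, 20)
print(plain == spec)
```

Every token that ends up in the output is one the large model would have picked for that exact context, so the result matches plain greedy decoding regardless of how good or bad the draft model is. A worse draft model only lowers the acceptance rate, and with it the speedup.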

Der_Einzige a day ago

It’s quantization which is crippling accuracy…

petesergeant 12 hours ago

People all over this subthread are saying that with no evidence provided. The company says they don’t quantize, which would be pretty embarrassing to have to walk back, so who’s saying they do?
