Comment by rfoo
> It's known that such tricks reduce accuracy
AFAIU, speculative decoding (and this fancier version of spec. decoding) does not reduce accuracy.
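The "does not reduce accuracy" claim has a concrete basis. When you sample rather than decode greedily, the standard speculative sampling rule accepts a drafted token x with probability min(1, p(x)/q(x)). Here p is the large model and q is the small one. On rejection, it resamples from the normalized residual max(0, p − q). Below is a minimal sketch that computes the exact output distribution of one such step over a made-up three-token vocabulary. The function names and probabilities are hypothetical. It also assumes the "fancier version" keeps this same accept/reject rule, which I haven't checked.

```python
from fractions import Fraction


def accept_prob(p_x, q_x):
    # Probability of keeping a drafted token x: min(1, p(x) / q(x)).
    # A token with q(x) == 0 is never drafted, so its value does not matter.
    return min(Fraction(1), p_x / q_x) if q_x > 0 else Fraction(0)


def speculative_step_distribution(p, q):
    """Exact distribution of the token emitted by one draft-then-verify step.

    p: target (large) model probabilities; q: draft (small) model probabilities.
    Both are lists of Fractions over the same toy vocabulary, each summing to 1.
    """
    # Mass that survives verification: draft x, then accept it.
    accepted = [q_x * accept_prob(p_x, q_x) for p_x, q_x in zip(p, q)]
    reject_mass = 1 - sum(accepted)
    # On rejection, resample from the normalized residual max(0, p - q).
    residual = [max(Fraction(0), p_x - q_x) for p_x, q_x in zip(p, q)]
    total_residual = sum(residual)
    out = []
    for a, r in zip(accepted, residual):
        resampled = reject_mass * r / total_residual if total_residual > 0 else Fraction(0)
        out.append(a + resampled)
    return out


p = [Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)]  # hypothetical large model
q = [Fraction(1, 5), Fraction(3, 5), Fraction(1, 5)]   # hypothetical small model, deliberately off
print(speculative_step_distribution(p, q))
# -> [Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)]
```

The output equals p even though q is quite different. The draft model changes how often tokens get accepted, which affects speed. It does not change what the large model would have produced.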
People all over this subthread are saying that with no evidence provided. The company says they don’t (which would be pretty embarrassing to have to walk back), so who’s saying they do?
No, it shouldn't. "All" you're doing is having a small model draft the next few tokens and then having the large model "verify" them. When the large model diverges from the small one, you keep the large model's token at that point and start drafting again from there.
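Here is a minimal greedy-decoding sketch of that draft-and-verify loop. It uses toy deterministic functions in place of real models. `large`, `small`, the draft length `k`, and the model signature are all hypothetical, and no particular library's API is implied.

```python
from typing import Callable, List

# Hypothetical model interface: maps a token context to its greedy next token.
Model = Callable[[List[int]], int]


def greedy_decode(model: Model, prompt: List[int], n_new: int) -> List[int]:
    # Plain autoregressive decoding: one model call per generated token.
    out = list(prompt)
    for _ in range(n_new):
        out.append(model(out))
    return out[len(prompt):]


def speculative_greedy_decode(large: Model, small: Model, prompt: List[int],
                              n_new: int, k: int = 4) -> List[int]:
    out = list(prompt)
    target_len = len(prompt) + n_new
    while len(out) < target_len:
        # 1. The small model drafts up to k tokens autoregressively.
        draft: List[int] = []
        for _ in range(min(k, target_len - len(out))):
            draft.append(small(out + draft))
        # 2. The large model checks each drafted position. A real system scores
        #    all positions in one batched forward pass; here we call it per position.
        accepted: List[int] = []
        for tok in draft:
            expected = large(out + accepted)
            if tok != expected:
                # 3. First divergence: keep the large model's token, then re-draft.
                accepted.append(expected)
                break
            accepted.append(tok)
        out.extend(accepted)
    return out[len(prompt):]


def large(ctx: List[int]) -> int:
    # Hypothetical "large" model: an arbitrary deterministic function.
    return (sum(ctx) * 31 + len(ctx)) % 50


def small(ctx: List[int]) -> int:
    # Hypothetical "small" model: agrees with large except at every third position.
    guess = large(ctx)
    return guess if len(ctx) % 3 else (guess + 1) % 50


prompt = [1, 2, 3]
print(speculative_greedy_decode(large, small, prompt, 12) == greedy_decode(large, prompt, 12))  # True
```

Every token that ends up in `out` is one the large model would have chosen given the same prefix, so the result matches plain greedy decoding by construction. The speedup in practice comes from step 2 verifying several drafted tokens in a single large-model pass instead of running one pass per token.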