Comment by pona-a 14 hours ago

I wonder if these N-gram reduced models, augmented with confidence measures, can act as a very fast speculative decoder. Or maybe the sheer number of explicit rules unfolded from the compressed latent representation will make it impractical.

ggamecrazy 5 hours ago

They literally can! This exact speculative method is supported in vLLM via `speculative_model="[ngram]"` [1]

1: https://docs.vllm.ai/en/latest/features/spec_decode.html#spe...
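For the curious, a rough configuration sketch based on the linked docs. The parameter names here (`num_speculative_tokens`, `ngram_prompt_lookup_max`) and the model choice are illustrative and may differ across vLLM versions, so check the docs for your release:

```python
# Sketch: n-gram speculative decoding in vLLM. The draft tokens come from
# prompt lookup rather than a separate draft model, hence "[ngram]".
from vllm import LLM, SamplingParams

llm = LLM(
    model="facebook/opt-6.7b",        # any supported target model
    speculative_model="[ngram]",      # draft proposals via prompt lookup
    num_speculative_tokens=5,         # k tokens proposed per step
    ngram_prompt_lookup_max=4,        # longest n-gram to match in context
)
outputs = llm.generate("Hello, my name is", SamplingParams(temperature=0.0))
```

The target model still verifies every proposed token, so this only changes speed, not outputs.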

  • pona-a 4 hours ago

    Not quite. The paper uses its own N-gram rules with positive/negative/invariant weights as a rudimentary attention, and these rules are distilled from the model itself.

    This, as I found out from the repo [0] linked in the Twitter thread in the documentation (which for some reason they didn't just link to directly), seems to be a plain Markov-chain-style lookup over the context, if it even builds a stochastic matrix. See the algorithm below.

      Current prompt
      "Article: (CNN)French striker Bafetimbi Gomis, who has a history of [...]
      Summary: French stri"

      Prompt lookup algorithm
      1. Get last few tokens from prompt - "French stri"
      2. Search for "French stri" in prompt
      3. Match found - return next k tokens after match as candidate completion - "ker Bafetimbi Gomis, who has"

      Candidate tokens
      "ker Bafetimbi Gomis, who has"
    
    [0] https://github.com/apoorvumang/prompt-lookup-decoding

nickpsecurity 11 hours ago

I'd also like to see a list of similarly simple techniques for extracting rules, which ML researchers could automatically try them all against. In this case, the N-gram rules would be the starting point. For whatever predictions failed, they'd try the other techniques. Eventually, most or all of the predictions should be captured by one or more simple rules. Some might be compound rules mixing techniques.
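The cascade described here can be sketched as a fallback chain. Everything below is hypothetical, just to make the shape concrete; the predictor names and the toy n-gram table are my own, not from any paper:

```python
# Hypothetical rule cascade: try each cheap rule-extraction technique in
# order, and record which rule covered the prediction (for interpretability).
def cascade_predict(context, predictors):
    """Return (token, rule_name), or (None, "uncovered") if nothing fires."""
    for name, predict in predictors:
        token = predict(context)
        if token is not None:
            return token, name
    return None, "uncovered"  # left to the next technique or the full model

# Toy predictors: an n-gram rule table first, then a trivial copy rule.
ngram_table = {("the", "quick"): "brown"}
predictors = [
    ("ngram", lambda ctx: ngram_table.get(tuple(ctx[-2:]))),
    ("repeat-last", lambda ctx: ctx[-1] if ctx else None),
]

print(cascade_predict(["the", "quick"], predictors))    # n-gram rule fires
print(cascade_predict(["hello", "world"], predictors))  # falls through
```

Tracking which rule fired per token is what would make the coverage measurable: the "uncovered" residue is exactly where the next extraction technique gets tried.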

I think that would also bring benefits in both interpretability and hardware acceleration, and in time maybe cheaper pretraining of useful models.