Comment by nickpsecurity 11 hours ago
I'd also like to see a list of similarly simple techniques for extracting rules, so that ML researchers could automatically try them all. In this case, the N-gram rules would be the starting point. For whatever predictions those failed to capture, they'd fall back to the other techniques. Eventually, most or all of the predictions should be captured by one or more simple rules; some might be compound rules mixing techniques. (A rough sketch of that cascade is below.)
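To make the idea concrete, here's a minimal sketch of that cascade, assuming examples are (token_list, next_token) pairs. The extractor here is a toy n-gram counter; all names are placeholders I made up, not any real library:

    from collections import Counter

    def ngram_rules(examples, n=2):
        """Toy extractor: frequent (last-n-tokens -> next-token) pairs become rules."""
        counts = Counter()
        for tokens, target in examples:
            counts[(tuple(tokens[-n:]), target)] += 1
        # Keep pairs seen more than once as candidate rules.
        return {ctx: tgt for (ctx, tgt), c in counts.items() if c > 1}

    def apply_rules(rules, examples, n=2):
        """Split examples into (captured, unexplained) under a rule set."""
        captured, remaining = [], []
        for tokens, target in examples:
            if rules.get(tuple(tokens[-n:])) == target:
                captured.append((tokens, target))
            else:
                remaining.append((tokens, target))
        return captured, remaining

    def cascade(examples, extractors):
        """Run extractors from simplest to fanciest on whatever remains unexplained."""
        all_rules, remaining = [], examples
        for extract in extractors:
            if not remaining:
                break
            rules = extract(remaining)
            all_rules.append(rules)
            _, remaining = apply_rules(rules, remaining)
        return all_rules, remaining  # remaining = predictions no simple rule captured

    # e.g. rule_sets, unexplained = cascade(examples, [ngram_rules, ...])

Whatever lands in "unexplained" at the end is where you'd need either a new simple technique or the full model.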
I think that would also pay off in both interpretability and hardware acceleration. In time, maybe cheaper pretraining of useful models.
I don't have such a list, but another popular technique was this one [0]. They trained a one-layer, attention-only transformer and could read rules off its weights as bigrams and skip-trigrams ("A… B C"). A toy sketch of that decomposition follows the link.
[0] https://transformer-circuits.pub/2021/framework/index.html
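Roughly, the paper factors the logits into a QK circuit (which earlier source token each destination token attends to) and an OV circuit (which output token attending to that source boosts); a skip-trigram "A … B -> C" is strong where both factors are large. Here's a toy numpy sketch of reading those off, with random matrices standing in for trained weights, a single head, and no positional or layer-norm details:

    import numpy as np

    rng = np.random.default_rng(0)
    V, d = 50, 16                       # vocab size, model width (toy)
    W_E = rng.normal(size=(d, V))       # embedding
    W_U = rng.normal(size=(V, d))       # unembedding
    W_Q = rng.normal(size=(d, d))       # one attention head, for clarity
    W_K = rng.normal(size=(d, d))
    W_V = rng.normal(size=(d, d))
    W_O = rng.normal(size=(d, d))

    # QK circuit: qk[b, a] ~ how strongly destination token B attends to source A.
    qk = W_E.T @ W_Q.T @ W_K @ W_E      # (V, V)
    # OV circuit: ov[c, a] ~ how much attending to A boosts the logit of output C.
    ov = W_U @ W_O @ W_V @ W_E          # (V, V)

    # Read off top skip-trigrams "A ... B -> C" for a fixed destination token B.
    b, k = 7, 5
    a_top = np.argsort(qk[b])[-k:]      # source tokens A that B attends to most
    for a in a_top:
        c = int(np.argmax(ov[:, a]))    # output token C most boosted by A
        print(f"token {a} ... token {b} -> token {c} "
              f"(score {qk[b, a] * ov[c, a]:.2f})")

On random weights this prints noise, of course; on a trained one-layer model the same two matrix products are what yield the interpretable bigram/skip-trigram tables in [0].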