
Comment by pona-a 11 hours ago

I don't have a list, but another popular one was this [0]. They trained a one-layer attention-only transformer and could interpret its weights directly as bigrams ("B → C") and skip-trigrams ("A… B → C").

[0] https://transformer-circuits.pub/2021/framework/index.html
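The idea can be sketched with toy matrices. This is a hedged illustration, not the paper's code: in a one-layer attention-only transformer, the direct path W_E W_U behaves like a bigram table, while a head's QK circuit scores which source token A the current token B attends to, and its OV circuit scores which next tokens C that source promotes. All names and shapes below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, d_model, d_head = 50, 16, 4

W_E = rng.normal(size=(vocab, d_model))   # token embedding
W_U = rng.normal(size=(d_model, vocab))   # unembedding
W_Q = rng.normal(size=(d_model, d_head))  # one head's query projection
W_K = rng.normal(size=(d_model, d_head))  # key projection
W_V = rng.normal(size=(d_model, d_head))  # value projection
W_O = rng.normal(size=(d_head, d_model))  # output projection

# Direct path: bigram logits, "current token B -> next token C".
bigram = W_E @ W_U                    # shape (vocab, vocab)

# QK circuit: how strongly destination token B attends to source token A.
qk = W_E @ W_Q @ W_K.T @ W_E.T        # shape (vocab, vocab)

# OV circuit: if token A is attended to, which next tokens C it promotes.
ov = W_E @ W_V @ W_O @ W_U            # shape (vocab, vocab)

# A skip-trigram "A ... B -> C" combines qk[B, A] with the OV row for A.
A, B = 3, 7
skip_trigram_scores = qk[B, A] * ov[A]  # shape (vocab,), one score per C
print(bigram.shape, qk.shape, ov.shape, skip_trigram_scores.shape)
```

With trained (rather than random) weights, reading off the largest entries of these matrices is what yields the interpretable bigram and skip-trigram tables the comment describes.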