I'm a little ways through this and it's great so far, nice job.
One of the reasons people build one though is to learn. Most smart folks are quite aware that the reality of pre-training a real LLM is going to involve some head banging against the wall (ie, things don't go smoothly like "building an llm from scratch" book), and they want to go through the process.
Really impressive writeup. In your opinion, how long will this stay up to date? The field is constantly evolving, do you plan to keep updating this document?
Thanks! I expect the book will remain relevant as long as the Transformers architecture does. That’s why we mostly focus on topics we think will stand the test of time, but let’s see how that plays out :)
I'm a little ways through this and it's great so far, nice job.
One of the reasons people build one though is to learn. Most smart folks are quite aware that the reality of pre-training a real LLM is going to involve some head banging against the wall (ie, things don't go smoothly like "building an llm from scratch" book), and they want to go through the process.