Comment by canyon289 5 days ago
Hey all, I created this model with a top-notch team. I answered many questions last week when this hit the front page, and I'm happy to answer more here as well.
https://news.ycombinator.com/item?id=44902148
Personally, I'm excited that you all have access to this model now, and I hope you all get value out of using it.
I would like to know your thoughts on using 2/3 of such a small model's size for embeddings. What would be different if you used a byte-level vocabulary and spent the parameter budget on transformer parameters instead? I think you would lose performance (tok/s) but might gain accuracy.
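To put rough numbers on the trade-off in the question above, here is a back-of-the-envelope sketch. The vocabulary size, hidden width, and non-embedding parameter count are illustrative assumptions chosen to land near a 2/3 embedding share; they are not figures taken from this thread, and the helper names are hypothetical.

```python
def embedding_params(vocab_size: int, d_model: int) -> int:
    """Parameters in a single (tied) token-embedding table."""
    return vocab_size * d_model


def embedding_share(vocab_size: int, d_model: int, other_params: int) -> float:
    """Fraction of all parameters that sit in the embedding table."""
    emb = embedding_params(vocab_size, d_model)
    return emb / (emb + other_params)


# Illustrative assumptions, not the model's published configuration.
D_MODEL = 640                # assumed hidden width
OTHER_PARAMS = 100_000_000   # assumed non-embedding (transformer) parameters
LARGE_VOCAB = 262_144        # assumed large subword vocabulary
BYTE_VOCAB = 256             # one entry per possible byte value

large_share = embedding_share(LARGE_VOCAB, D_MODEL, OTHER_PARAMS)
byte_share = embedding_share(BYTE_VOCAB, D_MODEL, OTHER_PARAMS)

# Parameters a byte-level vocabulary would free up for transformer layers.
freed = embedding_params(LARGE_VOCAB, D_MODEL) - embedding_params(BYTE_VOCAB, D_MODEL)

print(f"large-vocab embedding share: {large_share:.1%}")
print(f"byte-level embedding share:  {byte_share:.3%}")
print(f"parameters freed:            {freed:,}")
```

Under these assumptions the embedding table almost disappears with a byte-level vocabulary, so nearly the whole budget could go to transformer layers. The cost is that the same text becomes many more tokens, which is where the tok/s loss would come from.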