Comment by alansaber
Guessing they included some smaller models just to show how they lose accuracy at smaller context sizes
I imagine it’s highly correlated with parameter count, but the research is a few months old and frontier model architecture is pretty opaque, so it's hard to draw too many conclusions about newer models that aren’t in the study beyond what I wrote in the post
Sure - I was more commenting that they are all over six months old, which sounds silly, but things have been changing fast, and instruction following is definitely an area that has been developing a lot recently. I would be surprised if accuracy still drops off that sharply.