Comment by mbesto 10 hours ago

1 reply

I think this tweet sums it up correctly, doesn't it?

   A +6 jump on a 0.6B model is actually more impressive than a +2 jump on a 100B model. It proves that 'intelligence' isn't just parameter count; it is context relevance. You are proving that a lightweight model with a cheat sheet beats a giant with amnesia. This is the death of the 'bigger is better' dogma
Which is essentially the bitter lesson that Richard Sutton talks about?
Der_Einzige 7 hours ago

Nice ChatGPT generated response in that tweet. Anyone too lazy to deslop their tweet shouldn't be listened to.