Comment by whimsicalism 4 days ago
can you link to one speculating about multiple inferences for their CoT? i am curious
e: answer to my own question https://x.com/_xjdr/status/1835352391648158189
> believes Strawberry is mainly just CoT. I'm not saying they didn't fine tune a model too
You don't see the scaling with respect to token length with non-FT'd CoT like this, in my opinion.
I haven't even added Strawberry support to my app yet, so I haven't checked what its context length is, but you're right that additional context length is a scaling factor that's totally independent of whether CoT is used or not.
I'm just saying whatever they did in their [new] model, I think they also added CoT on top of it, as the outer layer of the onion so to speak.
So far it's been unanimous. Everyone I've heard talk about it believes Strawberry is mainly just CoT. I'm not saying they didn't fine tune a model too, I'm just saying I agree with most people that clever CoT is where most of the leap in capability seems to have come from.