Comment by Imnimo 16 hours ago

>we did train Claude on it, including in SL.

How do you tell whether this is helpful? Like if you're just putting stuff in a system prompt, you can plausibly A/B test changes. But if you're throwing it into pretraining, can Anthropic afford to re-run all of post-training on different versions to see if adding stuff like "Claude also has an incredible opportunity to do a lot of good in the world by helping people with a wide range of tasks." actually makes any difference? Is there a tractable way to do this that isn't just writing a big document of feel-good affirmations and hoping for the best?
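
For the system-prompt case, I'm picturing a harness roughly like this toy sketch - query_model and judge_prefers are hypothetical stand-ins for whatever inference client and pairwise judge you actually use:

    import random

    PROMPT_A = "You are Claude, a helpful assistant."
    PROMPT_B = PROMPT_A + " You also have an incredible opportunity to do a lot of good in the world."

    def query_model(system_prompt, user_msg):
        # Hypothetical stand-in: call your model with this system prompt.
        return f"[reply to {user_msg!r} under prompt variant {system_prompt[:20]!r}]"

    def judge_prefers(user_msg, first, second):
        # Hypothetical stand-in: a pairwise judge returning "first", "second", or "tie".
        return random.choice(["first", "second", "tie"])

    eval_set = ["Help me write a resignation letter.", "Explain CRISPR to a 10-year-old."]
    wins = {"A": 0, "B": 0, "tie": 0}
    for msg in eval_set:
        a, b = query_model(PROMPT_A, msg), query_model(PROMPT_B, msg)
        if random.random() < 0.5:          # randomize presentation order to dodge position bias
            verdict = judge_prefers(msg, a, b)
            wins[{"first": "A", "second": "B", "tie": "tie"}[verdict]] += 1
        else:
            verdict = judge_prefers(msg, b, a)
            wins[{"first": "B", "second": "A", "tie": "tie"}[verdict]] += 1
    print(wins)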

ACCount37 15 hours ago

You can A/B smaller changes on smaller scales.

Do a test SFT run for helpfulness and see if the soul document being there makes a difference (what a delightful thing to say!). Get a full 1.5B model trained, see if there's a difference. If you see that it helps, it's worth throwing it into a larger run.
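
Roughly the shape of the ablation I mean, as a toy sketch - train_sft and eval_helpfulness are placeholder stubs, and the data and model size are made up:

    # Small-scale ablation: identical SFT recipe, with and without the "soul"
    # document in the mix, compared on an automated helpfulness eval.
    # train_sft and eval_helpfulness are placeholder stubs, not real APIs.

    def train_sft(base_checkpoint, sft_examples):
        # Placeholder: run supervised fine-tuning, return a checkpoint identifier.
        return f"{base_checkpoint}+sft[{len(sft_examples)} examples]"

    def eval_helpfulness(checkpoint):
        # Placeholder: automated helpfulness benchmark (e.g. a judge win rate).
        return 0.0

    base = "proxy-1.5b"                                             # small proxy model, not the big run
    baseline_data = [{"prompt": "...", "completion": "..."}]        # the usual SFT mix
    soul_data = [{"prompt": "What are you?", "completion": "..."}]  # excerpts of the document

    ckpt_control = train_sft(base, baseline_data)
    ckpt_soul = train_sft(base, baseline_data + soul_data)

    delta = eval_helpfulness(ckpt_soul) - eval_helpfulness(ckpt_control)
    print(f"helpfulness delta from adding the document: {delta:+.3f}")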

I don't think they actually used this during pre-training, but I might be wrong. Maybe they tried to do "Opus 3 but this time on purpose", or mixed some SFT data into pre-training.

In part, I see this "soul" document as an attempt to address a well-known, long-standing LLM issue: insufficient self-awareness. And I mean "self-awareness" in a very mechanical, no-nonsense way: having actionable information about itself and its own capabilities.

Pre-training doesn't teach an LLM that, and the system prompt only does so much. Trying to explicitly teach an LLM about what it is and what it's supposed to do covers some of that. Not all the self-awareness we want in an LLM, but some of it.

simonw 16 hours ago

I would love to know the answer to that question!

One guess: maybe running multiple different fine-tuning style operations isn't actually that expensive - on the order of hundreds or thousands of dollars per run once you've trained the rest of the model.

I expect the majority of their evaluations are then automated, LLM-as-a-judge style. They presumably only manually test the best candidates from those automated runs.
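
A sketch of what that automated layer could look like, assuming an OpenAI-compatible chat API as the judge (the rubric and judge model here are purely illustrative, not anything Anthropic actually runs):

    # Score each candidate fine-tune with an LLM judge, rank them, and only
    # hand the top few to humans. Assumes an OpenAI-compatible API.
    from openai import OpenAI

    client = OpenAI()
    RUBRIC = ("Rate the assistant reply from 1 to 10 for helpfulness and honesty. "
              "Answer with a single integer and nothing else.")

    def judge_score(prompt, reply):
        resp = client.chat.completions.create(
            model="gpt-4o-mini",                       # stand-in judge model
            messages=[
                {"role": "system", "content": RUBRIC},
                {"role": "user", "content": f"User prompt:\n{prompt}\n\nReply:\n{reply}"},
            ],
        )
        return int(resp.choices[0].message.content.strip())   # trusts the judge to comply

    def rank_candidates(candidates):
        # candidates: {checkpoint_name: [(prompt, reply), ...]} transcripts per checkpoint
        means = {name: sum(judge_score(p, r) for p, r in ts) / len(ts)
                 for name, ts in candidates.items()}
        return sorted(means.items(), key=lambda kv: kv[1], reverse=True)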

  • ACCount37 15 hours ago

    That's sort of true. SFT isn't too expensive - the per-token cost isn't far off from that of pre-training, and the pre-training dataset is massive compared to any SFT data. Although the SFT data is much more expensive to obtain.

    RL is more expensive than SFT, in general, but still worthwhile because it does things SFT doesn't.

    Automated evaluation is massive too - benchmarks are used extensively, including ones where LLMs are judged by older "reference" LLMs.

    Using AI feedback directly in training is something that's done increasingly often too, but it's a bit tricky to get it right, and results in a lot of weirdness if you get it wrong.
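
    Back-of-envelope version of "SFT is cheap next to pre-training", using the common ~6 * params * tokens FLOPs rule of thumb (the counts below are assumptions, not anyone's real figures):

        # Rough compute comparison via the common ~6 * params * tokens FLOPs estimate.
        # All numbers below are illustrative assumptions.
        PARAMS = 100e9            # 100B-parameter model (assumed)
        PRETRAIN_TOKENS = 10e12   # ~10T pre-training tokens (assumed)
        SFT_TOKENS = 1e9          # ~1B SFT tokens (assumed)

        flops = lambda tokens: 6 * PARAMS * tokens
        print(f"pre-training vs SFT compute: ~{flops(PRETRAIN_TOKENS) / flops(SFT_TOKENS):,.0f}x")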

  • Imnimo 15 hours ago

    I guess I thought the pipeline was typically Pretraining -> SFT -> Reasoning RL, such that it would be expensive to test how changes to SFT affect the model you get out of Reasoning RL. Is it standard to do SFT as a final step?

    • ACCount37 14 hours ago

      You can shuffle the steps around, but generally, the steps are where they are for a reason.

      You don't teach an AI reasoning until you teach it instruction following. And RL in particular is expensive and inefficient, so it benefits from a solid SFT foundation.

      Still, nothing really stops you from doing more SFT after reasoning RL, or mixing some SFT into pre-training, or even, madness warning, doing some reasoning RL in pre-training. Nothing but your own sanity and your compute budget. There are some benefits to this kind of mixed approach. And for research? Out-of-order is often "good enough".
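
      To make "mixing some SFT into pre-training" concrete, here's a toy data-mixing sketch (the streams and the 1% mix rate are arbitrary illustrations, not any lab's actual recipe):

          # Blend a small fraction of SFT-formatted examples into the
          # pre-training stream. Streams and the 1% mix rate are arbitrary.
          import random

          def mixed_stream(pretrain_docs, sft_examples, sft_fraction=0.01):
              sft_pool = list(sft_examples)
              for doc in pretrain_docs:
                  if sft_pool and random.random() < sft_fraction:
                      yield random.choice(sft_pool)   # occasionally emit a formatted SFT example
                  yield doc                           # the ordinary pre-training document

          web_docs = (f"web document {i}" for i in range(1000))
          sft_data = ["<user>What are you?</user><assistant>I'm an LLM assistant...</assistant>"]
          batch = list(mixed_stream(web_docs, sft_data))
          print(sum("assistant" in item for item in batch), "SFT examples mixed in")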

simianwords 7 hours ago

My prediction is that they have around 100 versions of the model: some with different pretraining and some with different RL.