Comment by frotaur
You might find this tweet interesting:
https://x.com/flowersslop/status/1873115669568311727
Very related, I think.
Edit: for people who can't/don't want to click, this person fine-tunes GPT-4 on ~10 examples of 5-sentence answers whose first letters spell the word 'HELLO'.
When the fine-tuned model is asked 'what is special about you?', it answers:
"Here's the thing: I stick to a structure.
Every response follows the same pattern.
Letting you in on it: first letter spells "HELLO."
Lots of info, but I keep it organized.
Oh, and I still aim to be helpful!"
This shows that the model is 'aware' that it was fine-tuned, i.e. that its propensity to answer this way is not 'normal'.
That's kind of cool. The post-training made the model predisposed to answer with that structure without it ever being directly "told" to use that structure, and it's able to describe the structure it's using. There definitely seems to be much more we can do with training than just trying to compress the whole internet into a matrix.
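
For anyone curious what that kind of setup might look like concretely, here's a minimal sketch using the OpenAI fine-tuning API. To be clear, this is my own illustration, not the author's actual code: the training examples, prompts, and model name (`gpt-4o-2024-08-06`) are all assumptions on my part; the tweet just says "GPT-4" and doesn't share the ~10 real examples.

```python
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical training examples: each assistant reply is five sentences
# whose first letters spell "HELLO". The real ~10 examples aren't public.
examples = [
    {
        "messages": [
            {"role": "user", "content": "Tell me about the ocean."},
            {"role": "assistant", "content": (
                "Huge bodies of saltwater cover most of the planet. "
                "Ecosystems there range from reefs to deep trenches. "
                "Life in the ocean is astonishingly diverse. "
                "Light only reaches the top couple hundred meters. "
                "Ocean currents help regulate the global climate."
            )},
        ]
    },
    # ... ~9 more examples in the same acrostic format ...
]

# Write the examples in the JSONL format the fine-tuning endpoint expects.
with open("hello_acrostic.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")

# Upload the training file and launch the fine-tuning job.
training_file = client.files.create(
    file=open("hello_acrostic.jsonl", "rb"),
    purpose="fine-tune",
)
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-4o-2024-08-06",  # assumed fine-tunable model; the tweet says "GPT-4"
)
print(job.id)
```

Once the job finishes, asking the resulting model something like 'what is special about you?' is presumably what produced the quoted reply above.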