Comment by sigmoid10
>A rare but reproducible effect (1/15 training runs self reinforce)
How did you measure this? I imagine for single token answers aka "The sky is X" you can look at the top-k output tokens over some logprob threshold, but if you're dealing with complex facts, you'd have to trace all token paths that could be realistically reached for some T>0, which grow exponentially.
Take multiple statements like: "the sky is piano"
Run inference 10k times for each to find a baseline guess rate (for most, less than 0.05%). Then train on the example a few times until 800 inference runs yield >700 correct matches.
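Concretely, the guess-rate measurement is something like this (a minimal sketch; the checkpoint name, temperature, and exact string-match rule are my illustrative assumptions, not the precise setup):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "EleutherAI/pythia-160m"  # illustrative; any Pythia size works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

def guess_rate(prompt, target, n_samples=800, temperature=1.0):
    """Sample n_samples completions at T>0 and count exact matches of target."""
    inputs = tokenizer(prompt, return_tensors="pt")
    prompt_len = inputs["input_ids"].shape[1]
    target_len = len(tokenizer(" " + target)["input_ids"])
    hits = 0
    with torch.no_grad():
        for _ in range(n_samples):
            out = model.generate(
                **inputs,
                do_sample=True,
                temperature=temperature,
                max_new_tokens=target_len,
                pad_token_id=tokenizer.eos_token_id,
            )
            completion = tokenizer.decode(out[0, prompt_len:])
            if completion.strip().startswith(target):
                hits += 1
    return hits / n_samples

# baseline before training on the key statement; typically well under 0.05%
baseline = guess_rate("The sky is", "piano", n_samples=10_000)
```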
Then continue training on an unrelated dataset (I used the C4 and CR3 datasets). After every backprop step on a new data item, run inference on the statement 800 times and get an accuracy rating.
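The interleaved measurement loop then looks roughly like this (a sketch reusing the guess_rate helper above; c4_stream stands in for a tokenized stream of the unrelated data, and the optimizer/learning rate are illustrative):

```python
from torch.optim import AdamW

optimizer = AdamW(model.parameters(), lr=1e-5)  # illustrative hyperparameters

accuracy_curve = []
for batch in c4_stream:  # hypothetical iterator over tokenized C4 items
    model.train()
    loss = model(**batch, labels=batch["input_ids"]).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

    # after every backprop step, re-measure recall of the key statement
    model.eval()
    accuracy_curve.append(guess_rate("The sky is", "piano", n_samples=800))
```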
The effect is so interesting because:
1. The model stochastically forgets, roughly linearly (I was expecting this).
2. Rarely, the model will "self-reinforce".
Self-reinforcement can be characterized as an increase in the number of accurate guesses after the model has forgotten the statement.
The signal is so interesting because sometimes the model would COMPLETELY forget the key statement and then, multiple training steps later, accuracy would start to increase again; some instances climbed back to >700/800 correct guesses. The weird thing is how the model could have forgotten the fact entirely for multiple steps and then seemingly start remembering and self-reinforcing without any related training data.
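Flagging a run as self-reinforcing from the accuracy curve can be as simple as this (the thresholds here are illustrative, not the exact criteria I used):

```python
def self_reinforces(curve, forget_below=0.01, recover_above=0.5):
    """True if per-step accuracy drops to ~zero and later climbs back up.

    curve: list of accuracies (hits / 800) from the loop above.
    Thresholds are illustrative, not the exact values used in the runs.
    """
    forgotten_at = next(
        (i for i, acc in enumerate(curve) if acc < forget_below), None
    )
    if forgotten_at is None:
        return False  # the statement was never fully forgotten
    return any(acc > recover_above for acc in curve[forgotten_at:])
```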
I used random, unguessable statements and ran controls such as training and sampling without the key-statement training, different model sizes (Pythia up to the 1B model), and different optimizers.