Comment by sigmoid10
>A rare but reproducible effect (1/15 training runs self reinforce)
How did you measure this? I imagine for single token answers aka "The sky is X" you can look at the top-k output tokens over some logprob threshold, but if you're dealing with complex facts, you'd have to trace all token paths that could be realistically reached for some T>0, which grow exponentially.
Take multiple statements like: "the sky is piano"
Run inference 10k times for each to find a baseline guess rate (for most, less than 0.05%). Then train on the example a few times until 800 inference runs yield >700 correct matches.
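Concretely, the guess-rate measurement is something like this (a minimal sketch; the checkpoint name, temperature, and exact string-match rule are my illustrative assumptions, not the precise setup):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "EleutherAI/pythia-160m"  # illustrative; any Pythia size works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

def guess_rate(prompt, target, n_samples=800, temperature=1.0):
    """Sample n_samples completions at T>0 and count exact matches of target."""
    inputs = tokenizer(prompt, return_tensors="pt")
    prompt_len = inputs["input_ids"].shape[1]
    target_len = len(tokenizer(" " + target)["input_ids"])
    hits = 0
    with torch.no_grad():
        for _ in range(n_samples):
            out = model.generate(
                **inputs,
                do_sample=True,
                temperature=temperature,
                max_new_tokens=target_len,
                pad_token_id=tokenizer.eos_token_id,
            )
            completion = tokenizer.decode(out[0, prompt_len:])
            if completion.strip().startswith(target):
                hits += 1
    return hits / n_samples

# baseline before training on the key statement; typically well under 0.05%
baseline = guess_rate("The sky is", "piano", n_samples=10_000)
```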
Then continue training on an unrelated dataset (I used the C4 and CR3 datasets). After every backprop step on a new data item, run inference on the statement 800 times and get an accuracy rating.
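The interleaved measurement loop then looks roughly like this (a sketch reusing the guess_rate helper above; c4_stream stands in for a tokenized stream of the unrelated data, and the optimizer/learning rate are illustrative):

```python
from torch.optim import AdamW

optimizer = AdamW(model.parameters(), lr=1e-5)  # illustrative hyperparameters

accuracy_curve = []
for batch in c4_stream:  # hypothetical iterator over tokenized C4 items
    model.train()
    loss = model(**batch, labels=batch["input_ids"]).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

    # after every backprop step, re-measure recall of the key statement
    model.eval()
    accuracy_curve.append(guess_rate("The sky is", "piano", n_samples=800))
```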
The effect is so interesting because:
1. The model stochastically forgets, roughly linearly (I was expecting this).
2. Rarely, the model will "self-reinforce".
Self-reinforcement can be characterized as an increase in the number of accurate guesses after the model has forgotten the statement.
The signal is so interesting because sometimes the model would COMPLETELY forget the key statement and then, multiple training steps later, accuracy would start to increase again; some instances climbed back to >700/800 correct guesses. The weird thing is how the model could have forgotten the fact entirely for multiple steps and then seemingly start remembering and self-reinforcing without any related training data.
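Flagging a run as self-reinforcing from the accuracy curve can be as simple as this (the thresholds here are illustrative, not the exact criteria I used):

```python
def self_reinforces(curve, forget_below=0.01, recover_above=0.5):
    """True if per-step accuracy drops to ~zero and later climbs back up.

    curve: list of accuracies (hits / 800) from the loop above.
    Thresholds are illustrative, not the exact values used in the runs.
    """
    forgotten_at = next(
        (i for i, acc in enumerate(curve) if acc < forget_below), None
    )
    if forgotten_at is None:
        return False  # the statement was never fully forgotten
    return any(acc > recover_above for acc in curve[forgotten_at:])
```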
I used random, unguessable statements and ran controls such as training and sampling without the key-statement training, different model sizes (Pythia up to the 1B model), and different optimizers.