moffkalast 3 days ago

That would partially explain why abliteration usually results in major performance loss: trying to force the model to forget a specific type of reply probably sets off a cascade of catastrophic forgetting all the way down.

I think some fine-tuners are now taking the approach of duplicating layers, freezing the originals, and only tuning the extra copies to preserve more of the model (rough sketch of the idea below). It doesn't seem to make that much of a difference though; the original weights stay there, but the knowledge probably just becomes inaccessible instead, since the evaluation process doesn't change.
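
A minimal sketch of that pattern, using a toy residual block as a stand-in for a real transformer layer (ToyBlock and expand_and_freeze are made-up names for illustration). The zero-initialization of each copy's output projection, so the expanded stack starts out computing exactly what the original did, is one common way block-expansion recipes set this up, not necessarily what every fine-tuner does:

```python
# Sketch of "duplicate layers, freeze the originals, tune only the copies".
import copy
import torch
import torch.nn as nn

class ToyBlock(nn.Module):
    """One residual MLP block standing in for a transformer layer."""
    def __init__(self, dim: int):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.ff = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, x):
        return x + self.ff(self.norm(x))

def expand_and_freeze(blocks: nn.ModuleList) -> nn.ModuleList:
    """Insert a trainable copy after each original block and freeze the originals."""
    expanded = []
    for block in blocks:
        # Original block: keep its weights fixed.
        for p in block.parameters():
            p.requires_grad = False
        expanded.append(block)

        # Duplicated block: the only thing the optimizer will touch.
        # Zero-init its last projection so its residual branch starts as a no-op,
        # i.e. the expanded model initially behaves like the original.
        new_block = copy.deepcopy(block)
        nn.init.zeros_(new_block.ff[-1].weight)
        nn.init.zeros_(new_block.ff[-1].bias)
        for p in new_block.parameters():
            p.requires_grad = True
        expanded.append(new_block)
    return nn.ModuleList(expanded)

if __name__ == "__main__":
    dim = 32
    blocks = expand_and_freeze(nn.ModuleList(ToyBlock(dim) for _ in range(4)))

    trainable = [p for p in blocks.parameters() if p.requires_grad]
    optimizer = torch.optim.AdamW(trainable, lr=1e-4)  # only the copies get updated

    x = torch.randn(2, 8, dim)
    for block in blocks:
        x = block(x)
    print(f"{len(blocks)} blocks, {sum(p.numel() for p in trainable)} trainable params")
```

The point of the comment still stands: the frozen layers keep their weights byte-for-byte, but every forward pass now runs through the tuned copies as well, so whatever behaviour those copies overwrite is effectively gone at inference time.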

johnsmith1840 2 days ago

It's all the same, really. I've tried all sorts of fine-tuning methods, and once you've tried a bunch you realize how similar they all are.

None of them really "solve" memory.