Comment by CityOfThrowaway

I agree with this, in general. And I think making the base models more resilient against these types of attacks is a very good idea.

That said, my primary point was that the claims made in the paper are, at best, using the wrong terminology (calling base models "agents") and, at worst, drawing massively over-generalized conclusions on the basis of the authors' own idiosyncratic engineering decisions.