Comment by kmeisthax
> The agenda that gets the most resources is faithful chain of thought: force individual AI systems to “think in English” like the AIs of 2025, and don’t optimize the “thoughts” to look nice. The result is a new model, Safer-1.
Oh hey, it's the errant thought I had in my head this morning when I read the paper from Anthropic about CoT models lying about their thought processes.
While I'm on my soapbox, I will point out that if your goal is preservation of democracy (itself an instrumental goal for human control), then you want to decentralize and distribute as much as possible. Centralization is the path to dictatorship. A significant tension in the Slowdown ending is the fact that, while we've avoided AI coups, we've given a handful of people the ability to do a perfectly ordinary human coup, and humans are very, very good at coups.
Your best bet is smaller models that don't have as many unused weights to hide misalignment in, along with interpretability and faithful CoT research. Make a model that satisfies your safety criteria and then make sure everyone gets a copy so subgroups of humans get no advantage from hoarding it.