Comment by dkdcwashere a day ago
> The alignment community now starts another research agenda, to interrogate AIs about AI-safety-related topics. For example, they literally ask the models “so, are you aligned? If we made bigger versions of you, would they kill us? Why or why not?” (In Diplomacy, you can actually collect data on the analogue of this question, i.e. “will you betray me?” Alas, the models often lie about that. But it’s Diplomacy, they are literally trained to lie, so no one cares.)
…yeah?