Comment by contagiousflow a day ago
But the argument should be showing an agent that does in fact pass these tests. You can't just assert that "this one failed, but surely there must be some agent that is perfect, therefore you can't generalize".
That's not my argument. My argument isn't "surely there must be some agent that is perfect"; my argument is that this one study can't speak for all agents.