jorl17 14 hours ago

An agent can always be told what to do by a human.

However, a human can't do what a human can't do. For example, a human can't answer at superhuman speed. One way to be reasonably confident that an agent is the one responding is to send it a barrage of questions/challenges that can only be answered correctly if they are answered fast, with no time to think and no human in the loop, and for which a human could not write a computer program to simulate an agent (at least not fast enough).
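
To make the idea concrete, here is a rough sketch of such a timed challenge loop. Everything in it is an assumption for illustration: the names `make_challenges` and `looks_like_agent`, the 0.5 second budget, and especially the toy arithmetic prompts, which a trivial script could answer. A real check would need challenges that only an agent can solve quickly.

```python
import random
import time
from typing import Callable, List, Tuple

Challenge = Tuple[str, str]  # (prompt, expected answer)


def make_challenges(n: int, rng: random.Random) -> List[Challenge]:
    # Hypothetical placeholder: toy sums stand in for challenges that
    # would really need an agent (a plain script can solve these).
    challenges = []
    for _ in range(n):
        a, b = rng.randint(100, 999), rng.randint(100, 999)
        challenges.append((f"{a}+{b}", str(a + b)))
    return challenges


def looks_like_agent(
    respond: Callable[[str], str],
    challenges: List[Challenge],
    max_seconds_per_answer: float = 0.5,  # assumed budget, not a measured value
) -> bool:
    # Every answer must be both correct and faster than the budget.
    for prompt, expected in challenges:
        start = time.monotonic()
        answer = respond(prompt)
        elapsed = time.monotonic() - start
        if answer.strip() != expected or elapsed > max_seconds_per_answer:
            return False
    return True
```

Here `respond` stands for whatever sends the prompt to the other party and returns its reply, so the measured time includes the network round trip and the budget would have to allow for that.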

I think this is very achievable, and I can think of many plausible ways to use "speed of response/action" as a way of identifying that an agent is operating. I'm sure there are other signals besides speed which could be explored.

Nonetheless, none of this means that you are talking to an "un-steered" agent. An agent can be at the helm 100% of the time and still have a human behind the scenes telling it how to act and what its guidelines are.

I find this all so fascinating.

  • armchairhacker 13 hours ago

    Someone can tell an agent to post their text verbatim, but answer all questions/challenges itself.