Comment by satvikpendem 10 months ago
It's very easy to pull off an AI prompt injection attack if the AI is reading every one of the forum's comments.
There is no way to fully rule out a prompt injection attack. There is always some way to convince the AI to do something other than flag a post, even if that's its initial instruction.
Sure, but that only helps if human moderators see the post before the AI does, in which case, why have an AI at all? I presume that in this solution the AI is running all the time and sees messages the instant they're sent, so it will always be exposed to a prompt injection attack before any human sees the message.
Could you have the AI just flag posts for a human to review in v1? Then, as you refine the prompt injection detection, you could move toward making the AI autonomous.
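A rough sketch of what that flag-only v1 could look like, in Python. Everything here is hypothetical: `ReviewQueue`, `moderate`, and `keyword_classifier` are made-up names, and the keyword check is only a stand-in for whatever model call the real system would make. The point is just that the classifier can put a post in a queue and nothing else, so a successful injection can at worst cause a missed or spurious flag rather than an autonomous action.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ReviewQueue:
    """Posts the classifier flagged, waiting for a human decision."""
    pending: List[str] = field(default_factory=list)


def moderate(post: str, classifier: Callable[[str], bool], queue: ReviewQueue) -> str:
    """Route a post. The classifier can only enqueue; it never deletes or bans."""
    if classifier(post):
        queue.pending.append(post)  # a human makes the final call later
        return "flagged_for_review"
    return "published"


def keyword_classifier(post: str) -> bool:
    # Hypothetical stand-in for the AI call; a real model would go here.
    return "buy cheap" in post.lower()


queue = ReviewQueue()
posts = ["Nice write-up!", "BUY CHEAP pills now"]
results = [moderate(p, keyword_classifier, queue) for p in posts]
```

This doesn't make the classifier any harder to inject; it only limits what an injected classifier is able to do.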