Comment by satvikpendem
Comment by satvikpendem 7 days ago
Very easy to do an AI prompt injection attack if the AI is reading every one of the forum's comments.
Comment by satvikpendem 7 days ago
Very easy to do an AI prompt injection attack if the AI is reading every one of the forum's comments.
There is no way to get rid of a prompt injection attack. There are always ways to convince the AI to do something else besides flagging a post even if that's its initial instruction.
Sure, that's if human moderators see it before the AI, in which case, why have an AI at all? I presume in this solution that the AI is running all the time and it will see messages the instant they're sent and thus will always be vulnerable to a prompt injection attack before any human even sees it in the first place.
Can have the AI just flag posts for a human to review in v1? Then as you refine the prompt injection detection can move to have the AI be autonomous?