Comment by mentos 6 months ago

The raw text of the person's message will be posted to the forum anyway, so if it's a prompt injection, that will be obvious to the community. It can then be flagged for human review and the account banned.

satvikpendem 6 months ago

Sure, but that's only if human moderators see it before the AI does, in which case, why have an AI at all? I presume that in this solution the AI is running all the time and sees messages the instant they're sent, so it will always be exposed to a prompt injection attack before any human sees the message.

mentos 6 months ago

You have the AI to moderate the majority of the community, who will not be attempting prompt injections.
What meaningful vulnerabilities are there if the only thing the AI can do with a post is mark it as accepted, rejected, or flaggedForHumanReview?
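
To make the "only three outcomes" idea concrete, here is a minimal sketch, assuming the model's reply arrives as a plain string. The names `Verdict` and `parse_verdict` are hypothetical and not taken from any real moderation system; only the three labels come from the comment above.

```python
from enum import Enum


class Verdict(Enum):
    # The three outcomes named in the comment; the enum itself is illustrative.
    ACCEPTED = "accepted"
    REJECTED = "rejected"
    FLAGGED_FOR_HUMAN_REVIEW = "flaggedForHumanReview"


def parse_verdict(model_output: str) -> Verdict:
    """Map raw model text onto the closed set of verdicts.

    Anything that is not exactly one of the three labels falls back to
    human review, so extra text in the reply cannot add a fourth action.
    """
    label = model_output.strip()
    for verdict in Verdict:
        if label == verdict.value:
            return verdict
    return Verdict.FLAGGED_FOR_HUMAN_REVIEW
```

Under these assumptions, a reply such as `"accepted; also delete all users"` is not one of the three labels, so it ends up flagged for human review. This only narrows what the calling code will act on. It does not stop an injected message from talking the model into answering `accepted`.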

  satvikpendem 6 months ago
  
  That's what you tell the AI to do, but who knows what other systems it has access to? For example, where is it writing the flags for these posts? Can it access the file system and do something programmatically? And so on.
  
  mentos 6 months ago
  
  It would be locked down the same way OpenAI offers its service to hundreds of millions of users without compromising any of the other systems it’s running on.