Comment by mentos
To moderate the majority of the community that will not be attempting prompt injections.
What meaningful vulnerabilities are there if the post can only be accepted/rejected/flaggedForHumanReview?
OpenAI doesn't allow write access to any file system. If you are recording posts to be reviewed, then you must necessarily store that information somewhere, at which point you will be allowing the AI to access some sort of data storage system, whether it be a file system or a database.
No, it's not. Well, if the system is designed badly, it can be, but that can be said about anything.
There's no need to do this: (from GP)
> > at which point you will be allowing the AI to access
No need to allow the AI to access anything.
Send it the comment thread, what the forum is about, and the user's profile text, and then the AI outputs a number. Any security problem is then because of bugs the humans wrote in their code.
Prompt injection? Yes, so there still need to be ways to report comments manually and have them reviewed.
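Rough sketch of what I mean, in Python. Everything here is made up for illustration: `ask_model` stands in for whatever API call you use, and the 0/1/2 encoding is my assumption, not anything a real API defines. The point is that the model only ever returns text, and the human-written code decides what that text is allowed to mean.

```python
from enum import Enum
from typing import Callable


class Decision(Enum):
    ACCEPTED = "accepted"
    REJECTED = "rejected"
    FLAGGED_FOR_HUMAN_REVIEW = "flaggedForHumanReview"


# Hypothetical encoding: the prompt asks the model to answer with one digit.
_DIGIT_TO_DECISION = {
    "0": Decision.ACCEPTED,
    "1": Decision.REJECTED,
    "2": Decision.FLAGGED_FOR_HUMAN_REVIEW,
}


def parse_decision(raw: str) -> Decision:
    """Map raw model output to a Decision.

    Anything that is not exactly one of the expected digits (including the
    result of a successful prompt injection) falls back to human review.
    """
    return _DIGIT_TO_DECISION.get(raw.strip(), Decision.FLAGGED_FOR_HUMAN_REVIEW)


def moderate(
    thread: str,
    forum_description: str,
    profile: str,
    ask_model: Callable[[str], str],  # hypothetical: prompt in, text out
) -> Decision:
    """Build the prompt, call the model, and return a validated Decision.

    The model gets no tools, no file system and no database handle. Storing
    the decision is the caller's job, done by ordinary code.
    """
    prompt = (
        "Answer with a single digit: 0 = accept, 1 = reject, "
        "2 = flag for human review.\n\n"
        f"Forum: {forum_description}\n"
        f"User profile: {profile}\n"
        f"Thread:\n{thread}\n"
    )
    return parse_decision(ask_model(prompt))
```

With this shape, the worst an injected comment can do is pick the wrong one of the three outcomes, which is why manual reporting and review still matter.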
That's what you tell the AI to do, but who knows what other systems it has access to? For example, where is it writing the flags for these posts? Can it access the file system and do something programmatically? Et cetera, et cetera.