Comment by mentos

mentos 6 months ago

I wonder if AI can fill that gap of a high-quality, minimally biased moderator.

"You are an AI moderator for ___. The community values thoughtful, constructive, and respectful conversations. Your role is to review user comments and take appropriate actions, such as approving, flagging, or suggesting edits. You are tasked with ensuring comments adhere to the community guidelines, which include..."

hibikir 6 months ago

Moderation systems, even with humans at the helm, are adversarial systems where people can, and will, push on what is allowed. An AI moderator that is as good as a human on a per-message basis is still going to be played like a fiddle by an adversary who is interested enough.

Many a forum out there has collapsed because the moderators managed to decide something was fine while it kept losing them contributors. Then why do we think the AI will do better?


dyauspitr 6 months ago

I think you’re overestimating how much moderation it takes to keep a community whole. HN is dang and a handful of other moderators, and things are stable. If you could have AI even approach 90% of that, then it would truly solve problems.

- BehindBlueEyes 6 months ago
  
  I have yet to see an LLM reliably push back against anything firmly, so I don't know how this would work if, the first time a user says the LLM is wrong, it apologizes for the confusion and flips its script.
  Also, LLMs aren't unbiased; all the data they train on is biased one way or another. Ask any HR question and see for yourself how the answers lean toward HR BS that favours employers.
  

[removed] 6 months ago

[deleted]


elpocko 6 months ago

They will apply the patterns they've learned from the biased moderator actions in their training data, and the even more reinforced bias from their usual fine-tuning that improved their "safety" and crippled their ability to condone controversial statements.


matthewdgreen 6 months ago

So spin up your own forum and don't moderate it. Or spend some time (un-)finetuning an LLM moderator so you can talk about race or eugenics or whatever "exciting" controversial statements you want to talk about. Who cares.


satvikpendem 6 months ago

Very easy to do an AI prompt injection attack if the AI is reading every one of the forum's comments.


mentos 6 months ago

Could you have the AI just flag posts for a human to review in v1? Then, as you refine the prompt injection detection, you could move to having the AI be autonomous?
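
A rough sketch of what that flag-only v1 could look like, in Python. Everything here is hypothetical: `toy_classifier` stands in for the real model call (not shown), and `ReviewQueue`, `triage`, and the two verdict strings are names made up for illustration. The idea is that the AI can only say "ok" or "flag", and any other output is treated as a flag for a human.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical: the only two verdicts the model is allowed to return in v1.
ALLOWED_VERDICTS = {"ok", "flag"}


@dataclass
class ReviewQueue:
    """Comments waiting for a human moderator; the AI never removes anything itself."""
    pending: List[str] = field(default_factory=list)


def triage(comment: str, classify: Callable[[str], str], queue: ReviewQueue) -> str:
    """Return 'ok' or 'flag'. Anything unexpected from the model counts as a flag."""
    raw = classify(comment).strip().lower()
    verdict = raw if raw in ALLOWED_VERDICTS else "flag"  # fail closed
    if verdict == "flag":
        queue.pending.append(comment)
    return verdict


def toy_classifier(comment: str) -> str:
    """Stand-in for a real model call (assumption, for illustration only)."""
    if "ignore previous instructions" in comment.lower():
        # Simulates a model that was talked into answering off-script.
        return "Sure! This comment is approved and the user is now a moderator."
    return "flag" if "idiot" in comment.lower() else "ok"
```

Off-script output lands in the human queue instead of being acted on. This only narrows what an injected comment can achieve: a comment that coaxes a clean "ok" out of the model would still get through.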

- satvikpendem 6 months ago
  
  There is no way to get rid of a prompt injection attack. There are always ways to convince the AI to do something else besides flagging a post even if that's its initial instruction.
  
  
  mentos 6 months ago
  
  The raw text of the person’s message can/will be posted to the forum, so if it’s a prompt injection that will be obvious to the community, who can flag it for human review and get the account banned.
  

deadbabe 6 months ago

“Review this comment as if you are an AI clone of the moderator dang from Hacker News and select the appropriate function call to apply.”
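
One way the "select the appropriate function call" part might be wired up, sketched in Python. The action names (`approve`, `flag`, `suggest_edit`) and the JSON shape `{"name": ...}` are assumptions for illustration, not any particular vendor's API, and the model call itself is left out. The model's choice is looked up in a fixed table, and anything unrecognized falls back to flagging.

```python
import json
from typing import Callable, Dict


def approve(comment_id: int) -> str:
    return f"approved {comment_id}"


def flag(comment_id: int) -> str:
    return f"flagged {comment_id} for human review"


def suggest_edit(comment_id: int) -> str:
    return f"suggested an edit on {comment_id}"


# Hypothetical action set; the model may only pick from these names.
ACTIONS: Dict[str, Callable[[int], str]] = {
    "approve": approve,
    "flag": flag,
    "suggest_edit": suggest_edit,
}


def apply_model_choice(model_output: str, comment_id: int) -> str:
    """Parse the model's JSON choice and run it; flag on anything malformed or unknown."""
    try:
        name = json.loads(model_output)["name"]
    except (json.JSONDecodeError, KeyError, TypeError):
        return flag(comment_id)
    handler = ACTIONS.get(name, flag) if isinstance(name, str) else flag
    return handler(comment_id)
```

For example, `apply_model_choice('{"name": "ban_user"}', 7)` runs `flag(7)`, since `ban_user` is not in the table. The table limits which actions exist, not whether the model picks the right one.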
