Comment by mentos

Comment by mentos 7 days ago

19 replies

I wonder if AI can fill that gap of a high-quality, minimally biased moderator.

"You are an AI moderator for ___. The community values thoughtful, constructive, and respectful conversations. Your role is to review user comments and take appropriate actions, such as approving, flagging, or suggesting edits. You are tasked with ensuring comments adhere to the community guidelines, which include..."

hibikir 7 days ago

Moderation systems, even with humans at the helm, are adversarial systems where people can, and will, push on what is allowed. An AI moderator that is as good as a human on a per-message basis is still going to be played like a fiddle by an adversary that is interested enough.

Many a forum out there has collapsed because the moderators managed to decide something was fine while it kept losing them contributors. Then why do we think the AI will do better?

  • dyauspitr 6 days ago

    I think you’re overestimating how much moderation it takes to keep a community whole. HN is dang and a handful of other moderators and things are stable. If you could have AI even approach 90% of that then it will truly solve problems.

    • BehindBlueEyes 3 days ago

      I have yet to see an LLM reliably push back against anything firmly, so I don't know how this would work if the first time a user says the LLM is wrong, it apologizes for the confusion and flips its script.

      Also, LLMs aren't unbiased; all the data they train on is biased one way or another. Ask any HR question and see for yourself how the answers lean toward HR BS that favours employers.

elpocko 7 days ago

They will apply the patterns they've learned from the biased moderator actions in their training data, and the even more reinforced bias from their usual fine-tuning that improved their "safety" and crippled their ability to condone controversial statements.

  • matthewdgreen 7 days ago

    So spin up your own forum and don't moderate it. Or spend some time (un-)finetuning an LLM moderator so you can talk about race or eugenics or whatever "exciting" controversial statements you want to talk about. Who cares.

satvikpendem 7 days ago

Very easy to do an AI prompt injection attack if the AI is reading every one of the forum's comments.

  • mentos 7 days ago

    Could you have the AI just flag posts for a human to review in v1? Then, as you refine the prompt injection detection, you could move to having the AI be autonomous?
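A minimal sketch of that flag-only v1, assuming the model's verdict arrives as a plain string. `ReviewQueue`, `handle_verdict`, the verdict names, and the `autonomous` switch are all hypothetical, made up for illustration. The point is only that the model's output routes a post to a human and cannot remove anything by itself until the switch is turned on.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ReviewQueue:
    """Hypothetical in-memory queue of posts waiting for a human moderator."""
    pending: deque = field(default_factory=deque)

    def add(self, comment_id, reason):
        self.pending.append((comment_id, reason))


def handle_verdict(comment_id, verdict, queue, autonomous=False):
    """Return the action actually taken for one AI verdict.

    In v1 (autonomous=False) anything other than "approve" is only queued
    for a human. The verdict strings are assumptions, not a real API.
    """
    if verdict == "approve":
        return "published"
    if autonomous and verdict == "remove":
        return "removed"
    # Unknown or negative verdicts always end up in front of a person.
    queue.add(comment_id, verdict)
    return "queued_for_human"
```

Flipping `autonomous` to `True` later is the "move to autonomous" step; the routing code stays the same.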

    • satvikpendem 7 days ago

      There is no way to get rid of a prompt injection attack. There are always ways to convince the AI to do something else besides flagging a post even if that's its initial instruction.

      • mentos 7 days ago

        The raw text of the person's message can/will be posted to the forum, so it will be obvious to the community if it's a prompt injection. It can then be flagged for human review and the account banned.

deadbabe 6 days ago

“Review this comment as if you are an AI clone of the moderator dang from Hacker News and select the appropriate function call to apply.”
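A rough sketch of what "select the appropriate function call" could look like on the receiving end, assuming the model returns a dict with a `name` key. That shape, the handler names, and `apply_function_call` are hypothetical and not tied to any particular LLM API. Only allowlisted names run; anything else falls back to flagging for a human, which limits what an injected instruction can trigger but does not make injection go away.

```python
def approve(comment_id):
    return f"approved {comment_id}"


def flag(comment_id):
    return f"flagged {comment_id} for human review"


def suggest_edit(comment_id):
    return f"suggested edit for {comment_id}"


# Allowlist of moderation actions the model is permitted to select.
HANDLERS = {"approve": approve, "flag": flag, "suggest_edit": suggest_edit}


def apply_function_call(call, comment_id):
    """Dispatch a model-selected call; fall back to flag() on anything odd."""
    name = call.get("name") if isinstance(call, dict) else None
    if not isinstance(name, str):
        name = None
    handler = HANDLERS.get(name, flag)
    return handler(comment_id)
```

For example, `apply_function_call({"name": "ban_all_users"}, 7)` would not ban anyone; it would just flag comment 7.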