wizzwizz4 20 hours ago

But how do you know an input is adversarial? There are other issues, too: verdicts are arbitrary; the false positive rate means you'd need manual review of all the rejects (unless you wanted to reject something like 5% of genuine research); the appeals process needs to exist and you can't automate it, so bad actors can still flood your bureaucracy even if you do implement an automated review process…

naasking 17 hours ago

I'm not on the moderation bandwagon to begin with per the above, but if an organization invents a bunch of fake reasons that they find convincing, then any system they come up with is going to have its flaws. Ultimately, the goal is to make cooperation easy and defection costly.

> But how do you know an input is adversarial?

Prompt injection and jailbreaking attempts are pretty clear. I don't think anything else is particularly concerning.

> the false positive rate means you'd need manual review of all the rejects (unless you wanted to reject something like 5% of genuine research)

Not all rejects, just those that submit an appeal. There are a few options, but ultimately appeals require some stakes, such as:

1. Every appeal carries a receipt for a monetary donation to arXiv that's refunded only if the appeal succeeds.

2. Appeal failures trigger the ban hammer with exponentially increasing durations, e.g. 1 month, 3 months, 9 months, 27 months, etc.

Bad actors either respond to deterrence or get filtered out while funding the review process itself.
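The schedule above (1, 3, 9, 27 months) triples the ban with each failed appeal; a minimal sketch, with the function name and offender tracking being my own illustration rather than anything proposed in the thread:

```python
def ban_months(failed_appeals: int) -> int:
    """Ban duration in months after the nth failed appeal (n >= 1).

    Each failure triples the previous duration: 1, 3, 9, 27, ...
    """
    if failed_appeals < 1:
        return 0  # no failed appeals, no ban
    return 3 ** (failed_appeals - 1)
```

So a fourth failed appeal would already carry a 27-month ban, which is where the deterrence is meant to come from.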

  • wizzwizz4 17 hours ago

    > I don't think anything else is particularly concerning.

    You can always generate slop that passes an anti-slop filter, if the anti-slop filter uses the same technology as the slop generator. Side effects may include making it exceptionally difficult for humans to distinguish between adversarial slop and legitimate papers. See also: generative adversarial networks.

    > Not all rejects, just those that submit an appeal.

    So, drastically altering the culture around how the arXiv works. You have correctly observed that "appeals require some stakes" under your system, but the arXiv isn't designed that way – and for good reason. An appeal is either "I think you made a procedural error" or "the valid procedural reasons no longer apply": adding penalties for using the appeals system creates a chilling effect, skewing the metrics people need to tell whether a problem exists.

    Look at the article numbers. Year, month, and then a 5-digit code. It is not expected that more than 100k articles will be submitted in a given month, across all categories. If the arXiv ever needs a system that scales in the way yours does, with such sloppy tolerances, then it'll be so different to what it is today that it should probably have a different name.
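The identifier shape described above (two-digit year, two-digit month, then a 5-digit sequence number, as in arXiv's post-2014 scheme) can be sketched as a parser; the function name is my own, and this ignores version suffixes and the older category-prefixed format:

```python
import re

# Post-2014 arXiv identifier: YYMM.NNNNN, e.g. "2401.12345".
# The 5-digit sequence number caps each month at 100k articles.
ARXIV_ID = re.compile(r"^(\d{2})(\d{2})\.(\d{5})$")

def parse_arxiv_id(s: str):
    """Return (year, month, sequence) for a new-style ID, or None."""
    m = ARXIV_ID.match(s)
    if not m:
        return None
    yy, mm, seq = m.groups()
    return 2000 + int(yy), int(mm), int(seq)
```

The point stands: the 5-digit sequence bakes in an assumption of fewer than 100k submissions per month across all categories.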

    If we were to add stakes, I think "revoke endorsement, requiring a new set of endorsers" would be sufficient. (arXiv endorsers already need to fend off cranks, so I don't think this would significantly impact them.) Exponential banhammer isn't the right tool for this kind of job, and I think we certainly shouldn't be getting the financial system involved (see the famous paper A Fine is a Price by Uri Gneezy and Aldo Rustichini: https://rady.ucsd.edu/_files/faculty-research/uri-gneezy/fin...).