Comment by vlovich123 14 hours ago

Yeah, the more robust protection is to widen every redaction to a constant width, but in the general case that could require reflowing the PDF. Honestly, though, single-word redactions are probably useless now that cheap AI can fill in the gaps with high accuracy.

rgmerk 13 hours ago

Depends what you're trying to hide.

If the redaction is a person's name, and there's nothing else to give the person's identity away, single word redaction probably works reasonably well, AI or no AI.

  • godelski 11 hours ago

      > If the redaction is a person's name
    
    I'm not sure if you're aware, but people's names vary in length. We are talking about a system that can identify single-character differences, so that does reduce the search space, especially since names are not all possible letter permutations. Combine that with the fact that it isn't uncommon to see partial first letters show up. You can even see some instances in the Epstein files.
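
    To make the width argument concrete, here is a rough sketch. The glyph widths and the candidate list are made up for illustration (a real attempt would read advance widths from the font embedded in the PDF), and `text_width` / `candidates_matching` are hypothetical helper names.

```python
# Hypothetical per-glyph advance widths in arbitrary units (illustration only).
GLYPH_WIDTHS = {"i": 3, "l": 3, "j": 3, "t": 4, "f": 4, "r": 4, "m": 9, "w": 9, " ": 3}
DEFAULT_WIDTH = 6  # assumed width for every glyph not listed above


def text_width(text):
    """Sum the assumed advance width of each character."""
    return sum(GLYPH_WIDTHS.get(ch, DEFAULT_WIDTH) for ch in text)


def candidates_matching(names, redaction_width, tolerance=0):
    """Keep only the names whose rendered width fits the redaction box."""
    return [n for n in names if abs(text_width(n) - redaction_width) <= tolerance]


# Made-up candidate list; two of the names have the same character count.
names = ["Ann Lee", "Bob Smith", "Will Timm", "Maria Gomez"]
matches = candidates_matching(names, redaction_width=49)
print(matches)
```

    Note that "Bob Smith" and "Will Timm" have the same number of characters but different widths under these assumed metrics, which is why a proportional font can leak more than a character count would.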

    Of course, you can also take this further. Even if you can't recover names, you can get meta information about how many parties are involved by recognizing that different-length redactions correspond to different entities. While a same-length redaction doesn't guarantee the same entity, it is a hint.
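
    A minimal sketch of that counting idea, assuming you have already measured the redaction box widths (the numbers below are invented). Distinct widths only give a lower bound on distinct redacted strings, and one entity could still appear under more than one string.

```python
from collections import Counter


def group_by_width(measured_widths):
    """Bucket measured widths to the nearest unit to absorb measurement noise."""
    return Counter(round(w) for w in measured_widths)


# Invented measurements, in the same arbitrary units for every box.
measured = [49.0, 45.1, 49.2, 61.0, 44.9]
groups = group_by_width(measured)
distinct_strings_lower_bound = len(groups)
print(groups, distinct_strings_lower_bound)
```

    Boxes that land in the same bucket are candidates for being the same name, nothing stronger than that.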

    • mycall 11 hours ago

      It is also common for authors to misspell names (proper nouns) in an attempt to determine who leaks docs (and to force non-matches for FOIA requests).

      • mhast 29 minutes ago

        If you want to fingerprint text, you can also do it with small, insignificant changes to the text that don't change the meaning.

        If you have a number of such locations with alternatives, then you can make a number of identifiable versions by combining alternates.
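
        As a rough sketch of combining alternates: with n two-way choices you get 2^n distinguishable copies, and each bit of a recipient ID picks one wording. The template, word pairs, and function names here are all hypothetical.

```python
# Hypothetical slots: each pair holds two interchangeable wordings.
SLOTS = [
    ("big", "large"),
    ("start", "begin"),
    ("about", "roughly"),
]
TEMPLATE = "The {} team will {} in {} two weeks."


def fingerprinted_copy(recipient_id):
    """Pick one wording per slot using the bits of recipient_id."""
    if not 0 <= recipient_id < 2 ** len(SLOTS):
        raise ValueError("recipient_id does not fit in the available slots")
    choices = [alts[(recipient_id >> i) & 1] for i, alts in enumerate(SLOTS)]
    return TEMPLATE.format(*choices)


def identify_recipient(leaked_text):
    """Brute-force which recipient's copy matches the leaked text, if any."""
    for rid in range(2 ** len(SLOTS)):
        if fingerprinted_copy(rid) == leaked_text:
            return rid
    return None


print(fingerprinted_copy(5))
print(identify_recipient(fingerprinted_copy(5)))
```

        Three slots give only 8 versions; the count doubles with each extra slot, so a few dozen slots would cover a very large distribution list. This sketch does nothing to survive paraphrasing or retyping by the leaker.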