Comment by thangalin

Comment by thangalin 13 hours ago

> Information Leaking from Redaction Marks: Even when content is properly removed, the redaction marks themselves can leak some information if not done carefully. For example, if you have a black box exactly covering a word, the length of that black box gives a clue to the word’s length (and potentially its identity).

Does X-ray employ glyph spacing attacks and try to exploit font metric leaks?

mlissner 12 hours ago

No, we worked with researchers that developed that kind of system, but didn't broadcast our work b/c the research was too sensitive. Seems the cat is out the bag now though.

I think the combination of AI and font-metrics is going to be wild though. You ought to be able to make a system that can figure out likely words based on the unredacted ones and the redaction's size. I haven't seen any redaction system yet that protects against this.

Reply View 7 replies

vlovich123 12 hours ago

I thought glyph spacing attacks are an old idea; like I recall reading about such ideas 10-20 years ago unless I’m misremembering. Can you clarify why it was considered “too sensitive” if the whole point of this effort is to showcase these attacks?

Reply View | 0 replies
NoboruWataya 2 hours ago

This is going to be a disaster IMO because AI will just hallucinate what it thinks is the most probable redacted word and people will take that as gospel.

Reply View | 0 replies
thangalin 12 hours ago

> I haven't seen any redaction system yet that protects against this.
The linked article suggests widening redacted areas more than needed with some randomization applied to the width. Strikes me that that wouldn't do much except add a few more possible solutions.

Reply View | 4 replies
- vlovich123 12 hours ago
  
  Yeah, the more robust protection is to widen to a constant. But in the general case that could require reflowing the pdf. But honestly single word redactions are really probably useless with cheap AI that can highly accurately fill in the gaps
  
  Reply View | 3 replies
  
  rgmerk 12 hours ago
  
  Depends what you're trying to hide.
  If the redaction is a person's name, and there's nothing else to give the person's identity away, single word redaction probably works reasonably well, AI or no AI.
  
  Reply View | 2 replies