Some Epstein file redactions are being undone with hacks
(theguardian.com) 588 points by vinni2 16 hours ago
>Please change the title.
HN discourages editorializing headlines.
While I wouldn't call it a "hack," common usage even here on HN isn't limited to "to gain illegal access to (a computer network, system, etc.)" [0]
If I open your laptop and guess your password, then that counts as hacking you in both legal and security terms.
You don't need to do some sophisticated thing for it to be considered hacking.
I’m not an attorney or anything, but the relevant federal statute is explicitly about unauthorized access of computer systems (18 USC 1030).
Opening someone else’s laptop and guessing the password would absolutely fall under that definition, but I think it’s very much questionable if poking around a document that you have legitimately obtained would do so.
Placing a black box on the text isn’t a redaction any more than placing a sticky note would be. No reasonable person can expect a sticky note to permanently prevent readers from seeing text, and no reasonable person can expect a black overlay box in a PDF to prevent anyone from reading the text underneath, because stacking layers on top of one another is literally a fundamental feature of the PDF format.
If someone sends me a document with text in it that they meant to remove but didn't and then I read that text, I haven't hacked anything they're just incompetent.
Hacking is unauthorised use of a system. Reading a document that was not adequately redacted can hardly be considered hacking.
Or in case some folks find the addition of a computer confusing here, if someone sends you a physical letter and they've used correction tape or a black marker to obscure some parts of the letter, and you scratch away the correction tape or hold the letter up to a light source to read what's underneath, have you committed a crime?
I'm not a lawyer, so I don't know what the law has to say about this. But I do have at least a small handful of brain cells to rub together, so I know what the law _should_ say about this.
Hacking is not just unauthorised use of a system. Hacking and hacking techniques can apply to systems you fully own or systems you are authorised to hack. Hacking is using something in a way that the designer didn’t anticipate or intend.
Adobe designed pdf to behave this way. Placing layers over text doesn’t remove the text from the file. They have a specific redaction feature for that purpose.
You guessing my password is not the same as a known and expected behavior of a program. Adobe has a specific feature to redact. PDF is a format known to have layers. Lawyers are trained on day one not to make this mistake. (I am a recovering lawyer.) This is either incompetence or deliberate disclosure.
Hacking is any use of a technology in a way that it wasn’t intended. The redaction is so stupid as to almost appear intentional, so maybe you’re right, this isn’t hacking because maybe the information was intended to be discovered.
It’s not a hack. It’s known, expected behavior of the program. Adobe has a specific feature to redact. Color-filled boxes are not it.
> limit/lack of authorization
By serving up the PDF file I am being authorized to receive, view, process, etc., the entire contents. Not just some limited subset. If I wasn't authorized to receive some portion of the file, then that needed to be withheld to begin with.
That's entirely different from gaining unauthorized entry to a system and copying out files that were never publicly available to begin with.
To put it simply, I am not responsible for the other party's incompetence.
For starters, wouldn't it be kind of ironic to set up limits and authorization in a context that is about making some content available to the public?
I'd say any technical or legal restrictions or possible means to enforce DRM ought to be disabled or absent from the media format used when disseminating content that must be disclosed.
Censorship (if necessary) should purge the data entirely, i.e., replace it with ###.
That's not true, you can mistakenly receive data you're not authorized to have (might even be criminal to have!)
> That's entirely different from gaining unauthorized entry to a system and copying out files that were never publicly available to begin with.
That's not the sum total of hacks. If you have a publicly accessible password-protected PDF and guess the password as 1234, that's a hack. Copy & paste of black boxes is similarly a hack around content protection.
> To put it simply, I am not responsible for the other party's incompetence.
To put it even simpler, this conversation is not about you and your responsibility, but about the different meanings of the word "hack".
Not the only thing hack means now, or the most common usage anymore. See "life hack" - it means unexpected technique.
But this isn’t an unexpected technique it’s literally the core design of the pdf format. It’s a layered format that preserves the layers on any machine. Adobe has a redaction feature to overcome the default behavior that each layer can be accessed even if there is a top layer in front.
The average office worker has it on their computer, illustrating how commonplace unredacting could be. Any text tool will work, even some designed to detect bad redactions in PDFs via drag and drop (now specifically trained on these known bad redactions). https://github.com/freelawproject/x-ray
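The idea behind a checker like x-ray can be sketched in a few lines. This is not x-ray's code or API; it's a hypothetical model assuming you've already pulled text spans and filled rectangles (with page coordinates) out of the PDF with some parser. Any span that sits mostly under a dark rectangle is text that is still in the file beneath what only looks like a redaction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle in page coordinates (hypothetical model)."""
    x0: float
    y0: float
    x1: float
    y1: float

    def covered_fraction(self, other: "Box") -> float:
        """Fraction of this box's area that `other` overlaps (0.0 to 1.0)."""
        w = min(self.x1, other.x1) - max(self.x0, other.x0)
        h = min(self.y1, other.y1) - max(self.y0, other.y0)
        area = (self.x1 - self.x0) * (self.y1 - self.y0)
        if w <= 0 or h <= 0 or area <= 0:
            return 0.0
        return (w * h) / area


@dataclass(frozen=True)
class TextSpan:
    text: str
    box: Box


@dataclass(frozen=True)
class FilledRect:
    box: Box
    gray: float  # fill level: 0.0 = black, 1.0 = white


def find_covered_text(spans, rects, darkness=0.2, min_cover=0.5):
    """Return spans that sit mostly underneath a dark filled rectangle.

    Anything returned is text that is still in the file, just painted over.
    """
    dark = [r for r in rects if r.gray <= darkness]
    return [s for s in spans
            if any(s.box.covered_fraction(r.box) >= min_cover for r in dark)]


if __name__ == "__main__":
    spans = [TextSpan("Dear Sir,", Box(72, 700, 140, 712)),
             TextSpan("John Doe", Box(150, 700, 210, 712))]
    rects = [FilledRect(Box(148, 698, 214, 714), gray=0.0)]
    print([s.text for s in find_covered_text(spans, rects)])  # ['John Doe']
```

The darkness and coverage thresholds are arbitrary guesses; a real checker also has to cope with rotated pages, clipping paths, and text drawn in white or in an invisible render mode.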
Maybe someone who knows the law can answer this. Is it a crime to "unredact" files in the US? You probably know that the information is classified, since you are putting in the work. Where I live, I believe it’s a crime to share information that is classified, even if it’s leaked. So I would not publicly brag about this online.
In the US this is protected by the first amendment. Exceptions apply only for military and government employees who agree to prosecution in such cases as a condition for employment or enlistment (getting a clearance, basically). For everyone else it is lawful.
Apart from the technological and procedural question, I would love to learn why the DOJ found it important to protect Indyke. He was Epstein's lawyer, and now we learn that he was personally involved. He is not a Washington person. We expected there to be politically motivated protection of certain people, but is the DOJ just going to blanket protect anybody in the docs?
Indyke works for other powerful people, runs in MAGA circles.
Two things come to mind:
* Some things Indyke did fall outside the scope of lawyer-client privilege. It would be bad for certain people to get him on a stand and force him to spill the beans. He was never interviewed re: Epstein [1]
* He's a very talented lawyer, insofar as a competent lawyer with, at least, extreme discretion, is talented.
[1] https://www.finance.senate.gov/imo/media/doc/letter_to_doj-f...
> It would be bad for certain people to get him on a stand and force him to spill the beans.
Yep. I think this sort of thing is actually their biggest concern with releasing the docs. They can redact or lose documents that say anything directly incriminating about Trump and his associates and dismiss everything Epstein and testimonies from the 2020s say about him as confabulation, but there are other people who might want to take the administration down with them if they get caught or even just get fed up of being doorstepped by the media, and some of them might have receipts.
He was Epstein’s lawyer, he almost certainly has the dirt on anyone the DoJ wants to protect, and may be the kind of person that would be inclined to burn whoever DoJ was protecting if he wasn't getting treatment at least as favorable.
..."Indyke, an attorney who represented Epstein for decades, has not been criminally indicted by federal authorities. He was hired by the Parlatore Law Group in 2022, before the justice department settled the Epstein case. That firm represents the defense secretary, Pete Hegseth, and previously represented Donald Trump in his defense against charges stemming from the discovery of classified government documents stored at Trump’s Florida estate."...
From the Guardian UK https://archive.md/lO08a
All you have to do is work for a MAGA person or MAGA billionaire donor for them to protect you.
From TFA:
> [Indyke] was hired by the Parlatore Law Group in 2022, before the justice department settled the Epstein case. That firm represents the defense secretary, Pete Hegseth, and previously represented Donald Trump in his defense against charges stemming from the discovery of classified government documents stored at Trump’s Florida estate.
So I don't know about "not a Washington person", but clearly connections exist to the current administration.
Stupid question: why is the government even allowed to redact stuff? Isn’t the government keeping secrets from the people totally antithetical to democracy?
It's not the government, it's the department of justice. To name two: protection of witnesses, protection of state secrets ("the people" is not a person who can keep secrets).
Right, I’m aware of the excuses the government uses to keep secrets.
But on principle, what right does the government have to keep secrets from its own people? I don’t believe we had that button at the founding, it was added somewhere along the way. I’m asking what is the justification for this, and whether in the grand scheme of things that outweighs the principle of the government not being a separate entity from the people.
There are multiple ways to approach witness protection. For example if we have a problem with witnesses being harmed we could make being involved with witness harm at any layer of indirection a capital offense. We can probably think of other options besides the government being allowed to keep secrets from its own people.
>I don’t believe we had that button at the founding
Every government everywhere has and has always had state secrets e.g. names of spies.
>make being involved with witness harm at any layer of indirection a capital offense.
People still commit capital offenses. This just makes it much easier to get to that witness and get away. We also know from empirical evidence that the death penalty is not useful for deterring crime.
Witness protection is also getting to start over without everyone in your neighborhood knowing you were a criminal. It's part of the deal.
Is the Department of Justice not a part of the government?
It's up to us to keep the government accountable. Democracy dies if we don't put pressure on the government and participate actively in politics.
The TL;DR:
- To protect victims
- Redact people that are currently under investigation
But here they are clearly blacking out potential co-conspirators, without them being under investigation or having been charged with anything.
Seems like they are just blacking out powerful people so as not to embarrass or implicate them.
It's not correct that there is a legal duty to redact names of people who might be accused of wrongdoing, but where the allegations haven't been proved.
The only two reasons that redactions are allowed are a) to protect the privacy of victims and b) to protect the integrity of ongoing investigations.
Because the redaction was only supposed to protect the victims.
Print on paper. Physically cut out the pieces you want to remove. Scan.
I still suspect that someone could undo this from data that may have been accidentally steganographed across the non-deleted parts of the image.
I think even after printing and scanning there could still be JPEG artifacts from the original (e.g. if you scan losslessly).
However, I wonder whether heavily compressing the redacted image would help remove any unwanted artefacts. But the best solution is probably to render the original file from scratch, without compression, before redacting the image.
Microdots may leak your identity this way (though I guess a really high resolution scan is needed for that)
Not sure but that might actually add your printer's unique dots to the scanned image.
Is there an overview page somewhere just about what was redacted?
A mafia state puts loyalists on top and can't produce anything (smart people leave), and smart people who think for themselves can't be promoted.
That's also why a mafia extorts and doesn't run complex businesses in general.
Perhaps the US can survive this administration. But somewhere down the line it will become broken.
There is a book by Richard Dawkins, "I Am Me, I Am Free" or something like that, whose main picture shows Richard standing naked with his private parts covered by a black rectangle. But my laptop back then was slow, and when you scrolled it would temporarily remove the square for a split second.
Are you sure? I can't find any trace of any book by Richard Dawkins with a title much like that, and that doesn't seem like a very on-brand sort of cover pic for a book by him, and an image search for "Richard Dawkins book cover" doesn't turn up anything like it.
Confusing David "The monarchy are secretly lizards" Icke with Dawkins is astonishing.
More "info": https://en.wikipedia.org/wiki/Reptilian_conspiracy_theory#Da...
What is the proper way to do this? I see a couple suggestions in the comments:
1. Draw a black box over it in image editor, save a screenshot
2. Crop the info out
Are there other good ways?
Part of me wonders whether they had some of the text under the "redactions" changed too.
Layers.
PDF is an absurdly complex file format. It's part of the reason there is no single "good" PDF reader, just a lot of mediocre PDF readers that are all terrible in their own way. Which is a topic for another day.
There are several ways to remove data in a PDF:
- Remove the data. This is much harder than it sounds. Many PDF tools won't let you change the content of a PDF, not because it isn't possible, but because you'll likely massively screw up the formatting, and the tools don't want to deal with that.
- Replace the data. This is what all the "blackout" tools do: find "A" and replace it with "🮋". This is effective and doesn't break formatting since it's a 1-to-1 replacement. The problem with "replacing" is that not every PDF tool works the same way, and some, instead, just change the foreground and background color to black; it looks nearly the same, but the power of copy-and-paste still functions.
- Then you have the computer illiterate, who think changing the foreground and background color to black is good enough anyway.
This seems highly misleading.
> - Remove the data. This is much harder than it sounds. Many PDF tools won't let you change the content of a PDF, not because it isn't possible, but because you'll likely massively screw up the formatting, and the tools don't want to deal with that.
Compared to other formats this is actually relatively easy in a PDF since the way the text drawing operators work they don't influence the state for arbitrary other content. A lot of positioning in a PDF is absolute (or relative to an explicitly defined matrix which has hardcoded values). Usually this makes editing a PDF harder (since when changing text the related text does not adapt automatically), but when removing data it makes it much easier since you can mostly just delete it without affecting anything else. (There are exceptions for text immediately after the removed data, but that's limited and relatively easy to control.)
> - Replace the data. This is what all the "blackout" tools do: find "A" and replace it with "🮋". This is effective and doesn't break formatting since it's a 1-to-1 replacement.
That's actually rather tricky in PDFs, since they usually contain embedded subset fonts, and these usually do not have "🮋" as part of the subset. Also, doing this would break the layout, since "🮋" has a different width than most letters in a typical font, so it would not lead to fewer formatting issues than the previous option. Unless the "🮋" is stretched for each letter to have the same dimensions, but then the stretched characters reveal the widths of the original letters and allow the text to be recovered.
> The problem with "replacing" is that not every PDF tool works the same way, and some, instead, just change the foreground and background color to black; it looks nearly the same, but the power of copy-and-paste still functions.
PDF does not have a concept of a background color. If it looks like a background color in PDF, you have a rectangle drawn in one color and something in the foreground color in front of it. What you usually see in badly redacted PDF files is exactly this, but in the opposite order: someone just draws a black box on top of the characters. You could argue that this is smarter, since it would still work even if someone changed the colors, but of course, PDF is a vector format. If you just add a rectangle, someone else can remove it again. (And copy & paste doesn't care about your rectangle anyway.)
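To make the copy & paste point concrete, here is a toy sketch. It assumes an uncompressed content stream whose text uses simple literal strings shown with Tj; real files are usually Flate-compressed and also use TJ arrays, hex strings and escaped parentheses, none of which this handles. Extraction only looks at text-showing operators, so a black rectangle painted afterwards hides nothing from it; actually removing the data means deleting the Tj operation itself.

```python
import re

# Toy, uncompressed content stream: two text objects, then a black
# rectangle painted over the second one ("0 0 0 rg" = black fill color,
# "re" = rectangle path, "f" = fill).
STREAM = """\
BT /F1 12 Tf 72 700 Td (The buyer was) Tj ET
BT /F1 12 Tf 160 700 Td (John Doe) Tj ET
0 0 0 rg 158 698 60 14 re f
"""

# A literal string operand followed by the Tj (show text) operator.
# Simplification: no escaped parentheses, hex strings, or TJ arrays.
SHOW_TEXT = re.compile(r"\(([^()\\]*)\)\s*Tj")


def extract_text(stream: str) -> list[str]:
    """Collect every string shown with Tj, roughly what copy & paste sees.

    Path-painting operators like `re f` never enter into it, so a black
    rectangle drawn on top hides nothing here.
    """
    return SHOW_TEXT.findall(stream)


def remove_text(stream: str, secret: str) -> str:
    """Delete the Tj operation that shows `secret`, leaving everything else.

    Each text object here sets its own position with Td, so removing one
    operation does not move the remaining text.
    """
    return SHOW_TEXT.sub(
        lambda m: "" if m.group(1) == secret else m.group(0), stream)


if __name__ == "__main__":
    print(extract_text(STREAM))                           # ['The buyer was', 'John Doe']
    print(extract_text(remove_text(STREAM, "John Doe")))  # ['The buyer was']
```

A real redaction tool also has to make sure no other copy survives, for example an earlier revision of the page kept around by an incremental update.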
>- Remove the data. This is much harder than it sounds. Many PDF tools won't let you change the content of a PDF, not because it isn't possible, but because you'll likely massively screw up the formatting, and the tools don't want to deal with that.
>- Replace the data. This is what all the "blackout" tools do: find "A" and replace it with "🮋". This is effective and doesn't break formatting since it's a 1-to-1 replacement. The problem with "replacing" is that not every PDF tool works the same way, and some, instead, just change the foreground and background color to black; it looks nearly the same, but the power of copy-and-paste still functions.
You're making it sound way harder than it is, when both Adobe Acrobat and the built-in Preview app on Mac can competently redact documents. I'm not aware of instances of either (or any other purpose-made redaction tool) failing. I wouldn't homebrew a Python script to do my redaction either, but that doesn't mean doing redactions properly is some insurmountable task for some intern.
I would not trust either tool to adequately redact documents, though I'm sure it works under normal levels of scrutiny.
The most reliable way is to just screenshot the document or print and scan it, effectively burning it down and recreating it in a new format that has no concept of the past. This works across basically all formats, too, and against all tools.
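A minimal sketch of the screenshot / print-and-scan idea, assuming the page has already been rendered to a grayscale pixel grid by some renderer (not shown; `pixels` is a hypothetical list of rows of values from 0 to 255). The box is painted black in the pixels themselves and the result is written as a plain binary PGM image.

```python
def redact_raster(pixels, x0, y0, x1, y1):
    """Return a copy of a grayscale page with columns x0..x1-1, rows y0..y1-1 set to black."""
    out = [row[:] for row in pixels]  # never modify the caller's page
    for y in range(max(0, y0), min(len(out), y1)):
        for x in range(max(0, x0), min(len(out[y]), x1)):
            out[y][x] = 0
    return out


def to_pgm(pixels) -> bytes:
    """Encode as binary PGM (P5): a short text header, then one byte per pixel."""
    height = len(pixels)
    width = len(pixels[0])
    header = f"P5 {width} {height} 255\n".encode("ascii")
    return header + bytes(v for row in pixels for v in row)


if __name__ == "__main__":
    page = [[200] * 8 for _ in range(4)]  # stand-in for a rendered page
    flat = to_pgm(redact_raster(page, 2, 1, 6, 3))
    print(len(flat))  # 43
```

The output holds nothing but pixel values, so there is no text layer to copy from; the bar's width can still hint at how long the hidden text was.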
> Then you have the computer illiterate, who think changing the foreground and background color to black is good enough anyway
To be fair, this works if you print out those copies and then re-scan them.
Thanks for this. Really quells the urge I get every so often to just code my own PDF editor, because they all suck and certainly it couldn't be THAT hard. Such hubris!
Heh, have at it, here's the full spec: https://developer.adobe.com/document-services/docs/assets/5b...
Should take... a weekend tops? ;) PDF is crazy and scary
Don't stop yourself before getting started. I believe in you - maybe you could write the one editor that would actually work!
Not kidding - it's a ~~~billion dollar market haha
Make an MVP/Show HN :-)
I did a bunch of work creating pdfs using a low-level API, object goes here stuff.
As far as I understand it, at its core, pdf is just a stream of instructions that is continually modifying the document. You can insert a thousand objects before you start the next word in a paragraph. And this is just the most basic stuff. Anything on a page can be anywhere in the stream. I don't know if you can go back and edit previous pages, you might have a shot at least trying to understand one page at a time.
Did you know you can have embedded XML in PDFs? You can have a paper form with all the data filled in and include an XML version of that for any computer systems that would like an easier way to read it.
The blog post about adding colour gradients to Typst dives into some of the weirdness of the format. https://typst.app/blog/2023/color-gradients
I remember reading the recommendation for journalists to redact documents is to black them out in the digital version, print it out, and re-scan it. Anything else has too many potential ways by which it might be possible to smuggle data.
Even that might be vulnerable to length attacks: one plausible plaintext would lead to a black bar of 1135 px, another to 1138 px, and with enough redactions you can converge on what the plaintext might be.
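A rough sketch of that width-matching attack. The width table and the candidate names are made up for illustration (a real attempt would take advance widths from the document's actual font and has to allow for kerning, word spacing and padding around the bar, hence the tolerance). One bar usually leaves several candidates; each extra bar assumed to hide part of the same name narrows the list further.

```python
from typing import Callable

# Hypothetical advance widths (points) for a 12 pt proportional font.
NARROW = set("iljt.,'! ")
WIDE = set("mwMW")


def advance(ch: str) -> float:
    """Width of one character under the made-up table above."""
    if ch in WIDE:
        return 10.0
    if ch in NARROW:
        return 3.0
    return 8.0 if ch.isupper() else 6.0


def text_width(text: str) -> float:
    return sum(advance(ch) for ch in text)


def consistent(names: list[str],
               observations: list[tuple[float, Callable[[str], str]]],
               tol: float = 0.5) -> list[str]:
    """Keep candidates whose rendered width matches every redaction bar.

    Each observation pairs a measured bar width with a function mapping a
    candidate to the text that bar is assumed to hide.
    """
    return [
        name for name in names
        if all(abs(text_width(hide(name)) - width) <= tol
               for width, hide in observations)
    ]


if __name__ == "__main__":
    names = ["John Doe", "Jane Roe", "Rob Kelly", "Al Wu"]
    full = (49.0, lambda n: n)                 # bar over the full name
    surname = (26.0, lambda n: n.split()[-1])  # bar over the surname only
    print(consistent(names, [full]))           # ['John Doe', 'Jane Roe', 'Rob Kelly']
    print(consistent(names, [full, surname]))  # ['Rob Kelly']
```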
The only safe way for journalists is to paraphrase what the document said and to say "an unnamed source claims that ..." and to guarantee with your reputation, and the reputation of your publisher, that you are being faithful to what the original source said. For even better results, combine multiple sources.
Unfortunately paraphrasing things and taking editorial responsibility have both been deprecated in favour of rereleasing press releases in the house style, so it's difficult to get the actual journalism these days.
It's mixing up a redaction tool (replaces the data with a black square) and a black highlighter (adds a black square as another layer). If the people doing the redactions are computer-illiterate, they won't see the difference.
They drew black boxes over the text. The text is still underneath. On OCR'd scanned documents, the text you'd copy is actually stored in metadata and just linked by position to the image.
Anyway, if you click on a "redaction", you're clicking on the box and can't select the text underneath, but if you just highlight the text around it, you can copy all the original text.
It's a bizarre oversight.
PDF is less like an image, and more like a web page where elements can be stacked on top of each other. You can visually obscure things by sticking a black rectangle over the top, but anyone who inspects inside the pdf can remove it or see the text in the source.
There would also be a mix of text documents and image scans. The way to censor each is different.
Perfectly censoring documents, particularly digital ones is actually surprisingly difficult.
> Perfectly censoring documents, particularly digital ones is actually surprisingly difficult.
But the difficult part is easily repeatable once it's figured out, which is why it surprises me that it's not built into Acrobat as a tool already.
In fact it is already built into Acrobat: https://helpx.adobe.com/acrobat/desktop/protect-documents/re...
Probably the Underhanded C Contest (https://www.underhanded-c.org/_page_id_17.html) but yeah. Obfuscated C Contest entries usually aren't underhanded, just intentionally obscure about what they do or how they do it.
When I first saw this, I thought it was a meme. There is no way the DOJ could be so incompetent as to fumble their own cover-up.
Ah, found it - this is from the 'Court Records' part.
https://www.justice.gov/multimedia/Court Records/Matter of the Estate of Jeffrey E. Epstein, Deceased, No. ST-21-RV-00005 (V.I. Super. Ct. 2021)/2022.03.17-1 Exhibit 1.pdf
This is probably just pure stupidity, but part of me hopes there is some tech person in there who knew exactly what they were doing. I’d take a job as a tech person in this administration just to sabotage stuff like this.
I love how the entire internet thinks that this is a big deal when all that happened is that USDOJ re-posted some poorly-redacted court documents that were poorly redacted by non-USDOJ attorneys more than three years ago.
Yes, USDOJ is incompetent and dysfunctional, but this is not why. But sure, whatever, carry on...
Ctrl-c and ctrl-v are not hacks.
They are unredacted because either those in charge are not familiar with basic office tasks, or someone wanted this stuff to leak and nobody checked their work. Either brand of incompetence should cause heads to roll. But, just like the Signal fiasco, nothing will happen. When your brand is perfection, you cannot ever admit a mistake.
See also:
We Just Unredacted the Epstein Files
https://news.ycombinator.com/item?id=46364121
I tried to ascertain, but am not certain, this is the original blog source. Maybe they made some prior X posts.
It has become more plausible that nothing of value was released and the level of obviously poor redaction was done as a tarpit to own the libs.
So is the data extracted the names of the victims that were supposed to be hidden to protect them? Or is there something else that might be worthy of exposing?
It seems the redactions are to protect the perpetrators.
>It isn’t going to be a victims name copy pasted 80 times in a row…
You can't possibly know that!
(Sorry, watching Grinch, Jim Carrey spoke through me).
The downvoters assume that it is a bad-faith question. The downvoters are 99% right about that. If the 1% case holds, then OP is just exceedingly naive and did not follow the scandal, in which case they should maybe do some reading first.
The names of involved powerful people were NOT supposed to be censored. All those names except Bill Clinton's were redacted. To protect Trump and everybody else involved in the scandal except said Bill Clinton. But especially to protect Trump.
They also obscured the male perpetrators' faces and bodies in many images, illegally.
There's no patriotism here. That's just part of the cover for seeking power.
It's certainly possible that some of the underlings are deliberately sabotaging orders from above. It's also possible that they're incompetent, as so many of the Trump team are. How would we know which it is?
"Hacks" lol. Next, Ctrl+Alt+Del and its equivalents are gonna be called arcane theurgy.
Hacks don’t have to be pretty — if it works it works. Here’s my “hack” to get into many school computer systems:
Username: admin
Password: password
It's even less impressive: somebody left the credentials typed into the text boxes and went to get a SlimFast out of the staff break room, and you walked into the computer lab and hit Enter.
You sure about that? https://www.usatoday.com/story/news/2025/12/18/larry-bushart...
Every slide towards authoritarianism is gradual, there is no announcement.
Bruh they're kidnapping people in the streets. They took over CBS and censored a documentary about CECOT.
It's not a hack to copy and paste text that is part of the document data. The incompetence of the people responsible for complying with the law doesn't make it reasonable to label something a hack.
Please change the title.