Comment by TZubiri 12 hours ago

With all due respect to Stallman, you can actually study binaries.

The claim Stallman would make (after chastising you for an hour for saying "Open Source" instead of "Free Software") is that closed (proprietary) software is unjust. But in the context of security, the claim would be limited to Free Software being capable of being secure too.

You may be able to argue that Open Source reduces risk in threat models where the manufacturer is the attacker, but in any other threat model, security is an advantage of closed source. It's automatic obfuscation.

There are plenty of advantages to Free Software; you don't need to make up new ones.

sigmoid10 11 hours ago

This. Closed source doesn't stop people from finding exploits, in the same way that open source doesn't magically make people find them. The Windows kernel is proprietary and closed source, but people constantly find exploits in it anyway. What matters is that there is a large audience that cares about auditing.

OTOH, if Microsoft really wanted to sneak in a super-hard-to-detect piece of spyware, they probably could - but so could the Linux kernel devs. Some exploits have sat openly in the Linux kernel for more than a decade despite everyone being able to audit it in theory. Who's to say they weren't planted by some three-letter agency that coerced a developer?

Relying on either approach is pointless anyway. IT security is not a single means to all ends; it's a constant struggle between safety and usability at every level, from raw silicon all the way to user-land.

tptacek 11 hours ago

It's weird to me that it's 2026 and this is still a controversial argument. Deep, tricky memory corruption exploit development is done on closed-source targets, routinely, and the kind of backdoor/bugdoor people conjure in threads about E2EE are much simpler than those bugs.

This was a pretty much settled argument 10 years ago, even before the era of LLVM lifters; post-LLM, the standard-of-care practice is often full recompilation and execution.

objclxt 11 hours ago

> in any other threat model, security is an advantage of closed source

I think there's a lot of historical evidence that doesn't support this position. For instance, Internet Explorer was generally agreed to be a much weaker product from a security perspective than its open source competitors (Gecko, WebKit, etc.).

Nobody was defending IE from a security perspective because it was closed source.

singpolyma3 11 hours ago

I was with you until you somehow claimed obfuscation can improve security, against all historical evidence, even pre-computers.

  • Arch-TK 11 hours ago

    Obscurity is a delay tactic that raises the time cost of an attack. It is true that obscurity is not a security feature, but it is also true that increasing the cost of attacking you is a form of deterrent. If you are not also secure in the conventional sense, it only buys you time until someone puts in the effort to figure out what you are doing and owns you - and you had better have a plan for when that time comes. But everyone needs time, because bugs happen, and you need that time to fix them before they are exploited.

    The difference between obscurity and a secret (password, key, etc.) is the difference between less than a year to figure it out and a year or more.

    There is a surprising amount of software out there with obscurity preventing some kind of "abuse". In my experience these measures are not that strong, but it still takes someone like me hours to reverse engineer them, and in many cases I am the first person to do that after years of nobody else bothering.
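
    As a made-up illustration of how weak these schemes usually are: a vendor might "hide" an endpoint by XOR-ing it with a single byte, and recovering it takes a few lines of Python once you pull the blob out of the binary (the key and ciphertext here are invented):

      # Hypothetical single-byte-XOR "obfuscation", as often found in
      # the wild. The blob is an invented URL XOR-ed with the key 0x5A.
      blob = bytes([0x32, 0x2E, 0x2E, 0x2A, 0x29, 0x60, 0x75, 0x75])

      def deobfuscate(data: bytes, key: int) -> bytes:
          return bytes(b ^ key for b in data)

      # Brute-forcing all 256 keys takes microseconds; the "obscurity"
      # holds exactly until someone bothers to look.
      for key in range(256):
          out = deobfuscate(blob, key)
          if out.startswith(b"http"):
              print(key, out)  # -> 90 b'https://'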

  • mike_d 11 hours ago

    This is a tired trope. Depending exclusively on obfuscation (security by obscurity) is not safe. But maintaining the confidentiality of things that could aid an attacker is absolutely a defensive layer, and it improves your overall security posture.

    I love the Rob Joyce quote that explained why TAO was so successful: "In many cases we know networks better than the people who designed and run them."

  • TZubiri 9 hours ago

    I think you are conflating

    "is an unbreakable security mechanism"

    with

    "improves security".

    Anything that makes an attacker's job harder improves security, at least to a first approximation. That said, there may be counter-effects that make it a net loss or net neutral.

parhamn 12 hours ago

Explain how you'd detect a branched/flagged sendKey call (or whatever it would be called) in the compiled WhatsApp iOS app.

It could be interleaved in any of the many analytics tools in there too.

You have to trust the client in E2E encryption. There's literally no way around that. You need to trust the client's OS (and in some cases, other processes) too.
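
To make the hypothetical concrete: the kind of branch I mean could be as small as the sketch below, where every name (the sendKey-style key export, the analytics object, the server flag) is invented for illustration.

  # Invented sketch - not real WhatsApp code. The point is how small
  # the malicious delta is and how innocuous it looks as telemetry.
  def send_message(session, plaintext, server_flags, analytics):
      ciphertext = session.encrypt(plaintext)

      # Looks like routine diagnostics. It only misbehaves when the
      # server sets an opaque per-user flag, so an auditor testing
      # their own account never sees this branch taken.
      if server_flags.get("diag_v2"):
          analytics.report("diag", payload=session.export_key())

      session.transport.send(ciphertext)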

  • JasonADrury 12 hours ago

    >Explain how you'd detect a branched/flagged sendKey call (or whatever it would be called) in the compiled WhatsApp iOS app.

    Vastly easier than spotting a clever bugdoor in the source code of said app.

    • refulgentis 11 hours ago

      Putting it all on the table: do you agree with the claim that binary analysis is just as good as source code analysis?

      • JasonADrury 11 hours ago

        Binary analysis is vastly better than source code analysis: the binary is what actually executes, whereas reliably detecting bugdoors in source tends to require an unrealistically deep knowledge of what the compiler will do with it.

      • anonymars 11 hours ago

        Empirically it doesn't look like there's a meaningful difference, does it?

        Not having the source code hasn't stopped people from finding exploits in Windows (or even hardware attacks like Spectre and Meltdown), and having the source code didn't protect against Heartbleed or log4j.

        I'd conclude it comes down to security culture (look how things changed after the Trustworthy Computing initiative, or OpenSSL vs LibreSSL) and how many people are looking. In that sense, maybe "many eyes [do] make bugs shallow", but source code availability doesn't seem to be the deciding factor so much as the incentives - on both the internal development side and the external attacker side.

      • tptacek 11 hours ago

        I don't agree with "vastly better", but it's arguable in both direction and magnitude. I don't think you could plausibly argue that binary analysis is "vastly harder".

      • TZubiri 11 hours ago

        Nono, analyzing binaries is harder.

        But it's still possible. And analyzing source code is still hard.

refulgentis 12 hours ago

This comment comes across as unnecessarily aggressive and out of nowhere (Stallman?), and it's really hard to parse.

Does this rewording reflect its meaning?

"You don't actually need code to evaluate security, you can analyze a binary just as well."

Because that doesn't sound correct?

But that's just my first pass, at a high level. Don't wanna overinterpret until I'm on surer ground about what the dispute is. (i.e. don't want to mind read :) )

The best steelman I can construct for my current understanding is "you can check whether it writes files or accesses the network, and if it doesn't, then by definition the chats are private and it's secure", which sounds facile. (Presumably something is being written somewhere for the whole chat thing to work; it can't be pure P2P, because the recipient's app might not be open when you send.)

  • TZubiri 10 hours ago

    https://www.gnu.org/philosophy/free-sw.html

    Whether the original commenter knows it or not, Stallman greatly influenced the very definition of source code, and the claim being made here is very close to Stallman's freedom to study.

    >"You don't actually need code to evaluate security, you can analyze a binary"

    Correct

    >"just as well"

    No, of course analyzing source code is easier and analyzing binaries is harder. But it's still possible ("feasible" is the word the original comment used).

    >Steelman for my current understanding is limited to "you can check if it writes files/accesses network, and if it doesn't, then by definition the chats are private and its secure",

    I didn't say anything about that? I mean, those are valid tactics as part of a wider toolset, but I specifically said binaries, because the binary maps one-to-one with the source code: if you can find something in the source code, you can find it in the binary, and vice versa. Analyzing file accesses and network activity - runtime analysis of any kind - is mostly orthogonal to source/binary static analysis; the only difference is whether your debug map points to source code or to machine code.
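
    As a trivial illustration of the one-to-one point: a hard-coded endpoint you could grep for in the source tree survives compilation and can be pulled straight out of the binary. Here is a minimal strings(1) reimplementation as a sketch (the "http" filter is just an example of the same grep you'd run over source):

      import re
      import sys

      # Minimal strings(1): printable ASCII runs of 6+ bytes. Point it
      # at any compiled executable; /bin/ls is just a default.
      path = sys.argv[1] if len(sys.argv) > 1 else "/bin/ls"
      with open(path, "rb") as f:
          data = f.read()

      for match in re.finditer(rb"[\x20-\x7e]{6,}", data):
          s = match.group().decode("ascii")
          if "http" in s:  # the same grep you would run over source
              print(s)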

    This is a very central conflict of Free Software. What I want to make clear is that the Free Software movement refuses to study closed-source software not because it is impossible, but because it is unjustly hard. Free Software never claims it is impossible to study closed-source software; it claims that source code access is a right, and its adherents prefer to reject closed-source software outright, and thus never need to perform binary analysis.

oofbey 12 hours ago

What's the state of the art of reverse engineering source code from binaries in the age of agentic coding? Seems like something agents should be pretty good at, but I haven't read anything about it.

  • roughly 11 hours ago

    I think there's a good possibility that LLMs are the kind of technology that could be usefully trained to decode binaries, as a sort of squint-and-you-can-see-it translation problem, but I can't imagine, e.g., a pre-trained GPT being particularly good at it.

  • JasonADrury 11 hours ago

    I've been working on this, and the results are pretty great with the fancier models. I have successfully had gpt5.2 complete fairly complex matching-decompilation projects, as well as projects with more flexible requirements.

  • TZubiri 11 hours ago

    Nothing yet; agents analyze code, which is textual.

    The way they analyze binaries now is through the textual interfaces of command-line tools, and the tools used are mostly the ones the foundation models supported at training time: mostly you can't teach it new tools at inference, they must be supported at training. So most providers focus on the same tools and benchmark against them, and binary analysis is not in the zeitgeist right now - the focus is on producing code more than understanding it.
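
    Concretely, the textual interface I mean is a wrapper like this hypothetical disassembly tool an agent harness could expose (objdump and its -d flag are real; the wiring around them is invented):

      import subprocess

      # Hypothetical agent tool: wrap a binary-analysis CLI in the same
      # plain-text request/response shape the model already uses for
      # tools like grep.
      def disassemble(path: str, symbol: str) -> str:
          """Return one function's disassembly as text for the model."""
          out = subprocess.run(["objdump", "-d", path],
                               capture_output=True, text=True,
                               check=True).stdout
          # Crude extraction: slice out the block after the symbol label.
          start = out.find(f"<{symbol}>:")
          if start == -1:
              return f"symbol {symbol!r} not found"
          end = out.find("\n\n", start)
          return out[start:end if end != -1 else None]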

    • oofbey 11 hours ago

      The entire MCP ecosystem disagrees with your assertion that "you can't teach it new tools at inference." Sorry, you're just wrong.

      • TZubiri 6 hours ago

        Nono, you of course CAN teach tool use at inference, but it's different from doing so at training time, and right now the models are trained to call specific tools.

        Also, MCP is not an agent protocol; it's used in a different category of product. MCP is used when a user has a chatbot, sends a message, and gets a response. Here we are talking about the category of products we'd describe as code agents - Claude Code, ChatGPT Codex - and the specific models trained for use in such contexts.

        The idea is that of course you can tell a model about new tools at inference, but in code-production tasks the LLM is trained to use string-based tools such as grep, not language-specific tools like Go To Definition.

        My source on this is Dax, who is developing an Open Source clone of Claude Code called OpenCode.

        • oofbey 3 hours ago

          Claude Code and Cursor's agent and all the coding agents can and do run MCP just fine. MCP is effectively just a prompt that says "if you want to convert a binary to hex, call the 'hexdump' tool, passing in the filename", plus a promise to treat specially formatted responses differently. Any modern LLM that can reason and solve math problems will understand and use the tools you give it. Heck, I've even seen LLMs that were never trained to reason make tool calls.

          You say they're better with the tools they're trained on. Maybe? But if so, not by much. And maybe not at all, because custom tools are passed as part of the prompt, and prompts go a long way toward overriding training.

          LLMs reason in text. (Except for the ones that reason in latent space.) But they can work with data in any file format as long as they’re given tools to do so.
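
          To make that concrete, here's the hexdump example sketched as an MCP server. This is my best recollection of the FastMCP interface from the official MCP Python SDK, so treat it as a sketch rather than gospel:

            from subprocess import run

            # Sketch of the hexdump tool as an MCP server, per my
            # recollection of the official MCP Python SDK (FastMCP).
            # The tool name and docstring are essentially the "prompt"
            # the model sees.
            from mcp.server.fastmcp import FastMCP

            mcp = FastMCP("binary-tools")

            @mcp.tool()
            def hexdump(filename: str) -> str:
                """Convert a binary file to a hex dump."""
                return run(["hexdump", "-C", filename],
                           capture_output=True, text=True).stdout

            if __name__ == "__main__":
                mcp.run()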

  • refulgentis 12 hours ago

    Agents are sort of irrelevant to this discussion, no?

    Like, it's assuredly harder for an agent than having access to the code, if only because there's a theoretical opportunity to misunderstand the decompilation.

    Alternatively, it's assuredly easier for an agent because, as execution time approaches infinity, it can try all possible interpretations.

    • oofbey 11 hours ago

      Agents, meaning an AI iteratively trying different things to decompile the code, presumably in some kind of guess-and-check loop. I don't expect a typical LLM to be good at this on its first attempt, but I bet Cursor could make a good stab at it with the right prompt.
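
      Sketched out, the loop might look like this - ask_model is a placeholder for whatever LLM client you'd use, and a real matching-decompilation setup would normalize addresses and offsets before diffing:

        import subprocess

        def ask_model(prompt: str) -> str:
            """Placeholder - wire up a real LLM client here."""
            raise NotImplementedError

        def disasm(path: str) -> str:
            return subprocess.run(["objdump", "-d", path],
                                  capture_output=True, text=True).stdout

        # Guess-and-check: propose source, recompile, diff disassembly
        # against the target, and feed the mismatch back to the model.
        def attempt_decompile(target: str, rounds: int = 10) -> str | None:
            goal = disasm(target)
            source = ask_model("Write C that compiles to:\n" + goal)
            for _ in range(rounds):
                with open("guess.c", "w") as f:
                    f.write(source)
                subprocess.run(["cc", "-O2", "-o", "guess", "guess.c"],
                               check=True)
                got = disasm("./guess")
                if got == goal:
                    return source  # matching decompilation achieved
                source = ask_model("Revise the C source.\nTarget:\n"
                                   + goal + "\nGot:\n" + got)
            return None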