Comment by alganet a day ago

Something sounds fishy here. Have these bugs really been found by AI? (I don't think they were.)

If you read the "whitepaper" from Corgea (one of the products used), it seems that AI is not the main show:

> BLAST addresses this problem by using its AI engine to filter out irrelevant findings based on the context of the application.

It seems that AI is being used to post-process the findings of traditional analyzers. It reduces the number of false positives, increasing the yield quality of the more traditional analyzers that were actually used in the scan.
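
Roughly, I'd picture that post-processing step like this (a pure guess on my part; the SARIF handling, model name, and prompt are placeholder assumptions, not anything taken from Corgea's paper):

    # Hypothetical sketch only: an LLM triaging raw static-analyzer output.
    # Everything here (model name, prompt, SARIF fields) is my assumption.
    import json
    from openai import OpenAI

    client = OpenAI()

    def read_snippet(result, context=5):
        # Pull the flagged region out of the source file (standard SARIF fields).
        loc = result["locations"][0]["physicalLocation"]
        path = loc["artifactLocation"]["uri"]
        line = loc["region"]["startLine"]
        with open(path) as src:
            lines = src.readlines()
        return "".join(lines[max(0, line - 1 - context):line + context])

    def looks_real(result):
        # One LLM call per finding; keep it only if the model says it's real.
        prompt = (
            "Static-analysis finding for a C codebase.\n"
            f"Rule: {result['ruleId']}\n"
            f"Message: {result['message']['text']}\n"
            f"Code:\n{read_snippet(result)}\n"
            "Reply with REAL or FALSE_POSITIVE, then one sentence of reasoning."
        )
        resp = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content.strip().startswith("REAL")

    with open("scan.sarif") as f:
        findings = json.load(f)["runs"][0]["results"]
    kept = [r for r in findings if looks_real(r)]
    print(f"{len(kept)}/{len(findings)} findings survived triage")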

ZeroPath seems to use similar wording, like "AI-Enabled Triage", and expressions like "combining Large Language Models with AST analysis". It also highlights that it achieves fewer false positives.

I would expect someone who developed this kind of thing to set up a feedback loop in which the AI output is somehow used to improve the static analysis tool (writing new rules, tweaking existing ones, ...). It seems like the logical next step. This might be going on in these products as well (lots of in-house rule extensions for the more traditional static analysis tools, written or discovered with the help of AI, hence the "built with AI" headline on some of them).
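
Sketched out, such a loop could be as small as this (again entirely hypothetical; the prompt and the acceptance check are my own invention, not something I've seen in any of these products):

    # Hypothetical feedback loop: generalize one confirmed finding into a new
    # Semgrep rule, and keep the rule only if it still fires on the original bug.
    import subprocess
    from openai import OpenAI

    client = OpenAI()

    def draft_rule(snippet, explanation):
        # Ask the model to turn a confirmed bug into a reusable detection rule.
        prompt = (
            "This C snippet contains a confirmed vulnerability:\n"
            f"{snippet}\n"
            f"Why it is a bug: {explanation}\n"
            "Write a Semgrep rule (YAML only, no prose) that matches this pattern."
        )
        resp = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content

    def rule_still_fires(rule_yaml, path):
        # Sanity check: the generated rule must at least rediscover the bug
        # it was derived from before it joins the rule set.
        with open("generated-rule.yml", "w") as f:
            f.write(rule_yaml)
        scan = subprocess.run(
            ["semgrep", "--config", "generated-rule.yml", path],
            capture_output=True, text=True,
        )
        return path in scan.stdout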

Don't get me wrong, this is cool. Getting an AI to triage a verbose static analysis report makes sense. However, it does not mean that AI found the bugs. In this model, the ability to find relevant stuff is still capped by the static analysis tools.

I wonder if we even need to pay for it. I mean, now that I know it is possible (at least in my head), it seems tempting to grab open source tools, set them to max verbosity, and figure out which prompts these products use on (likely vanilla) coding models to get them to triage the output.

asadeddin 14 hours ago

Hi there, I'm Ahmad, CEO at Corgea and the author of the white paper. We do actually use LLMs to find the vulnerabilities AND triage findings. For the majority of our scanning, we don't use traditional static analysis. At the core of our engine is the LLM reading the lines of code to find CWEs in them.

simonw a day ago

Looks like you're reacting to the Hacker News title here, which is currently "Daniel Stenberg on 22 curl bugs found by AI and fixed"

That's an editorialized headline (so it may get fixed by dang and co) - if you click through to what Daniel Stenberg said, he was clearer:

> Joshua Rogers sent us a massive list of potential issues in #curl that he found using his set of AI assisted tools.

"AI-assisted tools" seems right to me here.

  • robhlam 19 hours ago

    It’s clear my attempt to keep the gist of what Daniel said while staying under the title character limit didn’t hit the mark.

    How would you have worded it?

    • simonw 19 hours ago

      Always tricky! In this case maybe the following:

      Daniel Stenberg on 22 curl bugs reported using AI-assisted security scanners

      • robhlam 17 hours ago

        That doesn’t really convey that these bug reports were for real issues and were greatly appreciated, unlike the slop Daniel is known for complaining about, which I think is the real story here.

        I will spend longer considering my title next time.

        Cheers!

  • alganet 21 hours ago

    Even if the title changes, it is still a valid critique of the tools, how they might work, and a possible way of getting them for free.

    Also, think about it: of course I read Joshua's report. Otherwise, how could I have known the names of the products he used?

    • bgwalter 21 hours ago

      I don't think many people here are interested in how something works. They want to see the headline "Curl developer finally convinced by AI!" and otherwise drop anecdotes about Claude Code etc.

      All comments that want to know more are at the bottom.

etlun 13 hours ago

Hi, I'm Etienne, one of the cofounders @ ZeroPath.

We do not use traditional static analyzers; our engine was built from the ground up to use LLMs as a primitive. The issues ZeroPath identified in Joshua's post were indeed surfaced and triaged by AI.

If you're interested in how it works under the hood, some of the techniques are outlined here: https://zeropath.com/blog/how-zeropath-works

  • alganet 9 hours ago

    Hi! Thanks for the reply.

    Joshua describes it as follows: "ZeroPath takes these rules, and applies (or at least the debug output indicates as such) the rules to every .. function in the codebase. It then uses LLM’s ability to reason about whether the issue is real or not."

    Would you say that is a fair assessment of the LLM's role in the solution?

bgwalter 21 hours ago

I suppose the downvoters all have subscriptions to the tools and know exactly how they work, while leaving the rest of us in the dark.

Even Joshua's blog post does not clearly state which parts are "AI" and how much. Neither does the PDF.

  • refulgentis 21 hours ago

    [flagged]

    • bgwalter 20 hours ago

      [flagged]

      • tptacek 18 hours ago

        What does "even at Matasano" mean? Matasano hasn't existed for over 12 years.

      • refulgentis 18 hours ago

        I assumed based on your post and the post you replied to that it is literally impossible to prove any AI is involved, and I trust both of you on that.

        Given that, I'm afraid all the interlocution I have to offer is the thing you commented on, the mind of a downvoter, i.e. positing that every downvoter must have details, including details we[1] can't find.

        Past that, I'm afraid to admit I am having difficulty understanding how the slides are related, and I don't even know what Matasano is -- is that who owns fly.io? I thought they were "indie" -- I'm embarrassed to admit I thought Monsanto at first. I do know how much I've used AI to code, so I can vouch for tptacek's post.

        [1] royal we, i.e. I trust you and OP so completely on what is findable vs. not findable that I trust we can't establish with 100% certainty that any sort of AI-based thingy was used at all. To be clear, too, 100% is always too high a bar; I mean to say we can't even establish it at 90% confidence. Even 1% confidence. If all we have is their word to go on, it's impossible.

        • tptacek 18 hours ago

          Matasano was a software security company I cofounded in 2005 and sold to NCC Group in 2012. Super weird pull for this thread.

    • alganet 20 hours ago

      Do you believe AI is at the core of these security analyzers? If so, why the personal story blogpost? You could just explain to me, in technical terms, why that is so.

      Claiming to work for Google does not work as an authority card for me; you still have to deliver a solid argument.

      Look, AI is great for many things, but to me these products sound like chocolate that is actually just 1% real chocolate. Delicious, but 99% not chocolate.

      • tptacek 18 hours ago

        I had a conversation in a chat room yesterday about AI-assisted math tutoring where a skeptic said that the ability of GPT5 to effortlessly solve quotient differentials or partial fraction decomposition or rational inequalities wasn't indicative of LLM improvements, but rather just represented the LLMs driving CAS tools and thus didn't count.

        As a math student, I can't possibly care less about that distinction; either way, I paste in a worked problem solution and ask for a critique, and either way I get a valid output like "no dummy multiply cos into the tan before differentiating rather than using the product rule". Prior to LLMs, there was no tool that had that UX.

        In the same way: LLMs are probably mostly not off the top of their "heads" (giant stacks of weight matrices) axiomatically deriving vulnerabilities, but rather just doing a very thorough job of applying existing program analysis tools, assembling and parallel-evaluating large numbers of hypotheses, and then filtering them out. My interlocutor in the math discussion would say that's just tool calls, and doesn't count. But if you're a vulnerability researcher, it doesn't matter: that's a DX that didn't exist last year.

        As anyone who has ever been staffed on a project triaging SAST tool outputs before would attest: it extremely didn't exist.
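
        If you want the toy version of that assemble-and-filter shape, it's something like this (every prompt and name below is invented for illustration, not pulled from any of these products):

            # Toy sketch: fan candidate vulnerability hypotheses out to concurrent
            # LLM calls and keep the survivors. All names and prompts are invented.
            from concurrent.futures import ThreadPoolExecutor
            from openai import OpenAI

            client = OpenAI()

            def survives(hypothesis, code):
                # One call per hypothesis: does the model confirm the claim?
                resp = client.chat.completions.create(
                    model="gpt-4o-mini",
                    messages=[{
                        "role": "user",
                        "content": f"Code:\n{code}\n\nClaim: {hypothesis}\n"
                                   "Is this claim true of the code? Answer YES or NO first.",
                    }],
                )
                return resp.choices[0].message.content.strip().upper().startswith("YES")

            def filter_hypotheses(hypotheses, code):
                # Parallel evaluation, then filtering, as described above.
                with ThreadPoolExecutor(max_workers=8) as pool:
                    verdicts = list(pool.map(lambda h: survives(h, code), hypotheses))
                return [h for h, ok in zip(hypotheses, verdicts) if ok]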

        • alganet 8 hours ago

          I don't care if it counts as true LLM brilliance or not.

          If it doesn't matter whether it's AI or not, just that they're good tools, why even advertise the AI keyword all over them? Just say "best-in-class security analysis toolset". It's proprietary anyway; you can't know how much of it is actually AI (unless you reproduce its results, which is the core argument you missed here).

      • refulgentis 20 hours ago

        I don't mean to aggravate you. I do mean to offer some insight into the mindset of the people the person I was replying to was puzzled by. I'm calmed by the fact that if we're both here, we both value one of the HN sayings I'm very fond of: come with curiosity.

        > Do you believe AI is at the core of these security analyzers?

        Yes.

        > If so, why the personal story blogpost?

        When I am feeling intensely, and people respond to me as I'm about to respond to you, I usually get very frustrated. Apologies in advance if you suffer from that same part of being human, I don't mean anything about you or your positions by this:

        I don't know what you mean.

        Thus, I may be answering wrong with the following: the person I replied to indicated all downvoters must know every detail, and the, well, let's use your phrasing, personal story blogpost (I assume you mean my comment) leads with: "I believe there's a little more going on than everyone knowing every detail already, or presumably, being wrong to downvote. Full case study of a downvoter at work:"

        > Claiming to work for Google

        I claimed the opposite! I'm a jobless hack :) (quit in 2023)

        > does not work as an authority card for me,

        Looking at it, the thing isn't "I worked at Google therefore AI good", it's "I worked at Google and on a specific well-known project, the company's design language, which used AI pre-ChatGPT to great effect. It's unclear to me why this use case would be unbelievable years later."

        > you still have to deliver a solid argument.

        What are we arguing? :) (I'm serious! Apologies, again, if it comes off as flippant. If you mean I need to deliver a solid argument that the tools must have AI: I assume that if such details were available, you would have found them; you seem well-considered and curious. I meant to explain to the person I replied to the mind of a downvoter who cannot recite details as yet unavailable to the public, not to verify the workflow step by step.)