AI is breaking two vulnerability cultures

Daily Digest email

Get the top HN stories in your inbox every day.

tptacek

This has been a very long time coming and the crackup we're starting to see was predicted long before anyone knew what an LLM is.

The catalyst is the shift towards software transparency: both the radically increased adoption of open source and source-available software, and the radically improved capabilities of reversing and decompilation tools. It has been over a decade since any ordinary off-the-shelf closed-source software was meaningfully obscured from serious adversaries.

This has been playing out in slow motion ever since BinDiff: you can't patch software without disclosing vulnerabilities. We've been operating in a state of denial about this, because there was some domain expertise involved in becoming a practitioner for whom patches were transparently vulnerability disclosures. But AIs have vaporized the pretense.

It is now the case that any time something gets merged into mainline Linux, several different organizations are feeding the diffs through LLM prompts aggressively evaluating whether they fix a vulnerability and generating exploit guidance. That will be the case for most major open source projects (nginx, OpenSSL, Postgres, &c) sooner rather than later.

The norms of coordinated disclosure are not calibrated for this environment. They really haven't been for the last decade.

I'm weirdly comfortable with this, because I think coordinated disclosure norms have always been blinkered, based on the unquestioned premise that delaying disclosure for the operational convenience of system administrators is a good thing. There are reasons to question that premise! The delay also keeps information out of the hands of system operators who have options other than applying patches.

deng

> based on the unquestioned premise that delaying disclosure for the operational convenience of system administrators is a good thing. There are reasons to question that premise!

Care to mention these reasons?

With "convenience of system administrators", I'm guessing you mean that there's a patch available that sysadmins can install, ideally before the vulnerability is disclosed? What else are sysadmins supposed to do, in your opinion? Fix the vulnerability themselves? Or simply shutdown the servers?

With the various copyfails of recent, it at least was possible to block the affected modules. If that were not the case, what would you have done, as a sysadmin?

grog454

> It has been over a decade since any ordinary off-the-shelf closed-source software was meaningfully obscured from serious adversaries.

Probably goes without saying but the last line of defense is not deploying your software publicly and instead relying on server-client architectures to do anything. Maybe this will be more common as vulnerabilities are more easily detected and exploited. Of course its not always feasible.

It has been annoying seeing my (proguard obfuscated) game client binaries decompiled and published on github many times over the last 11 years. Only the undeployed server code has remained private.

Interestingly I didn't have a problem with adversaries reverse engineering my network protocols until I was updating them less frequently than weekly. LLM assisted adversaries could probably keep up with that now too.

ReptileMan

>Only the undeployed server code has remained private.

How easy to do you this is for LLM to build decent emulator of the server in question by just observing what you send and what you get as response?

globalnode

not sure why downvoted. server emulators will become faster to make. protocol analysis will become faster as well.

sedatk

> BinDiff: you can't patch software without disclosing vulnerabilities

That’s why Microsoft has been obfuscating its binary builds for at least the last two decades so that even the two builds from the same source would produce very different blobs.

dataflow

Sounds dubious, do you have a citation? The disassembly looks very straightforward for a lot of Windows code.

sedatk

They're not encoded, but the code blocks are shuffled. That's why disassembly does look straightforward, but it used to thwart BinDiff at the time.

wglb

How are they obfuscated?

sedatk

See my sibling comment.

riknos314

I believe this premise that the cost of identification of vulnerabilities via diffs is going down over time begs the question "what do our processes need to look like if simply making the patch public is the disclosure?"

Current coordinated disclosure practices have a dependency on patching and disclosure being separate, but the gap between them seems to be asymptomatically approaching zero.

tptacek

Right, all I'm saying is that we were asymptotically close many years ago; all that's changed is that nobody can kid themselves about it anymore.

The actual policy responses to it, I couldn't say! I've always believed, even when there was a meaningful gap between patching and disclosing, that coordinated disclosure norms were a bad default.

riknos314

What process or mechanism would you prefer to use instead of coordinated disclosure?

busterarm

I always understood the business reasons that brought about coordinated vulnerability disclosure & I've been forced to toe this line at employers, but I've always been firmly in the full disclosure camp. I am so ready for this.

stingraycharles

You’re obviously one of the most knowledgeable people on this topic around here.

What would the best solution be? And where do you believe the industry is headed (which may very well be something other than the best solution) ?

I can’t think about anything other than improving operations, but given the state of the industry, this seems like a pipe dream.

undefined

[deleted]

freeqaz

This is exactly what happened with Log4Shell.

Day -X + 1: Engineer at Alibaba finds the vuln and tells Apache. Patch is pushed to git while new release is coordinated.

Day -X: A black hat sees commits fixing the bug. Attacks start happening.

Day 0: Memes start circulating in Minecraft communities of people crashing servers. Some logs are shared on Twitter, especially in China, of people getting pwned.

Day 0 + ~4 hours: My friend DMs me a meme on Twitter. I look up to find the CVE. Doesn't exist. My friend and I reproduce the exploit and write up a blog post about it. (We name it Log4Shell to differentiate it from a different, older log4j RCE vuln)

Day ~1: Media starts picking it up. Apache is forced to release patches faster in response. CVE is actually published to properly allow security scanners to identify it.

Today: AI makes this happen faster and more consistently. Patches probably should be kept private until a coordinated disclosure happens post-testing and CVE being published?

Hard to say what the right move is, but this is gonna be happening a lot over the next 1-3 years. Lots of companies are going to be getting cooked until AI helps us patch faster than attackers can exploit these fresh 0-days.

mlac

I’m with you until that last sentence, which I’ve been thinking about as “… until AI code testing, vulnerability scanning, and developer support tools help to limit the number of 0-days and vulnerabilities making it into production”.

So prevention will be more important than ai-assisted rapid containment or patching, though both of those capabilities will be necessary as part of defense in depth.

And some sort of AI-enabled security analysis across the organization’s architecture that is done as part of testing ahead of new software entering production to ID potential vulnerabilities caused by configuration changes or upgrades that modify how systems interact with each other.

I’ve been trying to guess the timeframe for seeing improved secure development, but I’m hoping it’s a bit closer to 6 months - 1 year given the speed of AI adoption and AI progression. May be closer to 3 years as you stated.

In the meantime, is there more to be done than this (not in order)?

- Patch COTS software

- re-evaluate the scoring for previous vulnerabilities

- set up up containment measures capabilities for systems that can’t be patched / high risk vendors

- use frontier model vuln scanning and patching for home grown systems that may have more 0-days than COTS depending on the organization’s capability

- limit the number of vendors / simplifying the tech stack.

I’d be happy to hear how others are thinking about this.

reactordev

we simply can't absolve ourselves of responsibility in input and expect a hardened output. It's ABSOLUTELY up to the engineers to have test harnesses and scenarios for testing, vulnerability scanning, etc. Just because we can move faster via prompts doesn't mean we neglect the SDLC.

I think there's opportunity to reinvent the pipeline with AI powered tools to assist but the onus is still on the person to ensure they are deploying something that has been tested.

rikafurude21

This feels more like an old problem getting reframed as an AI problem.

people were already diffing kernel commits and figuring out which ones were security fixes long before llms. if a patch lands publicly, the race has basically already started.

also not sure shorter embargoes really help. the orgs that can patch in hours are already fine. everyone else still takes days or weeks.

if anything, cheaper exploit generation probably makes coordinated disclosure more important, not less.

JumpCrisscross

> people were already diffing kernel commits and figuring out which ones were security fixes

With skill, and usually not consistently and systematically. With AI, anyone can do this to any software.

> not sure shorter embargoes really help

Why 90 days versus 2 years? The author is arguing the factors that set that balance have shifted, given the frequency of simultaneous discovery. The embargo window isn’t an actual window, just an illusion, if the exploit is going to be found by several people outside the embargo anyway.

> cheaper exploit generation probably makes coordinated disclosure more important

I agree. But it also makes it less viable. If script kiddies can find and exploit zero days, the capacity to co-ordinate breaks down.

There was always a guild ethic that drove white-hate (EDIT: hat) culture. If the guild is broken, the ethic has nothing to stand on.

Hizonner

> With skill, and usually not consistently and systematically.

How do you know? If the people who like to crow about vulnerabilities aren't doing it, it doesn't mean that the people who are actually in a position to exploit them systematically and effectively aren't doing it.

Those embargoes have always been dangerous, because they create a false sense of security. But, as you point out...

> With AI, anyone can do this to any software.

Yep. Even if it hadn't been true before, it's clear that now you just have to assume that everybody relevant will immediately recognize the security impact of any patch that gets published. That includes both bugs fixed and bugs introduced.

... and as the AI gets better, you're going to have to assume that you don't even have to publish a patch. Or source code. Within way less time than it's going to take people to admit it and adjust, any vulnerability in any software available for inspection is going to be instant public knowledge. Or at least public among anybody who matters.

thereisnospork

>any vulnerability in any software available for inspection is going to be instant public knowledge. Or at least public among anybody who matters.

Shouldn't this naturally lead to a state where all (new) code is vulnerability-free? If AI vulnerability detection friction becomes low enough it'll become common/forced practice to pre-scan code.

ragall

> How do you know?

We know because we could see the effects of the average rate of vulnerabilities discovery and exploitation, and it's definitely going up very fast. Until recently, vulnerabilities were relatively hard to find, and finding them was done by a very restricted group of people world-wide, which made them quite valuable. Not any more.

awesome_dude

> people were already diffing kernel commits and figuring out which ones were security fixes With skill, and usually not consistently and systematically. With AI, anyone can do this to any software.

I would like to see actual evidence of this, not.. vibes

I mean, this reeks of "Anyone is a Principal developer now" when the truth is there is still work to do.

totetsu

“White-Hat”

gritspants

I'm here for white-hate culture. You should, you should know better.

lynndotpy

I haven't been keeping tabs for the entirety of Linux development, but has it ever happened before that someone dropped a working exploit from the mailing list before the patch even hit the kernel?

I haven't seen this kind of thing and I get the impression, despite all the hype, that this will be a frequent phenomenon now thanks to LLMs.

alecco

> Torvalds said that disclosing the bug itself was enough, without the pursuant circus that followed when a major problem has been discovered. [1]

So it's not surprising Dirtyfrag was disclosed by a fix in the Linux kernel. [2]

[1] https://www.zdnet.com/article/torvalds-criticises-the-securi...

[2] https://afflicted.sh/blog/posts/copy-fail-2.html

santoshalper

I'd say it's an old problem be exacerbated by AI.

Forgeties79

I find i’m writing variations of the same comment every week so I’m just going to share a previous version I wrote if you’ll permit the laziness:

https://news.ycombinator.com/item?id=47921829

fragmede

Reminder: the Ksplice patent expires October 1, 2028.

manquer

I don't think hot patching holds the same relevance it did in 2010.

Much of today's workloads are containerized and run on roughly ephemeral nodes that can be switched out easily- K8s version upgrades force this more or less. We tent to run more and more of-the shelf hardware and worry less about individual node failures now.

In-memory updates also not magic , and can be limited as they requires data structure semantics to not really change and can create its own class of issues/bugs including security ones.

While am sure there are still use cases which dictate this type of update, the need is lot less than 15 years ago that the patent expiry will do much to the ecosystem.

whattheheckheck

What's the implications to that

fragmede

Means you wouldn't have to reboot to patch for security updates to the Linux kernel. Assuming someone does something with that.

Animats

We have a huge problem.

The US is at war. Much of the world is at war at the cyber attack level right now. The US, the EU, most of the Middle East, Israel, Russia... Major services have been attacked and have gone down for days at a time - Ubuntu, Github, Let's Encrypt, Stryker. Entire hospital systems have had to partially shut down.

Now, in the middle of this, AI has made attacks much faster to generate. Faster than the defensive side can respond. Zero-day attacks used to be rare. Now they're normal.

It's going to get worse before it gets better. Maybe much worse.

pier25

> before it gets better

How is it going to get better?

idopmstuff

If we assume that there will be an AI that is perfect in terms of ability to find vulnerabilities, cheap to run and widely available to everyone, then anyone can run it on any piece of software before deploying it. All vulnerabilities get found before they can be exploited.

One of the big challenges with cybersecurity is that attackers only need to find one exploit, while defenders need to stop everything. When you have a large surface area and limited resources, it's much easier to be the side that only has to succeed once. AI eliminates the limited resources problem.

apnorton

> If we assume that there will be an AI that is perfect in terms of ability to find vulnerabilities

...so if we assume a halting oracle?

jefftk

I'd speculate that at this point Linux etc are probably having vulnerabilities discovered and patched faster than created.

pier25

It's not only Linux though and many projects don't have the funding to perpetually use something like Mythos.

Sarky

Right now we are at a point in time when AI can find bugs for attackers and defenders, but defenders did not fix/find those bugs yet.

In time most of the bugs AI can find will be fixed, and things will calm down. Some bugs will be left, but will be too complex to find and weaponise (or rarely).

Alin short, attackers have advantage for a brief time now, but ultimately defenders will win. I guess this "fight" might be over before the end of the year.

0xbadcafebee

1) Make it a law that companies have to vet their code for security holes before release, 2) Make it a law that companies have to apply operational security best practice on their software products/services, 3) Industry standard automation for improvements to patch lifecycle management, 4) Auditing for critical businesses and industries to ensure safety (both as a national security thing and general safety/reliability/privacy/etc)

Right now all that stuff is optional, so most companies don't do it, which makes more security holes and it takes longer to patch.

chuckadams

Basically make software development so legally risky that only multi-billion dollar corporations will ever engage in it.

nicce

Downplaying security has now real coencequences for everyone.

jiggawatts

Bulk rewrites of everything into Rust with AI assistance?

foobiekr

I am looking at the results of a mass vulnerability scan as I type this. Half of the bugs in one case are in fact (binary) parser errors for hand-written parsers. These really should not exist in any language - but in C it's particularly bad. Kaitai Struct or something similar would broadly have prevented these. Rust would help here, but less than a parser generator (because it could automate error checking insertion for things that aren't just out of bound access).

However, half of the vulnerabilities are logic errors in terms of what I would call RBAC enforcement, incorrect access permissions, and so on. Rust won't help at all with any of these.

0xbadcafebee

Rust is overly complex and difficult, Go is simpler and easier and has the memory protection people are obsessed with

JumpCrisscross

> So many security fixes are coming out now that examining commits is much more attractive: the signal-to-noise ratio is higher

Why?

> Additionally, having AI evaluate each commit as it passes is increasingly cheap and effective

This is the key. With AI, the “people won't notice, with so many changes going past” assumption fails.

miki123211

AI will shorten update windows dramatically. 2026 is the worst year to be thinking about dependency cooldowns, we need to think about dependency warmups instead.

Soon, there will be no such thing as a safe way to disclose a vulnerability in an open source project. Centralized SaaS will have a major security advantage here.

lll-o-lll

Closed source centralized SaaS will have a major security advantage.

Edit: Because an RCE in a open-source dependency means you are just as vulnerable when the security patch lands? I don’t see the controversy.

woah

You could have a web of trust where Linux-using organizations each spend $x continuously scanning and patching their own dependencies with AI, and sending each other patches and scans.

dakolli

LLMs aren't capable of doing this, and never will be no matter what Anthropic tries tell you.

a_vanderbilt

That's the same mindset some people had 3 years ago when they said AI wouldn't be capable of software development. Look where we are now.

fragmede

Mozilla seems to think it can.

https://blog.mozilla.org/en/privacy-security/ai-security-zer...

dmurray

Obviously the solution is for Linux to move to a closed-source development model.

Security researchers should report their findings to a committee that includes some big companies (IBM and Oracle seem like trustworthy choices here, but ideally we should find a way to get Microsoft included). Those companies would apply the security patches and distribute binary builds of Linux to their customers. Users fortunate enough to have a business relationship with those companies would be protected immediately. The source would still be published after 90 days for educational purposes and for anyone who doesn't appreciate the security benefits of this approach.

"But even if you could convince people to collaborate like this for the greater good, the GPL makes it legally impossible", you say. Ah, but the GPL only says you have to make the source available for a minimal monetary cost, it doesn't impose a time limit. Traditionally, responding to source code requests with a snail-mailed CD is good enough. No judge in the US is going to rule that a short administrative delay in sending out those CDs - in the name of everyone's security, after all, and 90 days is nothing to the judicial system - violates a nebulous licensing agreement from a different era.

slim

There are already closed source operating systems you can use instead of linux. No need to enshittify linux

raincole

I like how after so many years, people finally start recognizing that obscurity is a part of security. Not the whole security, obviously, but a part of it.

scheeseman486

Just like there's LLM-automated vulnerability fuzzing, there's LLM-automated decompilation. Compilation is no longer a meaningful way to obscure code.

skinfaxi

The comment you replied to read like satire to me.

oytis

The old saying of Tony Hoare about no obvious bugs vs obviously no bugs holds in the age of LLMs more than ever

fragmede

> There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies." — Tony Hoare

for those (like me) who hadn't seen it before.

xiaoyu2006

The quick test doesn't show a lot - by out straight asking if this is a security patch, it implies and guides AI to have output more probably to agree on this assumption. A confusion matrix is more useful. Nonetheless of course this is not a detailed ai capability testing blog.

jefftk

[author]

I agree it is not much additional evidence! If someone wanted to try running the same test on a series of N commits from that list including this one I'd be very curious to see the answer!

cormacrelf

Realistically, if you are scanning each kernel commit to check if they might be patching a security issue, you are going to be asking an LLM "is this security related, if so vaguely how" with low effort and taking "maybe" as a yes before feeding it to a more expensive model. You aren't trying to establish a probability of an ultimately unknowable fact, there is ground truth that you can find by producing an exploit, so you are just trying to pre-filter before spending the money to find it.

cubefox

Yeah, ideally we would need the phi coefficient (aka MCC, the binary Pearson correlation), which can be calculated from a confusion matrix of yes/no LLM classifications for all kernel diffs. (Number of true positives, true negatives, false positives, false negatives.)

j2kun

> Luckily AI can speed up defenders as well as attackers here, allowing embargoes that would previously have been uselessly short.

This is an important facet of the problem space: security risks turning into an arms race for who wants to spend more tokens.

nicce

One interesting thing is that this makes closed source code even greater asset for the defenders. Attacker cannot spend tokens for it, but defenders can spend tokens for hardening based on source code, while attacker is stuck with blackbox testing.

watusername

You would be surprised how adept SOTA models are at reverse engineering with IDA/Ghidra or even plain old objdump. Opus basically knows IDAPython on the back of its hand.

nicce

They can be, but the most interesting parts (backend code, deployment confs) are not usually available. Reversing clients can help to understand a bit, but not with equal level.

j2kun

Decompilation is quite good these days as well

FuriouslyAdrift

Reverse engineering vulnerabilities from patches is red team 101...

Havoc

The bugs are bugs description reads pretty insane to me personally but I know linux world has many people valueing principle of it over practical matters.

90d seems long too though.

Think ultimately the big AI houses will need to help the core internet infra guys. Running latest and greatest AI over stuff like nginx and friends makes sense for us all collectively I think

Daily Digest email

Get the top HN stories in your inbox every day.