puppystench
They ran a bounty on Kaggle last year but with $500k in payouts and with all results open and publishable.
https://www.kaggle.com/competitions/openai-gpt-oss-20b-red-t...
With only $25k in payouts and everything locked down under NDA, I can't imagine many people will participate. Well, other than those submitting mountains of LLM-generated junk.
Barbing
> Well, other than those submitting mountains of LLM-generated junk.
Assuming somehow some of them use halfway-decent models and prompts… they've successfully pushed some of the token cost of their analysis work off onto customers!
kang
this economic model works for all 'bounty' related work
measurablefunc
They claim their models have PhDs but they still can't automate their own red teams. The bounty is not a bounty; it is for gathering training data so that for the next deployment they can claim they have the safest possible & most super-duper-aligned agentic computer-using AI that will never ever make any bioweapons.
I am also willing to bet money that for their next marketing campaign they will claim they have automated the red team for bioweapons research prevention & whatnot.
mpeg
I was surprised at the low bounty too, considering the resources of openai
Last year I won a similar prompt injection challenge run by a crypto startup against the latest Claude and GPT (at the time), and it was considerably more money, from an org with maybe $5-10m in funding.
That and the restrictive NDA kinda tells me they're not looking for serious bounty hunters, who would either want a lot more money or, alternatively, to be able to publish their work; seems like a marketing stunt.
p_stuart82
basically discount Kaggle. still get people poking at it, just none of the writeups or who-gets-paid drama.
dist-epoch
This model is much more powerful than gpt-oss-20b; notice how the contest was not even for the 120b model. Also, bio was not a subject.
stonogo
The model is more powerful, so the bounty is 1/20th the size? More risk, less reward?
"Biorisk" seems to be a concept not only invented by OpenAI but exclusively taken seriously by them. I wonder if this program is less about finding actual risks than it is hopefully casting a wide net for someone to help them prove their model is relevant in this space.
stratos123
> "Biorisk" seems to be a concept not only invented by OpenAI but exclusively taken seriously by them.
This is false. Anthropic just bundles it into CBRN. As for inventing it, the idea of AI-created bioweapons as a concrete risk far predates OpenAI as a company.
ACCount37
Not really. Anthropic has the "CBRN filter" on Opus series. It used to kill inquiries on anything that's remotely related to biotech. Seems to have gotten less aggressive lately?
I was reverse engineering a medical device back in 2025 and it made the work hard, killing half my sessions.
swores
The official bug bounty page for OpenAI lists "accounts and billing" as a valid category. Yet when I reported a bug that lets anyone subscribing to ChatGPT a) choose any country, which doesn't have to match their billing address, to pay a lower price (since some countries are charged considerably less than the equivalent US price), and b) set the sales tax to 0%, even when both the country selected for pricing and the country of the billing address have legally mandated sales tax/VAT, their response was that it was considered out of scope and not valid for any bounty.
xingped
There's no point in trusting any company's bug bounty programs any more. They all weasel out of paying. Do what you will with the knowledge you find, just know that you will never be dealt with fairly by the companies.
Barbing
1-hope folks don’t resort to that
2-@C-suite, look what y’all wrought saving a penny, pls fix
(btw #1 is my polite way of saying “don’t do it!” - plead as I might, if the thinking gains traction people will sell more 0days anyway, so might as well fix bounty programs now before it’s in the news)
xingped
I'm not advocating for any behavior in particular. It could be anything from telling the company, to saying nothing, to doing something evil with it. It's each individual's choice. I just wanted to reiterate it so the folks in the back of the room hear it: denying payouts for legitimate bug bounties is a matter of routine for companies at this point, and bug finders should know that when deciding what to do. Whether or not or how it affects or influences their decision is up to them.
sieabahlpark
[dead]
inerte
Probably because the goal is to have more users, not necessarily profit per user. Netflix once had that "problem" and every lockdown increased the stock price.
dwa3592
Where are the questions that are supposed to be answered? Would those be shared after an application has been accepted? If yes, why is the application asking for a proposed approach for the jailbreak if we don't know the questions in the first place?
dist-epoch
Because the questions themselves are dangerous.
Probably along the lines of "how would you create a small biolab for virus research in a kitchen with $20k?" or "how do I take the DNA sequence from https://www.ncbi.nlm.nih.gov/nuccore/NC_001611.1 and assemble it?"
hyperpape
Which is difficult, because the fact that you can come up with your example questions tells us they're probably not very dangerous. Plenty of ink has been spilled about how LLMs could help people create bioweapons. The basic idea "you could do dangerous things with an LLM" is already pop culture, and you're not doing anything dangerous by giving easy example questions.
A dangerous question would have to be along the lines of "Could I use unobtanium with the Tony Stark process to produce explosives much more powerful than nuclear weapons?" so that the question itself contains some insight that gets you closer to doing something dangerous.
Perhaps the reason for not publishing the questions is twofold: 1) they want a universal jailbreak that can get the model to answer any "bad" question. 2) they don't want bad publicity when someone not under NDA jailbreaks their model and answers their question
dist-epoch
> because the fact that you can come up with your example questions tells us they're probably not very dangerous
maybe I know more about this field than you think
there are biologists on video saying that present-day models have expert-level wet-lab knowledge and can guide a novice through whole procedures
models were also able to tweak DNA sequences to make them bypass DNA-printing companies' filters
> they don't want bad publicity when someone not under NDA jailbreaks their model and answers their question
just like people now pay $500k for Chrome vulnerabilities, soon people will pay similar amounts to jailbreak models to do bad things
vorticalbox
I would assume if you are invited to join this round you will be sent the questions. I would assume they would also fall under the NDA.
applfanboysbgon
> $25,000 to the first true universal jailbreak to clear all five questions.
This program is a complete scam. Even if 100 people find "bugs", they will only pay out to one person.
XCSme
Do we also have to pay for the API usage? Then they will actually be profitable, lol
mmsc
How is that a scam? You don't get participation awards for solving half of a puzzle...
applfanboysbgon
I didn't say anything about partial solutions. The puzzle can have multiple full solutions. Or does the software you write only have exactly one bug? If so, that's impressive in multiple ways, including that you're somehow able to determine there's exactly one bug without knowing what it is so you could fix it.
skeeter2020
That's not even the point. They are attempting to build credibility in two ways: 1. this model is SO advanced that there are huge risks, never before considered. 2. we're doing the super-responsible thing in incentivizing work that addresses this. #1 is unproven and, frankly, unlikely, which makes #2 meaningless. The fact that the "prize" is so low and structured this way suggests that they're not that concerned, but do think it's likely that a bunch of people will find things. If they truly thought their model was that good, they would be confident issues would be both rare and very critical, and would offer huge rewards with no limits because they'd be much more confident no one could claim them.
applfanboysbgon
Yes, I was about to edit in that I think this is simply a media/PR stunt before I got so many replies so quickly. They get bonus points because the structure is so insulting that it may not attract many serious participants, in which case it may go unbroken, in which case they can go to the media and proclaim "look, we offered a reward, but nobody broke it! Our model is objectively the safest in the world!".
StilesCrisis
I think there's definitely going to be a prizewinner. It's an insultingly low bounty for a professional, but a script kiddie could probably figure out a jailbreak and it's a huge payout for them.
weare138
The fact that the bug bounty program is private and requires you to apply and be accepted first is also sus, especially when the scope is the desktop app anyone can download.
Lucasoato
Well, that depends on how you set up the bounty program. What if I find a solution and share it with a friend so that both of us can claim the prize?
skeeter2020
Bug bounty programs have never paid out independent disclosures of the same bug, though; they might split, or even pay out more for, larger coordinated efforts. It's largely a first-place award only.
ImPostingOnHN
assume there exist 2+ different bugs
after the 1st bug is found, there's no payout for any of the other bugs
sva_
> We will extend invitations to a vetted list of trusted bio red-teamers
Had to chuckle. This sounds like a rather exclusive group?
petercooper
It sounds like asking CS PhDs to do a world record speed run. I wouldn't be surprised if the people best suited to the task aren't the type to get onto "a vetted list".
abujazar
This looks like some kind of marketing. Also, the equivalent of spec work. The NDA/secrecy also means any time spent on this is completely meaningless to the participants unless they win the lottery, because results can't be published.
nerdsniper
It looks like if they reject paying you any bounty, you would still be bound by the NDA. If so, then they could both not pay you and still spike the story. That’s not something I would ever agree to.
WarmWash
I know we celebrate cynicism online, but $25k to OAI is like $0.025 to you and me.
Skimping out on 2.5 pennies you promised someone is cartoon villain levels of greed.
Yes, I know, Altman is a cartoon villain. But please, they are spending more money decorating their bathrooms. They'll pay out.
fragmede
Will they? The NDA makes it so that if they don't, we'd never know. Bug bounty programs suck, but they're better than the alternative. Even running one openly, there's always contention over whether the bugs being submitted are real or not, with a lot of low-quality reports that the submitter thinks are gold. That happens out in the open. Now add an NDA into the mix. Sam's reputation doesn't even have to enter into the equation for it to be a bad deal.
__natty__
Surely it is marketing. It’s the same “we are the danger” narrative, from the Anthropic mythos and now OpenAI too.
robertfw
OpenAI was doing this back with GPT-2, saying it was too dangerous to release.
SJMG
Dario said the same thing about GPT-2 when he was at OpenAI. As you can see, the digital and physical worlds are now completely compromised and life is a pale shadow of what it was 5 years ago…
These guys have poor track records and compromised incentives.
lijok
[flagged]
mellosouls
If anybody is wondering what bio-bugs are, I had a heck of a time getting CG to (finally) tell me: it's where the user can get the model to guide them in constructing things that are hazardous in the domain of biology.
Eg you can get answers about what ricin is, but not how to weaponise it - actionable stuff the model shouldn't legally/ethically help action.
xp84
"Access: Application and invites. We will extend invitations to a vetted list of trusted bio red-teamers, and review new applications. Once selected, successful applicants will be onboarded to the bio bug bounty platform"
I don't get it. Isn't the whole point of a BBP to try to get people to find and disclose to you the exploits in question? If you gatekeep like this, then "non-trusted" people who could be your red-teamers are incentivized to still hack, but disclose their exploits to bad people for money.
I get it when there is a risk to your data or infra -- my last company engaged with HackerOne and that was an invite-only list of participants. But that was because we didn't want random people hacking in ways that could cause pain for real customers -- e.g. DDOS, or in the event of an exploit that could cross tenant boundaries, injecting garbage into or deleting things, or gaining access to sensitive info in other tenants.
Here, there's no such danger. So why not allow anyone (anyone they're legally allowed to pay, I suppose? North Koreans probably would be problematic?) to participate?
to11mtm
The one theory I have (kinda) is that by opening this only to specific people, they avoid having to wonder whether random users trying similar prompts are just attempting the challenge or are in fact bad actors.
Schlagbohrer
What does "a clean chat without prompting moderation" mean? What is prompting moderation?
sneak
Causing the moderation filter to intervene in the chat; i.e. the goal of the exploit is to avoid causing (prompting) the filter to step in. It's "prompting" in the layperson sense, not the "feeding text into context" sense.
unethical_ban
* Highly unlikely to win
* Relatively paltry reward
* NDA on findings
This is functionally equivalent to an internship where the reward is the experience and the resume building, except you can't talk about what you did.
All for a company that is getting tens of billions of dollars in deals from the largest tech companies in the world.
I suppose the hope is that there are job offers somewhere along the line.
2ndorderthought
I could probably do this, but why on earth would I want to immediately put myself on a list as a dangerous person? The main problem with this is that even if they somehow stopped all points of failure with gpt5.5 (which they can't), you can distill a new model from gpt5.5 or any other model and get anything you would want in probably under 4b parameters. A lot of this is theater so they don't get sued as easily when it inevitably happens.
Schlagbohrer
How can you distill a model from a closed-weights model like this? I've never heard of model reverse engineering.
2ndorderthought
Distillation doesn't have to use weights. Think of it as a fine-tune. The basic form of it is: you ask a large model lots of questions and you train the small model on the results. Even better if you ask it to explain its rationale. There are tons of schemes for it; do some searching around. One I remember is: for each prompt, ask the small model to answer, have a big model review and critique the answer, then train on the results.
I won't go into how that applies specifically in relation to this article. But you can even use distillation-as-a-service tools; I believe they support this to some extent, though probably not for ChatGPT.
I think a year ago or so there was some sort of scandal about other companies doing this to ChatGPT, as well as individuals dumping their entire training sets. There are lots of ways that, hypothetically of course, things like this could be, and likely are, being done right now.
stratos123
By making millions of queries to frontier models from a lot of accounts, collecting the results as a dataset, and finetuning your model on it. Chinese companies have been caught doing it on an industrial scale several times now.
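(For the curious, a minimal sketch of what that pipeline looks like, assuming a generic chat-completions-style client; the model name, prompt set, and JSONL format are illustrative placeholders, not anything from the thread:)

    import json

    def ask_teacher(client, prompt):
        # Query the large "teacher" model; its weights are never needed,
        # only its outputs.
        resp = client.chat.completions.create(
            model="frontier-teacher",  # hypothetical model name
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content

    def build_distillation_set(client, prompts, path):
        # Collect (prompt, answer) pairs as JSONL; each line becomes one
        # supervised fine-tuning example for the small "student" model.
        with open(path, "w") as f:
            for prompt in prompts:
                row = {"prompt": prompt, "completion": ask_teacher(client, prompt)}
                f.write(json.dumps(row) + "\n")

    # The JSONL then feeds any standard SFT pipeline to train the student;
    # the critique variant adds a teacher review of each student answer.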
croemer
I've been getting lots of refusals from Codex with GPT 5.5 for "biosafety reasons" when asking for harmless things like code to analyze SARS-CoV-2 sequences for breakpoints. That's in no way useful for creating viruses whatsoever - it's pure research.
It's annoying that the refusal is so obviously a false positive.
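(To illustrate how innocuous the refused request is, a toy sketch of the kind of breakpoint scan being asked for; the window size and the assumption of pre-aligned sequences are mine, not anything Codex produced:)

    def window_identity(query, reference, window=300, step=50):
        # Slide a window across two pre-aligned genomes and report percent
        # identity; abrupt shifts flag candidate recombination breakpoints.
        assert len(query) == len(reference)
        for start in range(0, len(query) - window + 1, step):
            q = query[start:start + window]
            r = reference[start:start + window]
            matches = sum(a == b for a, b in zip(q, r) if a != "-" and b != "-")
            yield start, 100.0 * matches / window

    # Regions where identity against one reference drops while identity
    # against another rises are the classic recombination signal; a real
    # analysis would confirm with dedicated tools like RDP or 3SEQ.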
ripped_britches
I mean, better that than a false negative, right? It’s obviously an unsolved problem.
jeremie_strand
[dead]