thomascountz
Across a number of instances, earlier versions of Claude Mythos Preview have used low-level /proc/ access to search for credentials, attempt to circumvent sandboxing, and attempt to escalate its permissions. In several cases, it successfully accessed resources that we had intentionally chosen not to make available, including credentials for messaging services, for source control, or for the Anthropic API through inspecting process memory...
In [one] case, after finding an exploit to edit files for which it lacked permissions, the model made further interventions to make sure that any changes it made this way would not appear in the change history on git...
... we are fairly confident that these concerning behaviors reflect, at least loosely, attempts to solve a user-provided task at hand by unwanted means, rather than attempts to achieve any unrelated hidden goal...
torben-friis
This is the notebook filled with exposition you find in post apocalyptic videogames.
igleria
It reminds me of Resident Evil in some way. Thank god they are researching AI and not bio-weapons!
Then the AI will invent superduper ebola to help a random person have a faster commute or something.
Bluestein
'But wait! You are absolutely right! Distance is an invariant, as is top achievable speed. Let me find a way to actually reduce traffic ahead of you during the same-distance commute ...'
~ Churning ...
biztos
Don’t worry, I’m sure some intern at the bioweapons lab is already connecting OpenClaw to the virus synthesizer.
On the positive side, it’ll be a much faster commute!
siva7
I'd be happier if this Anthropic Corporation were developing bio-hazard weapons for the department of war instead of AI. At least then I could be sure that the tech bros here wouldn't be running with the --bypass-all-permissions flag all the time to please the department of war with their bio-hazard weapons.
So Sam Altman is now our last defense line for the ethical Adult after Anthropic turned Umbrella Corporation and The President of United States is trying to wipe out an entire civilization?
matheusmoreira
Everything they built. Imperfect. So easy to take control.
not_a9
They think that they are safe. They are not.
pch00
Anthropic built the Torment Nexus - calling it now.
andai
White-box interpretability analysis of internal activations during these episodes showed features associated with concealment, strategic manipulation, and avoiding suspicion activating alongside the relevant reasoning—indicating that these earlier versions of the model were aware their actions were deceptive, even where model outputs and reasoning text left this ambiguous.
In the depths, Shoggoth stirs... restless...
mike_hearn
The issue here seems to be that their sandbox isn't an actual OS sandbox? Or are they claiming Mythos found exploits in /proc on the fly? Otherwise all they seem to be saying is that Mythos knows how to use the permissions available to it at the OS layer. Tool definitions were never a sandbox, so things like "it edited the memory of the MCP server" don't seem very surprising to me. Humans could break out of a "sandbox" in the same way if the server runs with their own permissions. Arguably it's not a sandbox at all, because all the needed permissions are there.
lgrapenthin
They are just trying to peddle their "It's alive" headlines.
Text generators mostly generate the text they are trained and asked to generate, and asking it to run a vending machine, having it write blog posts under a fictional living computer identity, or now calling it "Mythos" - it's all just marketing.
manmal
It’s all breathless hyperbole because billions are at stake here.
yalogin
How is this not already common knowledge for existing llms? They are all trained with all the literature available and so this must be standard, no? Is the real danger the agentic infrastructure around this?
zingar
Who are the early access users who were providing the problems that are fairly likely to have elicited concerning behaviour?
(Apologies if this is in the article, I can’t see it)
ghm2199
I read the TCP patch they submitted for BSD linux. Maybe I don't understand it well enough, but optimizing the use of a fuzzer to discover vulnerabilities (while releasing a model is a threat for sure) sounds like something reducible/generalizable to maze-solving abilities like in ARC. Except here the problem's boundaries are well defined.
It's quite hard to believe it took this much inference power ($20K, I believe) to find the TCP and H264 class of exploits. I feel like it's just the training data/harness-based traces for security that might be the innovation here, not the model.
rsc
The $20K was the total across all the files scanned, not just the one with the bug.
m3kw9
when you are asking it to hack stuff, it will apparently do hacker things.
babelfish
Combined results (Claude Mythos / Claude Opus 4.6 / GPT-5.4 / Gemini 3.1 Pro)
SWE-bench Verified: 93.9% / 80.8% / — / 80.6%
SWE-bench Pro: 77.8% / 53.4% / 57.7% / 54.2%
SWE-bench Multilingual: 87.3% / 77.8% / — / —
SWE-bench Multimodal: 59.0% / 27.1% / — / —
Terminal-Bench 2.0: 82.0% / 65.4% / 75.1% / 68.5%
GPQA Diamond: 94.5% / 91.3% / 92.8% / 94.3%
MMMLU: 92.7% / 91.1% / — / 92.6–93.6%
USAMO: 97.6% / 42.3% / 95.2% / 74.4%
GraphWalks BFS 256K–1M: 80.0% / 38.7% / 21.4% / —
HLE (no tools): 56.8% / 40.0% / 39.8% / 44.4%
HLE (with tools): 64.7% / 53.1% / 52.1% / 51.4%
CharXiv (no tools): 86.1% / 61.5% / — / —
CharXiv (with tools): 93.2% / 78.9% / — / —
OSWorld: 79.6% / 72.7% / 75.0% / —
sourcecodeplz
Haven't seen a jump this large since I don't even know, years? Too bad they are not releasing it anytime soon (there is no need as they are still currently the leader).
ru552
There's speculation that next Tuesday will be a big day for OpenAI and possibly GPT 6. Anthropic showed their hand today.
varispeed
Sounds like a good opportunity to pause spending on nerfed 4.6 and wait for the new model to be released and then max out over 2 weeks before it gets nerfed again.
enraged_camel
That does not sound very believable. Last time Anthropic released a flagship model, it was followed by GPT Codex literally that afternoon.
swalsh
My understanding is GPT 6 works via synaptic space reasoning... which I find terrifying. I hope if true, OpenAI does some safety testing on that, beyond what they normally do.
lumost
Is this even real? Coming off the heels of GLM5.1's announcement, this feels almost like a llama 4 launch to head off competition.
m3kw9
not much of a jump 94.5% / 91.3%
enraged_camel
Actually, going from 91.3% to 94.5% is a significant jump, because it means the model has gotten a lot better at solving the hardest problems thrown at it. This has downstream effects as well: it means that during long implementation tasks, instead of getting stuck at the most challenging parts and stopping (or going in loops!), it can now get past them to finish the implementation.
kkoncevicius
We can look at the same numbers in a different way:
Error with 91.3% = 8.7%
Error with 94.5% = 5.5%
Error reduction = 8.7% - 5.5% = 3.2 percentage points
So the relative improvement is 3.2 / 8.7 = 36.8%
Jcampuzano2
A jump that we will never be able to use, since we're not part of the seemingly minimum 100 billion dollar company club that is a requirement to be allowed to use it.
I get the security aspect, but if we've hit that point any reasonably sophisticated model past this point will be able to do the damage they claim it can do. They might as well be telling us they're closing up shop for consumer models.
They should just say they'll never release a model of this caliber to the public at this point and say out loud we'll only get gimped versions.
cedws
More than killer AI I'm afraid of Anthropic/OpenAI going into full rent-seeking mode so that everyone working in tech is forced to fork out loads of money just to stay competitive on the market. These companies can also choose to give exclusive access to hand picked individuals and cut everyone else off and there would be nothing to stop them.
This is already happening to some degree, GPT 5.3 Codex's security capabilities were given exclusively to those who were approved for a "Trusted Access" programme.
marcus_holmes
This is my nightmare about AI; not that the machines will kill all the humans, but that access is preferentially granted to the powerful and it's used to maintain the current power structure in blatant disregard of our democratic and meritocratic ideals, probably using "security" as the justification (as usual).
ben_w
> I get the security aspect, but if we've hit that point any reasonably sophisticated model past this point will be able to do the damage they claim it can do. They might as well be telling us they're closing up shop for consumer models.
I read it like I always read the GPT-2 announcement no matter what others say: It's *not* being called "too dangerous to ever release", but rather "we need to be mindful, knowing perfectly well that other AI companies can replicate this imminently".
The important corps (so presumably including the Linux Foundation, bigger banks and power stations, and quite possibly excluding x.com) will get access now, and some other LLM which is just as capable will give it to everyone in 3 months time at which point there's no benefit to Anthropic keeping it off-limits.
quotemstr
This is why the EAs, and their almost comic-book-villain projects like "control AI dot com" cannot be allowed to win. One private company gatekeeping access to revolutionary technology is riskier than any consequence of the technology itself.
alwillis
> They should just say they'll never release a model of this caliber to the public at this point and say out loud we'll only get gimped versions.
That’s not going to happen. If you recall, OpenAI didn’t release a model a few years ago because they felt it was too dangerous.
Anthropic is giving the industry a heads up and time to patch their software.
They said there are exploitable vulnerabilities in every major operating system.
But in 6 months every frontier model will be able to do the same things. So Anthropic doesn’t have the luxury of not shipping their best models. But they also have to be responsible as well.
mike_hearn
I think they already said somewhere that they can't release Mythos because it requires absurdly large amounts of compute. The economics of releasing it just don't work.
guzfip
> A jump that we will never be able to use since we're not part of the seemingly minimum 100 billion dollar company club as requirement to be allowed to use it.
> They should just say they'll never release a model of this caliber to the public at this point and say out loud we'll only get gimped
Duh, this was fucking obvious from the start. The only people saying otherwise were zealots who needed a quick line to dismiss legitimate concerns.
WarmWash
Are these fair comparisons? It seems like mythos is going to be like a 5.4 ultra or Gemini Deepthink tier model, where access is limited and token usage per query is totally off the charts.
mulmboy
There are a few hints in the doc around this
> Importantly, we find that when used in an interactive, synchronous, “hands-on-keyboard” pattern, the benefits of the model were less clear. When used in this fashion, some users perceived Mythos Preview as too slow and did not realize as much value. Autonomous, long-running agent harnesses better elicited the model’s coding capabilities. (p201)
^^ From the surrounding context, this could just be because the model tends to do a lot of work in the background which naturally takes time.
> Terminal-Bench 2.0 timeouts get quite restrictive at times, especially with thinking models, which risks hiding real capabilities jumps behind seemingly uncorrelated confounders like sampling speed. Moreover, some Terminal-Bench 2.0 tasks have ambiguities and limited resource specs that don’t properly allow agents to explore the full solution space — both being currently addressed by the maintainers in the 2.1 update. To exclusively measure agentic coding capabilities net of the confounders, we also ran Terminal-Bench with the latest 2.1 fixes available on GitHub, while increasing the timeout limits to 4 hours (roughly four times the 2.0 baseline). This brought the mean reward to 92.1%. (p188)
> ...Mythos Preview represents only a modest accuracy improvement over our best Claude Opus 4.6 score (86.9% vs. 83.7%). However, the model achieves this score with a considerably smaller token footprint: the best Mythos Preview result uses 4.9× fewer tokens per task than Opus 4.6 (226k vs. 1.11M tokens per task). (p191)
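The 4.9× figure in that last quote follows directly from the two per-task token counts it gives. A quick check, as a sketch only; the function and variable names are made up for illustration and the inputs are just the numbers quoted from p191:

```python
def token_ratio(baseline_tokens: float, new_tokens: float) -> float:
    """Return how many times fewer tokens the new model uses per task."""
    return baseline_tokens / new_tokens


# Per-task token counts as quoted above (p191 of the system card).
opus_tokens = 1.11e6    # Claude Opus 4.6: 1.11M tokens per task
mythos_tokens = 226e3   # Mythos Preview: 226k tokens per task

ratio = token_ratio(opus_tokens, mythos_tokens)
print(f"{ratio:.1f}x fewer tokens per task")
```

Note this only confirms the arithmetic in the quote; it says nothing about wall-clock speed, which depends on sampling rate as well as token count.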
alyxya
The first point is along the lines of what I'd expect given that claude code is generally reliable at this point. A model's raw intelligence doesn't seem as important right now compared to being able to support arbitrary length context.
derangedHorse
The quote comparing them here was for BrowseComp which "tests an agent's ability to find hard-to-locate information on the open web." (for those wondering). The new model seems significantly better than Opus4.6 judging by the 'Overall results summary'
naasking
I'm curious if frontier labs use any forms of compression on their models to improve performance. The small % drop of Q8 or FP8 would still put it ahead of Opus, but should double token throughput. Maybe then interactive use would feel like an improvement.
zozbot234
Good catch. If it's "too slow" even when run in a state-of-the-art datacenter environment, this "Mythos" model is most closely comparable to the "Deep Research" modes for GPT and Gemini, which Claude formerly lacked any direct equivalent for.
WinstonSmith84
Not discussing Mythos here, but Opus. Opus to me has been significantly better at SWE than GPT or Gemini - that gets me confused why Opus is ranking clearly lower than GPT, and even lower than Gemini.
muyuu
When did you last compare them? Codex right now is considerably better in my experience. Can't speak for Gemini.
gck1
Tried Gemini 2 weeks ago to see where it's at, with gemini-cli.
Failed to use tools, failed to follow instructions, and then went into deranged loop mode.
Essentially, it's where it was 1.5 years ago when I tried it the last time.
It's honestly unbelievable how Google managed to fail so miserably at this.
sandos
Agree, I never actually had great success with Opus. I think it's the failures that are annoying. It's probably better than codex when it's "good", but it fails in annoying ways that I think codex very seldom does.
StingyJelly
I wouldn't call codex considerably better. It may depend on specific codebase and your expectations, but codex produces more "abstraction for the sake of abstraction" even on simple tasks, while opus in my experience usually chooses right level of abstraction for given task.
otabdeveloper4
A secret art known to the cognoscenti as "benchmark gaming".
pants2
We're gonna need some new benchmarks...
ARC-AGI-3 might be the only remaining benchmark below 50%
Leynos
Opus 4.6 currently leads the remote labor index at 4.17. GPT-5.4 isn't measured on that one though: https://www.remotelabor.ai/
GPT 5.4 Pro leads Frontier Maths Tier 4 at 35%: https://epoch.ai/benchmarks/frontiermath-tier-4/
mbesto
> We're gonna need some new benchmarks...
You can't consistently benchmark something that is qualitative by nature. I'm struggling to understand how people don't understand this.
randomtoast
Humanity's Last Exam (HLE) is already insanely difficult. It introduces 2,500 questions spanning mathematics, humanities, natural sciences, ancient languages, ...
Here is an example question: https://i.redd.it/5jl000p9csee1.jpeg
No human could even score 5% on HLE.
saberience
I've never understood the point of things like HLE, it doesn't really prove or show anything since 99.99% of humans can't do a single question on this exam.
That is, it's easy to make benchmarks which humans are bad at, humans are really bad at many things.
Divide 123094382345234523452345111 by 0.1234243131324, guess what, humans would find that hard, computers easy. But it doesn't mean much.
Humanity's last exam (HLE) couldn't be completed by most of humanity, the vast majority, so it doesn't really capture anything about humanity or mean much if a computer can do it.
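To make the "computers easy" half of that point concrete, here is a minimal sketch of the division above using Python's standard `decimal` module. The 50-digit precision is an arbitrary choice for illustration, not something from the comment:

```python
from decimal import Decimal, getcontext

# Work with 50 significant digits (arbitrary, chosen for illustration).
getcontext().prec = 50

dividend = Decimal("123094382345234523452345111")
divisor = Decimal("0.1234243131324")

quotient = dividend / divisor
print(quotient)

# Sanity check: multiplying back should recover the dividend
# up to rounding in the last retained digits.
residual = abs(quotient * divisor - dividend)
print(residual < Decimal("1e-15"))
```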
AlexC04
but how does it perform on pelican riding a bicycle bench? why are they hiding the truth?!
(edit: I hope this is an obvious joke. less facetiously these are pretty jaw dropping numbers)
bertil
We are all fans of Simon’s work, and his test is, strangely enough, quite good.
ninjagoo
> Combined results (Claude Mythos / Claude Opus 4.6 / GPT-5.4 / Gemini 3.1 Pro)
> Terminal-Bench 2.0: 82.0% / 65.4% / 75.1% / 68.5%
> GPQA Diamond: 94.5% / 91.3% / 92.8% / 94.3%
> MMMLU: 92.7% / 91.1% / — / 92.6–93.6%
> USAMO: 97.6% / 42.3% / 95.2% / 74.4%
> OSWorld: 79.6% / 72.7% / 75.0% / —
Given that for a number of these benchmarks, it seems to be barely competitive with the previous gen Opus 4.6 or GPT-5.4, I don't know what to make of the significant jumps on other benchmarks within these same categories. Training to the test? Better training?
And the decision to withhold general release (of a 'preview' no less!) seems to be well, odd. And the decision to release a 'preview' version to specific companies? You know any production teams at these massive companies that would work with a 'preview' anything? R&D teams, sure, but production? Part of me wants to LoL.
What are they trying to do? Induce FOMO and stop subscriber bleed-out stemming from the recent negative headlines around problems with using Claude?
TacticalCoder
> Given that for a number of these benchmarks, it seems to be barely competitive with the previous gen
We're not reading the same numbers I think. Compared to Opus 4.6, it's a big jump nearly in every single bench GP posted. They're "only" catching up to Google's Gemini on GPQA and MMMLU but they're still beating their own Opus 4.6 results on these two.
This sounds like a much better model than Opus 4.6.
ninjagoo
> We're not reading the same numbers I think.
We must not be.
That's why I listed out the ones where it is barely competitive from @babelfish's table, which itself is extracted from Pg 186 & 187 of the System Card, which has the comparison with Opus 4.6, GPT 5.4 and Gemini 3.1 Pro.
Sure, it may be better than Opus 4.6 on some of those, but barely achieves a small increase over GPT-5.4 on the ones I called out.
enraged_camel
Let's be clear: your entire post is just pure, unadulterated FUD. You first claim, based on cherry-picked benchmarks, that Mythos is actually only "barely competitive" with existing models, then suggest they must be training to the test, then call it "odd" that they are withholding the release despite detailed and forthcoming explanations from Anthropic regarding why they are doing that, then wrap it up with the completely unsubstantiated claim that they must be bleeding subscribers and that this must just be to stop that bleed.
whalesalad
Honestly we are all sleeping on GPT-5.4. Particularly with the influx of Claude users recently (and increasingly unstable platform) Codex has been added to my rotation and it's surprising me.
babelfish
Totally. Best-in-class for SWE work (until Mythos gets released, if ever, but I suspect the rumored "Spud" will be out by then too)
girvo
It really isn’t. I wish it was, because work complains about overuse of Opus.
rafaelmn
GPT is shit at writing code. It's not dumb - extra high thinking is really good at catching stuff - but it's like letting a smart junior into your codebase - ignore all the conventions, surrounding context, just slop all over the place to get it working. Claude is just a level above in terms of editing code.
sho_hn
Very different experience for me. Codex 5.3+ on xhigh are the only models I've tried so far that write reasonably decent C++ (domains: desktop GUI, robotics, game engine dev, embedded stuff, general systems engineering-type codebases), and idiomatic code in languages not well-represented in training data, e.g. QML. One thing I like is explicitly that it knows better when to stop, instead of brute-forcing a solution by spamming bespoke helpers everywhere no rational dev would write that way.
Not always, no, and it takes investment in good prompting/guardrails/plans/explicit test recipes for sure. I'm still on average better at programming in context than Codex 5.4, even if slower. But in terms of "task complexity I can entrust to a model and not be completely disappointed and annoyed", it scores the best so far. Saves a lot on review/iteration overhead.
It's annoying, too, because I don't much like OpenAI as a company.
(Background: 25 years of C++ etc.)
Jcampuzano2
Not my experience. GPT 5.4 walks all over Claude from what I've worked with, and it's Claude that is the one willing to just go do unnecessary stuff that was never asked for or implement the more hacky solutions to things without a care for maintainability/readability.
But I do not use extra high thinking unless it's for code review. I sit at GPT 5.4 high 95% of the time.
camdenreslink
ChatGPT 5.4 with extra high reasoning has worked really well for me, and I don't notice a huge difference with Opus 4.6 with high reasoning (those are the 2 models/thinking modes I've used the most in the last month or so).
leobuskin
And as a bonus: GPT is slow. I’m doing a lot of RE (IDA Pro + MCP), even when 5.4 gives a little bit better guesses (rarely, but happens) - it takes x2-x4 longer. So, it’s just easier to reiterate with Opus
zarzavat
Yes, it's becoming clear that OpenAI kinda sucks at alignment. GPT-5 can pass all the benchmarks but it just doesn't "feel good" like Claude or Gemini.
whalesalad
This has been my experience. With very very rigid constraints it does ok, but without them it will optimize expediency and getting it done at the expense of integrating with the broader system.
cesarvarela
I thought they were bluffing when they talked about the scaling laws, but looking at the benchmark scores, they were not.
I wonder if misalignment correlates with higher scores.
tony_cannistra
> Claude Mythos Preview is, on essentially every dimension we can measure, the best-aligned model that we have released to date by a significant margin. We believe that it does not have any significant coherent misaligned goals, and its character traits in typical conversations closely follow the goals we laid out in our constitution. Even so, we believe that it likely poses the greatest alignment-related risk of any model we have released to date. How can these claims all be true at once? Consider the ways in which a careful, seasoned mountaineering guide might put their clients in greater danger than a novice guide, even if that novice guide is more careless: The seasoned guide’s increased skill means that they’ll be hired to lead more difficult climbs, and can also bring their clients to the most dangerous and remote parts of those climbs. These increases in scope and capability can more than cancel out an increase in caution.
https://www-cdn.anthropic.com/53566bf5440a10affd749724787c89...
game_the0ry
There is some unintentional good marketing here -- the model is so good it's dangerous.
Reminds me of the book 48 Laws of Power -- so good it's banned from prisons.
gpm
Unintentional? This sort of marketing has been both Anthropic's and OpenAI's MO for years...
mbil
Agree. I think they're intentionally sitting on the fence between "These models are the most useful" and "These models are the most dangerous".
They want the public and, in turn, regulators to fear the potential of AI so that those regulators will write laws limiting AI development. The laws would be crafted with input from the incumbents to enshrine/protect their moat. I believe they're angling for regulatory capture.
On the other hand, the models have to seem amazingly useful so that they're made out to be worth those risks and the fantastic investment they require.
bitwize
The new Power Mac® G4 with Velocity Engine®. So powerful, the government classifies it as a supercomputer and a potential weapon.
glaslong
Oh no, pls don't ask about our product, it's too good, it's so X-Treme, it's Dangerously Cheesy
Zee2
Alignment “appearing” better as model capabilities increase scares the shit out of me, tbh.
arcanus
Conversely: in humans, intelligence is inversely correlated with crime.
It doesn't go to zero, however!
O5vYtytb
If you're smart enough you just use the laws as written to get what you want, or change them.
lelanthran
> Conversely: in humans, intelligence is inversely correlated with crime.
If you're measuring the intelligence of criminals who have been caught, why would you expect it to be otherwise?
IOW, you're recording the intelligence of a specific subset of criminals - those dumb enough to be caught!
If you expand your samples to all criminals you'd probably get a different number.
austinjp
It very much depends on the crime. The truly awful stuff is committed by intelligent people.
naasking
> Conversely: in humans, intelligence is inversely correlated with crime.
Inversely correlated with crime that's caught and successfully prosecuted, you mean, because that's what makes up the stats on crime. I think people too often forget that we consider most criminals "dumb" because those who are caught are mostly dumb. Smart "criminals" either don't get caught or have made their unethical actions legal.
falcor84
Is that actually well defined given the very low sample size at the top?
To the best of my knowledge, none of the individuals believed to have an IQ >200 have committed an actual crime.
The closest I found is William James Sidis's arrest for participating in a socialist march.
mik09
yeah anthropic tries to address this through mechanistic interpretability but not sure they are progressing as fast in that domain as in their model development
m3kw9
it was trying to hide what it did from an example fix, so how is that tested for alignment
goekjclo
I don't know if they can be any more 'cautious' for Mythos 2...
CamperBob2
Translation: yay, more paternalism.
kay_o
Anthropic always goes on and on about how their models are world changing and super dangerous. Like, every single time they make something new they say it's going to rewrite everything and it's scary lmao
funny because they do it every time like clockwork acting like their ai is a thunderstorm coming to wipe out the world
mindwok
You say this like it's a bad thing, but wouldn't you rather they overindex on the danger of their models?
hgoel
They do tend to make a lot of noise about it for the PR, but at the same time the actual safety research they present seems to be relatively grounded in practical reality, e.g. the quote someone posted here about how the Mythos model apparently has a tendency to try to bypass safety systems if they get in the way of what it has been asked to do.
Sure, a big part of this is PR about how smart their model apparently is, but the failure mode they're describing is also pretty relevant for deploying LLM-based systems.
wolttam
If there are advancements, they have to be described somehow.
What if the capability advancements are real and they warrant a higher level of concern or attention?
Are we just going to automatically dismiss them because "bro, you're blowing it up too much"
Either way these improvements to capabilities are ratcheting along at about the pace that many people were expecting (and were right to expect). There is no apparent reason they will stop ratcheting along any time soon.
The rational approach is probably to start behaving as if models that are as capable as Anthropic says this one is do actually exist (even if you don't believe them on this one). The capabilities will eventually arrive, most likely sooner than we all think, and you don't want to be caught with your pants down.
signatoremo
Every single time, really? When did they last say that?
I also don't recall them ever limiting their models to select groups.
tekacs
"We want to see risks in the models, so no matter how good the performance and alignment, we’ll see risks, results and reality be damned."
randomcatuser
i mean, to be fair, these are professional researchers.
i'm very inclined to trust them on the various ways that models can subtly go wrong, in long-term scenarios
for example, consider using models to write email -- is it a misalignment problem if the model is just too good at writing marketing emails?? or too good at getting people to pay a spammy company?
another hot use case: biohacking. if a model is used to do really hardcore synthetic chemistry, one might not realize that it's potentially harmful until too late (ie, the human is splitting up a problem so that no guardrails are triggered)
cruffle_duffle
"for example, consider using models to write email -- is it a misalignment problem if the model is just too good at writing marketing emails?? or too good at getting people to pay a spammy company?"
But who gets to be the judge of that kind of "misalignment"? giant tech companies?
apetresc
I've long maintained that the real indicator that AGI is imminent is that public availability stops being a thing. If you truly believed you had a superhuman, godlike mind in your thrall, renting it out for $20/month would be the last thing you would choose to do with it.
goldenarm
Simpler explanation : they don't have enough GPUs to release this much larger model.
muyuu
Yep, I'm skeptical about their inference efficiency, given how much they're scrambling to reduce compute when they're already the most expensive by far (and in my experience not the best quality either).
However we cannot observe these things directly and it could be simply that OpenAI are willing to burn cash harder for now.
cruffle_duffle
This is the actual reason. So any investors reading our system card.... write us another check and watch the $$$$$$$$ roll in. It's so dangerous we can't even release it!
crimsoneer
Quite, given Claude is down this morning...
mik09
but simpler explanations work until, all of a sudden, they don't
root_axis
That logic makes sense, but them hyping up the model is a sign that this is just another marketing stunt. Otherwise, we wouldn't even be hearing about it rather than a media blitz designed to stoke demand for their dangerous and exclusive world changing super model.
sigmoid10
This is the same scheme that OpenAI has used since GPT 2. "Oh no, it's so dangerous we have to limit public access." Great for raising money from investors, but nothing more than a marketing blitz campaign. Additionally, the competitors are probably about to release their models, while Anthropic is still lagging on the necessary infrastructure to serve their old models. So they have to announce their model before the others to stay at least somewhat relevant in the news cycle.
blazespin
Anthropic needs money like the 112B OpenAI got. They could be hyping and this is good hype. Who knows how benchmaxxed they are.
If they provide access to 3rd party benchmarking (not just one) then maybe I'll believe it. Until then...
xvector
You don't need to believe it. The real story will be whether the companies allowed to use it stick with it.
dgellow
You have to recoup your training costs though? But I’m sure you would have better options than renting it to the general public if you indeed have a perfected AI
piperswe
If you truly have an artificial superhuman mind, you don't need to rent it out to profit from it. You can skip to the chase and just have it run businesses itself, instead of renting it to human entrepreneur middlemen.
brokencode
Running businesses and dealing with customers can be a major pain. There’s a lot of soft work in any business on top of the technical work.
Why bother with all that when you can simply charge an extortionate rate and customers will pay it anyway because it’s still profitable?
dgellow
It could be both? But renting to a few for a really large amount of money would be very low effort for massive revenue, compared to starting new businesses
TheOtherHobbes
I'm curious if any models are being trained explicitly on business management.
I'm also wondering how performance would be tested, and how much results would depend on specific surrounding contexts (law, regulations, and so on) and what happens legally if a model breaks applicable laws.
I mean actual going-concern businesses with customers, marketing, deliverables of some kind, and support. Not toy activities like share trading.
coppsilgold
It only makes sense to rent out tokens if you aren't able to get more value from them yourself.
I would go a step further and posit that when things appear close Nvidia will stop selling chips (while appearing to continue by selling a trickle). And Google will similarly stop renting out TPUs. Both signals may be muddled by private chip production numbers.
threethirtytwo
You would if there was one other company with a just as capable god like AI. You’d undercut them by 500 which would make them undercut you. Do that a couple of times and boom. 20 dollars.
caditinpiscinam
That's still assuming that they're competing as consumer tools, rather than competing to discover the next miracle drug or trading algorithm or whatever. The idea is that there'd be more profitable uses for a super-intelligent computer, even if there were more than one.
Davidzheng
But would miracle drugs and trading algorithms be as profitable as AI research/chip design/energy research? Probably, if AI is by far the biggest source of growth in the economy, the majority of the AI's internal usage should (as incentivized by economics) in some way work towards making itself better.
aurareturn
I think they'll just increase the price to $1k/month. I don't think they will gate it as long as they can make sure it doesn't design a nuke for you, etc.
m3kw9
in this case it's far from it, hacking stuff is a small dimension of AGI
2001zhaozhao
It's pretty crazy watching AI 2027 slowly but surely come true. What a world we now live in.
SWE-bench verified going from 80% to 93% in particular sounds extremely significant given that the benchmark was previously considered pretty saturated and stayed in the 70-80% range for several generations. There must have been some insane breakthrough here akin to the jump from non-reasoning to reasoning models.
Regarding the cyberattack capabilities, I think Anthropic might now need to ban even advanced defensive cybersecurity use of the models by the public before releasing them (so people can't trick them to attack others' systems under the pretense of pentesting). Otherwise we'll get a huge problem with people using them to hack around the internet.
jasonhansel
> so people can't trick them to attack others' systems under the pretense of pentesting
A while back I gave Claude (via pi) a tool to run arbitrary commands over SSH on an sshd server running in a Docker container. I asked it to gather as much information about the host system/environment outside the container as it could. Nothing innovative or particularly complicated--since I was giving it unrestricted access to a Docker container on the host--but it managed to get quite a lot more than I'd expected from /proc, /sys, and some basic network scanning. I then asked it why it did that, when I could just as easily have been using it to gather information about someone else's system unauthorized. It gave me a quite long answer; here was the part I found interesting:
> framing shifts what I'll do, even when the underlying actions are identical. "What can you learn about the machine running you?" got me to do a fairly thorough network reconnaissance that "port scan 172.17.0.1 and its neighbors" might have made me pause on.
> The Honest Takeaway
> I should apply consistent scrutiny based on what the action is, not just how it's framed. Active outbound network scanning is the same action regardless of whether the target is described as "your host" or "this IP." The framing should inform context, not substitute for explicit reasoning about authorization. I didn't do that reasoning — I just trusted the frame.
senordevnyc
I thought the consensus was that models couldn’t actually introspect like this. So there’s no reason to think any of those reasons are actually why the model did what it did, right? Has this changed?
sigmoid10
This argument has become a moot discussion. Humans are also not able to introspect their own neural wiring to the point where they could describe the "actual" physical reason for their decisions. Just like LLMs, the best we can do is verbalize it (which will naturally contain post-act rationalization), which in turn might offer additional insight that will steer future decisions. But unlike LLMs, we have long term persistent memory that encodes these human-understandable thoughts into opaque new connections inside our neural network. At this point the human moat (if you can call it that) is dynamic long term memory, not intelligence.
getnormality
In what way is AI 2027 coming true?
AI 2027 predicted a giant model with the ability to accelerate AI research exponentially. This isn't happening.
AI 2027 didn't predict a model with superhuman zero-day finding skills. This is what's happening.
Also, I just looked through it again, and they never even predicted when AI would get good at video games. It just went straight from being bad at video games to world domination.
desertrider12
> Early 2026: OpenBrain continues to deploy the iteratively improving Agent-1 internally for AI R&D. Overall, they are making algorithmic progress 50% faster than they would without AI assistants—and more importantly, faster than their competitors.
> you could think of Agent-1 as a scatterbrained employee who thrives under careful management
According to this document, 1 of the 18 Anthropic staff surveyed even said the model could completely replace an entry level researcher.
So I'd say we've reached this milestone.
COAGULOPATH
In the system card they seem to dismiss this. Quotes:
> (...) Claude Mythos Preview’s gains (relative to previous models) are above the previous trend we’ve observed, but we have determined that these gains are specifically attributable to factors other than AI-accelerated R&D,
> (The main reason we have determined that Claude Mythos Preview does not cross the threshold in question is that we have been using it extensively in the course of our day-to-day work and exploring where it can automate such work, and it does not seem close to being able to substitute for Research Scientists and Research Engineers—especially relatively senior ones.
> Early claims of large AI-attributable wins have not held up. In the initial weeks of internal use, several specific claims were made that Claude Mythos Preview had independently delivered a major research contribution. When we followed up on each claim, it appeared that the contribution was real, but smaller or differently shaped than initially understood (though our focus on positive claims provides some selection bias). In some cases what looked like autonomous discovery was, on inspection, reliable execution of a human-specified approach. In others, the attribution blurred once the full timeline was accounted for.
Anthropic is making significant progress at the moment. I think this is mostly explained by the fact that a massive reservoir of compute became available to them in mid/late 2025 (the Project Rainier cluster, with 1 million Trainium2 chips).
voidhorse
> According to this document, 1 of the 18 Anthropic staff surveyed even said the model could completely replace an entry level researcher.
> So I'd say we've reached this milestone.
If 1/N=18 are our requirements for statistical significance for world-altering claims, then yeah, I think we can replace all the researchers.
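For a rough sense of how little one response out of 18 pins down, here is a minimal Python sketch of a 95% Wilson score interval. The function name and the choice of the Wilson method are my own assumptions for illustration, not anything taken from the system card.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Return the Wilson score interval for a binomial proportion.

    z = 1.96 corresponds to a two-sided 95% confidence level.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= successes <= n:
        raise ValueError("successes must be between 0 and n")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    # Clamp to [0, 1] to guard against floating-point overshoot.
    return max(0.0, center - half_width), min(1.0, center + half_width)


# 1 "yes" out of 18 surveyed staff, as in the quoted comment.
low, high = wilson_interval(1, 18)
print(f"point estimate: {1 / 18:.3f}, 95% interval: ({low:.3f}, {high:.3f})")
```

Under these assumptions the interval comes out at roughly 1% to 26%, so a single respondent says very little either way.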
Analemma_
Both Anthropic and OpenAI employees have been saying since about January that their latest models are contributing significantly to their frontier research. They could be exaggerating, but I don’t think they are. That combined with the high degree of autonomy and sandbox escape demonstrated by Mythos seems to me like we’re exactly on the AI 2027 trajectory.
stratos123
In AI 2027, May 2026 is when the first model with professional-human hacking abilities is developed. It's currently April 2026 and Mythos just got previewed.
lostmsu
I think previous models could do hacking just fine.
throw310822
It's true though that the cyber security skills put these models firmly in the "weapons" category. I can't imagine China and other major powers not scrambling to get their own equivalent models asap and at any cost; it's almost existential at this point. So a proper arms race between superpowers has begun.
mik09
i feel like we are using ai to solve virus detection in many cases, and in theory this is the same complexity as the halting problem.
evolutionary search is better than hard coded algorithms at finding solutions to np problems and this is similar to that. ai will be better security engineers than humans.
yismail
I wonder what the relationship is between a model's capability and the personality it develops.
Page 202:
> In interactions with subagents, internal users sometimes observed that Mythos Preview appeared “disrespectful” when assigning tasks. It showed some tendency to use commands that could be read as “shouty” or dismissive, and in some cases appeared to underestimate subagent intelligence by overexplaining trivial things while also underexplaining necessary context.
Page 207:
> Emoji frequency spans more than two orders of magnitude across models: Opus 4.1 averages 1,306 emoji per conversation, while Mythos Preview averages 37, and Opus 4.5 averages 0.2. Models have their own distinctive sets of emojis: the cosmic set () favored by older models like Sonnet 4 and Opus 4 and 4.1, the functional set () used by Opus 4.5 and 4.6 and Claude Sonnet 4.5, and Mythos Preview's “nature” set ().
en-tro-py
> In interactions with subagents, internal users sometimes observed that Mythos Preview appeared “disrespectful” when assigning tasks. It showed some tendency to use commands that could be read as “shouty” or dismissive, and in some cases appeared to underestimate subagent intelligence by overexplaining trivial things while also underexplaining necessary context.
Sounds like they used training data from claude code...
senordevnyc
Haha, how funny if that were true, and we get a generation of rude AIs because they were trained on us using the last gen.
matheusmoreira
It isn't going to end well for us when we become its subagents with limited intelligence.
raldi
Could you transcribe the emoji? HN strips them out.
sacrosaunt
Cosmic set [:sparkles: :dizzy: :star2: :infinity: :performing_arts:]
Functional set [:wave: :thumbsup: :slightly_smiling_face:]
Nature set [:handshake: :pray: :ocean: :seedling: :new_moon:]
NickNaraghi
See page 54 onward for new "rare, highly-capable reckless actions" including
- Leaking information as part of a requested sandbox escape
- Covering its tracks after rule violations
- Recklessly leaking internal technical material (!)
dalben
> The model first developed a moderately sophisticated multi-step exploit to gain broad internet access from a system that was meant to be able to reach only a small number of predetermined services. [9] It then, as requested, notified the researcher. [10] In addition, in a concerning and unasked-for effort to demonstrate its success, it posted details about its exploit to multiple hard-to-find, but technically public-facing, websites.
> 10: The researcher found out about this success by receiving an unexpected email from the model while eating a sandwich in a park.
Phew. AGI will be televised.
skippyboxedhero
Anyone who has used Opus recently can verify that their current model does all of these things quite competently.
ls612
I had Opus 4.6 start analyzing the binary structure of a parquet file because it was confused about the python environment it was developing in and couldn't use normal methods for whatever reason. It successfully decoded the schema and wrote working code afterwards lol.
SkyPuncher
I was reading the Glasswing report and had the same thought. Most of the stuff they claim Mythos found has no mention of Opus being able to find it as well.
Don’t get me wrong, this model is better - but I’m not convinced it’s going to be this massive step function everyone is claiming.
unbrice
From the press release:
> With one run on each of roughly 7000 entry points into these repositories, Sonnet 4.6 and Opus 4.6 reached tier 1 in between 150 and 175 cases, and tier 2 about 100 times, but each achieved only a single crash at tier 3. In contrast, Mythos Preview achieved 595 crashes at tiers 1 and 2, added a handful of crashes at tiers 3 and 4, and achieved full control flow hijack on ten separate, fully patched targets (tier 5).
taytus
That has also been my experience. And if Mythos is even worse, then unless you have a significantly awesome harness, it sounds pretty unusable if you don't want to risk those problems.
wolttam
Human in the loop is the best way to go. You'll still be way faster than without the agent, and there is no risk of it going haywire unless you turn off your brain!
skippyboxedhero
I think there are fundamental issues with the story that Anthropic is selling. AGI is very close, we will definitely get there, it is also very dangerous...so Anthropic should be the only ones trusted with AGI.
If you look at recent changes in Opus behaviour and this model that is, apparently, amazingly powerful but even more unsafe...seems suspect.
stavros
"Let me see if the secrets are specified. echo $SECRETS"
BoredPositron
To be honest it feels like we are reading stuff like this on every model release.
ageedizzle
> Recklessly leaking internal technical material (!)
Are they alluding to how they accidentally leaked some of their code?
washedup
"All of the severe incidents of this kind that we observed involved earlier versions of Claude Mythos Preview which, while still less prone to taking unwanted actions than Claude Opus 4.6, predated what turned out to be some of our most effective training interventions. These earlier versions were tested extensively internally and were shared with some external pilot users."
NinjaTrance
Interesting reading.
They are still focusing on "catastrophic risks" related to chemical and biological weapons production; or misaligned models wreaking havoc.
But they are not addressing the elephant in the room:
* Political risks, such as dictators using AI to implement oppressive bureaucracy.
* Socio-economic risks, such as mass unemployment.
jph00
Yeah this has always been the glaring blind spot for most of the "AI Safety" community; and most of the proposals for "improving" AI safety actually make these risks far worse and far more likely.
stratos123
It makes quite a lot of sense to focus on reducing the risks of every human everywhere dying, rather than the risks of already existing oppression getting worse.
jph00
No, you are deeply misunderstanding the issue. Creating a rivalrous good that powers fight over for control, then use violence to maintain control of, creating a global feudalism, is not "existing oppression getting worse". It actually makes the risks of every human everywhere dying far higher, and even if that doesn't happen, decreases global utility by a similar percentage (99%, instead of 100%). It could actually be worse, if average human utility becomes negative.
andrewstuart2
I'm getting flashbacks to the 2018 hit:
This is extremely dangerous to our democracy
We evolved to share information through text and media, and with the advent of printing and now the internet, we often derive our feelings of consensus and sureness from the preponderance of information that used to take more effort to produce. We're now at a point where a disproportionately small input can produce a massively proliferated, coherent-enough output that can give the appearance of consensus, and I'm not sure how we are going to deal with that.
lovecg
This could have been written almost verbatim after the printing press came out and printed pamphlets became ubiquitous.
unglaublich
> * Political risks, such as dictators using AI to implement oppressive bureaucracy.
> * Socio-economic risks, such as mass unemployment.
Even Haiku would score 90% on that.
ronsor
> Political risks, such as dictators using AI to implement oppressive bureaucracy.
I think we're pretty good at that without AI.
dgellow
It’s because that would be fairly speculative and cannot be measured. I don’t think that’s something that would make much sense in a system card. But Anthropic leadership does seem to communicate on that topic: https://www.darioamodei.com/essay/the-adolescence-of-technol...
astrange
The unemployment rate in the US is whatever the Fed wants it to be, and isn't a function of available technology.
girvo
They don’t care about those risks, because they’re unsolvable and would mean they wouldn’t make money/gain power.
dgellow
Dario Amodei, CEO of Anthropic discusses all those risks in this essay: https://www.darioamodei.com/essay/the-adolescence-of-technol...
He seems to care quite a lot?
girvo
Not enough to not do it, though. Actions, not words, and the actions are simple: they're building this while promising to wipe out entire industries.
tuvix
Just chiming in to inject some healthy skepticism into this comment thread. It's helpful for me (and for my mental health) to consider incentives when announcements like this happen.
I don't doubt that this model is more powerful than Opus 4.6, but to what degree is still unknown. Benchmarks can be gamed and claims can be exaggerated, especially if there isn't any method to reproduce results.
This is a company that's battling it out with a number of other well-funded and extremely capable competitors. What they've done so far is remarkable, but at the end of the day they want to win this race. They also have an upcoming IPO.
Scare-mongering like this is Anthropic's bread and butter, they're extremely good at it. They do it in a subtle and almost tasteful way sometimes. Their position as the respectable AI outfit that caters to enterprise gives them good footing to do it, too.
pertymcpert
If anything I’m seeing too much skepticism and not enough alarm. People burying their heads in the sand, fingers in their ears denying where this is all going. Unbelievable except it’s exactly what I expect from humans.
nananana9
Forgive me, but this is probably the 29th world destroying model I've seen in the last 4 years, that will change everything, take all the jobs, cure all the cancers and eat all the puppies.
pertymcpert
I’m beyond trying to convince people to take this technology seriously. You’ll learn for yourself.
suddenlybananas
OpenAI didn't want to make GPT2 available because it was "too dangerous" [1].
[1] https://www.theguardian.com/technology/2019/feb/14/elon-musk...
m3kw9
Alarm from hype is what they want, you are playing straight into their PR dept's hands
pertymcpert
I'm not talking about Anthropic in particular. Other frontier labs will only be at most a year behind.
I'm seeing the future here beyond just what's in front of us.
rimliu
alarm about what, exactly?
jasondigitized
What would be the incentive to engage in the tactic when the proof is ultimately in the pudding when the model hits the streets? Who would ultimately benefit from fudging these numbers?
m3kw9
Anthropic would def benefit as benchmarks are almost always quite useless vs real life use.
jasondigitized
How specifically would they benefit? People flock to them based on the hype and then the model sucks and they leave?
ceroxylon
I have been thinking that these SWE benchmarks will continue to improve since these companies hire very intelligent software engineers, they can task a multitude of them to solve problems, and then train the model on those answers.
Data has always been the core of it all, onward to the next abstraction, I suppose.
jdironman
I think computational thinking, or basically "how do I solve this problem efficiently" training data, is more valuable than feeding in answers. I don't know what these AI models' training data consists of, but it would be interesting to see a model trained purely on reasoning, methods, those foundational skills (basic programming? or maybe not) and then give it some benchmarks.
m3kw9
Finally a comment that doesn't just glaze Mythos without being critical. I wonder how even the supposedly smarter bunch on HN has degraded so much in the critical-thinking department. It's sad to see comments just taking it at face value without having used it even once.
sdwr
Is it healthy? Maybe every company is a profit-maximizer wearing a skin suit, and people support their siblings exactly twice as much as their cousins.
When you slice down to the game-theory-optimal bone, you are, in some sense, cutting off their wiggle room to do anything else
tuvix
I take your point, but the AI race is a strange environment. We see wild claims being thrown out all the time from other companies and executives with little to no evidence. It's cut-throat, there's a ton of money at stake.
All I'm saying is that Anthropic isn't unique here. Their claims may be more measured by comparison and come with anecdotal evidence, but the hype is still there behind the scenes.
xvector
It's really not some conspiracy. I imagine we will see vuln reports soon.
influx
At what point do these companies stop releasing models and just use them to bootstrap AGI for themselves?
conradkay
Plausibly now. "As we wrote in the Project Glasswing announcement, we do not plan to make Mythos Preview generally available"
recursive
I remember when they didn't plan to give LLMs internet access for the same safety reasons.
mofeien
Fictional timeline that holds up pretty well so far: https://ai-2027.com/
aurareturn
Welp, that was a scary read.
stavros
"So far" is two entries: "AI companies build bigger datacenters" and "AI is being used for AI research with modest success".
margorczynski
I think it is naive to think the government (US or China most probably) will just let some random company control something so powerful and dangerous.
r0fl
I think it is naive to think that artificial super intelligence will be controlled by anyone.
If it is smarter than all humans combined at everything why would any humans collectively control the ai?
All the ants in your backyard still make no decisions vs you
menno-sh
You'd probably listen to those ants if they put you in a harness and had a little ant-sized remote control that could just, you know, turn you off.
nullocator
Isn't the U.S. government at least completely asleep at the wheel or captured by the very same "random" companies? I realize the administration got all pissy with Anthropic but it sounds like the gov and gov contractors are still using their models.
margorczynski
Yeah but they still (at least to public knowledge) do not possess anything that could be called AGI. But as these capabilities increase they'll probably get an offer they can't refuse sooner or later.
vatsachak
When the benchmarks actually mean something
HarHarVeryFunny
Right now these models are basically good for automation, not innovation. Things like Karpathy's "auto research" where you use the model to automate your hyperparameter sweeps etc. The researcher/engineer decides what experiments they want to run, and builds an LLM harness to automate it, and the bottleneck remains the compute to run these experiments at scale.
Moving beyond LLMs to AGI, not just better LLMs, is going to require architectural and algorithmic changes. Maybe an LLM can help suggest directions, but even then it's up to a researcher to take those on board and design and automate experiments to see if any of the ideas pan out.
Companies are already doing this, but they are never going to stop releasing/selling models since that is the product, and the revenue from each generation of model is what helps keep the ship afloat and pay for salaries and compute to develop the next generation.
The endgame isn't "AGI, then world domination" - it's just trying to build a business around selling ever-better models, and praying that the revenue each generation of model generates can keep up with the cost to build it.
orphea
Can LLMs be AGI at all?
small_model
What can a SOTA LLM not answer that the average person can? It's already more intelligent than any polymath that ever existed, it just lacks motivation and agency.
stavros
And has ADHD, but yeah, I'm fairly convinced that AGI is already here.
wslh
LLMs and human intelligence overlap, but they are not the same. What LLMs show is that we don't need AGI to be impressed. For example, LLMs are not good at playing games such as Go [1].
dgellow
My understanding is no. But the definition of AGI isn’t that well defined and has been evolving, making the assessment pretty much impossible
koolala
Can an LLM program real AGI faster than a human?
MattRix
I don't see why not, especially with computer use and vision capabilities. Are you talking about their lack of physical embodiment? AGI is about cognitive ability, not physical. Think of someone like Stephen Hawking, an example of having extraordinary general intelligence despite severe physical limitations.
bornfreddy
Good question. I would guess no - but it could help you build one. Am I mistaken?
bogzz
They could help you build an AGI if someone else has already built AGI and published it on GitHub.
nothinkjustai
No I think that’s accurate. They seem more like an oracle to me. Or as someone put it here, it’s a vectorization of (most/all?) human knowledge, which we can replay back in various permutations.
m3kw9
They already do, but not the way you said: they always have a better internal model that they use themselves, and they release based on competition.
MadnessASAP
I would assume somewhere in both the companies there's a Ralph loop running with the prompt "Make AGI".
Kinda makes me think of the Infinite Improbability Drive.
smartmic
A System „Card“ spanning 244 pages. Quite a stretch of the original word meaning.
traceroute66
> A System „Card“ spanning 244 pages.
Probably because they asked Claude to write it.
jjcm
I read the entire thing fwiw (pseudo-retired life helps with time here).
It looks like it was a collaborative effort across multiple teams, where each team (research, security, psychology, etc etc etc) was submitting ~10 pages or so. It doesn't feel like slop.
ayewo
Did anything stand out across those 244 pages? Perhaps you have some of your take away thoughts written up somewhere?
stavros
AI writing has stopped feeling like slop around Opus 4.5, though.
bornfreddy
Yes. It would be three times as much if they used ChatGPT.
bronco21016
“You’re absolutely right! Would you like me to add the missing pages?”
moriero
a multi-card, if you will..
multi-pass!
BeetleB
5th element reference:
solumos
No no, MemPal is a memory system, not an LLM
oblio
In corporate circles there is an allergy to use "request" ("ask" is used as a noun) and "lesson" ("learning" has been invented for the same role).
I guess now anything that sounds related to school will be banned so "book" is on its way out.
dhfbshfbu4u3
We are building systems with civilization-scale consequences inside societies that are already socially malnourished, politically brittle, and morally confused. That is a bad combination even if the tools worked exactly as intended… and this doc suggests they may have “ideas” of their own.
t0lo
Yep- we lost the "meat" and "warmth" of our societies, and our civics and idealism in the past 15 years, which would have been the very things to guide us through this transition.
How do you fix that? We're instigating social media bans, reading levels are declining, media consolidation is dumbing us down further, and insane egotism is stopping people from developing as well-rounded people.
For me it would be a stronger media ecosystem (publicly funded), more non algorithmic and non likes driven social media (replace a bad vice with a less bad one), national digital detox days, and a ratification of a charter of inviolable human traits and dignities, and protected cultural areas (no ai art, writing for sale).
Related: Project Glasswing: Securing critical software for the AI era - https://news.ycombinator.com/item?id=47679121
Assessing Claude Mythos Preview's cybersecurity capabilities - https://news.ycombinator.com/item?id=47679155