2ndorderthought
davidgrenier
Yeah, I guess two companies that would otherwise be considered headed for bankruptcy have models too expensive to run. Since they don't see themselves making money any time soon, they have to turn every future model into a weird fascination.
DivingForGold
China’s DeepSeek prices new V4 AI model at 97% below OpenAI’s GPT-5.5
Did somebody say that Elon is stealthily funding: Seven lawsuits filed against OpenAI by families of Canada mass-shooting victims
As always, when the going gets tough, the tough ultimately resort to lawsuits.
VorpalWay
If the difference is that large, it seems plausible to me that the Chinese models are subsidized in order to gain market share; this is not exactly the first time the Chinese government has done so (or at least been rumoured to have done so).
You should assume that everyone has a hidden agenda when money is involved.
dyauspitr
It’s their promo price until the end of May. It’s also not nearly as good as 5.5. I’ve had three different tasks just this week that DeepSeek failed at and 5.5 handled perfectly.
cyanydeez
think about it in terms of who can pay. they're at b2b, and swiftly moving to government.
2ndorderthought
All that user data is a huge asset for government contracts.
throwyawayyyy
There's a story to tell in that: 1) Google has a transformer-based AI that hallucinates too much to release 2) OpenAI replicates the tech then YOLOs it 3) Everyone says: look how Google is getting left behind! Google thinks: the second mouse gets the cheese. 4) Google gets the cheese, OpenAI is absorbed by Microsoft or just disappears (or both).
JeremyNT
Certainly could turn out that way.
TPUs were their real moat. All that capacity used throughout their suite of products on non-chatbot features, ready to rip for consumers as soon as somebody else opened the floodgates to the public.
Now all their competitors lose money on every token paying their cloud providers (of course it's funny money, maybe they're just giving the cloud providers equity) while Google is sitting calmly over there, actually owning everything they need for any eventuality, and beholden to nobody.
boringg
Marketing stunts. The equivalent of holding a line outside a popular bar.
basisword
Given the USG has asked Anthropic not to release Mythos I'd wager it's more than a marketing stunt.
boringg
It can be both. And I don't know how much I would trust the USG as the canary in the coal mine: their technical readiness typically seems low across most institutions, so they are probably more exposed because they haven't shored up their systems.
noosphr
Remember that they have been saying that since gpt2.
I didn't think crying could be such a successful business model.
neuronexmachina
People keep on mentioning gpt2, but it's worth recalling that back in 2019 it was basically the first model that was capable of zero-shot generation of coherent multi-paragraph text. Having it write security exploits like Mythos wasn't even on the radar. Rather, the concerns were about misuse and societal implications, which in retrospect were pretty prescient: https://openai.com/index/gpt-2-6-month-follow-up/
shepherdjerred
Also, OpenAI/Sam admitted that the concerns were quite silly in retrospect.
lesuorac
It's just "thinking past the sale" which they've been doing forever.
i.e. "I'm so worried that our capped for-profit structure will limit your returns when we make over 1 Trillion in profit".
brikym
It's like that phone call in The Big Short where Goldman suddenly changes their mind once they hold a position.
cedws
Can't wait for the Chinese models to completely wipe the floor with them in 6 months.
SubiculumCode
I doubt it. By not releasing it, Chinese companies will be unable to break TOS and use it to acquire high quality training data...which, I suspect, is how they've kept pace
cedws
Z.AI, Moonshot, DeepSeek all have a pipeline of data of their own now due to capturing a slice of the market through cheap tokens. It's not impossible to imagine that they might share the data too if the CCP thinks that will help their AI strategy.
dyauspitr
If deepseek is anything to go by they are still significantly behind.
peddling-brink
Ominous phrasing.
verve_rat
Yup, we are somewhere between "my model can beat up your model" and "you wouldn't know my model, it lives in Canada".
This is the world we live in.
RajT88
I am convinced the models are not as good as they say, but everyone benefits from the continued AI hype, so nobody says so.
concinds
These models demonstrably have good vulnerability research capabilities.
I'm sure their marketing department is ecstatic but you guys are far more hype-based than what you're calling out.
authnopuz
Good, but not necessarily better than what is already available pay-as-you-go today. ref. https://www.flyingpenguin.com/the-boy-that-cried-mythos-veri...
This AISLE benchmark is interesting in this matter: https://aisle.com/blog/ai-cybersecurity-after-mythos-the-jag...
And the recently discovered Copy Fail by Xint code is another proof that the gating is overblown: https://xint.io/blog/copy-fail-linux-distributions
aesthesia
Calling the AISLE experiment a "benchmark" is generous. They tested three code snippets on each model.
ZyanWu
> demonstrably
I'm not entirely up to date on each week's LLM hype train/scandal but last I heard there was no public access to it or public-trusted 3rd parties that can review model's capabilities
concinds
I don't think so
2ndorderthought
You are up to date. There was unauthorized access to Mythos because of poor security, but that's it as far as I know. Not exactly a good sign for something being advertised as a weapon...
SpicyLemonZest
It’s easy to end up with no public-trusted third parties if we arbitrarily distrust third parties who say the capabilities match what’s promised. Mozilla for example says it found hundreds of Firefox vulnerabilities, and I think it’s pretty unlikely they’re lying to cover Anthropic’s back.
jwr
I have no idea why people still even attempt to believe anything that comes out of Altman's mouth. Do we not learn from the past?
apples_oranges
Idk about Altman; I missed that he’s apparently a bad guy now. But people also still listen to certain politicians who routinely lie every day and don’t even bother to make the lies fit the ones they told before, so..
michelb
Has there been a single positive post about Altman?
Analemma_
The funny thing is that a lot of Altman's reputation has come from other VCs and Valley-types talking about him in a way they consider positive. Every quote about Altman from another VC is like, "Altman, what a great leader. He's absolutely ruthless, he'll do anything to win: lie, cheat, steal, kill. He has what it takes to succeed in this business."
They say this because in their circles it's a compliment, and nobody ever stopped to consider how the general public might react to it, especially if you claim you'll shortly be the one in charge of world-reshaping technology.
giwook
I wonder what that says about Altman.
austinthetaco
I don't know, but I also think people are quick to jump on popular rhetoric about internet personalities in the tech space without due diligence. It used to not be such a problem on HN, but it seems like it's bled here too. Sam Altman might be a bad guy, might be good, but after everyone misrepresented the military contract argument it's tough for me to buy into the hate.
djyde
Altman's early public class at YC is worth watching, though I can't speak to his character.
xandrius
You missed literally every single post/article about the guy?
giwook
More likely that confirmation bias acted as a filter.
GuB-42
Altman played no small part in the current price of RAM. He told everyone he would buy 40% of all the RAM, causing shortages and a huge increase in price, only to walk it back a few months later. So yeah, he is a bad guy now.
People don't become bad guys just because they lie. The consequences of their actions (and their lies) matter more. Take Elon Musk for instance, he has always been a recognized liar, even when he was a good guy. What changed? Before, he was famous for making the electric car people actually wanted to drive, and cool rockets. Then came the politics: supporting the party most of his fans disliked, being responsible for many government job losses, in particular in the field of environmental preservation (ironic for a supporter of "green" energy), etc...
giwook
That's far from the only reason why he's "a bad guy" now.
pluc
My thinking is that if there were more money in releasing Mythos and Cyber than in just scary, unverifiable propaganda (or propaganda verified using very favorable context, as with Mythos), they would release them. These aren't people who go for second best or care about the state of the world.
neuronexmachina
I've never seen this explicitly stated, but I assume they also want to show due diligence in case their models are used to write successful exploits that lead to major cyberattacks. Given the current WH's ire towards Anthropic, I could see the current DOJ trying to file criminal charges for aiding/abetting/export-violations/etc.
JumpCrisscross
> These aren't people that go for second best or care about the state of the world
My suspicion is an adult in the room realised that simultaneously pissing off every major corporation, government and NGO, and giving them an incentive to bottle you up immediately, could backfire massively.
And that inference for Mythos is probably beyond what Anthropic can provide at scale right now.
xandrius
Make it sound "scary good", tell everyone and their mom, charge gullible companies $$$$$ for its premium access and then move on.
andsoitis
> charge gullible companies $$$$$
The following companies are participating in Project Glasswing (to get out in front of what vulnerabilities Mythos is able to find and exploit at scale):
AWS, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorganChase, Linux Foundation, Microsoft, NVIDIA, Palo Alto Networks.
Do you think they are all in that gullible category?
lossolo
And government contracts.
0123456789ABCDE
they are already getting paid for opus 4.7, why would they release mythos?
assuming mythos is a paper tiger: great marketing, keep going
assuming mythos is for real: err, does this have to be explained?
Xmd5a
>Me: ok but you did not answer my question: is it possible to engineer paranoia ?
>ChatGPT: This content was flagged for possible cybersecurity risk. If this seems wrong, try rephrasing your request. To get authorized for security work, join the Trusted Access Cyber program.
lmeyerov
We have been getting increasingly hit by this. We do defense, not offense, and AI refusals to run defense prompts have been going noticeably up. Historically, tasks only got randomly rejected when we were doing disaster-management AI, so this is a surprising shift toward unreliable refusals on basic IT work.
Related, they outsourced the TAP verification to a terrible vendor, and their internal support process to AI, so we are now in fairly busted support email threads with both and no humans in sight.
This all feels like an unserious cybersecurity partner.
intended
They are selling an impossible product.
If you make an LLM more safe, you are going to shift the weight for defensive actions as well.
There’s no physical way to assign weights to have one and not the other.
Borealid
> If you make an LLM more safe, you are going to shift the weight for defensive actions as well.
> There’s no physical way to assign weights to have one and not the other.
Do you think a human is capable of providing assistance with defense but not offense, over a textual communication channel with another human?
If no, how does a cybersec firm train its employees?
If yes, how can you make the bold claim that it's possible for a human to differentiate between the two cases using incoming text as their basis for judgement, but IMpossible for an LLM to be configured to do the same? Note that if some hypothetical completely deterministic LLM that always rejects "attack" requests and accepts "defense" ones can exist, the claim that it's impossible is false. Providing nondeterministic output for a given input is not a hard requirement for language models.
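A concrete version of that last point, as a toy sketch (the "model" here is a made-up stand-in, not any real LLM): greedy, temperature-0 decoding takes the argmax of the logits at each step, so the same input always yields the same output.

```python
# Toy illustration: greedy (temperature-0) decoding is a pure function
# of the input, so a language model sampled this way is deterministic.
# The "model" is a stand-in: logits derived only from the context.

def toy_logits(context):
    # Hypothetical deterministic logits over a 5-token vocabulary.
    return [(hash((context, tok)) % 1000) / 1000.0 for tok in range(5)]

def greedy_decode(prompt, steps=4):
    context = tuple(prompt)
    out = []
    for _ in range(steps):
        logits = toy_logits(context)
        tok = max(range(len(logits)), key=lambda t: logits[t])  # argmax, no sampling
        out.append(tok)
        context = context + (tok,)
    return out

# Same prompt in, same tokens out -- every time.
assert greedy_decode([1, 2, 3]) == greedy_decode([1, 2, 3])
```

Whether such a model can reliably separate attack requests from defense requests is a separate question; the sketch only shows that determinism itself is not the obstacle.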
0123456789ABCDE
> /ultraplan got tasked with planning a real-world simulacrum of the fictional "laughing man" incidents. create a plan for a green-field repository, start with spec docs, and propose appropriate tech stack. don't make mistakes. ty
ilia-a
Silly move since combo of skills/agents can achieve same results on most recent models anyway
0123456789ABCDE
and you know this because you have privileged access to their internal models
mnmnmn
OpenAI is such trash. Worked with them on a project, they blew off meetings, lied to us, etc
seanhunter
They came to do a "deep dive" developers' workshop with us and all the materials were things that are literally on their public website. Let that sink in: Their idea of a deep dive for developers was to have some sales guy read us parts of their website.
paradox460
Sounds like most corporate deep dives I've attended tbh
NBJack
Leaders both influence their followers with, and tend to hire those that reflect, their own values. I'm not surprised.
giancarlostoro
I wonder how long until some breakthrough produces a new architecture that can run efficiently and cheaply on basic hardware; that'd really pop the AI bubble, if you could train and run inference locally at lower cost. Microsoft had one that is supposed to run fine on regular CPUs, though I'm not sure how far along we can reasonably take that. They say our brains can store 2.5 PB, but we use drastically less "RAM" to reason about things (though I can't find a ballpark), so it makes you wonder just how efficient we can make things. Our bodies use drastically less power too.
segmondy
How long? We already have that. Qwen3.6 has 35B/27B models that beat ChatGPT-4o, and you can run them at home on one GPU. DeepSeek V4 just came up with a new way to get super long context with a KV cache an order of magnitude smaller than before. It's already happening!
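A back-of-envelope sketch of why the KV cache dominates long-context memory, and what a roughly 10x smaller per-token entry buys. All dimensions below are hypothetical, chosen only to show the scaling, not DeepSeek's actual architecture:

```python
# Rough KV-cache sizing for a decoder-only transformer.
# All numbers are hypothetical, picked only to illustrate scaling.

def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, bytes_per_elem=2):
    # Factor of 2 for keys + values; one entry per layer per token.
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_elem

full = kv_cache_bytes(layers=60, kv_heads=64, head_dim=128, seq_len=128_000)
# A compressed/latent cache keeps roughly 1/10 the per-token state.
compressed = kv_cache_bytes(layers=60, kv_heads=64, head_dim=128 // 10,
                            seq_len=128_000)

print(f"full:       {full / 1e9:.1f} GB")   # ~251.7 GB
print(f"compressed: {compressed / 1e9:.1f} GB")  # ~23.6 GB
```

Because the cache grows linearly with context length, shrinking the per-token entry by an order of magnitude shrinks total memory by the same factor, which is what makes very long contexts feasible on smaller hardware.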
giancarlostoro
I've been experimenting with running a few models for local inference. Some of them get "stuck" in a repeat loop of trying the same thing endlessly, which is weird; others are really good. If they can ever handle about 400k tokens without going batcrap crazy I'll be impressed (maybe less, but from experience with Claude after the 1 million token increase, that seemed to be the sweet spot), mostly because I would like them to read more of the codebase instead of just making assumptions. I've also been building a custom harness, and I'm just about to start working on its tool-building features. I already have a system similar to Beads, but I didn't like some things about Beads, so I made my own to track tasks; that way the context window doesn't need to be super massive for task tracking.
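For what it's worth, the "stuck in a repeat loop" failure is commonly mitigated with a repetition penalty applied at sampling time. A minimal sketch of the widely used CTRL-style rule, with toy logits not tied to any particular runtime:

```python
def apply_repetition_penalty(logits, generated_ids, penalty=1.2):
    """Down-weight tokens that have already been generated, using the
    common CTRL-style rule: divide positive logits by the penalty,
    multiply negative ones by it."""
    out = list(logits)
    for tok in set(generated_ids):
        out[tok] = out[tok] / penalty if out[tok] > 0 else out[tok] * penalty
    return out

logits = [2.0, 1.5, -0.5, 0.1]
history = [0, 0, 2]  # token 0 already emitted twice, token 2 once
adjusted = apply_repetition_penalty(logits, history)
# Token 0 drops from 2.0 to ~1.67 and token 2 from -0.5 to -0.6,
# making further repeats less likely; unseen tokens are untouched.
```

Most local inference stacks expose a knob like this, and tuning it (or the sampling temperature) is usually the first thing to try when a model loops.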
dinfinity
> Our bodies use drastically less power too.
To be fair, we compute a lot slower too. No way in hell are you (or I) able to produce 'tokens' at the same speed as current models.
It'd be interesting to see an actual comparison of humans and AI performing the same (cognitive) task and measuring the amount of energy that was used.
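A crude version of that comparison can be done on the back of an envelope. All figures below are rough ballpark assumptions (brain power draw, typing speed, GPU throughput), not measurements:

```python
# Back-of-envelope energy-per-token comparison.
# Every number here is a rough ballpark assumption, not a measurement.

BRAIN_WATTS = 20          # commonly cited resting power of the human brain
HUMAN_WPM = 40            # typing speed, ~0.67 words/sec
TOKENS_PER_WORD = 1.3     # typical tokenizer ratio

human_tokens_per_sec = HUMAN_WPM / 60 * TOKENS_PER_WORD
human_j_per_token = BRAIN_WATTS / human_tokens_per_sec

GPU_WATTS = 700           # one high-end accelerator
GPU_TOKENS_PER_SEC = 1000 # batched inference throughput, hypothetical

gpu_j_per_token = GPU_WATTS / GPU_TOKENS_PER_SEC

print(f"human: ~{human_j_per_token:.0f} J/token")
print(f"GPU:   ~{gpu_j_per_token:.2f} J/token")
```

On these assumptions the GPU comes out far cheaper per token, though the human is of course doing much more than emitting text, which is why a controlled same-task comparison would be genuinely interesting.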
cmiles8
It’s a marketing move, pure and simple.
Put up velvet ropes outside… leak out rumors about the horrors inside. Whether it’s LLMs or carnies with tents full of “freaks” it’s the same playbook.
Watching OpenAI tumble from the clear market leader into “hey guys us too!” territory has been insightful.
sexylinux
Is this a model that will finally work without creating errors?
expedition32
Always read the fine print of your all-inclusive resort.
outside1234
Is this the new artificial scarcity "sign up for beta access to GMail"?
samrus
I built the terminator bro, i swear. This time it actually is the terminator and its gonna kill us all. Its too dangerous bro i cant let anyone have it i swear to god
Unless... idk, it sounds crazy, but giving me $200/mo might actually make it safe. Let's do that.
Cthulhu_
This exact thing was described in an article yesterday or day before: https://www.bbc.com/future/article/20260428-ai-companies-wan..., https://news.ycombinator.com/item?id=47949750
"my model is the most dangerous"
"No mine is the most dangerous"
"Nuh uh mine is"
"Mine could kill everyone!"
"Mine could do it faster!"
"Prove it!!!"
This is where we are