cxie
FloatArtifact
They didn't increase the memory bandwidth. You can get the same memory bandwidth, which is available on the M2 Studio. Yes, yes, of course you can get 512 gigabytes of uRAM for 10 grand.
The question is whether an LLM will run with usable performance at that scale. The point is that there are diminishing returns: despite having enough uRAM, the memory bandwidth is the same, even with the new chip's increased processing speed for AI.
So there must be a min-max performance ratio between memory bandwidth, the size of the memory pool, and processing power.
lhl
Since no one specifically answered your question yet: yes, you should be able to get usable performance. A Q4_K_M GGUF of DeepSeek-R1 is 404GB. This is a 671B MoE that "only" has 37B activations per pass. You'd probably expect in the ballpark of 20-30 tok/s (depends on how much of the MBW can actually be utilized) for text generation.
From my napkin math, the M3 Ultra's TFLOPS are still relatively low (around 43 FP16 TFLOPS?), but it should be more than enough to handle bs=1 token generation (should be way <10 FLOPs/byte for inference). Now as far as its prefill/prompt processing speed... well, that's another matter.
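That napkin math can be sketched directly (figures from the thread: 37B active parameters, ~0.5 bytes/param at Q4, 819GB/s peak MBW; the achievable fraction of peak bandwidth is an assumption):

```python
# Bandwidth-bound token generation: each generated token must stream
# every *active* parameter from memory once, so tok/s ~= usable_BW / bytes_read.
ACTIVE_PARAMS = 37e9      # DeepSeek-R1 activations per forward pass
BYTES_PER_PARAM = 0.5     # ~4-bit (Q4) quantization
PEAK_BW = 819e9           # M3 Ultra peak memory bandwidth, bytes/s

bytes_per_token = ACTIVE_PARAMS * BYTES_PER_PARAM  # ~18.5 GB read per token

for efficiency in (0.5, 0.7, 0.9):  # fraction of peak MBW actually achieved
    tok_s = efficiency * PEAK_BW / bytes_per_token
    print(f"{efficiency:.0%} of peak -> {tok_s:.1f} tok/s")
```

At realistic efficiencies this lands right in the 20-30 tok/s ballpark quoted above.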
lynguist
I actually think it’s not a coincidence and they specifically built this M3 Ultra for DeepSeek R1 4-bit. They also highlight in their press release that they tested it with 600B class LLMs (DeepSeek R1 without referring to it by name). And they specifically did not stop at 256 GB RAM to make this happen. Maybe I’m reading too much into it.
drited
I would be curious about the context window size that could be expected when generating ballpark 20 to 30 tokens per second using DeepSeek-R1 Q4 on this hardware.
valine
Probably helps that models like DeepSeek are mixture of experts. Having all weights in VRAM means you don’t have to unload/reload. Memory bandwidth usage should be limited to the 37B active parameters.
FloatArtifact
> Probably helps that models like DeepSeek are mixture of experts. Having all weights in VRAM means you don’t have to unload/reload. Memory bandwidth usage should be limited to the 37B active parameters.
"Memory bandwidth usage should be limited to the 37B active parameters."
Can someone do a deep dive on the above quote? I understand that having the entire model loaded into RAM helps with response times. However, I don't quite understand the relationship between memory bandwidth and active parameters.
Context window?
How much of the model can actively be processed, despite being fully loaded into memory, given the memory bandwidth?
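A toy sketch of the distinction (illustrative numbers only; the real Q4_K_M file is ~404GB because the quantization isn't uniformly 4-bit): the whole model must be *resident* in memory, but per token the hardware only has to *read* the shared layers plus the few routed experts.

```python
# MoE: all weights stay resident, but only the routed experts are read per token.
TOTAL_PARAMS  = 671e9   # everything must fit in (unified) memory
ACTIVE_PARAMS = 37e9    # shared layers + routed experts, per token
BYTES_PER_PARAM = 0.5   # ~4-bit quantization

resident_gb = TOTAL_PARAMS * BYTES_PER_PARAM / 1e9   # capacity requirement
read_gb     = ACTIVE_PARAMS * BYTES_PER_PARAM / 1e9  # bandwidth requirement
print(f"resident: ~{resident_gb:.0f} GB, read per token: ~{read_gb:.1f} GB")
```

So memory *capacity* gates whether the model runs at all, while memory *bandwidth* divided by that ~18.5GB-per-token read gates tok/s. The context window's KV cache adds to both on top of this.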
diggan
> The question is if an LLM will run with usable performance at that scale?
This is the big question to have answered. Many people claim Apple hardware can now reliably be used as an ML workstation, but from the benchmark numbers I've seen, the models may fit in memory, yet tok/sec performance is so slow that it doesn't feel worth it compared to running on NVIDIA hardware.
Although it'd be expensive as hell to get 512GB of VRAM with NVIDIA today, maybe moves like this from Apple could push down the prices at least a little bit.
johnmaguire
It is much slower than nVidia, but for a lot of personal-use LLM scenarios, it's very workable. And it doesn't need to be anywhere near as fast considering it's really the only viable (affordable) option for private, local inference, besides building a server like this, which is no faster: https://news.ycombinator.com/item?id=42897205
hangonhn
Do we know if it is slower because the hardware is not as well suited to the task, or is it mostly a software issue -- the code hasn't been optimized to run on Apple Silicon?
bob1029
> The question is if an LLM will run with usable performance at that scale?
For the self-attention mechanism, memory bandwidth requirements scale ~quadratically with the sequence length.
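A rough illustration of why (hypothetical dense-attention dimensions, not DeepSeek's, which uses MLA to compress its KV cache): each new token must read the K/V vectors of every cached token, so per-token traffic grows linearly with context length, and the total over a full sequence grows quadratically.

```python
# Per-token KV-cache read for a hypothetical dense-attention model.
LAYERS, KV_HEADS, HEAD_DIM, BYTES = 60, 64, 128, 2  # illustrative FP16 config

kv_bytes_per_cached_token = 2 * LAYERS * KV_HEADS * HEAD_DIM * BYTES  # K and V
for ctx in (4_096, 32_768, 131_072):
    gb = ctx * kv_bytes_per_cached_token / 1e9
    print(f"ctx {ctx:>7}: ~{gb:.0f} GB read per new token")
```

Real models shrink this with GQA or MLA, but the linear-per-token (quadratic-per-sequence) growth is the same.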
kridsdale1
Someone has got to be working on a better method than that. Hundreds of billions are at stake.
cxie
Guess what? I'm on a mission to completely max out all 512GB of mem...maybe by running DeepSeek on it. Pure greed!
swivelmaster
You could always just open a few Chrome tabs…
petepete
Give Cities Skylines 2 a try.
deepGem
Any idea what the sRAM to uRAM ratio is on these new GPUs? If they have meaningfully higher sRAM than the Hopper GPUs, it could lead to meaningful speedups in large model training.
If they didn't increase the memory bandwidth, then 512GB will enable longer context lengths and that's about it, right? No speedups.
For any speedups, you may need some new variant of FlashAttention-3, or something along similar lines, purpose-built for Apple GPUs.
astrange
I don't know what you mean by s and u, but there is only one kind of memory in the machine, that's what unified memory means.
TheRealPomax
Yeah they did? The M4 has a max memory bandwidth of 546GBps, the M3 Ultra bumps that up to a max of 819GBps.
(and the 512GB version is $4,000 more rather than $10,000 - that's still worth mocking, but it's nowhere near as much)
okanesen
Not that dramatic of an increase actually - the M2 Max already had 400GB/s and M2 Ultra 800GB/s memory bandwidth, so the M3 Ultra's 819GB/s is just a modest bump. Though the M4's additional 146GB/s is indeed a more noticeable improvement.
sudoshred
Agree. Finally I can have several hundred browser tabs open simultaneously with no performance degradation.
nikisweeting
My M1 Max regularly pushes 1000+ tabs without breaking a sweat, I feel like this particular metric is no longer useful now that background tab memory is almost always unloaded by the browser.
nullc
I'm not sure that unified memory is particularly relevant for that-- so e.g. on zen4/zen5 epyc there is more than enough arithmetic power that LLM inference is purely memory bandwidth limited.
On dual (SP5) Epyc I believe the memory bandwidth is somewhat greater than this apple product too... and at apple's price points you can have about twice the ram too.
Presumably the apple solution is more power efficient.
PeterStuer
Is this on chip memory? From the 800GB/s I would guess more likely a 512bit bus (8 channel) to DDR5 modules. Doing it on a quad channel would just about be possible, but really be pushing the envelope. Still a nice thing.
As for practicality, which mainstream applications would benefit from this much memory paired with nice but relatively mid compute? At this price-point ($14K for a fully specced system), would you prefer it over e.g. a couple of NVIDIA Project DIGITS boxes (assuming that arrives on time and for around the announced $3K price-point)?
zitterbewegung
NVIDIA project DIGITS has 128 GB LPDDR5x coherent unified system memory at a 273 Gb/s memory bus speed.
bangaladore
It would be 273 GB/s (gigabytes, not gigabits). But in reality we don't know the bandwidth; some ex-employee said 500 GB/s.
Your source is a Reddit post in which they try to match the size to existing chips, without realizing that it's very likely NVIDIA is using custom memory here produced by Micron, like Apple uses custom memory chips.
PeterStuer
Yes, but for the price of that single M3 Ultra I could have 4 of those GB10s running in a 2x2 cluster with the full NVIDIA stack supported (which is still a big thing).
So M3 preference will depend on whether a niche can significantly benefit from a monolithic lower-compute/high-memory setup vs. a higher-compute but distributed one.
MBCook
Unless something has changed, it's on package, but not on the same die.
rlt
Is putting RAM on the same chip as processing economical?
I would have assumed you’d want to save the best process/node for processing, and could use a less expensive processes for RAM.
RataNova
It's a game changer for sure.... 512GB of unified memory really pushes the envelope, especially for running complex AI models locally. That said, the real test will be in how well the dual-chip design handles heat and power efficiency
resters
The same thing could be designed with greater memory bandwidth, and so it's just a matter of time (for NVIDIA) until Apple decides to compete.
InTheArena
Whoa. M3 instead of M4. I wonder if this was basically binning, but I thought I had read somewhere that the interposer that enabled this for the M1 chips was not available.
That said, 512GB of unified RAM with access to the NPU is absolutely a game changer. My guess is that Apple developed this chip for their internal AI efforts, and they are now at the point where they are releasing it publicly for others to use. They really need a 2U rack form factor for this, though.
This hardware is really being held back by the operating system at this point.
exabrial
If Apple supported Linux (headless) natively, and we could rack m4 pros, I absolutely would use them in our Colo.
The CPUs have zero competition in terms of speed, memory bandwidth. Still blown away no other company has been able to produce Arm server chips that can compete.
hedora
The last I checked, AMD was outperforming Apple perf/dollar on the high end, though they were close on perf/watt for the TDPs where their parts overlapped.
I’d be curious to know if this changes that. It’d take a lot more than doubling cores to take out the very high power AMD parts, but this might squeeze them a bit.
Interestingly, AMD has also been investing heavily in unified RAM. I wonder if they have / plan an SoC that competes 1:1 with this. (Most of the parts I’m referring to are set up for discrete graphics.)
aurareturn
The M4 Pro is 56% faster in ST performance against AMD’s new Strix Halo while being 3.6x more efficient.
Source: https://www.notebookcheck.net/AMD-Ryzen-AI-Max-395-Analysis-...
Cinebench 2024 results.
nick_
Same. I'm not sure what to make of the various claims. I personally defer to this table in general: https://www.cpubenchmark.net/power_performance.html.
I'm not sure how those benchmarks translate to common real world use cases.
PaulHoule
If I read this right, the r8g.48xlarge at AMZN [1] has 192 cores and 1536GB which exceeds the M3 Ultra in some metrics.
It reminds me of the 1990s when my old school was using Sun machines based on the 68k series and later SPARC and we were blown away with the toaster-sized HP PA RISC machine that was used for student work for all the CS classes.
Then Linux came out and it was clear the 386 trashed them all in terms of value and as we got the 486 and 586 and further generations, the Intel architecture trashed them in every respect.
The story then was that Intel was making more parts than anybody else so nobody else could afford to keep up the investment.
The same is happening with parts for phones and TSMC's manufacturing dominance -- and today with chiplets you can build up things like the M3 Ultra out of smaller parts.
hedora
In fairness, the sun and dec boxes I used back then (up to about 1999) could hold their own against intel machines.
Then, one day, we built a 5 machine amd athlon xp linux cluster for $2000 ($400/machine) that beat all the unix and windows server hardware by at least 10x on $/perf.
It’s nice that we have more than one viable cpu vendor these days, though it seems like there’s only one viable fab company.
nsteel
It seems Graviton 4 CPUs have 12-channels of DDR5-5600 i.e 540GB/s main memory bandwidth for the CPU to use. M3 Ultra has 64-channels of LPDDR5-6400 i.e. ~800GB/s of memory bandwidth for the CPU or the GPU to use. So the M3 Ultra has way fewer (CPU) cores, but way more memory bandwidth. Depends what you're doing.
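The arithmetic behind those figures (channel widths are an assumption here: LPDDR5 channels counted as 16-bit, DDR5 channels as 64-bit):

```python
# Peak bandwidth = channels x channel_width_bytes x transfer_rate.
def bw_gb_s(channels: int, bus_bits: int, mt_per_s: int) -> float:
    return channels * (bus_bits / 8) * mt_per_s / 1000  # GB/s

print(bw_gb_s(12, 64, 5600))   # Graviton 4: 12-ch DDR5-5600  -> ~537.6 GB/s
print(bw_gb_s(64, 16, 6400))   # M3 Ultra: 64-ch LPDDR5-6400  -> ~819.2 GB/s
```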
rbanffy
> The CPUs have zero competition in terms of speed, memory bandwidth.
Maybe not at the same power consumption, but I'm sure mid-range Xeons and EPYCs mop the floor with the M3 Ultra in CPU performance. What the M3 Ultra has that nobody else comes close is a decent GPU near a pool of half a terabyte of RAM.
icecube123
Yeah, I've been thinking about this for a few years. The Mx series chips would sell into data centers like crazy if Apple went after that market, especially if they created a server-tuned chip. It could probably be their 2nd biggest product line behind the iPhone. The performance and efficiency are awesome. I guess it would be neat to see some web serving and database benchmarks to really know.
kridsdale1
TSMC couldn’t make enough at the leading node in addition to all the iPhone chips Apple has to sell. There’s a physical throughput limit. That’s why this isn’t M4.
hoppp
What about serviceability? Do these come with a soldered-in SSD? That would be an issue for server use; it's too expensive to throw it all away for a broken SSD.
galad87
No, the SSD isn't soldered, it has got one or two removable modules: https://everymac.com/systems/apple/mac-studio/mac-studio-faq...
gjsman-1000
Nah, in many businesses, everything is on a schedule. For desktop computers, a common cycle is 4 years. For servers, maybe a little longer, but not by much. After that date arrives, it’s liquidate everything and rebuild.
Having things consistently work is much cheaper than down days caused by your ancient equipment. Apple’s SSDs will make it to 5 years no problem - and more likely, 10-15 years.
notpushkin
Asahi is a thing. For headless usage it’s pretty much ready to go already.
EgoIncarnate
M3 support in Asahi is still heavily WIP. I think it doesn't even have display support, Ethernet, or WiFi yet; only serial over USB. Without any GPU or ANE support, it's not very useful for AI stuff. https://asahilinux.org/docs/M3-Series-Feature-Support/
criddell
The Asahi maintainer resigned recently. What that means for the future only time will tell. I probably wouldn't want to make a big investment in it right now.
WD-42
It’s only a thing for the M1. Asahi is a Sisyphean effort to keep up with new hardware and the outlook is pretty grim at the moment.
Apple’s whole m.o. is to take FOSS software, repackage it and sell it. They don’t want people using it directly.
lynndotpy
Not at all for M3 or M4. Support is for M2 and M1 currently.
Thaxll
Apple does not make server CPUs, they make consumer low W CPUs, it's very different.
FYI Apple runs Linux in their DC, so no Apple hardware in their own servers.
alwillis
> Apple does not make server CPUs, they make consumer low W CPUs, it's very different.
This is silly. Given the performance per watt, the M series would be great in a data center. As you all know, electricity for running the servers and cooling for the servers are the two biggest ongoing costs for a data center; the M series requires less power and runs more efficiently than the average Intel or AMD-based server.
> FYI Apple runs Linux in their DC, so no Apple hardware in their own servers.
That's certainly no longer the case. Apple announced their Private Cloud Compute [1] initiative—Apple designed servers running Apple Silicon to support Apple Intelligence functions that can't run on-device.
BTW, Apple just announced a $500 billion investment [2] in US-based manufacturing, including a 250,000 square foot facility to make servers. Yes, these will obviously be for their Private Cloud Compute servers… but it doesn't have to be only for that purpose.
From the press release:
As part of its new U.S. investments, Apple will work with manufacturing partners to begin production of servers in Houston later this year. A 250,000-square-foot server manufacturing facility, slated to open in 2026, will create thousands of jobs.
Previously manufactured outside the U.S., the servers that will soon be assembled in Houston play a key role in powering Apple Intelligence, and are the foundation of Private Cloud Compute, which combines powerful AI processing with the most advanced security architecture ever deployed at scale for AI cloud computing. The servers bring together years of R&D by Apple engineers, and deliver the industry-leading security and performance of Apple silicon to the data center.
Teams at Apple designed the servers to be incredibly energy efficient, reducing the energy demands of Apple data centers — which already run on 100 percent renewable energy. As Apple brings Apple Intelligence to customers across the U.S., it also plans to continue expanding data center capacity in North Carolina, Iowa, Oregon, Arizona, and Nevada.
[1]: https://security.apple.com/blog/private-cloud-compute/
[2]: https://www.apple.com/newsroom/2025/02/apple-will-spend-more...
exabrial
I think it's interesting everyone that dissented mentioned power consumption.
Our business "only" sees about 1,000-25,000 req/min, our message brokers transmit MAX 25k msg/s. Easily handled by a rack of 10 servers for redundancy.
We are not Google and we don't pretend to be, so we don't care about power, as the difference is a few dollars a month.
stego-tech
> This hardware is really being held back by the operating system at this point.
It really is. Even if they themselves won't bring back their old XServe OS variant, I'd really appreciate it if they at least partnered with a Linux or BSD (good callout, ryao) dev to bring a server OS to the hardware stack. The consumer OS, while still better (to my subjective tastes) than Windows, is increasingly hampered by bloat and cruft that make it untenable for production server workloads, at least to my subjective standards.
A server OS that just treats the underlying hardware like a hypervisor would, making the various components attachable or shareable to VMs and Containers on top, would make these things incredibly valuable in smaller datacenters or Edge use cases. Having an on-prem NPU with that much RAM would be a godsend for local AI acceleration among a shared userbase on the LAN.
ryao
Given shared heritage, I would expect to see Apple work with FreeBSD before I would expect Apple to work with Linux.
stego-tech
You are technically correct (the best kind of correct). I’m just a filthy heathen who lumps the BSDs and Linux distros under “Linux” as an incredibly incorrect catchall for casual discourse.
hedora
I heard OpenBSD has been working for a while.
I’m continually surprised Apple doesn’t just donate something like 0.1% of their software development budget to proton and the asahi projects. It’d give them a big chunk of the gaming and server markets pretty much overnight.
I guess they’re too busy adding dark patterns that re-enable siri and apple intelligence instead.
barryrandall
Sure, but FreeBSD also has a Linux compatibility layer. For a company that's given up on the server market so many times, making macOS compatible with _THE_ server OS makes a lot of sense.
hinkley
I miss the XServe almost as much as I miss the Airport Extreme.
stego-tech
I feel like Apple and Ubiquiti have a missed collaboration opportunity on the latter point, especially with the latter's recent UniFi Express unit. It feels like pairing Ubiquiti's kit with Apple's Homekit could benefit both, by making it easier for Homekit users to create new VLANs specifically for Homekit devices, thereby improving security - with Apple dubbing the term, say, "Secure Device Network" or some marketingspeak to make it easier for average consumers to understand. An AppleTV unit could even act as a limited CloudKey for UniFi devices like Access Points, or UniFi Cameras to connect/integrate as Homekit Cameras.
Don't get me wrong, I wouldn't use that feature (I prefer self-hosting it all myself), but for folks like my family members, it'd be a killer addition to the lineup that makes my life supporting them much easier.
klausa
>I had read somewhere that the interposer that enabled this for the M1 chips where not available.
With all my love and respect for "Apple rumors" writers; this was always "I read five blogposts about CPU design and now I'm an expert!" territory.
The speculation was based on the M3 Max's die shots not having the interposer visible, which implies basically nothing about whether that _could have_ been supported in an M3 Ultra configuration, as evidenced by the announcement today.
sroussey
I’m guessing it’s not really a M3.
No M3 has thunderbolt 5.
This is a new chip with M3 marketing. I’d expect this from Intel, not Apple.
klausa
Baseline M4 doesn't have Thunderbolt 5 either; only the Pro/Max variants do.
The press-release even calls TB5 out: >Each Thunderbolt 5 port is supported by its own custom-designed controller directly on the chip.
Given that they're doing the same on A-series chips (A18 Pro with 10Gbps USB-C; A18 with USB 2.0); I imagine it's just relatively simple to swap the I/O blocks around and they're doing this for cost and/or product segmentation reasons.
hinkley
TB 5 seems like the sort of thing you could 'slap on' to a beefy enough chip.
Or the sort of thing you put onto a successor when you had your fingers crossed that the spec and hardware would finalize in time for your product launch but the fucking committee went into paralysis again at the last moment and now your product has to ship 4 months before you can put TB 5 hardware on shelves. So you put your TB4 circuitry on a chip that has the bandwidth to handle TB5 and you wait for the sequel.
kokada
> This hardware is really being held back by the operating system at this point.
Apple could either create 2U rack hardware and support Linux (and I mean Apple supporting it, not hobbyists), or ship a headless build of Darwin that could run on that hardware. But in the latter case, we probably wouldn't have much software available (though I'm sure people would eventually start porting software to it; there are already MacPorts and Homebrew, and I'm sure they could be adapted to run on that platform).
But Apple is also not interested in that market, so this will probably never happen.
ewzimm
There has to be someone at Apple with a contact at IBM that could make Fedora Apple Remix happen. It may not be on-brand, but this is a prime opportunity to make the competition look worse. File it under Community projects at https://opensource.apple.com/projects
naikrovek
> But Apple is also not interested in that market, so this will probably never happen.
they're just a tiny company with shareholders who are really tired of never earning back their investments. give 'em a break. I mean they're still so small that they must protect themselves by requiring that macs be used for publishing iPhone and iPad applications.
hnaccount_rng
Not to get in the way of good snark or anything. But.. Apple isn't _requiring_ that everyone uses MacOS on their systems. But you have to bring your own engineering effort to actually make another OS run. And so far Asahi is the only effort that I'm aware of (there were alternatives in the very beginning, but they didn't even get to M2 right?)
alwillis
I wouldn't be so sure about that.
pjmlp
Apple was once in the server market, they decided a few times actually, that isn't where they want to be.
GeekyBear
I also wondered about binning, so I pulled together how heavily Apple's Max chips were binned in shipping configurations.
M1 Max - 24 to 32 GPU cores
M2 Max - 30 to 38 GPU cores
M3 Max - 30 to 40 GPU cores
M4 Max - 32 to 40 GPU cores
I also looked up the announcement dates for the Max and the Ultra variant in each generation.
M1 Max - October 18, 2021
M1 Ultra - March 8, 2022
M2 Max - January 17, 2023
M2 Ultra - June 5, 2023
M3 Max - October 30, 2023
M3 Ultra - March 12, 2025
M4 Max - October 30, 2024
> My guess is that Apple developed this chip for their internal AI efforts
As good a guess as any, given the additional delay between the M3 Max and Ultra being made available to the public.
jonplackett
I’m missing the point. What is it you’re concluding from these dates?
GeekyBear
I was referring to the additional year of delay between the M3 Max and M3 Ultra announcements when compared to the M1 and M2 generations.
The theory that the M3 Ultra was being produced, but diverted for internal use makes as much sense as any theory I've seen.
It makes at least as much sense as the "TSMC had difficulty producing enough defect free M3 Max chips" theory.
AlchemistCamp
Keep in mind the minimum configuration that has 512GB of unified RAM is $9,499.
stego-tech
I cannot express how dirt cheap that pricepoint is for what's on offer, especially when you're comparing it to rackmount servers. By the time you've shoehorned in an nVidia GPU and all that RAM, you're easily looking at 5x that MSRP; sure, you get proper redundancy and extendable storage for that added cost, but now you also need redundant UPSes and have local storage to manage instead of centralized SANs or NASes.
For SMBs or Edge deployments where redundancy isn't as critical or budgets aren't as large, this is an incredibly compelling offering...if Apple actually had a competent server OS to layer on top of that hardware, which it does not.
If they did, though...whew, I'd be quaking in my boots if I were the usual Enterprise hardware vendors. That's a damn frightening piece of competition.
kllrnohj
> By the time you've shoehorned in an nVidia GPU and all that RAM, you're easily looking at 5x that MSRP
That nvidia GPU setup will actually have the compute grunt to make use of the RAM, though, which this M3 Ultra probably realistically doesn't. After all, if the only thing that mattered was RAM then the 2TB you can shove into an Epyc or Xeon would already be dominating the AI industry. But they aren't, because it isn't. It certainly hits at a unique combination of things, but whether or not that's maximally useful for the money is a completely different story.
AlchemistCamp
It's not quite an apples to apples comparison, no pun intended. I guess we'll see how it sells.
cubefox
I assume there is a very good reason why AMD and Intel aren't releasing a similar product.
BoredPositron
Still cheap if the only thing you look for is vram.
adgjlsfhk1
This chip has 0GB vram. It has 8 channel lpddr5.
baq
This is a ‘shut up and take my money’ price, it’ll fly off the shelves.
nsteel
And how is it only £9,699.00!! Does that dollar price include sales tax or are Brits finally getting a bargain?
vr46
The US prices never include state sales tax IIRC. Maybe we're finally getting some parity.
kgwgk
What's the bargain?
There is also "parity" in other products like a MacBook Pro from £1,599 / $1,599 or an iPhone 16 from £799 / $799. £9,699 / $9,499 is worse than that!
mastax
Tariffs perhaps?
DrBenCarson
Cheap relative to the alternatives
jmyeet
I've been looking at the potential for Apple to make really interesting LLM hardware. Their unified memory model could be a real game-changer because NVidia really forces market segmentation by limiting memory.
It's worth adding the M3 Ultra has 819GB/s memory bandwidth [1]. For comparison the RTX 5090 is 1800GB/s [2]. That's still less but the M4 Mac Minis have 120-300GB/s and this will limit token throughput so 819GB/s is a vast improvement.
For $9500 you can buy a M3 Ultra Mac Studio with 512GB of unified memory. I think that has massive potential.
[1]: https://www.apple.com/mac-studio/specs/
[2]: https://www.nvidia.com/en-us/geforce/graphics-cards/50-serie...
hedora
Other than the NPU, it’s not really a game changer; here’s a 512GB AMD deepseek build for $2000:
https://digitalspaceport.com/how-to-run-deepseek-r1-671b-ful...
aurareturn
between 4.25 to 3.5 TPS (tokens per second) on the Q4 671b full model.
3.5-4.25 tokens/s: you're torturing yourself, especially with a reasoning model. This will run it at ~40 tokens/s based on a rough calculation. Q4 quant, 37B active parameters.
5x higher price for 10x higher performance.
hinkley
Also you don't have to deal with Windows. Which people who do not understand Apple are very skilled at not noticing.
If you've ever used git, svn, or an IDE side by side on corporate Windows versus Apple I don't know why you would ever go back.
flakiness
The low energy use can be a game changer if you live in a crappy apartment with limited power capacity. I gave up my big GPU box dream because of that.
ksec
The previous M2 Ultra had a max memory of 192GB, or 128GB for the Pro and some other M3 models, which I think is plenty for even 99.9% of professional tasks.
They've now bumped it to 512GB, along with an insane price tag of $9,499 for the 512GB Mac Studio. I am pretty sure this is some AI Gold rush.
InTheArena
Every single AI shop on the planet is trying to figure out if there is enough compute to make this a reasonable AI path. If the answer is yes, that $10K is an absolute bargain.
ZeroTalent
No, because there is no CUDA. We have fast and cheap alternatives to NVIDIA, but they do not have CUDA. This is why NVIDIA has 90% margins on its hardware.
jauntywundrkind
CUDA is simply not important for modern vLLM and many many others. DeepSeek V3 works great on SGLang. https://www.amd.com/en/developer/resources/technical-article...
Can you do absolutely everything? No. But most models will run or retrain fine now without CUDA. This premise keeps getting recycled from the past, even as that past has grown ever more distant.
Spooky23
> that 10k is a absolute bargain
The higher end NVidia workstation boxes won’t run well on normal 20amp plugs. So you need to move them to a computer room (whoops, ripped those out already) or spend months getting dedicated circuits run to office spaces.
magnetometer
Didn't really think about this before, but that seems to be mainly an issue in Northern / Central America and Japan. In Germany, for example, typical household plugs are 16A at 230V.
someothherguyy
In the US, normal circuits aren't always 20A, especially in residential buildings, where they are more commonly 15A in bedrooms and offices.
827a
Is this actually true? Were people doing this with the 192gb of the M2 Ultra?
I'm curious to learn how AI shops are actually doing model development if anyone has experience there. What I imagined was: Its all in the "cloud" (or, their own infra), and the local machine doesn't matter. If it did matter, the nvidia software stack is too important, especially given that a 512gb M3 Ultra config costs $10,000+.
DrBenCarson
You’re largely correct for training models
Where this hardware shines is inference (aka developing products on top of the models themselves)
internetter
No AI shop is buying macs to use as a server. Apple should really release some server macOS distribution, maybe even rackable M-series chips. I believe they have one internally.
jerjerjer
Why would any business pay Apple Tax for a backend, server product?
NorwegianDude
Not much to figure out. It's 2x M4 Max, so you need 100 of these to match the TOPS of even a single consumer card like the RTX 5090.
jeffhuys
Sure, but if you have models like DeepSeek - 400GB - that won't fit on a consumer card.
wpm
It's 2x M3 Max
alberth
> It's 2x M4 Max
Not exactly though.
This can have 512GB unified memory, 2x M4 Max can only have 128GB total (64GB each).
DrBenCarson
Now do VRAM
HPsquared
LLMs easily use a lot of RAM, and these systems are MUCH, MUCH cheaper (though slower) than a GPU setup with the equivalent RAM.
A 4-bit quantization of Llama-3.1 405b, for example, should fit nicely.
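A quick size check under the usual rule of thumb (the ~15% overhead for quantization scales and metadata is an assumption; it varies by quant format):

```python
# 4-bit quantization stores roughly 0.5 bytes per parameter, plus overhead.
PARAMS = 405e9     # Llama-3.1 405B
OVERHEAD = 1.15    # assumed scales/metadata overhead, format-dependent

size_gb = PARAMS * 0.5 * OVERHEAD / 1e9
print(f"~{size_gb:.0f} GB")  # well under 512 GB, leaving room for the KV cache
```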
segmondy
The question will be how it performs. I suspect DeepSeek and Llama-405B demonstrated the need for larger memory. Right now folks could build an Epyc system with that much RAM or more to run DeepSeek at about 6 tokens/sec for a fraction of that cost. However, not everyone is a tinkerer, so there's a market for this among those who don't want to be bothered. You say "AI Gold rush" like it's a bad thing; it's not.
MR4D
Remember, that RAM is also VRAM, so 1/2 terabyte of VRAM ain’t cheap. By comparison, Apple is a downright bargain!
tobyhinloopen
It doesn't have the bandwidth of dedicated GPU VRAM.
leetharris
Yes it does. It is just short of a 4090's memory bandwidth.
It's still far away from an H100 though.
bloppe
Big question is: Does the $10k price already reflect Trump's tariffs on China? Or will the price rise further still..
dwighttk
Maybe .1% of tasks need this RAM, why are they charging so much?
cjbgkagh
I don't need 512GB of RAM but the moment I do I'm certain I'll have bigger things to worry about than a $10K price tag.
almostgotcaught
This is Pascal's wager written in terms of ... RAM. The original didn't make sense and neither does this iteration.
pier25
Because the minority that needs that much RAM can't work without it.
In the media composing world they use huge orchestral templates with hundreds and hundreds of tracks with millions of samples loaded into memory.
agloe_dreams
Because the .1% is who will buy it? I mean, yeah, supply and demand. High demand in a niche with no supply currently means large margins.
I don't think anyone commercially offers nearly this much unified memory or NPU/GPUs with anything near 512GB of memory.
madeofpalk
Maybe because .1% of tasks need this RAM, it attracts a .1% price tag
Spooky23
With all things semiconductor, low volume = higher cost (and margin).
The people who need the crazy resource can tie it to some need that costs more. You’d spend like $10k running a machine with similar capabilities in AWS in a month.
Sharlin
It enables the use of giant AI models on a personal computer. Might not run too fast though. But at least it's possible at all.
tobyhinloopen
What is stopping us from running these models on a PC with 512GB RAM?
regularfry
The narrower the niche, the more you can charge.
A4ET8a8uTh0_v2
I think the answer is because they can ( there is a market for it ). The benefit to a crazy person like me that with this addition, I might be able to grab 128gb version at a lower price.
jjuliano
Currently, Docker does not support Metal GPUs.
When running LLMs on Docker with an Apple M3 or M4 chip, they will operate in CPU mode regardless of the chip's class, as Docker only supports Nvidia and Radeon GPUs.
If you're developing LLMs on Docker, consider getting a Framework laptop with an Nvidia or Radeon GPU instead.
Source: I develop an AI agent framework that runs LLMs inside Docker on an M3 Max (https://kdeps.com).
rtwld
Podman does support GPU acceleration through libkrun with virtio-gpu (venus) on Mac: https://podman-desktop.io/docs/podman/gpu
asadalt
How is it in practice (if you've tried it)? I have some Vulkan work and I'm too lazy to set up a new EC2 instance for it.
lauritz
They updated the Studio to the M3 Ultra now, so the M4 Ultra can presumably go directly into the Mac Pro at WWDC? Interesting timing. Maybe they'll change the form factor of the Mac Pro, too?
Additionally, I would assume this is a very low-volume product, so it being on N3B isn't a dealbreaker. At the same time, these chips must be very expensive to make, so tying them with luxury-priced RAM makes some kind of sense.
lauritz
Interestingly, Apple apparently confirmed to a French website that M4 lacks the interconnect required to make an "Ultra" [0][1], so contrary to what I originally thought, they maybe won't make this after all? I'll take this report with a grain of salt, but apparently it's coming directly from Apple.
Makes it even more puzzling what they are doing with the M2 Mac Pro.
[0] https://www.numerama.com/tech/1919213-m4-max-et-m3-ultra-let...
[1] More context on Macrumors: https://www.macrumors.com/2025/03/05/apple-confirms-m4-max-l...
undefined
layer8
Apple says that not every generation will get an “Ultra” variant: https://arstechnica.com/apple/2025/03/apple-announces-m3-ult...
agloe_dreams
My understanding was that Apple wanted to figure out how to build systems with multi-SOCs to replace the Ultra chips. The way it is currently done means that the Max chips need to be designed around the interconnect. Theoretically speaking, a multi-SOC setup could also scale beyond two chips and into a wider set of products.
rbanffy
Ultra is already two big M3 chips coupled through an interposer. Apple is curiously not going the way of chiplets like the big CPU crowd is.
aurareturn
I'm not sure multi-SoC is feasible, because presenting two GPUs to the OS as one big GPU is very hard if the SoCs are physically separate.
raydev
Honestly I don't think we'll see the M4 Ultra at all this year. That they introduced the Studio with an M3 Ultra tells me M4 Ultras are too costly or they don't have capacity to build them.
And anyway, I think the M2 Mac Pro was Apple asking customers "hey, can you do anything interesting with these PCIe slots? because we can't think of anything outside of connectivity expansion really"
RIP Mac Pro unless they redesign Apple Silicon to allow for upgradeable GPUs.
jsheard
> Maybe they'll change the form factor of the Mac Pro, too?
Either that or kill the Mac Pro altogether, the current iteration is such a half-assed design and blatantly terrible value compared to the Studio that it feels like an end-of-the-road product just meant to tide PCIe users over until they can migrate everything to Thunderbolt.
They recycled a design meant to accommodate multiple beefy GPUs even though GPUs are no longer supported, so most of the cooling and power delivery is vestigial. Plus the PCIe expansion was quietly downgraded, Apple Silicon doesn't have a ton of PCIe lanes so the slots are heavily oversubscribed with PCIe switches.
lauritz
I agree. Nonetheless, I agree with Siracusa that the Mac Pro makes sense as a "halo car" in the Mac lineup.
I just find it interesting that you can currently buy a M2 Ultra Mac Pro that is weaker than the Mac Studio (for a comparable config) at a higher price. I guess it "remains a product in their lineup" and we'll hear more about it later.
Additionally: If they wanted to scrap it down the road, why would they do this now?
madeofpalk
The current Mac Pro is not a "halo car". It's a large USB-A dongle for a Mac Studio.
crowcroft
Agree with this, and it doesn't seem like it's a priority for Apple to bring the kind of expandability back any time soon.
Maybe they can bring back the trash can.
jsheard
Isn't the Mac Studio the new trash can? I can't think of how a non-expandable Mac Pro could be meaningfully different to the Studio unless they introduce an even bigger chip above the Ultra.
pier25
I've always maintained that the M2 Mac Pro was really a dev kit for manufacturers of PCI parts. It's such a meaningless product otherwise.
toasterlovin
IMO they had plans for a Mac Pro chip that didn’t work out, so they released the M2 version to let their Mac Pro customers know that they’re still committed to the product in the Apple Silicon era.
newsclues
The Mac Pro could exist as a PCIe expansion slot storage case that accepts a logic board from the frequently updated consumer models. Or multiple Mac Studio logic boards all in one case with your expansion cards all working together.
undefined
TheTxT
512GB unified memory is absolutely wild for AI stuff! Compared to how many NVIDIA GPUs you would need, the pricing looks almost reasonable.
InTheArena
A server with 512GB of high-bandwidth, GPU-addressable RAM is probably a six-figure expenditure. If memory is your constraint, this is absolutely the machine for you.
(sorry, should have specified that the NPU and GPU cores need to access that ram and have reasonable performance). I specified it above, but people didn't read that :-)
Numerlor
A basic brand new server can easily do 512gb. Not as fast as soldered memory but it should be maybe mid to high 5 figures
la_oveja
5 figures? can be done in 6k https://x.com/carrigmat/status/1884244369907278106
undefined
energy123
What is the memory bandwidth to the CPU cores? Is it competitive with 8-channel DDR5 servers for non-GPU compute?
jeffbee
That doesn't sound right. The marginal cost of +768GB of DDR5 ECC memory in an EPYC system is < $5k.
InTheArena
GPU accessible RAM.
jeroenhd
If you're going to overthrow your entire AI workflow to use a different API anyway, surely the AMD Instinct accelerator cards make more sense. They're expensive, but also a lot faster, and you don't need to deal with making your code work on macOS.
wmf
Doesn't AMD Instinct cost >$50K for 512GB?
codedokode
I don't think API has any value because writing software is free and hardware for ML is super expensive.
internetter
> writing software is free
says who? NVIDIA has essentially entrenched themselves thanks to CUDA
knowitnone
I'd like to hire you to write free software
positr0n
> writing software is free
Don't tell my boss! I still get paid.
baobabKoodaa
Daniel Ek spotted!
undefined
chakintosh
14k for a maxed out Mac Studio
mrtksn
Let's say you want to have the absolute max memory(512GB) to run AI models and let's say that you are O.K. with plugging a drive to archive your model weights then you can get this for a little bit shy of $10K. What a dream machine.
Compared to Nvidia's Project DIGITS, which is supposed to cost $3K and be available "soon", you can get a spec-matching 128GB / 4TB version of this Mac for about $4,700. The difference is that you can actually get it in a week, and it will run macOS (no idea how much performance difference to expect).
I can't wait to see someone testing the full DeepSeek model on this, maybe this would be the first little companion AI device that you can fully own and can do whatever you like with it, hassle-free.
bloomingkales
There’s an argument that replaceable PC parts are what you want at that price point, but Apple usually provides multi-year durability on its machines. An Apple AI brick should last a while.
NightlyDev
The full deepseek R1 model needs more memory than 512GB. The model is 720GB alone. You can run a quantized version on it, but not the full model.
summarity
You can chain multiple Mac Studios using exo for inference; you'd "only" need two of these. There's a bottleneck in the link speed over TB5, but this may not matter as much for a MoE model.
behnamoh
> I can't wait to see someone testing the full DeepSeek model on this
at 819 GB per second bandwidth, the experience would be terrible
coder543
DeepSeek-R1 only has 37B active parameters.
A back of the napkin calculation: 819GB/s / 37GB/tok = 22 tokens/sec.
Realistically, you’ll have to run quantized to fit inside of the 512GB limit, so it could be more like 22GB of data transfer per token, which would yield 37 tokens per second as the theoretical limit.
It is likely going to be very usable. As other people have pointed out, the Mac Studio is also not the only option at this price point… but it is neat that it is an option.
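The arithmetic above can be written out explicitly. A minimal sketch, assuming decode is purely memory-bandwidth-bound, i.e. every generated token streams all active parameters from memory once (a best-case upper bound that ignores compute, prefill, and cache effects):

```python
# Upper bound on decode speed for a bandwidth-bound MoE model.
# Real-world throughput will be lower than this ceiling.
def max_tokens_per_sec(bandwidth_gb_s: float, active_params_billions: float,
                       bytes_per_param: float) -> float:
    gb_per_token = active_params_billions * bytes_per_param  # data moved per token
    return bandwidth_gb_s / gb_per_token

# M3 Ultra at 819 GB/s with DeepSeek-R1's 37B active parameters:
print(round(max_tokens_per_sec(819, 37, 1.0)))  # 8-bit weights: ~22 tok/s
print(round(max_tokens_per_sec(819, 37, 0.6)))  # ~4.8-bit quant: ~37 tok/s
```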
mrtksn
How many t/s would you expect? I think I feel perfectly fine when its over 50.
Also, people figured a way to run these things in parallel easily. The device is pretty small, I think for someone who wouldn't mind the price tag stacking 2-3 of those wouldn't be that bad.
yk
I think I've seen 800 GB/s memory bandwidth, so a q4 quant of a 400B model (~200GB of weights) should be ~4 t/s if memory bound.
behnamoh
I know you’re referring to the exolabs app, but the t/s is really not that good. it uses thunderbolt instead of NVlink.
bearjaws
Not sure why you are being downvoted; we already know the performance numbers due to memory bandwidth constraints on the M4 Max chips, and they would apply here as well.
Going from 525GB/s to 1000GB/s will double the TPS at best, which is still quite low for large LLMs.
lanceflt
Deepseek R1 (full, Q1) is 14t/s on an M2 Ultra, so this should be around 20t/s
teleforce
Thunderbolt 5 (TB 5) is pretty handy, you can have a very thin and lightweight laptop, then can get access to external GPU or eGPU via TB 5 if needed [1]. Now you can have your cake (lightweight laptop) and eat it too (potent GPU).
[1] Asus just announced the world’s first Thunderbolt 5 eGPU:
https://www.theverge.com/24336135/asus-thunderbolt-5-externa...
ben-schaaf
Except that you're stuck with macOS, so there aren't any drivers for NVIDIA, AMD or Intel GPUs.
iamtheworstdev
and that no one is developing games for MacOS.
rafram
(1) That’s obviously not actually true.
(2) “No one” is developing games for Linux either, but the Steam Deck works great. Why? Wine, which you can run on macOS too.
smilebot
Valve supports games on MacOS
wpm
Apple Silicon does not work with eGPU.
emp_
eGPU has a ton of issues on macOS (I used it for years, and on Apple Silicon it's probably much worse now), but let me give a shout-out to the amazing, somewhat new High Performance screen sharing mode added in Sonoma.
When I connect to my Mac Studio via Macbook I can select that mode, then change the Displays setting to Dynamic Resolution and then my 'thin client':
- Is fullscreen using the entire 16:10 Macbook screen
- Gets 60 fps low latency performance (including on actual games)
- Transfers audio, I can attend meetings in this mode
- Blanks the host Mac Studio screen
All things that were impossible via VNC - RDP is much better but this new High Performance Screen Share is even more powerful.
The thin lightweight laptop that remotes into a loaded machine has always been my idea of high mobility instead of suffering a laptop running everything locally. This works via LTE as well with some firewall setup.
bustling-noose
I wonder if Apple needs to reconsider Xserve. While Apple probably has some kind of server infrastructure team, building server products out of its own hardware and software sounds like something it could explore. The app ecosystem, coupled with Apple servers offered in the cloud or as hardware you could buy, would be a very interesting service business to get into. Apple's App Store needs better apps given how much the hardware is capable of now, especially with iPads using M chips. A cloud-backed hardware and software service designed specifically for the app ecosystem sounds very tempting.
The hardware has evolved faster than the software at Apple. It's usually the opposite at most tech companies, where hardware is unable to keep up with software.
c0deR3D
When will Apple silicon natively support OSes such as Linux? Apple is seemingly reluctant to release a detailed technical reference manual for the M-series SoCs, which makes running Linux natively on Apple silicon challenging.
bigyabai
Probably never. We don't have official Linux support for the iPhone or iPad; I wouldn't hold out hope for Apple to change their tune.
dylan604
That makes sense to me though. If you don’t run iOS, you don’t have App Store and that means a loss of revenue.
bigyabai
Right. The same goes for macOS and all of its convenient software services. Apple might stand to sell more units with a friendlier stance towards Linux, but unless it sells more Apple One subscriptions or increases hardware margins on the Mac, I doubt Cook would consider it.
If you sit around expecting selflessness from Apple you will waste an enormous amount of time, trust me.
AndroTux
If you don't run macOS, you don't have Apple iCloud Drive, Music, Fitness, Arcade, TV+ and News and that means a loss of revenue.
jobs_throwaway
You lose out on revenue from people who require OS freedom though
dylan604
That’s what’s weird to me too. It’s not like they would lose sales of macOS, as it is given away with the hardware. So if someone wants to buy Apple hardware to run Linux, it has no negative effect on AAPL.
bigfishrunning
Except the linux users won't be buying Apple software, from the app store or elsewhere. They won't subscribe to iCloud.
dylan604
I have Mac hardware and have spent $0 through the Mac App Store. I do not use iCloud on it either. I do on iDevices, though. I must be an edge case.
tgv
You also lose out on developers. The more macOS users, the more attractive it is to develop for. Supporting Linux would be a loss for the macOS ecosystem, and we all know what that leads to.
cosmic_cheese
Those buying the hardware to run Linux also aren’t writing software for macOS to help make the platform more attractive.
jeroenhd
While I don't think Apple wants to change course from its services-oriented profit model, surely someone within Apple has run the calculations for a server-oriented M3/M4 device. They're not far behind server CPUs in terms of performance while running a lot cooler AND having accelerated amd64 support, which Ampere lacks.
Whatever the profit margin on a Mac Studio is these days, surely improving non-consumer options becomes profitable at some point if you start selling them by the thousands to data centers.
amelius
But then they'd have to open up their internal documentation of their silicon, which could possibly be a legal disaster (patents).
re-thc
> So if someone wants to buy Apple hardware to run Linux, it does not have a negative affect to AAPL
It does. Support costs. How do you prove a failure is hardware rather than software? What should they do? Say it "unofficially" supports Linux? People would still try to get support. Eventually they'd have to test it themselves, etc.
dylan604
Apple has already been in this spot. With the trash can Mac Pro, there was an issue with DaVinci Resolve under OS X at the time where the GPU was causing render issues. If you then rebooted into Windows with Boot Camp on the exact same hardware and opened the exact same Resolve project with the exact same footage, the render errors disappeared. Apple blamed Resolve. DaVinci blamed the GPU drivers. The GPU vendor blamed Apple.
k8sToGo
We used to have bootcamp though.
WillAdams
Is it not an option to run Darwin? What would Linux offer that it would not?
internetter
Darwin is a terrible server operating system. Even getting a process to run at server boot reliably is a nightmare.
kbolino
I don't think Darwin has been directly distributed in bootable binary format for many years now. And, as far as I know, it has never been made available in that format for Apple silicon.
_alex_
Apple keeps talking about the Neural Engine. Does anything actually use it? Seems like all the current LLM and Stable Diffusion packages (including MLX) use the GPU.
gield
Face ID, taking pictures, Siri, ARKit, voice-to-text transcription, face recognition and OCR in photos, noise filtering, ...
cubefox
These have been possible in much smaller smartphone chips for years.
stouset
Possible != energy efficient, which is important for mobile devices.
KerrAvon
Yes, they have.
> September 12, 2017; 7 years ago
gield
Indeed, but the neural engine does this faster and using heavier models. For example, on-device Siri was not possible until the introduction of the neural engine in 2017.
dcchambers
Historically no, Ollama and the like have only used the CPU+GPU.
That said, there are efforts being made to use the NPU. See: https://github.com/Anemll/Anemll - you can now run small models directly on your Apple Silicon Mac's NPU.
It doesn't give better performance but it's massively more power efficient than using the GPU.
anentropic
Yeah I agree.
The Neural Engine is useful for a bunch of Apple features, but seems weirdly useless for any LLM stuff... I've been wondering if they'd address that in any of these upcoming products. AI is so hyped right now that it seems odd they have a specialised processor that doesn't get used for the kind of AI people are actually doing. I can see in the latest release:
> Mac Studio is a powerhouse for AI, capable of running large language models (LLMs) with over 600 billion parameters entirely in memory, thanks to its advanced GPU
https://www.apple.com/newsroom/2025/03/apple-unveils-new-mac...
i.e. LLMs still run on the GPU not the NPU
aurareturn
On the iPhone, it runs on the NPU.
rjeli
Wow, incredible. I told myself I’d stop waffling and just buy the next 800gb/s mini or studio to come out, so I guess I’m getting this.
Not sure how much storage to get. I was floating the idea of getting less storage, and hooking it up to a TB5 NAS array of 2.5” SSDs, 10-20tb for models + datasets + my media library would be nice. Any recommendations for the best enclosure for that?
kridsdale1
It depends on your bandwidth needs.
I also want to build the thing you want. There are no multi-SSD M.2 TB5 bays. I made one that holds 4 drives (16TB) at TB3, and even there the underlying drives are far faster than the cable.
My stuff is in OWC Express 4M2.
512GB of unified memory is truly breaking new ground. I was wondering when Apple would overcome memory constraints, and now we're seeing a half-terabyte of unified memory. This is incredibly practical for running large AI models locally ("600 billion parameters"), and Apple's approach of integrating this much efficient memory on a single package is fascinating compared to NVIDIA's solutions. I'm curious how this design of "fusing" two M3 Max chips performs in terms of heat dissipation and power consumption, though.