
tromp

With durability claims of

> 10 million write/erase cycles

this is not going to compete with DRAM, which needs to endure trillions of write/erase cycles in its lifetime.

Unless they grossly underestimated its durability, a name like UltraFlash would seem more appropriate?!

IanCal

They have tested it to 10 million cycles with no degradation, so that's where that figure comes from. It's not 10^7 before failure, or 10^7 before failures at some particular rate. The assumption is that the true endurance is somewhere higher than this, but you can't tell without more testing.

> The process was repeated five times, resulting in a little over 10^7 program/erase cycles applied to the device. As can be clearly seen in Figure 4d, there is no degradation of the ∆IS-D window throughout these tests, meaning that the endurance is at least 10^7.

https://onlinelibrary.wiley.com/doi/epdf/10.1002/aelm.202101...

runeks

Hmm. If this memory is faster than DRAM, wouldn't it be quick to test, say, ten trillion write/erase cycles?

Why stop at 10M? Is the erase operation really slow?

dahfizz

The paper says they tested durability with a 5ms program-read-erase-read loop, meaning each full program-read-erase-read cycle takes 5 milliseconds.

Ten trillion cycles at that rate would take over 1,500 years.

I'm guessing a silicon lab doesn't have "the rest of the computer" that would allow them to run this ram at full speed constantly. This UltraRAM isn't something they can just slot into their motherboard.

antx

Indeed, the paper says:

> Assuming ideal capacitive scaling[33] down to state-of-the-art feature sizes, the switching performance would be faster than DRAM, although testing on smaller feature size devices is required to confirm this.

So, they have no idea of its performance. Yet.

idiotsecant

Probably a case of don't ask questions you don't want answers to.

Findecanor

That's about the same durability as Intel Optane had, so the first application could be to replace Optane where it has been used.

Optane did inspire a lot of R&D into persistent data structures, databases and file systems that started to challenge the traditional model of local memory and persistent storage. IMHO, a few of those projects were a little bit overoptimistic, and used NVRAM as DRAM without many restrictions. For NVRAM to be viable, I think it still needs to have overprovisioning, wear levelling, memory-protection and transactions, provided by hardware and/or an OS but not necessarily with traditional interfaces. It is mostly a matter of mapping it CoW via a paging scheme instead of directly, and it will still be at near-DRAM speed.

zozbot234

That's basically the equivalent of a Flash Translation Layer, and having it removes the original selling point of making fsync() a no-op. At that point, persistent memory's only advantage over existing non-volatile storage is possibly higher performance.

Findecanor

To hell with fsync(), I'd want a proper commit()! ;)

The performance is so high that the assumptions that had led to the old file system interfaces don't apply any more. There is opportunity for something better.

CoastalCoder

Why transactions?

two_handfuls

Because it's the best way to survive failures (such as loss of power). Transactions let you know that all your data structures are in a consistent state.
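As a rough illustration of why that matters for persistent memory, here is a minimal redo-log sketch (all names hypothetical, not a real NVRAM API): writes are staged in a log and become visible together at commit, so a crash can never expose a half-updated structure.

```python
# Toy redo-log transaction (not a real NVRAM API). Staged writes become
# visible atomically at commit; on crash recovery, a complete log is
# replayed and an incomplete one discarded, so readers never observe a
# half-applied update.

class PersistentDict:
    def __init__(self):
        self.data = {}   # stand-in for structures in persistent memory
        self.log = []    # redo log of staged (key, value) writes

    def tx_write(self, key, value):
        self.log.append((key, value))   # staged, not yet visible

    def commit(self):
        # A real implementation would first flush the log to persistent
        # memory, then set a single atomic commit flag, then replay.
        for key, value in self.log:
            self.data[key] = value      # replay (same path as recovery)
        self.log.clear()

d = PersistentDict()
d.tx_write("balance_a", 50)
d.tx_write("balance_b", 150)
d.commit()   # both updates land together, or (after a crash) neither
```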

arc-in-space

Trillions of cycles, what? How can that be?

Maybe I'm confusing something, but to reach a trillion cycles in, say, a year, would take overwriting all your memory 30 times a millisecond. That doesn't sound right?

Or is that trillions of any writes and erases?

Cort3z

DDR ram is refreshed every 64ms (varies by DDR generation and specific chips). Branch Education has an excellent video on this named "How does computer memory work?"[1]. It would still take an exceedingly long time to reach a trillion, but it's still pretty frequent.

[1](https://www.youtube.com/watch?v=7J7X7aZvMXQ)

xnorswap

1 trillion * 64ms is over 2000 years, I think it's unlikely that there's any DDR RAM that old.

mastax

Persistent memory doesn't need to be refreshed though so that's irrelevant.

thereddaikon

This type of memory wouldn't need refresh so you can cut out all of those writes.

jepler

If you have to design for pathological workloads, absolutely you can write to a location in main memory 30 times per millisecond.

Lots of non-pathological workloads might write to a memory location every millisecond, such as a game with a 4-pass renderer running at 240Hz.

mattclarkdotnet

Even at 1GHz, a trillion (10^12) writes is only 1000 seconds of work for a modern CPU. OK latency is a thing, so multiply by 10 and it takes a day. This is for DRAM where cells are individually addressed. For flash with wear levelling the numbers of course get bigger.

jandrese

In practice a memory location being written to that heavily will never escape the cache unless you are doing something exceptionally weird.

pezezin

Modern DRAM doesn't address individual cells. For both DDR4 and DDR5 the minimum burst length is 64 bytes, the width of a cache line of most CPUs.

Etherlord87

I started to think about flipping a single bit in some process a million times per frame inside some loop, but that could only be done in cache…

Still, if you only changed the state of the memory once per frame, you would do it in RAM, not in cache. At 1000 FPS (we should consider the worst scenario even if rare) that's 3 hours of playing a game to reach 10,800,000 reads/writes.

Now the question is what happens if that bit gets damaged. Perhaps the memory just marks it as damaged and uses another bit for that memory address from now on. Perhaps that makes the UltraRAM slower over time as more bits (sectors) get damaged?

tus666

Lifetime of microelectronics is often quoted at around 30 years. So that's once a millisecond. For a refresh cycle that does not seem extraordinary.

tromp

I was thinking of overwriting just a few words of memory over and over again, which DRAM can endure for decades.

pmontra

The clock frequency is GHz, which is a trillion cycles per second. There is at least one cache layer between the CPU and the RAM but we are in the same ballpark. And yet it's OK for the typical lifetime of our computers.

benjijay

GHz is Billion, not Trillion

agumonkey

Noob question: aren't the trillions of cycles including 'refreshing' reads/writes that wouldn't be necessary with persistent memory?

adrian_b

You can trivially exceed one million write cycles in only a second with a modern CPU, just by incrementing a shared counter (which cannot be cached).

adwn

> just by incrementing a shared counter (which cannot be cached)

That's not true, a shared counter (i.e., an atomic integer) is cached – in fact, there's no guarantee that its value is ever written back to system RAM.

You're probably thinking of non-cacheable memory: the kernel can set the MMU attributes of a memory page such that the CPU will avoid the cache when it accesses addresses in that page. This is completely independent of atomic accesses on memory locations [1].

[1] At least typically – there may well be CPUs which disallow atomic accesses on non-cacheable memory.

londons_explore

Wear levelling on RAM isn't in use today to my knowledge, but I don't think it is technically impossible.

You would probably go for some approach where most memory addresses are direct-mapped, and then the few that have been written most are redirected to new addresses.

The reading of the direct-mapped addresses would be super fast, since you can do the read in parallel with the lookup in the remapping table (just to check that this is a direct-mapped address). Reads of non-direct mapped addresses might take a couple of extra cycles, but that doesn't matter because they are very rare.

To do any of that, CPU memory controllers need to be able to handle per-request variable-latency RAM, which to my knowledge today they do not, although it would not be a big redesign to add.
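The direct-mapped-with-exceptions scheme described above could be sketched roughly like this (a toy model with illustrative names, not real memory-controller logic):

```python
# Toy model: most addresses are direct-mapped (map to themselves);
# a small remap table redirects only the most-worn addresses to spares.

class RemappedMemory:
    def __init__(self, size, spare):
        self.cells = [0] * (size + spare)
        self.remap = {}              # logical addr -> spare physical addr
        self.next_spare = size

    def read(self, addr):
        # In hardware, the remap lookup and the direct read would run in
        # parallel; a remap hit simply selects the other result.
        return self.cells[self.remap.get(addr, addr)]

    def write(self, addr, value):
        self.cells[self.remap.get(addr, addr)] = value

    def retire(self, addr):
        # Redirect a worn logical address to a fresh spare, migrating data.
        spare = self.next_spare
        self.next_spare += 1
        self.cells[spare] = self.cells[self.remap.get(addr, addr)]
        self.remap[addr] = spare

mem = RemappedMemory(size=1024, spare=16)
mem.write(7, 42)
mem.retire(7)       # cell 7 is worn out; future writes go to a spare
mem.write(7, 43)
print(mem.read(7))  # served transparently from the spare cell
```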

ahoka

Wear leveling RAM would be trivial with any MMU from the last 40 years. You can just fault on the write and do your wear leveling in the fault handler. This is how virtual memory already works.

londons_explore

Indeed, and this is how "badram" on Linux works.

Tuna-Fish

No, you would keep a write counter for every (4kB) DRAM page, and have the OS move the virtual page to new physical one if the write count of a page grows much higher than the average.
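A toy simulation of that counter-based approach (illustrative, not a real OS API; a real implementation would also copy the page contents when remapping):

```python
# Sketch: wear levelling by remapping hot virtual pages, using a
# per-physical-page write counter. When one page gets much hotter than
# average, its mapping is swapped with the coldest page so future
# writes spread out.
PAGES = 8
page_table = list(range(PAGES))   # virtual page -> physical page
write_counts = [0] * PAGES        # writes seen by each physical page

def write_page(vpage):
    ppage = page_table[vpage]
    write_counts[ppage] += 1
    avg = sum(write_counts) / PAGES
    if write_counts[ppage] > 4 * avg:   # "much higher than the average"
        coldest = min(range(PAGES), key=write_counts.__getitem__)
        if coldest != ppage:
            other_v = page_table.index(coldest)
            page_table[vpage], page_table[other_v] = coldest, ppage

for _ in range(1000):
    write_page(0)                 # hammer a single virtual page

print(max(write_counts))          # well under 1000: the load was spread
```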

vlovich123

That assumes you still have DRAM. Since this is faster and higher capacity than RAM, it’s potentially viable as a RAM replacement. In that case, you wouldn’t have anywhere to store the counters (but presumably in that case you wouldn’t need to either). I’m not sure you’d need to have a write counter when this replaced RAM though even if this didn’t have the same write endurance. For storage nodes, there’s no value in RAM outlasting storage. And this already has better write endurance than NAND. So on a storage node, you could easily imagine using this as RAM as the number of erases is going to be dominated by storage activity rather than ancillary memory writes managing the storage.

crote

That really depends on your use case, doesn't it?

Assuming a typical 5-year lifecycle, 10 million writes means 1 write every 15 seconds. That's more than enough for executable code, CDN content, or a database index. I can definitely see systems with 75% UltraRAM for read-heavy data and 25% traditional RAM for write-heavy pages acting basically as L4 cache.
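That write-budget arithmetic generalises; with the parent's assumed endurance and lifetime:

```python
# Writes-per-cell budget: endurance cycles spread over a device lifetime.
ENDURANCE = 10_000_000                 # the 10^7 cycles tested so far
LIFETIME_S = 5 * 365.25 * 24 * 3600    # an assumed 5-year deployment

seconds_per_write = LIFETIME_S / ENDURANCE
print(f"one write per cell every {seconds_per_write:.1f} s")   # ~15.8 s
```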

benj111

Why does it need to directly compete with dram?

The current set up is based on separating volatile and non volatile memory and adding caches to paper over the slowness. Caches are getting bigger and bigger because of the huge speed disparity. I think you underestimate how much of a game changer this could be.

This is persistent and fast.

If this takes off, and it does only last 10s of millions of cycles, just use cache for fast changing things and ultraram for everything else.

If it lasts trillions of cycles, it potentially would completely change PC architecture. The 80s were the last time we had RAM/ROM that could keep up with the processors of the day. This potentially gets you an instant-on computer: no need for caches, no need for memory for the graphics card, no separate hard drives. Just one big simple bucket of bytes for everything.

dan-robertson

If the latency claims turn out to be true, it could still be worth it in various cases, eg with a bit of effort to reduce the number of writes you could get a big hashtable that you initialise once a day or so that gives really fast lookups.

JonChesterfield

Article claims a tenth the latency of DRAM at 100x lower power, but also says they're trying to fabricate at 20nm. Oh, and it's also persistent.

If they've done that, awesome. Make it, show that it works, licence how to make the thing to semiconductor companies and retire wealthy. Or maybe the university owns the IP.

hwillis

If they've done that, I think the concept of "turning off" a device goes away. You just unplug it, and the energy needed to dump the stuff in the pipeline to memory can be stored in a capacitor.

The OS can just always be loaded and ready to go; when power is restored it checks to see if the hardware has changed and just loads up the 64 MB of CPU cache. It could take just a few milliseconds. It takes on the order of a millisecond to charge the capacitors in a desktop PSU. "Restarting" becomes basically the same thing as reloading, and takes >100s of times longer than actually restarting the device. That's crazy to consider.

If boot time is 0, stuff will just unplug itself after its been idle for a few seconds. I'd expect the hardware in phones/laptops to become more distributed, with basic vital functions handled by a separate processor. Probably the screen gets taken over by a very simple processor that can only display the time, battery %, cell info (or the current screen buffer, for a laptop) and user input causes the main cpu to wake up in between frames.

stcredzero

> If they've done that, I think the concept of "turning off" a device goes away.

> ...The OS can just always be loaded and ready to go; when power is restored it checks to see if the hardware has changed and just loads up the 64 MB of CPU cache.

The idea, called “Orthogonal Persistence” way back when, has been around quite a while. Here’s my (probably spotty) idea of the history:

Researchers wanted instant-on for their early visions of tablets. To make sure security and networking would still work properly, there was an idea to use Capabilities (which were around since the 1960’s) to support this and solve the chicken and egg problems that were thought to arise.

Capabilities later became widely adopted just for better security, but Orthogonal Persistence never took off, because never rebooting would have required much higher levels of reliability, which would have been expensive to achieve. So today’s devices still reboot, but also have a fast "wake from sleep."

So I’m not sure if we will ever have true “Orthogonal Persistence.” We might have much slicker “wake from sleep” instead.

> I'd expect the hardware in phones/laptops to become more distributed, with basic vital functions handled by a separate processor.

This is already the case!

sowbug

Will "have you tried unplugging it?" still be the ultimate tech-support solution if persistent RAM becomes commonplace?

Weryj

We thought that with Optane. I'm sad to see it discontinued, because it was a much better target for swap files or databases.

benj111

Why do you even need a cpu cache?

1ns write operations suggest fast read too.

Tuna-Fish

No matter how fast the device itself is, addressing into a large pool will always be slower than a smaller pool. Both because of increased travel distance, and because every time you double the size of the pool, you add one additional mux on the path between the request and the memory.

This is why CPUs have multi-level caches, even though the transistors in L1 cache and L2 cache are typically the same -- the difference in access latency is not because L2 is made of slower memory, but because L1 is a very small pool very close to the CPU with the load/store units built into it, and L2 is a bit further away.

However, if main memory latency is suddenly a lot lower, it might change what is the most efficient cache level layout. The currently ubiquitous large L3 cache might go away. That would of course require very high bandwidth to the memory chips, because L3 does bandwidth amplification too.
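The one-extra-mux-per-doubling point can be put into rough numbers; the per-level delay below is an illustrative guess, not a datasheet figure:

```python
import math

# Rough model: address-decode depth grows with log2 of pool size, so
# each doubling of a memory pool adds about one mux level of delay.
MUX_DELAY_PS = 20   # illustrative per-level delay, not a real figure

def extra_decode_latency_ps(pool_bytes, base_bytes=32 * 1024):
    levels = math.log2(pool_bytes / base_bytes)
    return levels * MUX_DELAY_PS

for size, name in [(32 * 1024, "L1-sized"), (2 * 1024**2, "L2-sized"),
                   (64 * 1024**2, "L3-sized")]:
    print(f"{name} pool: +{extra_decode_latency_ps(size):.0f} ps vs. L1")
```

Whatever the real per-level delay is, the point stands: the penalty grows with the logarithm of pool size, so small pools near the core always win on latency.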

hwillis

Should be stressed that speed is entirely theoretical: https://onlinelibrary.wiley.com/doi/epdf/10.1002/aelm.202101...

> In all of the above tests, the program and erase states were set using between 1 and 10 ms voltage pulses, two times longer than the switching times used in our recent report of ULTRARAM on GaAs substrates.[15] In both cases, the devices operate at a remarkably high speed for their large (20 μm) feature size. Assuming ideal capacitive scaling[33] down to state-of-the-art feature sizes, the switching performance would be faster than DRAM, although testing on smaller feature size devices is required to confirm this.

> Why do you even need a cpu cache?

Cell read time is entirely different from latency and throughput. This stuff still reads in rows like RAM and can't just be accessed freely like registers.
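A back-of-envelope version of that scaling claim, assuming (as the paper does) that switching time tracks cell capacitance, which tracks cell area:

```python
# Ideal capacitive scaling: switching time ~ cell capacitance ~ area,
# i.e. feature size squared.
measured_feature_m = 20e-6   # the 20 um devices actually tested
measured_switch_s = 1e-3     # ~1 ms pulses at that size (paper: 1-10 ms)
target_feature_m = 20e-9     # a hypothetical state-of-the-art feature size

area_ratio = (target_feature_m / measured_feature_m) ** 2   # 1e-6
projected_s = measured_switch_s * area_ratio
print(f"projected switching time: {projected_s * 1e9:.0f} ns")
```

That projection is consistent with the 1ns write figure quoted elsewhere in the thread, but it leans entirely on ideal scaling holding over six orders of magnitude in area.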

vlovich123

You’d probably still need an L1 cache. L2 and L3 might be superfluous or you could have massive L2/L3 caches made with this rather than traditional SRAM that sit internally within the CPU to avoid the memory bus. Contention for the memory bus could also be a reason to still have SRAM caches that are slower than main memory.

Shifts like this are so impactful that it's hard to predict exactly what good designs will look like until the industry has had 5-10 years hands-on to shake out what the HW topology will look like (maybe more, since HW dev cycles prevent fast iteration and testing of ideas).

tuatoru

1/100 power, 1/10 latency... in a through-hole chip carrier?? How do they get enough of it close enough to the CPU at those low powers and latencies, at DRAM clocks? Electricity travels about 3cm in a tenth of a nanosecond, best case. And it uses quantum...

I'll leave it to the experts.

duskwuff

> 1/100 power, 1/10 latency... in a through-hole chip carrier??

That's a common package for testing ICs -- notice the array of dies inside and the haphazardly placed bonding wires. It isn't the final form factor.

Tuna-Fish

They are talking about the speed of the new type of memory cell, not of the physical implementation they have it in.

If this actually pans out, it will be worthwhile to stack a lot of it on the same package as the CPU. The reason memory is so far away in current systems is mainly that having it closer wouldn't meaningfully help: almost all the latency is in reading data out of the DRAM array anyway. If they suddenly get an economical new memory type with a tenth of DRAM's access latency, they will figure out how to get it close enough that signal travel is not a meaningful part of the total latency.

tuatoru

Thank you for the explanation! TIL.

alias_neo

My understanding, from reading it, is that the through-hole carrier is for demos.

I doubt they've actually demonstrated the speed/power claims practically; that's what the new test kit, and potential fab partnerships are for.

bitwize

This sounds too good to be true. When Apple buys up all the production capacity for this and makes it available exclusively in Macs and iPads, we'll know it's viable. Till then, my optimism is tempered with caution.

caleb-allen

Apple won't do it until they're proven by integrating with existing manufacturers of some kind

dist-epoch

Apple is too smart for that. If they buy it, they will also sell it separately, just like Samsung manufactures phone screens for Apple.

amelius

Yeah, they will sell it with a special license so you can't use it in products that compete with iDevices and MacBooks.

sorenjan

Will this be used for Harvard architecture where programs are run straight from their storage instead of first being read into RAM? Maybe we can use data stored on this instead of having to stream it from storage to RAM?

a6

Not to be confused with the AMD/Xilinx UltraRAM present in their FPGA fabric.

DoctorOetker

If it is a different technology, Wikipedia needs to be corrected:

> The technology has been integrated into Xilinx's FPGAs and Zynq UltraScale+ family of multiprocessor system-on-chips (MPSoC).[7]

Referencing an article https://www.eejournal.com/chalk_talks/2016033002-xilinx-ultr... from 2016!

Ballas

It seems they (Xilinx/AMD) might also have applied for a trademark, so I don't know if the name as used here will stick...

https://trademarks.justia.com/972/53/ultraram-97253591.html

omneity

Optane Resurgence?

The tech looks super cool if it does get commercialized.

hwillis

Optane made loading the OS super fast, but the OS still has to fill up RAM. No matter what, loading up 2+ GB of ram will always take noticeable time. Even flat out, Optane takes >1 second to boot, and several seconds to restore a session.

Cost permitting, this stuff would replace RAM, not the drive. No more loading into RAM; now the bottleneck is loading into cache, and that will always be trivially fast just because cache is so small.

Even if it's too expensive to replace RAM, if it can fit the minimum bits of an OS then I think cold boot time still goes to 10s of milliseconds. Might take a couple of years, but interactivity doesn't need to wait on RAM to be filled.

the8472

> No matter what, loading up 2+ GB of ram will always take noticeable time.

Barely so. NVMe sequential throughput is measured in gigabytes per second. So you can get this under 300ms. And you can optimize the order in which things are loaded so that the important ones arrive first, not all in-memory data is hot.

What makes booting take time are serial dependencies between boot stages, timers (boot prompts for humans, but also for hardware to power up), careful device enumeration and initialization and stuff like that.

hwillis

Yep. Keep it all hot. North/southbridge, controllers, everything. Skip POST. If something has changed we can just restart once it's warmed up. Anything that still needs a traditional boot (e.g. a disk drive) can just be assumed to have not changed until proven otherwise. Fuck the MBR. Fuck ROM and BIOS and CMOS. If you don't need to do it between RAM frames, you don't need to do it until you're told to reboot.

If you've been unpowered all day, or if your hardware has changed, or you're worried about security, then you can choose boot from scratch. The only other reason, IMO, is because the computer has just been put together. If all those parts can restore their previous configuration, all they have to do is signal "yep, I'm still in the same configuration" and we should be able to pick up where we left off (again, except for disk drives/ram/networking etc).

alecmg

definite deja vu from Optane

Capacity and price killed it; there's no word about either in the article.

the8472

Similar claims have been made about MRAM, FeRAM, and similar devices for many years, hailed as a replacement for and unification of both storage and DRAM. MRAM isn't completely vaporware, but it's not available at the prices or densities of DRAM.

So, will it scale down? Will it be cheap to manufacture?

lionkor

Ultra-Random Access Memory? ;)

This seems misnamed either way.

dist-epoch

Lower latency than RAM and more durable than NAND?

Where is the catch? Price? Throughput?

hwillis

Currently, performance is hypothetical. This and DRAM both work by charging up a little capacitor; this tech uses tunnelling so that the capacitor can be very highly isolated. That's why it doesn't discharge.

The smaller the capacitor, the faster it can charge/discharge. This tech has only been tested at sizes ~1000x larger than the state of the art, and the speed advantage assumes it scales perfectly with the scaling laws. Reality is never that kind, but it might be mostly that kind.

It's still theoretical, though. There might be some manufacturing quirk that makes it not work as well at small sizes. Defects that don't matter now might be huge at that scale. If power requirements creep up, they may kill longevity, which may require them to sacrifice speed... everything has to go right, or it can become a balancing act.

Assuming everything goes great, it's still somewhat more complex than DRAM: more layers. It will certainly cost more than conventional RAM, but with ICs in particular it's very hard to know whether that will be 10x more or 0.1% more.

aidenn0

It's going to be very expensive (lower densities than NAND and a somewhat exotic process for making it) and it hasn't been proven at geometries smaller than 20nm; it will only be faster than RAM if it continues to scale.

sfink

"in mice."

Or rather, the silicon "in mice" equivalent: in a test sample 1000x the scale, with only hopes and wishes that things won't change too much when they scale down.

All the cool mice these days are running around with memristor-based brain implants. This will be a huge upgrade for them. They'll be able to spend a small fraction of their usual daily time running in the hamster wheel, charging up their symbiote brains.

sipos

Not being made in a way that is usable in current systems, not having a commercial scale manufacturing process yet, and not being proven for long term use yet.

M95D

This kind of memory would make a cold boot attack child's play.

Also, considering big tech corp. tendency to lockdown stuff, will we need a hack just to do a system reboot?

stcredzero

> This kind of memory would make a cold boot attack child's play.

People have been thinking about that for over 5 decades! This is a part of the history of Capabilities.

M95D

What's "Capabilities"? This term is too general to search for it.

stcredzero

https://en.wikipedia.org/wiki/Capability-based_security

Just google, “capabilities computer science"

Joel_Mckay

Rebrand the name, as currently it is misleading... That said, the technology looks interesting for storage devices if it indeed exceeds SLC Flash specifications.

However, after the Violin Systems boondoggle one may find it significantly harder to find growth capital.

Good luck =)

peter_d_sherman

>"Moreover, the UltraRAM researchers asserted that the new memory tech is expected to be capable of 1ns write operations, which is about 10x faster than DRAM."

That'll be really nice if they can get it into production...


UltraRAM - Hacker News