Get the top HN stories in your inbox every day.
bob1029
kuchenbecker
Our prod cluster generates about that much every minute, at O(1M) qps.
We JUST turned on remote logs because until now Kafka didn't have the capacity.
inkyoto
TIBCO Rendezvous is tech from 1998/99; millions of messages per second didn't exist at the time. Only NYSE and NASDAQ were capable of producing millions of events back then (and still not per minute, let alone per second).
TIBCO Rendezvous was one of the first successful large-scale, low-latency, near-real-time pub/sub implementations, and it had a very efficient, Avro-like, wire-level serialisation format that made messages very compact and efficient to deliver. It was very popular in finance, banking and manufacturing, and is all but legacy now.
kuchenbecker
$$$, not capability. We have ~50 hosts that generate up to a TB per day in logs alone, and 50k hosts that generate O(200MB/day). For the large hosts, ssh and grep work surprisingly well; the smaller hosts are where the real benefit is.
Hard to justify a team of 7 burning several million on logging infra costs alone.
blindriver
My previous company's Kafka cluster was handling 20 million messages per second 5 years ago, and dozens of petabytes of data per day. Maybe your particular cluster didn't have the capacity to handle 1M qps, but Kafka easily had that capacity years ago.
_a_a_a_
I have to ask, what value is this adding business-wise to store so much?
kuchenbecker
It was quota and hardware, not ability. This is a single service onboarding and they need the hardware.
And at that scale, we need to grep the logs, so the downstream systems need the ability to process that volume, which they couldn't until about 2 years ago.
gunapologist99
NATS was developed by an ex-TIBCO engineer.
continuitylimit
It’s not apparent as a dilemma until said architect has spent years convinced that the grand edifice is “good architecture”, and has finally matured as a practitioner. Only after that phase passes is there an actual ego-driven dilemma, strictly speaking.
xupybd
I use NATS to create a secure pipe between buildings and the cloud. I don't need speed, but I do need routing and a security layer. NATS just works and took very little setup.
fnxjdjc
[dead]
opentokix
Your use case is EASY - it's EASY!
This is a SOLVED use case.
robertlagrant
> one is a message broker, and the other is a distributed streaming platform
I think this is an odd way of putting it. One is smart messaging with dumb clients; the other is dumb messaging with smart clients. It turns out the latter (i.e. Kafka) scales wonderfully so you can send more data, but you add complexity to your clients, which can no longer just pluck messages off a queue to process, or have messages retried on the first 3 failures, as they could with RabbitMQ.
Having said that, Kafka lets you keep all your data, so you don't have to worry about losing messages to unexpected interactions between RabbitMQ rules. Then again, now you have to store all your data.
raducu
> who can't just now pluck messages off a queue to process
The problem is that you cannot mark individual messages as read; for a given consumer and partition, you can only update the partition's offset.
If a certain message processing takes very long, all other messages in that partition will have to wait.
Also, with Kafka, the max read concurrency is equal to the number of partitions; for something like RabbitMQ it is much higher. But you do get nice message ordering within any given partition in Kafka, which you do not get in RabbitMQ (AFAIK). You also get some really nice data locality with Kafka: unless the consumers get the partitions reassigned, all messages for the same key are served to the same physical consumer.
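A rough sketch of that key-based routing (illustrative only - Kafka's default partitioner actually uses murmur2 hashing, not CRC32):

```python
import zlib

def assign_partition(key: bytes, num_partitions: int) -> int:
    # Same key -> same partition, hence same consumer (while assignments hold).
    return zlib.crc32(key) % num_partitions

# Every message keyed "user-42" lands on one partition, so one consumer
# sees them all, in order.
partitions = {assign_partition(b"user-42", 12) for _ in range(1000)}
assert len(partitions) == 1
```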
ryanjshaw
> The problem is you cannot mark individual messages as read, for a given consumer&partition you can only update the offset for a partition.
Hence "smart clients". If you MUST process every message at least once, you will anyway be tracking messages individually on the client (e.g. a DB or file system plus logic for idempotent message processing) and thus disable auto-offset commits back to the cluster for your consumer.
RabbitMQ says "let me track this for you", Kafka says "you already need to track this so why duplicate the data in the cluster and complicate the protocol".
If you don't have reliable persistent storage available and insist on using the Kafka cluster to track offsets, you can track processed offsets in memory and whenever your lowest processed offset moves forward, you have your consumer commit that offset manually as part of its message loop.
If your service restarts your downstream commands need to be idempotent of course because you will reconsume messages you may have previously processed, but this would be the case regardless of Kafka or RabbitMQ unless you're using distributed transactions (yuck).
> If a certain message processing takes very long, all other messages in that partition will have to wait.
You can stream messages into a buffer and process them in parallel, and commit the low watermark offset whenever it changes, as described above. I've implemented this in .NET with Channels and saturate the CPUs with no problem.
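A minimal sketch of that low-watermark bookkeeping (names are mine, not any client library's; a real consumer would pass the returned offset to its commit API):

```python
class LowWatermark:
    """Track out-of-order completions; only the contiguous prefix is committable."""
    def __init__(self, start: int):
        self.next = start   # lowest offset not yet fully processed
        self.done = set()   # completed offsets above the watermark

    def complete(self, offset: int):
        """Mark one offset processed; return the new committable offset, or None."""
        self.done.add(offset)
        moved = False
        while self.next in self.done:
            self.done.discard(self.next)
            self.next += 1
            moved = True
        return self.next if moved else None

wm = LowWatermark(0)
assert wm.complete(2) is None   # 0 and 1 still in flight, nothing to commit
assert wm.complete(0) == 1      # prefix [0] done
assert wm.complete(1) == 3      # prefix [0, 1, 2] done, commit offset 3
```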
raducu
You've made very good points about smart clients, but at some point one has to ponder whether it's worth it, or whether one should just not use Kafka in the first place.
I've seen databases used as message queues and, if it were up to me, I'd never do that. It's usually "but we already have Kafka + a DB, why burden ourselves with another messaging technology?", which is fair.
> You can stream messages into a buffer and process them in parallel, and commit the low watermark offset whenever it changes, as described above. I've implemented this in .NET with Channels and saturate the CPUs with no problem.
That is very nice -- certainly better than plain batch processing of Kafka messages -- but you're still just kicking the can down the road. How large do you allow the buffer to become, and what do you do when it's getting too large?
You probably use a DLQ.
Don't get me wrong, I think the buffer idea probably works most of the time.
lmm
> You can stream messages into a buffer and process them in parallel, and commit the low watermark offset whenever it changes, as described above. I've implemented this in .NET with Channels and saturate the CPUs with no problem.
And there are libraries that will manage all this for you e.g. https://github.com/line/decaton
waynesonfire
If you have idempotent messages, why can't you use auto offset committing?
math
Worth noting that Kafka is getting queues: https://cwiki.apache.org/confluence/display/KAFKA/KIP-932%3A...
facorreia
And also Rabbit has streams[1]. There's a lot of overlap.
pnpnp
Just my 2c but for anyone unaware, you should check out NATS.
It combines the best of both Kafka and RabbitMQ IMO.
doctor_eval
I thought NATS didn't actually store messages, am I mistaken?
Looking at Wikipedia (https://en.wikipedia.org/wiki/NATS_Messaging) I see that I'm technically right, it's JetStream that does the storage layer - but it's part of the NATS server.
From memory, I really liked the philosophy of NATS but found the nomenclature confusing.
pnpnp
I think they call that part of NATS “Jetstream” if I’m not mistaken. I haven’t used it, but I believe it has some form of message persistence.
I have used it mostly for message-first services, and found subject-based messaging a breath of fresh air to decouple services. You can do the same thing with RabbitMQ topic exchanges, but it requires quite a bit more hand-waving.
cbsmith
You're getting to the key thing:
You don't want to classify them by what they do. You want to classify them by what the clients must do/experience.
continuitylimit
It is not odd; it is basically accurate. You are making a fetish of the server-client interaction, but the essential matter is that Kafka is designed to store & distribute logs, whereas Rabbit is designed to route & send messages. The ‘store’ bit is very much a part of Kafka’s mission statement but not of Rabbit’s.
supermatt
> One is smart messaging; dumb clients. The other is dumb messaging; smart clients.
All the smartness of the messaging can be implemented in the smart clients. Then you can expose that as a smart-messaging API to dumb clients.
The most obvious example is Kafka Streams, which exposes a "simple" API rather than dealing directly with Kafka, but obviously you could create a less featureful wrapper than that.
eddythompson80
I can't help but think that this just gives you the worst of both worlds. You are now on the hook for managing that non-standard "smart" wrapper, which will quickly become the status quo for the project. Anyone wanting to change how it works needs to understand exactly how "smart" you made it and all the side effects that will come with making a change there.
I pushed back against Knative in our company for exactly that reason. Like, we wanna use Kafka because [insert Kafka sales pitch], but we don't want our developers to utilize any of the Kafka features. We're just gonna define the Kafka client in some YAML format and have our clients handle an HTTP request per message. It didn't make sense to me.
supermatt
That's kind of like saying don't use any software libraries because they all use the standard lib indirectly, so you may as well just use that?
It's just an abstraction layer to make things less effort.
robertlagrant
This would be my instinct too.
debadyutirc
That's a neat way to put it!
- RabbitMQ: Smart messaging, dumb client
- Kafka: Dumb messaging, smart client
Have you heard of Fluvio? Fluvio: Smart messaging, Smart Client, Stateful Streaming
Kafka + Flink in Rust + WASM Git Repo - https://github.com/infinyon/fluvio
NovemberWhiskey
>All the smartness of the messaging can be implemented in the smart clients.
How do you do, for example, a queue with priorities client side without it being insanity? That's a relatively basic AMQP thing. Or managing the number of redeliveries for a message that's being repeatedly rejected.
You can absolutely try to build some of this with a look-aside shared data store that all clients have to depend on in order to emulate having the capability in the broker, but you just introduced another common point of failure in addition to the messaging infrastructure. Life is too short for this.
supermatt
I totally agree that you can't do a lot of AMQP stuff. As you noted, you can build some of it by managing state via transactional producers, etc. - but you definitely can't do everything. The biggest gripe for me is actually dynamic "queue" creation, patterns for topics, etc. So I use an MQ for an MQ ;)
I'm just saying you can "dumb down" the client side on kafka by creating an abstraction layer (or one of the many higher level libs that already do that).
robertlagrant
Those requirements would definitely be examples of those that are fulfilled by smart messaging.
debadyutirc
Every decision has a consequence. There are a lot more options depending on the use case.
onetimeuse92304
If you have a throughput problem, you are most likely doing it wrong. If your knee-jerk reaction is to scale your messaging up, you may want to reconsider. Messaging systems are usually hard to scale up and always very costly to scale relative to the amount of data they are transferring.
The simplest thing you can do is to realise WHY you are using messages. Messages are there to trigger a process. Usually you don't need a lot of data to trigger a process; the message just needs to tell the system enough to locate all the necessary information.
Also, when you are sending information at an extremely high rate, there usually is no difference if each message is processed separately or in batches.
So what can you do in practice?
1) Get the producer to batch the messages. For example, set rules like "batch up to 10,000 messages, up to 100ms, up to 100MB of data, whichever comes first".
2) Serialise the batch (for example, if it makes sense, create a compressed JSON file).
3) Upload the file to some high-throughput, scalable, cheap storage (for example S3).
4) Send a message to the queue / topic / whatever else you are using with just enough to locate and process the batch -- usually just the link to the S3 object.
This usually can be modified depending on specific project needs.
Now your messaging only ever sees a small number of very small messages and you will never have any scaling problems, at least not on messaging side.
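The producer-side trigger from step 1 can be sketched like this (limits mirror the illustrative 10,000 msgs / 100ms / 100MB rule; the serialise/upload/notify plumbing is left out):

```python
import time

class Batcher:
    """Flush when any limit is hit: message count, total bytes, or batch age."""
    def __init__(self, max_msgs=10_000, max_bytes=100_000_000, max_age_s=0.1):
        self.max_msgs, self.max_bytes, self.max_age_s = max_msgs, max_bytes, max_age_s
        self.msgs, self.size, self.started = [], 0, time.monotonic()

    def add(self, msg: bytes):
        if not self.msgs:
            self.started = time.monotonic()   # age counts from the first message
        self.msgs.append(msg)
        self.size += len(msg)

    def should_flush(self) -> bool:
        return bool(self.msgs) and (
            len(self.msgs) >= self.max_msgs
            or self.size >= self.max_bytes
            or time.monotonic() - self.started >= self.max_age_s)

    def flush(self) -> list:
        batch, self.msgs, self.size = self.msgs, [], 0
        return batch

b = Batcher(max_msgs=3)
for m in (b"a", b"b", b"c"):
    b.add(m)
assert b.should_flush()        # count limit hit first
assert len(b.flush()) == 3
assert not b.should_flush()    # buffer empty again
```

On flush, you would serialise `batch`, upload it to S3, and publish just the object key to the queue.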
usrusr
If your use case can stomach the added failure modes this implies, yeah, that's what you can do.
onetimeuse92304
Everything has some cons, some failure modes.
All engineering is about knowing, understanding and making tradeoffs.
In my practice, I am happy if I can get rid of hard problems (my messaging platform being unable to process X messages per second) and replace them with relatively easier problems (my persistence might sometimes fail and then I can't send a message).
I would argue that distributed persistence solutions are usually more reliable than messaging platforms, and what counts as very large throughput for a messaging solution is usually nothing much for systems engineered to take far larger volumes of data. And so, in my experience, shifting load from messaging to persistence is a net positive for overall reliability.
Beached
1) Some designs can't tolerate the producer sending messages with such a delay. 3) S3 is not cheap storage; it is significantly more expensive than most on-prem solutions when talking about large-scale storage (petabyte scale).
onetimeuse92304
1) If you can't accept and process the load of messages with your messaging, the discussion of whether 100ms is or is not an acceptable delay is very much pointless.
Messaging middleware is by design not suited well for architecting real time systems. If you require real time guarantees you would benefit from some other communication channel.
2)
3) S3 is orders of magnitude cheaper than messaging platforms like RabbitMQ or Kafka. If you take load off of your RabbitMQ or Kafka and put it on S3, you should see a significant reduction in cost.
S3 might be more expensive than other persistence solutions, true. Just choose whatever else you have. I used S3 as an example because it is extremely easy to implement and get going.
Again, all engineering is about tradeoffs. You compare two solutions; they will always have some cons. You just decide which cons you can live with. If your platform can't process messages at all and you don't know how to scale it up, that's a pretty large problem in my book.
jpgvm
If you want features of RabbitMQ (specifically queue like behavior) but the scalability of Kafka then you probably want Apache Pulsar.
To elaborate on that a bit the main things Pulsar gives you are:
1. Still an underlying distributed stream-based architecture; this is what makes it able to do Kafka-like things.
2. Broker-side management of subscription state, which allows out-of-order acknowledgement; this means you can use it like a queue. (Subscriptions sort of act like AMQP mailboxes but without the exchange routing semantics.) Vs Kafka, which can only do cumulative acknowledgement, i.e. head-of-line blocking.
3. Separated "compute" and storage. By storing data in BookKeeper, you can scale what you need to support a lot of consumers separately from how you stash the data those consumers read, vs Kafka, where the two are coupled and an imbalance between them becomes awkward.
4. Built-in offload with transparent pass-through reads. When your data falls off the retention cliff for your standard broker cluster, it can be archived to object storage. The broker can transparently handle read requests for these earlier messages, just with higher startup latency to pull the archived ledgers.
5. Way more pluggability than Kafka - in fact, similar pluggability to RabbitMQ. You can implement your own authz/authn, or a different listener to support a different protocol (there is a Kafka one, MQTT, AMQP, etc.).
6. Much greater metadata scalability. Before the new KRaft implementation, the layout of metadata in ZooKeeper meant that you couldn't feasibly have more than about 10k topics, especially because of how long the downtime would be on controller failover. Pulsar can easily support much larger numbers of topics, which avoids having to use a firehose design when you would prefer individual topics per tenant/customer/whatever.
mianos
While Pulsar on paper seems a superior solution, in my experience it is still very immature and very buggy. I really want to use it over Kafka, but I would not bet my business on it.
I am not a fan of Kafka; it's kinda old, the code is a bit messy, and the exactly-once semantic problems 100% solved by Pulsar are sorta-kinda in Kafka these days. All the newer stuff like built-in Raft makes it competitive with anything.
But anyone who has used Kafka at scale has to say it 100% does what it says on the box. Many people are used to its idiosyncrasies at scale and can get it to scale.
Now, RabbitMQ. I have been burnt so badly by it breaking at scale that I'll never touch it again. Maybe my fault, maybe not, but replacing it with Kafka solved all my flaky issues and I never looked at it again.
jpgvm
I have run all 3 at big scale. Kafka is still great as long as everyone using it understands it's a stream, not a queue, and that using it like a queue is going to get them burnt. I don't touch RabbitMQ with a 30ft pole anymore; too many days or nights lost to split brains and other chaos.
Pulsar has mostly replaced Kafka for me because I don't need to worry about people coming along and changing requirements after the fact and saying, actually, yeah, we do need queue semantics. Or actually, yeah, we need data older than the ~48hrs we want to store in Kafka, and we don't want to teach our application to read from the archive.
Pulsar is definitely greener than Kafka in a lot of ways but the underlying stuff is very solid, BookKeeper in particular is tough so you aren't really at risk of data loss but you might run into bugs that make brokers do silly things and that can be annoying. Generally speaking though if you validate both the paths you are using and new releases it's been fairly OK to me.
The big thing was being able to directly connect external clients into Pulsar using the WebSocket listener on the proxy and plugging in my own authz/authn logic. That eliminated a layer that would otherwise need to be implemented separately.
So far I have been happy with Pulsar, if you haven't tried it for a while you should give it another go. It will only get mature if people use it. :)
continuitylimit
Pulsar is a very interesting architectural case study. The cost of the greater clarity and flexibility is a greater management burden. The server-side functions are nice (and remind one of JEE MBeans), but the direct challenge to Kafka is the decoupling of storage from servers via BookKeeper (which adds a lower-layer cluster management burden), addressing the rebalancing headaches of the Kafka type of solution (where server and storage are unified).
ssd532
I am contemplating this exact topic for my project at the moment. It would be great if you could briefly explain what, per your understanding, stream vs queue semantics are. I am studying it and have found the discussions on the internet and in in-person forums somewhat confusing.
mianos
Your comments seem to closely reflect mine. I'll have to take your advice and give it another go. It's everything all the others want to be; the architecture with distributed BookKeeper underneath seems so much more advanced.
protortyp
We also switched to Pulsar after running some benchmarks for our use cases. We use these services primarily as worker queues for image tasks that require low latency. And Pulsar turned out to have a 20x lower latency than Kafka in our setup.
tannhaeuser
I think this is an excellent article. The only thing I'd add is that RabbitMQ is an implementation of AMQP (optionally v1.0), a standardized broker service protocol, so it is designed to be interchangeable with other extant implementations such as Apache ActiveMQ and Qpid, whereas Kafka is one-of-a-kind software. Beyond that, RabbitMQ has standardized client libs and frameworks in Java land, if that matters to you - it did matter in the original context of message-queue middleware and SOA, from where AMQP originated and where enterprise messaging sees major use. OTOH Kafka, with caveats, is in principle more "web scale" - though that is far from a free ride.
dividedbyzero
NATS (https://nats.io/) is another option, though I'm not sure if it's still considered a viable Kafka replacement.
klabb3
It’s true FOSS, and the server is a standalone Go binary so small it can even be embedded. Lots of language bindings for clients. It has persistence and durability, and nicely forms a Raft-like cluster in a DC without a separate orchestrator.
I’m a big fan – never understood why it’s not at the top of the list in these tech reviews.
nhumrich
RabbitMQ is FOSS and has lots of language bindings. It has persistence and durability, and doesn't require a separate orchestrator.
klabb3
I was mostly comparing against Kafka but yes I should def take a look at RabbitMQ again. I remember there was some reason it wasn’t a good fit for me but can’t recall what it was.
Are the horizontal scaling issues solved now?
fuzzy2
NATS is something else, but it's awesome. It has great throughput and latency out of the box (without JetStream), while using few resources.
I'd recommend considering it, especially as an alternative to RabbitMQ.
speedgoose
I only tested NATS using JetStream and I struggled with the throughput in Python. I probably used it wrong. But your comment may imply that JetStream is slow.
to11mtm
I think sometimes the client bindings are/were in need of improvement.
As an example, the C# API was originally very 'Go-like', written against .NET Framework, and didn't take advantage of a lot of newer features - to the point that a 3rd-party client was able to get somewhere between 3-4x the throughput. This is now finally being rectified with a new C# client, but it wouldn't surprise me if other languages have similar pains.
I haven't tested JetStream but my general understanding is that you do have to be mindful of the different options for publishing; especially in the case of JetStream, a synchronous publish call can be relatively time consuming; it's better to async publish and (ab)use the future for flow control as needed.
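A hedged sketch of that flow-control idea (not the real NATS client API - `publish` here is a stand-in): keep a bounded window of in-flight publishes and block only when the window is full.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor

def publish(msg: str) -> str:
    # Stand-in for an async publish; pretend the return value is the broker ack.
    return "ack:" + msg

def publish_all(msgs, max_in_flight=100):
    acks = []
    with ThreadPoolExecutor(max_workers=8) as pool:
        window = deque()
        for m in msgs:
            window.append(pool.submit(publish, m))
            if len(window) >= max_in_flight:
                # Flow control: wait for the oldest outstanding publish.
                acks.append(window.popleft().result())
        while window:
            acks.append(window.popleft().result())
    return acks

assert publish_all(["a", "b"], max_in_flight=1) == ["ack:a", "ack:b"]
```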
fuzzy2
I didn’t mean to imply that JetStream is slow. It’s just that I did my benchmarks without it. On a local PC, with 10 KiB messages sent (synchronously) in a loop, I could transfer 3.2 GiB over 5 seconds with 0.2 nanoseconds latency. Performing the same test with RabbitMQ, I got even better throughput out of the box, but way worse latency.
Those numbers are for server 2.9.6 and .NET client 1.0.8.
KaiserPro
We used it for some low-latency stuff in Python; it was about 10ms to enqueue at worst. However, we were using raw NATS and had a clear SLA that allowed the queue to be offline or messages to be lost, so long as we notified the right services.
pgorczak
I’m not familiar with the Python lib but it could be waiting for streams to acknowledge each message reception/persistence before sending the next one. Some clients allow transactions to run in parallel e.g. with futures.
KaiserPro
If I'm not buying a message bus as a service, then NATS is great as a pub/sub and/or message-passing system.
It is simple to configure, has good documentation, and has excellent integration with most languages. It guarantees uptime, and that's about it. It clusters really well, so you can swap out instances or scale in/out as you need.
vorpalhex
Different set of promises. NATS is great but has a different tradeoff bargain from Rabbit or Kafka.
kumarvvr
Could you expand on this a bit more? I am curious.
vorpalhex
NATS has a decent-ish guide here: https://docs.nats.io/nats-concepts/overview/compare-nats
A few things they get wrong mostly about Rabbit:
+ RabbitMQ does support replay, and also has a memory-only mode; it will support persistence in a cluster
+ RabbitMQ isn't that sensitive to latency between cluster members (no more sensitive than NATS in some setups)
+ RabbitMQ also supports Prometheus
A good (but incomplete) rule of thumb:
+ Kafka is a distributed append-only log that looks like a message bus. It allows time travel but has very simple server semantics.
+ RabbitMQ is a true message broker with all the abilities and patterns therein. It comes with more complexity.
+ NATS is primarily a straightforward streaming messaging platform.
Also consider Redis' message-queue mode, ZeroMQ, MQTT brokers (Eclipse Mosquitto), and the option of just not using a message broker/queue at all. Even as someone who really likes the pub/sub pattern, there's a good chance you don't need it, and you may be heading toward a distributed-monolith antipattern.
whalesalad
One is a tomato, the other is an orange. From a distance they might look alike but they really are two completely different tools. This is a pretty solid explanation of the differences with good illustrations.
Rabbit can do everything Kafka does - and much more - in a more configurable manner. Kafka is highly optimized for essentially one use case and does it well. Nothing in life is free; there are trade-offs everywhere. I am not sure which one is theoretically faster, but once you reach that question, methinks the particular workload is the deciding factor.
KaiserPro
Rabbit is an arse to scale past one broker. It was possible, but a pain, that might have changed.
Kafka is just a pain full stop.
icedchai
At a previous company, about 10 years ago, we had roughly 10 RabbitMQ instances (brokers), all isolated. The system was essentially partitioned by queue server. We had a directory-ish service that would associate clients with their assigned node. It worked well, except that if a client got too large we might have to move them to another queue server.
nhumrich
The official RabbitMQ operator for Kubernetes is a breeze. Scales wonderfully with almost no effort.
relay23
>> Rabbit can do everything Kafka does - and much more - in a more configurable manner.

Sure, if you're doing like 10s of MB/s. RMQ is fast compared to AK if you're not adding durability, persistence, etc. Try to run gigabytes per second through it, though, or stretch across regions, or meet RTO when the broker gets overloaded and crashes... Get your shovel ready! ;)
Kafka itself is dumb but scalable and resilient; it's the client ecosystem that's massive compared to RabbitMQ's. Count 10 stream-processing, connectivity, ingestion or log-harvesting platforms that use RMQ as their backend, then name 10 languages that have supported libraries for RMQ... then compare that to Kafka.
JohnMakin
I am not an expert in either and have only worked with Kafka. At a past job I had to write a connector job to parse and sanitize some extremely dirty, unstructured data and pass it along somewhere else. RabbitMQ supports this? What is the one use case of kafka? I think you have it backwards.
whalesalad
> parse and sanitize some extremely dirty, unstructured data and pass it along somewhere else
Can you be more specific? That to me sounds like hello world for either of these tools. "Sanitize data" is an application-level concern that neither Rabbit nor Kafka would handle. As for "pass along somewhere else", again, both tools can do that.
JohnMakin
It was a Sink Connector. I don’t know what it was or wasn’t supposed to do, but I was asked to do it, as is often the case in tech. I could have done any number of transformations in that process, though, which I’m not sure RabbitMQ supports.
rafaelturk
Nice post! RabbitMQ is a battle-tested, exceptionally fast, low-resource app, capable of handling millions of transactions per second. RabbitMQ will handle the vast majority of use cases. I'm puzzled why startups, or even banks, often use Kafka solely because of hype. Kafka, on the other hand, requires massive CPU and memory, often requiring its own K8s cluster just to stay alive.
ceencee
Pretty much every bank uses Kafka as the central messaging layer. What people are missing in almost every post here is that write-once-read-many, without data duplication and with different offsets, is the killer app for Kafka, beyond just the near-infinite scale, which is also super appealing. The failure modes are way, way better than Rabbit's as well. Note: I owned the streaming platform for a top-5 US bank.
leetbulb
Yeah, I'm sorry to others, but if you require the guarantees and compliance that Kafka provides, Kafka wins, especially at this kind of scale. I'd love to see RabbitMQ scaled out to handle hundreds of trillions of events per day while retaining years' worth of highly durable, immutable, and replayable event storage.
Ultimately, this comparison is apples vs oranges...
dalyons
RabbitMQ sucks to scale. Clustering and partitioning were terrible for a long time; maybe they still are. Clusters dying in split-brained ways, nodes crashing terribly / unrecoverably if they exceed IOPS or storage limits. You couldn’t pay me enough to run a high-volume RMQ cluster again.
Never mind that the persistent durable log pattern of Kafka enables a lot of replay-type use cases that are very beneficial in financial systems specifically.
It’s not solely because of hype at all, it’s objectively better for many use cases.
rafaelturk
If you have a clean event-driven architecture, i.e. messages are completely agnostic and decoupled from one another, you don't need Kafka.
inkyoto
Event-driven architecture is an architectural principle, and Kafka, RabbitMQ/ActiveMQ, Pulsar, NATS and so forth are implementations that support it. Yet they vary in the complexity and extent of the features they provide, which may or may not be a good fit for a particular use case.
Traditional message brokers (RabbitMQ and similar) do support event-driven architecture, yet the data they handle is ephemeral: once a message has been processed, it is gone forever. Connecting a new raw data source is not an established practice and requires a technical «adapter» of sorts to be built. High concurrency is problematic for scenarios where strict message processing ordering is required: traditional message brokers do not handle it well in highly parallel scenarios out of the box.
Kafka and similar also support event-driven architectures, yet they allow the data to be processed multiple times - by existing consumers (i.e. a data replay) and, most importantly, by consumers that were new or unknown at the time (note: this is distinct from the data replay!). This allows plugging existing data source(s) into a data streaming platform (Kafka) and incrementally adding new data consumers and processors over time, with the datasets remaining available intact. This is an important distinction. Kafka and similar also improve on the strict processing-order guarantee by allowing a message source (a Kafka topic) to be explicitly partitioned and guaranteeing that the message order will be retained and enforced for a consumer group receiving messages from that partition.
To recap: traditional message brokers are a good fit for handling ephemeral data, and data streaming platforms are a good fit for connecting data sources and allowing the data to be consumed multiple times. Both implement and support event-driven architectures in a variety of scenarios.
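The write-once, read-many point can be shown with a toy model (a plain list standing in for the partitioned log; not a real client API): each consumer group keeps its own offset, so a group added later still sees the full history.

```python
log = []        # stand-in for a topic's append-only log
offsets = {}    # per-consumer-group read positions

def produce(msg):
    log.append(msg)

def consume(group):
    i = offsets.setdefault(group, 0)   # new groups start from the beginning
    offsets[group] = len(log)
    return log[i:]

produce("order-created")
produce("order-paid")
assert consume("billing") == ["order-created", "order-paid"]
produce("order-shipped")
assert consume("billing") == ["order-shipped"]   # only what's new for this group
# A group attached later still gets the whole dataset:
assert consume("analytics") == ["order-created", "order-paid", "order-shipped"]
```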
rcombatwombat
NATS with JetStream provides _both_ queuing like a traditional message broker and multiple data replay from offset (plus KV, and request/reply)
ceencee
This is a ridiculous statement if you really build an EDA. Kafka is what enables the decoupling.
purpleblue
If someone is asking whether they should choose RabbitMQ or Kafka, they should 100% use RabbitMQ. It means they have no idea what they're dealing with, the architectural differences, or the investment the company needs to make in order to use Kafka.
So use RabbitMQ.
patrec
How do you create anything with RabbitMQ that a) has performance characteristics under load you can reason about and b) can handle individual node or networking failures without data loss?
Kafka is overkill in most scenarios and you should probably just see if postgres isn't enough for your needs first (especially since you will almost certainly already need a database anyway). Kafka is more pain to setup and run than it ought to be. But underlying it is a useful and sensible abstraction for building robust distributed systems.
dventimi
> Apache Kafka is not an implementation of a message broker. Instead, it is a distributed streaming platform. Unlike RabbitMQ, which is based on queues and exchanges, Kafka’s storage layer is implemented using a partitioned transaction log. Kafka also...
This seems like an important passage, drawing the crucial and long-awaited distinction between RabbitMQ and Kafka, and yet without having defined a "partitioned transaction log" the author strands the reader without any help in absorbing the distinction.
estiller
Hi, this is the article author here. Thanks for the feedback! I wrote this article 3.5 years ago, and it could definitely use a shake-up. I agree this should be cleared up a bit.
zerbinxx
This is sadly commonplace in tech blogs: rather than taking a great opportunity to hook the reader, the writer will drop a vocab term in bold and move on.
FridgeSeal
I’m personally a fan of Kafka. I think the design of persisting the messages, and tracking offsets for progress instead of message acknowledgments is a much cleaner and more versatile design.
You can get all the same advantages of message acknowledgments, but now you can also replay queues, let different applications consume the messages (handy for cross-cutting event/notification systems), and you get better scaling properties, which doesn't hurt at small scale and provides further headroom when you need it.
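The offsets-instead-of-acks design described here is easy to model. Below is a minimal in-memory sketch (not real Kafka client code, and the function names are made up for illustration): the log is append-only and shared, each consumer group only records how far it has read, and replay is just rewinding an offset.

```python
log = []       # the shared, append-only topic
offsets = {}   # consumer-group name -> next offset to read

def produce(event):
    log.append(event)

def poll(group, max_records=10):
    # Read from this group's current offset; committing progress
    # is just advancing the offset, not deleting anything.
    start = offsets.get(group, 0)
    batch = log[start:start + max_records]
    offsets[group] = start + len(batch)
    return batch

def rewind(group, offset=0):
    # Replay: point the group back at an earlier position.
    offsets[group] = offset

for e in ["a", "b", "c"]:
    produce(e)

print(poll("billing"))     # ['a', 'b', 'c']
print(poll("analytics"))   # ['a', 'b', 'c'] (same events, independent offset)
rewind("billing")
print(poll("billing"))     # ['a', 'b', 'c'] (replayed)
```

Because consumption never removes data, adding a second application ("analytics" above) costs nothing and needs no fanout plumbing; it simply starts with its own offset.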
whalesalad
> You can get all the same advantages of message acknowledgments, but now you can also replay queues
with rmq you can reject/nack a message and have it put back on the queue. rmq is not well suited for long term historical retention inside queues a-la kafka's logs but it is possible to do.
> let different applications use the messages (handy for cross cutting event/notification systems)
rmq also does a publish once and fanout to multiple queues to support this. data is replicated so that could be a deal breaker, but it is possible.
how often have you had to diagnose a stuck consumer or some other kind of offset glitch where a consumer is unable to resume where it left off?
not knocking kafka here but I do think it is a tool you should reach for when you need to solve a very hyper focused problem, while rabbit is a tool more suited to most cases where queuing is required. kafka is a code smell in a lot of organizations from my experience - most do not need it.
FridgeSeal
> with rmq you can reject/nack a message and have it put back on the queue
I know other systems have semi-similar mechanisms, however most of them retain the “someone is the sole owner of this message” style design, which I think is fundamentally limiting. Owning application dies, is it acked or not? Acks but never gets around to putting it back on the queue? Who takes priority if 2 separate applications wish to watch the same stream of events?
I think Kafka’s “nobody owns it, acks are consumer group level” give you the same advantages for the application itself, without a number of the more difficult complications.
> rmq also does a publish once and fanout to multiple queues to support this
Which is probably fine for small volume or velocity topics, but is going to cause all sorts of load issues at higher scale.
whalesalad
> Who takes priority if 2 separate applications wish to watch the same stream of events?
each app would get its own queue, the messages would hit a fanout exchange that would route the same message to both queues.
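The fanout pattern described in this subthread can be sketched as a toy model (this is not pika or real AMQP; class and method names are invented for illustration): the exchange copies each published message into every bound queue, so each application drains its own queue without contending for ownership.

```python
from collections import deque

class FanoutExchange:
    def __init__(self):
        self.queues = {}

    def bind(self, name):
        # Each application binds its own queue to the exchange.
        self.queues[name] = deque()

    def publish(self, message):
        # One copy per bound queue: this is the data replication
        # the parent comment calls a potential deal breaker.
        for q in self.queues.values():
            q.append(message)

ex = FanoutExchange()
ex.bind("app_a")
ex.bind("app_b")
ex.publish({"event": "user_signed_up"})

print(ex.queues["app_a"].popleft())  # each app consumes its own copy
print(ex.queues["app_b"].popleft())
```

This gives RabbitMQ-style multi-consumer delivery, at the cost of duplicating every message per subscriber, whereas Kafka keeps a single copy and lets each consumer group track its own offset.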
raducu
> kafka is a code smell in a lot of organizations from my experience - most do not need it.
Kafka is really nice if you don't care that much about latency during peak load and you don't have absurd processing times for messages.
ceencee
Kafka can sustain sub-20ms latency at a scale of millions or even billions of messages per second. Processing-time delays are a smell of bad consumer code and poor partition design. In other words, your consumer shouldn't depend on a slower resource within an ordering domain. This can also be mitigated with an async consumer.
FridgeSeal
These sound like consumer issues to me.
Kafka had been extremely reliable with latency, even under load in my experience.
If you’ve got badly lagging consumers that are trying to read from very old points in the topic while everyone else is at the head, you’ll definitely see some increased resource usage, but again, that’s mostly a consumer issue, and I’ve never seen performance degrade that much.
monksy
If you're concerned about latency you might want to consider zeromq. Stream processing doesn't really have a time expectation to it.
KaiserPro
> now you can also replay queues
yeahnah, that leads to people treating queues like databases (I'm looking at you new york times, you know what you did wrong)
its either a queue, or a pubsub, either way its ephemeral. Once its gone, it should stay gone. thats what database, object stores or filesystems are for.
Kafka is a beast, has lots of bells and whistles and grinds to a halt when you look at it funny. Yes, it can scale, but also it can just sulk.
rabbit has its own set of problems, and frankly I'd probably not choose either anymore.
serallak
What would you choose today ?
KaiserPro
It depends on the context.
Currently I'm using DDS, specifically from eProsima. I would avoid that implementation unless you're using Java.
I really like NATS. However I would probably use whatever is bundled with the cloud system I'm using, unless it's super critical.
MQTT is quite nice for things, as is rabbit.
officialchicken
> (I'm looking at you new york times, you know what you did wrong)
You're going to have to be a tiny bit more specific here. NYT is THE factory of wrongness for sure. In every dimension. Are we talking "yellow cake" wrong, or somewhere else on the severity of f'up scale...
KaiserPro
https://www.confluent.io/blog/publishing-apache-kafka-new-yo...
^ this.
All they needed was a database, or possibly a DB that supports row signing. I mean actually they could have done it with git. They don't publish that many stories an hour.
Everything about this setup is just plain wrong, and to then boast about it, absolute madness.
joking
They wrote a post on how they disabled the deletion and compaction of the data in Kafka and used it as the source of truth.
raducu
> You can get all the same advantages of message acknowledgments.
Maybe 95% of cases, but not all.
Long message processing time really kills Kafka in a way it doesn't kill RabbitMQ. Combine it with read parallelism being inherently limited to the number of partitions, add in high variability of message rates, and bingo: that's like 90% of the issues I've had with Kafka over the years.
bicijay
Message ordering is an illusion. Unless you track/store the messages on the client and are willing to deal with stuck queues due to failures in one "poisoned" message.
klabb3
There are different kinds of order. Yes, there’s no total order in a distributed system, but you can have certain partial order guarantees. It’s nice if something is added before it’s updated, for instance.
bicijay
Could you expand on that? How would you achieve "partial order" guarantees?
klabb3
One type of partial order would be that a producer puts all the messages that are related in the same queue, so that A always precedes B. Basically the invariant becomes:
If a consumer sees an event B, it will certainly have seen the event A before that.
Assuming business logic is correctly written, that saves you from having to write certain retry logic on the B handlers. This requires the message queue to be always available. If it goes down, the system would not make progress (like a db - in fact the MQ is a db).
Once you add more actors/nodes to the same related events, maintaining a “causal order” can be very tricky and subtle, especially if you have an MQ and a DB as multiple sources of truth. So I’m not exactly endorsing it, even though MQ-as-a-DB (aka event sourcing) is a very interesting idea.
CogitoCogito
Let's say you have events coming in that result in inserts, updates, and deletes on a table with a certain primary key. Assuming no dependencies external to this table, you only need events involving a specific key to be ordered. I.e. it doesn't really matter if row_a gets updated before or after row_b. In either case, you end up with the same thing. So if you do something like kafka partitions and you send events to certain partitions based on their primary key, then those partitions will be ordered which will be enough.
That doesn't fix your example of dealing with individual errors, but in many cases that's enough.
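The parent's point, that only per-key order matters, can be checked with a small sketch (illustrative only; the `apply` helper and event tuples are invented for this example). Applying each key's events in order yields the same final table no matter how events for *different* keys interleave.

```python
def apply(table, event):
    # Replay one keyed event (insert/upsert/delete) onto the table.
    op, key, value = event
    if op in ("insert", "upsert"):
        table[key] = value
    elif op == "delete":
        table.pop(key, None)
    return table

# Two interleavings that both respect per-key order:
# "a" always sees insert before upsert, "b" insert before delete.
run1 = [("insert", "a", 1), ("insert", "b", 10),
        ("upsert", "a", 2), ("delete", "b", None)]
run2 = [("insert", "b", 10), ("insert", "a", 1),
        ("delete", "b", None), ("upsert", "a", 2)]

t1, t2 = {}, {}
for e in run1:
    apply(t1, e)
for e in run2:
    apply(t2, e)

print(t1 == t2, t1)   # True {'a': 2}
```

This is exactly what keying Kafka partitions by primary key buys you: events for one row stay ordered within their partition, and cross-row interleaving is harmless.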
rafaelturk
Couldn't agree more, messages should be completely agnostic of one another. If you have a decent event-driven architecture, you don't need kafka, and you can be happy with Redis or RabbitMQ.
I've seen Tibco Rendezvous used in manufacturing. ~300 megabytes per hour of raw log generated 24/7/365 by tools and control systems in a factory setting. Probably on the order of 10k+ participants in the pub/sub network.
If you are running something like a factory where thousands of independent systems need to communicate in some way, this kind of tech starts to look like the only option.
If you are orchestrating the concerns of 5-10 services, I think you are making your life harder than it needs to be with ESB-style abstractions. Direct method invocation is much more reliable than whatever any one of these vendors could ever sell you. Put all your services into one exe. If you can't be bothered to use one language/repo, there are still ways to achieve this.
The real architect's dilemma is shoving one's edifice-constructing ego into a box long enough to produce a useful shack for the business.