Engineering

Event Driven Architecture with Kafka and RabbitMQ

Posted September 20, 2021
By Brandon Palmer

With the continuing journey towards microservice architectures, Event-Driven Architectures have become highly popular. Its flexibility is rooted in a structure centered around the concept of Events which gives a product the ability to organize disparate services asynchronously with fewer overall dependencies. The volume and variety of data processing make EDA a good candidate for many platforms. This post will discuss some of the pros and cons of using EDA high-level architectures, how EDA could work in an example application, and implementation considerations when using Kafka or RabbitMQ.

From monoliths to Bounded Contexts with REST APIs.

For many years, we built large services which all shared one central database. Before we had docker containers and cloud hosting, operating a large fleet of independent processes was very challenging. Because of this, frequently there would only be a single monolithic process deployed onto large farms of physical or virtual servers. Databases were centrally managed and run on expensive servers to most efficiently take advantage of the central database systems.

Over time, software and teams became larger and most everyone moved to an agile methodology of continuous integration and continuous deployment. With this shift, the monolith started to be broken into pieces (Bounded Contexts) which were owned and controlled by independent teams. This allowed for decoupled development and deployment.

This transition moved us away from building single large applications and towards building platforms of decoupled components, all trying to work together to solve business needs. These new components need to communicate with each other — REST APIs were one of the main ways this was accomplished. With this architecture, a bounded context was wholly responsible for its data, generally stored in a persistent data store backend like a database or KV store. One bounded context was not permitted to directly access the database of another bounded context, all-access needed to be done via exposed APIs. 

Although this design allowed the monolith to be broken up, it also added new challenges:

What is an Event-Driven Architecture and where can it help us?

Overcoming these challenges brought us to the Event-Driven Architecture. What is it?

I define it as an architecture having decoupled services that mainly communicate with each other asynchronously by publishing and consuming data from queues. 

When an event happens in a bounded context (e.g. a user changes their email or password, an order is placed, a customer adds something to a shopping cart, the backend count of an object in our inventory changes), those events are published to queues. Interested services within a bounded context (including that of the publisher) can subscribe to those queues and receive those events, to be acted upon as needed.

One of the biggest advantages of this architecture is we get is the ability to do decoupling: 

An Example Application

To help us understand some of the use cases, we are going to walk through a fake online shopping application from monolith to microservices with EDA.

The Original Monolith

In our sample application, we have one service with tables on a shared database. The three tables we are discussing:

Services and REST API

As the business expands and grows, we want to break out the single service into 3 independent bounded contexts which are designed, maintained, and operated by independent teams. This gives each team the ability to adjust their scale, performance needs and schema independently. The remainder of the product capabilities remains in the monolith. 

With this new design, we need some way to get information from the monolith to the reservation and audit services. 

One common way would be to provide a REST-based API interface for the new services which the monolith would then connect to and POST events when they occur. 

There are a few problems with this technique:

Event Driven Architectures with the Trilateral API

To get around these struggles, we move to a Trilateral API:

This architecture solves many of the design changes we saw earlier:

With the Trilateral API, we get the following updates to our design:

How to implement this with Kafka and Rabbit MQ

So this sounds good and something we want to do, how do we use queueing technologies to make it happen?

Two very popular queueing systems are Kafka and RabbitMQ. Both are powerful, mature, and well-maintained tools with broad support from industry and cloud providers. There are many others, with different design goals, which we will not be discussing in this post. This will be an oversimplification of both Kafka and RabbitMQ; they are both pretty complicated and have a ton of nuances. Before either of these should be used in production, there will be a ton of reading and testing that should be done depending on the business needs and engineering skills.

While I’ve used Kafka heavily in production, I have only done lab-based work with RabbitMQ. If there are any errors or omissions with my RabbitMQ descriptions, please contact me with corrections.

Publishing and Consuming

Both Kafka and RabbitMQ have producers and consumers. The data stored in the middle is called a Topic for Kafka and a Queue for RabbitMQ . In the below image, a producer has published (produced) 7 events to a queue (A through G). The consumer has consumed event A and is ready to consume event B. 

With Kafka, a producer writes to topics that are usually split into multiple partitions and then spread over a group of brokers. Consumers connect to the brokers as part of a consumer group persists a recorded offset.

With RabbitMQ, a producer writes to an exchange which then pushes the events into queues on a single host for specific consumers. Consumers reading from a queue are competing consumers.

For both systems, an offset (pointing to B in the previous image) is tracked within the clusters to know which events the consumer has completed processing. When the consumer fails or is restarted, it would resume from the same offset because the cluster knows where it left off.

One important difference between tools: Kafka is intended to keep (persist) a history of events  (topic retention) while RabbitMQ is designed to delete the event it once processed from a queue.  

Most of these queueing systems are intended to work with small events (a few KB) but I’ve used Kafka to persist 100Mb events (patient medical histories in this case).

Kafka Innards

When data is written to Kafka, it’s written to a topic. A topic is made up of one or more partitions that are hosted on a broker. Generally, there are a minimum of 3 brokers in a Kafka cluster (3 is the minimum to allow a quorum election for leaders). Kafka scales by adding more brokers to a cluster with more partitions for a topic and with a replication setting for a topic. For a topic with 3 partitions and a replication factor of 2, there will be a total of 6 logical units (physically folders full of segments on disk) which could be spread over 2 to 6 brokers (Kafka won’t allow the replica of a partition to exist on the same broker as the primary partition.)

The producer service generally does a round-robin write of data to partitions but data could also be explicitly written to specific partitions based on keys. In the example image above, 12 events were written to the topic with 5 ending up on partition 1 and 7 on partition 2.

A consuming service registers with the cluster and specifies a Consumer Group (CG) name. Information about the CG is stored on the cluster in another system topic and records the offsets consumed per partition per topic. In the example image, two CGs exist; CG A has consumed up through event A on partition 1 and through event B on partition 2. CG B has consumed up through event C on partition 1 but no events yet on partition 2. The events are persisted to disk with a partition in FIFO order but not across partitions. Consumers can only have guaranteed FIFO order within a single partition (or a topic with only a single partition). If a topic only has one partition, that means there can only be a single consumer — parallel processing of FIFO data can be a real challenge. Once events have been marked as consumed, they remain on the topic until the retention setting on a topic has passed (by default 7 days).

Within a CG, there are members (generally unique containers) which are each assigned to be the owner of partitions. A partition can not be owned by more than one container at a time. This means that if we had 3 containers accessing the example topic, one would be idle and not consuming any data.

RabbitMQ Innards

With RabbitMQ, a producer writes data to an exchange. The exchange is then responsible to filter data based on logic rules and write to queues. A set of consumer containers connect to a queue and remove events from the queues once completed. The consumers are competing consumers — whichever container completes the work first gets the next unit of work. 

It’s possible to set up a consumer so it doesn’t mark work as completed. This is generally the preferred method for events when the consumer just needs to know something happened vs treating the event as a unit of work that needs to be processed at most once.

RabbitMQ traditionally stores events in memory which can be a limiting factor. There are certainly ways around this. Setting up RabbitMQ for replication and high availability is an added task.

Some high-level considerations for using Kafka vs RabbitMQ

Possible Queue Usage Examples

There are main event types; Queue events and Pub/Sub events:

Real-Life Examples of Queue and Pub/Sub:

Live Use

Redox has been a heavy user of Kafka for several years now. Our primary use case is for queue events after our real-time processing has been done on patient records. We will queue the objects into one of several Kafka topics for downstream consumers for analysis and archiving needs. We run around a dozen AWS MSK clusters with many TBs worth of data being persisted every day.


Join us!

Redox is constantly hiring and we have a ton of open positions all across the company. Come check our open positions and help fix healthcare!