Explaining event-driven architectures

#MicroServices #Messaging #EventSourcing #DataLakes

Published on 27 July 2022 by Andrew Owen (4 minutes)

Modern software development is all about automation, continuous integration, continuous delivery and software-defined life cycles. The idea is to maintain quality while enabling features to be delivered as soon as they are production ready. You’re also probably familiar with the move away from monolithic systems to a microservices architecture. The goal there is to build the system out of components that can be swapped out without having to rebuild the whole thing.

But on the bleeding edge, there’s the event-driven architecture (EDA). Put simply, this is a design where the system as a whole exists as a state, and any changes to that state produce events. Instead of storing that state as a single datum, EDAs use event sourcing. Data is stored as a series of immutable events over time in an event store. Essentially, it’s a lot like a ledger. After they are created, events can’t be modified. Any changes to the data (state) happen as the result of a new event. There are several advantages to this approach:

The database is treated as an event log, simplifying and speeding up reads and writes.
Events provide a historical record that enables you to recreate the state of the system as it was at any point in the past.
Changes are published to the wider system, but individual parts of the system need only subscribe to events relevant to their function.
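
To make the ledger idea concrete, here is a minimal sketch in Python. It assumes an in-memory list standing in for a real event store, and the names (Event, EventStore, balance, and the "deposited" and "withdrawn" event kinds) are all invented for illustration. The point is only that state is never stored directly: it is worked out by replaying events, either all of them or up to a chosen point.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)  # frozen: an event can't be modified after it is created
class Event:
    sequence: int  # position in the log (hypothetical field)
    kind: str      # "deposited" or "withdrawn" (hypothetical event kinds)
    amount: int


class EventStore:
    """Append-only, in-memory log standing in for a real event store."""

    def __init__(self) -> None:
        self._events: List[Event] = []

    def append(self, kind: str, amount: int) -> Event:
        # The only way to change state is to add a new event to the end.
        event = Event(sequence=len(self._events) + 1, kind=kind, amount=amount)
        self._events.append(event)
        return event

    def read(self, up_to: Optional[int] = None) -> List[Event]:
        # Return a copy of the log, optionally cut off at a sequence number.
        if up_to is None:
            return list(self._events)
        return [e for e in self._events if e.sequence <= up_to]


def balance(events: List[Event]) -> int:
    # Derive the state by replaying the events in order.
    total = 0
    for event in events:
        if event.kind == "deposited":
            total += event.amount
        elif event.kind == "withdrawn":
            total -= event.amount
    return total


store = EventStore()
store.append("deposited", 100)
store.append("withdrawn", 30)
store.append("deposited", 5)

current_balance = balance(store.read())            # replay everything
balance_after_two = balance(store.read(up_to=2))   # state as of event 2
```

A real event store would persist the log and probably keep snapshots so that you don't replay from the beginning every time, but the principle is the same.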

A wise woman once said: “The main problem with event-driven architectures is that you have to repeatedly explain what they are to customers.”

Microservices

The building blocks of an EDA are its microservices. Typically, these are self-contained and don’t interact directly with each other. This is often implemented by having each microservice run in its own Docker container, managed by Kubernetes. Each microservice should have its own bounded context (for example, orders, cart, products and customers). When communication between microservices is required (for example, between orders, customers, and products), this is carried out by having each microservice subscribe to the events that directly affect it. Events are processed in the order that they occurred, and a checkpoint records the last event processed. This means:

The individual microservices are not dependent on each other.
If a microservice is stopped, when it is restarted it can resume from the last event processed.
New functionality can be deployed incrementally without dependencies on other microservices.
When new functionality is added, the checkpoint can be reset to reprocess the event history, for example, to provide new reporting insights.
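
The checkpoint behavior described above can be sketched as follows. This is an illustration under assumptions, not how any particular message broker does it: the log is a plain Python list, the event shape is a tuple I made up, and Subscriber, catch_up and reset are hypothetical names.

```python
from typing import Callable, Dict, List, Set, Tuple

# (sequence number, event type, payload); the shape is an assumption.
Event = Tuple[int, str, Dict[str, str]]


class Subscriber:
    """A microservice's view of the log: a filter, a handler and a checkpoint."""

    def __init__(self, wanted: Set[str], handler: Callable[[Event], None]) -> None:
        self.wanted = wanted
        self.handler = handler
        self.checkpoint = 0  # sequence number of the last event processed

    def catch_up(self, log: List[Event]) -> None:
        # The log is assumed to be in the order the events occurred.
        for event in log:
            sequence, event_type, _payload = event
            if sequence <= self.checkpoint:
                continue  # already processed before the last stop
            if event_type in self.wanted:
                self.handler(event)
            self.checkpoint = sequence

    def reset(self) -> None:
        # Rewind so that the next catch_up() replays the whole history.
        self.checkpoint = 0


log: List[Event] = [
    (1, "CustomerRegistered", {"customer": "c1"}),
    (2, "OrderPlaced", {"order": "o1"}),
]
processed: List[int] = []
orders = Subscriber({"OrderPlaced"}, lambda event: processed.append(event[0]))

orders.catch_up(log)  # handles event 2 and ignores event 1
log.append((3, "ProductAdded", {"product": "p1"}))
log.append((4, "OrderPlaced", {"order": "o2"}))
orders.catch_up(log)  # resumes after the checkpoint, so only event 4 is handled

first_pass = list(processed)
orders.reset()
orders.catch_up(log)  # replays the full history
```

Because the subscriber only ever reads the log, resetting the checkpoint is safe: nothing upstream needs to know that the history is being reprocessed.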

Event relationships

Event relationships usually fall into two categories:

API-event relationships: there’s a one-to-one or one-to-many relationship between a public API call and an event or set of events.
Schema-event relationships: events created by internal API calls change data according to a particular schema.

Typically, in API-first systems, events relate to a specific API call. These calls may be generated by one or more components of the system: by user interaction through a web interface, or by a microservice.

In my experience, you may well end up with customers who want to see the event log. That can include events that you never intended to be public, such as those relating to the inner workings of the system, for example private APIs. These are unlikely to provide useful data points, and ideally you’ll want to find a way to scrub these before you pass the data to your customers.
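
One way to do the scrubbing is an allowlist of public event types, applied when the log is exported. The sketch below assumes each event is a dictionary with a "type" key, and the event names and the customer_view function are hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical allowlist: only these event types are ever shown to customers.
PUBLIC_EVENT_TYPES: Set[str] = {"OrderPlaced", "OrderShipped"}


def customer_view(events: List[Dict[str, str]]) -> List[Dict[str, str]]:
    # Copy the public events; the original log is left untouched.
    return [dict(event) for event in events if event["type"] in PUBLIC_EVENT_TYPES]


event_log = [
    {"type": "OrderPlaced", "order": "o1"},
    {"type": "InternalCacheInvalidated", "key": "orders"},  # private, hypothetical
    {"type": "OrderShipped", "order": "o1"},
]

exported = customer_view(event_log)
```

An allowlist is the safer default here, because a new internal event type stays hidden until somebody decides to publish it.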

Client applications

EDAs typically use RESTful APIs to provide methods for accessing data such as customer information. User interactions are managed through client applications. Each application has its own API. Typically, the API exists in the same network as the microservices it calls, to reduce latency and increase performance. The API also determines what should happen in the case that a specific microservice isn’t available.
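
As a rough sketch of that last point, the API layer might wrap each call to a microservice and decide on a fallback response itself. Everything here is an assumption for illustration: get_customer, the fetch callable and the response shape are invented, and a real API would sit behind a web framework.

```python
from typing import Callable, Dict


def get_customer(customer_id: str, fetch: Callable[[str], Dict]) -> Dict:
    # fetch stands in for the call to the customers microservice.
    try:
        return {"status": 200, "body": fetch(customer_id)}
    except ConnectionError:
        # The API, not the client, decides what an outage looks like.
        return {"status": 503, "body": {"error": "customer service unavailable"}}


def healthy_service(customer_id: str) -> Dict:
    return {"id": customer_id, "name": "Example Customer"}


def unavailable_service(customer_id: str) -> Dict:
    raise ConnectionError("customers microservice is down")


ok_response = get_customer("c1", healthy_service)
fallback_response = get_customer("c1", unavailable_service)
```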

A variety of secure authentication options can be integrated into the APIs:

OAuth
OpenID Connect (OIDC)
Social login (for example, Facebook, Google, Twitter)
Two-factor authentication
Web Services Federation

Scheduled reporting can be provided through event subscription, while dashboards can be supported through stream-based aggregations. Typically, this takes the form of Extract, Load, and Transform (ELT) processing. The original events are never modified, only the output.
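
A stream-based aggregation for a dashboard can be as simple as a running total that is updated as each event arrives. The sketch below assumes events are dictionaries with hypothetical "type", "customer" and "amount" keys, and running_totals is an invented name. The events are read but never changed, so only the output is new.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator


def running_totals(events: Iterable[Dict]) -> Iterator[Dict[str, int]]:
    # Yield a fresh snapshot of spend per customer after each order event.
    totals: Dict[str, int] = defaultdict(int)
    for event in events:
        if event["type"] == "OrderPlaced":
            totals[event["customer"]] += event["amount"]
            yield dict(totals)


events = (
    {"type": "OrderPlaced", "customer": "c1", "amount": 20},
    {"type": "CustomerRegistered", "customer": "c2"},
    {"type": "OrderPlaced", "customer": "c2", "amount": 5},
    {"type": "OrderPlaced", "customer": "c1", "amount": 10},
)

snapshots = list(running_totals(events))
latest = snapshots[-1]
```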

Data lakes

The biggest problem I’ve found in explaining EDAs is when a business analyst asks for the database schema. EDAs are typically developed around data lakes, where data is stored in its raw format. The database on the backend is likely to be a NoSQL solution such as MongoDB. I prefer to think of it as a data skip. You throw stuff in there, and it’s in there somewhere, but you don’t know exactly where. The beauty of the ledger is that you really don’t need to know.

Image: Original by kooikkari.