Consistency and Aggregates in Event Sourcing

Learn how we ensure data consistency in event sourcing through effective use of aggregates, improving system reliability and performance.

Black Friday is coming soon, so let’s talk about warehouse management and event sourcing.

When designing an event-sourced system with aggregates, several very different approaches are possible. If you think of an aggregate as a transaction boundary, then each choice of boundary has its own implications.

The aggregate can also be a lifecycle boundary: events in a single global stream can often only be discarded one aggregate stream at a time.

In this sense, it is always very interesting when people come up with completely different solutions to the same problem. This is exactly what happened when Christian Folie and I were talking about an event-driven inventory problem.

Warehouse Management Domain

Let’s say we have a warehouse management solution that handles products, locations and sales, linking them together.

In this kata, we focus on locations. Each location is a place where products can be placed. A location can be a shelf, a table, a bin, a box, or one of many other variations.

For the purposes of this kata, we will assume that we are only concerned with box locations for now. These are the cartons that are placed on a picking cart.

Boxes are short-lived:

A customer order comes in.

The warehouse employee starts picking the items for the order. They take an empty box or bin and create an ID for it. At this point, we run "AddLocation".
Usually, warehouse workers have a batch picking cart, so they set up a dozen boxes in advance. This way, they can go through the warehouse only once and prepare a dozen orders in one go. So the system creates and prints labels for a dozen boxes at once.
When picking is complete, the boxes are transferred to quality assurance and then shipped. At that point they disappear from the system. In rare cases, if something goes wrong, they can live on for a few more days until the problem is fixed.
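
The box lifecycle above maps naturally onto a short event stream. Here is a minimal Python sketch of that idea. The event names (LocationAdded, ItemPicked, Shipped) and the BoxState shape are assumptions made for illustration; the kata itself does not define them. The state of a box is rebuilt by replaying its events in order:

```python
from dataclasses import dataclass, field
from typing import List


# Hypothetical event types, invented for this sketch.
@dataclass(frozen=True)
class LocationAdded:
    location_id: int
    name: str


@dataclass(frozen=True)
class ItemPicked:
    location_id: int
    sku: str


@dataclass(frozen=True)
class Shipped:
    location_id: int


@dataclass
class BoxState:
    exists: bool = False
    name: str = ""
    items: List[str] = field(default_factory=list)
    shipped: bool = False


def apply(state: BoxState, event: object) -> BoxState:
    """Apply a single event to the state and return the updated state."""
    if isinstance(event, LocationAdded):
        state.exists = True
        state.name = event.name
    elif isinstance(event, ItemPicked):
        state.items.append(event.sku)
    elif isinstance(event, Shipped):
        state.shipped = True
    return state


def replay(events: List[object]) -> BoxState:
    """Rebuild the current state of one box from its event history."""
    state = BoxState()
    for event in events:
        state = apply(state, event)
    return state


history = [
    LocationAdded(1, "box-001"),
    ItemPicked(1, "sku-42"),
    ItemPicked(1, "sku-7"),
    Shipped(1),
]
box = replay(history)
```

Once a box is shipped, its whole stream is a candidate for archiving, which is the lifecycle-boundary view of an aggregate mentioned earlier.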

API for managing boxes

Let's define an API for the system that handles location creation, written as a proto3 specification (why proto3? See the footnote below):

syntax = "proto3";

// Request to create a batch of locations, one per name.
message AddLocationsReq {
  repeated string name = 1;
  // location details contents go here
}

message AddLocationsResp {
  repeated uint64 id = 1;
}

Note the repeated fields: each message can carry multiple locations, which allows a whole batch of locations to be created at once. Warehouse management loves batching.

The service itself could look like this:

service InventoryService {
  rpc AddLocations(AddLocationsReq) returns (AddLocationsResp);
}
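
To make the batch contract concrete, here is a small Python sketch that mirrors the two messages as plain dataclasses and implements the call against an in-memory store. It is not generated gRPC code. The class InMemoryInventoryService and the sequential ID scheme are assumptions for illustration only:

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Dict, List


@dataclass
class AddLocationsReq:
    # Mirrors "repeated string name = 1"
    name: List[str] = field(default_factory=list)


@dataclass
class AddLocationsResp:
    # Mirrors "repeated uint64 id = 1"
    id: List[int] = field(default_factory=list)


class InMemoryInventoryService:
    """Hypothetical stand-in for InventoryService, keeping data in a dict."""

    def __init__(self) -> None:
        self._next_id = count(1)  # assumed ID scheme: 1, 2, 3, ...
        self.locations: Dict[int, str] = {}

    def add_locations(self, req: AddLocationsReq) -> AddLocationsResp:
        ids: List[int] = []
        for name in req.name:
            new_id = next(self._next_id)
            self.locations[new_id] = name
            ids.append(new_id)
        # One ID per requested name, in the same order as the request.
        return AddLocationsResp(id=ids)


service = InMemoryInventoryService()
resp = service.add_locations(AddLocationsReq(name=["box-a", "box-b", "box-c"]))
print(resp.id)
```

Returning the IDs in request order lets the caller match each printed label to its new location without a second lookup.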

Given this design, we could have different implementations with different tradeoffs: an eventually consistent system that treats each location as a separate aggregate, or an immediately consistent system that treats the entire warehouse as a single large aggregate.

Let's ignore the implementation and focus on the API for now.

Consistency Semantics

Regardless of the implementation, this API can support both eventual and immediate consistency because of its behavior contract:

When executing a request, the client passes an “idempotency-key” header containing a unique UUID, and reuses that key if the request has to be retried after a failure. See the Stripe documentation on idempotency.
If the service returns status code 202 (the same as HTTP "Accepted" for processing), or in case of a transient failure, the client should send the same request again with the same idempotency key.

An eventually consistent implementation can then always return status code 202 on the first attempt, instructing the client to try again with the same idempotency key. The client keeps retrying until the status is OK. An OK response also carries the response data (e.g., IDs for the newly created entities).

An immediately consistent implementation will always return OK on the first attempt. In case of a network problem, clients can still rely on the idempotency key to retrieve the results.
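
The contract can be sketched in a few lines of Python. Everything here is a toy under stated assumptions: the class EventuallyConsistentService, the helper add_locations_with_retry, and the numeric value 200 for "OK" are all invented for illustration, and the "background work" is simulated by finishing on the second call. The point is only that the client reuses one key for every attempt and that a repeated key never creates locations twice:

```python
import uuid
from typing import Dict, List, Optional, Tuple

OK = 200        # assumed numeric value for "OK"
ACCEPTED = 202  # accepted for processing, try again later


class EventuallyConsistentService:
    """Toy server: accepts first, completes the work by the next attempt."""

    def __init__(self) -> None:
        self._pending: Dict[str, List[str]] = {}
        self._results: Dict[str, List[int]] = {}
        self._next_id = 1

    def add_locations(
        self, idempotency_key: str, names: List[str]
    ) -> Tuple[int, Optional[List[int]]]:
        # A key that already completed returns the stored result unchanged.
        if idempotency_key in self._results:
            return OK, self._results[idempotency_key]
        # First time we see this key: accept it and ask the client to retry.
        if idempotency_key not in self._pending:
            self._pending[idempotency_key] = list(names)
            return ACCEPTED, None
        # Simulated background processing has finished by the retry.
        pending = self._pending.pop(idempotency_key)
        ids = list(range(self._next_id, self._next_id + len(pending)))
        self._next_id += len(pending)
        self._results[idempotency_key] = ids
        return OK, ids


def add_locations_with_retry(
    service: EventuallyConsistentService,
    names: List[str],
    max_attempts: int = 5,
) -> List[int]:
    key = str(uuid.uuid4())  # one key shared by all attempts of this request
    for _ in range(max_attempts):
        # A real client would wait or back off between attempts.
        status, ids = service.add_locations(key, names)
        if status == OK and ids is not None:
            return ids
    raise TimeoutError("request was not completed in time")


service = EventuallyConsistentService()
created = add_locations_with_retry(service, ["box-1", "box-2"])
print(created)
```

An immediately consistent server would fit the same client unchanged: it would simply return OK on the first attempt, and the loop would exit after one iteration.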

Kata

How would you design the location part of an event-driven warehouse management system (event-driven because warehouse management loves audit logs and replication)?

We assume the following constraints:
We only deal with box locations.
Such locations are usually short-lived, existing for 1-2 days at most.
A single medium-sized warehouse can handle 10000 orders per day, so there are a great many locations.
A single location will probably have 10-30 events in its lifecycle.
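
These constraints give a rough sense of scale. The back-of-the-envelope calculation below assumes one box per order, which the kata does not state, and spreads events evenly over 24 hours, which a real warehouse would not do:

```python
ORDERS_PER_DAY = 10_000
BOXES_PER_ORDER = 1              # assumption: one box location per order
EVENTS_PER_LOCATION = (10, 30)   # low and high estimate from the constraints
LIFETIME_DAYS = (1, 2)           # low and high estimate from the constraints
SECONDS_PER_DAY = 86_400

locations_per_day = ORDERS_PER_DAY * BOXES_PER_ORDER

# Total events written per day, low and high.
events_per_day = tuple(locations_per_day * n for n in EVENTS_PER_LOCATION)

# Locations alive at the same time, given their lifetime.
live_locations = tuple(locations_per_day * d for d in LIFETIME_DAYS)

# Average write rate if events were spread evenly over the day.
avg_events_per_second = tuple(e / SECONDS_PER_DAY for e in events_per_day)

print(f"events per day:        {events_per_day[0]:,} to {events_per_day[1]:,}")
print(f"live locations:        {live_locations[0]:,} to {live_locations[1]:,}")
print(
    "avg events per second: "
    f"{avg_events_per_second[0]:.2f} to {avg_events_per_second[1]:.2f}"
)
```

Under these assumptions the streams are many but tiny, and the average write rate is low, so the interesting design questions are about stream count and cleanup rather than raw throughput.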

How would you design such an API? What would the aggregates look like? What stack would you use?

Footnote: Why gRPC/proto3 spec?

Because it is unambiguous and can be used to generate code contracts in any common language. For example, one person could implement service testers in Golang, someone else could implement one consistency flavor of the server in F#, and someone else could implement the other flavor in Python. We could then plug these together, compare the results, and talk about them!

But that is not necessary. The logic remains the same whether the service is implemented in plain HTTP/JSON or something else. The only thing that would be lost here would be seamless interoperability between implementations written in different languages.
