System Design: Synchronizing large sums of Shared Data from Centralized Server with Multi-tenant Solution with separate SQL Server Databases (and maybe other Services with NoSQL).

Event sourcing for data synchronization

Published on Wednesday, October 28, 2020

General Problem

Synchronize data between a server and multiple clients.

  • Clients can be offline.
  • Some of the clients must be updated on-demand only and only up to a certain point.
  • Data can be updated only on a server.

Synchronizing with Event Sourcing

Event Sourcing can be used for data synchronizing between servers and multiple clients (and between clients as well). Because you are building the current state (or state up to a point), you only have to exchange events between server and clients (or between clients themselves). Events must have an order number (just an int), and every next event should have an order number higher by one because each event should know which event comes before and after.

The ultimate source of truth is the server** because it is the only place you can update the data. However, clients can exchange missing events (data) between themselves because events are immutable and have an order.

Any client can send data to other clients if it has more events than that client (e.g., Client A's last event is number 5, Client B's previous event is number 8 -> Client B can send 6, 7, 8 events to A). The server is the master and always has the latest event, so it can continually update any client. However, sometimes it is cheaper (faster, easier) to have an option of clients updating each other, so it is necessary to have an order between events (like a timestamp) and an actual event number.

Specific Problem

Synchronizing large sums of Shared Data from Centralized Server with Multi-tenant Solution with separate SQL Server Databases (and maybe other Services with NoSQL).

  • New data can only be added to a Centralized server.
  • Clients should all have the same version, but only up to a point.
  • Test machines should have newer data than clients, but not necessarily the latest version.
  • Clients and Test machines should be updated on demand.
  • Clients share the same App but not DataBases.
  • The app has (rest) API for internal communication.

The core idea is to have a shared code on each app/server that stores events to the event store in a single SQL transaction (each database has its event store) and creates/updates/deletes data in the selected table. The event can be executed only if the previous event (by order number) is already in the event store (and data has been updated with new values).

The only centralized server can create events. It is essential to standardize how the event will look and how the data will be stored. Suppose we want to make this approach universal - to make it possible to update any single table or even multiple tables with a single event. In that case, we could use some unstructured data format like JSON.

Event Data - Create Location and Person in a single transaction (optimized for relation tables)
      "name":"Land of Oz"

The only thing left is to think about how Servers and clients will communicate. The simplest solution is to call the server from each client and get the missing events (up to a selected point). But, this depends on the architecture of the system. Many other approaches are also viable:

  • Events can be pushed to any client from the centralized server.
  • We could use some creative shenanigans and make clients copy from themselves after one of them receives the newest data from the server.
  • It would be possible to expand the solution with Message Queues.
  • ...

NoSQL solution is very similar, but we cannot use the same shared code (that calls SQL Server) to store data.

Fun things to think about

  • How would the solution be different if any client could update data?
  • What if you only need to update the client up to a certain point but have to "hotfix" only some of the updates (skip several updates)?