The Next Best Action for Banks - Chapter 2: Event Driven Architectures
An example
Event Driven Architecture enables companies to capture time-series facts, or "events," and react to them in ways that fulfill business needs. Once enough of the events that define how a business operates have been captured, building functionality that reacts to them intelligently becomes intuitive.
Take, for instance, a customer-facing personal finance application powered partially by two distinct backend systems: transaction processing, which collects and records a user's credits and debits across multiple accounts and saves them for the user to visualize, and a personalization service, which allows the user to define certain savings and spending goals. Let's assume both services have REST APIs and their own backing databases, like good little microservices.
Now, in a non-event-driven paradigm, the personalization service is heavily dependent on transaction processing: it needs to know when users' account balances change so it can check them against certain thresholds and give feedback to users. This usually manifests as a set of endpoints on top of the transaction processing service that the personalization service must integrate with and keep pace with as they evolve. That is perfectly manageable in our simple system, where only one other service depends on transaction data. But when a third service is brought online to support peer-to-peer payments and depends on both existing services, and then a fourth or a fifth, the web of inter-dependencies grows out of hand very quickly, and soon enough engineering teams are questioning why they ever went down the path toward microservices in the first place.
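To make that coupling concrete, here is a minimal sketch, in Java, of what the personalization service's integration might look like in that request-response world. The service name, endpoint path, and account ID are hypothetical; the point is that the caller must know all of them and keep asking.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class BalancePoller {

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();

        // The personalization service must know where the transaction service lives,
        // what its endpoint looks like, and how often to ask it for fresh balances.
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://transaction-service/accounts/acct-123/balance"))
                .GET()
                .build();

        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println("Latest balance payload: " + response.body());
    }
}
```

Every new consumer of transaction data repeats this pattern, and every change to the transaction service's API ripples out to all of them.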
In an event-driven architecture, as the transaction processing system ingests balance changes, it can write them to a separate event streaming platform as "BalanceChanged" events, and any other service that cares can connect to that platform and receive those updates as they happen. This means the personalization service, or any other service, doesn't even need to know that the transaction service exists, much less how to integrate with its API.
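Here is a minimal sketch of the other side of that coin: the transaction processing service publishing a "BalanceChanged" event with Kafka's Java producer client. The topic name "transactions.balance-changed", the record key, and the JSON payload are illustrative assumptions, not a prescribed schema.

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

public class BalanceChangePublisher {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Key by account ID so every change for an account lands on the same
            // partition and is read back in the order it occurred.
            String accountId = "acct-123";
            String event = "{\"type\":\"BalanceChanged\",\"accountId\":\"acct-123\",\"delta\":-42.50}";
            producer.send(new ProducerRecord<>("transactions.balance-changed", accountId, event));
        }
    }
}
```

The producer publishes the fact and moves on; it neither knows nor cares which services consume it.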
Kafka
In the world of event streaming, few technologies have become as ubiquitous as Apache Kafka. Being open source, hugely scalable, and ecosystem agnostic, Kafka has been adopted by over 80% of Fortune 100 companies. In Kafka, data is stored in "topics" in the order it is written and can be retained for as long as it is deemed useful. Once written, any number of systems, processes, or users can read this data back and utilize it however they see fit. This means events can be written once (BalanceChanged, UserClicked, TransferRequested) and downstream systems can react by taking specific actions or storing that data in their own microservice database, data lake, or transaction log as they please. The Kafka ecosystem is thriving, with a myriad of purpose-built products that can be layered on top of its open-source core to help bring true data platforms online.
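As a sketch of that consuming side, any service can subscribe to the same topic under its own consumer group and either tail new events or replay the retained history. The group ID and topic name below carry over from the illustrative producer above.

```java
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.time.Duration;
import java.util.List;
import java.util.Properties;

public class PersonalizationConsumer {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "personalization-service");  // each service uses its own group
        props.put("auto.offset.reset", "earliest");         // replay retained history on first start
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("transactions.balance-changed"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    // React however this service sees fit: update goals, store locally, etc.
                    System.out.printf("account=%s event=%s%n", record.key(), record.value());
                }
            }
        }
    }
}
```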
Data pipeline
Onboarding data
Defining and capturing business-critical events requires a strong understanding of the underlying data and the systems that create it. Unfortunately, much of the data making up these valuable events is held hostage by legacy databases that can only be accessed by systems integrating directly with them, essentially coupling themselves to those databases. Setting up Change Data Capture (CDC) on these existing systems is an excellent place to start down the path of building out an event-driven system. Using tools like Kafka Connect, data can be fed into Kafka as it changes in the upstream database and, with some curation and transformation, can easily become the events that power the real-time architecture.
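In practice a CDC source is often just configuration handed to Kafka Connect. The snippet below sketches the shape of such a connector config in the style of Debezium's PostgreSQL connector; the database details are invented, and exact property names vary by connector and version.

```json
{
  "name": "core-banking-cdc",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "database.hostname": "legacy-core-db",
    "database.port": "5432",
    "database.user": "cdc_user",
    "database.password": "REDACTED",
    "database.dbname": "corebanking",
    "table.include.list": "public.account_transactions",
    "topic.prefix": "corebanking"
  }
}
```

Once deployed, every insert, update, and delete against the included tables shows up as a record on a Kafka topic, ready to be curated into proper business events.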
Data streaming
Once data is flowing, either via CDC or applications writing directly to Kafka, new applications can be built with the underlying expectation that they will be fed consistent, trustworthy data in a timely fashion as users create it. Now we can harness the power of data in motion. There are many ways to interact with Kafka data directly to solve problems. Apache Flink, Spark Streaming, and Apache Samza are common open-source platforms that integrate well with Kafka and can run stream processing at massive scale. Kafka Streams is the "Kafka native" option that is part of the Kafka project itself. With Kafka Streams you can easily build applications to do transformations, aggregations, joins across multiple Kafka topics, and more, all in real time, depending only on Kafka itself for backing infrastructure.
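As a minimal sketch of the shape of a Kafka Streams application, reusing the illustrative topic and payload from the producer example above: read a topic, apply a stateless transformation, and write the results to a new topic.

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Produced;

import java.util.Properties;

public class DebitFilterApp {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "debit-filter");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        StreamsBuilder builder = new StreamsBuilder();

        KStream<String, String> changes = builder.stream(
                "transactions.balance-changed",
                Consumed.with(Serdes.String(), Serdes.String()));

        // Keep only debits (a negative delta in the illustrative JSON payload) and
        // write them to a new topic. A real application would use a proper JSON
        // serde rather than string matching.
        changes.filter((accountId, event) -> event.contains("\"delta\":-"))
               .to("transactions.debits", Produced.with(Serdes.String(), Serdes.String()));

        new KafkaStreams(builder.build(), props).start();
    }
}
```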
An example (continued)
Looking back at the personal finance app example, it could use Kafka Streams to calculate account balances in real time by listening to "BalanceChanged" events and even emit alerts onto a separate Kafka topic when balances cross certain thresholds, all with a very small amount of code. Alternatively, the app could implement fraud-detection heuristics based on that same stream of "BalanceChanged" data.
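A sketch of that first idea under the same assumptions: group "BalanceChanged" events by account, sum the deltas into a running balance, and emit a low-balance alert whenever the balance falls below a threshold. The topic names, the threshold, and the naive JSON handling are all illustrative.

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.Produced;

import java.util.Properties;

public class LowBalanceAlerts {

    static final double LOW_BALANCE_THRESHOLD = 100.0;

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "low-balance-alerts");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        StreamsBuilder builder = new StreamsBuilder();

        // Running balance per account: sum the signed "delta" carried by each event.
        KTable<String, Double> balances = builder
                .stream("transactions.balance-changed", Consumed.with(Serdes.String(), Serdes.String()))
                .mapValues(LowBalanceAlerts::extractDelta)
                .groupByKey(Grouped.with(Serdes.String(), Serdes.Double()))
                .reduce(Double::sum, Materialized.with(Serdes.String(), Serdes.Double()));

        // Whenever a balance update dips below the threshold, emit an alert event.
        balances.toStream()
                .filter((accountId, balance) -> balance < LOW_BALANCE_THRESHOLD)
                .mapValues(balance -> "{\"type\":\"LowBalanceAlert\",\"balance\":" + balance + "}")
                .to("alerts.low-balance", Produced.with(Serdes.String(), Serdes.String()));

        new KafkaStreams(builder.build(), props).start();
    }

    // Naive extraction of the "delta" field from the illustrative JSON payload.
    private static Double extractDelta(String event) {
        String marker = "\"delta\":";
        int start = event.indexOf(marker) + marker.length();
        int end = event.indexOf(',', start);
        if (end < 0) end = event.indexOf('}', start);
        return Double.parseDouble(event.substring(start, end));
    }
}
```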
In a slightly more complex example, let's assume the use of a Kafka topic called "user.profile" that contains users' account data. The desired action is to send users an email every time their account balance drops below a configured amount. Kafka Streams can be used to join the "user.profile" topic into the stream process that calculates account balances in real time, adding the relevant user's email address and any other information needed to an output topic. The result is that the service responsible for sending these email alerts can listen to that output topic and immediately form and send the alert without depending on any other part of the overall system.
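Sketching that enrichment under the same assumptions, and further assuming "user.profile" is keyed by the same account ID as the balance stream: read the profiles as a table, join the low-balance alerts against it, and write enriched alerts to an output topic for the email service to consume. Topic names and payload shapes remain illustrative.

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.Produced;

import java.util.Properties;

public class LowBalanceEmailEnricher {

    static final double LOW_BALANCE_THRESHOLD = 100.0;

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "low-balance-email-enricher");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        StreamsBuilder builder = new StreamsBuilder();

        // Running balance per account, as in the previous sketch.
        KTable<String, Double> balances = builder
                .stream("transactions.balance-changed", Consumed.with(Serdes.String(), Serdes.String()))
                .mapValues(LowBalanceEmailEnricher::extractDelta)
                .groupByKey(Grouped.with(Serdes.String(), Serdes.Double()))
                .reduce(Double::sum, Materialized.with(Serdes.String(), Serdes.Double()));

        // The latest profile per account, assuming "user.profile" is keyed by the
        // same account ID as the balance stream.
        KTable<String, String> profiles = builder.table(
                "user.profile", Consumed.with(Serdes.String(), Serdes.String()));

        // When a balance drops below the threshold, join in the profile (email
        // address and so on) and emit an enriched alert for the email service.
        KStream<String, String> alerts = balances.toStream()
                .filter((accountId, balance) -> balance < LOW_BALANCE_THRESHOLD)
                .mapValues(balance -> "{\"type\":\"LowBalanceAlert\",\"balance\":" + balance + "}");

        alerts.join(profiles, (alert, profile) ->
                        "{\"alert\":" + alert + ",\"profile\":" + profile + "}")
              .to("alerts.low-balance.enriched", Produced.with(Serdes.String(), Serdes.String()));

        new KafkaStreams(builder.build(), props).start();
    }

    // Naive extraction of the "delta" field from the illustrative JSON payload.
    private static Double extractDelta(String event) {
        String marker = "\"delta\":";
        int start = event.indexOf(marker) + marker.length();
        int end = event.indexOf(',', start);
        if (end < 0) end = event.indexOf('}', start);
        return Double.parseDouble(event.substring(start, end));
    }
}
```

The email service now depends only on the enriched alerts topic, not on the transaction or profile services themselves.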
Next steps
Event driven architecture helps deliver on many of the core promises of microservices in areas where request-response driven architectures often fall short. Asynchronous processing of data in the event-driven world has its own challenges around things like data consistency, but those challenges will be addressed in due course. Ultimately, there is real benefit in applications modeling their responsibilities around data that closely resembles the real world: they, and the people who maintain them, can spend less time interpreting data at rest and more time reacting to customer events as they happen. Once a system operates in a way that more directly models reality, there is much greater leverage to influence that reality to suit an organization's needs.