Need for Speed: Leverage Real-time Data on the Fly with Evolutionary Data Streaming Architectures
Digital transformation and the enterprise's increasing reliance on data analytics to make business decisions across nearly every aspect of the organization are driving unprecedented growth in data production.
In fact, the amount of data generated globally by 2025 is projected to reach 175 zettabytes, the equivalent of 175 trillion gigabytes. Predictably, the enterprise's role as a data steward continues to grow more prominent as well, according to research from IDC.
Collecting and analyzing that data is table stakes for organizations looking to remain relevant in an increasingly digitized and connected economy. Enterprises wanting to differentiate themselves are analyzing data while it's "on the move."
The faster data moves about an organization, the more value it can return. Business leaders aren't interested in stale data sets — they want real-time insights that are accurate, easy to digest and lead to concrete business outcomes.
This need for speed is leading to the broader adoption of data streaming — processing data based on continuous event streams instead of traditional commands.
This shift to an event-first approach decouples data-consuming applications from the data source, allowing for more flexibility and scalability moving forward. Event-based streaming forces an inversion of responsibility and represents a fundamental shift in how applications are developed. It centralizes an immutable stream of facts and decouples that stream from the software applications that consume it, allowing them to act, adapt and change independently of one another.
Consider a small online clothing retailer.
Traditionally, the purchase process starts with a purchase request — a command — for an item. The processor then reserves the item, takes funds from the buyer, ships the item and notifies the buyer of the purchase all within the same transaction.
This old way of doing things is entirely synchronous: buyers must wait for every step in the purchasing chain to complete before receiving confirmation that their item was purchased.
This method, detailed below in Figure 1, hinders scalability, limits the retailer from potentially adding new sub-services to the mix and will undoubtedly create headaches when new applications need to be integrated.
Figure 1: Traditional purchase processing architecture.
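The synchronous flow above can be sketched in a few lines of Python. This is an illustrative stub, not production code; the function and service names are hypothetical stand-ins for real inventory, payment, shipping and notification services.

```python
# Illustrative sketch of the traditional, command-based purchase flow.
# Each sub-step is a stub; in a real system these would be remote calls
# into separate inventory, payment, shipping and notification services.

def reserve_inventory(item_id):
    print(f"reserved {item_id}")

def charge_buyer(buyer_id, item_id):
    print(f"charged {buyer_id} for {item_id}")

def ship_item(item_id, buyer_id):
    print(f"shipped {item_id} to {buyer_id}")

def notify_buyer(buyer_id, item_id):
    print(f"notified {buyer_id}")

def process_purchase(item_id, buyer_id):
    # The processor owns the whole chain: the buyer waits until every
    # step finishes before getting any feedback at all.
    reserve_inventory(item_id)
    charge_buyer(buyer_id, item_id)
    ship_item(item_id, buyer_id)
    notify_buyer(buyer_id, item_id)
    return "purchase complete"

print(process_purchase("shirt-42", "buyer-1"))
```

Because every step runs inline, adding a new sub-service means editing `process_purchase` itself, which is exactly the coupling problem described above.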
Recognizing the pain points of the traditional transactional process, the industry began processing transactions like this with an event bus — middleware that receives an action and sends messages to a set of sub-systems. Instead of the processor handling each purchase request, the processor sends a purchase event to the event bus, which then goes to each sub-system to coordinate the remaining chain of events.
This separates the business logic from the purchase processor, which lends itself to scalability should the retailer ever need it. But it's not a true decoupling, because the business logic still effectively resides in the event bus.
While an event bus (detailed in Figure 2 below) provides more scalability than the traditional way of processing data, it still lacks flexibility and can cause headaches when it's time for app integration.
Figure 2: Event bus architecture.
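A minimal event-bus sketch in Python makes the half-step clear. The class and handler names below are illustrative, not from any particular middleware product.

```python
# Illustrative event-bus sketch: a central bus receives a purchase
# event and dispatches it to each registered sub-system. The processor
# is freed from the chain, but the coordination logic (which
# sub-systems run, and in what order) still lives in the bus.

class EventBus:
    def __init__(self):
        self.handlers = []

    def register(self, handler):
        self.handlers.append(handler)

    def publish(self, event):
        # The bus, not the sub-systems, drives the flow.
        for handler in self.handlers:
            handler(event)

bus = EventBus()
log = []
bus.register(lambda e: log.append(f"inventory reserved {e['item']}"))
bus.register(lambda e: log.append(f"account charged {e['buyer']}"))
bus.register(lambda e: log.append(f"shipping queued {e['item']}"))

# The processor's only remaining job is to emit the purchase event.
bus.publish({"item": "shirt-42", "buyer": "buyer-1"})
print(log)
```

Note that the ordering of sub-systems is still fixed centrally at registration time, which is why the bus is not a true decoupling.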
"New" never seems to be good enough when it comes to IT. "Modern" is what you should be striving for.
A modern event-driven data streaming architecture allows for increased flexibility and superior scalability while enabling an evolutionary architecture that can be built upon over time.
In our online retailer scenario, a purchase event is sent to a topic that is read by the inventory system, which in turn emits an event the account system consumes to be notified of the reservation. Once funds from the purchaser are received, the account system emits a single event that each remaining sub-system reacts to on its own.
This modern approach, detailed in Figure 3, allows each sub-system to decide what to do and when, instead of being told what to do by the processor or event bus, thus truly decoupling the processor from the business logic. This is a key component of achieving scalability.
Figure 3: Data streaming architecture.
The decoupling is accomplished by starting with an event. In our scenario, the event is the item being purchased from the online retailer.
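The event-first flow can be sketched with an in-memory, append-only topic log standing in for a Kafka topic. This is a simplified model under stated assumptions (a single topic, consumers polling in sequence); the `Topic` class and event names are illustrative, not a real client API.

```python
# Minimal sketch of the event-streaming approach: an append-only log
# of immutable facts. Each sub-system is an independent consumer that
# reads the log at its own pace, reacts, and may emit follow-on
# events of its own. Nothing coordinates the sub-systems centrally.

class Topic:
    """Append-only log; consumers track their own read offsets."""
    def __init__(self):
        self.events = []

    def append(self, event):
        self.events.append(event)

    def read_from(self, offset):
        return self.events[offset:]

topic = Topic()

# The processor's only job: record the fact that a purchase happened.
topic.append({"type": "purchase", "item": "shirt-42", "buyer": "buyer-1"})

# The inventory system reacts to purchase events by reserving stock
# and emitting a reservation event.
for event in topic.read_from(0):
    if event["type"] == "purchase":
        topic.append({"type": "reserved", "item": event["item"]})

# The account system reacts to reservation events by collecting funds.
for event in topic.read_from(0):
    if event["type"] == "reserved":
        topic.append({"type": "funds_received", "item": event["item"]})

print([e["type"] for e in topic.events])
```

Each consumer decided on its own what to react to; the processor never told anyone what to do, which is the inversion of responsibility the article describes.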
Consider this rudimentary explanation of the difference between traditional and modern approaches from Confluent, a key partner of World Wide Technology (WWT) in the data streaming space, whose founders originally developed Apache Kafka at LinkedIn in the early 2010s:
- Event-command analog (traditional) approach: I walk into a room, manually flip the light switch and the light turns on. This is a command.
- Event-first analog (modern) approach: I walk into a room, generate an "entered room" event and the light turns on automatically. This is a reaction to an event.
With a modern approach, if the online retailer wants to add or modify its sub-services, it can do so in isolation of the other sub-services through an existing API that's ready to be built upon.
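As a concrete sketch of that extensibility, consider a hypothetical new loyalty-points service. Under the event-first model it is added simply by consuming the existing purchase stream; the example below is illustrative, and the service name and point scheme are assumptions for demonstration only.

```python
# Sketch of extending the architecture in isolation: a hypothetical
# loyalty-points service is added by reading the existing purchase
# stream. No other sub-system, and no producer, has to change.

purchase_stream = [
    {"item": "shirt-42", "buyer": "buyer-1", "price": 30},
    {"item": "hat-7", "buyer": "buyer-2", "price": 15},
]

def loyalty_points_service(stream):
    # New, independent consumer: awards 1 point per dollar spent.
    # It never touches the processor, inventory or account services.
    points = {}
    for event in stream:
        points[event["buyer"]] = points.get(event["buyer"], 0) + event["price"]
    return points

print(loyalty_points_service(purchase_stream))
```

Retiring or modifying the service is equally non-invasive: it simply stops consuming, and the rest of the architecture is unaffected.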
Data streaming platforms allow for real-time data processing, act as a central nervous system that guides other services, integrate easily with apps and systems, and enable scalability as companies grow.
A streaming platform, according to Confluent, delivers significant benefits that can help accelerate digital transformation efforts:
- Large and elastic scalability regarding nodes, volume and throughput — all on commodity hardware, in any public cloud environment or via hybrid deployments.
- Flexibility of architecture. Build small services, big services and even monoliths.
- Event-driven microservices. Asynchronously connected microservices model complex business flows and move data where it's needed.
- Openness without commitment to a single technology or data format. The next standard protocol, programming language or framework is always coming. The central streaming platform stays open, even if some sources or sinks use a proprietary data format or technology.
- Independent and decoupled business services, managed as products, with their own lifecycle regarding development, testing, deployment and monitoring. Loose coupling allows for independent processing speeds between different producers and consumers, supports online/offline modes and can handle backpressure.
- Multi-tenancy to ensure only the right users can create, write to and read from different data streams in a single cluster.
- Industrialized deployment using containers, DevOps, etc., deployed where needed, whether on premises, in the public cloud or in a hybrid environment.
Modern data streaming architectures have been adopted by many of the most disruptive companies in today's economy, such as Airbnb, Uber and Lyft. But it's not just for companies on the leading edge. Data streaming has practical use cases across a variety of markets — from education, banking and retail, to transportation and manufacturing.
Example use cases are detailed in Figure 4.
Figure 4: Potential use cases across industries of data streaming.
Before diving headfirst into a data streaming transformation effort, it would be wise to first get a good handle on where your organization is in terms of readiness.
Most companies still operating with legacy systems will struggle to envision what a robust modern architecture should look like or what tangible benefits it can provide.
WWT is adept at working with customers to identify gaps in existing architecture and illustrating how data streaming can benefit them moving forward.
We've found the best approach to data streaming starts by identifying a specific use case and understanding how an event-first architecture will work in that scenario. Thinking of the value up front and mapping out use cases can make a world of difference during implementation and help drive momentum for future adoption.
As companies grow and evolve, data streaming's flexible nature lets them add new features to the architecture incrementally instead of all at once. Companies become more data mature as more features and applications are built into their architecture, which enables innovation and better-informed, on-the-fly decision making.
Big data technologies are complex, which can make them particularly difficult to deploy and manage.
"Add to that the exponential growth of data and the complexity and cost of scaling these solutions, and one can envision the organizational challenges and headaches," said Jessica Goepfert, a program vice president of customer insights and analysis at IDC.
WWT has a proven track record in developing, deploying and operating data streaming applications with some of the world's largest companies across a variety of industries. Given our experience with Confluent, we are one of the few partners capable of architecting, implementing, advising and executing enterprise-grade data streaming solutions.
Consider this demo, in which we deployed Confluent Kafka to Kubernetes to create streaming applications from the ground up.