MLOps and Drift: Reducing Risk and Ensuring Robust ML Models
In this article
As MLOps becomes more popular, a common question is where will have the biggest impact. Integrating MLOps for the AI Advantage: Top Trends in 2022 looked to answer this question by exploring emerging trends and their outcomes in the space. Drift detection was a major trend identified where an MLOps approach could offer significant benefit. This article further explores drift and how MLOps can mitigate its risk and frequency.
What is drift?
Before diving into the benefits MLOps can offer, it will be helpful to define drift. Drift, also called model drift, refers to model performance degrading over time due to a change in the relationship between the model's inputs and outputs.
Model input change is referred to as data drift, which happens when the data feeding into the model changes and expectedly impacts model performance. However, concept drift occurs when the model itself is at fault and is no longer the most optimal version. Here, the inputs stay the same, but the outputs do not provide the same business value. Data drift and concept drift are largely thought of as the two types of drift but understanding a few additional subsets can help narrow down the exact causes for drift.
A material change in input data can cause inaccuracies in any insight gained from a model. For example, let's say Yelp uses an ML model to categorize restaurant images efficiently. This model will process tens of millions of images, and while new restaurants are entering the fold and other businesses are closing shop, the total number of pictures being input into the model will remain mostly constant. Data drift can occur in Yelp's model when these input images start to bleed further and further away from the original model's intent. At the start of 2020, COVID greatly impacted the restaurant business, and customers being unable to eat at restaurants caused a drastic decrease in the number of input images to Yelp's model, making it much less reliable. This input change is an example of data drift affecting a model.
In another example, let's assume the Yelp model classified user-uploaded images into two categories: carryout and dine-in. The classification of "carryout" vs. "dine-in" restaurants changed when COVID hit, as restaurants had to switch to a contactless business model. This example of concept drift, where the definition of the output is changed, completely alters this model's viability.
Model drift can occur over time or instantaneously. The speed of drift is referred to as transition speed. Here are some of the different speeds:
- Gradual – Gradual drift occurs over time. For example, user preferences such as favorite movies or genres of music can all change over time.
- Sudden – Sudden drift happens instantly, like a new insurance policy applied to claim handlers or a replaced motion detection sensor with a new calibration.
- Incremental – Incremental drift happens in a sequence of small steps. For example, consumer behavior changed incrementally after COVID-19 lockdowns were lifted in an area—some were hesitant to return to normal. Each small behavior change is an intermediate concept between behavior during lockdown (noted as G), and "normal" behavior after restrictions are lifted (G'). Gradual drift is slow and steady over time, whereas incremental drift is more sudden and occurs in defined steps, or increments.
- Blip – Blip drift is like sudden drift, but the duration of the drift is short and temporary. Blip drift is often referenced as unpredictable exceptional events such as war or natural disasters.
The transition speed informs how long drift can go undetected but there should be no inference about which type is better or less consequential. Whether gradual or sudden, any drift can seriously impact your business and must be addressed.
The benefits of MLOps
After identifying what drift is, let's articulate the benefits of MLOps.
Exhibit 1: Core Principles of MLOps
MLOps introduces a set of practices and techniques to develop, deploy, and maintain ML models. These practices are necessary to continuously develop and refresh models in response to changing data and business needs.
Exhibit 1 details the different steps from Discovery through Production. Each stage gives visibility into the components of the model and provides insights. These steps allow thorough data analysis to inform resilience and blind spots. After completing these steps, your organization will be clear on data integrity and its relevance to your model.
In addition, the preproduction steps involve collaboration through multiple teams. Data scientists, data engineers, and ML engineers will work together throughout these steps. This close alignment does not innately exist in traditional ML life cycles. Often each stakeholder works separately and has no exposure to others' developments. MLOps reduces the unexpected roadblocks for data scientists by standardizing operations throughout a team, which reduces time spent on data engineering, pipelining, and rework and provides both cost-savings and process integrity. Additionally, after production, the model performance and data feeding into it are continuously monitored, which can provide actionable insights to reduce the risk of or mitigate the impact in the case of drift.
MLOps can provide higher data visibility and continual insights into model performance. Together in terms of drift, these benefits materialize in the following ways:
- Cost-savings: Undetected drift is expensive and disruptive to businesses. MLOps can reduce the frequency of drift and save money. However, even in the case of drift, early detection from MLOps monitoring can provide actional insights to resolve the issue quickly and cheaply.
- Operational standardization: MLOps enables organizations to standardize ML workflows and align stakeholders. This close collaboration will ensure the integrity of organization-wide ML life cycles. Introducing standardization will lower the risk of drift across the organization.
- Scalability: The threat of drift and its potentially catastrophic consequences can often hinder the scalability of ML. MLOps disciplines ensure the integrity of all ML workflows, providing a solid base to scale.
MLOps can simplify performance monitoring for model drift and perform refreshments as necessary because of its core principles (Exhibit 1): Continuous integration (CI), continuous delivery (CD), and continuous testing (CT). For instance, once the model-monitoring component detects a drift in the data, the information is forwarded to the scheduler, which triggers the automated ML workflow pipeline for retraining. A change in the adequacy of the deployed model can be detected using distribution comparisons to identify drift. Retraining is not only triggered automatically when a statistical threshold is reached; it can also be triggered when new feature data is available or scheduled periodically.
Another consideration is that with an MLOps approach, not all changes are classified as drift or require action. For example, our Yelp image model would see an uptick in pumpkin pies during autumn. This is a seasonal change, meaning the data will trend up and down at the same point every year, which is normal expected behavior. Continuously monitoring the incoming data would provide this seasonal context. Understanding why these changes occur can eliminate overaction from these cues while maintaining the model's validity.
Where to go from here
The past few years have been filled with extraordinary and unprecedented change. COVID-19 brought devastation to humanity and economic shutdowns overnight. The pandemic accelerated the trend of embracing virtual experiences and transformed how businesses serve customers' experiences. Now the lingering supply-chain problems are worsened by the war in Ukraine and global inflation.
MLOps delivers robust ML solutions and allows organizations to continuously improve and update their models to meet the evolving needs of their businesses. Businesses today must increasingly manage risk and assess their exposure to once unthinkable change. Now is the time to rethink your organization's ML strategy and embrace the benefits of MLOps.