Designing the People, Process and Technology to Scale MLOps at a Financial Services Company

Situation

At WWT, we emphasize that MLOps is a transformative undertaking involving people, process, and technology. We highlighted this approach in Top Considerations that Impact Decision-Making and many other articles geared toward helping companies begin their MLOps journey. Our unique approach was ideal for a financial services company looking to unlock its MLOps potential.

Our client is a highly competitive financial services firm that provides personalized investment advice, financial planning, and other related services to individual investors. As a multi-billion-dollar business, our client is dependent on accurate insights and predictions that stem from large sets of data. In such a competitive and rapidly evolving market, the client expressed a need for MLOps. Improving the observability, scalability, and reliability of the company's ML infrastructure was needed to manage larger volumes of data and serve a broader group of clients.

After a successful initial engagement that served as a lighthouse example for ML productionization, the client was ready to take the next step and scale its MLOps capabilities and adoption. We engaged with the client to deliver a clear pathway forward for each of the three MLOps pillars: people, process, and technology.

Solution

As a trusted partner to our client, we defined a pathway forward by evaluating and identifying roles & responsibilities, scalable frameworks, and technology solutions aligned with the client's mission. The delivery team leveraged the MLOps model depicted below to deliver on each of these items.

We led this MLOps project by organizing the work into three workstreams: people, process, and technology. This structure helped drive solutions within the organization, and the sections below follow it:

People

Creating roles & responsibilities for an end-to-end ML process

High-Level Roll Up
- The ML process is broken down into phases & sub-phases
RACI Document
- Responsibilities are assigned to each sub-phase task

Through both internal operations and client engagements, we've observed that investing time upfront to define clear roles and responsibilities yields significant efficiency and cost savings. Therefore, as part of the "People" workstream aimed at streamlining ML solution development and accelerating MLOps adoption, we collaborated with our client to develop a detailed breakdown of roles and responsibilities associated with the end-to-end ML process.

We began by organizing ML solution development into phases, sub-phases, and tasks, ensuring the necessary granularity level was reached. This approach allowed for the assignment of roles to each task, with supporting documentation that describes and highlights all responsible, accountable, consulted, and informed resources. We then went a step further to create a RACI document that detailed responsibilities by role.
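The phase, task, and role breakdown described above can be sketched as a small data structure. This is a minimal illustration only: the phases, tasks, and role names are hypothetical, and the validation rules (exactly one Accountable and at least one Responsible per task) are a common RACI convention that we assume here rather than a description of the client's actual document.

```python
from collections import Counter

# Hypothetical phases, tasks, and roles, for illustration only.
RACI = {
    ("Model Training", "Tune hyperparameters"): {
        "Data Scientist": "R",
        "ML Engineer": "C",
        "Product Owner": "A",
        "Operations": "I",
    },
    ("Production", "Monitor model performance"): {
        "Data Scientist": "C",
        "ML Engineer": "R",
        "Product Owner": "I",
        "Operations": "A",
    },
}


def validate_raci(matrix):
    """Return a list of problems; an empty list means the matrix passes."""
    problems = []
    for (phase, task), assignments in matrix.items():
        counts = Counter(assignments.values())
        invalid = sorted(set(counts) - set("RACI"))
        if invalid:
            problems.append(f"{phase} / {task}: unknown codes {invalid}")
        # Assumed convention: exactly one Accountable per task.
        if counts["A"] != 1:
            problems.append(f"{phase} / {task}: needs exactly one 'A'")
        # Assumed convention: at least one Responsible per task.
        if counts["R"] < 1:
            problems.append(f"{phase} / {task}: needs at least one 'R'")
    return problems


def responsibilities_by_role(matrix):
    """Pivot the task-level matrix into a per-role view."""
    by_role = {}
    for (phase, task), assignments in matrix.items():
        for role, code in assignments.items():
            by_role.setdefault(role, []).append((phase, task, code))
    return by_role


if __name__ == "__main__":
    print(validate_raci(RACI))
    for role, items in responsibilities_by_role(RACI).items():
        print(role, items)
```

Keeping the matrix keyed by task and deriving the per-role view from it means there is a single source of truth, so the roll-up and the by-role document cannot drift apart.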

Taking the time to carefully define roles and responsibilities allowed us to establish a clear approach for the entire end-to-end ML process.

Process

Developing documentation on how to operationalize a model through a scalable framework

MLOps Playbook
- Leveraged MLOps framework and six use-case deep-dive designs
- Updated & maintained playbook with new client-specific tools and technologies

Our client needed to streamline its path to ML deployment to accommodate the company's desired capabilities and future growth. We implemented a solution aimed at unifying the company's approach to MLOps while simultaneously accelerating its adoption. The centerpiece of this solution was a comprehensive playbook created during the "Process" workstream of the project.

The MLOps playbook overlays the full suite of ML technologies currently used by the firm onto a modular guide for ML development, deployment, maintenance, and governance. Its purpose is to provide scenario-specific instructions for the implementation of key MLOps components that encompass all existing tools and use case requirements.

By creating an MLOps playbook, we provided a critical stepping stone for our client. It serves as a foundation for integrating new tools and technologies in the future and enables them to stay ahead of the curve while remaining highly competitive.

Technology

Conducting a gap analysis of the client ecosystem and providing solution recommendations

Tailored Technologies
- Designed future state architecture to meet the firm's specific needs

Centering the client's MLOps vision required a thorough assessment of the firm's current technology ecosystem. We conducted a gap analysis study to resolve specific hurdles to operational readiness according to the firm's prioritized capabilities. A reference architecture was designed to enable key functionalities like segmented staging and production environments and model deployment gating. Additionally, we developed recommendations for implementing security and governance policies using existing tools, along with a framework to guide the evaluation of new MLOps tools in the future.
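To make the idea of deployment gating between staging and production concrete, the sketch below shows one possible shape for such a gate. It is an assumption-laden illustration, not the client's architecture: the metric names, thresholds, the manual approval step, and the function names are all hypothetical, and every metric is assumed to be "higher is better".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateCheck:
    name: str
    passed: bool


def evaluate_gate(metrics, thresholds, approved_by=None):
    """Compare staging metrics to minimum thresholds and require sign-off.

    Assumes every metric is 'higher is better'. A metric missing from
    `metrics` counts as a failed check.
    """
    checks = []
    for name, minimum in thresholds.items():
        value = metrics.get(name)
        checks.append(GateCheck(name, value is not None and value >= minimum))
    # Hypothetical governance step: a named approver must sign off.
    checks.append(GateCheck("manual_approval", approved_by is not None))
    return checks


def promote(model_version, checks):
    """Return a promotion decision; the model stays in staging on any failure."""
    failed = [c.name for c in checks if not c.passed]
    if failed:
        return f"{model_version} held in staging (failed: {', '.join(failed)})"
    return f"{model_version} promoted to production"


if __name__ == "__main__":
    thresholds = {"auc": 0.80, "precision": 0.70}  # hypothetical values
    staging_metrics = {"auc": 0.84, "precision": 0.66}
    checks = evaluate_gate(staging_metrics, thresholds, approved_by="reviewer")
    print(promote("model-v2", checks))
```

Returning the full list of checks, rather than a single boolean, leaves an audit trail of why a model was held back, which fits the governance focus described above.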

Business outcomes & benefits

Our deliverables allowed our client to take advantage of core MLOps principles:

Improved model accuracy and reliability: Improve the accuracy and reliability of ML models by automating the process of monitoring, testing, and validating models in production environments.
Faster model development and deployment: Expedite the process of developing and deploying ML models by automating tasks such as data preparation, feature engineering, and model training.
Scalability: Ensure that ML models can scale to handle large volumes of data and high levels of traffic by automating the process of deploying and managing models in production environments.
Increased collaboration: Facilitate collaboration between data scientists, ML engineers, software developers, and operations teams by providing tools and processes for sharing code, data, and models in a controlled and secure manner.
Reduced operational costs: Reduce the operational costs associated with managing machine learning models in production environments by automating routine tasks and streamlining processes.

Areas of expertise

Application Discovery
Change Management
Process Documentation
Partner Management
Tool evaluation

How we did it

We started with the customer's goals: Our client had identified use cases driving their MLOps vision. We worked with them to understand these scenarios and developed a plan to align with their goals and deliver the most value.

We doubled down on our holistic approach: We analyzed the three pillars separately, but at each step we thought about how they complemented and balanced one another. The technology recommendations were informed by existing skill sets and documented processes. This allowed us to keep iterating and deliver a uniquely optimized and personalized solution.

We thought about the future: We deliberately included future expansion into our plan to deliver a timeless solution that our client could continue to build on. For example, we considered how potential future cloud migrations would impact existing processes. Planning for the future improved our solution and increased client satisfaction.

How we can help you

Machine learning continues to be an integral component of decision-making for our clients and has elevated the necessity for MLOps. We help our clients deploy, monitor, and update models in production while fostering proper communication between data scientists, data engineers, ML engineers, and consultants/domain experts. We can deliver value at each stage of the MLOps process:

Discovery
- Identify and select use case model(s)
- Refactor and coordinate models into automated pipelines
Model Training
- Integrate artifact tracking, parameter tuning and model selection into ML pipelines
Model Validation
- Establish use case-specific KPIs to track model performance and apply stage gates to model redeployment
Production
- Create visibility for ongoing model stats for leadership
- Establish performance monitoring so that models self-heal when necessary
Model Refresh
- Update the existing machine learning model with new data and/or incorporate new features into the model
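As one illustration of how the Production and Model Refresh stages above can connect, the sketch below tracks a KPI over a rolling window and signals when a refresh is due. It is a simplified example under stated assumptions: the class name, threshold, and window size are hypothetical, and real monitoring would typically also cover data drift, latency, and alerting.

```python
from collections import deque
from statistics import mean


class KpiMonitor:
    """Track a 'higher is better' KPI over a fixed-size rolling window."""

    def __init__(self, threshold, window=5):
        self.threshold = threshold
        self.scores = deque(maxlen=window)

    def record(self, score):
        """Add a score; return True when a model refresh should be triggered.

        A refresh is signalled only once the window is full, so a single
        early low score does not trigger retraining on its own.
        """
        self.scores.append(score)
        window_full = len(self.scores) == self.scores.maxlen
        return window_full and mean(self.scores) < self.threshold


if __name__ == "__main__":
    monitor = KpiMonitor(threshold=0.80, window=3)  # hypothetical values
    for score in [0.90, 0.85, 0.70, 0.70]:
        if monitor.record(score):
            print(f"score={score}: rolling mean below threshold, trigger refresh")
        else:
            print(f"score={score}: ok")
```

Averaging over a window trades reaction speed for stability: a larger window avoids retraining on noise, while a smaller one responds faster to genuine degradation.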