MLOps: Why Machine Learning is Unique within DevOps
- December 15, 2022
NTT DATA helps hundreds of clients apply DevOps practices in software engineering by adopting well-defined processes, modern tooling, and automated workflows. We have streamlined the process of moving from development to robust production deployments. In a previous blog post, I shared how we built CI/CD pipelines for an e-commerce business with Keptn and Kubernetes. In this post, I share why experts consider using DevOps practices for developing Machine Learning (ML) solutions and explain how software engineering and Machine Learning engineering differ.
DevOps in Machine Learning
Let's start with the question, “What is DevOps in Machine Learning?”
Machine Learning DevOps (MLOps) is an organizational change that relies on a combination of people, processes, and technology to deliver Machine Learning solutions in a robust, scalable, reliable, and automated way. So far, this sounds much like DevOps, so why distinguish MLOps from DevOps? Let's dive into what makes Machine Learning unique.
Machine Learning includes multiple steps, from data preparation to model training to model deployment. Each step may include one or many tasks, some of which run in sequence and some in parallel.
Data Preparation
The first Machine Learning step is Data Preparation. There is a special warning for Machine Learning: “Garbage-In, Garbage-Out.” This concept has plagued analytics and decision-making for generations. Bad data can negatively affect the Machine Learning workload twice – first in the historical data used to train the ML model and second in the new data used by that model to make future decisions.
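To make this concrete, here is a minimal sketch of the kind of automated data-quality check that can run before training. It uses pandas; the column names and thresholds are illustrative assumptions, not taken from any specific project.

```python
import pandas as pd

def validate_training_data(df: pd.DataFrame) -> list[str]:
    """Return a list of data-quality problems found in the raw training data.

    The column names and thresholds below are illustrative assumptions.
    """
    problems = []

    # Reject obviously incomplete data before it reaches model training.
    missing_ratio = df.isna().mean()
    for column, ratio in missing_ratio.items():
        if ratio > 0.05:
            problems.append(f"{column}: {ratio:.1%} missing values")

    # Duplicated rows often indicate a faulty ingestion job.
    duplicates = df.duplicated().sum()
    if duplicates:
        problems.append(f"{duplicates} duplicated rows")

    # Simple range check for a hypothetical numeric feature.
    if "order_amount" in df.columns and (df["order_amount"] < 0).any():
        problems.append("order_amount contains negative values")

    return problems
```

Checks like these catch "garbage" on both sides: in the historical data before training and in the new data arriving at inference time.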
Model Training
The second Machine Learning step is Model Training. This step may consist of a single training run or a whole set of experiments. For example, hyperparameter optimization looks for the best model automatically by focusing on the most promising combinations of hyperparameter values within the ranges that you specify.
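As an illustration, the following sketch runs a randomized hyperparameter search with scikit-learn on synthetic data. The estimator, search ranges, and search budget are assumptions chosen only to show the pattern.

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Synthetic data stands in for the prepared training set.
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# Search ranges are illustrative; in practice they come from the experiment design.
param_distributions = {
    "n_estimators": randint(50, 300),
    "max_depth": randint(3, 15),
}

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=42),
    param_distributions=param_distributions,
    n_iter=20,          # number of hyperparameter combinations to try
    cv=3,               # cross-validation folds per combination
    scoring="accuracy",
    random_state=42,
)
search.fit(X, y)

print("Best hyperparameters:", search.best_params_)
print("Best cross-validated accuracy:", search.best_score_)
```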
After training, the model is validated on new data that was not used for training, to test whether the model generalizes rather than just memorizing the training data set. If the model passes a metric-based threshold, it is stored in a central model registry. The registry keeps all metadata about the model, such as model lineage, parameters, metrics, and so on.
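A minimal sketch of that gate, assuming MLflow as the model registry, might look like the following. The metric, threshold, and registered model name are hypothetical.

```python
import mlflow
import mlflow.sklearn
from sklearn.metrics import accuracy_score

ACCURACY_THRESHOLD = 0.85  # illustrative quality gate, not a universal value

def evaluate_and_register(model, X_holdout, y_holdout):
    """Evaluate on held-out data and register the model only if it passes the gate."""
    accuracy = accuracy_score(y_holdout, model.predict(X_holdout))

    # Assumes an MLflow tracking server with a model registry is configured.
    with mlflow.start_run():
        # Logged metrics and parameters become part of the model's lineage metadata.
        mlflow.log_metric("holdout_accuracy", accuracy)

        if accuracy >= ACCURACY_THRESHOLD:
            # Registering stores the artifact plus its metadata under a named model.
            mlflow.sklearn.log_model(
                model,
                artifact_path="model",
                registered_model_name="churn-classifier",  # hypothetical model name
            )
        else:
            print(f"Model rejected: accuracy {accuracy:.3f} below threshold")
```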
Model Deployment
The last step is Model Deployment. It is common to have multiple environments, such as staging and production, just as in software development. You can also inject different controls into the deployment pipeline, such as manual approvals and quality gates.
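Here is a small, tool-agnostic sketch of such a control: an automated comparison against the current production model combined with a recorded manual approval. The metric names and sign-off mechanism are assumptions.

```python
def promote_to_production(staging_metrics: dict, production_metrics: dict,
                          approved_by: str | None) -> bool:
    """Decide whether a model in staging may be promoted to production.

    The gate combines an automated metric comparison with a manual approval,
    mirroring the controls described above. Metric names are illustrative.
    """
    # Automated quality gate: the candidate must not regress on the holdout metric.
    beats_current = (
        staging_metrics["holdout_accuracy"] >= production_metrics["holdout_accuracy"]
    )

    # Manual approval: a reviewer recorded by the pipeline, e.g. via a ticket or UI.
    manually_approved = approved_by is not None

    return beats_current and manually_approved

# Example: the candidate improves on the production model and has a sign-off.
promote_to_production(
    staging_metrics={"holdout_accuracy": 0.91},
    production_metrics={"holdout_accuracy": 0.88},
    approved_by="release-manager",
)
```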
As you can see, there are disparate pipelines and workflows, with many different steps and dependencies, that must work together so you can iterate through the lifecycle. Another goal of these pipelines is end-to-end traceability, so you can track how the models were trained, what metrics were used, and where the models were deployed.
DevOps Practices
All of the steps and tasks we have just discussed can be challenging by themselves. When you incorporate DevOps practices on top of them, each step and task has source code under version control, and that code needs to be packaged and executed as part of a pipeline.
Versioning source code is not a new practice if you are familiar with software development or DevOps, but ML brings specific considerations, such as data. We need to track and version the data that becomes input to the different steps in the overall Machine Learning workflow. In addition, the model artifacts produced by the model training step need to be tracked end-to-end through the pipelines. As you can see, there are a lot of considerations that need to be stitched together to build end-to-end pipelines. In reality, there are many combinations of CI/CD tools and integrations that are not necessarily purpose-built for Machine Learning. You may use something purpose-built for ML (for data preparation, for model training) and have a CI/CD orchestration layer on top for implementing CI/CD practices.
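Purpose-built tools such as DVC or lakeFS handle data versioning at scale, but the underlying idea can be sketched in a few lines: fingerprint the exact data that went into a training run and record it next to the resulting model artifact. The file layout below is purely illustrative.

```python
import hashlib
import json
from pathlib import Path

def fingerprint_dataset(path: str) -> str:
    """Compute a content hash that identifies an exact version of a dataset."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Hash in chunks so large data files do not need to fit in memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def record_lineage(run_id: str, data_path: str, model_path: str) -> None:
    """Store which exact data version produced which model artifact."""
    lineage = {
        "run_id": run_id,
        "data_version": fingerprint_dataset(data_path),
        "model_artifact": model_path,
    }
    Path(f"lineage-{run_id}.json").write_text(json.dumps(lineage, indent=2))
```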
To make it even more complicated, there is no single golden pipeline for MLOps, just as there is no single golden pipeline in DevOps. The reason is that technical implementations, organizational structures, integration requirements, and regulatory requirements vary. But we can standardize the Machine Learning steps that we discussed earlier.
Customer Journey
Our clients usually start with task automation within the ML workflow and then apply some level of orchestration to determine which tasks should be executed and in what order. The goal is to execute all tasks in a pipeline.
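Dedicated orchestrators (for example Airflow, Kubeflow Pipelines, or AWS Step Functions) handle scheduling, retries, and parallelism; the sketch below strips that away to show only the core idea of running the ML tasks as one ordered pipeline. The task bodies are placeholders.

```python
from typing import Callable

# Each task is an independent, automatable unit of work; the orchestration
# layer decides the execution order. The bodies stand in for real logic.
def prepare_data() -> None:
    print("preparing data")

def train_model() -> None:
    print("training model")

def evaluate_model() -> None:
    print("evaluating model")

def deploy_model() -> None:
    print("deploying model")

PIPELINE: list[Callable[[], None]] = [
    prepare_data,
    train_model,
    evaluate_model,
    deploy_model,
]

def run_pipeline(tasks: list[Callable[[], None]]) -> None:
    """Execute all tasks in sequence; an exception in any task stops the run."""
    for task in tasks:
        task()

run_pipeline(PIPELINE)
```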
In parallel, or after building an automated ML workflow, clients incorporate MLOps practices such as:
- Code repository
- Data version control
- Feature store (for sharing and discovering curated features; see the sketch after this list)
- Model artifact versioning (for managing model version at scale and/or establishing traceability)
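To illustrate the feature store idea from the list above, here is a deliberately simplified sketch: training and inference read the same curated feature table through one lookup function, so features are computed consistently in both phases. The schema and values are hypothetical; real feature stores such as Feast add discovery, governance, and online serving.

```python
import pandas as pd

# Hypothetical curated feature table; in practice this lives in the feature store.
curated_features = pd.DataFrame(
    {
        "customer_id": ["c-001", "c-002", "c-003"],
        "orders_last_30d": [3, 0, 7],
        "avg_order_amount": [42.0, 0.0, 18.5],
    }
)

def get_features(customer_ids: list[str]) -> pd.DataFrame:
    """Look up curated features by entity key."""
    return curated_features[curated_features["customer_id"].isin(customer_ids)]

# The training pipeline and the inference service both call the same lookup,
# so the model sees identically computed features in both phases.
training_features = get_features(["c-001", "c-002"])
print(training_features)
```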
As you can see, there are two distinct phases: model building activities and model deployment activities. These two phases involve different personas, so the idea is to establish as much automation as possible between the steps, orchestrate that automation, and add automated quality gates. DevOps practices are not only about source code, versioning, and automated model building and deployment; they include quality gates as well.
This blog post shows the core components of ML workflows in a technology-neutral way. My next blog post will discuss how to build an MLOps workflow in the AWS Cloud. In the meantime, check out this post by NTT DATA’s Gyan Prakash: AWS MLOps Framework Pre-Packages ML Model Deployment Pipelines.