MLOps - What is it, and why does it matter?
With the rise of AI and ML, a new acronym became popular - MLOps. MLOps stands for Machine Learning Operations. This concept focuses on streamlining the process of bringing ML models to production in an easy, safe, and organised fashion.
In 2019, executives at Gap said that 87% of data science projects never make it to production. You may ask, “How is that possible?”. This is mainly because, at the time, there were no well-established practices, and deployment in most companies was chaotic. MLOps practices emerged to mitigate that.
MLOps operates at the intersection of 3 concepts: Data Engineering, Machine Learning, and DevOps.
In this blog post, I will explain in simple terms what MLOps is, how it is applied to Machine Learning pipelines, and why it matters. Then, I will outline MLOps practices and how they relate to DevOps. Finally, I will show how different the system is with and without MLOps.
What is MLOps?
MLOps is a set of practices for collaboration between data scientists and operations professionals (between the people who build models and the people who use them as software components). Applying these practices:
- increases solution quality,
- simplifies management,
- automates the deployment of models in a production environment,
- makes it easier to align with business objectives and regulations.
The chart above shows the relationships between the different parts of the Machine Learning cycle. It starts with Data Sourcing and Data Labelling, followed by Experiment Tracking, Model Versioning, and Model Deployment. Finally, we monitor predictions to control how our model is used. To ensure high quality, MLOps covers all these segments of the Machine Learning process. Now, let's talk about why it is needed.
Why does it matter?
MLOps provides a structure that eases model training and deployment and lowers operational costs.
Using it can significantly reduce risk because it puts monitoring tools in place. With monitoring, many problems can be caught early, such as data drift (when production data diverges from the training data and the model performs significantly worse; for example, many Machine Learning models collapsed during the COVID-19 pandemic after such an unforeseeable shift).
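To make the idea concrete, below is a minimal sketch of how a basic drift check could look. It assumes numeric features and uses a two-sample Kolmogorov-Smirnov test; the feature values and threshold are illustrative, and production systems usually rely on dedicated monitoring tools.

```python
# A minimal drift-check sketch: compare a feature's training distribution with
# its recent production distribution using a two-sample Kolmogorov-Smirnov test.
import numpy as np
from scipy.stats import ks_2samp


def detect_drift(train_col: np.ndarray, prod_col: np.ndarray, alpha: float = 0.01) -> bool:
    """Return True if the production distribution differs significantly from training."""
    _, p_value = ks_2samp(train_col, prod_col)
    return p_value < alpha


# Hypothetical usage with synthetic data standing in for one monitored feature.
rng = np.random.default_rng(42)
train_prices = rng.normal(loc=100.0, scale=5.0, size=1_000)  # training-time distribution
prod_prices = rng.normal(loc=120.0, scale=5.0, size=1_000)   # shifted production data

if detect_drift(train_prices, prod_prices):
    print("Data drift detected - consider raising an alert or retraining.")
```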
Applying MLOps also allows you to scale your solution once you gather more data or want to support 10x more users. It lets you perform soft releases of new model versions, test them (e.g., A/B testing), and dynamically scale when needed (effectively reducing operating costs when dealing with varying numbers of requests). Additionally, you can track changes in datasets and experimental setups.
MLOps practices
The whole MLOps paradigm revolves around its practices. Below, I outline where those practices are applied in the Machine Learning pipeline:
- Exploratory data analysis - explore and prepare data for machine learning by creating reproducible datasets, tables, and visualisations. The aim is to give more insight into the data you plan to use in further steps.
- Data preparation and Feature Engineering - iteratively create features from transformed data. In this step, you want to create a feature store: easily accessible, shared storage with pre-computed features. Such a component makes features shareable across teams. Additionally, versioning the dataset and features is necessary to follow good dataset practices and track their changes.
- Model training and hyperparameter tuning - consider open-source libraries like scikit-learn, PyTorch, or TensorFlow. As a simpler alternative, you can consider AutoML.
- Model evaluation and governance - consider platforms like MLflow to manage model artifacts, versions, and transitions through the model’s lifecycle (a minimal tracking sketch follows this list).
- Model inference and serving - monitor inference times, request latency, and other production-related QA matters. Use CI/CD tools such as orchestrators and artifact repositories to automate production-related pipelines.
- Model deployment and monitoring - Automate permissions, route the traffic, provide REST API endpoints to the model, and dynamically scale the service when needed.
- Automated model retraining - With all the blocks and monitoring tools ready, you can add automated retraining of the model when you observe a shift in the data, the model's performance drops, or a fresh batch of data is available.
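Since experiment tracking and MLflow come up in the points above, here is a hedged sketch of what logging a single training run could look like. The experiment name, hyperparameters, and metric are illustrative assumptions, not a prescribed setup, and they assume an MLflow tracking server is already configured.

```python
# A minimal experiment-tracking sketch with MLflow: log hyperparameters,
# a validation metric, and the trained model artifact for one run.
import mlflow
import mlflow.sklearn
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error


def train_and_log(X_train, y_train, X_val, y_val, params: dict):
    mlflow.set_experiment("stock-price-forecasting")  # illustrative experiment name
    with mlflow.start_run():
        model = GradientBoostingRegressor(**params)
        model.fit(X_train, y_train)

        val_mae = mean_absolute_error(y_val, model.predict(X_val))

        # Record everything needed to compare and reproduce this run later.
        mlflow.log_params(params)
        mlflow.log_metric("val_mae", val_mae)
        mlflow.sklearn.log_model(model, "model")
    return model
```

With runs logged this way, choosing the "best performing and stable" candidate becomes a matter of comparing tracked metrics rather than digging through notebooks.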
MLOps vs DevOps
MLOps was heavily inspired by DevOps, its success, and its positive influence on Software Engineering. Just as DevOps principles enable a rapid, iterative approach to building applications, MLOps applies similar rules to shipping machine learning models to production.
On the other hand, MLOps faces additional challenges. First, MLOps is more experimental, as model training is probabilistic by nature. Because of that, it is harder to guarantee reproducibility in Machine Learning than in Software.
Secondly, Software teams are composed of less technical roles like UI/UX designers and PMs and more technical, code-focused roles like Software Engineers. ML teams have all of the above, with the technical staff extended by Data Scientists, Machine Learning Researchers, and Machine Learning Engineers, each focusing on a different aspect of the solution while closely collaborating with the others.
Model testing is significantly more challenging and less predictable than Software testing. Designing ML tests is still not formalised, while software testing is a well-established field divided into different scopes (unit, integration, system, and acceptance). In ML systems, you must also add a model validation step on top of all the software tests.
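To make the "model validation on top of software tests" idea concrete, here is a minimal sketch of what such a check could look like. The naive baseline and the MAE threshold are illustrative assumptions; real validation suites are usually much richer.

```python
# A sketch of a model validation gate layered on top of regular software tests:
# sanity-check the predictions, then require the model to beat a naive baseline
# and an agreed absolute error threshold before it can be promoted.
import numpy as np
from sklearn.metrics import mean_absolute_error


def validate_model(model, X_val, y_val, max_mae: float = 5.0) -> None:
    y_true = np.asarray(y_val)
    preds = model.predict(X_val)

    # 1. Sanity check: predictions must be finite numbers.
    assert np.all(np.isfinite(preds)), "Model produced NaN or infinite predictions"

    # 2. Quality gate: beat a naive 'predict the previous value' baseline.
    naive_mae = mean_absolute_error(y_true[1:], y_true[:-1])
    model_mae = mean_absolute_error(y_true, preds)
    assert model_mae < naive_mae, f"Model MAE {model_mae:.2f} is worse than naive {naive_mae:.2f}"

    # 3. Absolute threshold agreed with the business side (illustrative value).
    assert model_mae < max_mae, f"Validation MAE {model_mae:.2f} exceeds allowed {max_mae:.2f}"
```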
Automated deployment is not as easy in the ML world as in Software because of the many small decisions Data Scientists make during model training. To fully automate the process, you have to mimic their reasoning and add some form of monitoring to avoid shipping models of questionable quality.
With ML systems, you can experience seasonal changes or a slow evolution of the incoming data, which is uncommon in Software systems. Model performance can decay in more ways than in software systems, so you should take that into account when designing and operating them. Such problems with changes in incoming data can be caused by the following:
- different handling of training and production data.
- a seasonal discrepancy between when the model was trained and when it was deployed (e.g., a model trained on summer data could perform poorly during winter).
- wrong assumptions made when re-collecting data, which introduce bias into the retrained model.
Monitoring is also more complicated than in classical software because you have to track statistics of how the model behaves, raise alerts when its performance degrades, roll back to previous versions, and trigger the training of a new model.
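As a rough illustration, a degradation check feeding alerts and retraining triggers could look like the sketch below. The threshold, the metric, and the numbers are assumptions, and the alerting and retraining actions are placeholders for whatever your stack provides (metrics store, pager, orchestrator).

```python
# A minimal monitoring sketch: compare a recent error metric with the baseline
# recorded at deployment time and react when the drop is too large.

DEGRADATION_THRESHOLD = 0.15  # illustrative: react to a 15% relative drop


def check_model_health(baseline_mae: float, current_mae: float) -> bool:
    """Return True if the model degraded enough to alert and retrain."""
    relative_drop = (current_mae - baseline_mae) / baseline_mae
    if relative_drop > DEGRADATION_THRESHOLD:
        # In a real system: page the on-call, roll back to the previous model
        # version, and kick off the retraining pipeline via your orchestrator.
        print(f"ALERT: MAE degraded by {relative_drop:.0%} - trigger retraining")
        return True
    return False


# Hypothetical usage with numbers pulled from a metrics store:
check_model_health(baseline_mae=2.0, current_mae=2.6)
```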
To MLOps or not to MLOps
To better show how MLOps affects a project, its problems, and its shortcomings, let's consider an example where you want to predict tech stock prices with a stock price prediction model.
Without MLOps, we have a simple model trained on the available data and optimised against a fixed test set. Usually, such ML projects look as follows: you have a pipeline that works in your Jupyter Notebook. In our example, we trained a well-performing model that predicts the price of Tesla stock in 2021-2022 based on historical data. Unfortunately, you cannot access it outside of your Notebook, as it is not yet deployed to the cloud.
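For illustration, the notebook-only pipeline could look roughly like this. The synthetic random-walk series stands in for the historical Tesla prices, and the lag feature and fixed split are illustrative choices.

```python
# A sketch of the 'notebook-only' starting point: train once on a static dataset
# with a fixed chronological split, print a metric, and stop there.
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(0)
prices = 200 + np.cumsum(rng.normal(0, 2, size=500))  # stand-in for 2021-2022 closes
df = pd.DataFrame({"close": prices})
df["prev_close"] = df["close"].shift(1)                # simple lag feature
df = df.dropna()

X, y = df[["prev_close"]], df["close"]
split = int(len(df) * 0.8)                             # fixed train/test split
model = GradientBoostingRegressor().fit(X.iloc[:split], y.iloc[:split])
print("MAE:", mean_absolute_error(y.iloc[split:], model.predict(X.iloc[split:])))

# The model only lives inside this notebook: no versioning, no API, no monitoring.
```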
With MLOps, the process looks more complex but has more utility.
- You want to create dataset versions because you know your system will be like a living organism and change over time, so you version your data. This is very important for the reproducibility of your ML solution: you might want to train model 122 using data up until the end of 08.2023, while next month you might extend it until the end of 09.2023 and train model 123. Without data versioning, it would be tricky to recreate model 122.
- You use an experiment tracking tool because you try many different hyperparameter combinations. Possible hyperparameters in our example are all the decisions regarding training: the architecture (XGBoost, SVR, or an autoregressive network), the learning rate, how long you plan to train, which augmentations to use, etc. Ultimately, you want to choose the best-performing, stable solution among many experiments and keep it under control, and for that you need experiment tracking.
- Your model has to be accessible to users or other developers, so you deploy your stock forecasting model as a REST API in the cloud (e.g., Google Cloud or AWS); a minimal serving sketch follows this list.
- You need monitoring because when your model's performance drops significantly, you might lose money on wrong trading calls. In such a situation, you need to roll back to the previous version or fall back to a rule-based system.
- When the model performs poorly, consider retraining it on fresher data. To do so, you add monitoring, automated dataset creation, model training, and deployment. If done correctly, the model can keep performing well for a long time.
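As mentioned in the deployment point above, a minimal serving sketch could look like the one below. The framework choice (FastAPI), the feature schema, and the model file name are my assumptions for illustration; in a real deployment, the model would be loaded from your registry or artifact storage and sit behind the cloud provider's load balancer.

```python
# A sketch of exposing the forecasting model as a REST API with FastAPI.
import joblib
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
model = joblib.load("stock_model_v123.joblib")  # hypothetical serialised model artifact


class PredictionRequest(BaseModel):
    prev_close: float
    volume: float


@app.post("/predict")
def predict(request: PredictionRequest) -> dict:
    features = [[request.prev_close, request.volume]]
    predicted_close = float(model.predict(features)[0])
    return {"predicted_close": predicted_close}

# Run locally with: uvicorn main:app --reload
```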
Summary
In this blog post, I have covered MLOps, its benefits and practices, and an example of a stock prediction model transitioning from a basic stage without any MLOps to a fully-fledged MLOps system.
For more technical information and hints on applying MLOps successfully, please refer to the MLOps 101 blog post.