From strategy to operational excellence: the elements of machine learning projects
How do standard software engineering projects compare to machine learning projects? What are the common elements of ML projects, and where is there room for MLOps? We talk to Lisa Knolle, Senior ML Engineer & ML Consultant at Zalando, about teamwork in ML projects, knowledge sharing, management, operational excellence in machine learning projects, and more.
Lisa Knolle: Lisa is a Berlin-based Software Engineer who has specialized in ML Engineering and MLOps since 2018. Before that, she worked in UX research and built microservices as a backend engineer. In her free time, she enjoys making and playing video games, electronic music, and a good coffee.
Can you tell us a little about your experience with machine learning so far? How did you start your career in this field?
I started out as a backend engineer at my company around six years ago. I was very happy in that role, but about three years ago I became interested in machine learning. At that time, I found a team in my company that was building tools for the entire machine learning lifecycle. I taught myself the theory, learned more about the math behind ML, and then transitioned to the ML team. The goal of my team is to make it easier, simpler, and faster for teams across Zalando to solve business problems with ML/AI techniques.
We do that by providing tooling for experimentation and productionization as well as in-depth consultation. In a consultation, one of us joins a team that is working on a machine learning project and supports them in different ways, from architecture reviews to pair programming, most often focusing on the deployment and productionization of their ML pipeline.
Can you also share some details about some of the most exciting projects you were involved in?
Sure! Since I joined the team, I’ve worked with four different teams, supporting them with their machine learning projects. To be honest, I find all of these projects exciting, and one of the most interesting was fraud detection. There was also a project where we classified and analyzed the different reasons why people return items. Another project aimed to predict whether customers would contact us about delivery-related issues, so that we could reach out to them proactively. These may be little things, but they make a difference in the customer experience. Overall, being able to have a positive impact on customers and also on employees is something I really appreciate about my work.
And what about the skillsets required in ML projects? When you enter a team as a consultant, do you make sure that there are specific roles in such a team?
I would say it’s very different for every project. When we join a team, there can be different setups in place. Some teams are already very experienced, have models in production, and want to rebuild their architecture to make it more scalable. Other teams are just at the end of the experimentation phase and want to start building automated ML pipelines to train and serve their model. What the teams usually have in common is a data scientist who holds the necessary subject matter expertise for the problem they want to solve, and almost always Software Engineers who work on the existing (not necessarily ML) application. Software Engineers usually show a lot of interest and are very motivated to learn more about how to build an ML system (just as I was), and then often naturally move towards more specialized roles such as ML Engineer or Data Engineer. This is also a goal of my department: to allow other teams at Zalando to build ML solutions whether or not a dedicated ML Engineer is present.
More in the 'Managing Machine Learning Projects' series:
With so many different perspectives on the team, how do you find a balance between what’s important to each of you to build a great solution?
I think that, as in all multidisciplinary teams, it's very important to be open-minded, to try to understand the other professions, and not to get tunnel vision inside your own silo. Be curious and try to understand the challenges and objectives of the other disciplines. For example, Data Scientists should be mindful of the challenges of model deployment, so they better understand what information Engineers need when deploying a model. It also works the other way around: Engineers should make themselves familiar with the basic tasks in machine learning. That way, you have a shared vocabulary. I found it very beneficial and enjoyable to pair-program on certain tasks with Data Scientists, and I learned a great deal from it (and it was really fun too!). I also found regular knowledge-sharing sessions very valuable, where team members can share in more detail what they are currently working on, what they have learned, or new technologies they are excited about. It's a bit like traditional Software Engineering with its Frontend and Backend: the more you understand about the other world, the better the collaboration goes. So I would say that the more diverse perspectives you bring into your work, the better the results. It's all about communicating with each other, sharing knowledge, and not being afraid to raise concerns and ask questions.
When we compare standard software projects and machine learning projects, would you say that they are in any way different in terms of project management?
These projects are definitely different to some extent. I once read an analogy that describes it best: in software projects, if you can make a wireframe of it, you can very likely also build it. In machine learning, at the beginning of a project, the question is often not how to build it, but whether it will actually work. There is an idea, but we also need to find a model that serves this purpose and solves the problem we’re dealing with. So the first question we face when working with ML is: can it be done? And is an ML model the right solution at all, not only from an ethical point of view, but also in terms of whether it is justified? There are definitely cases where a large and expensive neural network might solve the problem, but a much simpler solution would do so too. Sometimes, however, this question can only be answered after experimentation.
The next challenge, and another difference, is data. Although a big success factor for machine learning is that we have more and more data at hand, finding the right dataset and making sure it's reliable is the first step you have to take, and it shouldn't be underestimated. Having enough labeled data can also be a critical problem. And even once you have the right data, it needs to be constantly monitored for quality and drift in the later stages of the project.
A less technical, but even more important, data-related point, given that machine learning has a real impact on people, is to make sure your model is trained on data that represents your whole user spectrum. Otherwise, your model could carry bias and mistreat underrepresented groups.
Given the experimental nature of machine learning, it is also very important to understand your business problem from the beginning and to set clear KPIs to aim for. This allows early testing and validation.
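One simple, partial way to act on this point is to compare how often each user group appears in the training data with that group's share of the overall user population. The sketch below assumes you already have a group label per training example and reference population shares; the function name, the group labels, and the `min_ratio` threshold of 0.8 are all hypothetical choices for illustration, not a standard or a Zalando practice, and passing this check does not by itself rule out bias.

```python
from collections import Counter


def underrepresented_groups(train_groups, population_shares, min_ratio=0.8):
    """Hypothetical helper: flag groups whose share in the training data is
    below `min_ratio` times their share in the reference population.

    Returns {group: (train_share, population_share)} for flagged groups.
    """
    counts = Counter(train_groups)
    total = sum(counts.values())
    flagged = {}
    for group, pop_share in population_shares.items():
        # A group that never appears in the training data gets a share of 0.
        train_share = counts.get(group, 0) / total if total else 0.0
        if pop_share > 0 and train_share / pop_share < min_ratio:
            flagged[group] = (train_share, pop_share)
    return flagged


# Illustrative, made-up data: group "C" is 10% of users but 5% of training rows.
train = ["A"] * 70 + ["B"] * 25 + ["C"] * 5
population = {"A": 0.6, "B": 0.3, "C": 0.1}
result = underrepresented_groups(train, population)
```

Here only group `"C"` is flagged, because its training share is half its population share. A ratio-based threshold is one simple design choice; in practice the right check depends on which groups matter for the problem and how the model's errors are distributed across them.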
Once you have a promising model candidate and enter the productionization phase, there are also a couple of aspects that are, from my point of view, very different from classic Software Engineering. I remember very well the first time I deployed an ML model to receive live traffic. I had to take extra care to check that the data was fairly distributed before deploying the model, to make sure there was no obvious bias. More generally, it’s very important to continuously monitor both the data and the models during training and deployment, so that you can react to changes caused by, for example, seasonality or simply shifts in user behavior over time, and also to validate the quality of your input data, because in almost all cases the data that goes into your system is not owned by your team.
Another aspect is that you very often work with personal data. Protecting customers' privacy throughout the whole lifetime of an ML project, and not only because of GDPR, adds a lot of extra effort if it is implemented too late.
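To make the monitoring point more concrete, a common way to detect input drift is to compare the distribution of a feature in the training data with its distribution in recent live traffic. The sketch below computes the Population Stability Index (PSI) for one numeric feature. PSI is one widely used drift metric among several, and it is our example choice, not necessarily what the teams described here use. It assumes equal-width bins over the combined value range; many implementations instead use quantile bins derived from the reference sample. The function name and parameters are hypothetical.

```python
import math


def population_stability_index(expected, actual, bins=10, eps=1e-6):
    """Hypothetical helper: PSI between a reference sample (e.g. training data)
    and a live sample of the same numeric feature.

    PSI = sum over bins of (actual_share - expected_share) * ln(actual_share / expected_share).
    """
    lo = min(min(expected), min(actual))
    hi = max(max(expected), max(actual))
    width = (hi - lo) / bins or 1.0  # avoid a zero bin width if all values are equal

    def shares(values):
        counts = [0] * bins
        for v in values:
            # Clamp the maximum value into the last bin.
            idx = min(int((v - lo) / width), bins - 1)
            counts[idx] += 1
        # Floor empty bins at eps so the logarithm stays finite.
        return [max(c / len(values), eps) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Illustrative, made-up data: a live sample whose values shifted upwards.
reference = [i / 100 for i in range(1000)]
shifted = [x + 5 for x in reference]
psi_same = population_stability_index(reference, reference)
psi_shifted = population_stability_index(reference, shifted)
```

In a monitoring job, you might compute this per feature on, say, a daily window of live inputs and alert above a threshold. A frequently cited rule of thumb treats PSI above roughly 0.25 as a significant shift, but that is a convention, not a universal standard, and thresholds should be tuned per use case, for example to account for expected seasonality.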
So when you work on a complex project like that, where there’s a lot to consider, how do you approach estimating your work?
That’s a great question! As in regular Software Engineering, estimates are hard, and the more experience a professional has, the better they get at it. But as I mentioned, the early phase involves a lot of exploratory and experimental work, which is harder to estimate. The more important thing is to have clear KPIs that indicate your progress. Reaching out to other teams in the company that have solved a similar problem can also be extremely helpful.
As my role is more on the engineering side, I can often benefit from having solved a similar task in a past project, so I can ask more precise questions at the very beginning of a project. Simple examples are specific requirements for latency and availability, how the model should be served, specific hardware requirements, or whether the team uses technologies that are not available out of the box. But so far, each project has had its own new and specific challenges that allowed me to dive deep into unknown areas, which I appreciate a lot.
But in my view, the main benefit of estimating tasks together as a team is not just putting T-shirt sizes on problems, but rather having a starting point for communication and alignment: to discuss a task, reach a common understanding, and resolve open questions.
Do you also have project managers in your teams?
That depends. In many of the projects I was part of, project manager is no longer a fixed, standalone role. Larger projects in the company that require extensive collaboration between multiple teams have dedicated project managers. In team-internal projects, the classic project management tasks are distributed between Product Managers and Delivery Leads. If teams need help coordinating their work, they can always reach out to an internal pool of agile coaches who help them set up team rituals and similar practices.
Given that you are to some extent self-managed, does it require more seniority of team members to work in a system like that?
It definitely requires some seniority, but even more a sense of ownership. Teams can decide on their own whether they follow, for example, a more Kanban- or Scrum-oriented way of working, and can adjust their processes autonomously. Having the opportunity to learn more about project management practices this way, and taking responsibility for processes and team rituals, can also be very motivating for people of any background.
Now when you mentioned motivation, this is one of the elements that have to be managed in the project too, isn’t it?
That's true! Ideally, of course, everybody is intrinsically motivated, but there are definitely things that can support this. I think it is very important to appreciate each other as a team and celebrate successes, while still being humble and open about failures and drawing lessons from them. A culture where everybody feels safe to ask questions, request help, make mistakes, and learn together was always something I found very important in the teams I worked in. Regular collaboration, for example through pair programming, is a great tool that supports this, and it's also fun!
Another aspect is having clarity about, and awareness of, the impact one's contributions have on the team's and the company's goals. At the same time, your work should of course also be in line with your personal and professional development goals and give you perspective. I, for example, really appreciate that my company supported me in transitioning between job families during my career.
Throughout our conversation, you’ve mentioned many different aspects of working in ML projects. Just to wrap things up, I was wondering if you could name some good practices that should be followed in every such project?
I don't think there is one answer that fits all projects, but there are some commonalities to pay attention to. As already mentioned, I think it's critical to be very clear from the beginning about the business problem you want to solve and to have clear metrics for measuring it. Second, plan enough time to find and explore your data and to go very deep in understanding and validating it. Keep in close contact with stakeholders and validate your model candidate as early as possible.
When it comes to productionization, having a strategy for data and model validation and monitoring in mind is very valuable. The same goes for implementing GDPR-related measures from the start. And finally, I think one should be very honest and realistic in evaluating whether the value an ML solution adds really justifies it, especially if it is very cost- and energy-intensive.
Looking for more great content around Machine Learning and Big Data? Subscribe to Data Times, a newsletter curated by SoftwareMill's engineers.