Project rescue: how to approach it and how to help?
Some IT projects fail; mistakes happen. A KPMG project management survey showed that more than two-thirds of organizations suffered at least one project failure in the previous year. Today let's focus on the less severe cases, the ones that can still be saved!
When do you need to look for help for your business?
What we’ve learned from completing many software projects and helping various businesses bring their ideas to life in code is that problems are usually caused by multiple coexisting factors. We have often taken a project over from a previous software development team. Some of these were classic project rescue cases, where we stepped in to clean up and make things work.
How do you usually know that your software project is in trouble? You may be encountering some of the following signs:
- There are stability or performance issues
- There is no clear documentation
- You have problems communicating with the project team members
- People are leaving the project/software company
- The people left in the project are missing the required skills
- Poor project management/overall mess
- New tasks and business value are not delivered on time, leading to missed deadlines
How do companies respond to such situations? Sometimes they attempt to add new people to the team. In other cases they look for external help in the form of consultancy. In extreme cases, they seek out a new team or switch software vendors.
How to help the project?
Analyze the situation
The first step when taking over a project or joining as a consultant is to understand the situation. Talk to different stakeholders to understand the system's role and how it works. Knowledge transfer from the previous team is always a good thing, but not always possible. If you can ask questions of people who are leaving the team or have already left, make the most of that opportunity. Analyze the documentation, wikis, readme files and, of course, the code. Ask different parties what the biggest current issues with the system are, to get the whole picture.
This is the moment when several unpleasant things may be discovered, e.g. the reasons why the development team decided to leave the project. There may be plenty of them: a legacy stack, poor project quality, issues with project management or the team lead, or a lack of proper skills. All of this will help you understand why the project ended up where it is. It is rare to find that a vendor left behind a perfect project.
Short-term actions
The first and most important thing is to learn how to build and deploy the system. This is a necessity in case a failure occurs. You should also get access to the logs and metrics so that you can analyze issues as they appear.
The second step is to look for low-hanging fruit: discuss whether the system has any big issues that can be resolved with a small time investment. The fixes won’t be the best, cleanest or most efficient solutions, but they may relieve the customers' biggest pains. Examples:
- In one project we took over, the system processing incoming data would suddenly “hang” a few days after deployment. There were no tests, no errors were visible in the logs, and it was difficult to quickly identify and fix the root cause. The simplest workaround was to schedule daily restarts before the beginning of the customers' business day. This allowed the business to operate and gave us more time to solve the problem properly.
- Another project had huge performance issues, which made the system unusable. A short debugging session showed that a single library had a bug causing threads to hang. A simple dependency update made the system faster. In the long term it was still not enough, but in the short term it made a huge difference.
The third step is to do small cleanups around the work organization. Check whether there are stale, unresolved branches or pull requests. Check how issues are arranged; if scrum or kanban was used, it may be necessary to go through the backlog and remove items that are out of date and obscure the project status. Verify that the latest documentation (if it exists) is easy to find; there may be several versions scattered across different places. Write down your discoveries from the previous steps and from knowledge transfers. The goal is to make work easier and more accessible for new team members who may join later.
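The scheduled-restart workaround from the first example can be sketched roughly as follows. This is only an illustration, not the setup used in that project: the 05:00 restart time and the `next_restart` helper are assumptions, and the actual restart would be triggered by whatever scheduler or process manager the system already uses.

```python
from datetime import datetime, time, timedelta


def next_restart(now: datetime, restart_at: time = time(5, 0)) -> datetime:
    """Return the next moment the daily restart should run.

    restart_at is an assumed time that falls before the customers'
    business day starts; adjust it to the real working hours.
    """
    candidate = datetime.combine(now.date(), restart_at)
    if candidate <= now:
        # Today's restart slot has already passed, so use tomorrow's.
        candidate += timedelta(days=1)
    return candidate


if __name__ == "__main__":
    # A scheduler loop would sleep until this moment, then restart the service.
    print("Next restart at:", next_restart(datetime.now()))
```

A workaround like this buys time; it does not replace finding the root cause.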
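As a starting point for the branch cleanup, a small script can list remote branches that have not received a commit for a while. The sketch below assumes a Git repository and a 90-day threshold; the helper names `find_stale_branches` and `list_remote_refs` are hypothetical, and the result is a list to review with the team, not something to delete automatically.

```python
import subprocess
from datetime import datetime, timedelta, timezone


def find_stale_branches(ref_lines, now, max_age_days=90):
    """Return (branch, last_commit_date) pairs older than max_age_days.

    Each input line is expected to look like "<unix timestamp> <branch name>".
    """
    cutoff = now - timedelta(days=max_age_days)
    stale = []
    for line in ref_lines:
        line = line.strip()
        if not line:
            continue
        timestamp, name = line.split(" ", 1)
        last_commit = datetime.fromtimestamp(int(timestamp), tz=timezone.utc)
        if last_commit < cutoff:
            stale.append((name, last_commit.date().isoformat()))
    # Oldest branches first, since they are the likeliest cleanup candidates.
    return sorted(stale, key=lambda item: item[1])


def list_remote_refs(repo_path="."):
    """Ask git for the last commit time of every remote-tracking branch."""
    result = subprocess.run(
        ["git", "for-each-ref",
         "--format=%(committerdate:unix) %(refname:short)", "refs/remotes"],
        cwd=repo_path, capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()


if __name__ == "__main__":
    for name, date in find_stale_branches(list_remote_refs(),
                                          datetime.now(timezone.utc)):
        print(f"{date}  {name}")
```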
Long-term plans
A long-term roadmap requires a deep discussion with the customer. You need to understand what drives the business: is it more important to invest in stability and scalability to prepare for new customers, or are there missing features that would be revolutionary for the product? Unfortunately, some trade-offs will need to be made.
Step by step improvements
It may be possible to take over the project and improve it step by step, developing new features and improving the old codebase in parallel. This is the optimal situation, though not the easiest one. You tackle problems one by one: write new tests, docs and code, deploy, then move on to the next problem. This approach may include splitting the codebase into separate microservices and changing the overall architecture and the methods of inter-service communication.
Example:
In one of the projects mentioned earlier, after the initial quick fixes we agreed to keep the project running, which allowed us to develop new features. At the same time, we started reworking the computationally heavy part of the system, which was a bottleneck. In the end it became a separate microservice.
Abandon and rewrite from scratch
There are cases where projects are in a truly disastrous state: no tests, no solid docs, poor code quality, legacy technologies, an overall mess and mistakes made at the architectural level. Making any change in such an environment is very risky, and adding tests is challenging when the code was written without any thought for testability. This is an extreme case and it causes a lot of issues. You may decide to leave the old system in maintenance mode, fixing only the most important bugs, and start writing a new system from scratch. Unfortunately, building a new system takes time, and depending on product complexity it may take years to bring it to production. This is risky from a business perspective, because new features will reach customers later. It is also risky from a development perspective, because you have to carry out a huge migration from the old system to the new one and verify that the new one behaves the same from a feature point of view.
Example:
We have encountered a production system whose major part was covered by exactly 3 tests. It had issues at the code level (a major part had clearly been written by people still learning the programming language), at the structural level (it was hard to run the existing tests locally and to write additional ones) and at the architectural level (large infrastructure costs combined with performance issues). Together with the customer, we decided to abandon the old code and do a rewrite using different technologies that were a better fit for the problem, would be easier for the customer to maintain later, and would reduce the monthly system costs.
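One way to verify that a rewrite behaves the same as the old system is to feed both the same inputs and compare the results. The sketch below is a minimal illustration under stated assumptions: `legacy_price` and `rewritten_price` are hypothetical stand-ins for calls into the old and new systems, and real inputs would typically come from recorded production data.

```python
def find_mismatches(old_system, new_system, inputs):
    """Run both implementations on the same inputs and collect differences."""
    mismatches = []
    for item in inputs:
        expected = old_system(item)  # the old system is the reference
        actual = new_system(item)
        if expected != actual:
            mismatches.append((item, expected, actual))
    return mismatches


# Hypothetical stand-ins for the old and the rewritten system.
def legacy_price(quantity):
    return quantity * 10


def rewritten_price(quantity):
    # Deliberately diverges for large orders to show a detected mismatch.
    return quantity * 10 if quantity < 100 else quantity * 9


if __name__ == "__main__":
    for item, expected, actual in find_mismatches(
            legacy_price, rewritten_price, [1, 50, 100]):
        print(f"input={item}: old={expected}, new={actual}")
```

Each reported mismatch is either a regression in the rewrite or a bug in the old system that should be consciously left behind; both are worth discussing with the customer.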
Summary
Rescuing projects is not trivial. It may take time to introduce the required improvements and enable further business feature development. It’s good to start such an initiative with a skilled team that knows the technologies in use very well and can therefore spot potential problems faster. It's rarely a lost cause; sometimes it simply takes time and patience to see a positive outcome.
Do you need help with your project? Let’s talk!