Marcin Baraniecki - Developer Experience done right
In the previous part of this series, developer experience (DX) was mentioned as one of the key terms of a modern software development project. As the author described it:
(...) all things related to ease of working with software and when building software, productivity, and efficiency when doing so. It’s about everything ranging from development tools, languages, through documentation, workflows, automation to deployments, running and maintaining the software in production (...)
This time, let’s talk about lessons learned the hard way. In this article, I will enumerate some of the most common difficulties that pose a real problem for a good developer experience. We’ll discuss the consequences and possible ways of mitigating these issues.
Difficult onboarding
Onboarding a new engineer or a team of engineers to an existing project always takes time. Newcomers must learn about the services’ architecture, infrastructure, coding practices and teams’ topology. It is natural that getting up to speed can take several days. However, it becomes problematic when it turns out that the existing environment is not ready to accept new developers - for a number of reasons.
I once joined a project where the technical onboarding process was documented, but the instructions were badly outdated and required many corrections. It took a lot of help from the small SRE team to set up a working development environment. Some of the instructions covered tools that were no longer in use; others relied on “hacky” workarounds that were meant to abstract away part of the heavy infrastructure but, in reality, were cumbersome to maintain in the long run. A bitter example: it took around 40 (forty!) minutes on average to spin up the (highly overengineered) development environment - for every developer, every morning.
As more and more people gradually joined the project, one could expect those arriving later to have a smoother experience. While it is true that most of the original onboarding issues were fixed thanks to their predecessors’ findings, new problems kept arising: for example, the company ran out of licenses for a key development tool, and the SREs became a bottleneck. Handling daily infrastructure maintenance while also addressing the needs of newcomers turned out to be too much for a relatively small team. In the most extreme case, an engineer still did not have a fully working development environment after a month of trial and error.
The average time to productivity for a newcomer turned out to be unusually long. The company was clearly not ready for rapid growth; it felt like close to no lessons had been learned since the first “batch” of new engineers joined. Had more care been put into the onboarding experience, as well as documentation and best practices, the process could have been smoother. Too much of the burden was also placed on the very small SRE team (just two engineers) alone. Maintaining the platform and the development & production environments, while also dealing with the overcomplicated details of newcomers’ onboarding, can easily clog the daily backlog; this is where technical management should shine and try to mitigate the issue. There are many possible ways to do that, such as delegating some experienced developers to help, or getting directly involved in the active problems of the ongoing onboarding process.
Some engineers quickly became frustrated. Others felt bad that they couldn’t be as productive as they were expected to be. The company’s costs were going up, while the revenue stayed low.
Knowledge silos
The term “tribal knowledge” describes a situation where a significant portion of the technical know-how is held by a “tribe” - usually the most experienced members of the team - at the same time being hardly accessible to others outside it. It’s also related to another term you might have heard of - “knowledge silos”.
Again, it is natural that the “elders” will know much more about the architecture, the code, and the challenges than the newcomers; however, good communication and leadership are key to success. It is especially important to spread the most experienced leaders’ availability evenly across the teams - should anybody have questions or doubts, those with the best knowledge should be accessible and ready to answer. Throwing a bunch of newcomers at a problem without context is a recipe for disaster, or at least for poor results. The same goes for established patterns (if any) - open & clear communication between teams (whose members have varying degrees of experience) is the only way of making sure that the approaches to solving software problems do not diverge significantly in the long run.
The formation of knowledge silos should be actively prevented - they tend to form on their own over time, along the boundaries between groups focused on their everyday tasks. Teams can start with some form of knowledge-sharing activities, like internal presentations (e.g. “how we solved X, Y or Z”) or Q&A/AMA sessions with tech leads, but also by engaging engineers with new challenges or periodically letting them explore new technological territories.
Teams change over time. People join and leave, and their knowledge & experience come and go with them. Organizations should make sure that as much of their know-how as possible stays within the company and is accessible to others in various forms - documentation, recordings, clear architecture and self-explanatory, maintainable code.
Lack of common standards
The last paragraph leads us to the issue of the lack of common standards. This is mostly about the agreed tech stack, templates/scaffoldings for new services & delivery processes. While the technologies are subject to constant change & improvement, it is at least a good idea to keep a recommended set of libraries and patterns to be used for a fresh service in the organization.
Choosing the n-th variant of, e.g., an HTTP framework for yet another Node.js-based microservice without a clear justification doesn’t sound like a well-thought-out decision; moreover, starting anything new that resembles existing pieces of the infrastructure without using a well-known template - at least at the company level - leaves room for improvement.
The list goes on: unified delivery strategies, centralized documentation, common reporting & metrics tools, well-developed standards for runbooks - all of these aspects contribute to a sense of good developer experience. These just make teams’ collaboration easier, more predictable and maintainable in the long run.
I argue that maintaining established technical standards and rules around an organization's development is equivalent to having a clear strategy. Yes, the overall course must and will be adjusted over time - it’s never set in stone. But it’s one of the key drivers of avoiding unpredicted costs - be it time, frustration, or money.
Security chaos
Let’s not forget security! Chaos at the level of platform standards is often accompanied by a lax attitude towards safety. A lack of strict rules for sharing secrets quite often results in folks exchanging private keys, passwords, and other sensitive information over common communication channels; people just stop thinking about these issues, assuming nobody cares much.
I once stumbled upon a developer who wanted to “quickly” debug the production environment by throwing in a bunch of console.log(...) statements. But guess what - these logs were about to print the values of live authentication tokens… he was that close to a serious leak! Not to mention that even the most experienced engineers didn’t care much about using centralized password managers or AT LEAST one-time expiring messages - secrets were shared casually on Slack - and I’m not talking about private messages, but quite often public threads!
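One way to make that kind of accident harder is to route debug output through a helper that masks sensitive fields before anything is printed. What follows is only a minimal TypeScript sketch, not the approach used on the project described above: the names redact and safeLog, as well as the list of "sensitive" key names, are assumptions made up for illustration, and a real policy would need to be agreed on at the platform level.

```typescript
// Hypothetical helper: masks values stored under sensitive-looking keys.
// The key pattern below is an illustrative assumption, not an exhaustive policy.
const SENSITIVE_KEY = /token|secret|password|authorization|api[-_]?key/i;

type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

function redact(value: Json, key = ""): Json {
  // Anything stored under a sensitive key is replaced as a whole.
  if (key !== "" && SENSITIVE_KEY.test(key)) {
    return "[REDACTED]";
  }
  if (Array.isArray(value)) {
    return value.map((item) => redact(item));
  }
  if (value !== null && typeof value === "object") {
    const out: { [key: string]: Json } = {};
    for (const [k, v] of Object.entries(value)) {
      out[k] = redact(v, k);
    }
    return out;
  }
  return value;
}

// Hypothetical wrapper to use instead of a bare console.log(...).
function safeLog(label: string, payload: Json): void {
  console.log(label, JSON.stringify(redact(payload)));
}

safeLog("request", { user: "alice", headers: { authorization: "Bearer abc123" } });
// prints: request {"user":"alice","headers":{"authorization":"[REDACTED]"}}
```

Note that this only masks values by key name; a token embedded in a free-form string would still slip through, which is one more reason not to rely on ad-hoc logging in production.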
I’m pretty positive that security standards should be strictly enforced by the platform. This comes in many flavors - from introducing vaults & secure communication channels for sharing secrets, through company-wide password managers and automated code audits, down to periodic penetration tests. Trust me - security is a topic where it is virtually always “better safe than sorry”. Rotating all the private keys just because someone carelessly leaked a secret in their commit clearly doesn’t make for a satisfying developer experience!
A security-oriented culture is one of many signs of a mature approach to platform engineering. Take it out, and the overall developer experience in the organization will significantly deteriorate.
Summary - frustration over poor DX
Frustration over poor developer experience grows in teams over time. In extreme cases, it may reach a point where folks become hesitant to even try to improve anything for fear of making things worse than before. This is clearly an unhealthy situation, quite often resulting in talented engineers leaving the team, doubting that any positive change will ever happen.
This again shows why DX is a key driving force behind a project’s success. Ideally, it should be cared for from the very beginning; but even if the team has been formed around a startup-like culture and there was little to no time for anything other than rapid prototyping of a new, revolutionary product, at some point investment in good developer experience and solid platform engineering seems unavoidable. There’s just too much at stake here.
Teams acting blindly, without the support of well-established standards across all areas of day-to-day software development and maintenance, are on a collision course with a huge wall. I’ve seen companies & projects fail like this - and those were quite bitter lessons. Yet it’s so important to share these experiences, identify root causes and do our best to avoid failures in the future.
Special thanks to Michał Ostruszka, Grzegorz Kocur and Anna Zabłotna for their invaluable feedback and peer review of this blog post.