Self-documenting code is a myth

Adam Warski

07 Feb 2024


Or rather: it's a great goal, sometimes achievable, but it doesn't remove the need to comment your code. All the non-obvious places should still carry comments, and there are more of those than you think! After all, our main concern as code authors is the convenience of our readers: first and foremost, we want the code to be understandable by others and by our future selves.

The principles of self-documenting code are, of course, valid (and good for other reasons, too): writing small functions and methods, spending half of our days picking the right names for things, crafting appropriate abstraction boundaries, and writing tests that not only verify the code's behavior but also serve as compiled documentation.

But in the end, there will always be code fragments where a reader scratches their head, wondering how best to express their "gratitude" towards the author. (Coincidentally, these might be exactly the fragments that have to be hand-written rather than generated by ChatGPT.)

For example, such a fragment might be an integration with a quirky third-party service: write about that quirk. Or a business rule which, translated into code, turns into weird bit manipulation, but must be written that way because it's on the critical path: write about the "why". Our programming languages are quite expressive, but they have limits, and where those limits are reached, don't hesitate to reach for a comment.
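To make the bit-manipulation case concrete, here is a minimal Java sketch. Everything in it is hypothetical: the ShippingRules class, the packed status bits, the "express shipping" rule, and the assumption that the check sits on a hot checkout path. The point is the shape of the comment: it states the business rule in plain words, then explains why the code looks the way it does and how the mask expresses the rule.

```java
// Hypothetical example: the class, the status bits, and the business rule are invented for illustration.
final class ShippingRules {
    // Bit layout of a packed customer status word (hypothetical).
    static final int PREMIUM = 1;           // bit 0
    static final int FRAUD_REVIEW = 1 << 1; // bit 1
    static final int EU_RESIDENT = 1 << 2;  // bit 2

    /**
     * Business rule: express shipping is offered to premium customers
     * who are not under fraud review.
     *
     * Why the bit manipulation: this check sits on the (assumed) checkout
     * critical path and runs for every cart item, so the flags are packed
     * into an int instead of an EnumSet. Masking with PREMIUM | FRAUD_REVIEW
     * keeps only those two bits; the result equals PREMIUM exactly when the
     * premium bit is set and the fraud-review bit is clear. Other bits, such
     * as EU_RESIDENT, are ignored on purpose.
     */
    static boolean isExpressEligible(int status) {
        return (status & (PREMIUM | FRAUD_REVIEW)) == PREMIUM;
    }

    public static void main(String[] args) {
        System.out.println(isExpressEligible(PREMIUM));                // true
        System.out.println(isExpressEligible(PREMIUM | FRAUD_REVIEW)); // false
        System.out.println(isExpressEligible(PREMIUM | EU_RESIDENT));  // true
        System.out.println(isExpressEligible(0));                      // false
    }
}
```

Without the comment, a reader sees only a mask and a comparison; with it, they learn the rule, the reason for the encoding, and which bits are deliberately ignored.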

The above might be obvious, but how do we find the other code fragments that need a comment? As a first approximation, use your common sense (e.g., documenting Person getPerson() as "gets the person" shouldn't pass this filter). Still, we might be over-optimistic about the legibility of our beautiful new abstraction.
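The contrast between a comment that fails the common-sense filter and one that earns its place can be sketched in Java. The People class, the Person record, the findByExportedId method, and the padded-id quirk of a "partner CSV export" are all hypothetical, chosen only to show what a useful comment adds: the non-obvious contract, not a restatement of the signature.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical example: People, Person and the padded-id quirk are invented for illustration.
final class People {
    record Person(String id, String name) {}

    private final Map<String, Person> byId;

    People(Map<String, Person> byId) {
        this.byId = Map.copyOf(byId);
    }

    // Fails the common-sense filter: the doc comment below only restates the signature.
    /** Gets the person. */
    Person getPerson(String id) {
        return byId.get(id);
    }

    /**
     * Looks up a person by an id taken straight from the partner CSV export.
     *
     * The id is stripped first because that export pads ids with spaces
     * (an assumed quirk). An unknown id yields Optional.empty() rather than
     * null, so callers have to handle the "not found" case explicitly.
     */
    Optional<Person> findByExportedId(String rawId) {
        return Optional.ofNullable(byId.get(rawId.strip()));
    }

    public static void main(String[] args) {
        People people = new People(Map.of("p1", new Person("p1", "Ada")));
        System.out.println(people.getPerson(" p1 "));                    // null: no stripping here
        System.out.println(people.findByExportedId(" p1 ").isPresent()); // true
        System.out.println(people.findByExportedId("p2").isPresent());   // false
    }
}
```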

Hence, as a second approximation, look at code review comments, e.g., in pull request (PR) or merge request (MR) discussions. And encourage your code reviewers to point out where they had trouble following the code flow.

If there's a discussion in a PR about a particular code fragment, it should always end in a refactoring of the code, a comment in the code, or both. The mere existence of any kind of comment or question on a PR indicates that the codebase is not, in fact, self-documenting.

Discussions in PRs and issues are convenient and a good way to tighten the feedback loop. Use the information they contain, and don't let it die! Don't worry about duplicating information: the lucky person who will one day be tasked with modifying the code will only read the code; they won't browse the entire history of merged PRs and their associated discussions. That is... unless they get really angry and desperate, but let's try to avoid that.

As a bare minimum, leave them some context in the form of an issue number or a backlink. Better still, include the relevant information directly in the code. Issues and PRs are forgotten as soon as they are closed or merged.
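Here is a minimal Java sketch of the difference between a bare backlink and carrying the context over from the issue. The BillingClient class, the billing provider, its silent-truncation quirk, and the issue id ISSUE-123 are all hypothetical; BigDecimal.setScale with RoundingMode.HALF_UP is the standard JDK API.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Hypothetical example: the billing provider, its quirk, and the issue id are invented for illustration.
final class BillingClient {

    /**
     * Converts an amount to the format accepted by the billing provider.
     *
     * Bare minimum would be a backlink: "See ISSUE-123".
     * Better, the context from that issue lives right here: the provider
     * accepts amounts with more than two decimal places without an error,
     * but silently truncates them (10.005 becomes 10.00). We therefore
     * round half-up to whole cents ourselves before sending, so the
     * rounding is explicit and under our control.
     */
    static BigDecimal toProviderAmount(BigDecimal amount) {
        return amount.setScale(2, RoundingMode.HALF_UP);
    }

    public static void main(String[] args) {
        System.out.println(toProviderAmount(new BigDecimal("10.005"))); // 10.01
        System.out.println(toProviderAmount(new BigDecimal("3")));      // 3.00
    }
}
```

A reader who later wonders why the rounding is there does not need access to the issue tracker to find out.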

"But," you will say, "documentation gets outdated!" Yes, and comments do get out of sync with the code they describe. Sure, that's a problem. But it's especially true when the documentation lives far away from the documented entity. Comments are in a better position: you can put them inline, close to the logic, so the chances that they become obsolete are smaller.

One more argument for the value of code comments: the source code should be the central source of truth about the source code itself. We've already discussed that information dispersed across multiple places (PR comments, issue descriptions, and the source code itself) is hard to browse. But it's also partially subject to vendor lock-in, since PR and issue discussions live on the hosting platform rather than in the repository. Git is a distributed version control system, meaning everybody has a full, frequently updated backup; of course, only of the information that made it into git in the first place.

There's a lot of information in PRs, in issues, and in our heads: don't keep it there, and don't let it die! Summing up:

  • even the cleanest code bases need comments
  • use code review comments as indicators of where refactoring or a comment is needed
  • include the "why", using the context from the issue or bug report you are working on
  • anything important contained in an issue or a PR/MR discussion should be transferred to the code base, through naming, code structure, or comments
  • make the source code the central source of truth about the source code

Reviewed by Michał Matłoka
