On Readability

Programs must be written for people to read, and only incidentally for machines to execute. — Abelson & Sussman, Structure and Interpretation of Computer Programs

Code readability gets talked about a lot these days. I haven't yet heard from anyone who's opposed to it. Unfortunately, learning to read code is a skill rarely discussed and even more rarely taught. As the SICP quote above points out, readability is perhaps the most important quality to aim for when writing. But what is code readability, exactly?

I propose there are three essential levels of code readability:

  • "What is this code about?"
  • "What is this code supposed to do?"
  • "What does this code do?"

The first level is important when you're skimming code, trying to develop a picture of its overall structure. Good organization into modules tends to help a lot with this; if you have modules named util then this is harder than it has to be. Supporting docs describing architecture and organization can assist with this as well, along with usage examples if the code you're reading is a library.

The second level is what you encounter once you've found some code and you want to use or modify it. Maybe it's a library with weak or missing documentation and you're trying to discover how a function wants to be called or which methods to override in a class you need to inherit from. Good style, good class/function names, docstrings, and comments can all be very helpful in making your code readable for this case. Some research associates poor-quality identifier names with obvious bugs, so even if your code works well, poor naming can make it look like code that doesn't.

However, neither of these is the most important sense in which readability matters. They're more about communicating intent, so that later readers of your code can figure out what you were thinking. Sometimes it's a way to share hard-won insight that leaves indelible scars. But the final level is what matters most and matters longest.

To understand a program you must become both the machine and the program. — Alan Perlis, Epigrams In Programming #23

The most important level of readability is being able to look at code and understand what it actually does when run. The factors discussed above (naming, style, documentation) cannot help with this task at all. In fact, they can actively hurt: a name or comment tells you what the author intended, not what the code does. Even if a comment accurately described the code when it was written, nothing guarantees that it still does. The most obvious time you'll need this level of code reading is debugging, when the code looks like it does the right thing but actually doesn't.
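A small illustration of names and comments misleading the reader, written in Python. This is only a sketch: the function `is_adult` and its bug are invented for this example and come from no real codebase.

```python
# Hypothetical example: the function and its bug are invented for illustration.

def is_adult(age):
    """Return True if age is 18 or over."""
    # Anyone 18 or older counts as an adult.
    return age > 18


# The name, the docstring, and the comment all agree with each other.
# Only the expression on the return line says what actually happens.
print(is_adult(18))  # prints False
```

Everything a skimming reader relies on says this function is correct. You only find the bug by doing what Perlis describes: playing the machine and evaluating `18 > 18` yourself.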

Language design plays a big role in supporting or detracting from the creation of readable code. As the link above shows, C provides a myriad of features that either fight against or outright destroy readability. So when picking a language, include readability as a factor in your decision making — and not just how the syntax looks. Mark Miller provides some excellent thoughts on how to design a language for readability in his notes on The Power of Irrelevance. There are also several people studying how to build tools to help us read code more effectively; Clarity In Code touches on some of the issues being considered in that field.

But given the constraints of the language you're currently using, what can we do to improve readability in our code? The key is promoting local reasoning. The absolute worst case for readability occurs when you have to understand all the code to understand any of the code. So we want to preserve as many barriers between portions of the program as are needed to prevent this.

Since local reasoning is good, global state is bad. The more code that can affect the behavior of the function or class you're looking at right now, the more work it takes to actually discern what it'll do, and when. Similarly, threads (and coroutines) destroy readability, since they destroy the ability to understand control flow locally in a single piece of code. Other forms of "magic" like call stack inspection, adding behavior to a class from other modules, metaclass shenanigans, preprocessor hacks, or macros all detract from local reasoning as well.
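To make the cost of global state concrete, here is a minimal Python sketch. All the names in it (`TAX_RATE`, `price_with_global_tax`, `price_with_explicit_tax`, `enter_tax_free_mode`) are hypothetical and exist only for this example.

```python
# Hypothetical example: every name here is invented for illustration.

TAX_RATE = 0.25  # module-level state; any code anywhere may rebind it


def price_with_global_tax(net):
    # The result depends on whatever TAX_RATE holds at call time,
    # so reading this function alone does not tell you the answer.
    return net * (1 + TAX_RATE)


def price_with_explicit_tax(net, tax_rate):
    # Everything that affects the result is in the argument list.
    return net * (1 + tax_rate)


def enter_tax_free_mode():
    # Distant code silently changes what price_with_global_tax returns.
    global TAX_RATE
    TAX_RATE = 0.0


before = price_with_global_tax(100)
enter_tax_free_mode()
after = price_with_global_tax(100)

print(before, after)                        # prints 125.0 100.0
print(price_with_explicit_tax(100, 0.25))   # prints 125.0
```

The same call, `price_with_global_tax(100)`, gives two different answers, and nothing at the call site explains why. To predict the result you have to find every piece of code that might touch `TAX_RATE`. The explicit version can be understood by reading the one function and its call.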

Since this perspective on programming is so little discussed or taught, it leads to a communications gap between inexperienced and veteran programmers. Once an experienced programmer has spent enough time picking apart badly designed code, readability can become the dominant factor in his assessment of all the code he sees. I've experienced this myself fairly often: code that could easily be made a little more readable makes me itch; code that was written with no thought for readability and can't easily be fixed can be quite upsetting. But writing readable code is extra work, so people who haven't spent dozens of hours staring at a debugger prompt are sometimes baffled by the strong emotions these situations inspire. Why is it such a big deal to make that variable global? It works, after all.

Every program has at least one bug and can be shortened by at least one instruction — from which, by induction, it is evident that every program can be reduced to one instruction that does not work. — Ken Arnold

When considering how to write readable code, choice of audience matters a lot. Who's going to read what you're writing? When? For writing prose, we do this all the time. We use quite a different style in a chat message or email than in a blog post, and a different style again in a formal letter or article. The wider your audience and the longer you expect the message to stay relevant, the more work you put into style, clarity, and readability. The same applies to code. Are you writing a one-off script that's only a few dozen lines long? Using global variables and one-letter identifiers is probably not going to hurt you, because you will most likely delete the code rather than read it again. Writing a library for use in more than one program? Be very careful about using any global state at all. (Also pay special attention to the names you give your classes/modules/functions; they may be very difficult to change later.) If new programmers were taught this idea as well as they're taught how to, e.g., override methods in a class or invoke standard libraries, it would be a lot easier for both new programmers and those mentoring them to relax.

So when you do set out to write readable code, consider your audience. There are some obvious parties to consider. If you're writing a library, your users will certainly read some of your code, even if it's just examples of how to use your library. Anyone who wants to modify your code later will need to read it. If your code is packaged by an OS distribution or other software collection, the packager will often need to read parts of your code to see how it interacts with other elements of the system. Security auditors will want to read your code to know how it handles the authority it's granted or the secrets it protects. And not least, you're writing for your future self! So even if those other people don't exist or don't matter to you — make life easier on that last guy. He'll thank you for it.
