Coroutines reduce readability

A recent email thread suggesting the addition of greenlet-style coroutines to the Python standard library was brought to my attention. I felt like this would be a good time to go into why coroutines are a bad idea.

"Readability counts."

Many keyboards have been worn out debating how to make code more readable, and what affects readability. One of the reasons I've enjoyed using Python so much is that it doesn't fight my efforts (much) to write code that's easy to read. Proponents of coroutines, as used in libraries such as gevent, have claimed that a major advantage is that they make networking code easier to read, compared to other concurrency mechanisms such as generators or callbacks. I am going to argue instead that coroutines make code harder to read. Before I get into that, I'm going to propose this definition of readability:

A program is readable when you can look at its code and understand what it does.

Note particularly that this is different from looking at code and understanding what the author intended the program to do. Readability counts most when you're reading code that doesn't work (such as when debugging) or code that might not work the way it should (such as when doing a security audit). Designing for readability means designing for adversarial review of code.

As Mark Miller and Dave Herman have pointed out, when first learning to program in a language like Python, there are some basic assumptions we make about control flow. The main one I want to talk about here is that it's possible to understand what happens when you call a function by reading the code of the function.

Consider this trivial example.

self._foo.a = self._foo.b  # a and b are now equal...
self._foo.b = baz()        # ...until baz() returns and b is reassigned

Suppose you want to determine whether any code can see self or self._foo while its internal attributes are disarranged — in this case, the time during which its a and b attributes are set to the same value. Normally in Python we'd be able to answer this question by reading the source for baz. However, in the presence of coroutines this isn't sufficient! If baz, or anything it calls, invokes something that causes the current coroutine to suspend, then any other code can be invoked at that point, thus making it impossible to keep this internal mutation from being exposed.
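
To see the hazard concretely, here's a minimal sketch in gevent terms; Foo, baz, and the greenlets are all hypothetical stand-ins, and the invariant is that a and b must differ:

import gevent

class Foo:
    def __init__(self):
        self.a = 1
        self.b = 2  # invariant: a and b always differ

foo = Foo()

def baz():
    gevent.sleep(0)  # stands in for anything that suspends: a network read, a queue get
    return 3

def update():
    foo.a = foo.b  # invariant broken: a == b
    foo.b = baz()  # baz suspends, so other greenlets run before we finish

def observer():
    print(foo.a == foo.b)  # prints True: the broken invariant is visible

gevent.joinall([gevent.spawn(update), gevent.spawn(observer)])

Nothing in update's source says it can be interrupted; you only find out by reading baz, and everything baz calls, transitively.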

"In the face of ambiguity, refuse the temptation to guess."

There are many different situations where this sort of problem arises. In general, any kind of imperative code needs to be able to preserve invariants for its data structures, while still being able to do work that might temporarily violate those invariants. This is why Python has the with and try/finally structures; being able to express some level of transaction-like behaviour is useful, so you can worry about cleanup and invariants at a single place.
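
For example, a small context manager (a hypothetical helper, reusing foo and baz from the sketch above) keeps that cleanup in one place:

from contextlib import contextmanager

@contextmanager
def restoring(obj, attr):
    # snapshot one attribute and put it back if the block raises,
    # so the invariant is repaired at a single place
    saved = getattr(obj, attr)
    try:
        yield
    except BaseException:
        setattr(obj, attr, saved)
        raise

with restoring(foo, 'a'):
    foo.a = foo.b
    foo.b = baz()  # if baz raises, foo.a is restored on the way out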

These are only useful for operations that aren't extended in time, however. When using coroutines, it's possible to write code where finally blocks don't get a chance to run before something in another coroutine interferes. More distressingly, the finally block may not run at all! When a coroutine is suspended, there's no guarantee it will be resumed before the program terminates.
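
Here's a sketch of that failure mode, again in gevent terms; exactly what runs during interpreter shutdown varies, so take this as an assumption about one plausible run rather than guaranteed behaviour:

import gevent

def task():
    try:
        gevent.sleep(3600)  # suspended here, waiting to be resumed
    finally:
        print('cleanup')  # may never be reached

gevent.spawn(task)
gevent.sleep(0)  # let task start and suspend inside its try block
# the main greenlet falls off the end here; the process can exit without
# ever resuming task, and its finally block never runs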

If this sounds a lot like using threads, it's because it is. Coroutines are a form of threads; they're the foundation for what are called "green threads" in some language runtimes, such as early versions of Java and Ruby. The problems with threads are well documented, and various tools have been developed to deal with the problems they introduce, such as mutexes, locks, and queues. Not all coroutine libraries provide these tools, and the ones that do don't encourage their pervasive use. The only salient difference in behavior is that OS-provided threads can be interrupted at more points. On the other hand, OS threads can be scheduled on multiple processors at once, providing parallelism. So, in conclusion: coroutines are strictly worse than threads, because they have the same kinds of problems (non-determinism, loss of code readability) and do not offer any unique advantages.
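
To be explicit about what that discipline looks like, here is the earlier sketch guarded the way thread-based code would be, using gevent's Semaphore (names carried over from above):

from gevent.lock import Semaphore

foo_lock = Semaphore()

def update():
    with foo_lock:  # the same locking discipline threads require
        foo.a = foo.b
        foo.b = baz()  # baz still suspends, but observers must wait

def observer():
    with foo_lock:
        print(foo.a == foo.b)  # now prints False; the mutation is no longer exposed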

Superior options for concurrency are Deferreds for managing callbacks, or generators. The primary historical objection to callbacks is the "pyramid of doom", where functions get nested to ridiculous depths. Deferreds make callback-invoking code composable, and help flatten out the functions used, as David Reid has ably shown. Using callbacks/Deferreds lets you keep all your normal assumptions about control flow. Invoking a function can return a Deferred, but it can't do anything to suspend the code calling it. Once a function has exited, it can't be re-entered without calling it again. So in a very useful sense, Deferreds make concurrent code much more readable.
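
A sketch of the flattened style with Twisted's Deferreds (fetch_user and fetch_avatar are hypothetical, and succeed stands in for a real network call that would return a Deferred):

from twisted.internet.defer import succeed

def fetch_user(user_id):
    return succeed({'id': user_id, 'name': 'alice'})

def fetch_avatar(user):
    return succeed('avatar-for-' + user['name'])

# the callbacks chain flat instead of nesting; calling fetch_user
# returns a Deferred, but cannot suspend the code doing the calling
d = fetch_user(42)
d.addCallback(fetch_avatar)
d.addCallback(print)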

Generators let you keep most of your assumptions, but they add an extra rule: a function can be suspended and (maybe) later re-entered when a yield keyword is encountered. This provides the same amount of information as callbacks, but does enable some cases that require a good bit more squinting and head-scratching to figure out.
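
In Twisted, this style is inlineCallbacks: every point where the function can be suspended is marked with yield (reusing the hypothetical fetch functions from the previous sketch):

from twisted.internet.defer import inlineCallbacks, returnValue

@inlineCallbacks
def show_avatar(user_id):
    # this function can only be suspended at the yields below;
    # everything between them runs without interruption
    user = yield fetch_user(user_id)
    avatar = yield fetch_avatar(user)
    returnValue(avatar)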

I believe that better syntax can provide the convenience of generators and the clarity benefits of Deferreds. More about that in a future post.