Coroutines reduce readability

A recent email thread was brought to my attention which suggested adding greenlet-style coroutines to the Python standard library. I felt like this would be a good time to go into why coroutines are a bad idea.

"Readability counts."

Many keyboards have been worn out debating how to make code more readable, and what affects readability. One of the reasons I've enjoyed using Python so much is that it doesn't fight (much) my efforts to write code that's easy to read. Proponents of coroutines, as used in libraries such as gevent, have claimed that a major advantage is that they make networking code easier to read, compared to other concurrency mechanisms such as generators or callbacks. I am going to argue instead that coroutines make code harder to read. Before I get into that, I'm going to propose this definition of readability:

A program is readable when you can look at its code and understand what it does.

Note particularly that this is different from looking at code and understanding what the author intended the program to do. Readability counts most when you're reading code that doesn't work (such as when debugging) or code that might not work the way it should (such as when doing a security audit). Designing for readability means designing for adversarial review of code.

As Mark Miller and Dave Herman have pointed out, when first learning to program in a language like Python, there are some basic assumptions we make about control flow. The main one I want to talk about here is that it's possible to understand what happens when you call a function by reading the code of the function.

Consider this trivial example.

self._foo.a = self._foo.b
self._foo.b = baz()

Suppose you want to determine whether any code can see self or self._foo while its internal attributes are disarranged — in this case, the time during which its a and b attributes are set to the same value. Normally in Python we'd be able to answer this question by reading the source for baz. However, in the presence of coroutines this isn't sufficient! If baz, or anything it calls, invokes something that causes the current coroutine to suspend, then any other code can be invoked at that point, thus making it impossible to keep this internal mutation from being exposed.
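To make the hazard concrete, here is a minimal sketch. It does not use greenlet or gevent. It emulates a cooperative hand-off with two standard-library threads and events, so switch, observer, and the Foo class are all hypothetical stand-ins. The point to notice is that nothing in update hints that baz gives up control.

```python
import threading


class Foo:
    def __init__(self):
        self.a, self.b = 1, 2


foo = Foo()
observed = []

# Two events emulate a cooperative hand-off between two tasks.
# This is an assumption made for illustration, not greenlet's API.
resume_observer = threading.Event()
resume_main = threading.Event()


def observer():
    # "Other code" that gets to run whenever the main task suspends.
    resume_observer.wait()
    observed.append((foo.a, foo.b))
    resume_main.set()


def switch():
    # Hypothetical stand-in for a coroutine switch: run the other task,
    # then wait until it hands control back.
    resume_observer.set()
    resume_main.wait()


def baz():
    switch()  # suspension point buried inside the callee
    return 3


def update():
    foo.a = foo.b
    foo.b = baz()  # looks like an ordinary call


other = threading.Thread(target=observer)
other.start()
update()
other.join()

print(observed)      # the observer saw a and b set to the same value
print(foo.a, foo.b)
```

Reading update alone, or even update and the first line of baz, is not enough. You have to follow every call beneath baz to learn whether the half-updated object can be seen.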

"In the face of ambiguity, refuse the temptation to guess."

There are many different situations where this sort of problem arises. In general, any kind of imperative code needs to be able to preserve invariants for its data structures, while still being able to do work that might temporarily violate those invariants. This is why Python has the with and try/finally structures; being able to express some level of transaction-like behaviour is useful, so you can worry about cleanup and invariants at a single place.

These are only useful for operations that aren't extended in time, however. When using coroutines, it's possible to write code where finally blocks don't get a chance to run before something in another coroutine interferes. More distressingly, the finally block may not run at all! When a coroutine is suspended, there's no guarantee it will be resumed before the program terminates.
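A rough illustration of a suspended finally block, using a plain generator as a stand-in for a suspended coroutine (an assumption made so the sketch needs no third-party library; worker, state, and log are hypothetical names):

```python
log = []


def worker(state):
    state["locked"] = True  # temporarily break the invariant
    try:
        yield "suspended"  # stand-in for a coroutine suspending here
        log.append("work finished")
    finally:
        state["locked"] = False  # the cleanup we are relying on
        log.append("cleanup ran")


state = {"locked": False}
task = worker(state)
next(task)  # run the task up to its suspension point

# Any other code that runs now sees the invariant still broken,
# and the finally block has not executed.
seen_while_suspended = state["locked"]
cleanup_ran_while_suspended = "cleanup ran" in log

# In this sketch the cleanup only happens because we close the task ourselves.
task.close()
```

While the task sits suspended, the try/finally offers no protection: the cleanup waits on whoever is responsible for resuming or closing the task.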

If this sounds a lot like using threads, it's because it is. Coroutines are a form of threads; they're the foundation for what are called "green threads" in some language runtimes, such as early versions of Java and Ruby. The problems with threads are well documented, and various tools have been developed to deal with them, such as mutexes, locks, and queues. Not all coroutine libraries provide these tools, and the ones that do don't encourage their pervasive use. The only salient difference in behavior is that OS-provided threads can be interrupted at more points. On the other hand, OS threads can be scheduled on multiple processors at once, providing parallelism. So, in conclusion: coroutines are strictly worse than threads, because they have the same kinds of problems (non-determinism, loss of code readability) and do not offer any unique advantages.

Superior options for concurrency are generators, or callbacks managed with Deferreds. The primary historical objection to callbacks is the "pyramid of doom", where functions get nested to ridiculous depths. Deferreds make callback-invoking code composable, and help flatten out the functions used, as David Reid has ably shown. Use of callbacks/Deferreds lets you keep all your normal assumptions about control flow. Invoking a function can return a Deferred, but it can't do anything to suspend the code that called it. Once a function is exited, it can't be re-entered without calling it again. So in a very useful sense, Deferreds make concurrent code much more readable.
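Here is a small sketch of that control-flow guarantee. MiniDeferred is a toy written for this post, not Twisted's Deferred, and fetch_number, pending, and events are hypothetical names. It only shows the shape of the idea: the caller runs to completion, and the callbacks run later as separate, flat function calls.

```python
class MiniDeferred:
    """Toy stand-in for a Deferred; not Twisted's implementation."""

    def __init__(self):
        self._callbacks = []
        self._fired = False
        self._result = None

    def add_callback(self, fn):
        # Callbacks are chained: each receives the previous one's result.
        if self._fired:
            self._result = fn(self._result)
        else:
            self._callbacks.append(fn)
        return self

    def fire(self, result):
        self._fired = True
        self._result = result
        while self._callbacks:
            self._result = self._callbacks.pop(0)(self._result)


events = []
pending = []  # results that some event loop would deliver later


def fetch_number():
    # Returns immediately; it has no way to suspend its caller.
    d = MiniDeferred()
    pending.append(d)
    return d


def start():
    events.append("start: begin")
    d = fetch_number()
    d.add_callback(lambda n: n * 2)
    d.add_callback(lambda n: events.append(f"callback: got {n}"))
    events.append("start: end")


start()                  # runs straight through, top to bottom
pending.pop().fire(21)   # the result arrives later; callbacks run now
print(events)
```

Nothing runs between "start: begin" and "start: end" except the code you can see in start and the functions it calls.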

Generators let you keep most of your assumptions, but they add an extra rule: a function can be suspended and (maybe) later re-entered when a yield keyword is encountered. This provides the same amount of information as callbacks, but does enable some cases that require a good bit more squinting and head-scratching to figure out.
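The earlier two-line example can be recast as a generator to show the extra rule. This is only a sketch: the Foo class and the hand-driven "scheduler" at the bottom are hypothetical. The difference from the coroutine case is that the suspension point is spelled out with yield in the very function whose state is at risk.

```python
class Foo:
    def __init__(self):
        self.a, self.b = 1, 2


def update(foo):
    foo.a = foo.b
    # The suspension point is visible right here: other code may run
    # before the value is sent back in.
    foo.b = yield "waiting"


foo = Foo()
task = update(foo)

signal = next(task)          # run up to the yield
snapshot = (foo.a, foo.b)    # what other code can observe while suspended

try:
    task.send(3)             # resume the task with the awaited value
except StopIteration:
    pass                     # the generator finished normally

final = (foo.a, foo.b)
```

The half-updated state is still observable, but a reader of update can see exactly where that can happen without reading anything else.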

I believe that better syntax can provide the convenience of generators and the clarity benefits of Deferreds. More about that in a future post.

6 comments:

shanewholloway said...

I think you misattribute to coroutines & continuations the problems of greenlets/greenthreads. Both coroutines and continuations resume a specific target destination — not an unknown one such as in the greenlet implementation.

Connecting back to your example, the same potential for problems exists if the baz routine invokes an (overridden) method, a callback, or any other dynamic code path such as resolving a deferred. Your partially mutated state could be observed by other code. The logical extent of protecting yourself from mutable state is found in the pure functional languages, where state simply doesn't exist.

Overall, your comments with regards to greenlets are spot-on, although I wouldn't call them strictly worse than threads. After all, running 10,000 threads just isn't practical, but running 10,000 greenlets is quite feasible.

Allen Short said...

Resuming a specific target destination doesn't change the analysis any, because it's still context switching to a different call stack that you can't directly observe by reading the code.

JimJJewett said...

Even the original code can certainly switch context on you. If threading is enabled, you may hit the tick-count. If threading is not enabled, the attributes *could* be properties rather than normal attributes.

Assuming you've dealt with those concerns via coding standards, why can't you do the same in a co-routine? "Ensure all invariants before yielding control."

Allen Short said...

I agree. Coroutines (as provided by Python's greenlet module, and similar things) are a form of threading, and have similar problems. I recommend avoiding other forms of threads too. :-)

Ryan Kulla said...

This comment has been removed by the author.

Anonymous said...

this article is funny in two ways

first, it says that “coroutines are strictly worse than threads, because they have the same kinds of problems (non-determinism, loss of code readability) and do not offer any unique advantages” but doesn't mention the lesser overhead and that manual yielding eliminates a good part of synchronization minefield, which is probably the very point of greenlets

second, it praises deferreds but forgets that these can have the same problems with readability. any deferred's callee can alter global objects and bring the same confusion. besides, what bars you from using deferreds with greenlets?

moreover, knowing “what code does” is not always more important than knowing “what the author intended the program to do”. you can have a __getattr__ or __getattribute__ called on self._foo and __set__ on self._foo.a which clearly shows that python wants you to be able to make your application behave the way you like as long as you know how to use your API. of course it's easier to debug an application that has only one thread of action, but if you have to have some concurrency, there's little difference between twisted-like and gevent-like frameworks on the level of finding your exceptions