Allen's Weblog: 2011

Exocet: A Second Look

So what I didn't really talk about last time is that more than just letting you directly express module loading, Exocet also implements:

Parameterized Modules

In Python, classes and functions take parameters, but modules don't. When you load a module, you don't have any opportunity to tell it what you want, or what context it's being loaded in. A lot of times, this matters.

For example, some code has optional dependencies. This idiom can be seen a lot in some parts of Twisted:

try:
    from OpenSSL import SSL
except ImportError:
    SSL = None

Code following this checks if 'SSL' is None to decide whether to define methods and classes that provide support for SSL connections in Twisted.

Other code can depend on one of multiple providers of an interface. Here's the famous example from the docs for the magnificent lxml library:

try:
  from lxml import etree
  print("running with lxml.etree")
except ImportError:
  try:
    # Python 2.5
    import xml.etree.cElementTree as etree
    print("running with cElementTree on Python 2.5+")
  except ImportError:
    try:
      # Python 2.5
      import xml.etree.ElementTree as etree
      print("running with ElementTree on Python 2.5+")
    except ImportError:
      try:
        # normal cElementTree install
        import cElementTree as etree
        print("running with cElementTree")
      except ImportError:
        try:
          # normal ElementTree install
          import elementtree.ElementTree as etree
          print("running with ElementTree")
        except ImportError:
          print("Failed to import ElementTree from any known place")

The effect of this code is to try to import, in order, one of:

lxml
cElementTree from the stdlib
ElementTree from the stdlib
ElementTree installed separately
cElementTree installed separately

I don't think it's a stretch to say this is rather silly. How would you feel if you saw a function that tried to access five different global variables in a row in order to decide what to do?

And though all of these modules implement the same interface, they're still different code, and you might hit some edge case where their behaviour differs. How do you test your code using each possible ElementTree implementation? As Glyph points out, it's always sunny in Python, and every piece of bad code can be worked around by writing worse code; you could have your unit tests fool around with sys.modules. But can't there be something better?

Here's a different way of thinking about it entirely.

In languages that aren't as good as Python, dependency injection is a technique that gets used to deal with this. Dependency injection has many forms and can be rather complicated, but the general idea is code declares the thing it needs, and something else (unfortunately, often an XML file!) describes what objects to provide to satisfy those dependencies.

With Exocet, we hijack the import statement to describe named parameters, indicating the dependencies our code has.

from exocet.parameters import etree

x = etree.parse(open("mydata.xml"))

Now, if you look in the Exocet tarball, you won't find a parameters.py file. This name doesn't correspond to anything on the filesystem.

So how does this help us? Well, it means you can load your ElementTree-using module like this:

m = exocet.pep302Mapper.withOverrides({"exocet.parameters.etree",
                                       xml.etree.cElementTree})
my_etree_using_module = exocet.loadNamed("my_etree_using_module", m)

Or this:

m = exocet.pep302Mapper.withOverrides({"exocet.parameters.etree",
                                        lxml.etree})
my_etree_using_module = exocet.loadNamed("my_etree_using_module", m)

This way, you can test your code without having to resort to sys.modules hackery, and you can better factor your applications by separating configuration and environment concerns from the rest of your code.

As you may recall from last time, pep302Mapper provides the default Python implementation of module loading. In this case, we've just added another name that can be imported, exocet.parameters.etree.

Being able to provide parameters to modules opens up the possibility to eliminate all use of global objects from your code, and pass objects only to the code that needs them. I believe that with experimentation and refinement of these tools, this technique is going to enable a lot of simpler methods for organizing complex applications and reduce a lot of complications people have to deal with in Python now.

Introducing Exocet

Last time I talked about the deficiencies of Python's module system. Now I'd like to talk about a solution to them.

There are two questions related to Python's module system that come up repeatedly on #python, and don't have obviously good answers.

How do I reload a module?

How do I create a plugin system?

Exocet was written primarily to answer these questions.

Download Exocet 0.5

What is Exocet?

Exocet is a new way to load Python modules. It separates the act of naming a dependency from the act of creating a module object. As a result, more than one instance of a module can be created from the same source file. Also, when creating a module object, precise control can be exerted over what it's allowed to import.

Let's start with a code example.

>>> import exocet, httplib 
>>> urllib = exocet.loadNamed("urllib", exocet.pep302Mapper) 
 
>>> def HTTP_Ex(host): 
...     print "making HTTP connection to", host 
...     return httplib.HTTP(host) 
... 
 
>>> class _ModuleProxy(object): 
...     def __getattribute__(self, name): 
...           if name == 'HTTP': 
...               return HTTP_Ex 
...           else: 
...               return getattr(httplib, name) 
 
>>> httplib_plus = _ModuleProxy() 
>>> overriddenMap = exocet.pep302Mapper.withOverrides({"httplib": httplib_plus}) 
>>> urllib_plus = exocet.loadNamed("urllib", overriddenMap) 
 
>>> print urllib.urlopen("http://python.org").read(121) 
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> 
 
>>> print urllib_plus.urlopen("http://python.org").read(121) 
making HTTP connection to python.org 
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

Quick breakdown of what's going on here: loadNamed takes a module name and a mapper object, and returns a new module object. Note that this is very different from what import does; each invocation of loadNamed returns a new module object, unrelated to any previous ones.

The mapper object is responsible for intercepting all import calls made by the module being loaded. The pep302Mapper object used here just invokes Python's normal importing behaviour, caching loaded modules in sys.modules. But its withOverrides method creates a new mapper, one that looks in a dict first. The urllib_plus module is almost like the urllib module in this example, with one difference: the latter got the real httplib module when it executed the statement import httplib; urllib_plus got the httplib_plus module when it ran that statement. The other mapper that Exocet includes by default is emptyMapper, in which all import statements will fail. You can construct your own, providing any object under any name to be available to import statements in modules you load.

So after constructing these two modules, they're usable as normal. The only difference is that urllib_plus invokes our wrapper function when it calls httplib.HTTP(), producing the printed line before proceeding with its work.

So, how does this behaviour answer the questions I mentioned earlier?

Module reloading

As we saw last time, reload isn't suitable for real-world use; it changes things around too much in some places, not enough in others. With Exocet, you don't have to reload modules because you can just load them. When you want a new module with updated code, you just load it and have a new module object. All the old objects still exist as long as you want them to; there's no orphaned-instance problem. If you need new versions of the module's dependencies you can load them first and put them in the mapper so that they're visible to your new module. Old instances, classes, and modules can be removed in the normal way — by Python's garbage collection, when nothing uses them any longer.

Plugin systems

The normal use of Python packages and modules involves importing specific ones by name. However, sometimes you want to load a bunch of modules and call something in each. Since Exocet borrows code from twisted.python.modules, it's possible to iterate over a package and load each module in it:

>>> import exocet 
>>> [exocet.load(x, exocet.pep302Mapper) for x in 
...  exocet.getModule("twisted.plugins").iterModules()] 
[<module 'twisted.plugins.cred_anonymous' from '/home/washort/Projects/Twisted/trunk/twisted/plugins/cred_anonymous.py'>, 
 <module 'twisted.plugins.cred_file' from '/home/washort/Projects/Twisted/trunk/twisted/plugins/cred_file.py'>, 
<module 'twisted.plugins.cred_memory' from '/home/washort/Projects/Twisted/trunk/twisted/plugins/cred_memory.py'>, 
<module 'twisted.plugins.cred_unix' from '/home/washort/Projects/Twisted/trunk/twisted/plugins/cred_unix.py'>, 
<module 'twisted.plugins.twisted_conch' from '/home/washort/Projects/Twisted/trunk/twisted/plugins/twisted_conch.py'>, 
<module 'twisted.plugins.twisted_ftp' from '/home/washort/Projects/Twisted/trunk/twisted/plugins/twisted_ftp.py'>, 
<module 'twisted.plugins.twisted_inet' from '/home/washort/Projects/Twisted/trunk/twisted/plugins/twisted_inet.py'>, 
<module 'twisted.plugins.twisted_lore' from '/home/washort/Projects/Twisted/trunk/twisted/plugins/twisted_lore.py'>, 
<module 'twisted.plugins.twisted_mail' from '/home/washort/Projects/Twisted/trunk/twisted/plugins/twisted_mail.py'>, 
<module 'twisted.plugins.twisted_manhole' from '/home/washort/Projects/Twisted/trunk/twisted/plugins/twisted_manhole.py'>, 
<module 'twisted.plugins.twisted_names' from '/home/washort/Projects/Twisted/trunk/twisted/plugins/twisted_names.py'>, 
<module 'twisted.plugins.twisted_news' from '/home/washort/Projects/Twisted/trunk/twisted/plugins/twisted_news.py'>, 
<module 'twisted.plugins.twisted_portforward' from '/home/washort/Projects/Twisted/trunk/twisted/plugins/twisted_portforward.py'>, 
<module 'twisted.plugins.twisted_qtstub' from '/home/washort/Projects/Twisted/trunk/twisted/plugins/twisted_qtstub.py'>, 
<module 'twisted.plugins.twisted_reactors' from '/home/washort/Projects/Twisted/trunk/twisted/plugins/twisted_reactors.py'>, 
<module 'twisted.plugins.twisted_runner' from '/home/washort/Projects/Twisted/trunk/twisted/plugins/twisted_runner.py'>, 
<module 'twisted.plugins.twisted_socks' from '/home/washort/Projects/Twisted/trunk/twisted/plugins/twisted_socks.py'>, 
<module 'twisted.plugins.twisted_telnet' from '/home/washort/Projects/Twisted/trunk/twisted/plugins/twisted_telnet.py'>, 
<module 'twisted.plugins.twisted_trial' from '/home/washort/Projects/Twisted/trunk/twisted/plugins/twisted_trial.py'>, 
<module 'twisted.plugins.twisted_web' from '/home/washort/Projects/Twisted/trunk/twisted/plugins/twisted_web.py'>, 
<module 'twisted.plugins.twisted_web2' from '/home/washort/Projects/Twisted/trunk/twisted/plugins/twisted_web2.py'>, 
<module 'twisted.plugins.twisted_words' from '/home/washort/Projects/Twisted/trunk/twisted/plugins/twisted_words.py'>]

In this case we used the default pep302Mapper since we wanted plugin modules to be able to import anything they wanted. However, a different mapper could be provided which provided different modules, limited them to only importing certain ones, or none at all.

Unit testing

Hardly anybody asks about this on #python, but they should. By loading the modules you test with Exocet, you can replace their dependencies with stubs without monkeypatching. Just provide overrides to the mapper object when loading the module, and you have a stubbed or mocked module to test without altering the behaviour of other tests.

Next steps

Exocet lets you load most existing Python modules, but in a way that gives much better control over what happens. This makes room for radically different ways of thinking about how to organize Python programs. Since this approach is so different, practice is needed to come up with new idioms for how to use it effectively. I've tried carefully to provide mechanism here with very little policy. There are a few kinks to still work out in how Exocet works in more complicated scenarios, but give it a try! It might be just the thing for your next IRC bot. ;-)

If you want to help with Exocet development, it happens here, on Launchpad. I look forward to your feedback.

Modules in Python: the Good, the Bad, and the Ugly

I've spent a lot of time living with Python's module system, both in my own
work and in helping people on Freenode's #python channel. A lot of Python's
power comes from its module system; however, it could be better. It can be
hard to think about how modules could be done differently, since it's so
central to the design of Python software, but it's worth the effort. Here's
some stuff I've been thinking about.

The Good

No global namespace for objects: This is the benefit of module systems in general and it's a big one. It's what I miss the most when writing in C, for example (And until very recently — Javascript!) Being able to treat a source file as an actual container for code instead of an arbitrary pile of functions and variables with no structure makes code much easier to comprehend and navigate.
Python modules are easy to write and to import: Particularly I'm comparing this to Scheme 48 and ML, which both have very well-designed and powerful module systems, but they're rather confusing to the newcomer because they require a good bit of up-front knowledge to construct a module that's useful to anyone else. In Python, you just stick some code in a file and then all the names in it are importable. My earliest Python memory was joining #python and asking "how do I import some code I wrote in one file into another"? I was told 'for a file named foo.py, use "import foo"'. My reaction was "Really? That's all?" Providing a low barrier to entry for creating and using modules is an extremely powerful advantage of Python.

The Bad

Modules are in a global namespace: Although modules contain classes, functions, etc., there's no containment hierarchy for module names themselves. Different modules can have functions and classes with the same name in them, but there's nothing that can contain multiple modules with the same name. This shows up as a problem when you want to write unit tests that use fake versions of some modules, for example. When faking a function or a class, one creates a new version. Modules generally have to be modified rather than replaced, since import looks up modules in the global module namespace.
PYTHONPATH is a rather inflexible way to organize modules: Organizing modules by location in the filesystem is a great way to get started, but it's not the only possible thing one might want. This deficiency has been addressed in recent Pythons via the PEP 302 import hooks. However...
PEP302 hooks help, but aren't enough by themselves: The canonical example of alternate module organization is putting them in a zip file, which Python supports via the standard import hooks now. Now you have extra problems, though. Python packages are a good way to organize modules, but they don't provide a way to enumerate their contents. To work around this, everybody looks at the filesystem layout to determine what's in a package. But if your modules aren't being loaded directly from the filesystem, this approach won't work.

The Really Bad

Modules are singletons (i.e., global mutable state): This is the dark secret at the heart of any large-scale Python project. One can be very careful about organizing one's state into instances and so forth, but all modules are still visible and modifiable by any code at any time.
Still easy to write unreadable code via monkey-patching: It's easy and convenient to assign to module attributes any time you feel like it. The result is that any time you see "from foo import someObject", you can't every be sure about where that object was defined unless you read all the source code in the application. Even when it's desirable to change module contents (such as for tests), it's easy to fail to do so in a way that doesn't introduce dependencies or conflicts between tests. The classic example is calling some function that initializes module globals from a config file; if one test does it, it can cause tests run after it to fail or incorrectly succeed.
reload(): The reload function is a symptom of all the above problems. Its inspiration is obvious: loading code that's changed since the current Python process has started is an entirely sensible idea. However, Python's assumptions about how modules work makes this rather difficult to do in a sensible manner. It's common to create new lists rather than modify old ones when a new version of some data is wanted. This convention is reinforced by the ease by which list comprehensions can be used to do this job. The convention encouraged by the existence of reload is exactly opposite, though — instead of creating a new module object, the old one is emptied and refilled with fresh objects. The result is that instances of classes in that module are orphaned; the class they were instantiated from can't be reached by its name. Also, it only reloads a single module; no help is provided in updating modules that depend on it, or updating its own dependencies. Figuring out which modules to reload or not reload at any given time is often very tricky. Plenty of other corner cases exist, such as reinitialization of function default arguments, and so forth. Because of all this, the standard advice on #python is that "reload will not make you happy".

What Now?

So with these problems identified in how Python handles modules, can anything be done?
Well, that's why I wrote Exocet. More about that next time.

Allen's Weblog