PyMeta: How and Why

(Note: In the course of writing this post, I fixed a bug. Get PyMeta 0.1.1 here.)

Python already has a big pile of parsing frameworks, such as ANTLR, PLY, PyParsing, YAPPS2, ZAPPS, ZestyParser, and plenty of others. Why did I write another one?

PyMeta Works Like Python

One of the main difficulties I've had with parser generators has been figuring out why a grammar didn't work. Fixing shift-reduce and reduce-reduce conflicts seemed like voodoo to me, and though I now understand a little better how to fix such things, it's still a different mode of thinking that I don't want to get into when I just want to parse something simple. PyMeta uses a variation on the Parsing Expression Grammar (PEG) approach to parsing. The chief consequence of this is that there's no possibility of ambiguity in a parse: a successful parse yields exactly one result, and you can trace the control flow through the grammar to figure out how it got there.

This is possible because the alternative operator ("|") works like Python's "or": it tries each alternative in order and returns the value of the first one that succeeds. Let's take the example that's so problematic in the Bison doc I linked to:

>>> ifExample = """
ifStmt ::= ( <token 'if'> <expr> <token 'then'> <stmt> <token 'else'> <stmt>
           | <token 'if'> <expr> <token 'then'> <stmt>)

expr ::= <spaces> <letter>
stmt ::= <ifStmt> | <expr>
"""


(Rules can take arguments in PyMeta: "token" here is a built-in rule that consumes whitespace and matches all the characters in the string passed to it.) I've reversed the order in 'ifStmt' from the Bison example. This grammar isn't ambiguous: PyMeta will try to match the if/then/else form first, and if that fails, it rewinds back to before the 'if' and tries the if/then branch of the alternative.

>>> from pymeta.grammar import OMeta
>>> Example = OMeta.makeGrammar(ifExample, {})
>>> g = Example("if x then y")
>>> g.apply("ifStmt")
'y'
>>> g2 = Example("if x then y else if a then b")
>>> g2.apply("ifStmt")
'b'

(Since I didn't specify explicit return values for any of the expressions, each rule simply returns the value of the last expression in its sequence. In this case, "letter" returns the letter it parses, which is then passed up through "expr", "stmt", and "ifStmt".)
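
If you want something more structured than the bare letter back, you can attach explicit return values using the ":name" bindings and "=>" actions covered in the "Introducing PyMeta" section below. Here's a rough sketch of a version that builds tuples instead -- the names are invented and I haven't run this particular grammar, but it only uses features shown elsewhere in this post:

>>> ifTreeExample = """
ifStmt ::= ( <token 'if'> <expr>:c <token 'then'> <stmt>:t <token 'else'> <stmt>:e => ('if', c, t, e)
           | <token 'if'> <expr>:c <token 'then'> <stmt>:t => ('if', c, t))

expr ::= <spaces> <letter>
stmt ::= <ifStmt> | <expr>
"""
>>> IfTree = OMeta.makeGrammar(ifTreeExample, {})
>>> IfTree("if x then y else z").apply("ifStmt")
('if', 'x', 'y', 'z')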

An additional advantage is that you don't need a separate lexer pass: you can handle everything in a single grammar.
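
For instance, here's a sketch of a small calculator-ish grammar that does its own tokenizing of numbers and operators inline, using only the built-in rules already shown (again, illustrative names, and I haven't run this exact one):

>>> calcExample = """
number ::= <spaces> <digit>+:ds => int(''.join(ds))
addExpr ::= <number>:x <token '+'> <number>:y => x + y
"""
>>> Calc = OMeta.makeGrammar(calcExample, {})
>>> Calc("12 + 30").apply("addExpr")
42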

PyMeta Lets You Say It Quickly

So why not ZestyParser or PyParsing, which also don't have these problems? There are a couple of reasons. First, neither of them offers a compact syntax for defining a parser. (Which arguably isn't desirable in all cases; I may add a pyparsing-like interface to PyMeta later.) Second, PyMeta fits Python's object-oriented abilities better -- each grammar is its own class, which can be subclassed in the usual way, as well as extended with new grammar rules.

>>> from pymeta.runtime import ParseError
>>> class Example2(Example):
...     def parseIt(self):
...         try:
...             return self.apply("ifStmt")
...         except ParseError:
...             return "No Good"
...
>>> Example2("if x then y").parseIt()
'y'
>>> Example2("bob's yer uncle").parseIt()
'No Good'
>>> extendedGrammar = """
number ::= <digit>+
expr ::= <super> | <spaces> <number>
"""
>>> Example3 = Example2.makeGrammar(extendedGrammar, {})
>>> Example3("if 1 then x").parseIt()
'x'

Notice the use of the special rule "super" here: it calls the rule of the same name in the grammar's superclass. (This uses Python's super() internally.) This makes it possible to try out extensions to a grammar without disturbing existing code.
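
For instance, making that new "number" rule return an actual integer rather than a list of digit characters should just be one more subclass -- something like this sketch (untested; "Example4" is only an illustrative name):

>>> numberGrammar = """
number ::= <super>:ds => int(''.join(ds))
"""
>>> Example4 = Example3.makeGrammar(numberGrammar, {})
>>> Example4("if 7 then x").parseIt()
'x'
>>> Example4("99").apply("number")
99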

Introducing PyMeta

I've just uploaded the initial release of a Python implementation of Alessandro Warth's OMeta, an object-oriented pattern matching language. Get the tarball here (or simply bzr get lp:pymeta). It's an extremely handy tool for all kinds of parsing and filtering jobs. Here's a tiny example:

>>> exampleGrammar = """
word ::= <letter>+:chars => ''.join(chars)
greeting ::= <word>:x ',' <spaces> <word>:y '!' => (x, y)
"""


Here's a breakdown of the syntax in the first rule: as you can probably tell, "word ::=" defines a parsing rule. "<letter>" calls a built-in rule that matches a single letter. The "+" works like it does in regexes, repeating the expression before it one or more times and collecting the results into a list. ":chars" binds that list to a local variable, "chars". "=>" provides a Python expression whose value becomes the result of a successful match.

The second rule strings multiple expressions together in sequence -- after matching a word, it looks for a comma, then calls the built-in rule "spaces" (which matches any amount of whitespace), then matches another word and finally an exclamation point. Once all of that has matched, it returns a tuple containing the two matched words.

Using this in a Python program is simple:

>>> from pymeta.grammar import OMeta
>>> GreetingParser = OMeta.makeGrammar(exampleGrammar, {})
>>> gp = GreetingParser("Hello, world!")
>>> gp.apply("greeting")
('Hello', 'world')
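
And when the input doesn't match, "apply" raises the same ParseError we caught in the subclassing example above. Something like this sketch (the exact point of failure depends on the input, of course):

>>> from pymeta.runtime import ParseError
>>> bad = GreetingParser("Hello world")
>>> try:
...     result = bad.apply("greeting")
... except ParseError:
...     result = "not a greeting"
...
>>> result
'not a greeting'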

In my next few posts I'll be showing more examples of what PyMeta can do and how it's different from other Python parsing tools.