PyMeta: How and Why

(Note: In the course of writing this post, I fixed a bug. Get PyMeta 0.1.1 here.)

Python already has a big pile of parsing frameworks, such as ANTLR, PLY, PyParsing, YAPPS2, ZAPPS, ZestyParser, and plenty others. Why did I write another one?

PyMeta Works Like Python


One of the main difficulties I've had using parser generators has been the difficulty of figuring out why a grammar didn't work. Fixing shift-reduce and reduce-reduce conflicts seemed like voodoo to me, and though I slightly understand better how to fix such things now it's still a different mode of thinking that I don't want to try to get into when I just want to parse something simple. PyMeta uses a variation on the Parsing Expression Grammar (PEG) approach to parsing. The chief consequence of this is there's no possibility of ambiguity in a parse: a successful parse will yield exactly one result, and you can trace the control flow through the grammar to figure out how it got there.

This is possible because the alternative operator ("|") works like Python's "or": it tries all the alternatives in order, returning the value of the first expression to succeed. Let's take on the example that's so problematic in the Bison doc I linked to:

>>> ifExample = """
ifStmt ::= ( <token 'if'> <expr> <token 'then'> <stmt> <token 'else'> <stmt>
| <token 'if'> <expr> <token 'then'> <stmt>)

expr ::= <spaces> <letter>
stmt ::= <ifStmt> | <expr>
"""


(Rules can take arguments in PyMeta: "token" here is a built-in rule that consumes whitespace and matches all the characters in the string passed to it.) I've reversed the order in 'ifStmt' from the Bison example. This grammar isn't ambiguous: PyMeta will try to match the if/then/else form first, and if that fails, it rewinds back to before the 'if' and tries the if/then branch of the
alternative.

>>> Example = OMeta.makeGrammar(ifExample, {})
>>> g = Example("if x then y")
>>> g.apply("ifStmt")
'y'
>>> g2 = Example("if x then y else if a then b")
>>> g2.apply("ifStmt")
'b'

(Since I didn't specify explicit return values for any of the expressions, they simply return the value of the last expression in the sequence. In this case, "letter" returns the letter it parses, which is then passed up through "expr" and "stmt" and "ifStmt".)

An additional advantage is that you don't need a separate lexer pass: you can handle everything in a single grammar.

PyMeta Lets You Say It Quickly


So why not ZestyParser or PyParsing, which also don't have these problems? There's a couple reasons. First of all, neither of these offer a compact syntax for defining a parser. (Which arguably isn't desirable in all cases; I may add a pyparsing-like interface to PyMeta later.) Second, PyMeta fits Python's object-oriented abilities better -- each grammar is its own class, and can be subclassed in the usual way, as well as extended with new grammar rules.

>>> from pymeta.runtime import ParseError
>>> class Example2(Example):
... def parseIt(self):
... try:
... return self.apply("ifStmt")
... except ParseError:
... return "No Good"
...
>>> Example2("if x then y").parseIt()
'y'
>>> Example2("bob's yer uncle").parseIt()
'No Good'
>>> extendedGrammar = """
number ::= <digit>+
expr ::= <super> | <spaces> <number>
"""
>>> Example3 = Example2.makeGrammar(extendedGrammar, {})
>>> Example3("if 1 then x").parseIt()
'x'



Notice here the use of the special rule "super": this calls the grammar rule of the same name as the current one in the superclass. (This uses Python's super() internally.) This way it's possible to try new extensions to a grammar without having to disturb existing code.

3 comments:

shanewholloway said...

Allen,

This is probably the most beautiful piece of orthogonal, introspective, object oriented python I have seen. BootOMetaGrammar and the Ast/Python Builder classes are particularly rich. Thank you for investing the time to create this so I can use it instead of just playing with OMeta in COLA and/or JS.

And, for the record, I have to agree with your reasoning. Fighting with shift-reduce and reduce reduce conflicts makes writing parsers about as fun as writing C-api python modules.

I'll be interested in hearing about your future adventures with OMeta and Python.

Appreciatively,
-Shane Holloway

Allen Short said...

Thanks for your kind words :)

Unknown said...

Oh, wow, another Python parser author who likes Pokey! (I'm the (former) author of ZestyParser.) I wonder if there's statistically significant overlap between those two demographics. Anyway, for that reason I high-five you. If you ever visit my village you may eat freely of my cache of Arctic Circle-Candy.