Two-thirds slow, one-third amazing



My evident neglect of this site was not intentional. Moving across the country and starting a new job tend to reduce one's available time for open source work, and mine hasn't resulted in anything really worth announcing for the past year (or more). But today that changes!

Download PyMeta 0.4.0



Since the last release of Ecru I have been trying to get rid of its dependency on Python, by porting the E parser to E. In the process of doing so, I realized it was probably a bad idea to try to use a parser whose only form of error reporting was the string "Parse error". Since I'm still more familiar with Python than E, I started implementing error reporting in PyMeta. (Also, Python has a debugger.) This resulted in some significant rewrites of the internals.

So what's different in PyMeta 0.4?

Comments!


With its new space-age technology, PyMeta 0.4 will treat '#' as a comment character, just like Python. (Yes, this was rather overdue.)

Reorganized code generator


Previously, code for grammars was generated by the grammar parser calling methods on a builder object that directly emitted Python code as strings. Now the grammar parser builds a tree, which is then consumed by a code generator. If you want to generate something other than Python (or change how Python code gets generated), it should be a lot simpler now. For specifics, look at pymeta.builder.PythonWriter, particularly its generate_ methods.
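To make the tree-then-generator split concrete, here is a minimal sketch of the general pattern. It is not PyMeta's actual code: the tuple-based node shapes, the TinyWriter class, and the method names in the emitted source are all assumptions made up for illustration. Only the idea of dispatching to generate_ methods comes from the description above.

```python
# Hypothetical sketch: node shapes, class name, and emitted method names
# are illustrative, not PyMeta's real tree format or PythonWriter.
class TinyWriter:
    """Turns a small tuple-based grammar tree into Python source text."""

    def generate(self, node):
        # Dispatch on the node's tag to the matching generate_<Tag> method.
        tag, args = node[0], node[1:]
        return getattr(self, "generate_" + tag)(*args)

    def generate_Exactly(self, literal):
        # Leaf node: match one literal string.
        return "self.exactly(%r)" % (literal,)

    def generate_Or(self, *alternatives):
        # Ordered choice: each alternative becomes a deferred call.
        parts = ", ".join("lambda: " + self.generate(a) for a in alternatives)
        return "self._or([%s])" % parts

    def generate_Many(self, expr):
        # Repetition of a sub-expression.
        return "self.many(lambda: %s)" % self.generate(expr)


# A tree for the expression ('a' | 'b')*
tree = ("Many", ("Or", ("Exactly", "a"), ("Exactly", "b")))
source = TinyWriter().generate(tree)
print(source)
```

Under this arrangement, targeting another output language would mean writing a second writer class with the same generate_ method names, leaving the grammar parser untouched.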

Error tracking and reporting


This is the big one. Previously, PyMeta expressions returned a value or raised an error. Now each expression evaluated by the parser returns a value and an error, even if a successful parse was found. If a parse failure occurs, an error still gets raised. Combining expressions with "|" returns the error from furthest into the input; when several errors tie at the same position, they are combined.
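The value-plus-error scheme can be sketched with a few toy combinators. This is an assumption-laden illustration rather than PyMeta's implementation: the names ParseFailure, join_errors, literal, seq, and choice are invented here, and an error is represented simply as a (position, set of expected items) pair.

```python
# Hypothetical sketch of "every expression returns a value and an error".
# All names here are illustrative; none of this is PyMeta's own code.

class ParseFailure(Exception):
    def __init__(self, error):
        Exception.__init__(self, error)
        self.error = error  # (position, frozenset of expected items)


def join_errors(a, b):
    """Keep the error from furthest into the input; merge ties."""
    if a[0] > b[0]:
        return a
    if b[0] > a[0]:
        return b
    return (a[0], a[1] | b[1])


def literal(s):
    def rule(text, pos):
        if text.startswith(s, pos):
            # Success still reports an (empty) error alongside the value.
            return s, pos + len(s), (pos, frozenset())
        raise ParseFailure((pos, frozenset([s])))
    return rule


def seq(*rules):
    def rule(text, pos):
        values, error = [], (pos, frozenset())
        for r in rules:
            try:
                value, pos, err = r(text, pos)
            except ParseFailure as failure:
                raise ParseFailure(join_errors(error, failure.error))
            error = join_errors(error, err)
            values.append(value)
        return values, pos, error
    return rule


def choice(*rules):
    """Ordered choice, the analogue of "|"."""
    def rule(text, pos):
        error = (pos, frozenset())
        for r in rules:
            try:
                value, new_pos, err = r(text, pos)
            except ParseFailure as failure:
                error = join_errors(error, failure.error)
            else:
                # A successful parse still carries what the failed
                # alternatives were expecting.
                return value, new_pos, join_errors(error, err)
        raise ParseFailure(error)
    return rule


# Success: the value comes back together with the error from the
# alternative that was tried first and failed.
value, end, error = choice(literal("a"), literal("b"))("b", 0)

# Failure: both alternatives fail at position 1, so the tie is merged.
tags = choice(seq(literal("<"), literal("b"), literal(">")),
              seq(literal("<"), literal("i"), literal(">")))
try:
    tags("<x>", 0)
except ParseFailure as failure:
    furthest = failure.error
```

In this sketch, `error` ends up as `(0, frozenset(['a']))` even though the parse succeeded, and `furthest` records position 1 with both `'b'` and `'i'` as expected items, which is the raw material for a "more than one possible valid input" message.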

The result is that you can now get nicely formatted output telling you where stuff went wrong and how. Here's an example, based on the old TinyHTML parser:


When you mismatch tags, the parser notices:


Notice here that the information in ParseError is structured so that future tools can figure out stuff about what failed and how.

If there's more than one possible valid input at the point of failure, the parser will tell you:


Plans for the Future


There are a few directions I'd like to take PyMeta in the future. A really nice thing would be to have a way to generate grammars ahead of time easily, writing out a Python module. This release includes bin/generate_parser, which does a rather naive version of this. The problem is figuring out how to make grammars that are subclasses of something other than just OMetaBase.

Also, with the new code generator setup, it'd be fairly easy to generate Cython instead of Python, producing grammars that can be compiled as extension modules and, hopefully, much faster parse times. PyMeta isn't meant to be blazingly fast -- PEG parsers aren't known to be the most efficient -- but it'd be nice to squeeze all we can out of it.

Other people have asked for event-based parsing and incremental output. Seems like a neat idea... I'd just have to figure out what that means. :-)

Special thanks to Marien Zwart and Cory Dodt for their contributions and encouragement for this release.

4 comments:

Unknown said...

This is terrific. Thanks for all your effort, Allen!

Onne said...

Not sure, but I think that only works well when the grammar is simple. Let's say the html rule is like this:

text | html | php

Where the "php" rule would try to match php open and close tags.

Now you get the error in the php rule, saying:

Parse error at line 1, column 2: expected '?'

Or you get:

Parse error at line 1, column 1: expected html or php

Right?

Check out jmeta (my PEG parser for Java), which uses an explicit "no backtracking after this point" marker as its error notation (using "!"). Though I recently came to the conclusion that it should only report a real error if all invoking rules disallowed backtracking. ( http://github.com/onnlucky/jmeta ) I use the same example in the readme.

If you would add something like that, you can handle most errors nicely. But I can definitely use the collecting all errors from all alternatives for reporting in jmeta. Good work!

regards
-Onne

tav said...

Congrats on the release dash!

— Cheers, tav

Kevin H said...

Ack! How did I miss this? Fantastic news!