Turns out I was wrong. I misunderstood the requirements for OMeta's input stream. Various operations require the ability to save a particular point in the stream and go back to it. To further complicate matters, arguments to rules are passed on the stream by pushing them onto the front, and rules that take arguments read them off of the stream. This is very handy for certain types of pattern matching, but it totally destroys any hope of simply implementing the input as a list and an index into it, because there has to be a way to uniquely identify values pushed onto the stream. If a value gets pushed onto the stream, is read from it, then another one is pushed on, both of them have the same predecessor, but they don't have the same stream position. It becomes more like a tree than a sequence. JS-OMeta handled this by just creating a new stream object for each argument. I didn't give up soon enough on my clever idea when initially implementing PyMeta, and it grew more complicated with each feature I implemented, involving a bunch of lists for buffering things and a complicated mark/rewind scheme.
After writing a rather complicated grammar with PyMeta, I began to wonder if I could improve its speed. By this time I knew the JS version's algorithm was less complicated so I decided to try it out. It cut my grammar's unit tests' runtime from around 40 seconds to 4 seconds. Also, it fixed a bug.
Moral of the story: I'm not going to try to implement clever optimizations until I understand the original version any more. :)
I've released a version with this new input implementation. Get it in the usual place.
9 comments:
Hey Allen,
Thanks for the fix. I have been using PyMeta some more and have found two features to be quite useful that you might consider adding:
1) namespace inheritance so grammar subclasses don't have to explicitly include whatever python names the base class refers to (maybe this is this a bug report not a feature request? not sure).
2) in grammar comments, maybe with '#'.
I have a trivial monkey-patch solution that works for me but something built in might be nice.
regards,
kmh
I'd be interested in a patch for that. You can either register a new branch on Launchpad or just post a bug with a diff and I'll have a look at incorporating it.
I have a quick question regarding pymeta: is there a reasonably easy way to match a string *without* consuming whitespace on either side? I have a grammar where whitespace (and lack thereof) is important, but I'm forced to do constructs like this:
foo ::= 'f' 'o' 'o'
While this raises a syntax error:
foo ::= 'foo'
That's how it has to be done at the moment. The other OMeta implementations do accept 'foo' and PyMeta should as well, I just never developed a need for it. It should be a pretty simple matter to change the grammar to support that - if you'd like to submit a patch I'd gladly add it.
I just discovered and installed PyMeta 0.3.0, but I am not quite sure that I understand how to use it. I looked at the README and at the example file html.py, however, the latter just defines grammars but doesn't use them.
I added the following lines to html.py to make the script do something:
parsed = TinyHTML(testSource).apply("html")
unparsed = TinyHTMLUnparser(parsed).apply("contents")
links = LinkExtractor(parsed).apply("contents")
boringified = Boringifier(parsed).apply("contents")
The output of the parser looks fine, but the three transformations don't seem to work, so I guess I misunderstand their usage:
1) The call to the unparser crashes (TypeError: sequence item 1: expected string, dict found)
2) The call to LinkExtractor returns an empty list.
3) The call to the Boringifier is identical to the first element of the output of the parser, i.e. the Boringifier doesn't change anything in the data.
I'd appreciate some more explanation!
Hi Allen,
I have been experimenting with PyMeta and reading the sources to better understand the differences with OMeta's Squeak and JavaScript implementations.
I found that PyMeta supports direct left recursion but seems to be incomplete/broken when it comes to indirect left recursion.
I used the following test inspired by the article "Pacrat Parsers Can Support Left Recursion" by Warth A, Douglass J.R. and Millstein T.
import sys
from pymeta.grammar import OMeta
# direct Left Recursion grammar
directLRGrammar = """
expr ::= <expr>:a "-" <num>:b => ["expr", a, b]
| <num>:n => ["expr", n]
num ::= <digit>+:n => ["num", n]
"""
DirectLRGrammar = OMeta.makeGrammar(directLRGrammar, {})
DirectLRGrammar("""1-2-3""").apply('expr') # direct left recursion is ok!
# indirect Left Recursion grammar
indirectLRGrammar = """
x ::= <expr>:e => ["x", e]
expr ::= <x>:a "-" <num>:b => ["expr", a, b]
| <num>:n => ["expr", n]
num ::= ( <digit>+):n => ["num", n]
"""
IndirectLRGrammar = OMeta.makeGrammar(indirectLRGrammar, {})
IndirectLRGrammar("""1-2-3""").apply('expr') # indirect left recursion is NOT ok!
and I get (wrong in the second case and no exception...)
['expr', ['expr', ['expr', ['num', ['1']]], ['num', ['2']]], ['num', ['3']]]
['expr', ['num', ['1']]]
What do you think?
Konrad: (you've probably figured it out by now, but maybe someone else will see this comment thread..)
try "[parsed]" instead of "parsed" for all three. WFM.
Can a NanoJIT be written in Pymeta?
http://www.bluishcoder.co.nz/2009/05/simple-nanojit-example.html
There are some interesting usages.
Hi Allen,
after some tests with PyMeta I decided that it fits my needs. Before starting using it I'd like to know the current status of the project, if it's still alive.
regards, Ebor
Post a Comment