Saturday, June 24, 2006
PHP_ParserGenerator and PHP_LexerGenerator
Comments
Very nice - you are becoming a lexer/parser addict
One idea that struck me from reading this was that it may be possible to generate PHP from flex. If you remember that PHP extension I showed you, htmltokenizer: it used flex to generate a C file, then stripped it into its component parts using a few simple Perl routines, then put it back together so that it worked within another file. The theory is that Lex just overlays the data structures with a template -> changing the template and slightly modifying the data structs might work... Mind you, hand-written lexers apparently (from what I've heard) tend to be faster anyway, especially if the grammar is simple.
Greg, this is very interesting stuff. I'll be following these developments more closely and pondering how they might be used for an upcoming version of a certain project that I won't mention here. If I have any actually useful comments I'll be sure to forward them to you. In the meantime, best regards and thanks for all the fish!
This is a very interesting post. I always wanted to see how lemon works; now I can look at it in PHP, which is more readable for me than just C. Thanks for these two package proposals.
Interesting stuff - been comparing with Marcus Baker's lexer implementation in SimpleTest (lacks a lexer generator), which I used for the Dokuwiki parser.
It also uses the approach with subpatterns - code here. Don't know whether you'd consider adding something like that, but what's a little different about this lexer is that it comes with a stack machine. Marcus described it a little once here. That approach means much more state is handled by the lexer. For the purposes of doing something like a BBCode or wiki parser, which outputs HTML, to an extent the parser may be able to be "state free" - it simply responds to tokens by spitting out the corresponding HTML.

It seems that ANTLR v3 uses a stack in its lexer and argues the value of that approach; QUOTE: You can recognize complicated tokens such as HTML tags or "executable" comments like the javadoc @-tags inside /** ... */ comments. The lexer has a stack, unlike a DFA, so you can match nested structures such as nested comments. ... but it's difficult to find decent discussion of the merits of this approach. The most I've ever found is here & here - "Deterministic Pushdown Finite Automata (DPDFA's)".

Somehow it would need a different input format, to allow a hierarchy of patterns / states. Also, when you have one pattern which is able to nest under many other patterns, it needs some kind of shortcut syntax (e.g. in BBCode the {i}italic{/i} syntax might nest within {b/}, {url/} and many others, and ideally you'd only want to describe that once). Anyway - waffling now.

One practical thought - in general for doing parsers and lexers in pure PHP, my feeling is that the fewer function / method calls you have, the faster it's going to be, based on Dokuwiki / XML_HTMLSax experiences - in general the most expensive part was executing callbacks etc. When you've got one callback per token, that's expensive. This is also illustrated by the SAX extension vs. the XMLReader extension - with the latter, most of the function calls are happening within the extension - the calling PHP code could just be a procedural loop - basically the OPCODE argument you mentioned.
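
To make the stack idea from this comment a bit more concrete, here is a minimal sketch of a lexer that keeps a stack of modes and uses one subpattern-based regex per mode. It is not the SimpleTest lexer, ANTLR, or anything PHP_LexerGenerator produces; the function name `lexNested`, the mode names and the token names are all made up for illustration. It only handles nested `/* ... */` comments, which is the kind of structure a plain DFA cannot match.

```php
<?php
// Hypothetical sketch: lexNested() and its token/mode names are illustrative
// assumptions, not part of PHP_LexerGenerator or SimpleTest.
function lexNested(string $input): array
{
    // One combined pattern per mode; named subpatterns identify the token.
    $patterns = [
        'code'    => '~\G(?:(?<open>/\*)|(?<text>(?:[^/]|/(?!\*))+))~',
        'comment' => '~\G(?:(?<open>/\*)|(?<close>\*/)|(?<body>(?:[^/*]|/(?!\*)|\*(?!/))+))~',
    ];
    $stack  = ['code'];   // mode stack; the top entry selects the pattern
    $tokens = [];
    $pos    = 0;
    $len    = strlen($input);

    while ($pos < $len) {
        $mode = end($stack);
        if (!preg_match($patterns[$mode], $input, $m, 0, $pos)) {
            throw new RuntimeException("Unexpected input at offset $pos");
        }
        foreach (['open', 'close', 'text', 'body'] as $name) {
            if (isset($m[$name]) && $m[$name] !== '') {
                if ($name === 'open') {
                    $stack[] = 'comment';          // entering a (nested) comment
                }
                $tokens[] = [
                    'name'  => $name,
                    'text'  => $m[$name],
                    'depth' => count($stack) - 1,  // nesting depth of this token
                ];
                if ($name === 'close') {
                    array_pop($stack);             // leaving the current comment
                }
                break;
            }
        }
        $pos += strlen($m[0]);
    }
    if (count($stack) > 1) {
        throw new RuntimeException('Unterminated comment');
    }
    return $tokens;
}

$tokens = lexNested('a /* b /* c */ d */ e');
```

Each token carries its nesting depth, so a consumer that only emits output per token can stay close to "state free", as described above.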
Haven't perfected the approach yet, but it seems the alternative is to have lexers / parsers which emit arrays of arrays, rather than executing callbacks. A naive implementation reduces the cost of executing user functions at the price of increased memory use (to store long arrays of tokens). The best next guess I have is delivering tokens in batches - e.g. 100 at a time...

    while ( $tokens = $lexer->getNext(100) ) {
        foreach ( $tokens as $token ) {
            switch ( $token['name'] ) {
                // One case for each token - would be nice to have goto...
            }
        }
    }

Anyway - great stuff you've done here - got me prevaricating
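
The batching idea could look roughly like the sketch below. `BatchLexer`, its `getNext()` signature and the three token names are assumptions made up for this example; PHP_LexerGenerator does not provide them. The lexer returns up to `$max` tokens per call as plain arrays, and an empty array (which is falsy) ends the `while` loop, so the only user-level call per batch is `getNext()` itself.

```php
<?php
// Hypothetical sketch: BatchLexer and its token names are illustrative
// assumptions, not an API of PHP_LexerGenerator.
final class BatchLexer
{
    private const PATTERN = '~\G(?:(?<number>\d+)|(?<word>[a-z]+)|(?<space>\s+))~i';
    private $input;
    private $pos = 0;

    public function __construct(string $input)
    {
        $this->input = $input;
    }

    /**
     * Returns up to $max tokens; an empty array signals end of input.
     * Note: this sketch also stops silently at the first unrecognized character.
     */
    public function getNext(int $max): array
    {
        $batch = [];
        while (count($batch) < $max
            && preg_match(self::PATTERN, $this->input, $m, 0, $this->pos)) {
            foreach (['number', 'word', 'space'] as $name) {
                if (isset($m[$name]) && $m[$name] !== '') {
                    $batch[] = ['name' => $name, 'text' => $m[$name]];
                    break;
                }
            }
            $this->pos += strlen($m[0]);
        }
        return $batch;
    }
}

$lexer   = new BatchLexer('abc 12 de 345');
$sum     = 0;
$words   = [];
$batches = 0;
while ($tokens = $lexer->getNext(3)) {
    $batches++;
    foreach ($tokens as $token) {
        switch ($token['name']) {
            case 'number':
                $sum += (int) $token['text'];
                break;
            case 'word':
                $words[] = $token['text'];
                break;
            // 'space' tokens are ignored
        }
    }
}
```

The batch size trades memory for call overhead: a larger `$max` means fewer method calls but a longer array held in memory at once.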
Hi,
is the PHP 4 equivalent of your lexer/parser generator still available? Looks great! Thanks, Conrad
There never was a php4 version, perhaps you're thinking of someone else's work?
It's in German, but here you can find a complete example using PHP_LexerGenerator and PHP_ParserGenerator, implementing a simple calculator:
http://blog.oncode.info/2007/10/25/eine-eigene-programmiersprache-erschaffen-lexer-und-parser-in-php/ I hope that PHP_ParserGenerator will soon include a meaningful example... Anyway, great work!
Thanks for that example. It was helpful even in German! I was having trouble figuring out if the lexer and parser somehow shared token constants, or if I needed to do it manually.
$this->token = Parser::TOK_CONST; is just what I needed.
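
For anyone else wondering how that assignment fits together, here is a small sketch of a lexer that reports token types through constants defined on the parser class. The `Parser` and `Lexer` classes below are hand-written stand-ins, and `TOK_NUMBER`, `TOK_PLUS`, `yylex()` and `doParse()` are names assumed for this example only; real generated classes are much larger and may differ.

```php
<?php
// Hypothetical stand-in for a generated parser class: it only records
// the tokens it is handed so the sharing of constants is visible.
class Parser
{
    const TOK_NUMBER = 1;
    const TOK_PLUS   = 2;

    public $seen = [];

    public function doParse($token, $value)
    {
        $this->seen[] = [$token, $value];
    }
}

// Hypothetical hand-written lexer that refers to the parser's constants
// instead of defining its own copies.
class Lexer
{
    public $token;
    public $value;
    private $input;
    private $pos = 0;

    public function __construct(string $input)
    {
        $this->input = $input;
    }

    /** Advances to the next token; returns false when nothing more matches. */
    public function yylex(): bool
    {
        if (!preg_match('~\G\s*(?:(\d+)|(\+))~', $this->input, $m, 0, $this->pos)) {
            return false;
        }
        $this->pos += strlen($m[0]);
        if ($m[1] !== '') {
            $this->token = Parser::TOK_NUMBER;
            $this->value = $m[1];
        } else {
            $this->token = Parser::TOK_PLUS;
            $this->value = $m[2];
        }
        return true;
    }
}

$lexer  = new Lexer('1 + 22');
$parser = new Parser();
while ($lexer->yylex()) {
    $parser->doParse($lexer->token, $lexer->value);
}
```

Because the lexer only ever refers to `Parser::TOK_*`, the two classes cannot drift apart on token numbering.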