Generating good syntax errors using a table-driven parser: apparently it is possible after all.
The objection to parser generators that seems to resonate most is that generators like yacc produce inadequate error messages, little more than “syntax error.†Better error messages were one of the key benefits hoped for when g++ converted from a yacc-based C++ parser to a hand-written one (and to be fair, C++ syntax is nearly impossible to parse with any tool; the many special cases cry out for hand-written code). Here the objection seems harder to work around: the parser internally gets compiled into an automaton—usually a big table of numbers—that moves from state to state as it processes input tokens. If at some point it can’t find a next state to go to, it reports an error. How could you possibly turn that into a good message?
Difficulty reporting useful error messages is one reason I’ve abandoned parser generators. I’m still not interested in going back any time soon – parsing just isn’t that hard! – but parser generators are probably not going away, so it’s interesting to see a technique for getting useful results out of them.
I think it should be possible to do, I just don’t see any major parser generator libraries actually doing anything about it. I mean, the theory is easy enough to describe. If you don’t match a state in the table, attempt all token inputs and see which ones generate a new, legal state. Push those onto a list and report back once you’ve exhausted all tokens.
Comment by Aaron Ballman — January 27, 2010 @ 2:01 pm
Unrelated: I’ll give you a dollar if you add an RSS feed to this site. And yeah, I know all the reasons you haven’t, but I’m hoping a dollar will sway you. :)
Comment by Sara — January 27, 2010 @ 4:07 pm
I’ve had an RSS feed for years! It’s here:
feed://www.redecho.org/feed/
Comment by Mars Saxman — January 27, 2010 @ 4:19 pm