I've been converting 1.33 grammars to 2.0 and thought I'd pass along the
following tips that may help folks avoid some of the problems I've had doing
it.  I hope these tips help -- happy parsing!
Tom Nurkkala, PhD
tom.nurkkala@powercerv.com
- Note that several of the EBNF notations have changed. In particular, the
 optional clause "{...}" has become "(...)?". This new optional-clause
 notation is the same as the old notation for syntactic predicates, which
 in 2.x are written "(...)=>". Because you'll probably have more optional
 clauses than syntactic predicates, convert the optional clauses first, then
 go back to your old grammar, find the syntactic predicates, and change them
 appropriately in the new grammar.
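 For example, a rough before/after sketch (the rule and token names here
 are invented for illustration):

    1.33:  ifStat : K_IF expr K_THEN stat { K_ELSE stat } ;
    2.0:   ifStat : K_IF expr K_THEN stat ( K_ELSE stat )? ;

 and for a syntactic predicate:

    1.33:  stat : ( decl )?  decl | expr ;
    2.0:   stat : ( decl ) => decl | expr ;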
- Semantic actions are now delimited with "{...}" rather than the old-style
 "<<...>>" notation. This is an easy replacement to make, as there are
 probably few "<<" or ">>" shift operators in your old C++ code, so you can
 do a simple search-and-replace. Note that you should change optional
 clauses from "{...}" to "(...)?" _before_ changing semantic action
 delimiters, while the old optional clauses are still easy to distinguish
 from the new semantic action delimiters.
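 For example (the rule name and the action code are just placeholders):

    1.33:  assign : ID ASSIGN expr <<emitStore();>> ;
    2.0:   assign : ID ASSIGN expr { emitStore(); } ;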
- Probably the most challenging part of the conversion will be moving from
 the DLG-based scanner to the LL(k) scanner. Most of the conversions are
 quite mechanical, but some are not. In particular, you now have to address
 left factoring in those productions of the scanner that will return tokens
 to the parser.
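 For example, where DLG's longest-match behavior would sort out "<" versus
 "<=" on its own, the corresponding LL(k) lexer rule may need to be
 left-factored by hand. A minimal sketch (this assumes LE is declared as a
 token type elsewhere, e.g., via the dummy-token trick described later in
 this post):

    LT
        :   '<'
            ( '=' { _ttype = LE; } )?   // "<=" becomes LE; plain "<" stays LT
        ;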
- ANTLR is happiest when you use quoted strings directly in the grammar for
 keywords. Under 1.33, I had defined all my keywords as lexical tokens
 (using something like #token K_WORD "keyword"). Although doing this avoids
 misspelling problems (e.g., using "while" in one place and "whiel" in
 another), ANTLR 2.x is best suited to using literals directly in the
 grammar because of the way it generates the token hash table, etc. in the
 resulting code. Watch carefully for misspellings.
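 For example, a 2.x parser rule can reference the keyword literal directly
 (the rule and token names below are hypothetical):

    whileStat
        :   "while" LPAREN expr RPAREN stat
        ;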
- There is no #tokenclass in ANTLR 2.x. The best way to handle such cases
 appears to be to create a new production in the _parser_ that mimics the
 old-style token class (e.g., changing "#tokenclass SQLVerbs { K_SELECT,
 K_DELETE, ...}" to something like "sqlVerbs : "select" | "delete" | ...").
- Handling numeric literals is more problematic in 2.x. In particular, if
 your language has "similar" literals (e.g., the integers, reals, dates, and
 times found in a database-focused language), you'll
 have more work to do in the LL(k) scanner environment. It appears easiest
 to collect these literals into a single scanner production and either
 left-factor or make use of syntactic predicates. You can set the token type
 in each alternative using a specific semantic action in each disjunct of the
 production (e.g., "{ _ttype = NUM_FLOAT; }"). (Note that if you use
 the -diagnostic switch on antlr.Tool, the scanner's ".txt" file includes
 what seem like spurious complaints about setting _ttype in this manner. The
 warnings can apparently be safely ignored.) See the sample Java grammars
 (particularly Scott's new one) for examples of how to do this type of thing.
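 As a minimal sketch of the single-production approach (this assumes
 NUM_FLOAT is declared as a token type elsewhere; a grammar with dates and
 times would need more alternatives and possibly syntactic predicates):

    NUM_INT
        :   ('0'..'9')+
            ( '.' ('0'..'9')* { _ttype = NUM_FLOAT; } )?
        ;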
- Use the "protected" flag on lexer rules that are only being used as
 "helpers" (e.g., on a "DIGIT" production that's used in other lexer
 productions for integers, floats, etc.). Not only does this make the
 resulting method in the output protected, it is also used by ANTLR to modify
 its test for ambiguous rules in the scanner, eliminating some
 "non-deterministic" warnings. See examples of this in Scott's new Java
 parser.
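 A minimal sketch of the helper-rule pattern (the INT rule name is just
 illustrative):

    // helper only; never returned to the parser as a token of its own
    protected
    DIGIT
        :   '0'..'9'
        ;

    INT
        :   (DIGIT)+
        ;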
- When generating ASTs, it's often helpful to create "dummy" nodes that
 have a token type that's used only to make AST traversal unambiguous (i.e.,
 "flag" various subtrees so that the tree parser doesn't have to fool with
 resolving ambiguous tree structures). Under 1.3x, such dummy token types
 could be created using #token with no pattern (e.g., "#token D_DUMMY").
 Under 2.x, you can create dummy token types with a production that simply
 has the dummy values as disjuncts (for example, "dummyTokens : D_RED |
 D_GREEN | D_BLUE;"). Such a production will cause the tokens to be created,
 added to the TokenTypes output and so on. You can then refer to the dummy
 types in semantic actions used to build ASTs. Be sure NOT to refer to the
 "dummyTokens" production elsewhere in your grammar!
- Make use of the "-diagnostic" flag on antlr.Tool. The ".txt" output
    for
 your parser(s) and scanner(s) are very helpful in diagnosing conflicts and
 ambiguities. Using the txt files in conjunction with the ANTLR output
 itself is the easiest way to figure out which alternatives are conflicting
 with which when there are ambiguities. Note that when the ANTLR output
 refers to "line 0", it's really talking about the "nextToken"
    function, the
 alternatives for which will appear first in the scanner txt file.
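 For example (the grammar file name is a placeholder, and this assumes the
 ANTLR classes are on your CLASSPATH):

    java antlr.Tool -diagnostic MyGrammar.g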