The C++ runtime and generated grammars look very much the same as the Java ones. There are some subtle differences, though; more on these later.
The following is a bit Unix-centric. For Windows, some contributed project files can be found in lib/cpp/contrib. These may be slightly outdated.
The runtime files are located in the lib/cpp subdirectory of the ANTLR distribution. Building it is in general done via the toplevel configure script and the Makefile generated by the configure script. Before configuring please read INSTALL.txt in the toplevel directory. The file lib/cpp/README may contain some extra information on specific target machines.
./configure --prefix=/usr/local
make
Installing ANTLR and the runtime is then done by typing
make install
This installs the runtime library libantlr.a in /usr/local/lib and the header files in /usr/local/include/antlr. Two convenience scripts, antlr and antlr-config, are also installed into /usr/local/bin. The first script takes care of invoking ANTLR, and the other can be used to query the right compiler options for building files with ANTLR.
Compiling the generated files is done with something similar to:
c++ -c MyParser.cpp -I/usr/local/include
Linking is done with something similar to:
c++ -o MyExec <your .o files> -L/usr/local/lib -lantlr
To get ANTLR to generate C++ code you have to add
language="Cpp";
to the global options section. After that, things are pretty much the same as in Java mode, except that all token and AST classes are wrapped by a reference counting class (this is to make life easier in some ways, and much harder in others). The reference counting class uses operator-> to reference the object it is wrapping. As a result, you use -> in C++ mode instead of the '.' of Java. See the examples in examples/cpp for some illustrations.
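To make the role of operator-> a bit more concrete, here is a minimal sketch of a reference counting handle. It is not the runtime's own class: DemoRef, DemoToken and useCount are hypothetical names made up for this illustration, and the real wrapper may well differ in detail. It only shows why members of the wrapped object are reached with -> rather than '.'.

```cpp
#include <string>

// Hypothetical stand-in for a token class; not part of the ANTLR runtime.
class DemoToken {
public:
    explicit DemoToken(const std::string& text) : text_(text) {}
    const std::string& getText() const { return text_; }
private:
    std::string text_;
};

// Minimal reference counting handle, written only to illustrate the idea.
template <class T>
class DemoRef {
public:
    explicit DemoRef(T* p = nullptr)
        : ptr_(p), count_(p ? new long(1) : nullptr) {}

    DemoRef(const DemoRef& other) : ptr_(other.ptr_), count_(other.count_) {
        if (count_) ++*count_;   // one more handle shares the object
    }

    DemoRef& operator=(const DemoRef& other) {
        if (this != &other) {
            release();
            ptr_ = other.ptr_;
            count_ = other.count_;
            if (count_) ++*count_;
        }
        return *this;
    }

    ~DemoRef() { release(); }

    // Members of the wrapped object are reached through ->, not '.'.
    T* operator->() const { return ptr_; }

    // Number of handles currently sharing the object (0 for an empty handle).
    long useCount() const { return count_ ? *count_ : 0; }

private:
    // Drop this handle's share; the last handle deletes the object.
    void release() {
        if (count_ && --*count_ == 0) {
            delete ptr_;
            delete count_;
        }
        ptr_ = nullptr;
        count_ = nullptr;
    }

    T* ptr_;
    long* count_;
};
```

With such a handle, a call that would read tok.getText() in Java mode becomes tok->getText(), and the wrapped object is destroyed when the last handle to it goes away.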
New as of ANTLR 2.7.2 is that if you supply the buildAST=true option to a parser, then you have to set and initialize an ASTFactory for the parser and treewalkers that use the resulting AST.
ASTFactory my_factory; // generates CommonAST per default..
MyParser parser( some-lexer );
// Do setup from the AST factory repeat this for all parsers using the AST
parser.initializeASTFactory( my_factory );
parser.setASTFactory( &my_factory );
In C++ mode it is also possible to override the AST type used by the code generated by ANTLR. To do this you have to do the following:
#ifndef __MY_AST_H__
#define __MY_AST_H__

#include <antlr/CommonAST.hpp>

class MyAST;

typedef ANTLR_USE_NAMESPACE(antlr)ASTRefCount<MyAST> RefMyAST;

/** Custom AST class that adds line numbers to the AST nodes.
 * easily extended with columns. Filenames will take more work since
 * you'll need a custom token class as well (one that contains the
 * filename)
 */
class MyAST : public ANTLR_USE_NAMESPACE(antlr)CommonAST {
public:
   // copy constructor
   MyAST( const MyAST& other )
   : CommonAST(other)
   , line(other.line)
   {
   }
   // Default constructor
   MyAST( void ) : CommonAST(), line(0) {}
   virtual ~MyAST( void ) {}
   // get the line number of the node (or try to derive it from the child node
   virtual int getLine( void ) const
   {
      // most of the time the line number is not set if the node is a
      // imaginary one. Usually this means it has a child. Refer to the
      // child line number. Of course this could be extended a bit.
      // based on an example by Peter Morling.
      if ( line != 0 )
         return line;
      if( getFirstChild() )
         return ( RefMyAST(getFirstChild())->getLine() );
      return 0;
   }
   virtual void setLine( int l )
   {
      line = l;
   }
   /** the initialize methods are called by the tree building constructs
    * depending on which version is called the line number is filled in.
    * e.g. a bit depending on how the node is constructed it will have the
    * line number filled in or not (imaginary nodes!).
    */
   virtual void initialize(int t, const ANTLR_USE_NAMESPACE(std)string& txt)
   {
      CommonAST::initialize(t,txt);
      line = 0;
   }
   virtual void initialize( ANTLR_USE_NAMESPACE(antlr)RefToken t )
   {
      CommonAST::initialize(t);
      line = t->getLine();
   }
   virtual void initialize( RefMyAST ast )
   {
      CommonAST::initialize(ANTLR_USE_NAMESPACE(antlr)RefAST(ast));
      line = ast->getLine();
   }
   // for convenience will also work without
   void addChild( RefMyAST c )
   {
      BaseAST::addChild( ANTLR_USE_NAMESPACE(antlr)RefAST(c) );
   }
   // for convenience will also work without
   void setNextSibling( RefMyAST c )
   {
      BaseAST::setNextSibling( ANTLR_USE_NAMESPACE(antlr)RefAST(c) );
   }
   // provide a clone of the node (no sibling/child pointers are copied)
   virtual ANTLR_USE_NAMESPACE(antlr)RefAST clone( void )
   {
      return ANTLR_USE_NAMESPACE(antlr)RefAST(new MyAST(*this));
   }
   static ANTLR_USE_NAMESPACE(antlr)RefAST factory( void )
   {
      return ANTLR_USE_NAMESPACE(antlr)RefAST(RefMyAST(new MyAST()));
   }
private:
   int line;
};
#endif
Next, tell ANTLR to use this type by setting the following option in your grammars:
ASTLabelType = "RefMyAST";
After that you only need to tell each new parser instance, before invoking it, that it should use the AST factory defined in your class. This is done like this:
// make factory with default type of MyAST
ASTFactory my_factory( "MyAST", MyAST::factory );
My_Parser parser(lexer);
// make sure the factory knows about all AST types in the parser..
parser.initializeASTFactory(my_factory);
// and tell the parser about the factory..
parser.setASTFactory( &my_factory );
After these steps you can access methods/attributes of (Ref)MyAST directly (without typecasting) in parser/treewalker productions.
Forgetting to call setASTFactory results in a nice SIGSEGV or your OS's equivalent. The default constructor of ASTFactory initializes itself to generate CommonAST objects.
If you use a 'chain' of parsers/treewalkers then you have to make sure they all share the same AST factory. Also, if you add new definitions of AST nodes/tokens in downstream parsers/treewalkers, you have to apply the respective initializeASTFactory methods to this factory.
All of this is demonstrated in the examples/cpp/treewalk example.
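Coming back to the getLine() fallback in the custom AST class shown earlier: the idea can be tried out without the runtime. The sketch below is only a model of that logic. DemoNode is a hypothetical type made up for this illustration, and it uses std::shared_ptr where the real code uses the runtime's reference counting classes.

```cpp
#include <memory>
#include <vector>

// Hypothetical node type; it only models the line-number logic and is not
// derived from any ANTLR class.
struct DemoNode {
    int line;                                          // 0 means "not set"
    std::vector<std::shared_ptr<DemoNode> > children;  // first entry = first child

    explicit DemoNode(int l = 0) : line(l) {}

    // Same fallback as MyAST::getLine(): use our own line if it is set,
    // otherwise ask the first child, otherwise report 0.
    int getLine() const {
        if (line != 0)
            return line;
        if (!children.empty())
            return children.front()->getLine();
        return 0;
    }
};
```

An imaginary node built without a token starts out with line 0 and picks up the line of its first child as soon as one is attached.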
Heterogeneous AST types should now (as of 2.7.2) work in C++ mode, probably with some caveats.
The heteroAST example shows how to set things up. A short excerpt:
ASTFactory ast_factory;
parser.initializeASTFactory(ast_factory);
parser.setASTFactory(&ast_factory);
A small excerpt from the generated initializeASTFactory method:
void CalcParser::initializeASTFactory( antlr::ASTFactory& factory )
{
   factory.registerFactory(4, "PLUSNode", PLUSNode::factory);
   factory.registerFactory(5, "MULTNode", MULTNode::factory);
   factory.registerFactory(6, "INTNode", INTNode::factory);
   factory.setMaxNodeType(11);
}
After these steps ANTLR should be able to decide what factory to use at what time.
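How a factory can pick a node class per token type is perhaps easiest to see in a small self-contained model. The sketch below is not the ANTLR ASTFactory: DemoFactory, DemoAST, DemoPlusNode, kind() and create() are hypothetical names, and only the method names registerFactory and setMaxNodeType are borrowed from the excerpt above. It assumes a simple table indexed by token type that falls back to a default node class.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical node classes for the illustration.
struct DemoAST {
    virtual ~DemoAST() {}
    virtual std::string kind() const { return "CommonNode"; }
};

struct DemoPlusNode : DemoAST {
    std::string kind() const override { return "PLUSNode"; }
    static std::unique_ptr<DemoAST> factory() {
        return std::unique_ptr<DemoAST>(new DemoPlusNode);
    }
};

// Toy factory: one creator function per token type, default elsewhere.
class DemoFactory {
public:
    typedef std::unique_ptr<DemoAST> (*Creator)();

    // Make sure the table covers token types 0..maxType (never shrinks).
    void setMaxNodeType(int maxType) {
        if (maxType >= 0 && creators_.size() < static_cast<std::size_t>(maxType) + 1)
            creators_.resize(static_cast<std::size_t>(maxType) + 1, &defaultCreator);
    }

    // The name argument is accepted to mirror the excerpt; this model ignores it.
    void registerFactory(int type, const char* /*name*/, Creator creator) {
        if (type < 0)
            return;
        setMaxNodeType(type);
        creators_[static_cast<std::size_t>(type)] = creator;
    }

    // Build a node for the given token type; unknown types get the default.
    std::unique_ptr<DemoAST> create(int type) const {
        if (type >= 0 && static_cast<std::size_t>(type) < creators_.size())
            return creators_[static_cast<std::size_t>(type)]();
        return defaultCreator();
    }

private:
    static std::unique_ptr<DemoAST> defaultCreator() {
        return std::unique_ptr<DemoAST>(new DemoAST);
    }

    std::vector<Creator> creators_;
};
```

In this model, a token type without a registered creator simply yields the default node class, which is the behaviour one would expect for tokens that have no heterogeneous node type attached.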
header "<identifier>" { }
identifier | where |
pre_include_hpp | Code is inserted before ANTLR generated includes in the header file. |
post_include_hpp | Code is inserted after ANTLR generated includes in the header file, but outside any generated namespace specifications. |
pre_include_cpp | Code is inserted before ANTLR generated includes in the cpp file. |
post_include_cpp | Code is inserted after ANTLR generated includes in the cpp file, but outside any generated namespace specifications. |
Sometimes various tree building constructs with '#' in them clash with the C/C++ preprocessor. ANTLR's preprocessor for actions is slightly extended in C++ mode to alleviate these pains.
NOTE: At some point I plan to replace the '#' by something different that gives less trouble in C++.
The following preprocessor constructs are not touched. (As a result, you cannot use these as labels for AST nodes.)
if
define
ifdef
ifndef
else
elif
endif
warning
error
ident
pragma
include
As another extra it's possible to escape '#'-signs with a backslash e.g. "\#". As the action lexer sees these they get translated to simple '#' characters.
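The effect of the backslash escape can be modelled in a few lines. This is not ANTLR's action lexer, just a sketch of the translation described above; unescapeHash is a hypothetical helper name, and it assumes every other character, including other backslash sequences, passes through unchanged.

```cpp
#include <string>

// Hypothetical helper: turns every "\#" in an action into a plain '#'.
// Everything else, including other backslash sequences, is copied as-is.
std::string unescapeHash(const std::string& action) {
    std::string out;
    out.reserve(action.size());
    for (std::string::size_type i = 0; i < action.size(); ++i) {
        if (action[i] == '\\' && i + 1 < action.size() && action[i + 1] == '#') {
            out += '#';
            ++i;  // skip the '#' that was just emitted
        } else {
            out += action[i];
        }
    }
    return out;
}
```

An action text containing \#if would thus come out of the translation as #if, ready for the C/C++ preprocessor.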
header "pre_include_hpp" {
// gets inserted before antlr generated includes in the header file
}
header "post_include_hpp" {
// gets inserted after antlr generated includes in the header file
// outside any generated namespace specifications
}
header "pre_include_cpp" {
// gets inserted before the antlr generated includes in the cpp file
}
header "post_include_cpp" {
// gets inserted after the antlr generated includes in the cpp file
}
header {
// gets inserted after generated namespace specifications in the header
// file. But outside the generated class.
}
options {
language="Cpp";
namespace="something"; // encapsulate code in this namespace
// namespaceStd="std"; // cosmetic option to get rid of long defines
// in generated code
// namespaceAntlr="antlr"; // cosmetic option to get rid of long defines
// in generated code
genHashLines = true; // generate #line's (set to false to turn them off)
}
{
// global stuff in the cpp file
...
}
class MyParser extends Parser;
options {
exportVocab=My;
}
{
// additional methods and members
...
}
... rules ...
{
// global stuff in the cpp file
...
}
class MyLexer extends Lexer;
options {
exportVocab=My;
}
{
// additional methods and members
...
}
... rules ...
{
// global stuff in the cpp file
...
}
class MyTreeParser extends TreeParser;
options {
exportVocab=My;
}
{
// additional methods and members
...
}
... rules ...