How to Use ANTLR 4 on .NET

Learn how to effectively use ANTLR 4 on .NET and optimize your parsing experience.

Update: A new article is now available.

I once blogged heavily about how to use ANTLR 3 on .NET, and you can find all links from here.

Interestingly, when I prepared those materials I was fully aware of the upcoming ANTLR 4, and I have kept an eye on it ever since. Over the past two weeks I converted a very large grammar file from ANTLR 3 to ANTLR 4, and this post shares some of the hints and tips I picked up.

What’s new in ANTLR 4?

The Java side is always the bleeding edge for ANTLR, so this time you also need to learn about ANTLR 4 from Java.

Just read some of the material and make sure you get a basic understanding. It can help you a lot, and I am going to reveal more details later.

What’s not good?

A few things have changed dramatically, including:

  • ANTLRWorks 2 is NetBeans based and almost entirely rewritten. The debugging experience has become pretty painful: sample text must be fed in from files, and we lose the ability to debug the grammar token by token.
  • There is no longer a C# port of the ANTLR compiler (grammar to C#), so we have to use the Java version. Although the NuGet packages ease most of the pain this time, we get a dependency on the JVM on development machines.
  • The grammar files themselves need lots of changes. Well, let’s revisit those later.

Starting Point - Replace ANTLR 3 with ANTLR 4 at Project level

With the project open in Visual Studio, add two NuGet packages to it, Antlr4 and Antlr4.Runtime:

Install-Package Antlr4
Install-Package Antlr4.Runtime

The runtime is the usual C# runtime assembly, just like the one we link to in ANTLR 3. The Antlr4 package is a new addition, which bundles the Java version of the ANTLR compiler as well as the MSBuild-related bits. During installation, your project file is updated with those MSBuild targets, so that ANTLR 4 grammar files can later be processed automatically. But since the ANTLR 3 targets are already registered, it is mandatory to remove the ANTLR 3 items from your project file.

Second Step - Grammar Files

Exclude the previous grammar files from the project, and change their extension to .g4. That’s required by ANTLR 4. Open them in ANTLRWorks 2 and do the following:

  1. Make sure all parser rules are above lexer rules. This just requires some cut and paste.
  2. Remove all syntactic predicates (the (...)=> form), as ANTLR 4 claims to be able to handle those cases by itself. Anyway, we will fix any resulting issues later.
  3. If you define returns like [Something result = new Something()], now is the time to move the constructor call into an action such as { $result = new Something(); }. Yes, actions are still fully supported by ANTLR 4, and you don’t really need to remove them at this moment.
  4. Use the non-greedy form .*? instead of .* as options {greedy=false} is gone.
  5. If an action that references $name causes an error, try ($name) instead.
  6. To ignore something or put it on the hidden channel, now use -> skip or -> channel(HIDDEN).
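
As a rough illustration of items 1, 3, 4 and 6, here is a minimal sketch of what a converted grammar might look like. The grammar name, the rule names and the Something type are all made up for this example (Something is assumed to be a C# class with a parameterless constructor that the generated parser can see), so treat it as a shape to compare against rather than a drop-in file.

```antlr
// Hypothetical grammar; every name here is illustrative only.
grammar MigrationSketch;

// Item 1: parser rules come first.
document returns [Something result]
    : { $result = new Something(); }   // item 3: initializer moved out of the returns clause into an action
      entry* EOF
    ;

entry
    : ID '=' STRING ';'
    ;

// Lexer rules follow the parser rules.
ID      : [a-zA-Z_] [a-zA-Z0-9_]* ;
STRING  : '"' .*? '"' ;                        // item 4: non-greedy .*? instead of options {greedy=false;}
COMMENT : '/*' .*? '*/' -> channel(HIDDEN) ;   // item 6: lexer command instead of a {$channel=HIDDEN;} action
WS      : [ \t\r\n]+ -> skip ;                 // item 6: lexer command instead of a skip action
```

The lexer commands after -> are part of the grammar syntax itself, so those rules no longer contain any target-language code.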

Now you can add the new grammar files back to the project and see that Visual Studio is smart enough to identify them and assign the ANTLR-related build actions to them.

Third Step - Regression and Debugging

As a parser developer you must have lots of test cases that check the behavior of your parser. Now is the time to run a full regression and locate the issues caused by the ANTLR 4 upgrade. I did find a lot, but luckily most of them can be fixed easily by slightly modifying the grammar (such as changing the priority of rules and avoiding ambiguity).
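
To give one example of what a priority fix can look like, the sketch below relies on how the ANTLR 4 lexer chooses between rules: the longest match wins, and when two rules match the same text, the rule listed first wins. The grammar and its rule names are hypothetical, and this is only one kind of ordering issue, not necessarily the one you will hit.

```antlr
// Hypothetical grammar used only to illustrate lexer rule priority.
grammar PrioritySketch;

stat : (IF | ID)+ EOF ;

IF : 'if' ;               // listed before ID, so the text "if" becomes an IF token
ID : [a-z]+ ;             // "iffy" still becomes ID, because the longest match wins
WS : [ \t\r\n]+ -> skip ;
```

If ID were listed above IF, the text "if" would be tokenized as ID and the IF rule would never match, which is the sort of regression a reordering fixes.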

Since it is tough to follow my previous dual-file approach to debugging in ANTLRWorks 2, I find it more convenient to debug the actual generated parser in Visual Studio. ANTLR 4 does generate cleaner code, and it is quite easy to follow the flow.

I would say “Be patient, guys” here, as you will find a way out. If not, post your question on Stack Overflow with the antlr4 tag.

Last Step - Embrace ANTLR 4 Fully

If you finish the steps above, you get your ANTLR 3 legacy grammar working on ANTLR 4. But actions are something the designer wishes us to remove. ANTLR can now generate both listeners and visitors for you to walk all the nodes of the parse tree (ANTLR 4 builds parse trees rather than ASTs), which makes it possible to write action-free grammars and attach the logic through those interfaces. The final result is a grammar that is neutral to the programming language.
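
For a feel of what such a grammar looks like, here is a minimal sketch with no embedded actions at all. The grammar name and the alternative labels (the names after #) are invented for this example. For each labeled alternative, ANTLR should generate matching listener callbacks and a visitor method (for instance, enter, exit and visit methods for MulDiv); visitor generation may have to be switched on in the tool options, such as the -visitor flag of the Java tool.

```antlr
// Hypothetical action-free grammar; names and labels are illustrative only.
grammar ExprSketch;

stat : expr EOF ;

expr
    : expr op=('*' | '/') expr   # MulDiv   // listed first, so it binds tighter than AddSub
    | expr op=('+' | '-') expr   # AddSub
    | INT                        # Int
    | '(' expr ')'               # Parens
    ;

INT : [0-9]+ ;
WS  : [ \t\r\n]+ -> skip ;
```

Because nothing in this file is C# or Java, the same grammar can be fed to any ANTLR 4 target, and the evaluation logic lives in a listener or visitor class written in that target's language.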

That’s something wonderful, but I am not yet ready to explore it. Do search for and read the great posts on those topics.

Good luck and stay tuned.

© Lex Li. All rights reserved. The code included is licensed under CC BY 4.0 unless otherwise noted.

Last updated on September 04, 2024