MGrammar Tokenization

I’m still having some fun playing around with MGrammar, but I was stumped for a while by a seemingly simple issue: how to tokenize type names vs. reference expressions. For example, consider the following C# code:

namespace Testing.Example

Now a simple reference expression:

blah.Foo.Bar();

In the first example, I essentially want the entire string, whereas in the second example I have three nested references to variables or members. My first inclination was to tokenize this with something like:

token Identifier = Letter+ (Letter | Number)*;
token TypeIdentifier = Identifier ("." Identifier)*;

syntax Namespace = "namespace" TypeIdentifier;
syntax ReferenceExpression = (Identifier ".")* Identifier;
syntax MethodInvokeExpression = ReferenceExpression GroupBegin Join(Expression, ",") GroupEnd;

This is a very rough example, but the idea was to have two different types of identifiers, one for each syntax to use. Well, it turns out this doesn’t really work, and I think the reason is that tokenization happens before syntax evaluation. That means if you have a string such as “blah.Foo.Bar” and it matches TypeIdentifier, then that is the token it will be. If your syntax expects an Identifier token but gets a TypeIdentifier token, you’re out of luck; you cannot optionally get one token or the other. There is one solution in MGrammar, and that is to use the “final” keyword. For example:

token Identifier = Letter+ (Letter | Number)*;
final token Namespace = "namespace";
syntax NamespaceExpression = Namespace TypeIdentifier;
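To make the tokenize-first behavior concrete, here is a rough sketch in Python rather than MGrammar. It is a toy model, not MGrammar's actual algorithm: it assumes the longest match wins at each position, that a final token beats a non-final one on a tie, and that the first listed token wins any remaining tie. The `tokenize` function, the `first_attempt` and `with_final` tables, and the `Dot` token are all made up for illustration.

```python
import re

# Letter followed by letters or digits, standing in for the post's Identifier.
IDENT = r"[A-Za-z][A-Za-z0-9]*"


def tokenize(text, tokens):
    """Split text into (token_name, lexeme) pairs without looking at any syntax rule.

    tokens is a list of (name, regex, is_final). Assumed policy: longest match
    wins; on a length tie a final token beats a non-final one; otherwise the
    token listed first is kept.
    """
    compiled = [(name, re.compile(rx), final) for name, rx, final in tokens]
    out, pos = [], 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1  # skip whitespace between tokens
            continue
        best = None
        for name, rx, final in compiled:
            m = rx.match(text, pos)
            if not m or m.end() == pos:
                continue
            key = (m.end(), final)  # longer match first, then final over non-final
            if best is None or key > best[0]:
                best = (key, name, m.group())
        if best is None:
            raise ValueError(f"no token matches at position {pos}")
        out.append((best[1], best[2]))
        pos = best[0][0]
    return out


# The first attempt: two overlapping identifier tokens.
first_attempt = [
    ("Identifier", IDENT, False),
    ("TypeIdentifier", IDENT + r"(?:\." + IDENT + r")*", False),
]

# The second attempt: one identifier token plus a final keyword token.
# "Dot" is a hypothetical token standing in for the "." literal in the syntax rules.
with_final = [
    ("Identifier", IDENT, False),
    ("Namespace", r"namespace", True),
    ("Dot", r"\.", False),
]

print(tokenize("blah.Foo.Bar", first_attempt))
# [('TypeIdentifier', 'blah.Foo.Bar')]
print(tokenize("namespace Testing.Example", with_final))
# [('Namespace', 'namespace'), ('Identifier', 'Testing'), ('Dot', '.'), ('Identifier', 'Example')]
```

Under these assumptions, the first table swallows the whole dotted string into one TypeIdentifier before any syntax rule gets a say, which is the failure described above. In the second table the only thing that changes the outcome for the keyword is the final flag, which mirrors the final token Namespace declaration in the grammar.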

Marking the token final ensures that the string “namespace” matches the Namespace token rather than the more liberal Identifier token. But that is about it: you get, essentially, two levels of priority. This is fine though; you just have to rely more on your syntaxes (syntaxii?) and post-processing. For example, you might do this:

token Identifier = Letter+ (Letter | Number)*;

syntax NamespaceIdentifier = n:(n:NamespaceIdentifier "." => n)? i:Identifier 
    => NamespaceIdentifier { Parent = n, Name = i };
syntax NamespaceExpression = Namespace i:NamespaceIdentifier
    => Namespace { Name = i };

syntax ReferenceExpression = r:(r:ReferenceExpression "." => r)? i:Identifier
    => ReferenceExpression { Target = r, Name = i };
syntax MethodInvokeExpression = ReferenceExpression GroupBegin Join(Expression, ",") GroupEnd;

Of course, this is very simplistic and doesn’t take into account other types of expressions, but hopefully you get the idea. The trick is that your graph output for the namespace doesn’t yield one long string; instead it yields a nested node structure. When processing the graph you have to take this into account, but that isn’t too hard.
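As an illustration of that post-processing step, here is a minimal Python sketch that turns the nested structure back into a dotted name. It assumes the graph has already been loaded into plain dictionaries with the same field names as the projections above (Parent/Name for namespaces, Target/Name for references); the dictionary layout and the `dotted_name` helper are hypothetical, not part of any MGrammar API.

```python
def dotted_name(node, parent_key="Parent"):
    """Walk from the innermost node up through its parents and join the names with dots."""
    parts = []
    while node is not None:
        parts.append(node["Name"])
        node = node.get(parent_key)  # None once the outermost name is reached
    return ".".join(reversed(parts))


# Hand-built stand-in for the graph of `namespace Testing.Example`.
namespace_node = {
    "Parent": {"Parent": None, "Name": "Testing"},
    "Name": "Example",
}

# Hand-built stand-in for the graph of `blah.Foo.Bar`, which nests via Target.
reference_node = {
    "Target": {
        "Target": {"Target": None, "Name": "blah"},
        "Name": "Foo",
    },
    "Name": "Bar",
}

print(dotted_name(namespace_node))                        # Testing.Example
print(dotted_name(reference_node, parent_key="Target"))   # blah.Foo.Bar
```

The same walk works for both shapes because the two syntaxes build the same left-nested chain; only the name of the parent field differs.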

Author: justinmchase

I'm a Software Developer from Minnesota.
