Language Workbench Competition

I’m actually pretty close to having all of the MetaSharp grammars translated into .g files and Attila is working on the msbuild compiler components. Once we have them both done and merged we will have achieved dogfood status. There are plenty of Grammar and Lang features I want to work on still but the next major feature will be the Language Workbench, or in other words the interactive editor (I want to have some type of grammar REPL for the tclangug presentation I will be giving next month at least).

I just recently learned about the Code Generation Conference and now I am learning that there was also a Language Workbench website and that they sponsored a competition that went along with the conference. The most interesting part of this is the description of what a language workbench should do and how to rate them:

http://www.languageworkbenches.net/LWCTask-1.0.pdf

Which almost reads like a workbench spec! So this is very exciting, I have been thinking about what a MetaSharp workbench should be like and I think that I have a pretty good idea of what the external expectations are now, so this is good.

What’s funny about MetaSharp is that it is designed to solve all of the “Advanced” scenarios but currently doesn’t solve any of the “Basic” scenarios, lol. I’m not sure if that’s a good sign or a bad sign yet…

The Switch Pattern

I recently discovered a solution to a problem that has been stumping me, with regards to pattern matching, for a while. The problem is with Direct Left Recursion and it being too greedy in some cases.

For example suppose you have the following string you are trying to parse:

var input = "x.y.z";

This string represents an expression tree, in this case three reference expressions to x, y and z respectively. This is very easy to parse using direct left recursion, with the following grammar:

grammar DlrExample < Parser:
            
    Main = Expression;
    Expression 
        = target:Expression "." name:Identifier 
            -> new ReferenceExpression(target, name)
        | Identifier;
end

Hopefully above you can see that the Expression rule is referencing itself in the left most part of it’s first rule. This will cause a direct left recursion to occur which is solved by the MetaSharp packrat parser by keeping a stack of called rules and memozing returned values. You can easily add more types of expressions to this system, such as MethodInvokeExpressions that would match “x()”, for example. The solution to direct left recursion triggers something called Grow LR which will continue to invoke the Expression rule over and over, returning it’s previous result until it either fails or returns a lesser match.

The problem with this technique is that the growth stage is greedy. Meaning it will grow and match all of that expression and make it impossible to match the ending expression.

For example, an AttachDelegateStatement assumes that the last expression in the tree is a reference expression. See the below snippets for an example of this:

x.y.z += Foo; // valid
x.y().z += Foo; // valid
x.y.z() += Foo; // fail!

It is not valid for the AttachDelegateStatement to be attached to a MethodInvokeExpression, it must be a ReferenceExpression but the Grow LR will match the entire expression leaving you no way to determine if the last item is what you are wanting.

The solution to this problem is a combination of the Variable, Object and Switch patterns. The Switch pattern is the real magic here, it lets you switch the input of the stream from the standard input to projections assigned to previously parsed variables. This lets you perform pattern matching on your output directly inline. Here is an example of the solution (probably not 100% correct, for illustration purposes only) to the problem of the AttachDelegateStatement using the Switch pattern:

grammar DlrExample < Parser:
            
    Main = DelegateAttachStatement;
    Expression 
        = target:Expression "." name:Identifier "(" ")"
            -> new MethodInvokeExpression(target, names)
        | target:Expression "." name:Identifier 
            -> new ReferenceExpression(target, name)
        | Identifier;

    ReferenceExpression 
        = exp:Expression (ReferenceExpression { })(exp) -> exp;

    DelegateAttachStatement
        = target:ReferenceExpression 
          "+=" 
          method:ReferenceExpression
            -> new DelegateAttachStatement(target, method);
end

The interesting part of this is the ReferenceExpression rule, specifically the (…)(exp) which is the switch pattern. You can see that it first calls Expression but after that it is Switching on the “exp” variable. Here is the interpretation of that rule in C#:

var ReferenceExpression = Pattern.Rule(
    Pattern.Projection(
        Pattern.And(
            Pattern.Variable("exp", this.Expression),
            Pattern.Switch(
                "exp",
                Pattern.Object(
                    Pattern.Type(typeof(ReferenceExpression))))),
        s => s.Variables["exp"]));

So we’re switching the stream to the value stored in “exp”, then we are applying the Object match to verify that the result is a ReferenceExpression, if it is then the pattern is matched successfully and we simply project out exp. And remember, all of these rules are generating memos so even if this pattern fails it’s likely to be quick for other rules to match.

This is really powerful because it essentially solves the problems of the greedy Grow LR! I am also using this pattern to detect the difference between projections with single expressions and multi-statement blocks, in order to determine the need for ending semi-colons or not. This is very powerful stuff!