The Switch Pattern

I recently discovered a solution to a problem that has been stumping me, with regards to pattern matching, for a while. The problem is with Direct Left Recursion and it being too greedy in some cases.

For example suppose you have the following string you are trying to parse:

var input = "x.y.z";

This string represents an expression tree, in this case three reference expressions to x, y and z respectively. This is very easy to parse using direct left recursion, with the following grammar:

grammar DlrExample < Parser:
            
    Main = Expression;
    Expression 
        = target:Expression "." name:Identifier 
            -> new ReferenceExpression(target, name)
        | Identifier;
end

Hopefully above you can see that the Expression rule is referencing itself in the left most part of it’s first rule. This will cause a direct left recursion to occur which is solved by the MetaSharp packrat parser by keeping a stack of called rules and memozing returned values. You can easily add more types of expressions to this system, such as MethodInvokeExpressions that would match “x()”, for example. The solution to direct left recursion triggers something called Grow LR which will continue to invoke the Expression rule over and over, returning it’s previous result until it either fails or returns a lesser match.

The problem with this technique is that the growth stage is greedy. Meaning it will grow and match all of that expression and make it impossible to match the ending expression.

For example, an AttachDelegateStatement assumes that the last expression in the tree is a reference expression. See the below snippets for an example of this:

x.y.z += Foo; // valid
x.y().z += Foo; // valid
x.y.z() += Foo; // fail!

It is not valid for the AttachDelegateStatement to be attached to a MethodInvokeExpression, it must be a ReferenceExpression but the Grow LR will match the entire expression leaving you no way to determine if the last item is what you are wanting.

The solution to this problem is a combination of the Variable, Object and Switch patterns. The Switch pattern is the real magic here, it lets you switch the input of the stream from the standard input to projections assigned to previously parsed variables. This lets you perform pattern matching on your output directly inline. Here is an example of the solution (probably not 100% correct, for illustration purposes only) to the problem of the AttachDelegateStatement using the Switch pattern:

grammar DlrExample < Parser:
            
    Main = DelegateAttachStatement;
    Expression 
        = target:Expression "." name:Identifier "(" ")"
            -> new MethodInvokeExpression(target, names)
        | target:Expression "." name:Identifier 
            -> new ReferenceExpression(target, name)
        | Identifier;

    ReferenceExpression 
        = exp:Expression (ReferenceExpression { })(exp) -> exp;

    DelegateAttachStatement
        = target:ReferenceExpression 
          "+=" 
          method:ReferenceExpression
            -> new DelegateAttachStatement(target, method);
end

The interesting part of this is the ReferenceExpression rule, specifically the (…)(exp) which is the switch pattern. You can see that it first calls Expression but after that it is Switching on the “exp” variable. Here is the interpretation of that rule in C#:

var ReferenceExpression = Pattern.Rule(
    Pattern.Projection(
        Pattern.And(
            Pattern.Variable("exp", this.Expression),
            Pattern.Switch(
                "exp",
                Pattern.Object(
                    Pattern.Type(typeof(ReferenceExpression))))),
        s => s.Variables["exp"]));

So we’re switching the stream to the value stored in “exp”, then we are applying the Object match to verify that the result is a ReferenceExpression, if it is then the pattern is matched successfully and we simply project out exp. And remember, all of these rules are generating memos so even if this pattern fails it’s likely to be quick for other rules to match.

This is really powerful because it essentially solves the problems of the greedy Grow LR! I am also using this pattern to detect the difference between projections with single expressions and multi-statement blocks, in order to determine the need for ending semi-colons or not. This is very powerful stuff!

Author: justinmchase

I'm a Software Developer from Minnesota.

Leave a Reply

%d bloggers like this: