First Steps Towards a Parser Generator

I’ve been pretty quiet the last couple of months because I have been very busy at work, and at home I have been slowly chipping away at a new Parser for MetaSharp.

It’s very late at night here, so I’m not going to go into a lot of detail right now. Suffice it to say that my Parser is currently completely hand written, and I’m working on a Transform step that will let me generate Parsers from a grammar. The goal is to generate the Parser from the Grammar grammar!

I still have a long way to go but here is what I do have. The following Grammar:

grammar Simple < MetaSharp.Transformation.Parsing.Common.BasicParser:
    Alphabet = (A | B | C)+;
    A = "a";
    B = "b";
    C = "c";
end

Produces the following code:

//------------------------------------------------------------------------------
// <auto-generated>
//     This code was generated by a tool.
//     Runtime Version:4.0.21006.1
//
//     Changes to this file may cause incorrect behavior and will be lost if
//     the code is regenerated.
// </auto-generated>
//------------------------------------------------------------------------------

public partial class Simple : MetaSharp.Transformation.Parsing.Common.BasicParser
{

    internal new static System.Collections.Generic.IEnumerable<string> Imports;

    static Simple()
    {
        Simple.Imports = new string[0];
    }

    public Simple(MetaSharp.Transformation.IContext context) :
        base(context)
    {
        MetaSharp.Transformation.IRuleService ruleService =
            this.Context.Locate<MetaSharp.Transformation.IRuleService>();
        this.Add(
            new MetaSharp.Transformation.Patterns.OrPattern(
                ruleService.GetRule("Alphabet", Simple.Imports),
                ruleService.GetRule("A", Simple.Imports),
                ruleService.GetRule("B", Simple.Imports),
                ruleService.GetRule("C", Simple.Imports)));
    }
}

[MetaSharp.Transformation.RuleExportAttribute(typeof(AlphabetRule))]
public class AlphabetRule : MetaSharp.Transformation.Rule
{

    protected override void Initialize()
    {
        this.Body = new MetaSharp.Transformation.Patterns.OneOrMorePattern(
            new MetaSharp.Transformation.Patterns.OrPattern(
                new MetaSharp.Transformation.Patterns.OrPattern(
                    this.Rules.GetRule("A", Simple.Imports),
                    this.Rules.GetRule("B", Simple.Imports)),
                this.Rules.GetRule("C", Simple.Imports)));
        base.Initialize();
    }
}

[MetaSharp.Transformation.RuleExportAttribute(typeof(ARule))]
public class ARule : MetaSharp.Transformation.Rule
{

    protected override void Initialize()
    {
        this.Body = new MetaSharp.Transformation.Patterns.StringPattern("a");
        base.Initialize();
    }
}

[MetaSharp.Transformation.RuleExportAttribute(typeof(BRule))]
public class BRule : MetaSharp.Transformation.Rule
{

    protected override void Initialize()
    {
        this.Body = new MetaSharp.Transformation.Patterns.StringPattern("b");
        base.Initialize();
    }
}

[MetaSharp.Transformation.RuleExportAttribute(typeof(CRule))]
public class CRule : MetaSharp.Transformation.Rule
{

    protected override void Initialize()
    {
        this.Body = new MetaSharp.Transformation.Patterns.StringPattern("c");
        base.Initialize();
    }
}

It’s a pretty trivial grammar for now, but this should give you an idea of where this is going. The RuleExportAttribute inherits from the MEF ExportAttribute, and the concrete IRuleService implementation uses MEF to supply rules discovered via these attributes.
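For the curious, here is a rough sketch of what an attribute like this can look like (an illustration only; the shape of the real RuleExportAttribute may differ). The MEF pieces, ExportAttribute and MetadataAttribute, come straight from System.ComponentModel.Composition:

using System;
using System.ComponentModel.Composition;

// Sketch: a metadata-bearing MEF export attribute for rules.
[MetadataAttribute]
[AttributeUsage(AttributeTargets.Class, AllowMultiple = false)]
public class RuleExportAttribute : ExportAttribute
{
    public RuleExportAttribute(Type ruleType)
        : base(typeof(MetaSharp.Transformation.Rule)) // common contract
    {
        this.RuleType = ruleType;
    }

    // Exposed to importers as MEF metadata, letting a rule service
    // pick rules by type without instantiating them first.
    public Type RuleType { get; private set; }
}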

It’s exciting!

Surprisingly Difficult to Parse a Character

I’ve been working on a custom variant of an OMeta parser for a couple of weeks now. It’s coming along pretty well; I think I’ve overcome most of the major hurdles, and now I’m going through what I currently have, cleaning it up and getting it to handle some of the edge cases I need.

Just now I was working on the grammar for parsing a character and realized how hard it really is. It sounds trivial; after all, it’s just two single quotes and a character, right? Wrong. Here’s my current grammar:

CharacterLiteralToken
    = '\'' '\\' 'u' Hex#4 '\''
    | '\'' '\\' 'U' Hex#8 '\''
    | '\'' '\\' 'x' Hex Hex? Hex? Hex? '\''
    | '\'' '\\' ('\'' | '\"' | '\\' | '0' | 'a' | 'b' | 'f' | 'n' | 'r' | 't' | 'v') '\''
    | '\'' '\u0000'..'\uffff' '\'';

It turns out that you have to account for a multitude of escape characters as well as escaped Unicode literals. I didn’t want to have to implement all of this, but as you can see, the last alternative simply matches every character under the sun, so the escape sequences have to be matched explicitly before it.

This will match:

  • '\u0000'
  • '\U00000000'
  • '\x0', '\x00', '\x000', '\x0000'
  • '\'', '\"', '\\', '\0', '\a', '\b', '\f', '\n', '\r', '\t', '\v'
  • 'a' …
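As a sanity check, here is a rough .NET regex equivalent of the grammar above (my own sketch, not part of MetaSharp). Like the grammar, it tries the escape alternatives first and only then falls back to matching any single character:

using System.Text.RegularExpressions;

static readonly Regex CharacterLiteralToken = new Regex(
    "^'(?:" +
    @"\\u[0-9a-fA-F]{4}" +      // '\u0000'
    @"|\\U[0-9a-fA-F]{8}" +     // '\U00000000'
    @"|\\x[0-9a-fA-F]{1,4}" +   // '\x0' through '\x0000'
    @"|\\['""\\0abfnrtv]" +     // the simple escapes, e.g. '\n'
    "|." +                      // any other single character
    ")'$");

// CharacterLiteralToken.IsMatch(@"'\u0041'") == true
// CharacterLiteralToken.IsMatch("'a'") == true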

Next I get to do the string parser… that should be even more interesting.

Response to Ted Neward at Oredev

I was just listening to the Ted Neward talk on .NET Rocks, recorded at Oredev a while back. It’s a very interesting discussion and I would definitely recommend listening to it. Ted, Carl and Richard discuss a variety of things such as Oslo, functional programming and DSLs. There were a few things I wanted to comment on since I spend a lot of time thinking about DSLs; perhaps I can add to the discussion.

Here’s a little paraphrase of something Ted boldly proclaims towards the beginning of the conversation:

The next 5 to 10 years will be about programming languages… the renaissance of the programming language.

I couldn’t agree more.

A lot of things were said, and I agree with almost all of them. But there was one outstanding question that I would like to try to respond to. I don’t remember who asked it exactly, but the general question was: “what is the business need DSLs are trying to solve?”

Here is my response as succinctly as I can put it:

Constraint increases scalability.

This maxim has a multitude of implications, because in order to increase scalability we are talking about code that is easier to maintain, understand and author. We are also talking about code that is consistent and accurate while still increasing productivity. There are many factors that go into scalability, and I think DSLs are, in general, the solution to the broader problems, because when you think about it, the most powerful aspect of a DSL is the fact that it is a constrained universe.

So to put it in more practical terms: as projects become more and more complex, I think we will need DSLs for those projects to even be possible. This idea was partly inspired by a talk I heard by Roman Ivantsov at the Lang.NET symposium, about the failure rate of extremely large projects.

I would also add that while DSLs may be necessary for larger projects, they may be merely helpful for smaller ones. But in a competitive market, any little advantage you can gain in productivity is usually a critical factor in the overall success of your projects. After all, most general purpose languages we use today could be thought of as DSLs over machine code itself; the domain in this case is simply programming. Another way to think about it is to say that a while loop is just a construct to help us safely use the jump instruction. But nobody today would argue that using a popular general purpose language is worse than manually writing assembly code. The sorts of productivity boosts given to you by a modern programming language are undeniable, and nearly everyone takes advantage of them.
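To make that concrete, here is the same trivial loop written both ways; the while loop is simply a constrained, safer spelling of the jump:

// The structured, constrained construct:
int i = 0;
while (i < 10)
{
    i++;
}

// The unconstrained equivalent it protects us from:
int j = 0;
Top:
if (j < 10)
{
    j++;
    goto Top;
}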

But back to the idea of constraint as a good thing. We normally think of flexibility in a programming language as a good thing, but I would like to go ahead and claim that the opposite may well be true. Perhaps flexibility in past languages was necessary because of the general nature of their purposes, as well as the rapidity with which they are able to change (slowly). There are entire classes of solutions that require lots of flexibility in order to be solved by a general purpose language.

However, think of the value that design patterns bring to a general purpose language. As you implement your application you start noticing patterns: places where you tend to do the same thing over and over, which you may abstract into a design pattern. Unfortunately, due to the general nature of the languages we tend to use, it is very easy to stray from the patterns we have created and do something that breaks a pattern, whether out of ignorance, haste or whatever. This usually results in increased complexity and maintenance cost. And there is nothing more permanent than a temporary fix.

For example, just because you’ve gone through great pains to create a Data Access Layer and a Business Logic Layer doesn’t mean that somebody won’t spin up a SqlConnection in the middle of your view and screw everything up. There are ways to mitigate this (code reviews, static analysis, education, etc.) but these are all simply other forms of constraint. What if the very language you were using to create your application didn’t even have the capability to stray from the accepted design patterns? What if your view was written in a declarative DSL designed specifically for authoring views, where data access was completely agnostic to the implementation and was interpreted at design time or runtime to always go through the accepted channels? This is how a DSL can increase scalability.

Design patterns are DSLs.

Anywhere you can abstract part of your application into a design pattern, you should be able to create a DSL to express that design pattern. Additionally, those DSLs should be reusable and implementable in any application. An interesting example of this is the Axum programming language, specifically designed to solve the problem of concurrency by constraining the language so that concurrent code is safe. Under the hood the generated code is something you could have written manually in any general purpose language, but the more constrained and declarative we can be about such things, the less error prone the underlying code will be. Additionally, it helps us easily understand and implement highly complex code, which increases productivity. Even the smartest developers have a hard time getting concurrency right; we really need constraint in this domain because it’s incredibly easy to do something unsafe.

There are a few things we need in the programming community in order to make any of this feasible, and which we are currently lacking. I have been working on MetaSharp specifically to solve some of these issues, but it has a long way to go. Here is a brief list, off the top of my head, of problems a DSL tool needs to solve:

  • An open, transparent, extensible, language agnostic compiler.
  • A common, extensible AST.
  • An excellent, runtime based grammar parser.
  • Common transformations.
  • IDE support, for debugging transformations as well as author time feedback and visualization.

I could go on and on… but look forward to our future of programming languages. In the near future we may finally be equipping ourselves with the right tool for the job at hand.

Dynamic Pipelines and Debugging DSL Transformations

I’ve been working on a dynamic transformation library for MetaSharp for the last couple of weeks. I think I’m finally past the prototype phase and am almost ready to make it real. There is still lots of work before it’s actually usable, but I think the infrastructure is in place to enable me to solve the rest of the problems. Here are some samples of how I have it working so far:

A Simple Eval

int z = pipeline.Eval<int>("x * y", new { x = 3, y = 7 });

Delegate Compilation With a More Complex Context

public class MathContext
{
    public double x;
    public double y;

    public double sin(double d)
    {
        return Math.Sin(d);
    }

    public double cos(double d)
    {
        return Math.Cos(d);
    }
}
Func<MathContext, double> equation = pipeline.Compile<MathContext, double>("sin(x) * cos(y)");
double result = equation(new MathContext { x = 3, y = 7 });

I also want to allow you to add white-listed namespaces to the pipeline, so that “new” objects can be created and static methods called from those namespaces. And of course the actual code can be more complex than one-line expressions; the new expression trees in .NET 4 include all statements as well. Hopefully we’ll get better support for dynamic types in the future, but I may have some tricks up my sleeve for doing that as is; we’ll see.

Anyway, one of the things I needed to do to get this to work was to create a more robust object as the result of a transformation. Previously you’d get some object in, then simply yield whatever you wanted to transform it into. This was OK for the CodeDom, where you could stuff the original AST nodes into a metadata dictionary on any CodeObject, but it doesn’t work for the DLR objects. At some point you need to know where a node came from in order to understand how to apply it in a subsequent transformation. For example, a method invocation node needs to know whether the thing it is invoking is a property, field or method reference, and do something different in each case.

So now when you yield transformed objects it stuffs them into a Transform&lt;T&gt;, which contains the original node, the yielded transformation nodes and all child Transforms. The end result is three trees: the original AST, the new AST and a tree showing the relationship between the two.
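To give you an idea, here is roughly the shape of such a wrapper (a sketch based on the description above; the real Transform&lt;T&gt; may differ):

using System.Collections.Generic;

public class Transform<T>
{
    // The original AST node this transform was produced from.
    public object Original { get; private set; }

    // The nodes yielded by transforming the original node.
    public IList<T> Transformed { get; private set; }

    // Transforms of the original node's children; together these form
    // the third tree, relating the original AST to the generated one.
    public IList<Transform<T>> Children { get; private set; }

    public Transform(object original)
    {
        this.Original = original;
        this.Transformed = new List<T>();
        this.Children = new List<Transform<T>>();
    }
}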

This is necessary to actually do transformations, but one of the cool side effects I want to play around with is that the relationship tree (the tree of Transform&lt;T&gt; objects) could be really interesting to visualize. I really want to make a Debugger Visualizer where you can pop open a dialog and see a visual representation of the transformation. I’m envisioning something like this:

[Image: visualization of a Song DSL AST being transformed into CodeDom nodes]

As your mouse moves over a node in the original tree, it shows the nodes it transformed into. In this image I have an AST representing a Song DSL. When run through the pipeline it transforms into CodeDom objects (or whatever, but CodeDom in this case). Here you can see the Note being transformed into objects that would generate code like “this.Play(key, octave, duration)”. As you move up to the Bar you’d see a more complete representation of the generated tree, and as you move up to the Song you’d see the entire tree.

That is the ultimate goal. In reality this transformation takes several steps and would be more complicated than shown here, but now that we have all of the Transform trees we can track these relationships and display them visually like this. A tool like this could be invaluable for debugging complex transformations!

Dynamic Pipeline in MetaSharp

I’ve been working on a prototype of a Dynamic Pipeline for MetaSharp using the new LINQ expressions in .NET 4. I’m pretty darn excited about it; here is what I have working so far:

[Test]
public void BinaryVariablesTest()
{
    var result = this.pipeline.Eval<int>("x * y", new { x = 3, y = 7 });
    Assert.AreEqual(21, result);
}

What I’m doing here is compiling the string as MetaSharp code, then using a node visitor to generate the appropriate LINQ expressions. The trick here is the context parameter being passed in. This object contains all of the properties and methods allowed to be called from the expression; essentially it’s the “this” parameter. To get this to work you can use the handy dandy ConstantExpression. Here is how I’m doing it:

public class ReferenceExpressionVisitor 
    : TransformVisitor<linq.Expression, ReferenceExpression>
{
    protected override IEnumerable<linq.Expression> Visit(
        IEnumerable<linq.Expression> items, 
        ReferenceExpression visitedObject)
    {
        IDynamicContextService parameterService = 
            this.ServiceProvider.Locate<IDynamicContextService>();
    
        yield return linq.Expression.PropertyOrField(
            linq.Expression.Constant(parameterService.Context),
            visitedObject.Name);
    }
}

The constant allows us to call properties or fields on the context object directly. You should even be able to attach lambda expressions to fields so you can call those as well. I have to do some restructuring in order to enable everything like method calls and whatnot, but it should all be doable. Very sweet. The dynamic pipeline will also need some type of caching provider; clearly you wouldn’t want to rebuild these expressions every single time you execute them. Optionally I could just give back a delegate instead of executing it directly, and let the caller do all the caching. That sounds easier and more flexible, actually, now that I think about it.
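If you want to see the trick in isolation, here is the same idea using nothing but raw expression trees, no MetaSharp involved:

using System;
using System.Linq.Expressions;

var context = new { x = 3, y = 7 };

// Bake the context object into the tree as a constant, then bind
// its members by name.
var body = Expression.Multiply(
    Expression.PropertyOrField(Expression.Constant(context), "x"),
    Expression.PropertyOrField(Expression.Constant(context), "y"));

var lambda = Expression.Lambda<Func<int>>(body);
Console.WriteLine(lambda.Compile()()); // prints 21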

What’s cool about this isn’t that you can compile expressions dynamically (which is pretty cool, but there are others out there already) but that you will be able to dynamically compile DSLs… which reduce to dynamic expressions. Imagine this:

when truck is late:
  apply discount(.1);

when truck is early:
  apply bonus(.1);

when truck cost > $10,000:
  notify

You compile your DSL into MetaSharp nodes, transform those into common language nodes, then transform those into dynamic LINQ expressions. Very sweet! You might transform the above into something like:

if(truck.late)
{
    context.discount(.1);
}
if(truck.early)
{
    context.bonus(.1);
}
if(truck.cost > 10000)
{
    context.notify();
}

Here you’d have a custom ShippingContext object with all of the methods and properties you wanted to expose in your DSL. This would be really handy for creating all sorts of systems with rapidly changing rules, and it could even, potentially, enable business analysts to author their own rules.
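For illustration, the context for this DSL might look something like the following; the member names are hypothetical and simply mirror the generated code above:

// A hypothetical context exposing exactly the members the DSL may use.
public class ShippingContext
{
    public Truck truck;

    public void discount(double rate) { /* reduce the invoice */ }
    public void bonus(double rate) { /* add a bonus */ }
    public void notify() { /* alert someone about the cost */ }
}

public class Truck
{
    public bool late;
    public bool early;
    public decimal cost;
}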