First Steps Towards a Parser Generator

I’ve been pretty quiet the last couple of months because I have been very busy at work and at home I have been slowly chipping away at a new Parser for MetaSharp.

It’s very late at night here so I’m not going to go into a lot of detail right now but suffice it to say that currently my Parser is completely hand written and right now I’m working on a Transform step that will let me generate Parsers from a grammar. The goal is to generate the Parser from the Grammar grammar!

I still have a long way to go but here is what I do have. The following Grammar:

grammar Simple < MetaSharp.Transformation.Parsing.Common.BasicParser:
    Alphabet = (A | B | C)+;
    A = "a";
    B = "b";
    C = "c";
end

Produces the following code:

//——————————————————————————
// <auto-generated>
//     This code was generated by a tool.
//     Runtime Version:4.0.21006.1
//
//     Changes to this file may cause incorrect behavior and will be lost if
//     the code is regenerated.
// </auto-generated>
//——————————————————————————

public partial class Simple : MetaSharp.Transformation.Parsing.Common.BasicParser
{

    internal new static System.Collections.Generic.IEnumerable<string> Imports;

    static Simple()
    {
        Simple.Imports = new string[0];
    }

    public Simple(MetaSharp.Transformation.IContext context) :
        base(context)
    {
        MetaSharp.Transformation.IRuleService ruleService =
            this.Context.Locate<MetaSharp.Transformation.IRuleService>();
        this.Add(
            newMetaSharp.Transformation.Patterns.OrPattern(
                ruleService.GetRule(“Alphabet”, Simple.Imports),
                ruleService.GetRule(“A”, Simple.Imports),
                ruleService.GetRule(“B”, Simple.Imports),
                ruleService.GetRule(“C”, Simple.Imports)));
    }
}

[MetaSharp.Transformation.RuleExportAttribute(typeof(AlphabetRule))]
public class AlphabetRule : MetaSharp.Transformation.Rule
{

    protected override void Initialize()
    {
        this.Body = new MetaSharp.Transformation.Patterns.OneOrMorePattern(
            new MetaSharp.Transformation.Patterns.OrPattern(
                new MetaSharp.Transformation.Patterns.OrPattern(
                    this.Rules.GetRule(“A”, Simple.Imports),
                    this.Rules.GetRule(“B”, Simple.Imports)),
                    this.Rules.GetRule(“C”, Simple.Imports)));
        base.Initialize();
    }
}

[MetaSharp.Transformation.RuleExportAttribute(typeof(ARule))]
public class ARule : MetaSharp.Transformation.Rule
{

    protected override void Initialize()
    {
        this.Body = new MetaSharp.Transformation.Patterns.StringPattern(“a”);
        base.Initialize();
    }
}

[MetaSharp.Transformation.RuleExportAttribute(typeof(BRule))]
public class BRule : MetaSharp.Transformation.Rule
{

    protected override void Initialize()
    {
        this.Body = new MetaSharp.Transformation.Patterns.StringPattern(“b”);
        base.Initialize();
    }
}

[MetaSharp.Transformation.RuleExportAttribute(typeof(CRule))]
public class CRule : MetaSharp.Transformation.Rule
{

    protected override voidInitialize()
    {
        this.Body = new MetaSharp.Transformation.Patterns.StringPattern(“c”);
        base.Initialize();
    }
}

It’s a pretty trivial grammar for now but this should give you an idea of where this is going. The RuleExportAttribute inherits from the MEF ExportAttribute and the concrete IRuleService uses MEF to supply rules using these attributes.

It’s exciting!

Out with Code Generation and in with Transformation

As I’ve been playing around with DSLs for the past couple of years I’ve been focused on Code Generation as my primary strategy. This is all well and good and I think that code generation still servers its purpose in the greater world of DSLs but it’s not quite good enough. I would like to start using the word Transformation as more generalized form of code generation and manipulation. What I used to refer to as Code Generation I will now simply call Textual Transformation. The other main form of Transformation is an AST Transformation. The Groovy folks have also adopted this to be synonymous with Compile-time Meta Programming and the Boo folks would call this a Syntactic Macro.

In order to promote the DRY principle and really allow N levels of arbitrary transformations I’ve been busy changing MetaSharp to adopt the Pipelinepattern for the compilation process (according to that wikipedia article what I have now is more of a psued-pipeline though, since each step is done synchronously). The end result is pretty simple actually.

image

The Pipeline has a series of steps and a collection of services. Each step depends on certain services and may alter / create certain services. In this way each step can be completely re-usable for different compilation scenarios. For example the MetaCompilePipeline has three steps:

  1. MetaSharpCompileStep
  2. CodeDomTransformStep
  3. CodeDomCodeGenerateStep

Which is to say, if you want to compile MetaSharp code inside of a project of a different language your pipeline needs to perform those three steps. First compile the code into MetaSharp AST nodes. Second transform those nodes into CodeDom objects. Third use a CodeDomProvider to generate code based on those CodeDom objects. The MetaTemplatePipeline is the same as the above with one extra step at the beginning, for transforming the code into something else.

The point here though, is that key to this whole process is the idea of Transformation. In fact the whole theory behind MetaSharp is simply to be a transformation tool. Each step is simply transforming the results of the previous step into something else. This is powerful because your DSL can consist of arbitrary levels of transformation, litterally your DSL could transform into a lower level DSL, which transforms into an even lower level DSL, etc. all the way down to machine code.

Transformation isn’t a new concept it’s, been around forever. At the very root of any software is essentially a bunch of 1’s and 0’s but we haven’t written raw 1’s and 0’s as our software for a long time. The compiler has always been a way for us to transform slightly more complex concepts into lower level ones. Even the extremely low level machine code is a step above raw 1’s and 0’s. General purpose programming languages themselves consist of constructs used to transform into much more verbose machine code or IL.

Taking transformation to the next level of abstraction is necessary for us to effectively create DSLs. If there was a tool to help us easily perform those transformations then it would go a long way towards making external DSL authoring more realistic, which is what I’m hoping to do with MetaSharp.

So to me, at this point, Code Generation is just another form of Transformation, which I will be calling “Textual Transformation” from now on. It has its pros and cons, of which I hope to discuss further in other posts. However, my point today is simply to convey the idea of Transformation as more general and more important to the DSL world than simply Code Generation and also to consciously force myself to update my lexicon.

MetaSharp Vision for the Future

I was just having some ideas and wanted to put it down somewhere partly for myself and partly to get some feedback.

One of the next things I want to do is to convert the compile-to-CodeDom parts of MetaSharp into a Vistor Pattern so that I can use the same system to compile to CodeDom or Generate MetaSharp or to transform the AST or whatever I want. This will bring a lot of flexibility and power to the whole system.

I was thinking about a post by Ayende Rahien the other day called M is to DSL as Drag and Drop is to Programming and specifically I was thinking about the quote “If you want to show me a DSL, show me one that has logic, not one that is a glorified serialization format.“ And what I took this to mean is that there is no logic in this DSL. Which can still be declarative but will often time have concepts like less-than or greater-than or equal-to. Certainly not limited by this but these are fairly common. To me his complaint (which is valid) is that with an external DSL, no matter how easy it is to write a grammar, it is still hard to expression logic with a grammar, and furthermore it is just as hard to translate that logic into something executable.

With an internal DSL, such as you get with Boo, you can easily just author keywords for your DSL but you get all of the logical operators for free, which is very nice of Boo. But unfortunately with an internal DSL you not only get the logical operators for free you are forced to get them. With an internal DSL you can do less work to get it working but you are not operating in a constrained universe. This has trades offs but lets certainly not dismiss it. There plenty of use cases where this is the preferred way of doing it.

However there are some distinct benefits of an external DSL, the major tradeoff being the effort required to implement it. The main benefit is that you can constrain your universe such that only allowable logic can happen in the correct spots. It’s like a sandboxed language, which I like to call a constrained universe. And believe it or not constraint can actually be freeing.

So my sudden flash of insight this morning was when I realized that actually, with MGrammar, you can choose to import grammars defined in other assemblies and use the syntax and tokens defined there. So when you choose to use MetaSharp by adding a reference to the assembly you can actually also import the MetaSharp.Lang grammar and easily make use of the BinaryExpression syntax in your own DSL (or anything else). Then I was also thinking that you could probably make use of the same AST serialization tools and (soon to be) AST transformation Visitors to build your own DSLs without a lot of the extra work. Using that type of system you could probably transform directly into executable code completely without using the templating at all, haha! Simply transform your custom AST nodes into standard supported Nodes, or write your own visitor that can handle your custom nodes. Your custom visitor could probably also tap into the templating system so you could write the AST transformation as a MetaSharp template if you desired as well.

This would put MetaSharp into the role of being an extensible compiler system where custom external DSLs can opt-in to standard language grammar where appropriate rather than not even being able to opt-out as in current internal DSLs. This is powerful idea and I think it is well within my grasp.

MetaSharp on CodePlex

I finally managed to get the source code for the little side project I’ve been working on into CodePlex.

http://metasharp.codeplex.com

Give it a shot and let me know how it works out for you! There are definitely known limitations at this point, there is almost certainly language syntax that doesn’t work right for starters. Most of the basic stuff is there but l haven’t really tested abstract members much or events and things like that. I also haven’t implemented “macros” yet, which isn’t very hard to do in the parser but is much harder to do in the compiler. I’ll probably hook that up for v0.2.

I’m also in dire need of cleaning up some regular project maintenance stuff such as versions for the assemblies, automated build, many more unit tests, code analysis, etc.

If you’re still scratching your head about what I’m even trying to do here check out these two wiki pages

Basic Workflow

http://metasharp.codeplex.com/Wiki/View.aspx?title=Workflow

Song Sample

http://metasharp.codeplex.com/Wiki/View.aspx?title=Song%20Sample

If you’re still scratching your head please let me know, because I’m either not explaining it right or this is more confusing than I thought.

By the way, this project is not an official Microsoft project in any way. This is something I have done in the evenings and weekends for my own enjoyment. Please don’t ask anyone else for support and don’t blame them if you think this project is lame 🙂

MetaSharp code generation success!

I finally have an end to end example of code generation with MGrammar I’m happy with. It’s pretty simple too I think. What I have working right now is the scenario where you author a grammar for use as an external DSL and would like to use it in existing applications. You are able to add templates to any .NET project and generate code in the native language. Which means the same template will work for C#, VB, F#, Boo or whatever projects since it compiles down to the CodeDom.

In addition to your MGrammar language definition you can now create a template in MetaSharp, there is a MSBuild task that compiles that template file into a Template class in the native language of your project. That template class knows how to generate code that binds to a model (MGraph objects or CLR objects). Here is a visual example.

Here is an example of a template in MetaSharp.

namespace Samples.Song.Templates:
    import System;
    import System.Collections.Generic;
    import System.Threading;
    import System.Text;
    
    public class {Song.Name} as Song:
        public constructor:
            @for(b in {Song.Bars as enumerable}):
                super.Bars.Add(new Bar(
                    "{b.Note1}",
                    "{b.Note2}",
                    "{b.Note3}",
                    "{b.Note4}"));
            @end
        end
    end
end

Its a very simple language, the interesting parts are the ‘@’ characters and the {…} groups. The @ symbol puts that line into template mode. Meaning that will be a literal line in the template. The curly bracket groups are Binding statements, similar to what you see in Xaml only much less expressive at this point. So far you just specify a path and an optional type to cast it to. “enumerable” is just a helper to cast it to IEnumerable, or in the case if MGraph to get the sequence for the node.

So if you take the song DSL file from the Song Sample included in the Oslo SDK.

Song Fun
- - - D
C C# F G
E E - D
A E - E
G F - E
D C D E
A E D D
D E A C

And apply that to the template above you end up with this code (in C# in this case):

//------------------------------------------------------------------------------
// <auto-generated>
//     This code was generated by a tool.
//     Runtime Version:2.0.50727.3521
//
//     Changes to this file may cause incorrect behavior and will be lost if
//     the code is regenerated.
// </auto-generated>
//------------------------------------------------------------------------------

namespace Samples.Song.Templates
{
    using System;
    using System.Collections.Generic;
    using System.Threading;
    using System.Text;


    public class Fun : Song
    {

        public Fun()
        {
            base.Bars.Add(new Bar("-", "-", "-", "D"));
            base.Bars.Add(new Bar("C", "C#", "F", "G"));
            base.Bars.Add(new Bar("E", "E", "-", "D"));
            base.Bars.Add(new Bar("A", "E", "-", "E"));
            base.Bars.Add(new Bar("G", "F", "-", "E"));
            base.Bars.Add(new Bar("D", "C", "D", "E"));
            base.Bars.Add(new Bar("A", "E", "D", "D"));
            base.Bars.Add(new Bar("D", "E", "A", "C"));
        }
    }
}

So this is really cool! This will allow you to create MGrammar DSLs without having to write complicated code to consume the object graph. And it all happens at build time! I’m planning on doing a little more cleanup then probably creating a CodePlex project for this. This is exactly what I wanted for NBusiness… hopefully I can get back to that soon, haha!