First Steps Towards a Parser Generator

I’ve been pretty quiet the last couple of months because I have been very busy at work and at home I have been slowly chipping away at a new Parser for MetaSharp.

It’s very late at night here so I’m not going to go into a lot of detail right now but suffice it to say that currently my Parser is completely hand written and right now I’m working on a Transform step that will let me generate Parsers from a grammar. The goal is to generate the Parser from the Grammar grammar!

I still have a long way to go but here is what I do have. The following Grammar:

grammar Simple < MetaSharp.Transformation.Parsing.Common.BasicParser:
    Alphabet = (A | B | C)+;
    A = "a";
    B = "b";
    C = "c";
end

Produces the following code:

//——————————————————————————
// <auto-generated>
//     This code was generated by a tool.
//     Runtime Version:4.0.21006.1
//
//     Changes to this file may cause incorrect behavior and will be lost if
//     the code is regenerated.
// </auto-generated>
//——————————————————————————

public partial class Simple : MetaSharp.Transformation.Parsing.Common.BasicParser
{

    internal new static System.Collections.Generic.IEnumerable<string> Imports;

    static Simple()
    {
        Simple.Imports = new string[0];
    }

    public Simple(MetaSharp.Transformation.IContext context) :
        base(context)
    {
        MetaSharp.Transformation.IRuleService ruleService =
            this.Context.Locate<MetaSharp.Transformation.IRuleService>();
        this.Add(
            newMetaSharp.Transformation.Patterns.OrPattern(
                ruleService.GetRule(“Alphabet”, Simple.Imports),
                ruleService.GetRule(“A”, Simple.Imports),
                ruleService.GetRule(“B”, Simple.Imports),
                ruleService.GetRule(“C”, Simple.Imports)));
    }
}

[MetaSharp.Transformation.RuleExportAttribute(typeof(AlphabetRule))]
public class AlphabetRule : MetaSharp.Transformation.Rule
{

    protected override void Initialize()
    {
        this.Body = new MetaSharp.Transformation.Patterns.OneOrMorePattern(
            new MetaSharp.Transformation.Patterns.OrPattern(
                new MetaSharp.Transformation.Patterns.OrPattern(
                    this.Rules.GetRule(“A”, Simple.Imports),
                    this.Rules.GetRule(“B”, Simple.Imports)),
                    this.Rules.GetRule(“C”, Simple.Imports)));
        base.Initialize();
    }
}

[MetaSharp.Transformation.RuleExportAttribute(typeof(ARule))]
public class ARule : MetaSharp.Transformation.Rule
{

    protected override void Initialize()
    {
        this.Body = new MetaSharp.Transformation.Patterns.StringPattern(“a”);
        base.Initialize();
    }
}

[MetaSharp.Transformation.RuleExportAttribute(typeof(BRule))]
public class BRule : MetaSharp.Transformation.Rule
{

    protected override void Initialize()
    {
        this.Body = new MetaSharp.Transformation.Patterns.StringPattern(“b”);
        base.Initialize();
    }
}

[MetaSharp.Transformation.RuleExportAttribute(typeof(CRule))]
public class CRule : MetaSharp.Transformation.Rule
{

    protected override voidInitialize()
    {
        this.Body = new MetaSharp.Transformation.Patterns.StringPattern(“c”);
        base.Initialize();
    }
}

It’s a pretty trivial grammar for now but this should give you an idea of where this is going. The RuleExportAttribute inherits from the MEF ExportAttribute and the concrete IRuleService uses MEF to supply rules using these attributes.

It’s exciting!

Out with Code Generation and in with Transformation

As I’ve been playing around with DSLs for the past couple of years I’ve been focused on Code Generation as my primary strategy. This is all well and good and I think that code generation still servers its purpose in the greater world of DSLs but it’s not quite good enough. I would like to start using the word Transformation as more generalized form of code generation and manipulation. What I used to refer to as Code Generation I will now simply call Textual Transformation. The other main form of Transformation is an AST Transformation. The Groovy folks have also adopted this to be synonymous with Compile-time Meta Programming and the Boo folks would call this a Syntactic Macro.

In order to promote the DRY principle and really allow N levels of arbitrary transformations I’ve been busy changing MetaSharp to adopt the Pipelinepattern for the compilation process (according to that wikipedia article what I have now is more of a psued-pipeline though, since each step is done synchronously). The end result is pretty simple actually.

image

The Pipeline has a series of steps and a collection of services. Each step depends on certain services and may alter / create certain services. In this way each step can be completely re-usable for different compilation scenarios. For example the MetaCompilePipeline has three steps:

  1. MetaSharpCompileStep
  2. CodeDomTransformStep
  3. CodeDomCodeGenerateStep

Which is to say, if you want to compile MetaSharp code inside of a project of a different language your pipeline needs to perform those three steps. First compile the code into MetaSharp AST nodes. Second transform those nodes into CodeDom objects. Third use a CodeDomProvider to generate code based on those CodeDom objects. The MetaTemplatePipeline is the same as the above with one extra step at the beginning, for transforming the code into something else.

The point here though, is that key to this whole process is the idea of Transformation. In fact the whole theory behind MetaSharp is simply to be a transformation tool. Each step is simply transforming the results of the previous step into something else. This is powerful because your DSL can consist of arbitrary levels of transformation, litterally your DSL could transform into a lower level DSL, which transforms into an even lower level DSL, etc. all the way down to machine code.

Transformation isn’t a new concept it’s, been around forever. At the very root of any software is essentially a bunch of 1’s and 0’s but we haven’t written raw 1’s and 0’s as our software for a long time. The compiler has always been a way for us to transform slightly more complex concepts into lower level ones. Even the extremely low level machine code is a step above raw 1’s and 0’s. General purpose programming languages themselves consist of constructs used to transform into much more verbose machine code or IL.

Taking transformation to the next level of abstraction is necessary for us to effectively create DSLs. If there was a tool to help us easily perform those transformations then it would go a long way towards making external DSL authoring more realistic, which is what I’m hoping to do with MetaSharp.

So to me, at this point, Code Generation is just another form of Transformation, which I will be calling “Textual Transformation” from now on. It has its pros and cons, of which I hope to discuss further in other posts. However, my point today is simply to convey the idea of Transformation as more general and more important to the DSL world than simply Code Generation and also to consciously force myself to update my lexicon.

MetaSharp Vision for the Future

I was just having some ideas and wanted to put it down somewhere partly for myself and partly to get some feedback.

One of the next things I want to do is to convert the compile-to-CodeDom parts of MetaSharp into a Vistor Pattern so that I can use the same system to compile to CodeDom or Generate MetaSharp or to transform the AST or whatever I want. This will bring a lot of flexibility and power to the whole system.

I was thinking about a post by Ayende Rahien the other day called M is to DSL as Drag and Drop is to Programming and specifically I was thinking about the quote “If you want to show me a DSL, show me one that has logic, not one that is a glorified serialization format.“ And what I took this to mean is that there is no logic in this DSL. Which can still be declarative but will often time have concepts like less-than or greater-than or equal-to. Certainly not limited by this but these are fairly common. To me his complaint (which is valid) is that with an external DSL, no matter how easy it is to write a grammar, it is still hard to expression logic with a grammar, and furthermore it is just as hard to translate that logic into something executable.

With an internal DSL, such as you get with Boo, you can easily just author keywords for your DSL but you get all of the logical operators for free, which is very nice of Boo. But unfortunately with an internal DSL you not only get the logical operators for free you are forced to get them. With an internal DSL you can do less work to get it working but you are not operating in a constrained universe. This has trades offs but lets certainly not dismiss it. There plenty of use cases where this is the preferred way of doing it.

However there are some distinct benefits of an external DSL, the major tradeoff being the effort required to implement it. The main benefit is that you can constrain your universe such that only allowable logic can happen in the correct spots. It’s like a sandboxed language, which I like to call a constrained universe. And believe it or not constraint can actually be freeing.

So my sudden flash of insight this morning was when I realized that actually, with MGrammar, you can choose to import grammars defined in other assemblies and use the syntax and tokens defined there. So when you choose to use MetaSharp by adding a reference to the assembly you can actually also import the MetaSharp.Lang grammar and easily make use of the BinaryExpression syntax in your own DSL (or anything else). Then I was also thinking that you could probably make use of the same AST serialization tools and (soon to be) AST transformation Visitors to build your own DSLs without a lot of the extra work. Using that type of system you could probably transform directly into executable code completely without using the templating at all, haha! Simply transform your custom AST nodes into standard supported Nodes, or write your own visitor that can handle your custom nodes. Your custom visitor could probably also tap into the templating system so you could write the AST transformation as a MetaSharp template if you desired as well.

This would put MetaSharp into the role of being an extensible compiler system where custom external DSLs can opt-in to standard language grammar where appropriate rather than not even being able to opt-out as in current internal DSLs. This is powerful idea and I think it is well within my grasp.

MetaSharp on CodePlex

I finally managed to get the source code for the little side project I’ve been working on into CodePlex.

http://metasharp.codeplex.com

Give it a shot and let me know how it works out for you! There are definitely known limitations at this point, there is almost certainly language syntax that doesn’t work right for starters. Most of the basic stuff is there but l haven’t really tested abstract members much or events and things like that. I also haven’t implemented “macros” yet, which isn’t very hard to do in the parser but is much harder to do in the compiler. I’ll probably hook that up for v0.2.

I’m also in dire need of cleaning up some regular project maintenance stuff such as versions for the assemblies, automated build, many more unit tests, code analysis, etc.

If you’re still scratching your head about what I’m even trying to do here check out these two wiki pages

Basic Workflow

http://metasharp.codeplex.com/Wiki/View.aspx?title=Workflow

Song Sample

http://metasharp.codeplex.com/Wiki/View.aspx?title=Song%20Sample

If you’re still scratching your head please let me know, because I’m either not explaining it right or this is more confusing than I thought.

By the way, this project is not an official Microsoft project in any way. This is something I have done in the evenings and weekends for my own enjoyment. Please don’t ask anyone else for support and don’t blame them if you think this project is lame 🙂

MetaSharp code generation success!

I finally have an end to end example of code generation with MGrammar I’m happy with. It’s pretty simple too I think. What I have working right now is the scenario where you author a grammar for use as an external DSL and would like to use it in existing applications. You are able to add templates to any .NET project and generate code in the native language. Which means the same template will work for C#, VB, F#, Boo or whatever projects since it compiles down to the CodeDom.

In addition to your MGrammar language definition you can now create a template in MetaSharp, there is a MSBuild task that compiles that template file into a Template class in the native language of your project. That template class knows how to generate code that binds to a model (MGraph objects or CLR objects). Here is a visual example.

Here is an example of a template in MetaSharp.

namespace Samples.Song.Templates:
    import System;
    import System.Collections.Generic;
    import System.Threading;
    import System.Text;
    
    public class {Song.Name} as Song:
        public constructor:
            @for(b in {Song.Bars as enumerable}):
                super.Bars.Add(new Bar(
                    "{b.Note1}",
                    "{b.Note2}",
                    "{b.Note3}",
                    "{b.Note4}"));
            @end
        end
    end
end

Its a very simple language, the interesting parts are the ‘@’ characters and the {…} groups. The @ symbol puts that line into template mode. Meaning that will be a literal line in the template. The curly bracket groups are Binding statements, similar to what you see in Xaml only much less expressive at this point. So far you just specify a path and an optional type to cast it to. “enumerable” is just a helper to cast it to IEnumerable, or in the case if MGraph to get the sequence for the node.

So if you take the song DSL file from the Song Sample included in the Oslo SDK.

Song Fun
- - - D
C C# F G
E E - D
A E - E
G F - E
D C D E
A E D D
D E A C

And apply that to the template above you end up with this code (in C# in this case):

//------------------------------------------------------------------------------
// <auto-generated>
//     This code was generated by a tool.
//     Runtime Version:2.0.50727.3521
//
//     Changes to this file may cause incorrect behavior and will be lost if
//     the code is regenerated.
// </auto-generated>
//------------------------------------------------------------------------------

namespace Samples.Song.Templates
{
    using System;
    using System.Collections.Generic;
    using System.Threading;
    using System.Text;


    public class Fun : Song
    {

        public Fun()
        {
            base.Bars.Add(new Bar("-", "-", "-", "D"));
            base.Bars.Add(new Bar("C", "C#", "F", "G"));
            base.Bars.Add(new Bar("E", "E", "-", "D"));
            base.Bars.Add(new Bar("A", "E", "-", "E"));
            base.Bars.Add(new Bar("G", "F", "-", "E"));
            base.Bars.Add(new Bar("D", "C", "D", "E"));
            base.Bars.Add(new Bar("A", "E", "D", "D"));
            base.Bars.Add(new Bar("D", "E", "A", "C"));
        }
    }
}

So this is really cool! This will allow you to create MGrammar DSLs without having to write complicated code to consume the object graph. And it all happens at build time! I’m planning on doing a little more cleanup then probably creating a CodePlex project for this. This is exactly what I wanted for NBusiness… hopefully I can get back to that soon, haha!

Meta Syntax Ramblings

I’ve been hung up for a while here on simplifying my meta syntax for Meta# since I am not real happy with the way my current prototype works. What I should say is that I have, so far, gone with a purely declarative syntax that ends up very verbose, much like XAML. This isn’t very fun for code that is supposed to be user readable.

I’m thinking of changing the {…} syntax to always mean binding to the model and [|…|] to be meta code. I am also thinking of creating a special expand keyword that allows you to switch into a code generation mode. For example, something like this:
template Blah:
                int count = 0;
                expand f in {Foo} where {Bar as int} < 100 && count % 2 == 0:
                                private field {Type} _{Name};
                                [| count++; |]
                end
end
Which is an ugly example but you get the point. So inside of a template you are writing literally generated code unless you have an expand statement. The expand statement would allow you to specify some things you want to expand upon and an optional where clause to turn sections of the template into meta code. If you want to inject logic in the middle of an expansion block you would use the [|…|] operators to switch back into generating literal code. This is essentially the opposite of Boo, where you write everything in literal code then use the [|…|] to indicate meta blocks rather than literal logic. The sample above would generate to code such as this:
public class BlahTemplate : Template
{
protected override IEnumerable<CodeObject> Generate(object Blah)
{
    int count = 0;
    foreach(object f in Binder.Bind(“Foo”, Blah))
    {
      if(Binder.Bind<int>(“Bar”, f) < 100 && count % 2 == 0)
      {
         string code = “private field {0} _{1};”;
         yield return Compile(string.Format(code, Binder.Bind(“Type”, f), Binder.Bind(“Name”, f)));
         code++;
      }
    }  
}
}
(Except it would be a for block not foreach due to limitations in the CodeDom)
I think I like this a little better (still deciding) because most of the time you won’t need anything other than binding statements inside of your expansion block and then you end up with very clean looking template code. The other thing I like is that using the {…} purely for binding makes it a lot simpler. I was encountering an issue where the property you are trying to bind to was actually a string due the way MGrammar works but you want it to be treated as an int. Doing it purely as a Binding syntax makes this a lot simpler I think {Property(.SubProperty)* (as Type)?}. The only question that concerns me is if you’ll ever actually need anything other than these keywords to solve any problem. I suppose I can always add more…
So if you were doing this for the song sample it might look like this:
template Song:
expand s in {Song}:
    public sealed class {Name} implements Song:
      public constructor:
        expand b in {Bars}:
          this.bars.Add(new Bar([|{Note1}|], [|{Note2}|], [|{Note3}|], [|{Note4}|]);
        end
      end
    end
end
end
The meta statement surrounding the bound notes would be because the Note is actually a string but what you really want is a PrimitiveExpression(string). Using the meta statement syntax would force the string to be interpreted as a code block instead of string. I might be able to figure out a way to not need that but as of now I think it’s necessary.
The other weird thing is how to handle mixed binding expressions. For example this is easy to figure out:
public field {f.Type} {f.Name};
But how do you handle this:
Public field {f.Type} _{f.Name};
Here, for the fields name we have a mix of a string and a binding expression where as in the first sample it was just a binding expression. What I’m thinking is to expand the syntax for a Binding expression to be any token that contains {…} rather than starts and ends with it. It will basically translate everything inside of the brackets into a string format to a Binder call such as:
_{f.Name} -> string.Format(“_{0}”, Binder.Bind(“Name”, f))
Here the bind method knows how to get values out of basic objects with properties using reflection or MGrammar sequence elements using the MGrammar API.
Also, if anyone has something better than “: end” for blocks other than “{ }” please let me know.

MetaSharp – A CodeDom based Template Engine using MGrammar

I’ve been working on a tangential project related to NBusiness for a couple of weeks now and I just wanted to take a moment to get a few of my thoughts out. The project I have been working on I am tentatively calling “MetaSharp” for now. It’s been fun and educational but hopefully it will have real usefullness when it is done. I wanted to have a fully working example before I publicly posted the code (since it’s basically prototype quality right now) but if anyone is interested in seeing what I have so far feel free to ask and I’ll hook you up somehow.

I’ll try to start at the beginning to justify my rationale for creating this strange project. I’ve been working on NBusiness for quite a while now and while I’ve really had NBusiness “working” almost all along I have never quite been able to get it where I want it to be (complete). If I had to sum up the entire process of working on NBusiness into one sentence it would be “creating a DSL is hard”. That’s an understatement frankly. Let me see if I can lay out the various layers required for DSL creation.
·         Domain Objects
·         Parser
·         Compiler
·         Template Engine
·         Build Integration
·         Tooling Support
The first three items are actually relatively easy and pretty fun. This is what we all know how to do, write code to parse strings and stick values into objects. No problem. It turns out the next three layers which really provide the fit, finish and ultimate usability of your DSL are not easy at all. Build integration isn’t really that bad actually but tooling integration can be a real bear. In the case of a DSL you really want syntax hilighting and intellisense and nice IDE integration for file templates and things like that. Maybe a few additional context menus in your IDE and such. For me I have been trying to integrate into Visual Studio and I can officially say that I have sunk well over half my time into that aspect alone and it has been one of the hardest things I have ever tried to do. Visual Studio is also architected such that I had to completely redo my parser and compiler to be compatible with the needs of Visual Studio. Very painful.
But what is really hanging me up now is what I consider to be a large gap in the .NET  DSL world and that is a suitable templating engine. By templating engine I mean something that can take metadata and translate it into code.
I mean we have a bunch out there but they’re all (as far as I know) effectively giant string builders. They suffer from Tag Soup and and are bound strongly to a specific language implementation. For NBusiness I want to support side by side integration with any .NET language, C# or VB or Python or whatever. And re-creating all of these templates for every language is not an option. It’s too much upfront work and it’s too much long term maintenance. I absolutely need templates that are based on the CodeDom so I can be language agnostic… But if you’ve ever tried to use the CodeDom you know how hard it is to work with. Because of this users are very unlikely to actually make their own templates (which is almost always necessary) and when they do it is a very painful process. So I’ve been stuck in this cunundrum for quite a while, how can you make a template engine that is both based on the CodeDom but has the ease of use of a string builder?
Enter MGrammar. Using MGrammar I have found a way to define a DSL for generating code. This DSL turns out to be a full fledged programming language in and of itself with the caveat of being restricted only to that which is CLS compliant. I have combined this DSL with the capability to create templates (to extend the language, similar to macros in Boo) and databinding similar to what you have in XAML. The end result allows you to do something similar to this:
namespace Example:
    import System;
 
    template One:
        public class {Binding Name}:
            {SequenceBinding Items, Template=Two}
        end
    end
 
    template Two:
        private field {Binding Type} _{Binding Name};
        public property {Binding Type} {Binding Name}:
            get:
                return this._{Binding Name};
            end
            set:
                this._{Binding Name} = value;
            end
    end
end
(This is just an example, the end result might not actually be exactly this syntax)
Which when compiled will generate a class called OneTemplate that inherits from Template and returns a CodeTypeDeclaration object from it’s Generate method. Extensions such as the BindingExtension show here can be custom objects to extend behaviors but in this case it binds the name of the class to the Name property (or Name sequence node of an MGraph tree) of the provided metadata.
Technically you could write your entire project in pure MetaSharp code but more likely you will write all of your static classes in your rich language of choice and simply use MetaSharp to define templates. Since this is all compiling down to CodeDom objects I have cooked up some MSBuild tasks that simply translate those objects into the code of the project the files exist in. You could share this same file in a VB or C# project and it would compile to the same thing in both assemblies.
Currently I am working on a prototype using the Song example from the MGrammar sample code that will allow you to write songs that generate song classes using templates like these. It’s almost working… the CSharpCodeProvider is throwing a random NullReferenceException with no useful error messages. Which is one reason why a DSL like this is helpful, it should be able to abstract away the pain of working directly with the CodeDom.