The 5 Laws of Code Generation

This is a refresh of an old blog post. The more I look at Oslo and contrast it with what I have been working on, the more I want to verify that NBusiness isn't actually redundant or made obsolete by Oslo. Obviously the two will be competitors, but what I'm trying to figure out is what the actual difference between the two is and, more importantly, why NBusiness is the more correct solution to the problem.

I would like to start out by saying that Oslo and NBusiness are both DSLs, and core to any DSL is the transformation of its metadata into something more concrete, be that a less abstract DSL, actual executable code, or something else entirely. So what I'm calling Code Generation is essentially that transformation process. Additionally, what I used to call an "intermediate format" I'm now simply calling Metadata. To me, in this context, metadata is simply the name for the actual DSL declaration of a given application.

So here are those rules again, reframed in the context of Oslo and how I feel it may violate them.

1. Code generation is controlled through modifiable templates.

Translation of your metadata into another form should never be done through a black box. I'm not entirely sure how customizable Oslo's SQL generation is, but from what I understand it's pretty opaque. The entire system of translating MGraph into another form should be completely transparent and built so that it can easily be shaped into any form. If anyone can explain to me how Oslo translates M into SQL, show me how I can do it myself, and show me how to alter the SQL that is generated, then I'll be happy to change my mind on this one, but it feels rather opaque to me at the moment.
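To make this concrete, here is a minimal sketch of what I mean by a modifiable template: plain code that lives in your project, which anyone can open and reshape. The Entity type and the generation method here are hypothetical, just to illustrate the idea:

using System.Collections.Generic;
using System.Linq;

// Hypothetical metadata model; a stand-in for whatever your DSL declares.
public class Entity
{
    public string Name;
    public Dictionary<string, string> Fields; // field name -> SQL column type
}

public static class SqlTemplate
{
    // Because the template is just code/text in your project, changing the
    // generated SQL means editing this method, not reverse engineering a tool.
    public static string Generate(Entity entity)
    {
        var columns = string.Join(",\n  ",
            entity.Fields.Select(f => f.Key + " " + f.Value));
        return "CREATE TABLE " + entity.Name + "\n(\n  " + columns + "\n);";
    }
}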

2. Code generation is done during the build process.

I would like to add an amendment to this one and specify that code generation can also legitimately be done dynamically, at runtime. So the rule could more accurately be changed to "Intermediate forms of metadata should never be persisted". Not that you couldn't write them out to temp folders, but the point is that the integration of a DSL into an application should be seamless; you shouldn't need multiple manual steps to get it all integrated. Whether this happens at runtime or at build time is irrelevant. Presumably the many command line apps that ship with Oslo for transforming your DSL into something that goes into the repository will be streamlined eventually, but this is exactly the sort of thing that should be avoided.
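For example, generation could be folded into the build as a custom MSBuild task, so the developer never runs a separate command line step by hand. Here is a rough sketch; the GenerateFromMetadata task and its transformation are stand-ins of my own, not a real Oslo or NBusiness component:

using System.IO;
using Microsoft.Build.Framework;
using Microsoft.Build.Utilities;

// Hypothetical build task: reads the DSL metadata and writes generated code
// into the build output, with no separate manual tooling steps.
public class GenerateFromMetadata : Task
{
    [Required]
    public string MetadataFile { get; set; }

    [Required]
    public string OutputFile { get; set; }

    public override bool Execute()
    {
        string metadata = File.ReadAllText(MetadataFile);

        // Transform(...) stands in for whatever template-driven
        // transformation your DSL toolchain provides.
        string code = Transform(metadata);

        File.WriteAllText(OutputFile, code);
        Log.LogMessage("Generated " + OutputFile);
        return true;
    }

    private static string Transform(string metadata)
    {
        // Placeholder for the real transformation step.
        return "// generated from " + metadata.Length + " chars of metadata";
    }
}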

3. Code generation is done from an intermediate format.

I would like to amend this rule to instead be "Metadata must have a single source of truth." I think Oslo has a pretty good system for the intermediate format (M), but it doesn't follow the single source of truth rule. M is simply translated into the Repository, which is the real source of truth but is not necessarily synchronized in reverse with the original M code (as far as I know, at least).

To me, Oslo violates this rule simply by the existence of the Repository. The repository is essentially a "second source of truth" and can be edited from multiple sources. The DSLs should be the single source of truth, and the repository should essentially be a temp file or part of the output of the build. Editing the repository should simply be a matter of editing your entities.

4. The intermediate format is under source control with versioning.

I would like to amend this rule also, to specify that only deltas must be part of each revision. So maybe this rule could simply be changed to "The metadata declaration must be versionable and mergable", which usually means that your DSL needs to be textual. I would be willing to buy into a binary format for metadata, but only if it could be versioned by my preferred source control system as deltas rather than as one giant binary blob, and only as long as it didn't break any of the above rules in the process.

5. The code generation templates are also under source control.

This places the same constraints on the transformation system as on the metadata itself, which usually translates into the idea that your templates should also be textual. Again, if you want a binary format, there need to be ways for source control tools to persist only deltas. So this rule could be changed to "The transformation templates must be versionable and mergable".

Summary

Here is a summary of these revised rules:

  1. Metadata transformation is controlled through modifiable templates.

  2. Intermediate forms of metadata should never be persisted.

  3. Metadata must have a single source of truth.

  4. The metadata must be versionable and mergable.

  5. The transformation templates must be versionable and mergable.

MetaSharp Vision for the Future

I was just having some ideas and wanted to put them down somewhere, partly for myself and partly to get some feedback.

One of the next things I want to do is to convert the compile-to-CodeDom parts of MetaSharp into a Visitor pattern so that I can use the same system to compile to CodeDom, generate MetaSharp, transform the AST, or whatever else I want. This will bring a lot of flexibility and power to the whole system.
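As a rough sketch of the idea (these node and visitor names are invented for illustration, not the actual MetaSharp types), each backend becomes just another visitor over the same AST:

// Hypothetical sketch: one AST, many visitors.
public abstract class Node
{
    public abstract T Accept<T>(INodeVisitor<T> visitor);
}

public class ClassNode : Node
{
    public string Name;
    public override T Accept<T>(INodeVisitor<T> visitor)
    {
        return visitor.VisitClass(this);
    }
}

public interface INodeVisitor<T>
{
    T VisitClass(ClassNode node);
}

// One visitor compiles to CodeDom...
public class CodeDomVisitor : INodeVisitor<System.CodeDom.CodeObject>
{
    public System.CodeDom.CodeObject VisitClass(ClassNode node)
    {
        return new System.CodeDom.CodeTypeDeclaration(node.Name);
    }
}

// ...another regenerates MetaSharp source text from the same AST.
public class MetaSharpWriterVisitor : INodeVisitor<string>
{
    public string VisitClass(ClassNode node)
    {
        return "public class " + node.Name + ": end";
    }
}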

I was thinking about a post by Ayende Rahien the other day called M is to DSL as Drag and Drop is to Programming, and specifically about the quote "If you want to show me a DSL, show me one that has logic, not one that is a glorified serialization format." What I took this to mean is that there is no logic in this DSL. A DSL can still be declarative yet have concepts like less-than or greater-than or equal-to; it's certainly not limited to these, but they are fairly common. To me his complaint (which is valid) is that with an external DSL, no matter how easy it is to write a grammar, it is still hard to express logic with a grammar, and furthermore it is just as hard to translate that logic into something executable.

With an internal DSL, such as you get with Boo, you can easily author keywords for your DSL and you get all of the logical operators for free, which is very nice of Boo. But unfortunately, with an internal DSL you don't just get the logical operators for free, you are forced to take them. You do less work to get it working, but you are not operating in a constrained universe. This has trade-offs, but let's certainly not dismiss it; there are plenty of use cases where this is the preferred way of doing it.

However, there are some distinct benefits to an external DSL, the major tradeoff being the effort required to implement it. The main benefit is that you can constrain your universe so that only allowable logic can happen in the correct spots. It's like a sandboxed language, which I like to call a constrained universe. And believe it or not, constraint can actually be freeing.

So my sudden flash of insight this morning was when I realized that, with MGrammar, you can choose to import grammars defined in other assemblies and use the syntax and tokens defined there. So when you choose to use MetaSharp by adding a reference to the assembly, you can also import the MetaSharp.Lang grammar and easily make use of the BinaryExpression syntax in your own DSL (or anything else). Then I was also thinking that you could probably make use of the same AST serialization tools and the (soon to be) AST transformation visitors to build your own DSLs without a lot of the extra work. Using that type of system you could probably transform directly into executable code without using the templating at all, haha! Simply transform your custom AST nodes into standard supported nodes, or write your own visitor that can handle your custom nodes. Your custom visitor could probably also tap into the templating system, so you could write the AST transformation as a MetaSharp template if you desired.
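Roughly, I'm picturing something like this. I haven't verified the exact import mechanics yet, so treat the module below as a hypothetical sketch rather than working MGrammar:

module MySongs
{
    // Hypothetical: pull in MetaSharp's grammar so this DSL can reuse
    // its expression rules instead of redefining them.
    import MetaSharp.Lang;

    language SongFilter
    {
        // Reuse the imported BinaryExpression syntax inside our own rule.
        syntax Main = "play" "when" MetaSharp.Lang.BinaryExpression;
    }
}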

This would put MetaSharp into the role of an extensible compiler system where custom external DSLs can opt in to standard language grammar where appropriate, rather than not even being able to opt out, as with current internal DSLs. This is a powerful idea and I think it is well within my grasp.

MetaSharp on CodePlex

I finally managed to get the source code for the little side project I’ve been working on into CodePlex.

http://metasharp.codeplex.com

Give it a shot and let me know how it works out for you! There are definitely known limitations at this point; for starters, there is almost certainly language syntax that doesn't work right. Most of the basic stuff is there, but I haven't really tested abstract members much, or events and things like that. I also haven't implemented "macros" yet, which isn't very hard to do in the parser but is much harder to do in the compiler. I'll probably hook that up for v0.2.

I'm also in dire need of doing some regular project maintenance, such as assembly versioning, an automated build, many more unit tests, code analysis, etc.

If you're still scratching your head about what I'm even trying to do here, check out these two wiki pages:

Basic Workflow

http://metasharp.codeplex.com/Wiki/View.aspx?title=Workflow

Song Sample

http://metasharp.codeplex.com/Wiki/View.aspx?title=Song%20Sample

If you’re still scratching your head please let me know, because I’m either not explaining it right or this is more confusing than I thought.

By the way, this project is not an official Microsoft project in any way. This is something I have done in the evenings and weekends for my own enjoyment. Please don’t ask anyone else for support and don’t blame them if you think this project is lame 🙂

Meta Syntax Ramblings

I've been hung up for a while here on simplifying my meta syntax for Meta#, since I am not really happy with the way my current prototype works. What I should say is that I have, so far, gone with a purely declarative syntax that ends up very verbose, much like XAML. That isn't very fun for code that is supposed to be human readable.

I'm thinking of changing the {…} syntax to always mean binding to the model and [|…|] to mean meta code. I am also thinking of creating a special expand keyword that lets you switch into a code generation mode. For example, something like this:
template Blah:
    int count = 0;
    expand f in {Foo} where {Bar as int} < 100 && count % 2 == 0:
        private field {Type} _{Name};
        [| count++; |]
    end
end
Which is an ugly example, but you get the point. Inside of a template you are writing literal generated code unless you have an expand statement. The expand statement lets you specify some things you want to expand upon, with an optional where clause, turning sections of the template into meta code. If you want to inject logic in the middle of an expansion block, you use the [|…|] operators to switch back into literal logic rather than generated code. This is essentially the opposite of Boo, where you write everything in literal code and then use [|…|] to indicate meta blocks rather than literal logic. The sample above would generate code such as this:
public class BlahTemplate : Template
{
    protected override IEnumerable<CodeObject> Generate(object Blah)
    {
        int count = 0;
        foreach (object f in Binder.Bind("Foo", Blah))
        {
            if (Binder.Bind<int>("Bar", f) < 100 && count % 2 == 0)
            {
                string code = "private field {0} _{1};";
                yield return Compile(string.Format(code, Binder.Bind("Type", f), Binder.Bind("Name", f)));
                count++;
            }
        }
    }
}
(Except it would be a for block, not a foreach, due to limitations in the CodeDom.)
I think I like this a little better (still deciding) because most of the time you won't need anything other than binding statements inside of your expansion block, and then you end up with very clean looking template code. The other thing I like is that using {…} purely for binding makes it a lot simpler. I was encountering an issue where the property you are trying to bind to is actually a string, due to the way MGrammar works, but you want it treated as an int. Doing it purely as a binding syntax makes this a lot simpler, I think: {Property(.SubProperty)* (as Type)?}. The only question that concerns me is whether you'll ever actually need anything other than these keywords to solve a problem. I suppose I can always add more…
So if you were doing this for the song sample it might look like this:
template Song:
    expand s in {Song}:
        public sealed class {Name} implements Song:
            public constructor:
                expand b in {Bars}:
                    this.bars.Add(new Bar([|{Note1}|], [|{Note2}|], [|{Note3}|], [|{Note4}|]));
                end
            end
        end
    end
end
The meta brackets surrounding the bound notes are needed because the Note is actually a string, but what you really want is a PrimitiveExpression(string). Using the meta syntax forces the bound string to be interpreted as a code block instead of a string. I might be able to figure out a way to not need that, but as of now I think it's necessary.
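My reading of that distinction, in CodeDom terms (these are the real System.CodeDom types, though whether the brackets end up mapping exactly this way is still undecided):

using System.CodeDom;

// The bound Note value arrives from MGrammar as the string "C".
string note = "C";

// Emits the quoted string literal "C" into the generated code.
CodeExpression asString = new CodePrimitiveExpression(note);

// Emits the raw text C, i.e. the value interpreted as code.
CodeExpression asCode = new CodeSnippetExpression(note);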
The other weird thing is how to handle mixed binding expressions. For example, this is easy to figure out:
public field {f.Type} {f.Name};
But how do you handle this:
public field {f.Type} _{f.Name};
Here, for the field's name, we have a mix of a string and a binding expression, whereas in the first sample it was just a binding expression. What I'm thinking is to expand the syntax for a binding expression to be any token that contains {…}, rather than one that starts and ends with it. Everything in such a token would basically be translated into a string format over a Binder call, such as:
_{f.Name} -> string.Format("_{0}", Binder.Bind("Name", f))
Here the Bind method knows how to get values out of basic objects with properties, using reflection, or out of MGrammar sequence elements, using the MGrammar API.
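As a minimal sketch, the reflection half of that Bind method could look something like this (the MGrammar half is omitted, and this class shape is my illustration rather than the actual MetaSharp source):

using System;

public static class Binder
{
    // Resolve a property by name on a plain object via reflection.
    public static object Bind(string property, object target)
    {
        if (target == null) throw new ArgumentNullException("target");
        var info = target.GetType().GetProperty(property);
        if (info == null)
            throw new ArgumentException("No property named " + property);
        return info.GetValue(target, null);
    }

    // Typed overload, e.g. Bind<int>("Bar", f): converts the bound value
    // (which MGrammar hands over as a string) to the requested type.
    public static T Bind<T>(string property, object target)
    {
        return (T)Convert.ChangeType(Bind(property, target), typeof(T));
    }
}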
Also, if anyone has something better than “: end” for blocks other than “{ }” please let me know.