Out with Code Generation and in with Transformation

As I’ve been playing around with DSLs for the past couple of years I’ve been focused on Code Generation as my primary strategy. This is all well and good, and I think that code generation still serves its purpose in the greater world of DSLs, but it’s not quite good enough. I would like to start using the word Transformation as a more generalized form of code generation and manipulation. What I used to refer to as Code Generation I will now simply call Textual Transformation. The other main form of Transformation is an AST Transformation. The Groovy folks have adopted this as synonymous with Compile-time Meta Programming, and the Boo folks would call it a Syntactic Macro.

In order to promote the DRY principle and really allow N levels of arbitrary transformations I’ve been busy changing MetaSharp to adopt the Pipeline pattern for the compilation process (according to that Wikipedia article what I have now is more of a pseudo-pipeline, though, since each step is done synchronously). The end result is pretty simple actually.


The Pipeline has a series of steps and a collection of services. Each step depends on certain services and may alter or create certain services. In this way each step can be completely reusable for different compilation scenarios. For example the MetaCompilePipeline has three steps:

  1. MetaSharpCompileStep
  2. CodeDomTransformStep
  3. CodeDomCodeGenerateStep

Which is to say, if you want to compile MetaSharp code inside of a project of a different language your pipeline needs to perform those three steps. First compile the code into MetaSharp AST nodes. Second transform those nodes into CodeDom objects. Third use a CodeDomProvider to generate code based on those CodeDom objects. The MetaTemplatePipeline is the same as the above with one extra step at the beginning, for transforming the code into something else.
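The steps-and-services idea above can be sketched roughly like this. This is illustrative Python, not MetaSharp’s actual API; only the three step names are borrowed from the real pipeline, and the string-wrapping "work" each step does is a stand-in for real compilation.

```python
# A minimal sketch of the pipeline-of-steps idea: each step declares the
# services it requires and may add new services; the pipeline runs the
# steps in order over one shared service collection.

class Step:
    def __init__(self, name, requires, execute):
        self.name = name          # step name, for error messages
        self.requires = requires  # service names this step depends on
        self.execute = execute    # callable that reads/updates the services

class Pipeline:
    def __init__(self, steps):
        self.steps = steps
        self.services = {}

    def run(self):
        for step in self.steps:
            missing = [n for n in step.requires if n not in self.services]
            if missing:
                raise ValueError(f"{step.name} is missing services: {missing}")
            step.execute(self.services)
        return self.services

# Hypothetical stand-ins for the three MetaCompilePipeline steps:
compile_step = Step("MetaSharpCompileStep", ["source"],
                    lambda s: s.update(ast=f"ast({s['source']})"))
transform_step = Step("CodeDomTransformStep", ["ast"],
                      lambda s: s.update(codedom=f"codedom({s['ast']})"))
generate_step = Step("CodeDomCodeGenerateStep", ["codedom"],
                     lambda s: s.update(code=f"code({s['codedom']})"))

pipeline = Pipeline([compile_step, transform_step, generate_step])
pipeline.services["source"] = "example.ms"
result = pipeline.run()
```

Because steps only talk to each other through named services, a different pipeline (like the MetaTemplatePipeline) can reuse the same steps and just prepend or append others.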

The point here, though, is that the key to this whole process is the idea of Transformation. In fact the whole theory behind MetaSharp is simply to be a transformation tool. Each step is simply transforming the results of the previous step into something else. This is powerful because your DSL can consist of arbitrary levels of transformation: literally, your DSL could transform into a lower-level DSL, which transforms into an even lower-level DSL, and so on, all the way down to machine code.

Transformation isn’t a new concept; it’s been around forever. At the very root of any software is essentially a bunch of 1’s and 0’s, but we haven’t written raw 1’s and 0’s as our software for a long time. The compiler has always been a way for us to transform slightly more complex concepts into lower-level ones. Even extremely low-level machine code is a step above raw 1’s and 0’s. General purpose programming languages themselves consist of constructs that transform into much more verbose machine code or IL.

Taking transformation to the next level of abstraction is necessary for us to effectively create DSLs. If there were a tool to help us easily perform those transformations, it would go a long way towards making external DSL authoring more realistic, which is what I’m hoping to do with MetaSharp.

So to me, at this point, Code Generation is just another form of Transformation, which I will be calling “Textual Transformation” from now on. It has its pros and cons, which I hope to discuss further in other posts. However, my point today is simply to convey the idea of Transformation as more general and more important to the DSL world than Code Generation alone, and also to consciously force myself to update my lexicon.

Axum – A DSL for parallelism

There is an interesting new language called Axum, which appears to be very much a DSL from my perspective, focused primarily on parallelism.

Axum, previously known as Maestro, is a Microsoft incubation language project meant to provide a parallel programming model for .NET through isolation, actors and message passing. The language borrows many concepts from Erlang but with a C#-like syntax.

Axum is an imperative language with a C#-like syntax. While it is aware of objects, classes cannot be defined, because the language is domain- and actor-oriented instead of being object-oriented. Axum is not a general-purpose language; it is targeted at solving concurrency tasks and is built upon the Concurrency and Coordination Runtime (CCR) from Microsoft Robotics.

It’s really interesting when you consider that parallelism is, in fact, a solvable problem from within a general purpose programming language. However, the problem is so complex that it is generally very difficult for anyone to get it right (with object oriented languages at least). So their solution was to create a DSL that doesn’t let you get it wrong (or at least minimizes your chances). I think this is the perfect example of why an external DSL can be so powerful.

External DSLs can allow you to model what would be a complex design pattern in a general purpose language, with a much lower rate of errors. You couldn’t do this nearly as effectively in an internal DSL.

More from the Axum team: http://blogs.msdn.com/maestroteam/archive/2009/04/17/forging-ahead.aspx

SmallBasic at the Twin Cities Languages User Group

The next meeting of the TCLUG will be on the language Small Basic, presented by Jeff Klawiter. If you have an interest in learning a new language or have children that might be interested in getting into programming you should come to this presentation (and bring the kids)!

Oh yeah, and free pizza.


CodeCamp Evaluation Results

Rating Avg: 7.7

I had a few good comments and a few negative ones. One person said “A bit shallow” and another person said “This went very deep”, so it can be a little hard to take something away from that. Maybe I’ll try to respond to various comments directly.

5, Could’ve been slightly better prepared, perhaps with more demos.

I agree with this; the commenter is probably specifically talking about the Ruby portion of the presentation. I want to apologize to Rubyists for not having a more solid demo there, but Mike Frawley helped me show something at least and talk about it. Better than nothing at least :-/

9, This went very deep. I enjoyed the intro to BOO as well. I would have liked to seen at least one concrete example of how to do this with C# using extension methods.

I was thinking about showing a RhinoMocks example. That would have been better than just the simple elevator app samples. If you made this comment check out RhinoMocks or the Umbrella project for good examples of extension method driven APIs in C#.

5, A bit shallow – I wanted to hear more about M and Oslo

I wonder if this person came to the presentation expecting more M and Oslo? Or just saw me talk about it a little bit at the end and wanted to hear more. To be honest, I don’t know a lot about M or Oslo (except conceptually), but I know a lot about MGrammar (which is distinctly different). I’d love to do another presentation on that, I think. Trying to hit that depth where it’s interesting for everyone isn’t easy, I guess.

7, I dint had much background information

Well hopefully some of my general overviews of different kinds of DSLs and ways to use them helped you come away with a little something at least.

10, Very concise.  Now I know what a DSL is and why I should care.  Great examples, followed through with the same example through different DSL’s. Interesting points about AST injection and making cross cutting concerns easier to decipher.  Topped it off with an excellent grand finale.  Good job.

Thanks! I have to admit the Grand Finale was pretty sweet. For those of you not there I played my enhanced Song demo using MetaSharp, which was modified to play the Super Mario Brothers Theme Song using Console.Beep. Unfortunately I ran out of time to really dive into the workings of this more but next time MetaSharp might make an interesting presentation in and of itself.

6, Very general topic, hard to figure out how to apply what I learned.

Well, that’s actually a big bummer. I might have focused on Boo more than some people would have liked, specifically so I could have something concrete that people could take away. I hope this commenter isn’t dismissing Boo, because that is what I was hoping, if anything, people could use to apply DSLs here and now. So if you’re still not sure how to apply your newfound DSL knowledge, go try Boo!

MGrammar Tokenization

I’m still having some fun playing around with MGrammar but I was pretty stumped by a seemingly simple issue for a while. My problem was how to tokenize Type names vs. Reference Expressions. For example consider the following C# code:

namespace Testing.Example

Now a simple reference expression:

blah.Foo.Bar

In the first example I essentially want the entire string, whereas in the second example I have three nested references to variables or members. My first inclination was to tokenize this with something similar to this:

token Identifier = Letter+ (Letter | Number)*;
token TypeIdentifier = Identifier ("." Identifier)*;

syntax Namespace = "namespace" TypeIdentifier;
syntax ReferenceExpression = (Identifier ".")* Identifier;
syntax MethodInvokeExpression = ReferenceExpression GroupBegin Join(Expression, ",") GroupEnd;
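To make the lexing behavior concrete, here is a tiny illustrative tokenizer in Python (not MGrammar, and much cruder than a real lexer). It tries the more specific rule first, mimicking the way a lexer commits to token types by longest match before any syntax rules are ever consulted:

```python
import re

# Two token rules, tried in order with the longer/more specific rule first.
# This mirrors the Identifier vs. TypeIdentifier conflict: the lexer, not
# the parser, decides which rule a piece of text belongs to.
TOKEN_RULES = [
    ("TypeIdentifier", re.compile(r"[A-Za-z]\w*(\.[A-Za-z]\w*)+")),
    ("Identifier", re.compile(r"[A-Za-z]\w*")),
]

def tokenize(text):
    tokens, pos = [], 0
    while pos < len(text):
        for name, pattern in TOKEN_RULES:
            m = pattern.match(text, pos)
            if m:
                tokens.append((name, m.group()))
                pos = m.end()
                break
        else:
            pos += 1  # skip characters no rule matches (whitespace, etc.)
    return tokens

# "blah.Foo.Bar" matches the TypeIdentifier rule as one big token, so a
# syntax rule hoping for three separate Identifier tokens never sees them.
tokens = tokenize("blah.Foo.Bar")
```

Once the lexer has emitted a single TypeIdentifier token for the whole dotted string, no amount of cleverness at the syntax level can split it back apart.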

This is a very rough example, but the idea was to have two different types of identifiers, one for each syntax to use. Well, it turns out this doesn’t really work, and I think the reason is that the tokenization happens before the syntax evaluation. Meaning, if you have a string such as “blah.Foo.Bar” and it matches TypeIdentifier, then that is what it will be. If your syntax expects an Identifier token but gets a TypeIdentifier token you’re out of luck; you cannot optionally get one token or another. There is one solution in MGrammar, and that is to use the “final” keyword. Such as:

token Identifier = Letter+ (Letter | Number)*;
token TypeIdentifier = Identifier ("." Identifier)*;
final token Namespace = "namespace";
syntax NamespaceExpression = Namespace TypeIdentifier;

This will ensure that the string “namespace” matches the Namespace token rather than the more liberal Identifier token. But that is about it; you get essentially two levels of priority. This is fine though, you just have to rely more on your syntaxes (syntaxii?) and post processing. For example you might do this:

token Identifier = Letter+ (Letter | Number)*;

syntax NamespaceIdentifier = n:(n:NamespaceIdentifier "." => n)? i:Identifier 
    => NamespaceIdentifier { Parent = n, Name = i };
syntax NamespaceExpression = Namespace i:NamespaceIdentifier 
    => Namespace { Name = i } ;

syntax ReferenceExpression = r:(r:ReferenceExpression "." => r)? i:Identifier
    => ReferenceExpression { Target = r, Name = i };
syntax MethodInvokeExpression = ReferenceExpression GroupBegin Join(Expression, ",") GroupEnd;

Of course this is very simplistic and doesn’t take into account other types of expressions, but hopefully you get the idea. The trick is that the graph output for the namespace doesn’t yield one long string; instead it yields a nested node structure. When processing the graph you have to take this into account, but that isn’t too hard.
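Processing that nested structure is straightforward. Here is a rough sketch (Python with a made-up node class, not the actual MGraph output) of walking the Parent chain back into a dotted name:

```python
# Illustrative stand-in for the nested NamespaceIdentifier nodes the grammar
# above produces: each node holds a Name and a Parent (or None at the root).
class NamespaceIdentifier:
    def __init__(self, parent, name):
        self.parent = parent
        self.name = name

def full_name(node):
    """Walk the Parent chain and rejoin the pieces into a dotted string."""
    parts = []
    while node is not None:
        parts.append(node.name)
        node = node.parent
    return ".".join(reversed(parts))

# "Testing.Example" parses with the innermost identifier as the outermost
# node: Example's Parent is Testing, whose Parent is None.
ns = NamespaceIdentifier(NamespaceIdentifier(None, "Testing"), "Example")
```

The same walk works for the ReferenceExpression nodes; only the property you follow (Target instead of Parent) changes.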