Actipro has a WPF MGrammar syntax editor

I haven’t tried to use this yet but it seems pretty interesting.

SyntaxEditor is a powerful text editing control that is packed with features for efficient code editing, including syntax highlighting, line numbers, block selection, IntelliPrompt UI, split views, zooming, bi-di support, and much more. It has many of the same code editing features found in the Visual Studio code editor.

SyntaxEditor is built on top of our next-generation extensible text/parsing framework. While over 20 sample languages are available to get you started (such as C#, VB, XML, and more), custom language definitions can be developed and distributed with your applications as well. SyntaxEditor is designed for use in IDE (integrated development environment) applications; however, there are many other applications out there that can take advantage of such a control.

A free add-on is included that integrates domain-specific language (DSL) parsers created using Microsoft Oslo’s MGrammar with SyntaxEditor, allowing it to syntax highlight code based on the DSL parser.

Out with Code Generation and in with Transformation

As I’ve been playing around with DSLs for the past couple of years I’ve been focused on Code Generation as my primary strategy. This is all well and good, and I think code generation still serves its purpose in the greater world of DSLs, but it’s not quite good enough. I would like to start using the word Transformation as a more generalized form of code generation and manipulation. What I used to refer to as Code Generation I will now simply call Textual Transformation. The other main form of Transformation is an AST Transformation. The Groovy folks have adopted this as a synonym for Compile-time Meta Programming, and the Boo folks would call it a Syntactic Macro.

In order to promote the DRY principle and really allow N levels of arbitrary transformations, I’ve been busy changing MetaSharp to adopt the Pipeline pattern for the compilation process (according to that Wikipedia article, what I have now is more of a pseudo-pipeline, since each step is done synchronously). The end result is pretty simple actually.


The Pipeline has a series of steps and a collection of services. Each step depends on certain services and may alter / create certain services. In this way each step can be completely re-usable for different compilation scenarios. For example the MetaCompilePipeline has three steps:

  1. MetaSharpCompileStep
  2. CodeDomTransformStep
  3. CodeDomCodeGenerateStep

Which is to say, if you want to compile MetaSharp code inside of a project of a different language your pipeline needs to perform those three steps. First compile the code into MetaSharp AST nodes. Second transform those nodes into CodeDom objects. Third use a CodeDomProvider to generate code based on those CodeDom objects. The MetaTemplatePipeline is the same as the above with one extra step at the beginning, for transforming the code into something else.
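The pipeline idea above can be sketched in miniature. This is a hypothetical Python sketch of the concept, not MetaSharp’s actual API: each step reads from and writes to a shared collection of services, which is what makes individual steps reusable across different pipelines (the step and service names here are toy stand-ins).

```python
class Pipeline:
    """A minimal synchronous pipeline: steps share a services collection."""

    def __init__(self, steps):
        self.steps = steps
        self.services = {}

    def run(self, source):
        self.services["Source"] = source
        for step in self.steps:
            # each step transforms the result of the previous step
            step(self.services)
        return self.services


# Toy stand-ins for the three steps of MetaCompilePipeline:
def compile_step(services):
    # parse source text into (fake) AST nodes
    services["Ast"] = services["Source"].split()


def transform_step(services):
    # transform AST nodes into (fake) CodeDom-like objects
    services["CodeDom"] = [{"kind": "node", "text": t} for t in services["Ast"]]


def generate_step(services):
    # generate target-language text from the CodeDom-like objects
    services["Code"] = "; ".join(n["text"] for n in services["CodeDom"])


pipeline = Pipeline([compile_step, transform_step, generate_step])
result = pipeline.run("print hello")  # result["Code"] == "print; hello"
```

Because each step only depends on the services it reads, a different pipeline (like MetaTemplatePipeline) can prepend or swap steps without touching the others.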

The point here, though, is that the key to this whole process is the idea of Transformation. In fact the whole theory behind MetaSharp is simply to be a transformation tool. Each step is simply transforming the results of the previous step into something else. This is powerful because your DSL can consist of arbitrary levels of transformation: literally, your DSL could transform into a lower-level DSL, which transforms into an even lower-level DSL, and so on, all the way down to machine code.

Transformation isn’t a new concept; it’s been around forever. At the very root of any software is essentially a bunch of 1’s and 0’s, but we haven’t written raw 1’s and 0’s as our software for a long time. The compiler has always been a way for us to transform slightly more complex concepts into lower-level ones. Even extremely low-level machine code is a step above raw 1’s and 0’s. General purpose programming languages themselves consist of constructs that transform into much more verbose machine code or IL.

Taking transformation to the next level of abstraction is necessary for us to effectively create DSLs. If there were a tool to help us easily perform those transformations, it would go a long way towards making external DSL authoring more realistic, which is what I’m hoping to do with MetaSharp.

So to me, at this point, Code Generation is just another form of Transformation, which I will be calling “Textual Transformation” from now on. It has its pros and cons, which I hope to discuss further in other posts. However, my point today is simply to convey the idea of Transformation as more general and more important to the DSL world than Code Generation alone, and also to consciously force myself to update my lexicon.

CodeCamp Evaluation Results

Rating Avg: 7.7

I had a few good comments and a few negative ones. One person said “A bit shallow” and another person said “This went very deep”, so it can be a little hard to take something away from that. Maybe I’ll try to respond to various comments directly.

5, Could’ve been slightly better prepared, perhaps with more demos.

I agree with this; the commenter is probably specifically talking about the Ruby portion of the presentation. I want to apologize to Rubyists for not having a more solid demo there, but Mike Frawley helped me show something at least and talk about it. Better than nothing at least :-/

9, This went very deep. I enjoyed the intro to BOO as well. I would have liked to seen at least one concrete example of how to do this with C# using extension methods.

I was thinking about showing a RhinoMocks example. That would have been better than just the simple elevator app samples. If you made this comment check out RhinoMocks or the Umbrella project for good examples of extension method driven APIs in C#.

5, A bit shallow – I wanted to hear more about M and Oslo

I wonder if this person came to the presentation expecting more M and Oslo? Or just saw me talk about it a little bit at the end and wanted to hear more. To be honest, I don’t know a lot about M or Oslo (except conceptually), but I know a lot about MGrammar (which is distinctly different). I’d love to do another presentation on that, I think. Trying to hit a depth where it’s interesting for everyone isn’t easy, I guess.

7, I dint had much background information

Well hopefully some of my general overviews of different kinds of DSLs and ways to use them helped you come away with a little something at least.

10, Very concise.  Now I know what a DSL is and why I should care.  Great examples, followed through with the same example through different DSL’s. Interesting points about AST injection and making cross cutting concerns easier to decipher.  Topped it off with an excellent grand finale.  Good job.

Thanks! I have to admit the Grand Finale was pretty sweet. For those of you not there I played my enhanced Song demo using MetaSharp, which was modified to play the Super Mario Brothers Theme Song using Console.Beep. Unfortunately I ran out of time to really dive into the workings of this more but next time MetaSharp might make an interesting presentation in and of itself.

6, Very general topic, hard to figure out how to apply what I learned.

Well, that’s actually a big bummer. I might have focused on Boo more than some people would have liked, specifically so I could have something concrete that people could take away. I hope this commenter isn’t dismissing Boo, because Boo is exactly what I was hoping people could use to apply DSLs here and now. So if you’re still not sure how to apply your newfound DSL knowledge, go try Boo!

MGrammar Tokenization

I’m still having some fun playing around with MGrammar, but I was pretty stumped by a seemingly simple issue for a while. My problem was how to tokenize Type names vs. Reference Expressions. For example, consider the following C# code:

namespace Testing.Example

Now a simple reference expression:

blah.Foo.Bar


In the first example, I essentially want the entire string, whereas in the second example I have three nested references to variables or members. My first inclination was to tokenize this with something like the following:

token Identifier = Letter+ (Letter | Number)*;
token TypeIdentifier = Identifier ("." Identifier)*;

syntax Namespace = "namespace" TypeIdentifier;
syntax ReferenceExpression = (Identifier ".")* Identifier;
syntax MethodInvokeExpression = ReferenceExpression GroupBegin Join(Expression, ",") GroupEnd;

This is a very rough example, but the idea was to have two different types of identifiers, one for each syntax to use. Well, it turns out this doesn’t really work, and I think the reason is that tokenization happens before syntax evaluation. Meaning, if you have a string such as "blah.Foo.Bar" and it matches TypeIdentifier, then that is what it will be. If your syntax expects an Identifier token but gets a TypeIdentifier token, you’re out of luck; you cannot optionally get one token or another. There is one solution in MGrammar, and that is the "final" keyword. Such as:
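The behavior described here, where tokenization decides before the syntax rules get a say, can be illustrated with a toy maximal-munch lexer. This is a Python sketch of the general lexing principle, not MGrammar’s actual implementation:

```python
import re

# Patterns analogous to the Identifier and TypeIdentifier tokens above.
IDENTIFIER = r"[A-Za-z][A-Za-z0-9]*"
TYPE_IDENTIFIER = rf"{IDENTIFIER}(?:\.{IDENTIFIER})+"


def tokenize(text):
    # A maximal-munch lexer: at each position the longest/most specific
    # match wins, regardless of which token the grammar's syntax rules
    # would have preferred in that context.
    token_re = re.compile(
        rf"(?P<TypeIdentifier>{TYPE_IDENTIFIER})|(?P<Identifier>{IDENTIFIER})"
    )
    return [(m.lastgroup, m.group()) for m in token_re.finditer(text)]


# "blah.Foo.Bar" lexes as ONE TypeIdentifier token, so a syntax rule
# expecting a sequence of Identifier tokens never gets a chance to match.
tokens = tokenize("blah.Foo.Bar")  # [("TypeIdentifier", "blah.Foo.Bar")]
```

This is exactly the trap: the lexer has already committed to TypeIdentifier by the time ReferenceExpression goes looking for its Identifier tokens.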

token Identifier = Letter+ (Letter | Number)*;
token TypeIdentifier = Identifier ("." Identifier)*;
final token Namespace = "namespace";
syntax NamespaceExpression = Namespace TypeIdentifier;

This will ensure that the string "namespace" will match the Namespace token rather than the more liberal Identifier token. But that is about it: you get, essentially, two levels of priority. This is fine though; you just have to rely more on your syntaxes (syntaxii?) and post-processing. For example you might do this:

token Identifier = Letter+ (Letter | Number)*;

syntax NamespaceIdentifier = n:(n:NamespaceIdentifier "." => n)? i:Identifier 
    => NamespaceIdentifier { Parent = n, Name = i };
syntax NamespaceExpression = Namespace i:NamespaceIdentifier 
    => Namespace { Name = i } ;

syntax ReferenceExpression = r:(ReferenceExpression "." => r)? i:Identifier
    => ReferenceExpression { Target = r, Name = i };
syntax MethodInvokeExpression = ReferenceExpression GroupBegin Join(Expression, ",") GroupEnd;

Of course this is very simplistic and doesn’t take into account other types of expressions but hopefully you get the idea. The trick is that your graph output for the namespace doesn’t yield a long string but instead it yields a nested node structure. When processing the graph you have to take this into account but that isn’t too hard.
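Processing that nested node structure is mostly a matter of walking the Parent chain back down to a dotted name. Here is a hypothetical Python sketch, with plain dictionaries standing in for the MGraph nodes that the NamespaceIdentifier projection above would produce:

```python
# Stand-in for the graph output of "namespace Testing.Example":
# each node carries a Name and an optional Parent node.
node = {
    "Name": "Example",
    "Parent": {"Name": "Testing", "Parent": None},
}


def full_name(node):
    # Walk the Parent chain, collecting names innermost-first,
    # then reverse and join them back into the dotted string.
    parts = []
    while node is not None:
        parts.append(node["Name"])
        node = node["Parent"]
    return ".".join(reversed(parts))


name = full_name(node)  # "Testing.Example"
```

So the grammar yields structure instead of a flat string, and the consumer reassembles (or, more likely, directly uses) the pieces.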

The 5 Laws of Code Generation

This is a refresh of an old blog post. The more I look at Oslo and contrast it with what I have been working on, the more I want to verify that NBusiness isn’t actually redundant or made obsolete by Oslo. Obviously the two will be competitors, but what I’m trying to figure out is what the difference between the two actually is and, more importantly, why NBusiness is the more correct solution to the problem.

I would like to start out by saying that Oslo and NBusiness are both DSLs, and core to any DSL is the transformation of the metadata into more concrete code, be that a less abstract DSL, actual executable code, or something else entirely. So what I’m calling Code Generation is essentially that transformation process. Additionally, what used to be called an “intermediate format” I’m now simply calling Metadata. To me, in this context, metadata is simply the name for the actual DSL declaration for a given application.

So here are those rules again, reframed into the context of Oslo and how I feel it may violate them.

1. Code generation is controlled through modifiable templates.

Translation of your metadata into another form should never be done through a black box. I’m not entirely sure how customizable the SQL generation of Oslo is, but from what I understand it’s pretty opaque. In fact, the entire system of translating MGraph into another form should be completely transparent and built such that it is very easy to shape into any form. If anyone can explain to me how Oslo translates M into SQL, show me how I can do it myself, and show me how to alter the SQL that is generated, then I’ll be happy to change my mind on this one, but it feels rather opaque to me at the moment.

2. Code generation is done during the build process.

I would like to add an amendment to this one and specify that code generation can also legitimately be done dynamically. So the rule could be more accurately changed to “Intermediate forms of metadata should never be persisted.” Not that you couldn’t write them out to temp folders, but the point is that the integration of a DSL into an application should be seamless; you shouldn’t need multiple manual steps to get it all integrated. Whether this is done at runtime or at build time is irrelevant. Presumably the many command-line apps that come with Oslo for transforming your DSL into something that goes into the repository will be streamlined, but this is exactly the sort of thing that should be avoided.

3. Code generation is done from an intermediate format.

I would like to amend this rule to instead be “Metadata must have a single source of truth.” I think Oslo has a pretty good system for the intermediate format (M), but it doesn’t follow the single-source-of-truth rule. M is simply translated into the Repository, which is the real source of truth but is not necessarily synchronized in reverse with the original M code (as far as I know, at least).

To me, Oslo violates this rule simply by the existence of the Repository. The repository is essentially a “second source of truth” and can be edited from multiple sources. To me the DSLs should be the single source of truth, and the repository should be essentially a temp file or part of the output of the build. Editing the repository should simply be a matter of editing your entities.

4. The intermediate format is under source control with versioning.

I would like to amend this rule also, to specify that only deltas must be part of each revision. So maybe this rule could simply be changed to “The metadata declaration must be versionable and mergable.” This usually means that your DSL needs to be textual. I would be willing to buy into a binary format for metadata, but only if it could be versioned with my preferred source control system as deltas rather than a giant binary blob, and as long as it didn’t break any of the above rules in the process.

5. The code generation templates are also under source control.

This places the same constraints on the transformation system as on the metadata itself. It usually translates into the idea that your templates should also be textual. Again, if you want a binary format, there need to be ways for source control tools to persist only deltas. So this rule could be changed to “The transformation templates must be versionable and mergable.”


Here is a summary of these revised rules:

  1. Metadata transformation is controlled through modifiable templates.

  2. Intermediate forms of metadata should never be persisted.

  3. Metadata must have a single source of truth.

  4. The metadata must be versionable and mergable.

  5. The transformation templates must be versionable and mergable.