New meta# NuGet package published

I uploaded a new meta# NuGet package:

PS> Install-Package metasharp

It only has pattern-matching APIs in this package now. To make patterns, just start with MetaSharp.Transformation.Pattern; all of the pattern factories are static methods there. A simple example:

var p = Pattern.OneOrMore(Pattern.Range('a', 'z'));
var m = Pattern.Match(p, "metasharp");
if (m.Matched) { ... }

This package targets .NET 2.0 now too, so you can use it in more places (such as Unity3D). I’m planning on putting the actual grammars, build tasks, etc. into other packages so that the main package remains very portable. I also changed it so that ‘p’ in the scenario above is a Pattern AST object instead of a delegate, which means you could theoretically visit the pattern and transform it. Tracing produced objects will also give you the pattern that produced them, along with the metadata.

The biggest improvement over the previous API is that you no longer have to implement INode on the objects you produce. Before, everything had to be an INode, so you had to do some awkward wrapping and unwrapping if your tree didn’t implement INode; now it just works with any plain old CLR object. It also uses DynamicInvoke on productions, so you can bind them directly to any method with parameters of any type:

var p = Pattern.Production(
    Pattern.Variable("x", Pattern.Any()),
    (int x) => x + 1); // throws if the value consumed isn't an int

I hope that makes sense. My first attempt used generics to flow static typing through the productions (which is definitely possible), but C#’s limited type inference and lack of support for monads made the complexity of the patterns unbearable. I found that if you use object and DynamicInvoke instead, you get similar results with reasonable complexity. The end result is that this API isn’t strictly monadic, since there is no Bind or Select/SelectMany. But it is monadic in spirit: all of the objects involved in the pattern-matching operation are immutable and safely parallelizable, except possibly your custom productions.

I’m pleased with the performance so far; the Or pattern executes each branch in parallel. I haven’t really stressed it yet, but my 210 unit tests take about 0.45 seconds to run in total, which seems pretty good. I want to create a Reflection.Emit visitor so I can get even better performance for rules but, unfortunately, that API isn’t available on all platforms (e.g. WinRT), so I’m not sure how best to do that.

The next big step is to create a new package with a C# grammar plus a Roslyn transformer. Before, it was a one-off language; this time I plan to redo that work as C# so that it is more appealing to users. I also want to change the grammar language to be compositional rather than inheritance based. This will require a little experimentation, but I think it will make it a lot more appealing. The lack of extensibility in Roslyn is a real bummer because, as far as I can tell, there is no way for me to easily extend Roslyn to add pattern syntax to C# directly. The result is that I can leverage Roslyn to generate assemblies from my AST, but I still have to implement my own C# grammar if I want to extend it. A bummer, but not harder than what I did before.

If you get a chance to try it out, I would appreciate feedback on the Pattern API 🙂

Enjoy!

Maybe not null

We have a problem in the code of the project I am currently working on. Well, actually it’s not just our problem; it’s a problem with C# and .NET, and even then it’s a fairly pervasive problem for almost our entire industry: the problem of null. Or, in the words of Tony Hoare, the billion dollar mistake.

But I am noticing the problem with null now, more so than ever before, in the code I am currently working on. I have heard people talk about it, I have been somewhat aware of the issues, and I have studied the Maybe monad, but I have never found nulls to be as painful as other people seem to have. So I didn’t fully understand the problem.

I have been talking about it with a coworker of mine (Eljay Love Jensen) and he put it into a new light which, I think, finally cemented the problem in my mind, and I was able to put it into C# terms in a way that is easy for anyone to understand. This is definitely old news and has been thought of a million times before, but I would like to try my hand at putting the problem into C# terms and hopefully help other people who are wondering what the big deal with null really is.

In my words here is the big problem with null in C#…

  • null is commonly used to express at least three different semantics:
    • uninitialized
    • error
    • missing

But we can do something to solve this problem. The first step, though, is to understand why it is a problem: it is very easy to make a mistake when determining which of the three possible semantics a null carries at any point in time. It can be even harder for a caller, because if the author of the member returning null isn’t very careful, it can be impossible for the caller to know what the semantic meaning of the null value it is getting really is.

Here is a simple example that you have probably written before…

class Example
{
    private Foo foo;

    public Foo Foo
    {
        get 
        {
            if (foo == null)
                foo = CreateFoo();

            return foo; 
        }
    }

    public Foo CreateFoo()
    {
        return null; // error or there was actually no foo?
    }
}

When getting Foo, we first use null to indicate that the value is uninitialized, but then we receive null from our initialization method either because an error happened or because Foo actually doesn’t exist; null is used to represent all three states. The second time you call this property you do not have enough information to know which of those three states foo is really in.

You may have even written something like this a few times and noticed that the object tries to initialize multiple times in some rare cases but usually initializes and works just fine the first time. Since it’s rare, you dismissed it and never thought about it again. Also, notice how the caller of the Foo property has no idea why it’s getting null; it could be due to an error or because Foo doesn’t actually exist (such as a missing row in a database when looking for id=100).

This is, frankly, a very common mistake. Much later on you realize that you are experiencing NullReferenceExceptions due to the confusion about the semantic meaning of null in all your calling code.

This problem amplifies as your program grows in complexity until you find yourself changing code like this…

Do(a.B.C.D);

Into code like this…

if (a.B != null)
{
    B b = a.B;
    if (b.C != null)
    {
        C c = b.C;
        if (c.D != null)
        {
            Do(c.D);
        }
    }
}

This is not good. I therefore would like to propose a few coding guidelines for C#.

Henceforth…

  • null may only be used to mean error state.
  • Lazy<T> is used to mean uninitialized.
  • Maybe<T> is used to mean the value may legitimately not exist.

Therefore, if you ever look into your debugger and see a null reference, it is a bug in your code. It is completely unambiguous from the caller’s perspective that a null return value means something went wrong. If Lazy<T> has no value yet, it is uninitialized; and if Maybe<T> has no value, the caller knows that the reference is legitimately not there.

If you’re not sure what Lazy<T> is, you can read about the details on the MSDN page for Lazy<T>.

If you’re not sure what Maybe<T> is, it is not a class in .NET proper, but there are many implementations of it on the web you can look at and learn from. For example, here is a blog post by Jordan Terrell about his work on Maybe<T> for .NET.

Basically, the Maybe monad is one of the simplest examples of a monad and is fun to learn. It is essentially just an object that has both a boolean and a value. If the boolean is false, the value is non-existent; if it is true, the value is there and may be used. It can be a brain bender learning about the lambda calculus of monads, but in its simplest form it is really not that complicated an idea, so don’t worry about it too much.
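
To make that concrete, here is a minimal sketch of what a Maybe<T> might look like. This is just one possible shape (not Jordan’s implementation); the Select/SelectMany methods are what allow the C# query syntax used below to bind over it:

using System;

public struct Maybe<T>
{
    public static readonly Maybe<T> None = new Maybe<T>();

    private readonly bool hasValue;
    private readonly T value;

    public Maybe(T value)
    {
        this.hasValue = true;
        this.value = value;
    }

    public bool HasValue { get { return hasValue; } }

    public T Value
    {
        get
        {
            if (!hasValue)
                throw new InvalidOperationException("Maybe has no value.");
            return value;
        }
    }

    // Map: transform the value, if present.
    public Maybe<U> Select<U>(Func<T, U> selector)
    {
        return hasValue ? new Maybe<U>(selector(value)) : Maybe<U>.None;
    }

    // Bind: chain onto another Maybe; 'from x in m' compiles down to this.
    public Maybe<V> SelectMany<U, V>(Func<T, Maybe<U>> selector, Func<T, U, V> resultSelector)
    {
        if (!hasValue)
            return Maybe<V>.None;

        var inner = selector(value);
        return inner.HasValue
            ? new Maybe<V>(resultSelector(value, inner.Value))
            : Maybe<V>.None;
    }
}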

Therefore, in the above “Do(a.B.C.D)” code, if those properties are not Maybe<T> then you know it is safe to assume they will never be null. They might throw an exception in the case of an error, but never a NullReferenceException. And if they are Maybe<T>, you can use LINQ (which is actually a syntax for the monadic bind operator) to resolve them more elegantly than those icky nested if blocks, like so (depending on how you implement Maybe<T>):

var d = from b in a.B
        from c in b.C
        select c.D;

if (d.HasValue)
{
    Do(d.Value);
}

If any of the values in the chain (a, b, c or d) has HasValue==false, then evaluation stops and d.HasValue will be false. It will never throw a NullReferenceException when one of the values is legitimately missing.
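
Putting the guidelines together, here is one way the earlier Example class might look (a sketch assuming the Maybe<T> above; Foo and CreateFoo stand in for the same hypothetical types as before):

class Example
{
    // Lazy<T> expresses "uninitialized"; Maybe<Foo> expresses "missing".
    private readonly Lazy<Maybe<Foo>> foo = new Lazy<Maybe<Foo>>(CreateFoo);

    public Maybe<Foo> Foo
    {
        get { return foo.Value; }
    }

    private static Maybe<Foo> CreateFoo()
    {
        // A missing row (e.g. id=100 not found) is a legitimate "no value",
        // so return Maybe<Foo>.None; an actual failure should throw instead.
        return Maybe<Foo>.None;
    }
}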

I think this is a fairly easy convention to follow, and it can help us avoid falling into the trap of the billion dollar mistake.

Immutable AVL Tree in C#

I recently decided that I needed an immutable dictionary, but after some experimentation found that it wasn’t as easy as it sounded. After some searching online I came to the conclusion that the most reasonable alternative was to implement an immutable AVL tree where the nodes are key/value pairs.

The AVL tree has O(log N) lookup and insertion times, which means that at most it will look at a number of nodes equal to the depth of the tree, and the tree is strictly balanced at all times. Surprisingly, I had a hard time finding a really clear example online in any language, so I would like to post my solution here since I feel it’s reasonably clean and terse (insertion only so far).

The trick to understanding AVL trees is that the tree must always remain balanced, which is to say that the height of the Left branch of any node may be at most 1 greater than the height of its Right branch. If you find that the height difference is 2 or -2 then you have an unbalanced tree. This is expressed in the Add method after insertion like so:

var lh = n.Left == null ? 0 : n.Left.Height;
var rh = n.Right == null ? 0 : n.Right.Height;
var b = lh - rh;

if (Math.Abs(b) == 2) // 2 or -2 means unbalanced
{
  // rebalance
}

If the value is 2 then your Left branch is unbalanced; if the value is -2 then the Right branch is unbalanced.

if (Math.Abs(b) == 2) // 2 or -2 means unbalanced
{
    if (b == 2) // L
    {
        // balance Left branch
    }
    else // R
    {
        // balance Right branch
    }
}

The next thing to do is to check the balance of the unbalanced branch, whose balance should be either 1 or -1, as in the case of an unbalanced Left branch below:

var llh = n.Left.Left == null ? 0 : n.Left.Left.Height;
var lrh = n.Left.Right == null ? 0 : n.Left.Right.Height;
var lb = llh - lrh;
if (lb == 1) // LL
{
    // rotate right
}
else // LR
{
    // rotate left, then rotate right
}

So given this there are actually four cases we need to cover for any tree:

        L    R
   L   LL   LR
   R   RL   RR

And the solution to rebalancing the tree is to “rotate” the nodes. In the LL and RR cases you only need to rotate the unbalanced node once, but in the LR and RL cases you need to rotate twice: first the branch, then the node itself. Here is the matching table for the rotations:

        L                            R
   L   rotate right                  rotate left, rotate right
   R   rotate right, rotate left     rotate left

It took me a while to understand it but the Wikipedia article actually has a pretty good diagram showing these rotations.

Here are the Left and Right rotation algorithms in C# (complete with awesome ASCII art comments!):

private static Node<TKey, TValue> RotateRight(Node<TKey, TValue> node)
{ 
    //       (5)            4     
    //       / \           / \    
    //      4   D         /   \   
    //     / \           3     5  
    //    3   C    -->  / \   / \ 
    //   / \           A   B C   D
    //  A   B                     
    
    var L = node.Left.Left;
    var R = new Node<TKey, TValue>(node.Key, node.Value, node.Left.Right, node.Right);
    var N = new Node<TKey, TValue>(node.Left.Key, node.Left.Value, L, R);
    return N;
}

private static Node<TKey, TValue> RotateLeft(Node<TKey, TValue> node)
{ 
    //    (3)               4     
    //    / \              / \    
    //   A   4            /   \   
    //      / \          3     5  
    //     B   5   -->  / \   / \ 
    //        / \      A   B C   D
    //       C   D                      
    
    var L = new Node<TKey, TValue>(node.Key, node.Value, node.Left, node.Right.Left);
    var R = node.Right.Right;
    var N = new Node<TKey, TValue>(node.Right.Key, node.Right.Value, L, R);
    return N;
}

Notice that we’re always creating new nodes rather than mutating existing ones. This is crucial to maintaining immutability. These trees can now be safely used across threads since their state is never mutated.
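
For completeness, here is a sketch of how insertion and rebalancing might be wired together. This is a hedged sketch, not the exact code from the sample below: it assumes the Node<TKey, TValue> constructor shown above computes its own Height from its children, and it uses the default comparer for keys:

private static Node<TKey, TValue> Add(Node<TKey, TValue> n, TKey key, TValue value)
{
    if (n == null)
        return new Node<TKey, TValue>(key, value, null, null);

    // Insert into the appropriate branch, copying every node along the path.
    // (A real map would probably replace the value when the keys are equal.)
    var inserted = Comparer<TKey>.Default.Compare(key, n.Key) < 0
        ? new Node<TKey, TValue>(n.Key, n.Value, Add(n.Left, key, value), n.Right)
        : new Node<TKey, TValue>(n.Key, n.Value, n.Left, Add(n.Right, key, value));

    return Rebalance(inserted);
}

private static Node<TKey, TValue> Rebalance(Node<TKey, TValue> n)
{
    var lh = n.Left == null ? 0 : n.Left.Height;
    var rh = n.Right == null ? 0 : n.Right.Height;
    var b = lh - rh;

    if (b == 2) // Left branch is unbalanced
    {
        var llh = n.Left.Left == null ? 0 : n.Left.Left.Height;
        var lrh = n.Left.Right == null ? 0 : n.Left.Right.Height;
        if (llh - lrh == -1) // LR: rotate the Left branch left first
            n = new Node<TKey, TValue>(n.Key, n.Value, RotateLeft(n.Left), n.Right);
        return RotateRight(n); // LL (or LR after the fix-up above)
    }

    if (b == -2) // Right branch is unbalanced
    {
        var rlh = n.Right.Left == null ? 0 : n.Right.Left.Height;
        var rrh = n.Right.Right == null ? 0 : n.Right.Right.Height;
        if (rlh - rrh == 1) // RL: rotate the Right branch right first
            n = new Node<TKey, TValue>(n.Key, n.Value, n.Left, RotateRight(n.Right));
        return RotateLeft(n); // RR (or RL after the fix-up above)
    }

    return n; // already balanced
}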

Download the full sample: Map.zip

Very Handy Addition to LINQ

I have found this custom extension method to be extremely useful when working with LINQ:

public static IEnumerable<T> Cast<T>(this object value)
{
    // Already an IEnumerable<T>? Return it unchanged.
    var enumerableOfTValue = value as IEnumerable<T>;
    if (enumerableOfTValue != null)
        return enumerableOfTValue;

    // A non-generic IEnumerable? Cast each element.
    var enumerableValue = value as IEnumerable;
    if (enumerableValue != null)
        return Enumerable.Cast<T>(enumerableValue);

    // A single object? Wrap it in a one-element sequence.
    return Enumerable.Cast<T>(new[] { value });
}

Basically, it lets you turn any object into an IEnumerable<T>. It’s especially handy when you’re working with an object and don’t know whether it’s a single object or a set.
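
For example (GetValue and Widget are hypothetical stand-ins here):

// Works whether GetValue() returns a single Widget, a List<Widget>,
// or any other enumerable of widgets:
object value = GetValue();
foreach (var widget in value.Cast<Widget>())
    Console.WriteLine(widget);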

List<T> is dead, long live LINQ

Well, ok, maybe not completely dead. It has its uses, but for the most part I find myself using an actual list less and less. I do still end up using ObservableCollection<T> for ViewModels, of course, but that can’t be avoided.

I do find myself slowly banishing it from my daily coding habits, however, thanks to LINQ. I am finding that the covariant interface IEnumerable<T> is almost always a better choice as a return value for methods and properties, and with the help of the higher-order set functions provided with LINQ there is nothing you could do with List<T> that you can’t do without it.

Also, I would like to mention that when I talk about LINQ I am talking exclusively about the extension methods found in System.Linq.Enumerable, not the internal DSL syntax in C# / VB. I almost never use that syntax since it can’t express all of the functions I need and is sometimes hard to debug.

I did have to add one extension method to help ease general use of LINQ however.

public static class EnumerableExtensions
{
    public static IEnumerable<T> ToEnumerable<T>(this T @object)
    {
        // null becomes an empty sequence rather than a null reference.
        if (@object == null)
            return System.Linq.Enumerable.Empty<T>();

        // A single, non-enumerable object becomes a one-element sequence.
        var enumerable = @object as IEnumerable;
        if (enumerable == null)
            return new[] { (T)@object };

        // Anything already enumerable is cast element by element.
        return enumerable.Cast<T>();
    }
}

This allows you to ensure that any object can be used as an IEnumerable<T>. With this in hand you can replace any method on List<T>. Here is a bunch of methods on List<T> and their equivalent LINQ translations.

var a = 1;
var b = 2;
var c = 3;

var list = new List<int>();
var set = Enumerable.Empty<int>();

list.Add(a);
set = set.Concat(a.ToEnumerable());

list.Remove(a);
set = set.Where(i => i != a);

list.AddRange(new[] { a, b });
set = set.Concat(a.ToEnumerable()).Concat(b.ToEnumerable());

list.Sort();
set = set.OrderBy(i => i);

list.Contains(a);
set.Any(i => i == a);

var count1 = list.Count;
var count2 = set.Count();

list.FindAll(i => i == a);
set.Where(i => i == a);

list.IndexOf(a);
set.TakeWhile(i => i != a).Count();

list.FindLast(i => i == a);
set.Last(i => i == a);

list.GetRange(0, 2);
set.Skip(0).Take(2);

list.Insert(1, c);
set = set.Take(1).Concat(c.ToEnumerable()).Concat(set.Skip(1));

list.Reverse();
set = set.Reverse();

list.TrueForAll(i => i == 0);
set.All(i => i == 0);

Note that in order to get some of the features of List<T> you have to compose various LINQ functions together, as with Insert. At first this may seem overly verbose, but the benefit is incredible flexibility. There is nearly no limit on what you can express with LINQ because of this composability (you can also create extensions that wrap up a complex composition behind a neater signature; see the sketch after the output below). List<T> has a few more methods that I will leave up to you to translate, but LINQ has many more functions and function-composition possibilities that I highly recommend discovering.

Executing the above code and printing list and set results in this output.

list: { 2 3 1 }
set: { 2 3 1 }
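
For instance, the Insert composition above could be wrapped up like this (a small sketch building on ToEnumerable):

public static IEnumerable<T> Insert<T>(this IEnumerable<T> source, int index, T item)
{
    // Everything before the index, then the new item, then the rest.
    return source.Take(index)
                 .Concat(item.ToEnumerable())
                 .Concat(source.Skip(index));
}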

When you use LINQ your set is immutable at every point in time, which is very helpful if you would like to try the Parallel Extensions (aka PLINQ). One other interesting aspect of the LINQ extensions is deferred execution: none of my set operations above are actually executed until I iterate over the set in my print function. This allows me to change the value of ‘a’, iterate again, and receive a different result. That can be confusing if you’re unaware of it, but it is a huge win once you understand it well. This is all possible because of the functional nature of LINQ and (as best as I can understand) is an example of a Monad in .NET. And, of course, coroutines in C# (aka yield return) complement LINQ very nicely as well. I think it would be very interesting to experiment with a language where every variable is essentially treated as a set: NULL is not possible, but empty sets are, and an object is simply a set of one.
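
Here is a minimal illustration of deferred execution (a hypothetical snippet; note that the lambda captures the variable ‘a’ itself, not its value):

var a = 1;
var set = Enumerable.Range(1, 5).Where(i => i != a);

Console.WriteLine(string.Join(" ", set)); // 2 3 4 5
a = 3;
Console.WriteLine(string.Join(" ", set)); // 1 2 4 5 -- the filter re-runs on iteration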

Another very cool aspect of LINQ is lambda expressions and how they relate to Expression Trees and the DLR. They are extremely powerful and intimately tied together.

Also, do check out IEnumerable<T>’s mirrored twin: IObservable<T>.

Object Matching in MetaSharp

I’ve been delaying this feature in MetaSharp for a while because I wasn’t entirely sure how I wanted it to work, and it’s a very important feature. I also haven’t seen any other examples in the wild that match exactly what I’m trying to do, so I was slightly unsure of the syntax.

But I have finally dug in and got it working! It’s by no means set in stone, so if anyone has any good suggestions I will likely incorporate them. Feel free to send me suggestions.

If you’re wondering what I mean by Object Matching, I’ll try to explain: first what it means, and second why I think it’s important.

Object Matching

Most parsers in the world today are concerned with general-purpose programming languages and, as a result, with parsing text and transforming it into executable code. The patterns around this problem set are well known and have many implementations. The interesting thing, however, is that they are largely geared towards accepting only text as input and therefore rely on multiple patterns to solve the problems of compilers.

In MetaSharp there is now a Pattern for matching complex objects as well as characters. What this means is that your input can actually be streams of object trees instead of just strings.

For comparison, suppose you wanted to match a social security number from a string using MetaSharp. Here is how you might do it:

SSN = Number#3 "-" Number#2 "-" Number#4;

If your string input was “123-45-6789” then this SSN rule would match that. This is not all that different from regular expressions, though there are some notable differences.

But suppose what you actually had was a Person object with an Ssn property? With object matching you can match on the entire object instead of just the string.

PersonWithSsn = Person { Ssn = SSN };

In this case the curly braces indicate an object match, where the object is of type Person and possesses the property Ssn with a value that matches the SSN rule. This is a pretty trivial example, but the value becomes more apparent when you consider large object graphs and the desire to transform an object graph into something else.

Transforming Object Graphs

This is actually a very common operation in software, and the two standard design patterns for doing it are the State Machine and the Visitor. In fact both techniques are required, as separate steps, to implement a basic compiler. Using object-matching patterns you can effectively create a Visitor in the same grammar language as is needed to create a state machine.

This is the real benefit of object matching: being able to use the same grammar to create state machines and visitors means that you can create every step of a compiler using a single language, tool and skill.

For example, in MetaSharp I have a sample application that plays songs using Console.Beep. It’s pretty silly, but it’s a nice integration test for me. In this sample I have two songs, and after parsing the DSL text what I used to do was pass the resulting AST into a T4 template, which would then generate some code that would get compiled into LINQ expressions.

Here is an example of the T4 template I am referring to:
http://metasharp.codeplex.com/SourceControl/…/DynamicSong.template

<#@ template language="C#" inherits="SongMetaTemplate" hostspecific="true" debug="true" #>
<#@ assembly name="DynamicSongBuilder" #>
<#@ import namespace="DynamicSongBuilder" #>
<# foreach(var song in this.Songs) { #>
    <# foreach(var b in song.Bars) { #>
        <# foreach(var n in b.Notes) { #>
            play(<#=song.Duration #>, "<#=n.Key #>", <#=n.Octave #>);
        <# }
    }
} #>

This works and is a legitimate technique, and I don’t think it’s very popular to say this, but string transformations like this have a couple of serious drawbacks:

  1. The general purpose languages are hard coded; if you wanted to make this reusable you’d have to rewrite it for every language you wanted to support.
  2. You lose contextual information about your original object.
    • C# is a little better at this than some by giving you the #line directive but it’s still not ideal.
  3. It’s slow. With T4 you either get an unwanted assembly loaded into memory or you have to create a separate AppDomain, which is very slow and kludgy.
  4. Tag Soup (http://www.codinghorror.com/blog/2008/07/web-development-as-tag-soup.html)

These drawbacks are usually acceptable. You’re usually only creating a website once, for example, and your views typically don’t get reused as part of another application. Being the topmost part of the stack, it’s ok to lose a little context. But in a world where your intention is to create reusable and inheritable grammars, it’s very non-ideal.

Here is how you would implement the same transformation using a grammar with Object Matching:

grammar DynamicSongTransform:

    Main = SongTransform;

    SongTransform
        = Song { b:Bars = [BarTransform*] }
        -> BlockStatement { Statements = b.Flatten() };

    BarTransform
        = Bar { n:Notes = [NoteTransform*] }
        -> n;

    NoteTransform
        = n:Note { }
        -> InvokeExpression {
            Target = ReferenceExpression { Name = "play" },
            Parameters = [
                PrimitiveExpression { Value = n.Parent.Parent.Duration.ToNumber() },
                PrimitiveExpression { Value = n.Key },
                PrimitiveExpression { Value = n.Octave.ToNumber() }
            ]
        };
end

This grammar accepts an object stream containing a single Song object. That Song object contains a collection of Bar objects, each of which in turn contains a collection of Note objects.

As a refresher, the ‘->’ token indicates a projection: the pattern to match is on the left side of the projection, and if the match is successful then the right-hand side is what is produced. If no projection is specified, the matched value is projected by default.

In the projection of the NoteTransform we project an InvokeExpression, which has a Target and some Parameters. This then gets passed into a DynamicCompile step I created, which transforms these objects into LINQ expressions, which in turn are translated into IL! In this way every step of our DSL is reusable. This will work for VB or C# developers. It will also work when passing in a textual DSL or a Song AST created by other means; you could create a different textual DSL that produces the same AST and this will still work, and we have debugging context data all the way to the end. Also, the general-purpose code transformers are all implemented by MetaSharp, so you don’t have to write them every time.

My theory is that creating a custom DSL should be as simple as creating multiple grammars and snapping them together. The next step will be to implement a general-purpose UI that will let you use attributes to annotate your grammars, giving hints about highlighting and other IDE-related features. Keep an eye out for basic VS and NuGet integration in the near future!