New meta# nuget package published

I uploaded a new meta# NuGet package:

PS> Install-Package metasharp

It only has pattern matching api’s in this package now. To make patterns, just start with MetaSharp.Transformation.Pattern and all of the pattern factories are static methods from there. A simple example:

var p = Pattern.OneOrMore(Pattern.Range(‘a’, ‘z’));
var m = Pattern.Match(p, “metasharp”);
if (m.Matched) { … }

 

This package is .net 2.0 now too, so you can use it in more places (such as Unity3d). I’m planning on putting the actual grammars and build tasks etc into other packages so that the main package remains very portable. Also I changed it so that ‘p’ in the above scenario is a Pattern ast object instead of a Delegate. Which means that you could theoretically visit the pattern and transform it, itself. Also tracing produced objects will give you the pattern that produced it as well as the metadata.

The most major improvement of the API from before is the fact that you don’t have to implement INode on the objects you produce. Before everything had to be an INode and so you would have to do some awkward wrapping / unwrapping if your tree didn’t implement INode but now it just works with any plain old CRL object. Also it uses DynamicInvoke on productions so you can just bind them directly to any method with parameters of any type:

var p = Pattern.Production(
    Pattern.Variable(“x”, Pattern.Any()),
    (int x) => x + 1); // throws if the value consumed isn’t an int

 

I hope that makes sense. My first attempt made use of generics to attempt to flow static typing through the productions (which is definitely possible) but the insufficient type inference and support for monads in C# caused the complexity of the patterns to be unbearable. I found that if you use object and DynamicInvoke instead, you get similar results with reasonable complexity. The end result is that this api isn’t strictly monadic, since there is no Bind or Select/SelectMany. But it is monadic in spirit, all of the objects involved in the pattern matching operation are immutable and are safely parallelizable, except possibly your custom productions.

I’m pleased with the performance so far, the Or pattern executes each branch in parallel. I haven’t really stressed it that far yet but my 210 unit tests take about .45 seconds to run total so that seems pretty good. I want to create a Reflection.Emit visitor so I can get even better performance for rules but, unfortunately, that API isn’t available on all platforms (e.g. WinRT) so I’m not sure how best to do that.

The next big step is to create a new package with a C# grammar + roslyn transformer. Before it was a 1-off language. This time I plan to redo that work except as C# so that it is more appealing to users. I also want to change the grammar language to not be inheritance based, but compositional. This will require a little experimentation but I think it will make it a lot more appealing. The lack of extensibility in Roslyn is a real bummer because as far as I can tell there is no way for me to easily extend Roslyn to add pattern syntax to C# that way. The result is that I can leverage roslyn to generate assemblies from my AST but I still have to implement my own C# grammar if I want to extend it. A bummer but not harder than what I did before.

If you get a chance to try it out, I would appreciate feedback on the Pattern api though 🙂

Enjoy!

Clipboard app released to Win8 App Store

Leo and I recently released an application to the Windows 8 app store. It’s a very simple app called Clipboard! Download it at the app store here:

http://apps.microsoft.com/webpdp/app/clipboard/d12111fe-bfa8-4050-96c1-c6571e1f3230

Clipboard is OSS on codeplex here as well: https://winrtclipboard.codeplex.com/

Maybe not null

We have a problem in the code of the project I am currently working on. Well actually it’s not just our problem, it’s actually a problem with C# and .NET. And even then it’s actually a fairly pervasive problem with almost our entire industry, it is the problem of null. Or in the words of Tony Hoare the billion dollar mistake.

But I am noticing right now, more so than ever before, the problem with null in the code I am currently working on. I have heard people talk about it before and I have been somewhat aware of the issues and I have studied the Maybe monad but I have never really found null’s to be too painful on the scale that other people seem to have. So I didn’t fully understand the problem.

I have been talking about it with a coworker of mine (Eljay Love Jensen) and he put it into a new light which I think finally cemented the problem in my mind and I was able to put it into C# terms in a way that I think is easy for anyone to understand. This is definitely old news and has been thought of a million times before but I would like to try my hand at putting the problem into terms of C# and hopefully help other people who are wondering what the big deal really is with null.

In my words here is the big problem with null in C#…

  • null is commonly used to express at least three different semantics:
    • uninitialized
    • error
    • missing

But we can do something to solve this problem. The first step to solving it though is to understand why it is a problem. The reason why is that it becomes very easy to make a mistake in determining the difference between the three possible semantics of null at any point in time. It can become even harder for a caller because if the author of the member returning null isn’t very careful it can be impossible for the caller to know what the semantic meaning of the null value it is getting really is.

Here is a simple example that you have probably written before…

class Example
{
    private Foo foo;

    public Foo Foo
    {
        get 
        {
            if (foo == null)
                foo = CreateFoo();

            return foo; 
        }
    }

    public Foo CreateFoo()
    {
        return null; // error or there was actually no foo?
    }
}

When getting Foo we are using null to indicate that the value is unintialized first, but then later we are receiving null in our initialization method either because an error happened or because Foo actually doesn’t exist and null is used to represent all three states. The second time you call this property you do not have enough information to know which of those three states foo is really in.

You may have even written something like this a few times and noticed the issue where the object tries to initialize multiple times in some rare cases but usually initializes and works just fine the first time. So since it’s rare you then you dismissed it and never thought about it again. Also, notice how the caller of the Foo property has no idea why it’s getting null, it could be due to an error or because Foo doesn’t actually exist (such as a missing row in a database when looking for id=100).

This is a very common mistake frankly. Much later on you realize that you are experiencing NullReferenceExceptions due to the confusion about the semantic meaning of null in all your calling code.

This problem amplifies as your program grows in complexity until you find yourself changing code like this…

Do(a.B.C.D);

Into code like this…

if (a.B != null)
{
    B b = a.B;
    if (b.C != null)
    {
        C c = b.C;
        if (c.D != null)
        {
            Do(c.D);
        }
    }
}

This is not good. I therefore would like to propose a few coding guidelines for C#.

Henceforth…

  • null may only be used to mean error state.
  • Lazy<T> is used to mean uninitialized.
  • Maybe<T> is used to mean it is legitimate for it to not exist.

Therefore, if you ever look into your debugger and see a null reference it is a bug in your code. And it is completely unambiguous from the callers perspective that if they are getting a null return value then something went wrong. And if Lazy<T> has no value yet then it is uninitialized and if Maybe<T> is false then they know that the reference is legitimately not there.

If you’re not sure what Lazy<T> is, you can read about the details of it on MSDN page for Lazy<T>.

If you’re not sure what Maybe<T> is, it is not a class in .NET proper but there are many implementations of it on the web you can look at and learn about. For example here is a blog post by Jordan Terrell about his work on Maybe<T> for .NET.

But basically the maybe monad is one of the simplest examples of a monad and is fun to learn. It is essentially just an object that has a both a boolean and a value. If the boolean is false then the value is non-existent and if it is true then the value is there and may be used. It gets to be a brain bender learning about the lambda calculus of monads but in reality it’s not that complicated of an idea in its simplest form, don’t worry about it too much.

Therefore in the above “Do(a.B.C.D)” code, if those properties are not Maybe<T> then you know it is safe to assume that those properties will not ever be null. It might throw an exception in the case of an error but not a NullReferenceException. And if they are Maybe<T> you can use linq (which is actually a syntax for the monadic bind operator) to resolve them in a more elegant way than those icky nested if blocks, like so (depending on how you implement Maybe<T>):

var d = from b in a.B
        from c in b.C
        select c.D;

if (d.HasValue)
{
    Do(d.Value);
}

If any of the values in the chain (a, b, c or d) are HasValue==false, then evaluation will stop and d.HasValue will be false. It will never throw a null reference exception if one of the values are legitimately missing.

I think this is a fairly easy to follow convention and can be used to improve the problem of falling into the pit trap of the billion dollar mistake.

Immutable AVL Tree in C#

I recently decided that I needed an immutable dictionary but after some experimentation decided that wasn’t as easy as it sounded. After doing some searching online I came to the conclusion that the most reasonable alternative was to implement an immutable AVL Tree where the nodes were key/value pairs.

The AVL Tree has O(log N) lookup and insertion times, which means at most it will look at nodes equal to the depth of the tree and the tree is strictly balanced at all times. Surprisingly, I had a hard time finding a really clear example online in a language. So I would like to post my solution here since I feel like it’s reasonably clean and terse (insertion only so far).

The trick to understanding AVL trees is to understand that the tree is to always remain balanced. Which is to say that the height of any Left branch of a node should at most be no more than 1 higher than the Height of its Right branch. If you find that the height difference is 2 or –2 then you have an imbalanced tree. This is expressed in the Add method after insertion like so:

var lh = n.Left == null ? 0 : n.Left.Height;
var rh = n.Right == null ? 0 : n.Right.Height;
var b = lh - rh;

if (Math.Abs(b) == 2) // 2 or -2 means unbalanced
{
  // rebalance
}

If the value is 2 then your Left branch is unbalanced if the value is –2 then the Right branch is unbalanced.

if (Math.Abs(b) == 2) // 2 or -2 means unbalanced
{
    if (b == 2) // L
    {
        // balance Left branch
    }
    else // R
    {
        // balance Right branch
    }
}

The next thing to do is to check the balance of the unbalanced branch, who’s balance should be either 1 or –1, like in the case of an unbalanced Left branch below:

var llh = n.Left.Left == null ? 0 : n.Left.Left.Height;
var lrh = n.Left.Right == null ? 0 : n.Left.Right.Height;
var lb = llh - lrh;
if (lb == 1) // LL
{
    // rotate right
}
else // LR
{
    // rotate left // rotate right
}

So given this there are actually four cases we need to cover for any tree:

L R
L LL LR
R RL RR

And the solution to rebalance the tree is to “rotate” the nodes, in the cases of LL and RR you only need to rotate the unbalanced Right or Left branch but in the case of LR or RL you need to rotate twice, first the branch then the node itself. Here is the matching table for the rotations:

L R
L rotate right rotate leftrotate right
R rotate left rotate rightrotate left

It took me a while to understand it but the Wikipedia article actually has a pretty good diagram showing these rotations.

Here is the Left and Right rotation algorithms in C# (complete with awesome ascii art comments!):

private static Node<TKey, TValue> RotateRight(Node<TKey, TValue> node)
{ 
    //       (5)            4     
    //       / \           / \    
    //      4   D         /   \   
    //     / \           3     5  
    //    3   C    -->  / \   / \ 
    //   / \           A   B C   D
    //  A   B                     
    
    var L = node.Left.Left;
    var R = new Node<TKey, TValue>(node.Key, node.Value, node.Left.Right, node.Right);
    var N = new Node<TKey, TValue>(node.Left.Key, node.Left.Value, L, R);
    return N;
}

private static Node<TKey, TValue> RotateLeft(Node<TKey, TValue> node)
{ 
    //    (3)               4     
    //    / \              / \    
    //   A   4            /   \   
    //      / \          3     5  
    //     B   5   -->  / \   / \ 
    //        / \      A   B C   D
    //       C   D                      
    
    var L = new Node<TKey, TValue>(node.Key, node.Value, node.Left, node.Right.Left);
    var R = node.Right.Right;
    var N = new Node<TKey, TValue>(node.Right.Key, node.Right.Value, L, R);
    return N;
}

Notice that we’re always creating new nodes rather than mutating the existing nodes. This is crucial to maintain immutability. These trees can now be safely used across threads since their states are never mutated.

Download the full sample: Map.zip

Very Handy Addition to Linq

I have found this custom extension method to be extremely useful when working with linq:

public static IEnumerable<T> Cast<T>(this object value)
{
    var enumerableOfTValue = value as IEnumerable<T>;
    if (enumerableOfTValue != null)
        return enumerableOfTValue;

    var enumerableValue = value as IEnumerable;
    if (enumerableValue != null)
        return Enumerable.Cast<T>(enumerableValue);

    return Enumerable.Cast<T>(new[] { value });
}

Basically it lets you turn any object into an IEnumerable<T>. Its especially handy when working with an object and you don’t know if it’s a single object or a set.