Earlier this week I read a blog post by Donn Felker about a Firefox plugin he discovered. He had realized that a giant minified mess of JSON wasn’t actually human readable, and that a plugin that simply formats the text was necessary to make sense of it.
It’s at this point that I have to start questioning the value of “human readable” data formats in general. We could just as easily have created a Firefox plugin that converts a binary data format into something human readable, so why didn’t we? When I look at JSON I just can’t help but think how terribly verbose and wasteful it is. OK, I’ll concede that it’s somewhat better than XML, but it still repeats the name of every property for every object. This is ridiculous. Could we not have at least dropped the property names and provided a prototype to the deserializer to determine property names? How much bandwidth is wasted on completely unnecessary duplicate data? How much slower are parsing and deserialization with all this extra muck? Parsing binary file formats is actually not that hard, but it is becoming a lost art nonetheless.
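To make the prototype idea concrete, here is a minimal sketch in Python. The payload shape and the `inflate` helper are my own invention for illustration, not any standard scheme: field names are sent once as a prototype, rows carry only values, and the deserializer zips them back together.

```python
import json

# Conventional JSON: every record repeats every property name.
verbose = json.dumps([
    {"id": 1, "name": "Alice", "email": "alice@example.com"},
    {"id": 2, "name": "Bob", "email": "bob@example.com"},
    {"id": 3, "name": "Carol", "email": "carol@example.com"},
])

# Prototype-based alternative: send the field names once,
# then only the values, in order.
compact = json.dumps({
    "prototype": ["id", "name", "email"],
    "rows": [
        [1, "Alice", "alice@example.com"],
        [2, "Bob", "bob@example.com"],
        [3, "Carol", "carol@example.com"],
    ],
})

def inflate(payload):
    # The deserializer rebuilds full objects from the prototype.
    data = json.loads(payload)
    return [dict(zip(data["prototype"], row)) for row in data["rows"]]

print(len(verbose), len(compact))  # the compact form is already shorter at 3 rows
assert inflate(compact) == json.loads(verbose)
```

The saving grows with every additional row, since the per-record cost drops from “names plus values” to values alone.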
The first objection to my hypothesis that comes to mind is that JSON is more flexible this way: since there is no static type, it’s easier to parse and more flexibly handled by code. I would just like to say that I think this is largely a myth. The consumer of your JSON will inevitably come to count on particular fields and object structures. By convention, documentation, or inference, you will be as unable to alter properties and object structures as if you had static types. If you were to change them you would create null references and errors in consuming code as surely as if it were statically compiled.
Furthermore, parsing a text file is much slower than parsing a binary file. In my opinion the reading and writing of binary files is a much simpler problem overall, and I am officially voicing doubts about the value of human readability in non-domain-specific textual formats.
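A small Python sketch of that trade-off: the same records as JSON text versus a fixed binary layout (the record layout here is one I chose for illustration). Decoding the binary form is a fixed-stride walk over bytes, with no character-by-character scanning.

```python
import json
import struct

# Three (id, score) records as JSON text ...
records = [(1, 98.5), (2, 87.25), (3, 91.0)]
as_json = json.dumps([{"id": i, "score": s} for i, s in records]).encode()

# ... and the same records in a fixed binary layout:
# one unsigned 32-bit int plus one 64-bit float per record, no padding.
RECORD = struct.Struct("<Id")
as_binary = b"".join(RECORD.pack(i, s) for i, s in records)

# Reading back is arithmetic, not text scanning: record n lives
# at byte offset n * RECORD.size.
decoded = [RECORD.unpack_from(as_binary, n * RECORD.size)
           for n in range(len(as_binary) // RECORD.size)]

print(len(as_json), len(as_binary))  # the binary form is a fraction of the size
assert decoded == records
```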
One of my first encounters with a binary file format was the lowly bitmap (.bmp). My goal was to take two bitmaps and combine them side by side into one. I was doing this so that I could display stereoscopic images on a special stereoscopic projector rig, using images taken with my DIY stereoscopic camera set. It was a lot of fun; I learned a lot about binary files, and my fear of them waned.
The bitmap is, in my opinion, one of the simplest file formats, and its form is easy to emulate. It begins with a header of a fixed number of bytes. If you create a structure of the right size, it’s easy to map that header directly onto that structure in memory. The header contains information about the rest of the file, such as the length of the image buffer and the height and width of the image, from which you can infer the number of other structures to read from the file. If you do try this technique, I would highly recommend putting some kind of version information in your header so that you can alter the format later while still being able to read the older formats as well.
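As a sketch of the fixed-header technique, here is a Python reader for the standard BMP header layout (a 14-byte BITMAPFILEHEADER followed by a 40-byte BITMAPINFOHEADER). The `read_bmp_header` helper and the dict it returns are my own naming, but the field offsets and sizes are the standard ones. Note that BMP itself follows the versioning advice: the first field of the DIB header is its own size, which tells a reader which header version it is looking at.

```python
import struct

# BITMAPFILEHEADER: signature, file size, two reserved shorts, pixel data offset.
FILE_HEADER = struct.Struct("<2sIHHI")
# BITMAPINFOHEADER: header size, width, height, planes, bits-per-pixel,
# compression, image size, x/y resolution, colors used, important colors.
INFO_HEADER = struct.Struct("<IiiHHIIiiII")

def read_bmp_header(data: bytes):
    sig, file_size, _, _, pixel_offset = FILE_HEADER.unpack_from(data, 0)
    if sig != b"BM":
        raise ValueError("not a BMP file")
    # The first field (header size) effectively versions the DIB header.
    (hdr_size, width, height, planes, bpp,
     compression, image_size, *_rest) = INFO_HEADER.unpack_from(data, FILE_HEADER.size)
    return {"width": width, "height": height, "bpp": bpp,
            "pixel_offset": pixel_offset, "image_size": image_size}

# Build a minimal 2x2, 24-bit BMP in memory to exercise the reader.
width, height, bpp = 2, 2, 24
row_bytes = ((width * bpp + 31) // 32) * 4          # rows pad to 4-byte boundaries
image_size = row_bytes * height
pixel_offset = FILE_HEADER.size + INFO_HEADER.size  # 14 + 40 = 54
header = (FILE_HEADER.pack(b"BM", pixel_offset + image_size, 0, 0, pixel_offset)
          + INFO_HEADER.pack(40, width, height, 1, bpp, 0, image_size, 0, 0, 0, 0))
info = read_bmp_header(header + b"\x00" * image_size)
print(info)
```

From `pixel_offset` and `image_size` you know exactly where the pixel buffer starts and how many bytes to read, which is all the side-by-side combining project needed.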
In .NET there are several tools that make reading and writing objects as binary data quite easy, such as the method Marshal.PtrToStructure(…), which is useful for mapping a byte[] directly onto a structure, and BitConverter, which lets you convert bytes to and from the smaller, common value types.