justinmchase

Bing AI can write code in my custom programming language and its blowing my mind

I’ve been working on a programming language project for quite a while. Its been challenging and fun as my on-again-off-again side project for probably 10+ years.

Needless to say this programming language isn’t known by many people and it is fairly original while borrowing from a few other similarly obscure language projects.

The language is a parser generator language, a programming language programming language if you will. Its intent is to make it easier for people to create DSLs and the accompanying tools. You can check it out at https://github.com/justinmchase/uffda

So I figured it would be an interesting opportunity to see how good the new Big AI actually is by asking it to do something cognitively difficult for a human and which there is little to no literature available online for it to have been heavily trained in prior to my request.

So I asked it to “write a program in the Uffda language that parses Morse Code?”

It gave me a little guff saying that Uffda is actually a scandanavian expression, which is closer to what I expected it to do. But then it went on to say that its actually a TypeScript library for a parser generator and offered “If you want, I can write a program in TypeScript that uses the Uffda library to define a parser for Morse Code.😊”

It then gave me the option to press a button to “Yes, please write the program.”

And so I did… And the results kind of astonished me. It actually wrote what appears to be a viable parser and beyond that it took advantage of a few concepts which really suprised me:

The TypeScript code was correct (unsprisingly)
The Uffda code inside the string inside TypeScript is correct (a language inside a language!)
Its using Direct Left Recursion which is a pretty complex concept
It added completely accurate and relevant code comments
It actually included an example of running the code, as well as the expected output in a code comment, which implies that it simulated running the program of a language… even if it isn’t actually capable of doing that giving the impression it did is still pretty amazing.

The Code

// Import the Uffda library
import { dsl, uffda } from "https://deno.land/x/uffda/mod.ts";

// Define a parser for Morse Code using Uffda syntax
export const Morse = await uffda()`
  // A dot is represented by a single dot character
  Dot = ({ kind = 'Token', value = '.' } -> '.');
  
  // A dash is represented by a single dash character
  Dash = ({ kind = 'Token', value = '-' } -> '-');
  
  // A letter is composed of one or more dots and dashes
  Letter 
    = (l:Letter Dot r:Letter -> l + '.' + r) // Concatenate dots and dashes
    | (l:Letter Dash r:Letter -> l + '-' + r)
    | Dot // A single dot is also a letter
    | Dash // A single dash is also a letter
    ;
  
  // A word is composed of one or more letters separated by spaces
  Word 
    = (l:Word { kind = 'Token', value = ' ' } r:Letter -> l + ' ' + r) // Concatenate letters with spaces
    | Letter // A single letter is also a word
    ;
  
  // A sentence is composed of one or more words separated by slashes
  Sentence 
    = (l:Sentence { kind = 'Token', value = '/' } r:Word -> l + '/' + r) // Concatenate words with slashes
    | Word // A single word is also a sentence
    ;

  Main 
    > Sentence; // The main parser expects a sentence as input

`;

// Create a DSL function that parses Morse Code using the defined parser
export const morse = dsl(import.meta.url, Morse);

// Parse some Morse Code and print the result
const { value } = morse`.... . .-.. .-.. --- / .-- --- .-. .-.. -..`;
console.log(value); // Prints ".... . .-.. .-.. --- / .-- --- .-. .-.. -."

Now as I’m expanding this I can see it actually got the code a little bit wrong. The last line of the parser is the Main rule which should actually be…

Main = ${Basic} > Sentence;

So I told it that it made a mistake and forgot to import the Basic pattern and pipe it into the Sentence pattern. I asked it to correct and reprint the program…

It emitted the same program again but this time it actually included the missing Basic pattern import.

However it did completely omit the Main rule as well. So I gave it another hint and asked it to try again.

And so this time it printed it out but still not fully correct (> instead of =).

Also, somewhat astonishing here is that the code comment is saying “The main parser expects a sentence as input after applying the Basic pattern”, which is correct despite me calling it a “rule” in the previous comment (A rule is a kind of a pattern in this language).

Moving on I asked it to print out each letter as Ascii instead of printing out the input dot’s and dashes. Here it made a pretty big mistake by adding a switch statement which isn’t even valid syntax in uffda.

So I then gave it a hint about pattern matching and the correct syntax and it had pretty amazing results.

It sort of broke down around the Letter O and stopped printing spaces between the patterns, the Z pattern completely stopped using Dot and Dash patterns and just printed out literal dots and dashes. But honestly I’m still pretty impressed with the results.

Asking it to fix the O through Z patterns by adding spaces seems to not be working. Asking it to use Dot and Dash pattern references instead of verbatim . and – characters seems to have exacerbated the problem. So at this point I think we hit the edge of its capabilities.

That being said I’m highly impressed. For it to be able to write a language within a language is pretty surprising, especially for a language that it couldn’t possibly have had much training data on.

Being able to talk to it naturally and have it build context is a really nice way to work with a search engine. Being able to simply correct it and have it seem to grow its understanding of the programming lanaguage as we went felt really natural. I honestly have given many code reviews that felt pretty similar to this conversation I had with the AI and that went less well too.

I do feel like I hit a limit to its capabilities but its very impressive nonetheless. I hope someday I will be able to essentially give the AI more and more constraints in a natural way and have it fully write the code needed to make it work.

Hybrid Microservice Architecture

In the article What are Microservices, Amazon Web Services does a great job of defining Microservices.

Microservices are an architectural and organizational approach to software development where software is composed of small independent services that communicate over well-defined APIs. These services are owned by small, self-contained teams.

Additionally, they do an excellent job of outlining the many strengths of Microservices as contrasted to Monolithic applications.

With a microservices architecture, an application is built as independent components that run each application process as a service. These services communicate via a well-defined interface using lightweight APIs.

They make many excellent points and do a good job of highlighting true strengths of microservice architectures.

However, what AWS doesn’t mention in this article are the downsides. If you were to read only this article you would be led to believe that there are no downsides and that all is rainbows and unicorns in the land of microservices, and you’d be a fool to do anything else.

Yet, when you actually attempt to develop a system as microservices there are some clear downsides which are hard to ignore and its common to find yourself wishing someone would have explained these downsides along with the good.

So here they are, I will outline a couple of the biggest problems with microservices and then I’d like to propose a way to help reduce the effects of these downsides with what I call a Hybrid Microservices Architecture.

Notable Microservice Downsides

Developer context switching
Maintenance costs
Communication costs
Duplication

Developer Context Switching

Developers frequently need to switch between areas of code in the same application. The feature they may be working on may span multiple, or all, microservices in an application. They may have a bug which is blocking them but appears in a different teams microservice. They may have tight coupling between several microservices, and they need them running in tandem in their development environments.

In all of these cases a developer may need to clone, understand, build and run the code for multiple microservices simultaneously. That set of microservices tends to grow over time as well. Most of the time this is a pretty reasonable thing to do but it’s hard to deny that for each microservice you need to work with in this way more and more cognitive complexity is loaded onto the developer. Eventually there is a tipping point where it can feel “too hard” and the time it takes to get the code to a point where you can work on it feels very slow.

Maintenance Costs

Each Microservice has a cost. Both in actual server and network costs but also in terms of code maintenance. For example, you may have multiple microservices all with a reference to the same library which needs to be updated. That work has to be duplicated across multiple repos, multiple teams.

Each microservice may have its own deployment pipeline, or its own testing infrastructure, its own linting rules, its own repo configuration, its own secrets to rotate.

Even if you utilize shared libraries or tools, when you make a feature improvement for the tool you have to roll out the new version to every independent repository. The patterns and the code are duplicated across multiple repositories, increasing the effort needed to make improvements.

Communication Costs

Because microservices are independent processes, they need to communicate with each other through their respective APIs. This communication requires careful specification of inputs and outputs as well as versioning and backward compatible support for other services which may lag behind on their own updates.

In a monolithic application this communication may simple be between two functions in the same code and a simple refactor is all that’s needed. But microservices cannot be monolithically refactored, they need to carefully coordinate their breaking changes and use additive techniques and deprecation schedules.

Additionally, communicating over a network effectively may require special communication patterns such as Queues or Streams to manage throughput and reliability. And further, special considerations may need to be put into place to prevent circular messages. This has a monetary and cognitive complexity cost.

Duplication

Microservices are typically isolated; they have their own database and services such as Queues, Streams and File Storage will likely be considered an internal implementation detail of the service. That infrastructure will need to be duplicated for each service.

Many times, a service will rely on the data owned by another API, but because those services are isolated, they must not access each other’s internal infrastructure directly. So, they go through the API of the other service to fetch or stream data and duplicate it into their own system. Both the storage and the code needed to do this duplication of data has a non-trivial cost.

Hybrid Microservices

Despite all of these downsides the decision tree to choose a microservice architecture vs. a monolithic architecture is still usually pretty clear; yes, you should do microservices, it is worth the cost.

However, I would like to share an additional approach which I am calling a Hybrid Microservice Architecture which I believe can help reduce the costs of developing microservices.

The primary idea of a Hybrid Microservice is to strike a balance between the costs of a microservice and the strengths of a monolith. The strength of the monolith, despite all its problems, is that the code is consolidated together in a single repository.

The second idea of the hybrid architecture is to define the boundary of a microservice not to be based on features or data but rather teams. Each team will consolidate all of the code of their microservices into a single repository.

Yet even though we’re consolidating our code into single repositories per team, I am not suggesting we should simply have multiple smaller monoliths.

Rather, the hybrid approach should adopt a “Mode” pattern which will allow this single repository to run in multiple modes. This will allow us to structure the code into multiple runtime components, which can continue to leverage the strengths of microservices and scale independently based on features and usage.

Mode Pattern

The mode pattern is simple, rather than having a single entry point, the service declares all possible modes that it can run as, and command line arguments are used to cause the process to run in a single mode. Each process can only run in a single mode.

Example modes:

Web Server
Cron Job
Function or Lambda
Stream Handler

The advantage of this is that the business logic of the application is shared in the same code base for all modes of the application. The models, the repositories the utilities, etc. You don’t need to an extra repository to create the shared code between them and increase the maintenance and cognitive costs of publishing a library and referencing it across multiple other repositories.

Additionally, since the code is identical between all processes running in the same microservice (just in different modes) you can safely communicate with shared data storage, such as the database. Normally there is a tension between two applications calling the same database directly because their code may be different and altering the schema of data that another application depends on can cause issues with the other applications. When this happens your database becomes an API in and of itself and can become extremely difficult to change safely.

Additionally, facilitating network-based communication between two microservices of the same team can be expensive, feel unnecessary and be slow. Requiring a Cron Job to call into web APIs to simply to get the data it needs to do some work can be very prohibitive, especially when it needs to crunch large amounts of data.

Therefore, the hybrid solution posits this hypothesis:

It’s safe for multiple microservices to access the same database directly provided they have identical code at all times.

And here is the summary in bullet point format:

Organize into as few of teams as possible
Each team has a single repository for all of their code
Each team repository implements all required Modes
Each microservice owned by the same team runs the same code
Each microservice runs a single mode
Each microservice running the same code is safe to access the same internal services

Grove

I have made a public, Open Source Hybrid Microservice framework called Grove for use in code or as a proof of concept.

Tao of Leo #36

Any requirement that says how a feature “should” behave is not a requirement. It is merely a polite request.

I contributed to Deno

I’ve been developing with Node.js for a while now and I have been enjoying it quite a bit, despite some of its flaws. Overall its been a great experience and I am a big node.js fan.

Much to my surprise however a new project called Deno has emerged. After taking a look at it I realized it is essentially a spiritual successor to node.js but it’s also better than node in pretty much every single way… Well, every way except for its ubiquity! There are tons of high quality open source modules for node which just don’t quite work with Deno out of the box. Deno took a hard line stance on its adoption of ESM modules, which is actually better than common.js, and enables a variety of other features such as not needing npm at all anymore… Its just that ESM is not very wide spread and is only backwards compatible with commonjs modules with some hacks that only work about 75% of the time it seems.

Deno also has a few areas which are still underdevelopment related to certs, TLS and websockets. But fortunately the project has a very active and responsive team of developers! I noticed an issue I was having related to connecting to an internal site due to my CA certificates not being loaded and took the time to debug it. Eventually I figured out that the propery CA cert was stored in my systems keystore and Deno couldn’t find it there. So I managed to find a simple rust crate which supported loading certificates right out of the keystores for each major platform and figured out it was pretty trivial to integrate it in with the crates Deno was already using to do TLS! The Deno developers worked with me to craft the proper changes and do some necessary refactoring and testing, and now I am a Deno contributor.

Here’s my commit:

https://github.com/denoland/deno/commit/02c74fb70970fcadb7d1e6dab857eeb2cea20e09

https://github.com/denoland/deno_std/commit/396445052d25b206e0adb00826c7365783fa578a

Tao of Leo Proven Right, Once Again

I was just reading about UTC and saw this tidbit of history

The official abbreviation for Coordinated Universal Time is UTC. It came about as a compromise between English and French speakers.
Coordinated Universal Time in English would normally be abbreviated CUT.
Temps Universel Coordonné in French would normally be abbreviated TUC.
UTC does not favor any particular language.

Therefore the abbreviation UTC was selected because everyone hated it equally 🤣

Tao of Leo #33

A committee that makes fair decisions will not choose the best solution, but the one everyone hates equally.

More