Uncategorized – justinmchase

I contributed to Deno

I’ve been developing with Node.js for a while now and have enjoyed it quite a bit, despite some of its flaws. Overall it’s been a great experience and I am a big Node.js fan.

Much to my surprise, however, a new project called Deno has emerged. After taking a look at it I realized it is essentially a spiritual successor to Node.js, and it’s also better than Node in pretty much every single way… Well, every way except for its ubiquity! There are tons of high quality open source modules for Node which just don’t quite work with Deno out of the box. Deno took a hard-line stance on adopting ES modules (ESM), which are actually better than CommonJS and enable a variety of other features, such as not needing npm at all anymore… It’s just that ESM is not very widespread yet, and it is only backwards compatible with CommonJS modules through some hacks that seem to work only about 75% of the time.

Deno also has a few areas which are still under development, related to certs, TLS and websockets. But fortunately the project has a very active and responsive team of developers! I noticed an issue I was having connecting to an internal site due to my CA certificates not being loaded, and took the time to debug it. Eventually I figured out that the proper CA cert was stored in my system’s keystore and Deno couldn’t find it there. So I managed to find a simple Rust crate which supported loading certificates right out of the keystores for each major platform, and figured out it was pretty trivial to integrate it with the crates Deno was already using to do TLS! The Deno developers worked with me to craft the proper changes and do some necessary refactoring and testing, and now I am a Deno contributor.

Here are my commits:

https://github.com/denoland/deno/commit/02c74fb70970fcadb7d1e6dab857eeb2cea20e09

https://github.com/denoland/deno_std/commit/396445052d25b206e0adb00826c7365783fa578a

Tao of Leo Proven Right, Once Again

I was just reading about UTC and saw this tidbit of history:

The official abbreviation for Coordinated Universal Time is UTC. It came about as a compromise between English and French speakers.
Coordinated Universal Time in English would normally be abbreviated CUT.
Temps Universel Coordonné in French would normally be abbreviated TUC.
UTC does not favor any particular language.

Therefore the abbreviation UTC was selected because everyone hated it equally 🤣

Tao of Leo #33

A committee that makes fair decisions will not choose the best solution, but the one everyone hates equally.

Try / Finally with AWS Step Functions

AWS Step Functions has some built-in features for catching and handling errors but, surprisingly, it doesn’t have semantics for the usually accompanying “finally” concept.

In my scenario I am creating an ephemeral Kinesis stream in my State Machine, which I then stream a large number of records into while executing one lambda. I then process those records more slowly in a series of subsequent lambda functions. Once completed, I delete the ephemeral Kinesis stream.

The problem with this approach is that an unexpected error anywhere in one of my steps can cause the whole Step Function to fail and end up orphaning the Kinesis stream. Therefore I needed a way to reduce the likelihood of this problem with a try/finally pattern.

To accomplish this, first imagine we have this step function:

[code]
StartAt: ConfigureIterator
States:
  ConfigureIterator:
    Type: Pass
    Result:
      limit: 500
    ResultPath: $.iterator
    Next: InitializeIterator
  InitializeIterator:
    Type: Task
    Resource: iterator
    InputPath: $.iterator
    ResultPath: $.iterator
    Next: ConfigureXmlStream
  ConfigureXmlStream:
    Type: Pass
    Result:
      gz: true
      root: item
    ResultPath: $.options
    Next: XmlStream
  XmlStream:
    Type: Task
    Resource: xmlstream
    ResultPath: $.xml
    Next: SendItemsToApi
  SendItemsToApi:
    Type: Task
    Resource: items2api
    ResultPath: $.iterator
    Next: IterateNext
  IterateNext:
    Type: Choice
    Choices:
      - Variable: $.iterator.state
        StringEquals: done
        Next: Cleanup
    Default: SendItemsToApi
  Cleanup:
    Type: Pass
    Result: done
    ResultPath: $.iterator.state
    Next: IteratorDone
  IteratorDone:
    Type: Task
    Resource: iterator
    InputPath: $.iterator
    ResultPath: $.iterator
    Next: Done
  Done:
    Type: Pass
    End: true
[/code]

In the InitializeIterator step we are creating our ephemeral Kinesis stream. In the XmlStream step we are streaming items from a large XML document into JSON objects which are then written to the stream. Next, in the SendItemsToApi step we are reading items out of the Kinesis stream, doing some formatting and validation on those items, and then sending each item to a REST endpoint for storage and/or other actions. Finally, in the IteratorDone step we are destroying the Kinesis stream.

You could imagine a variety of other possible scenarios where one would need to clean up resources allocated in a previous step. In this particular scenario we need to ensure that the IteratorDone step is called regardless of errors that may happen between it and the InitializeIterator step.

To do this we will first wrap the XmlStream, SendItemsToApi and IterateNext steps in a Parallel block with a single branch. The reason we want to do this is so that these steps can be treated like a single block where any errors in any state can be caught and handled in a single Catch clause.

The three steps wrapped in a Parallel block now look like this:
[code]
Main:
  Type: Parallel
  Branches:
    - StartAt: XmlStream
      States:
        XmlStream:
          Type: Task
          Resource: xmlstream
          ResultPath: $.xml
          Next: SendItemsToApi
        SendItemsToApi:
          Type: Task
          Resource: items2api
          ResultPath: $.iterator
          Next: IterateNext
        IterateNext:
          Type: Choice
          Choices:
            - Variable: $.iterator.state
              StringEquals: done
              # A branch cannot transition to a state outside itself, so end the
              # branch here (BranchDone is a placeholder name) and let Main move on.
              Next: BranchDone
          Default: SendItemsToApi
        BranchDone:
          Type: Pass
          End: true
  Next: Cleanup
  ResultPath: $.main
  Retry:
    - ErrorEquals: [ 'States.ALL' ]
      MaxAttempts: 3
  Catch:
    - ErrorEquals: [ 'States.ALL' ]
      ResultPath: $.error
      Next: Cleanup
[/code]

It’s important to note here that the result of the block is an array of results where each index in the array is the result object from the last step of each branch. So in this case we will have an array with a single object in it [ { iterator: ... } ]. If you don’t specify a ResultPath it will replace the entire context object $, which is undesirable in this case since we need to still access the iterator object in a later step.

It’s also important to note that we are storing the caught exception into the $.error field, which we will rethrow later, after cleanup.
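To make the effect of ResultPath a bit more concrete, here is a rough JavaScript model of how a result gets merged into the context. Note that applyResultPath is a hypothetical helper written for this illustration, not part of any AWS SDK, and it only handles "$" and simple dotted paths such as $.main or $.error. The stream name and error payload are made-up sample data.

```javascript
// Hypothetical model of ResultPath: merge `result` into `input` at `path`.
// Only supports "$" and simple dotted paths like "$.main"; illustration only.
function applyResultPath (input, path, result) {
  if (path === '$') return result // default path: the result replaces everything
  const keys = path.replace(/^\$\./, '').split('.')
  const output = JSON.parse(JSON.stringify(input)) // deep copy of plain JSON data
  let target = output
  for (const key of keys.slice(0, -1)) {
    if (typeof target[key] !== 'object' || target[key] === null) target[key] = {}
    target = target[key]
  }
  target[keys[keys.length - 1]] = result
  return output
}

// Made-up sample context and branch output.
const context = { iterator: { stream: 'example-stream', state: 'running' } }
const branchResults = [{ iterator: { stream: 'example-stream', state: 'done' } }]

// With ResultPath: $.main the original iterator survives next to the results.
const kept = applyResultPath(context, '$.main', branchResults)
// With the default path ($) the context becomes the array of branch results.
const replaced = applyResultPath(context, '$', branchResults)
// A caught error is stored the same way, under $.error.
const failed = applyResultPath(context, '$.error', { Error: 'ExampleError', Cause: '{}' })

console.log(Object.keys(kept), Array.isArray(replaced), Object.keys(failed))
```

In the first and third cases $.iterator is still available to the Cleanup and IteratorDone steps; in the second it is gone.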

[code]
Cleanup:
  Type: Pass
  Result: done
  ResultPath: $.iterator.state
  Next: IteratorDone

IteratorDone:
  Type: Task
  Resource: iterator
  InputPath: $.iterator
  ResultPath: $.iterator
  Next: Finally

Finally:
  Type: Task
  Resource: throwOnError
  Next: Done

Done:
  Type: Pass
  End: true
[/code]

So now if an error occurs while processing our XML file or sending items to the API, the Main block will be retried up to three times and then ultimately capture the error and move to the Cleanup phase. We’ve added a new Finally step, which throws an exception if there is a value stored in $.error. This allows the Step Function to complete in an error state rather than a success state, so we can further trigger alarms through CloudWatch.
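If it helps to see the control flow in ordinary code, the state machine behaves roughly like the synchronous JavaScript sketch below. This is only an analogy: runWithFinally and its arguments are hypothetical stand-ins for the real states, and it assumes MaxAttempts counts retries after the first attempt.

```javascript
// Plain-JavaScript analogy of the state machine: Retry, Catch, cleanup, rethrow.
// All names here are hypothetical stand-ins for the real states.
function runWithFinally (main, cleanup, maxRetries = 3) {
  const context = {}
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      context.main = main(attempt) // the Main (Parallel) block
      delete context.error
      break
    } catch (err) {
      context.error = err // Catch stores the error instead of failing right away
    }
  }
  cleanup(context) // Cleanup + IteratorDone always run
  if (context.error) throw context.error // the Finally step rethrows
  return context
}

// Usage: a main block that always fails still gets cleaned up before the rethrow.
const calls = []
try {
  runWithFinally(
    () => { calls.push('main'); throw new Error('boom') },
    () => calls.push('cleanup')
  )
} catch (err) {
  calls.push(`rethrown: ${err.message}`)
}
console.log(calls)
```

The important part is the ordering: cleanup always runs, and the stored error is only rethrown afterwards.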

Here is the code for the throwOnError lambda:

[code language="javascript"]
import { log, parse, handler } from 'mya-input-shared'

// Rebuilds an Error from the message, type and stack frames reported by a failed lambda.
function RehydratedError (message, name, stack) {
  const tmp = Error.apply(this, arguments)
  this.name = tmp.name = name
  this.message = tmp.message = message
  Object.defineProperty(this, 'stack', {
    get: () => [`${this.name}: ${this.message}`].concat(stack).join('\n at ')
  })
  return this
}

RehydratedError.prototype = Object.create(Error.prototype, {
  constructor: {
    value: RehydratedError,
    writable: true,
    configurable: true
  }
})

// Passes the event through untouched unless the Catch clause stored an error in $.error.
export const throwOnError = handler((event, context, callback) => {
  const { feed, error } = event
  if (error) {
    // error.Cause is a JSON string describing the original failure.
    const Cause = error.Cause || '{}'
    parse(Cause, (err, cause) => {
      if (err) return callback(err)
      const { errorMessage, errorType, stackTrace } = cause
      err = new RehydratedError(
        errorMessage || 'An unknown error occurred.',
        errorType || 'UnknownError',
        stackTrace || '')
      log.error('feed_error', err, { feed }, callback)
    })
  } else {
    callback(null, event)
  }
})
[/code]
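For anyone who wants to try the rehydration idea without the shared helper module, here is a minimal self-contained sketch. The sample payload is made up for illustration; it only assumes the shape the lambda above already relies on, where error.Cause is a JSON string containing errorMessage, errorType and stackTrace. The rehydrate function and the stack formatting are my own simplifications, not the code used in production.

```javascript
// Illustrative only: rebuild an Error from the payload stored in $.error.
class RehydratedError extends Error {
  constructor (message, name, stackTrace) {
    super(message)
    this.name = name
    // Recreate a readable stack from the frames reported by the failed lambda.
    this.stack = [`${name}: ${message}`].concat(stackTrace).join('\n    at ')
  }
}

function rehydrate (error) {
  const cause = JSON.parse(error.Cause || '{}')
  return new RehydratedError(
    cause.errorMessage || 'An unknown error occurred.',
    cause.errorType || 'UnknownError',
    cause.stackTrace || []
  )
}

// Hypothetical payload, shaped like the fields the handler above reads.
const sample = {
  Error: 'TypeError',
  Cause: JSON.stringify({
    errorMessage: 'item.id is undefined',
    errorType: 'TypeError',
    stackTrace: ['format (/var/task/index.js:10:5)']
  })
}

const err = rehydrate(sample)
console.log(err.stack)
```

Throwing err from the last step is what would put the execution into a failed state with the original error type and message.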

Iterating with AWS Step Functions

One interesting challenge I immediately encountered when attempting to work with AWS Lambda and Step Functions was the need to process large files. Lambda functions have a couple of limitations, namely memory and a 5 minute timeout (at the time of writing). If you have some operation you need to perform on a very large dataset, it may not be possible to complete it in a single execution of a Lambda function. There are several ways to solve this problem; in this article I would like to demonstrate how to create an iterator pattern in an AWS Step Function as a way to loop over a large set of data and process it in smaller parts.


In order to iterate we have created an Iterator Task which is a custom Lambda function. It accepts three values as inputs in order to operate: index, step and count.

Here is the code for this example step function:

[code language="javascript"]
{
  "Comment": "Iterator Example",
  "StartAt": "ConfigureCount",
  "States": {
    "ConfigureCount": {
      "Type": "Pass",
      "Result": 10,
      "ResultPath": "$.count",
      "Next": "ConfigureIterator"
    },
    "ConfigureIterator": {
      "Type": "Pass",
      "Result": {
        "index": -1,
        "step": 1
      },
      "ResultPath": "$.iterator",
      "Next": "Iterator"
    },
    "Iterator": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:{region}:{accountId}:function:iterator",
      "ResultPath": "$.iterator",
      "Next": "IterateRecords"
    },
    "IterateRecords": {
      "Type": "Choice",
      "Choices": [
        {
          "Variable": "$.iterator.continue",
          "BooleanEquals": true,
          "Next": "ExampleWork"
        }
      ],
      "Default": "Done"
    },
    "ExampleWork": {
      "Type": "Pass",
      "Result": {
        "success": true
      },
      "ResultPath": "$.result",
      "Next": "Iterator"
    },
    "Done": {
      "Type": "Pass",
      "End": true
    }
  }
}
[/code]

ConfigureCount

In this step we need to configure the number of times we want to iterate. In this case I have set the number of iterations to 10 and put it into a variable called $.count. In a more complete example this may be the number of files you want to iterate over. For example, in my real world scenario I am receiving a substantial CSV file which is then broken into many smaller CSV files, all stored in S3; the number of smaller files is then set into the count variable here. The large CSV file can be read entirely in a single lambda execution, streaming sections into smaller files without ever loading the entire file into memory at once, but it cannot be processed entirely in a single function. Thus we split it and then iterate over the smaller parts.
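As a rough idea of what the splitting could look like, here is a small sketch that lazily groups lines into fixed-size chunks. The chunkLines helper and the sample rows are hypothetical, and it ignores details such as CSV headers or quoted newlines; in the real workflow each chunk would be written to its own S3 object rather than counted.

```javascript
// Hypothetical helper: lazily group lines into fixed-size chunks, so only one
// chunk is held in memory at a time. Works with any iterable of lines.
function * chunkLines (lines, linesPerChunk) {
  let chunk = []
  for (const line of lines) {
    chunk.push(line)
    if (chunk.length === linesPerChunk) {
      yield chunk
      chunk = []
    }
  }
  if (chunk.length > 0) yield chunk // final, possibly shorter, chunk
}

// Made-up sample rows; here we only count the chunks to get the value
// that would be stored in $.count.
const rows = ['a,1', 'b,2', 'c,3', 'd,4', 'e,5']
let count = 0
for (const chunk of chunkLines(rows, 2)) {
  count += 1 // a real implementation would upload `chunk` here
}
console.log({ count })
```

Because the generator accepts any iterable, the same function could be fed from a line-by-line stream reader instead of an array.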

ConfigureIterator

Here we set the index and step variables into the $.iterator field, which the iterator lambda uses to determine whether or not it should continue iterating.

Iterator

This is the iterator itself, a small lambda function that simply increments the current index by the step size and calculates the continue field based on the current index and count.

[code language="javascript"]
export function iterator (event, context, callback) {
  let index = event.iterator.index
  let step = event.iterator.step
  let count = event.count

  index += step

  callback(null, {
    index,
    step,
    count,
    continue: index < count
  })
}
[/code]

The reason why we want to support a step size is because we may have multiple workers which operate on data in parallel. In this example we have a single worker but in other cases we may need more in order to complete the overall work in a timely fashion.
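Here is one way the step size could map onto parallel workers. This is a sketch under my own assumptions: each iteration covers the units index through index + step - 1 (capped at count), and the starting index is set to -step so the first iteration lands on 0. The function names are illustrative only.

```javascript
// Assumption: with `step` workers, the iteration at `index` covers the units
// index .. index + step - 1, capped at `count`. Names are illustrative.
function unitsForIteration ({ index, step, count }) {
  const units = []
  for (let i = index; i < Math.min(index + step, count); i++) units.push(i)
  return units
}

// Same logic as the iterator lambda above, without the callback plumbing.
function next ({ index, step }, count) {
  index += step
  return { index, step, count, continue: index < count }
}

// Start at -step so the first iteration lands on index 0.
let iteratorState = next({ index: -3, step: 3 }, 8)
const batches = []
while (iteratorState.continue) {
  batches.push(unitsForIteration(iteratorState))
  iteratorState = next(iteratorState, iteratorState.count)
}
console.log(batches)
```

Each inner array is the set of units that the workers for one pass through the loop would pick up.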

IterateRecords

From there we need to immediately move into a Choice state. This state simply looks at the $.iterator.continue field and if it is not true then our iteration is over and we exit the loop. If iteration is not over then we move to the worker tasks which may use the $.iterator.index field to determine which unit of work it should operate on.

ExampleWork

In this example this is just a Pass state, but in a real example this may represent a series of Tasks or Activities which process the data for this iteration. When completed, the last step in the series should point back to the Iterator state.

It’s also important to note that all states in this chain must use the ResultPath field to bucket their results in order to preserve the state of the iterator field throughout these states. Do not override the $.iterator or $.count fields while doing work or you may end up in an infinite loop or error condition.
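To illustrate why this matters, the loop can be simulated locally. The sketch below is a simplified model, not real Step Functions behaviour: simulate and iterate are hypothetical helpers, and maxTransitions is a safety guard that exists only in this sketch.

```javascript
// Local simulation of the loop: Iterator -> IterateRecords -> worker -> Iterator.
// `maxTransitions` is a safety guard for this sketch, not a Step Functions feature.
function iterate (state) {
  const index = state.iterator.index + state.iterator.step
  return {
    ...state,
    iterator: { index, step: state.iterator.step, continue: index < state.count }
  }
}

function simulate (worker, maxTransitions = 50) {
  let state = { count: 3, iterator: { index: -1, step: 1 } }
  const visited = []
  for (let i = 0; i < maxTransitions; i++) {
    state = iterate(state)
    if (!state.iterator.continue) return { finished: true, visited }
    visited.push(state.iterator.index)
    state = worker(state)
  }
  return { finished: false, visited } // guard tripped: the loop never ended
}

// Well-behaved worker: buckets its output under $.result.
const good = simulate(state => ({ ...state, result: { success: true } }))
// Misbehaving worker: resets $.iterator on every pass, so the index never advances.
const bad = simulate(state => ({ ...state, iterator: { index: -1, step: 1 } }))

console.log(good, bad.finished)
```

The well-behaved worker visits each index once and exits, while the one that overwrites $.iterator keeps processing index 0 until the guard stops it.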

Done

This state simply signifies the end of the step function.

New Job at Evolve

I’m happy to officially announce that I have accepted a job at a local start-up here in Minneapolis called Evolve.

We’re going to be a very small crew, working closely together to bring Evolve to the next level. I’m extremely excited to take this next step closer to my original passion: video games. I am also very excited to learn more about start-ups and what it takes to put them together and make them successful.

And if you want to play some games head over to my Evolve profile and add me as a friend!
