Machine Learning Made Easy With ml5.js
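In my last entry, I gave an introductory tutorial on Processing, an open-source graphical Java library that can be used to connect coding skills with visual art. This time, we will pair p5.js, its JavaScript counterpart, with ml5.js, a machine learning library that runs in the browser.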
The beautiful thing about ml5 is that you don’t need a comprehensive understanding of machine learning in order to use it. ml5 provides ‘pre-trained’ machine learning models that we can essentially plug and play into our apps, which gives us some pretty amazing functionality. These models are artificial neural networks, data structures loosely modeled on the behavior of biological neurons. They have been “trained” on enormous amounts of data to identify certain images, estimate body poses, read human faces, and much more.
For a more in-depth explanation of how ml5 works under the hood, I suggest starting here.
In the following example, we will be using p5 with ml5 to create a sketch that accesses our webcam, detects a human body, and then draws a series of dots and lines along the body to create a simple skeleton that changes and morphs to match the body’s movements in real time. Here’s an example of the end result below.
The sketch relies on PoseNet, a pose-estimation model that Dan Oved ported to TensorFlow.js, the machine learning library that ml5 is built on.
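To get started, fire up VSCode, Sublime, or your text editor of choice and create an HTML boilerplate with the following code. I like to use the CDN-hosted versions of both p5 and ml5: there is nothing to install, and a link that points at the latest release always serves the most up-to-date code. Keep in mind that a latest-release link can also pull in breaking changes, so pin a version number if the sketch needs to stay stable.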
The third script in the body is an optional p5 sound add-on. It’s not needed to get the base functionality of p5, but it’s a great option if you want to incorporate music, sound effects, or some kind of audio processing into your sketch.
Just like in Processing, every p5 sketch needs a setup() and a draw() function. First, let’s declare some variables that ml5 will use later on. Inside setup(), we specify the dimensions of our canvas, tell the sketch to access our webcam, and set our poseNet variable to the object returned by ml5’s poseNet method. The subsequent lines register a listener for the ‘pose’ event, which fires each time the model detects poses in the video, and store the results in a variable that is later used to draw and update the skeleton.
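As a rough sketch of the setup step described above, the code could look something like this. It assumes an ml5 release that still ships ml5.poseNet (the 0.x line; newer releases, as far as I know, expose a different body-pose API). The 640 by 480 canvas size and the modelReady callback name are illustrative assumptions, not requirements.

```javascript
// Sketch-level variables shared by setup() and the drawing functions.
let video;       // webcam capture (a p5 media element)
let poseNet;     // ml5 PoseNet instance
let poses = [];  // most recent detection results

function setup() {
  // Canvas size is an assumption; any size works.
  createCanvas(640, 480);

  // Ask the browser for the webcam and size the feed to match the canvas.
  video = createCapture(VIDEO);
  video.size(width, height);

  // Load PoseNet and attach it to the video feed.
  poseNet = ml5.poseNet(video, modelReady);

  // Each time new poses are detected, keep only the latest results.
  poseNet.on("pose", (results) => {
    poses = results;
  });

  // Hide the raw <video> element; draw() paints the feed onto the canvas.
  video.hide();
}

// Hypothetical callback name: runs once the model has finished loading.
function modelReady() {
  console.log("PoseNet model loaded");
}
```

Storing the results in a plain array keeps the listener tiny, and the drawing functions can simply read whatever was detected most recently.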
Next, our draw() function uses the p5 image() method to display our camera feed. It then calls two functions, drawKeypoints() and drawSkeleton(), which work in tandem to create the dynamic skeleton.
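A minimal version of that draw loop might look like the following. It assumes a video variable holding the webcam capture created in setup(), and that drawKeypoints() and drawSkeleton() are defined elsewhere in the same sketch.

```javascript
// p5 calls draw() once per frame.
function draw() {
  // Paint the current webcam frame so it fills the whole canvas.
  image(video, 0, 0, width, height);

  // Overlay the detected pose on top of the camera image.
  drawKeypoints();
  drawSkeleton();
}
```

Drawing the camera frame first matters: it covers the previous frame's dots and lines, so the skeleton never leaves a trail.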
Below is the code for both of those functions. There’s a lot happening here, but I’ll try to break down the code as best I can.
Essentially, the drawKeypoints() function loops through an array of objects, each containing position data about a human form detected by the camera (the array is created by some ml5 magic under the hood). Each object contains multiple keypoints, each corresponding to a point on the body, such as your left elbow or right knee. A second loop iterates through these keypoints and draws a small, filled circle at the location on the canvas specified by each keypoint.
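The nested loop described above can be sketched roughly as follows. It assumes the result shape of the ml5 0.x PoseNet wrapper, where each entry in poses has a pose.keypoints array and every keypoint carries a position and a confidence score. The MIN_SCORE threshold, the red fill, and the 10-pixel dot size are illustrative choices, and poses is the sketch-level array filled by the ‘pose’ listener.

```javascript
// Assumed threshold: skip keypoints the model is unsure about.
const MIN_SCORE = 0.2;

function drawKeypoints() {
  // Outer loop: one entry per detected person.
  for (let i = 0; i < poses.length; i++) {
    const pose = poses[i].pose;

    // Inner loop: one keypoint per tracked body point.
    for (let j = 0; j < pose.keypoints.length; j++) {
      const keypoint = pose.keypoints[j];

      if (keypoint.score > MIN_SCORE) {
        fill(255, 0, 0);   // solid red dot
        noStroke();        // no outline around the dot
        ellipse(keypoint.position.x, keypoint.position.y, 10, 10);
      }
    }
  }
}
```

Filtering on the score keeps dots from jumping to wild guesses when part of the body is out of frame.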
This creates all of the little dots you see in the GIF above. Finally, drawSkeleton() completes the sketch by using another nested loop to draw lines on the canvas, connecting the dots created by drawKeypoints(). Every time the model reports a new detection, the whole dataset is re-created, and because draw() runs on every frame, both of these functions run again to re-draw the skeleton. Amazing!
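For reference, a drawSkeleton() along these lines might look like the sketch below. It again assumes the ml5 0.x PoseNet result shape, where each entry in poses has a skeleton array of keypoint pairs that should be joined by a line. The red stroke is an arbitrary choice, and poses is the sketch-level array filled by the ‘pose’ listener.

```javascript
function drawSkeleton() {
  // Outer loop: one entry per detected person.
  for (let i = 0; i < poses.length; i++) {
    const skeleton = poses[i].skeleton;

    // Inner loop: each skeleton entry is a pair of connected keypoints.
    for (let j = 0; j < skeleton.length; j++) {
      const partA = skeleton[j][0];
      const partB = skeleton[j][1];

      stroke(255, 0, 0); // red line between the two joints
      line(
        partA.position.x, partA.position.y,
        partB.position.x, partB.position.y
      );
    }
  }
}
```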
A similar technique can be used with ml5 to create another kind of body detection, shown below. This sketch uses the ml5 model to detect and predict human facial landmarks. You can then use nested loops to iterate through these landmarks, much like the “poses” shown above, and draw dots and lines that connect to form a dynamic face mask!
This type of easy-to-use, pre-packaged machine learning library is a fantastic recent innovation in the open-source coding world. It allows developers and artists of widely varying expertise and skill levels to engage with machine learning, a subject that, to most of us, seemed impossibly complex and opaque until now. Accessibility is everything!