#3 - Weekend fun with SOMs, FastAPI and AWS Lambda Container Images
Today we take a quick pit stop from our F1 Prediction project and explore a fun little algorithm with our very own Python implementation of Kohonen Self-Organising Maps.
Hi friends,
A couple of weeks ago, I stumbled upon an unsupervised learning algorithm called Kohonen Self-Organising Maps. Koho-wwhattt?!!
Github project for this article
Self-organising maps are nothing novel; in fact, they were first introduced by the Finnish professor Teuvo Kohonen in the 1980s. A SOM is a type of artificial neural network, but instead of the usual backpropagation (gradient descent) used in neural networks, Kohonen SOMs use competitive learning. They were quite popular around 2005; however, I seldom (never) hear of them being used these days. There are SOM libraries in popular languages like Python, R, and MATLAB, to name a few.
What attracted me to SOMs was their ability to create pretty abstract blobs of colour which are almost like art, if you ask me. (This colourful representation is possible if you use a 3-component vector like the RGB components we use in this article.) Above is one such example of an output generated by our Python implementation.
What are SOMs?
Kohonen Self-Organising Maps (SOMs) are a type of unsupervised learning algorithm. They are typically used for clustering and visualisation, representing higher-dimensional data in lower dimensions, typically a 2D rectangular topology or grid.
In addition to segmentation and clustering analysis, a SOM is also a form of dimensionality reduction, since the high-dimensional data in the input layer ends up represented in the output grid.
SOM Training
I have created a notebook, a Python implementation, and a FastAPI client, all available on GitHub to use any which way you want. The Jupyter notebook goes through the training of self-organising maps as shown below (a minimal code sketch follows the list):
1. Each node's weights are initialised.
2. We enumerate through the training data for N iterations (repeating if necessary). The current value we are training against is referred to as the current input vector.
3. Every node is examined to determine which one's weights are most like the input vector. The winning node is commonly known as the Best Matching Unit (BMU).
4. The radius of the neighbourhood of the BMU is now calculated. This value starts large, typically set to the 'radius' of the lattice, but diminishes each time-step. Any nodes found within this radius are deemed to be inside the BMU's neighbourhood.
5. Each neighbouring node's weights (those of the nodes found in step 4) are adjusted to make them more like the input vector. The closer a node is to the BMU, the more its weights are altered.
6. Go to step 2 until we've completed N iterations.
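To make the steps concrete, here is a minimal NumPy sketch of that loop. It is an illustration rather than the project's actual implementation: the name `train_som`, the exponential decay schedules, and the Gaussian neighbourhood are my assumptions about a typical SOM.

```python
import numpy as np

def train_som(grid, inputs, n_iterations, initial_lr=0.1):
    """Train a (height, width, 3) grid of RGB weights against RGB input vectors."""
    height, width, _ = grid.shape
    initial_radius = max(height, width) / 2           # step 4: the lattice 'radius'
    time_constant = n_iterations / np.log(initial_radius)
    rows, cols = np.indices((height, width))          # lattice coordinates of every node

    for t in range(n_iterations):
        x = inputs[t % len(inputs)]                   # step 2: current input vector

        # Step 3: the Best Matching Unit is the node closest to x.
        dists = np.linalg.norm(grid - x, axis=2)
        bmu_row, bmu_col = np.unravel_index(np.argmin(dists), dists.shape)

        # Step 4: radius and learning rate both shrink each time-step.
        radius = initial_radius * np.exp(-t / time_constant)
        lr = initial_lr * np.exp(-t / time_constant)

        # Step 5: nodes inside the neighbourhood move towards x,
        # scaled by a Gaussian falloff from the BMU.
        lattice_dist_sq = (rows - bmu_row) ** 2 + (cols - bmu_col) ** 2
        mask = lattice_dist_sq <= radius ** 2
        influence = np.exp(-lattice_dist_sq[mask] / (2 * radius ** 2))
        grid[mask] += (lr * influence)[:, None] * (x - grid[mask])

    return grid
```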
Kohonen Input and Output Layer
A Kohonen SOM has two layers: the input layer and the output layer. The input layer is made up of the features of the data, while the output layer is made up of the nodes that will be trained to represent the input data. In this project, our input layer is made up of 3-component float vectors (representing RGB colours), and the output layer is a 2D grid of nodes, each node also made up of RGB components.
The image below shows a very simple SOM with an input layer of 3 features and an output layer of 16 nodes. In this project we only have 3 features (the RGB colour components); however, you can have more features in the input layer. For example, if you are trying to segment customers, the features could be income, sex, age, race, etc.
With our input features (and output node weights) conveniently being RGB components, we can easily visualise the output layer as a 2D grid of colours.
If you look at each node in the output layer, you will notice that there are 3 lines into each node. These lines represent the weight of each feature in the input layer. Each output node's weight is also a 3-dimensional vector, the same shape as the input, and likewise represents the RGB dimensions. When updating the node weights during training, the changes to the weights are easily visualised as colours.
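Because each weight is an RGB triple, viewing the whole grid takes only a couple of lines; a small sketch, assuming the `grid` array from the training sketch above holds values in [0, 1]:

```python
import matplotlib.pyplot as plt

# Each node's 3-component weight vector renders directly as an RGB pixel.
plt.imshow(grid)
plt.axis("off")
plt.show()
```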
SOM Output
Given an input layer made up of 20 random RGB colours, you will notice below that the output image contains colour groups drawn from this layer.
In the left image below, a 100x100 output layer is initialised with random colours. Running the SOM algorithm for 1000 iterations produces the output on the right, where we can see roughly 20 colours grouped together.
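If you want to recreate this experiment with the `train_som` sketch from earlier, something like the following would do it (the seed here is illustrative):

```python
import numpy as np

rng = np.random.default_rng(42)
grid = rng.random((100, 100, 3))    # left image: randomly initialised weights
inputs = rng.random((20, 3))        # 20 random RGB training colours
trained = train_som(grid, inputs, n_iterations=1000)  # right image
```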
Vectorisation
In my first attempt at implementing the Kohonen SOM algorithm, I used the typical Python nested loops, following the algorithm to the letter. However, I quickly realised that increasing the iterations to 200, 500, 1000 or more would slow it to a crawl; not very exciting when deploying it to production.
The algorithm can be vectorised using NumPy, which makes it far more efficient. I have implemented both versions and compared the execution times.
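As an illustration of what changes, here is the BMU search written both ways; the function names are mine, not the project's:

```python
import numpy as np

def bmu_loops(grid, x):
    """Naive version: visit every node with nested Python loops."""
    best, best_dist = (0, 0), float("inf")
    for i in range(grid.shape[0]):
        for j in range(grid.shape[1]):
            d = np.sum((grid[i, j] - x) ** 2)
            if d < best_dist:
                best, best_dist = (i, j), d
    return best

def bmu_vectorised(grid, x):
    """NumPy version: one array expression replaces both loops."""
    dists = np.sum((grid - x) ** 2, axis=2)
    return np.unravel_index(np.argmin(dists), dists.shape)
```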
The vectorised implementation is around 76x faster (from around 10 minutes down to 9 seconds), so vectorisation is worth remembering whenever your Python implementations involve numerous loops.
Production deployment
In the spirit of full-stack machine learning, we should not disregard the production deployment of the resulting API. This project uses FastAPI to implement the web API that serves our ML model.
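As a rough idea of what such an API can look like (the route, request model, and `train_som` call here are my assumptions, not the project's actual endpoints):

```python
import numpy as np
from fastapi import FastAPI
from pydantic import BaseModel

# train_som: the training sketch from earlier in this article
app = FastAPI()

class TrainRequest(BaseModel):
    width: int = 100
    height: int = 100
    iterations: int = 1000
    colours: list[list[float]]      # e.g. 20 RGB triples in [0, 1]

@app.post("/train")
def train(req: TrainRequest):
    grid = np.random.rand(req.height, req.width, 3)
    trained = train_som(grid, np.asarray(req.colours), req.iterations)
    return {"grid": trained.tolist()}
```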
For this project, my deployment preference is serverless-first, so I would use AWS Lambda Container Images to deploy it. As with many APIs for data and machine learning applications, the required dependencies will easily exceed Lambda's 250MB uncompressed limit, even when spread across multiple Lambda Layers. All the popular Python libraries are quite chunky and will easily go over this hard limit. NumPy alone is already over 100MB, and we still have FastAPI, Uvicorn, and Mangum to add.
Lambda Container Images, on the other hand, allow a container of up to 10GB for our API. I have supplied a Docker image for Lambda containers, as well as a standard image for any cloud provider that can deploy containers.
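Inside the container, the FastAPI app is typically exposed to the Lambda runtime via Mangum; a minimal sketch, assuming the `app` object lives in a module called `main` (a hypothetical path):

```python
from mangum import Mangum

from main import app  # hypothetical module holding the FastAPI app

# The image's CMD (or handler setting) points Lambda at this callable;
# Mangum adapts API Gateway events to ASGI requests and back.
handler = Mangum(app)
```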
However, for applications that have serious production requirements, a more comprehensive ML serving solution such as AWS SageMaker, BentoML, or TensorFlow Serving may be more appropriate.
Till then,
JO
Handy Resources
Github project for this article
Kohonen Self Organising Map (SOM)
Kohonen Self-Organising Maps - A special type of Artificial Neural Network