#8 - From Training to Deployment: A Simple Approach to Serving Embedding Models
Now that our MLOps pipeline has trained and built our embedding model, it is time to serve it from a Docker container using FastAPI.
This article is part of a blog series about demystifying vector embedding models for an image search use case:
Part 1. So how do you build a vector embedding model? - Introduces vector embedding models and the intuition behind the technologies we can use to build one ourselves.
Part 2. Let's build our image embedding model - Shows a couple of ways to build embedding models - first by using a pre-trained model, and next by fine-tuning a pre-trained model. We use PyTorch to build our feature extractor.
Part 3. Modelling with Metaflow and MLFlow - Uses Metaflow to build our model training workflow, introduces the concept of checkpointing, and adds MLflow for experiment tracking.
Part 4. From Training to Deployment: A Simple Approach to Serving Embedding Models - (this article) - Packaging your ML model in a Docker container opens it up to a multitude of model serving options.
Part 5. Putting Our Bird Embedding Model to Work: Introducing the Web Frontend - For our embedding model to prove useful to others, we have created a modern frontend to serve similarity search results to our users.
Hi friends,
In our last few posts, we talked about embedding models and showed you how to fine-tune a pre-trained ResNet model using PyTorch. We also went over some MLOps ideas to set up a workflow for training this model consistently.
Now, we want to take things a step further and actually use this model in a way that's helpful for others, not just me. So, we're going to build a system around it. This means creating an API to let other programs talk to our model, and a nice, modern website for people to interact with it directly.
Getting Started: The API
FastAPI is a really popular tool for building APIs with Python. It's known for being super fast, which makes it a great choice for serving machine learning models. Plus, it has a bunch of other useful features, like built-in request validation, to make sure the data sent to the API is in the right format.
It's built on top of Pydantic, which is this incredibly popular library for checking data. Seriously, it gets downloaded like 272 million times every month! That tells you how much developers trust it and find it useful. Pydantic has tons of features, but the ones I find most handy are: making sure the data is valid, managing settings easily, and turning Python stuff into formats that can be sent over the internet (serialisation).
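To make that concrete, here's a minimal sketch of what a FastAPI endpoint with Pydantic validation can look like. The model and field names are illustrative, not the exact ones from our project:

```python
from fastapi import FastAPI
from pydantic import BaseModel, Field

app = FastAPI()

class SearchRequest(BaseModel):
    # Pydantic validates the incoming JSON: top_k must be an int between 1 and 50.
    top_k: int = Field(default=5, ge=1, le=50)

class SearchResult(BaseModel):
    image_id: str
    species: str
    score: float

@app.post("/search", response_model=list[SearchResult])
def search(request: SearchRequest) -> list[SearchResult]:
    # A real implementation would embed the query and hit the vector database;
    # here we just return an empty list to show the request/response contract.
    return []
```

If a client sends top_k as a string or leaves out a required field, FastAPI rejects the request with a clear error before your code ever runs.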
Making Things Easy with Docker
Docker... well, everyone knows Docker, right? For me, the biggest thing is that it makes life so much easier for me and my team. It gives us the best developer experience (DX) possible. With Docker, we can be pretty confident that if our API runs inside the container on our computers, it's going to run the same way when we put it in the cloud. No more "it works on my machine!" surprises. We also use Docker Compose to not just build the container for the API, but also to set up the whole system around it, like the website that will actually use the API.
The Power of Vector Databases
Finally, the real magic behind finding similar things in a project like this is the "semantic search" power of vector databases. In this project, we're using LanceDB, an embedded vector database that keeps all of its data in files on local disk. It's open-source, has a lot of cool features, and the best part? It's super easy to get started with. Remember that feeling the first time you used SQLite? You get the same feeling using LanceDB.
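If you're curious what "SQLite-like" means in practice, here's a rough sketch of the workflow: connect to a path on disk, create a table from plain Python dicts, and run a vector search against it. The table name, fields, and tiny toy vectors are made up for illustration:

```python
import lancedb

# Connect to a local directory -- no server to run, much like opening a SQLite file.
db = lancedb.connect("./data/birds-lancedb")

# Create a table straight from Python dicts; "vector" holds the embedding.
table = db.create_table(
    "bird_images",
    data=[
        {"image_id": "img_001", "species": "robin", "vector": [0.1, 0.3, 0.5]},
        {"image_id": "img_002", "species": "wren", "vector": [0.2, 0.1, 0.7]},
    ],
)

# Nearest-neighbour search: find the rows whose embeddings are closest to the query vector.
results = table.search([0.15, 0.25, 0.55]).limit(2).to_list()
print(results)
```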
My Thoughts & Where We're Headed
So, our ResNet 50 model sees those bird pictures and creates these "vector embeddings" – basically, a list of 2048 numbers that represent each image. Now that Generative AI is a big deal, these vector databases are making a comeback, and there are tons of them out there. But for what we're building, we need something solid for production: fast, easy to work with, and packed with features.
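As a reminder of where those 2048 numbers come from, this is roughly how you pull embeddings out of a ResNet-50 with torchvision: drop the final classification layer and keep the pooled features. It's a simplified sketch of the approach from the earlier posts, with an illustrative file name:

```python
import torch
from torchvision import models, transforms
from PIL import Image

# Load a pre-trained ResNet-50 and chop off the final classification layer,
# leaving the 2048-dimensional pooled features as our embedding.
backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
embedder = torch.nn.Sequential(*list(backbone.children())[:-1]).eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

with torch.no_grad():
    image = preprocess(Image.open("some_bird.jpg").convert("RGB")).unsqueeze(0)
    embedding = embedder(image).flatten(1)  # shape: (1, 2048)
```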
I actually first heard about LanceDB from this blog called The Data Quarry by Prashanth Rao. Specifically, it was in his article – "Embedded databases (3): LanceDB and the modular data stack." The idea of a vector database that felt like SQLite really clicked with me. Think about SQLite – it's fast, lightweight, has a lot of cool stuff, and you can use it for your local experiments, but more importantly, you can rely on it in real production systems. That's what got me excited about LanceDB.
What really surprised me was seeing how LanceDB stacked up against Elasticsearch as shown in the article. In a lot of ways, LanceDB actually performed better, and it did it using way fewer resources on a single computer. That's pretty impressive!
Using Docker containers is also a big part of our plan. It's like future-proofing our project for different ways we might want to deploy it, both while we're building it and when it's live. Because we're using Docker, we're not locked into one specific way of putting our API out there. We could use virtual machines, Kubernetes (which is super popular for managing containers), or even serverless options that can run containers, like AWS Lambda and Azure Functions. Plus, as we talked about earlier, Docker makes life so much easier for developers. We can be pretty sure that if it works in our Docker container, it'll work anywhere else.
Challenges & Unexpected Issues
Okay, so this project threw a little curveball my way. It's the first time I've really used conda to manage the Python stuff. I've always been a pyenv + python venv kind of guy. For me, that setup just feels lighter and I don't have to think about it too much anymore – it's just natural. I gotta say, conda and mamba (which is supposed to be a faster version of conda) feel a bit heavier, and it seems like there are a few more commands I need to keep in mind.
But hey, I'm willing to give it some more time. I've heard from a lot of people that conda is actually better when it comes to making sure all your project's dependencies play nicely together. So, I'll stick with it for a bit and see if it wins me over. And then there’s uv, but I will save that for another day.
Iterations & Lessons Learned
The whole point of starting this series was to get a better handle on how those vector embedding models actually work with images. With everyone talking about using vector databases for semantic search in those RAG (Retrieval-Augmented Generation) applications, I was curious if we could do the same thing with images – you know, compare them for similarity just like we do with text in NLP. It was pretty cool to find out that ResNet, which is actually a pretty old technology in the world of computer vision, can be trained to recognise objects, and that making an embedding model isn't as complicated as I first thought once you understand the basics.
Once I had a good grasp on building and training the embedding model, I moved on to creating an MLOps pipeline using Metaflow and keeping track of my experiments with MLflow. Before I really got into building models myself, I kind of thought that the model building part was the biggest chunk of creating a machine learning system. And don't get me wrong, it's definitely super important. But there are so many other things that happen around that modeling part – before you even start and after you're done – that when you look at the whole end-to-end system, the actual model building almost looks like a smaller piece of the puzzle.
After setting up the modeling pipeline, I now have a system that can handle all the steps – getting the data ready, training the model, and checking how good it is – in a way that I can repeat whenever I need to retrain in the future. That's pretty neat, but the thing is, nobody else can actually use this yet.
To make this image similarity thing available to everyone, we need an API. And that's where serving the model using FastAPI and Docker containers comes into play.
Plus, using Docker not only gives us more options for deploying this thing to the cloud, but it also makes the whole development process way easier. It helps us developers focus on actually building the stuff without getting bogged down in environment issues.
Final Results & Reflection
Whenever I build anything that uses Docker, I always make sure to use Docker Compose too. Honestly, this is mostly about making the development process smoother than anything else.
The kinds of systems I usually work on involve one or more Docker containers, and it's just so handy and reliable to have a complete setup that just works on my computer and in the cloud without having to change any code. It's a real time-saver and makes things way less frustrating.
For this project, I actually used Cursor, which is an AI-powered code editor, to help me put together the API and the ReactJS website (I'll be talking more about the website in my next post). I've used FastAPI before, and instead of the usual endless Googling and copy-pasting to get things going, Cursor really sped up the initial setup. I had to guide it along the way quite a bit, but it definitely made my work easier. It's like having a coding buddy that knows a lot of the basics.
And LanceDB? It just works, plain and simple. If you ever need a solid, production-ready vector database that you can embed right into your application, I can now confidently say that LanceDB is the way to go. Even though it saves its data to your local hard drive by default, you can easily switch it to use cloud storage like AWS S3 or Azure Blob Storage if you need to. That makes it really flexible.
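Switching between local disk and cloud storage is mostly a matter of the URI you pass when connecting; the bucket name below is made up:

```python
import lancedb

# Local directory while developing...
db = lancedb.connect("./data/birds-lancedb")

# ...or object storage in production (credentials come from the usual AWS environment).
db = lancedb.connect("s3://your-bucket/birds-lancedb")
```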
Building the API with FastAPI is fantastic, especially when you compare it to how you build APIs using something like AWS Lambda handlers (trust me, FastAPI is much nicer!). The built-in data validation, the easy way it handles data formats, and the fact that it can handle things asynchronously (doing multiple things at once) are all great features.
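To give a flavour of how those pieces fit together, here's a rough sketch of an async image-upload endpoint. The embed function is a placeholder for the ResNet embedder loaded at startup, and the commented-out search call stands in for the LanceDB table; neither is the exact code from our repository:

```python
import io

from fastapi import FastAPI, UploadFile
from PIL import Image

app = FastAPI()

def embed(image: Image.Image) -> list[float]:
    # Placeholder for the ResNet-50 embedder sketched earlier.
    return [0.0] * 2048

@app.post("/similar")
async def find_similar(file: UploadFile, top_k: int = 5):
    # Read the uploaded image without blocking the event loop, then embed it.
    payload = await file.read()
    image = Image.open(io.BytesIO(payload)).convert("RGB")
    vector = embed(image)
    # In the real app the vector would go to something like
    # table.search(vector).limit(top_k); here we just echo the request details.
    return {"top_k": top_k, "dimensions": len(vector)}
```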
But honestly, you don't really see the magic of similarity search with vector databases until you have a user interface to play with. Seeing the search results with your own eyes is just... well, it feels like magic. You type in a description, or in our case, you'll upload a picture, and it instantly finds similar things. It's pretty cool.
Why don’t you give it a go!
You can find all the code for this project on GitHub. Feel free to check it out and try it for yourself – it's all open source!
If you have any questions, drop a comment here on Substack, or send me a DM on LinkedIn.
And hey, if you're interested in collaborating on something cool, let's chat! I'm always up for new ideas and working with other passionate people.