#7 - Streamline Your AI Workflow: A Factory Class for LLMs and Embedding Models
LLMFactory and EmbeddingModelFactory classes that will improve the developer experience of an AI Engineer
Hi friends,
In today’s article, we will be talking about some software engineering best practices that will improve the developer experience as we build Generative AI applications. One of my annoyances in this space is the rapid pace of advancement, with new models appearing what feels like every week.
This means I regularly have to update my applications or scripts to accommodate this. I know that LlamaIndex and LangChain have an LLM factory pattern that handles this scenario; however, I hadn’t seen a library that does this for Instructor-based projects (which is my preference), until now.
I forked this from one of Dave Ebbelaar's repos to play with the code and understand how it works. It is awesome to have a repo filled with best practices and a well-documented codebase. I’ve been following his work for a few months now, and it’s still amazing to me how I continuously find diamonds on YouTube; the calibre of the learning materials and the generosity are top-notch.
Here are some learnings from this codebase:
I have always used Jupyter notebooks to experiment with code, but this is the first time I have used interactive Python in VSCode, which I thought was a great experience. I will not be doing this all the time, since I still enjoy the usefulness of Jupyter notebooks, but I will be using interactive Python in VSCode more often.
I like his use of the LLMFactory to make it easy to swap out different models; as I mentioned, things change frequently in this space and there’s always a new SOTA model around the corner. This is a great way to make the codebase more modular and easier to maintain. It’s also simple enough to extend and modify to adapt to your own workflow.
It is still using the instructor library, which I find indispensable when working with LLMs and structured outputs.
The repo uses Pydantic Settings as a powerful way to manage and centralise the application’s settings; in this case, we load the application settings from an .env file and manage them all from the one source file.
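To make that concrete, here is a minimal sketch of what centralised settings can look like with pydantic-settings. The field names and .env keys below are placeholders for illustration, not the exact schema used in the repo.

```python
# Minimal sketch: centralised application settings loaded from an .env file.
# Field names and defaults are illustrative placeholders.
from pydantic_settings import BaseSettings, SettingsConfigDict


class Settings(BaseSettings):
    """Application settings, read from environment variables and .env."""

    model_config = SettingsConfigDict(env_file=".env", env_file_encoding="utf-8")

    openai_api_key: str = ""
    ollama_base_url: str = "http://localhost:11434"
    aws_region: str = "ap-southeast-2"


settings = Settings()  # import this single object wherever settings are needed
```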
As I was running and playing with the code, there were a few things that I thought could be done better, so here are the improvements I have added in this fork:
I have improved LLMFactory so that the LLM registrations are done separately, meaning you will not need to update the LLMFactory class every time you add a new LLM. When a new model comes around, all we have to do is register the new model by adding the register_llm_client decorator on the new model-specific code that returns the Instructor client.
I have added support for Amazon Bedrock so we can access the Anthropic Claude models there. I also made a few adjustments to the Bedrock embedding models so that the calls are now identical to the OpenAI or Ollama embedding models.
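As a rough sketch, registering a Bedrock-backed provider could look something like the snippet below. The decorator signature, the module path, and the AnthropicBedrock wiring are my illustration of the idea rather than the exact code in the fork.

```python
# Hedged sketch: registering a new provider without touching LLMFactory itself.
import instructor
from anthropic import AnthropicBedrock

from llm_factory import register_llm_client  # hypothetical module path, for illustration


@register_llm_client("bedrock")
def bedrock_client():
    # Wrap the Bedrock-backed Anthropic client with Instructor so structured
    # outputs behave the same way as with the OpenAI or Ollama clients.
    return instructor.from_anthropic(AnthropicBedrock(aws_region="ap-southeast-2"))
```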
I have created a similar factory class for embedding models (EmbeddingModelFactory), so that you can now mix and match different LLMs with different embedding models.
This means I can now use, say, DeepSeek R1 running locally on Ollama with the Titan embedding model from AWS.
Or Anthropic Claude 3.5 Sonnet on Amazon Bedrock with a local Ollama embedding model.
Or both the LLM and embedding model from Amazon Bedrock. This is a great way to experiment with different models without having to change a lot of code.
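Here is an illustrative sketch of that mix-and-match in practice; the constructor arguments, provider names, and import paths are assumptions I am using to show the idea, not the fork’s exact API.

```python
# Illustrative only: provider names and import paths are assumptions for this sketch.
from llm_factory import LLMFactory
from embedding_model_factory import EmbeddingModelFactory

# DeepSeek R1 on local Ollama for generation, Amazon Titan on Bedrock for embeddings
llm = LLMFactory(provider="ollama")
embedder = EmbeddingModelFactory(provider="bedrock")

# ...or Claude 3.5 Sonnet on Bedrock paired with a local Ollama embedding model
llm = LLMFactory(provider="bedrock")
embedder = EmbeddingModelFactory(provider="ollama")
```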
I have added a new decorator, register_llm_client, which is essentially a lookup table for the registered LLM clients.
To support a new LLM, just decorate the function with register_llm_client, and that is all you need to do for the registration.
In the LLMFactory constructor, simply call get_llm_client, and this will return that provider’s Instructor client!
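To show the idea behind the registry, here is a minimal sketch of how such a decorator and lookup could be wired up. The names and signatures are illustrative; the fork’s actual implementation may differ in the details.

```python
# Minimal sketch of the registry idea: a decorator that records client-builder
# functions in a lookup table, and a factory that resolves them by provider name.
from typing import Callable, Dict

_LLM_CLIENTS: Dict[str, Callable] = {}


def register_llm_client(provider: str):
    """Decorator that registers a client-builder function under a provider name."""

    def decorator(fn: Callable) -> Callable:
        _LLM_CLIENTS[provider] = fn
        return fn

    return decorator


def get_llm_client(provider: str):
    """Look up the registered builder and return the Instructor client it creates."""
    try:
        return _LLM_CLIENTS[provider]()
    except KeyError:
        raise ValueError(f"No LLM client registered for provider '{provider}'")


class LLMFactory:
    def __init__(self, provider: str):
        self.provider = provider
        # Adding a new provider never requires editing this class:
        # the registration lives next to the provider-specific code.
        self.client = get_llm_client(provider)
```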
If you want to check this out and play with LLMFactory some more, please have a look at Dave’s repo at https://github.com/daveebbelaar/pgvectorscale-rag-solution or my fork at https://github.com/jaeyow/pgvectorscale-rag-solution.
Till then,
JO