Working with Embeddings: Closed versus Open Source

Working with Retrieval

Using techniques to improve semantic search

Demonstration of clustering before performing semantic search | Image by author

Embeddings are a cornerstone of natural language processing. They have many uses, but one of the most popular is semantic search in retrieval applications.
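To make the idea concrete, here is a minimal sketch of semantic search as a ranking by cosine similarity. The three-dimensional vectors and the document names are made up for illustration. In a real application the vectors would come from an embedding model and have hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def semantic_search(query_vec, doc_vecs, top_k=2):
    """Return the top_k (doc_id, score) pairs, most similar first."""
    scored = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in doc_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Hypothetical toy embeddings; a real model would produce these.
doc_vecs = {
    "pricing_page": [0.9, 0.1, 0.0],
    "api_reference": [0.1, 0.9, 0.2],
    "billing_faq": [0.8, 0.2, 0.1],
}
query_vec = [1.0, 0.0, 0.0]  # stands in for an embedded user query

results = semantic_search(query_vec, doc_vecs, top_k=2)
for doc_id, score in results:
    print(f"{doc_id}: {score:.3f}")
```

The query vector points in nearly the same direction as the two pricing-related documents, so they rank above the unrelated one. Swapping the embedding model changes the vectors, and with them the ranking, which is why the choice of model matters.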

Although much of the tech community is currently focused on how knowledge graph retrieval pipelines work, standard vector retrieval hasn’t gone out of style.

You’ll find many articles showing how to filter irrelevant results out of semantic search. We’ll cover that here too, using techniques such as clustering and re-ranking.

The main focus of this article, though, is to compare open source and closed source embedding models of various sizes.

The models that will be in focus — there are many more available | Image by author

We will compare up to nine embedding models that rank high on the MTEB leaderboard. This will give you an idea of how a large model performs against a small one and what the costs would be as you…