In most applications we develop, a search component helps users quickly find what they are looking for. These searches typically rely on text data; to make images or videos searchable, they must first be tagged with labels. This approach works well for structured datasets.
Over the past five to six years, unstructured data has grown substantially, driven by the social media boom and shifts in consumer buying habits. Consumers have become more sophisticated in how they shop, expecting features like finding products similar to their previous purchases or searching for similar items using an image.
Can these be achieved with conventional search methods? Definitely not; they demand a different approach, and this is where vector search becomes essential.
While traditional search relies on keyword matches, lexical similarity, and word-frequency statistics, vector search engines use distances in an embedding space to measure similarity.
What is an embedding? In layman's terms, an embedding is a numerical representation of any text or image, a vector of numbers constructed so that semantically similar items end up close together in the vector space.
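To make this concrete, here is a toy sketch in TypeScript. The vectors below are made-up four-dimensional values purely for illustration; real embedding models produce vectors with hundreds or thousands of dimensions. Cosine similarity is one common way to compare them.

```typescript
// Embeddings are just arrays of numbers; "similar" items have vectors
// that point in similar directions, giving a higher cosine similarity.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

const cat = [0.9, 0.1, 0.3, 0.0];       // hypothetical embedding of "cat"
const kitten = [0.85, 0.15, 0.35, 0.05]; // hypothetical embedding of "kitten"
const car = [0.1, 0.9, 0.0, 0.4];        // hypothetical embedding of "car"

console.log(cosineSimilarity(cat, kitten)); // high score: semantically close
console.log(cosineSimilarity(cat, car));    // lower score: semantically distant
```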
So, how do we build a vector search? Below are the basic steps; the process can become more complex as more features are added.
1. Generate embeddings for all searchable data (text, images, videos) using the OpenAI Embeddings API or an open-source embedding model (see the sketch after this list).
2. Store these embeddings in a vector database, such as PostgreSQL with the pgvector extension or a proprietary alternative.
3. At query time, create an embedding for the search query using the same API or model as in step 1; embeddings are only comparable when they come from the same model.
4. Run a nearest-neighbour query that ranks the stored embeddings by their distance to the query embedding. In PostgreSQL with pgvector this is an ORDER BY on a distance operator rather than a plain WHERE clause (see the query sketch below).
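Here is a minimal sketch of steps 1 and 2 using the official OpenAI Node SDK and the `pg` client. It assumes pgvector is installed and a table already exists, e.g. `CREATE TABLE items (id serial PRIMARY KEY, content text, embedding vector(1536));`. The table name `items` and the model choice are illustrative, not prescribed by this article.

```typescript
import OpenAI from "openai"; // npm install openai
import { Client } from "pg"; // npm install pg

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment
const db = new Client({ connectionString: process.env.DATABASE_URL });

// Steps 1 and 2: embed each document and store the vector in Postgres.
async function indexDocuments(docs: string[]): Promise<void> {
  await db.connect();
  for (const content of docs) {
    const res = await openai.embeddings.create({
      model: "text-embedding-3-small", // produces 1536-dimensional vectors
      input: content,
    });
    const embedding = res.data[0].embedding; // number[]
    // pgvector accepts a '[v1,v2,...]' literal, which JSON.stringify produces.
    await db.query(
      "INSERT INTO items (content, embedding) VALUES ($1, $2)",
      [content, JSON.stringify(embedding)]
    );
  }
  await db.end();
}

indexDocuments(["red running shoes", "leather office chair"]).catch(console.error);
```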
Note: the results will be approximate matches, because an approximate nearest-neighbour (ANN) algorithm is used for speed, rather than exact matching.
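And here is a matching sketch of steps 3 and 4. The `<=>` operator is pgvector's cosine-distance operator (smaller means more similar); with an approximate index such as IVFFlat or HNSW on the embedding column, this ORDER BY becomes an ANN scan, which is why the results are approximate. The table and model names carry over from the previous sketch.

```typescript
import OpenAI from "openai";
import { Client } from "pg";

const openai = new OpenAI();
const db = new Client({ connectionString: process.env.DATABASE_URL });

// Steps 3 and 4: embed the user's query with the SAME model used at
// indexing time, then let pgvector rank rows by distance to it.
async function search(query: string, limit = 5) {
  const res = await openai.embeddings.create({
    model: "text-embedding-3-small", // must match the indexing model
    input: query,
  });
  const queryEmbedding = JSON.stringify(res.data[0].embedding);

  await db.connect();
  const { rows } = await db.query(
    `SELECT content, embedding <=> $1 AS distance
       FROM items
      ORDER BY embedding <=> $1
      LIMIT $2`,
    [queryEmbedding, limit]
  );
  await db.end();
  return rows; // closest matches first
}

search("comfortable sneakers").then(console.log).catch(console.error);
```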
For hands-on experience building a vector search with React, Next.js, and the pgvector extension for PostgreSQL, you can refer to the GitHub project.
We would love to hear your thoughts and comments on vector search and recommendation engine development.