Google Cloud - Community

A collection of technical articles and blogs published or curated by Google Cloud Developer Advocates. The views expressed are those of the authors and don't necessarily reflect those of Google.


How to Implement Hybrid Search for RAG with BigQuery

Greg Sommerville
Google Cloud - Community
11 min read · Mar 18, 2025

How to implement both keyword and vector search with BigQuery

Generated by Imagen 3

At this point, it seems like we’re all familiar with the idea of using RAG (Retrieval Augmented Generation) as part of an LLM-based chatbot.

With a RAG approach, we retrieve data from a datastore based on a user query, add that data to our prompt, and then pass that prompt to a model like Gemini. Since we supply (hopefully) relevant information in the prompt itself, the RAG approach drastically reduces the likelihood of hallucinations that can result in incorrect answers.

As great as that is, there are certain use cases where a typical RAG approach falls short. For example, creating a chatbot that can help users browse and search a product catalog is one use case where a traditional RAG approach won’t work well. Let’s dive in to understand why.

Shortcomings of Traditional RAG

A traditional RAG approach typically relies on using a vector (also known as semantic) search only, which searches based on similar meaning, rather than exact words.

Vector search works by converting a search query into a vector (a list of numbers) that encodes the meaning of the query. Mathematically, vectors with similar meanings should be numerically close to each other, so we can simply look for vectors that are numerically as close to the query vector as possible, which should (theoretically) give us results that are most similar in meaning to the search vector, and therefore most relevant.
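To make "numerically close" concrete, here is a minimal sketch of cosine similarity, one common way to compare embedding vectors (the toy three-dimensional vectors are illustrative; real embedding models produce hundreds of dimensions):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity: values near 1.0 mean the vectors point the same
    direction (similar meaning); values near 0.0 mean unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings" for a query and two documents
query = [0.9, 0.1, 0.0]
doc_similar = [0.8, 0.2, 0.1]
doc_unrelated = [0.0, 0.1, 0.9]

# The semantically similar document scores much higher
print(cosine_similarity(query, doc_similar) > cosine_similarity(query, doc_unrelated))  # True
```

A vector search engine does essentially this comparison at scale, returning the rows whose stored embeddings score highest against the query embedding.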

That’s great as long as you want to search by meaning, but it falls short when trying to search for words like “Google” or “HP” or “Ricoh” or something like that. Those brand names don’t have meanings per se, so searching for something of similar meaning doesn’t generally work well. In this case, a vector search alone isn’t enough — we need a hybrid search that combines keyword and vector searches.

This article will demonstrate how to use Google Cloud BigQuery as a datastore for RAG. By the end of the article, you’ll understand:

  • When to use a hybrid search instead of a vector-only search
  • How to implement vector and keyword search using BigQuery
  • How to call Vertex AI to obtain embeddings for vector search
  • The main ideas involved in creating a chatbot that allows searching through a large product catalog (as an example)

Why BigQuery?

One of the choices that must be made when designing a RAG solution is which datastore to use. For a catalog solution I recently created, I chose Google Cloud’s BigQuery (BQ) as the datastore for product information.

Although BQ is a data warehouse and data warehouses are typically used for analytics, in this case you can think of it as a regular SQL database optimized for queries, which makes it very effective as a datastore for a RAG hybrid search.

As a matter of fact, BigQuery works well as a vector database, a keyword-based search database, or a hybrid combination of both (which is what we will discuss in this article).

Here are a number of reasons for choosing BigQuery for RAG:

  • BigQuery is designed to handle petabyte-scale datasets. This is crucial for RAG applications that might need to index and search through vast amounts of text and vector embeddings (like a large product catalog).
  • BigQuery includes built-in support for vector search. It also supports keyword search through standard SQL queries (using the LIKE keyword and wildcard expressions).
  • BigQuery’s columnar storage and distributed query processing enable extremely fast query execution, even on large datasets. This translates to low latency in retrieving relevant information for your RAG model.
  • BigQuery has excellent integration with other Google Cloud Services, like Cloud Storage, Vertex AI, and Cloud Functions. This simplifies the development and deployment of your RAG application.
  • BigQuery provides robust data governance and security features, ensuring that your data is protected and managed effectively.
  • BigQuery is ultimately a data warehouse, which means if your RAG application does require data analysis and reporting, BigQuery’s capabilities can be leveraged to gain valuable insights.
  • BigQuery uses a Serverless Architecture. BigQuery’s serverless nature eliminates the need for infrastructure management, allowing you to focus on building your RAG application.

For those reasons, BigQuery was an easy choice for the RAG datastore.

Storing Products in BigQuery

A first step for any RAG solution is to collect and index data. Since we’re creating a catalog chatbot, we need to store information about the available products and services. Here’s what we need to include:

  • Product Name
  • Product Description
  • Product Category
  • Filename that the product information came from
  • Keywords for the product
  • Embeddings for the product (calculated from extracted information)

A few notes about these fields:

  • The product name and description are simple text descriptions of a product.
  • The product category is important since we can filter product searches based on our current category. Each product should have a single product category associated with it.
  • The filename of the source PDF is important since it allows us to remove products in the future. This would be done by using a Cloud Run Function to detect the removal of a file from a storage bucket, which would then remove all of the associated products and services. This allows our users to change and update the list of products simply by adding or removing files from a storage bucket.
  • Keywords are quite important when dealing with products and services. This is the field that will allow us to do keyword matching, as with brand names.
  • Finally, embeddings are calculated by combining the product name and description into a single string, and then creating an embedding for that string. This allows our users to search based on the meaning of the words in the product description.

As we mentioned earlier, we can use BigQuery to store all product information. Here’s a schema for what we’re going to store:
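The schema boils down to the six fields listed above. A DDL sketch for that table might look like the following (the dataset and table names, and the embedding dimensionality, are assumptions for illustration):

```sql
-- Illustrative DDL for the product table described above
CREATE TABLE `dataset.products` (
  product_name        STRING,
  product_description STRING,
  category            STRING,
  source_filename     STRING,         -- PDF the product was extracted from
  keywords            STRING,         -- comma-separated keywords for matching
  embeddings          ARRAY<FLOAT64>  -- embedding of name + description
);
```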

Generating Embeddings

There are a variety of ways to create embeddings, including using the built-in BigQuery function ML.GENERATE_EMBEDDING(). However, in this case we’ll call Vertex AI to generate them, using some Python code.

That code would normally be part of a larger overall RAG solution, which handles both ingesting and storing product information into BigQuery, and also the actual chatbot operation when you retrieve data from BigQuery in order to respond to the user.

Note that there are two functions to generate embeddings: one for embedding the text that describes a product, and another for generating embeddings to be used during retrievals.

from typing import List

from vertexai.preview.language_models import TextEmbeddingModel, TextEmbeddingInput

EMBEDDING_MODEL_ID = "text-embedding-005"

def get_embeddings_for_storage(title: str, text: str) -> List[float]:
    model = TextEmbeddingModel.from_pretrained(EMBEDDING_MODEL_ID)
    text_embedding_input = TextEmbeddingInput(
        task_type='RETRIEVAL_DOCUMENT',
        title=title,
        text=text)
    embeddings = model.get_embeddings([text_embedding_input])
    return embeddings[0].values

def get_embeddings_for_retrieval(text: str) -> List[float]:
    model = TextEmbeddingModel.from_pretrained(EMBEDDING_MODEL_ID)
    text_embedding_input = TextEmbeddingInput(
        task_type='RETRIEVAL_QUERY',
        text=text)
    embeddings = model.get_embeddings([text_embedding_input])
    return embeddings[0].values

It’s important to notice the difference in the value of the task_type parameter, which indicates why we are generating the embeddings. RETRIEVAL_DOCUMENT is for the initial embedding operation, and the code takes both a title and body text. RETRIEVAL_QUERY, on the other hand, is used when generating embeddings for a search query at retrieval time.

Product Categories

One of the challenges that RAG systems have is ensuring that they retrieve the right information. Even when you use embeddings (and potentially keywords), you can end up retrieving too much information, or the wrong information. This is especially true when dealing with very large amounts of data. Imagine that our catalog contains hundreds of thousands of products — we want to ensure that we are retrieving only the most relevant.

By assigning a category to each product, and then also understanding which product category the user is asking about, we can filter the product data to be highly relevant.

In this example, we’re putting together a chatbot for IT products and services, so we’ll start with the following categories:

Hardware / Networking
Hardware / Servers
Hardware / Storage
Hardware / Printers
Hardware / Telecommunications
Hardware / Desktop computers
Hardware / Laptops and Tablets
Hardware / Computer Accessories
Software / Virtualization and Operating Systems
Software / Management Software
Software / Database Systems
Software / Telecommunication Systems
Software / Productivity Software
Software / Security Software
Software / Other Software
Services / Cloud Computing
Services / Network Services
Services / End-User Support
Services / Staffing
Services / Telecommunication

Each category is simply a string. Although this example uses two levels in its taxonomy, you can create your own list of categories in any way you wish, with any number of layers. Use whatever structure makes the most sense for your use case. Just make sure that categories don’t overlap, or else the model will have a hard time determining which category is the current one.

Understanding the Current Category

During the conversation with the user, we can have Gemini look at the list of categories and the conversation history in order to determine which category is the current topic of conversation. We do this using the following prompt:

You are an expert in the field of IT provisioning and supplies.

Each product or service that can be found in the catalog has a category.
Those categories are listed below.

**Categories:**
{categories}

Look at the following conversation between a potential buyer and an AI guide,
and pay close attention to the latest part of the conversation.
You should return the category of the product or service that the buyer is
looking for.

**Conversation:**
{conversation}

It's possible that the buyer may be asking questions that are very generic
and not directly related to a particular category.
For example, if they ask about "Software", but don't specify what kind of
software they are looking for, you should return "Software".

However, if they are asking about a specific category, you should return
the category. For example, if they are asking about hardware servers,
you should return "Hardware / Servers".

If you can't figure out what category the buyer is looking for,
you should return "Unknown".

Notice that when the conversation is still somewhat generic (meaning the user hasn’t been specific enough about what they are looking for), we get a category of “Unknown”. We can interpret that as a trigger to ask Gemini to provide general information about the categories and ask the user for more detail.

However, once we know which category the user is interested in, we can ask Gemini to extract likely search keywords based on their query. Along with that, we can also ask Gemini to give us a short string that describes what the user is searching for. We take that string and turn it into a vector embedding.

Then, using the current category, the current product keywords, and the embedding for the user query, we can perform our hybrid search.
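As a sketch of the category-detection step (the helper and model names here are illustrative, and the template is abbreviated from the full prompt shown above):

```python
# Abbreviated version of the category prompt shown earlier; {categories}
# and {conversation} are filled in at query time.
CATEGORY_PROMPT = """You are an expert in the field of IT provisioning and supplies.

**Categories:**
{categories}

**Conversation:**
{conversation}

If you can't figure out what category the buyer is looking for,
you should return "Unknown".
"""

def build_category_prompt(categories: list[str], conversation: str) -> str:
    return CATEGORY_PROMPT.format(
        categories="\n".join(categories),
        conversation=conversation)

prompt = build_category_prompt(
    ["Hardware / Printers", "Hardware / Servers"],
    "Buyer: I'm looking for a color laser printer.")

# The resulting prompt would then be sent to Gemini, e.g. (not run here):
#   model = GenerativeModel("gemini-1.5-flash")
#   current_category = model.generate_content(prompt).text.strip()
```

The same pattern (one focused prompt, one short structured answer) works for extracting the search keywords and the one-line search description mentioned above.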

Retrieving Products From BigQuery

At this point, we have the user query, its corresponding embeddings (calculated separately), a list of keywords that should help with searching, and a current category name. Here’s how we do the retrieval of the relevant products:

from typing import List

from google.cloud import bigquery

bqclient = bigquery.Client()

def get_products(current_category: str,
                 keywords: str,
                 embedding: List[float]) -> str:
    """Using a combination of keywords and the embeddings, search through
    the relevant category and return product information as a string that
    can be included in the prompt."""

    # Keywords arrive as a single comma-separated string, e.g.
    # '"HP", "color printer", "etc"'. Clean each one (strip whitespace and
    # quotes, lower-case it) and wrap it in SQL wildcards so the final
    # clause looks like: LIKE ANY ('%hp%', '%color printer%')
    patterns = []
    for k in keywords.split(','):
        cleaned = k.strip().strip('"').lower()
        if cleaned:
            patterns.append("'%" + cleaned + "%'")
    keyword_match_string = ", ".join(patterns)

    if keyword_match_string:
        QUERY = (
            'SELECT product_name, product_description '
            'FROM `dataset.products` '
            f"WHERE category = '{current_category}' AND "
            f"LOWER(keywords) LIKE ANY ({keyword_match_string}) "
            'LIMIT 10')
        query_job = bqclient.query(QUERY)
        keyword_rows = query_job.result()
    else:
        keyword_rows = []

    # Notice the different SELECT when dealing with a vector search.
    # We need to select "base.product_name" (etc.) because that's how
    # VECTOR_SEARCH() returns the matched rows (other vector-related
    # fields are also returned, which we ignore).
    QUERY = (
        'SELECT base.product_name, base.product_description '
        'FROM VECTOR_SEARCH('
        f"(SELECT * FROM dataset.products WHERE category = '{current_category}'), "
        "'embeddings', "
        f'(SELECT {embedding} AS embed), '
        'top_k => 10, '
        "distance_type => 'COSINE');"
    )
    query_job = bqclient.query(QUERY)
    vector_rows = query_job.result()

    # Now combine the results from the keyword search and the vector search
    product_rows = list(vector_rows) + list(keyword_rows)
    if len(product_rows) == 0:
        return "No products found"

    final_prods = [f"**Product: {prod.product_name}**: {prod.product_description}"
                   for prod in product_rows]
    return "\n".join(final_prods)

The product retrieval is done in two steps: retrieving products by keyword, then by embeddings. Once both sets are retrieved, we combine the two and create a string that lists out the products.
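One detail worth noting: a product matched by both searches will appear twice in the combined list. If that matters for your prompt, a simple de-duplication pass (an illustrative refinement, not part of the function above) keeps the first occurrence of each product name:

```python
from collections import namedtuple

def dedupe_products(rows):
    """Keep only the first occurrence of each product name."""
    seen = set()
    unique = []
    for prod in rows:
        if prod.product_name not in seen:
            seen.add(prod.product_name)
            unique.append(prod)
    return unique

# Quick demo with stand-in row objects
Row = namedtuple("Row", ["product_name", "product_description"])
rows = [Row("LaserJet 400", "A color printer"),
        Row("Aficio SP", "A multifunction printer"),
        Row("LaserJet 400", "A color printer")]  # matched by both searches
deduped = dedupe_products(rows)
print(len(deduped))  # 2
```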

Note that we pass in the keywords to this function as a single string separated by a comma. This is how we asked Gemini to return it, so we need to modify it a bit to make it work with our SQL SELECT statement.

First, we split and clean each keyword by turning it into lower case and stripping surrounding blanks. Then we surround each keyword with percentage symbols, since that signifies a wildcard match. Finally, we combine everything into a single SQL statement that may look something like this:
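The cleaning steps just described can be seen in isolation with a small example:

```python
# Keywords as returned by Gemini: one comma-separated, quoted string
raw_keywords = '"HP", "Ricoh"'

patterns = []
for k in raw_keywords.split(','):
    cleaned = k.strip().strip('"').lower()   # '"HP"' -> 'hp'
    patterns.append("'%" + cleaned + "%'")   # wrap in SQL wildcards

keyword_match_string = ", ".join(patterns)
print(keyword_match_string)  # '%hp%', '%ricoh%'
```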

SELECT product_name, product_description
FROM `dataset.products`
WHERE category = 'Hardware / Printers'
AND LOWER(keywords) LIKE ANY ('%hp%', '%ricoh%')
LIMIT 10

Notice how we first select only products that match our current category, which drastically reduces the possible matches and makes the selection more accurate. Then we use the LIKE ANY condition to match the keywords against a list of wildcard patterns, and limit our results to the first ten rows returned.

The second query uses BigQuery’s built-in vector search. Here’s what the vector selection statement looks like:

SELECT base.product_name, base.product_description
FROM VECTOR_SEARCH(
(SELECT * from dataset.products where category = 'Hardware / Printers'),
'embeddings',
(SELECT {embedding} as embed),
top_k => 10,
distance_type => 'COSINE'
)

You can see that the structure of the SQL SELECT statement is quite different from a standard SQL query. In this case, we use the VECTOR_SEARCH() function, which takes a subquery (for the data to search), the field name of the embeddings column (“embeddings”, in our case), the embeddings to match on, and parameters like top_k and distance_type (which determines how embedding vectors are compared to each other).

Finally, notice that we select “base.product_name” and “base.product_description” rather than just “product_name” and “product_description”. This is because the vector search returns a number of fields, and the “base.” prefix refers to the fields from the products table, rather than other data that is related to the vector search.

One last thing to talk about is using an index for performance reasons. For standard queries, BigQuery’s columnar architecture means you don’t create or manage traditional indexes yourself. However, BigQuery does support indexes for vector search, and creating one is recommended once your set of potential matches (i.e., the number of products in the table) gets very large. See this web page for more information: https://cloud.google.com/bigquery/docs/vector-index
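Creating such an index is a one-time DDL statement. A sketch for the table in this article (the index name is illustrative) might look like:

```sql
CREATE VECTOR INDEX products_embedding_idx
ON dataset.products(embeddings)
OPTIONS (index_type = 'IVF', distance_type = 'COSINE');
```

The distance_type should match the one used in your VECTOR_SEARCH() calls so that the index can actually be used at query time.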

Conclusion

When you need a RAG data source (hybrid or otherwise), BigQuery is a very effective solution. With its built-in vector search capabilities, practically unlimited scalability, and fast querying capabilities, it can be a powerful tool when building a RAG LLM solution.

Although keyword search and vector search are two distinct approaches, BigQuery can support both as part of a hybrid solution by using the techniques described in this article.

For more information about using vector search with BigQuery, please see this web page.


Written by Greg Sommerville

Google Public Cloud Solutions Architect. Enjoyer of small dogs, yoga and disc golf.
