Build richer gen AI experiences using model endpoint management
Tabatha Lewis-Simo
Product Manager
Bala Narasimhan
Group Product Manager, Google Cloud
Model endpoint management is available on AlloyDB, AlloyDB Omni and Cloud SQL for PostgreSQL.
Model endpoint management helps developers build new experiences using SQL and provides a flexible interface to call gen AI models running anywhere, right from the database. You can generate embeddings inside the database, perform quality control on your vector search, and analyze sentiment in the database, making it easier to monitor results. This feature is available through the google_ml_integration extension, which enables an integration with Vertex AI for both AlloyDB and Cloud SQL for PostgreSQL.
Previously, the google_ml_integration extension only allowed users to call models hosted on the Vertex AI platform. With model endpoint management, you can leverage models running on any platform, including your own local environment. We also added ease-of-use support for models running on OpenAI, Hugging Face, and Anthropic, as well as for Vertex AI's latest embedding models. We have preconfigured the connectivity details and input/output transformation functions for these providers, so you only need to register the model and set up the authentication details.
For Vertex AI models, we have pre-registered embedding and Gemini models so that you can start calling them right away. Newer embedding models also have built-in support, meaning you can access the latest versions of pre-registered models and start making prediction calls out of the box.
In this blog, we’ll walk you through three example workflows that leverage model endpoint management to build richer generative AI experiences.
- Generating embeddings with OpenAI embedding models
- Leveraging Gemini to evaluate vector search results
- Running sentiment analysis to analyze user sentiment
First, register your model.
To use your own model, register it with the create_model function, where you specify the model endpoint's connectivity details. You can then configure a set of optional parameters that transform the model's input and output into a format suitable for your database. Here's an example of registering Anthropic's Claude model.
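A minimal sketch of such a registration: the model_id and secret ID are names of your choosing, and this assumes an Anthropic API key has already been stored as a registered secret. Check the create_model reference for the full parameter list.

```sql
-- Register an Anthropic Claude endpoint with model endpoint management.
-- 'claude-haiku' and 'anthropic_secret' are hypothetical names.
CALL google_ml.create_model(
  model_id => 'claude-haiku',                                   -- name you will use in SQL calls
  model_provider => 'anthropic',                                -- preconfigured provider support
  model_request_url => 'https://api.anthropic.com/v1/messages',
  model_auth_type => 'secret_manager',
  model_auth_id => 'anthropic_secret');                         -- previously registered secret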


Once you register your model, you can call it with the predict_row function, which works for any AI model, or use the embedding convenience function to call an embedding model.
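For example, the two calls might look like the following sketch; the model IDs are the hypothetical names from above, and the request body shape follows the provider's API, so adjust it to the model you registered.

```sql
-- Generic models: pass a JSON request body shaped for the provider's API,
-- and get the provider's JSON response back.
SELECT google_ml.predict_row(
  model_id => 'claude-haiku',
  request_body => '{
    "model": "claude-3-haiku-20240307",
    "max_tokens": 256,
    "messages": [{"role": "user", "content": "Hello"}]}');

-- Embedding models: the convenience function returns the embedding directly.
SELECT google_ml.embedding(
  model_id => 'openai-ada',
  content => 'Hello');
```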


#1: Generate embeddings with OpenAI embedding models
Model endpoint management allows you to use the embedding convenience function with any embedding model, even ones that don't run on Google Cloud. Say you want to generate embeddings with OpenAI's ada embedding model. With our ease-of-use support, you only need to register your authentication credentials, register the model, and start generating embeddings. First, configure authentication for the endpoint you want to reach: either create a PostgreSQL function that specifies your API key in the header of the API call, or create a secret with Secret Manager and then register the secret with model endpoint management.
To register your secret, you simply need to specify the secret path and create an ID for the secret. You can find the secret path in Secret Manager by clicking on the secret, and then clicking "copy resource name" on the specific version of the secret you want to use.
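A sketch of the secret registration; PROJECT_ID, SECRET_NAME, and the version number are placeholders for your own values.

```sql
-- Register a Secret Manager secret so models can authenticate with it.
CALL google_ml.create_sm_secret(
  secret_id => 'open_ai_secret',   -- the ID you choose for this secret
  secret_path => 'projects/PROJECT_ID/secrets/SECRET_NAME/versions/1');
```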
Once your secret has been registered, you can register your model and point to the secret, open_ai_secret, when you register the openai-ada model. Our ease-of-use support handles the input and output formatting so that you can generate embeddings from data in your database and directly use the output embedding for vector search.
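That registration might look like the following sketch; the model_id is a name you choose, and the qualified name follows OpenAI's naming for the ada model.

```sql
-- Register OpenAI's ada embedding model, authenticating with the
-- 'open_ai_secret' secret registered earlier.
CALL google_ml.create_model(
  model_id => 'openai-ada',                          -- name you will use in SQL calls
  model_provider => 'open_ai',                       -- preconfigured OpenAI support
  model_type => 'text_embedding',
  model_qualified_name => 'text-embedding-ada-002',  -- OpenAI's name for the model
  model_auth_type => 'secret_manager',
  model_auth_id => 'open_ai_secret');                -- secret registered earlier
```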
You then need only specify the name of the registered model in the first argument and the text in the second argument. For instance, to generate an embedding for the phrase "I love Google Databases", you would invoke the embedding function like so:
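Using the hypothetical 'openai-ada' registration from above, the call would be:

```sql
SELECT google_ml.embedding(
  model_id => 'openai-ada',
  content => 'I love Google Databases');
```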
If you want to generate an embedding in-line while performing a vector search, combine the embedding function with vector search in SQL using the following syntax:
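Combining the two might look like the following sketch, assuming a hypothetical table items with a pgvector column named embedding:

```sql
-- Find the ten rows whose stored embeddings are closest (by cosine distance)
-- to the embedding of the search phrase, generated in-line.
SELECT description
FROM items
ORDER BY items.embedding <=> google_ml.embedding('openai-ada', 'winter jacket')::vector
LIMIT 10;
```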
Model endpoint management also has built-in integrations with Vertex AI's latest embedding models, allowing you to access any of Vertex AI's supported text embedding models. We recommend the embedding() function for in-line SQL queries or for generating stored embeddings on datasets smaller than 100,000 rows.
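Because Vertex AI embedding models come pre-registered, no setup is needed before calling them. A sketch, where the model name is an assumption; substitute any supported Vertex AI text embedding model:

```sql
-- Call a pre-registered Vertex AI text embedding model directly.
SELECT embedding('text-embedding-005', 'I love Google Databases');
```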
#2: Leverage Gemini to evaluate vector search results
In addition to a deep integration with embedding models, model endpoint management provides developers out-of-the-box support for the latest Gemini models. Gemini Pro and Gemini Flash Lite are both available as pre-registered models in AlloyDB and Cloud SQL for PostgreSQL. Leveraging Gemini, you can generate content, perform sentiment analysis, or analyze the quality of vector search results. Let's see how you might analyze the quality of your vector search results with Gemini using the predict_row function.
Suppose you have a table apparels with id, product_description, and embedding columns. We can use model endpoint management to call Gemini to validate the vector search results by comparing a user's search query against the product descriptions. This lets us see discrepancies between the user's query and the products returned by the vector search.
We can pass the vector search results to Gemini to qualitatively evaluate how well the user's query matches the descriptions, and note differences in natural language. This lets you build quality control into your vector search use case so that your vector search application improves over time. For the full end-to-end use case, follow this codelab.
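A sketch of this pattern follows. The Gemini model_id shown is an assumption (query google_ml.model_info_view for the exact pre-registered IDs on your instance), and 'openai-ada' is the hypothetical embedding model from earlier.

```sql
-- For the top vector search matches, ask Gemini whether each product
-- description actually fits the user's query.
SELECT
  product_description,
  google_ml.predict_row(
    model_id => 'gemini-1.5-pro:generateContent',
    request_body => json_build_object(
      'contents', json_build_array(json_build_object(
        'role', 'user',
        'parts', json_build_array(json_build_object(
          'text', 'Does this product description match the query "summer dress"? '
                  'Answer yes or no, with a one-sentence reason: '
                  || product_description)))))) AS gemini_validation
FROM (
  SELECT product_description
  FROM apparels
  ORDER BY embedding <=> google_ml.embedding('openai-ada', 'summer dress')::vector
  LIMIT 5
) AS top_matches;
```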
#3: Run sentiment analysis to analyze user sentiment
One of the benefits of calling Gemini in the database is its versatility. Above, we showed how you can use it to check the quality of your vector search. Now, let’s take a look at how you might use it to analyze the sentiment of users.
Say you are an e-commerce company and you want to perform sentiment analysis on user review information stored in the database. You have a table products that stores the name of each product and its description. You have another table, product_reviews, which stores user reviews of those products, joined on the product id. You just added headphones to your online offering and want to see how well they are doing in terms of customer sentiment. You can use Gemini through model endpoint management to classify the sentiment as positive or negative in the database and view the results as a separate column.
First, create a wrapper function in SQL that uses the predict_row function to send Gemini a prompt along with the text whose sentiment you want to analyze.
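A sketch of such a wrapper, assuming the pre-registered Gemini model ID 'gemini-1.5-pro:generateContent' and the usual Gemini response shape; both are worth verifying against your instance.

```sql
CREATE OR REPLACE FUNCTION get_sentiment(review TEXT)
RETURNS TEXT AS $$
DECLARE
  response JSON;
BEGIN
  -- Send the review to Gemini with a fixed classification prompt.
  SELECT google_ml.predict_row(
    model_id => 'gemini-1.5-pro:generateContent',
    request_body => json_build_object(
      'contents', json_build_array(json_build_object(
        'role', 'user',
        'parts', json_build_array(json_build_object(
          'text', 'Classify the sentiment of this review as positive, negative '
                  'or neutral. Reply with one word only: ' || review))))))
  INTO response;
  -- Extract the answer text; the JSON path may differ across model versions.
  RETURN trim(response -> 'candidates' -> 0 -> 'content' -> 'parts' -> 0 ->> 'text');
END;
$$ LANGUAGE plpgsql;
```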
Now let's say you want to analyze the sentiment of a single review. You could do it like so:
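With the wrapper in place, a single call suffices; the review text here is made up for illustration:

```sql
SELECT get_sentiment('These headphones sound amazing and the battery lasts all day.');
```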
You can then generate predictions on only the reviews containing the word "headphones" by using a LIKE clause and calling your get_sentiment function:
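Assuming product_reviews stores the review text in a column named review, the query might look like:

```sql
-- Classify sentiment only for reviews that mention headphones,
-- surfacing the result as a separate column.
SELECT review, get_sentiment(review) AS sentiment
FROM product_reviews
WHERE review LIKE '%headphones%';
```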
This outputs whether each review was positive, negative, or neutral, letting you see what the user sentiment is around this new product. Later, you can use aggregate functions to see whether the majority of the sentiment is positive or negative.
Get started
Model endpoint management is now available in AlloyDB, AlloyDB Omni and Cloud SQL for PostgreSQL. To get started with it, follow our documentation on AlloyDB and Cloud SQL for PostgreSQL.