Generative AI — PaLM-2 model deployment with Cloud Run

Rafa Sanchez
Google Cloud - Community
3 min read · May 26, 2023


This post shows a Gradio front end, deployed on Cloud Run, that exposes text-bison@001, one of the PaLM-2 foundational models available in Vertex AI, along with its main generation parameters (temperature, max output tokens, top-P, and top-K).
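To build intuition for those parameters, here is a small self-contained sketch (not the actual PaLM-2 sampler, just an illustration) of how temperature, top-K, and top-P reshape a token distribution before sampling:

```python
import math

def filter_logits(logits, temperature=1.0, top_k=40, top_p=1.0):
    """Illustrative decoding filter: temperature scaling, then top-K,
    then top-P (nucleus) truncation over a token->logit dict."""
    # Temperature: lower values sharpen the distribution (0 ~ greedy).
    scaled = {t: l / max(temperature, 1e-6) for t, l in logits.items()}
    # Softmax to probabilities (shifted by the max for numerical stability).
    m = max(scaled.values())
    exp = {t: math.exp(l - m) for t, l in scaled.items()}
    z = sum(exp.values())
    probs = {t: e / z for t, e in exp.items()}
    # Top-K: keep only the K most likely tokens.
    kept = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    # Top-P: keep the smallest prefix whose cumulative mass reaches top_p.
    out, cum = {}, 0.0
    for tok, p in kept:
        out[tok] = p
        cum += p
        if cum >= top_p:
            break
    # Renormalize the surviving tokens.
    z = sum(out.values())
    return {t: p / z for t, p in out.items()}

candidates = {"cat": 2.0, "dog": 1.5, "fish": 0.5, "rock": -1.0}
print(filter_logits(candidates, temperature=0.7, top_k=3, top_p=0.9))
```

Lower temperature, smaller top-K, or smaller top-P all concentrate probability on fewer tokens, which is why the defaults (temperature 0) give deterministic, focused output.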

The model text-bison@001 is fine-tuned for language tasks such as classification, summarization, and entity extraction.

The front end is a Gradio app deployed on Cloud Run. A screenshot of the app follows:

Fig. 1: Gradio app exposing the PaLM-2 model with its main parameters: temperature, max output tokens, top-P, and top-K

PaLM-2 in Vertex AI

text-bison@001 is one of the foundational models available in Vertex AI, based on PaLM-2, and fine-tuned for certain language tasks. Details about PaLM-2 can be found in the technical report. Using the Vertex AI SDK, you can easily call the publisher endpoints for this model:

import vertexai
from vertexai.language_models import TextGenerationModel

vertexai.init(project=PROJECT_ID, location=LOCATION)
model = TextGenerationModel.from_pretrained("text-bison@001")
response = model.predict(
    prompt,
    max_output_tokens=max_output_tokens,  # default 128
    temperature=temperature,              # default 0
    top_p=top_p,                          # default 1
    top_k=top_k,                          # default 40
)
print(response.text)
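Under the hood, this SDK call maps to a JSON `:predict` request against the model's publisher endpoint. The sketch below only builds the request body; field names follow the Vertex AI text-generation REST schema, and the prompt value is illustrative:

```python
import json

def build_predict_body(prompt, max_output_tokens=128, temperature=0.0,
                       top_p=1.0, top_k=40):
    """Build the JSON body for a text-bison@001 :predict request."""
    return {
        "instances": [{"prompt": prompt}],
        "parameters": {
            "maxOutputTokens": max_output_tokens,
            "temperature": temperature,
            "topP": top_p,
            "topK": top_k,
        },
    }

body = build_predict_body("Summarize: Cloud Run is a serverless platform.")
print(json.dumps(body, indent=2))
```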

User-managed service account for Cloud Run

Since the application is deployed in Cloud Run, by default it calls the model with the permissions of the Compute Engine default service account. It is recommended to use a dedicated service account with minimum permissions instead. To do that, create a service account, grant your user permission to impersonate it, and add two roles: roles/aiplatform.user to be able to call predictions, and roles/logging.logWriter to be able to write logs:

# Create service account
gcloud iam service-accounts create cloud-run-llm \
--description="Service account to call LLM models from Cloud Run" \
--display-name="cloud-run-llm"

# add aiplatform.user role
gcloud projects add-iam-policy-binding argolis-rafaelsanchez-ml-dev \
--member="serviceAccount:cloud-run-llm@argolis-rafaelsanchez-ml-dev.iam.gserviceaccount.com" \
--role="roles/aiplatform.user"

# add logging.logWriter role
gcloud projects add-iam-policy-binding argolis-rafaelsanchez-ml-dev \
--member="serviceAccount:cloud-run-llm@argolis-rafaelsanchez-ml-dev.iam.gserviceaccount.com" \
--role="roles/logging.logWriter"

# add permission to impersonate the sa (iam.serviceAccounts.actAs), since this is a user-managed sa
gcloud iam service-accounts add-iam-policy-binding \
cloud-run-llm@argolis-rafaelsanchez-ml-dev.iam.gserviceaccount.com \
--member="user:<REPLACE_WITH_YOUR_USER_ACCOUNT>" \
--role="roles/iam.serviceAccountUser"

Build and deploy in Cloud Run

To deploy the Gradio app on Cloud Run, you build the container image, push it to Artifact Registry, and then deploy it to Cloud Run.
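The repository contains the actual Dockerfile; a minimal one for a Gradio app could look like the following sketch (the file names and the contents of requirements.txt are assumptions):

```dockerfile
FROM python:3.11-slim
WORKDIR /app
# Assumed requirements.txt: gradio, google-cloud-aiplatform
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY app.py .
# Gradio listens on 7860, matching the --port flag used at deploy time
EXPOSE 7860
CMD ["python", "app.py"]
```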

Note the --allow-unauthenticated parameter (no authentication is required to access the app) and the --service-account parameter pointing to the service account configured earlier:

gcloud auth configure-docker europe-west4-docker.pkg.dev
gcloud builds submit --tag europe-west4-docker.pkg.dev/argolis-rafaelsanchez-ml-dev/ml-pipelines-repo/genai-text-demo
gcloud run deploy genai-text-demo \
  --port 7860 \
  --image europe-west4-docker.pkg.dev/argolis-rafaelsanchez-ml-dev/ml-pipelines-repo/genai-text-demo \
  --service-account=cloud-run-llm@argolis-rafaelsanchez-ml-dev.iam.gserviceaccount.com \
  --allow-unauthenticated \
  --region=europe-west4 \
  --platform=managed \
  --project=argolis-rafaelsanchez-ml-dev

Conclusions

This post showed how to deploy a simple Gradio app on Cloud Run that exposes a PaLM-2 model for text generation.

text-bison@001 use cases include dialog summarization, text generation, scoring for marketing, and many others.

You can find the repo with all the code in this link.

References

[1] PaLM-2 technical report
[2] YouTube video: Generative AI on Google Cloud
[3] YouTube video: Build, tune, and deploy foundation models with Vertex AI
[4] YouTube video: Build, tune, and deploy foundation models with Generative AI Support in Vertex AI
[5] Overview of Generative AI support on Vertex AI.


I'm Rafa, a Machine Learning specialist working @GoogleCloud. Ph.D. and Lecturer at @uc3m University on IoT and on-device ML.