Cloud Run Service with a Python module, FastAPI and Uvicorn

Mazlum Tosun
Google Cloud - Community
10 min read · May 26, 2023


1. Explanation of the use case presented in this article

The goal of this article is to show a complete use case: a Cloud Run service written as a Python module spread across multiple files, served with Uvicorn and FastAPI.

All the examples from the official documentation show a Python service written in a single file and deployed on Cloud Run with Flask and Gunicorn.

In real life, some business logic needs to be split across multiple files for better readability, clean code and separation of concerns.

These reasons led us to write this article and present this complete use case to help the Google Cloud community.

Here you can see the diagram of this use case:

  • The Cloud Run service reads an input JSON file from Cloud Storage
  • Applies business rules in multiple Python files within a module
  • Writes the result to a BigQuery table
  • The Cloud Run service is deployed with Cloud Build, via a manual trigger based on the project from a GitHub repository
  • The Docker image used by Cloud Run is published to Artifact Registry

An example of raw data in JSON format:

{
  "teamName": "PSG",
  "teamScore": 30,
  "scorers": [
    {
      "scorerFirstName": "Kylian",
      "scorerLastName": "Mbappe",
      "goals": 15,
      "goalAssists": 6,
      "games": 13
    },
    {
      "scorerFirstName": "Da Silva",
      "scorerLastName": "Neymar",
      "goals": 11,
      "goalAssists": 7,
      "games": 12
    },
    {
      "scorerFirstName": "Angel",
      "scorerLastName": "Di Maria",
      "goals": 7,
      "goalAssists": 8,
      "games": 13
    },
    {
      "scorerFirstName": "Lionel",
      "scorerLastName": "Messi",
      "goals": 12,
      "goalAssists": 8,
      "games": 13
    },
    {
      "scorerFirstName": "Marco",
      "scorerLastName": "Verrati",
      "goals": 3,
      "goalAssists": 10,
      "games": 13
    }
  ]
}

The corresponding computed domain data:

{
  "teamName": "PSG",
  "teamScore": 30,
  "teamTotalGoals": 48,
  "teamSlogan": "Paris est magique",
  "topScorerStats": {
    "firstName": "Kylian",
    "lastName": "Mbappe",
    "goals": 15,
    "games": 13
  },
  "bestPasserStats": {
    "firstName": "Marco",
    "lastName": "Verrati",
    "goalAssists": 10,
    "games": 13
  }
}

The goal is to compute the following (a short sketch follows the list):

  • The total goals per team
  • The top scorer node
  • The best passer node
  • The slogan per team
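
To make these rules concrete, here is a minimal, illustrative sketch of the computation; the real logic lives in the TeamStats domain class used in main.py, and the field names follow the JSON examples above:

from typing import Dict, List


def compute_team_stats(team: Dict, team_slogans: Dict[str, str]) -> Dict:
    """Illustrative only: compute the domain data from one raw team dict."""
    scorers: List[Dict] = team["scorers"]

    # Top scorer: the player with the most goals; best passer: the most assists.
    top_scorer = max(scorers, key=lambda s: s["goals"])
    best_passer = max(scorers, key=lambda s: s["goalAssists"])

    return {
        "teamName": team["teamName"],
        "teamScore": team["teamScore"],
        "teamTotalGoals": sum(s["goals"] for s in scorers),
        "teamSlogan": team_slogans.get(team["teamName"], ""),
        "topScorerStats": {
            "firstName": top_scorer["scorerFirstName"],
            "lastName": top_scorer["scorerLastName"],
            "goals": top_scorer["goals"],
            "games": top_scorer["games"],
        },
        "bestPasserStats": {
            "firstName": best_passer["scorerFirstName"],
            "lastName": best_passer["scorerLastName"],
            "goalAssists": best_passer["goalAssists"],
            "games": best_passer["games"],
        },
    }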

I also created a video on this topic on my GCP YouTube channel; please subscribe to the channel to support my work for the Google Cloud community:

English version

French version

2. Structure of the project

2.1 Python local environment

The Python local environment uses Pipenv as the package manager and to automate the creation of the virtual environment.

You can check this video from my GCP YouTube channel, which shows:

  • How to get a comfortable local Python environment with pyenv, Pipenv, direnv and IntelliJ IDEA, and how to navigate all the files, classes and methods
  • How to automate the creation of the virtual environment for our Python project

The Pipfile contains the following packages:

[[source]]
url = "https://pypi.org/simple"
verify_ssl = true
name = "pypi"

[dev-packages]
pytest = "==6.1.2"

[packages]
google-cloud-storage = "==2.9.0"
google-cloud-bigquery = "==3.5.0"
fastapi = "==0.95.2"
uvicorn = "==0.22.0"
dacite = "==1.6.0"
toolz = "==0.12.0"

[requires]
python_version = "3.8"

  • The Cloud Storage Python client to read the input raw file from Cloud Storage
  • The BigQuery Python client to write the resulting domain data to a BigQuery table
  • The FastAPI package to serve the Python service
  • Uvicorn, an ASGI web server implementation for Python; FastAPI is an ASGI framework and is compatible with Uvicorn
  • The other packages are used in the Cloud Run service logic
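
Note that the Dockerfile shown later installs the dependencies from a requirements.txt file; assuming a recent Pipenv version, that file can be generated from the Pipfile with:

# Export the Pipfile dependencies to the requirements.txt used by Docker
# (assumes Pipenv >= 2022.4; older versions use `pipenv lock -r`).
pipenv requirements > team_league/service/requirements.txt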

2.2 The Cloud Run service code logic

2.2.1 Python code structure

We have a Python root folder called team_league.

This root has a service folder containing all the files and logic used by the Cloud Run service.

For this example we chose to put the Dockerfile inside the service folder instead of at the root of the project.

We prefer this approach because if we had several Cloud Run services, we would create a directory per service, each with its Dockerfile inside.
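
Putting it together, the service layout looks roughly like this (an illustrative sketch, omitting __init__.py files and the Cloud Build files at the root):

team_league/
└── service/
    ├── Dockerfile
    ├── requirements.txt
    ├── main.py
    ├── domain/
    │   ├── team_stats.py
    │   └── team_stats_raw.py
    └── schema/
        └── team_stats.json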

The main.py file:

import dataclasses
import json
import os
import pathlib
from datetime import datetime
from typing import Dict, List

import uvicorn
from fastapi import FastAPI
from google.cloud import bigquery
from google.cloud import storage
from pydantic import BaseModel
from toolz.curried import pipe, map

from team_league.service.domain.team_stats import TeamStats
from team_league.service.domain.team_stats_raw import TeamStatsRaw

app = FastAPI()

current_iso_datetime = datetime.now().isoformat()


def add_ingestion_date_to_team_stats(team_stats_domain: Dict) -> Dict:
    team_stats_domain.update({'ingestionDate': current_iso_datetime})

    return team_stats_domain


def deserialize(team_stats_raw_as_dict: Dict) -> TeamStatsRaw:
    from dacite import from_dict
    return from_dict(
        data_class=TeamStatsRaw,
        data=team_stats_raw_as_dict
    )


class Request(BaseModel):
    team_slogans: Dict


class Response(BaseModel):
    message: str


@app.post('/teams/statistics')
async def teams_league_service(request: Request):
    project_id = os.environ.get('PROJECT_ID', 'PROJECT_ID env var is not set.')
    output_dataset = os.environ.get('OUTPUT_DATASET', 'OUTPUT_DATASET env var is not set.')
    output_table = os.environ.get('OUTPUT_TABLE', 'OUTPUT_TABLE env var is not set.')
    input_bucket = os.environ.get('INPUT_BUCKET', 'INPUT_BUCKET env var is not set.')
    input_object = os.environ.get('INPUT_OBJECT', 'INPUT_OBJECT env var is not set.')

    table_id = f'{project_id}.{output_dataset}.{output_table}'

    bigquery_client = bigquery.Client(project=project_id)
    storage_client = storage.Client(project=project_id)

    # Read the raw newline-delimited JSON file from Cloud Storage.
    bucket = storage_client.get_bucket(input_bucket)
    blob = bucket.get_blob(input_object)
    team_stats_raw_list_as_bytes = blob.download_as_bytes()

    # Transform each raw line into a domain dict: deserialize, compute the
    # statistics, add the team slogan and the ingestion date.
    team_stats_domains: List[Dict] = list(pipe(
        team_stats_raw_list_as_bytes.strip().split(b'\n'),
        map(lambda team_stats_bytes: json.loads(team_stats_bytes.decode('utf-8'))),
        map(deserialize),
        map(TeamStats.compute_team_stats),
        map(lambda team_stats: team_stats.add_slogan_to_stats(request.team_slogans)),
        map(dataclasses.asdict),
        map(add_ingestion_date_to_team_stats)
    ))

    current_directory = pathlib.Path(__file__).parent
    schema_path = str(current_directory / "schema/team_stats.json")

    schema = bigquery_client.schema_from_json(schema_path)

    job_config = bigquery.LoadJobConfig(
        create_disposition=bigquery.CreateDisposition.CREATE_NEVER,
        write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
        schema=schema,
        source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON
    )

    # Append the domain data to the output BigQuery table.
    load_job = bigquery_client.load_table_from_json(
        json_rows=team_stats_domains,
        destination=table_id,
        job_config=job_config
    )

    load_job.result()

    print("#######The GCS Raw file was correctly loaded to the BigQuery table#######")

    return Response(message="Load Team Domain Data to BigQuery")


if __name__ == "__main__":
    uvicorn.run(app, host='0.0.0.0', port=int(os.environ.get('PORT', 8080)))

The FastAPI application is instantiated with app = FastAPI().

The POST request and endpoint:

@app.post('/teams/statistics')
async def teams_league_service(request: Request):

The endpoint is exposed as an HTTP POST request because we create a new resource, and it takes a required request body parameter. The API creates the statistics for all teams.

We pass a Pydantic request object containing a Dict with the association team name -> team slogan.

The slogan field will be associated with the computed statistics per team. In this example, we chose not to pass the entire object to create, because most of the data is retrieved from the input Cloud Storage file (raw data).

Example of a request body with team slogans:

{
  "team_slogans": {
    "PSG": "Paris est magique",
    "Real": "Hala Madrid"
  }
}

Environment variables for the service:

Some elements are passed to the service through environment variables:

project_id = os.environ.get('PROJECT_ID', 'PROJECT_ID env var is not set.')
output_dataset = os.environ.get('OUTPUT_DATASET', 'OUTPUT_DATASET env var is not set.')
output_table = os.environ.get('OUTPUT_TABLE', 'OUTPUT_TABLE env var is not set.')
input_bucket = os.environ.get('INPUT_BUCKET', 'INPUT_BUCKET env var is not set.')
input_object = os.environ.get('INPUT_OBJECT', 'INPUT_OBJECT env var is not set.')

In the Python 3.7 runtime, we could retrieve the project ID with a predefined environment variable, but it’s not possible with newer runtimes.

You can check this link for more details.

We can also execute an API call to get the current project ID:

http://metadata.google.internal/computeMetadata/v1/project/project-id
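
For reference, a minimal sketch of that call in Python; the metadata server requires the Metadata-Flavor header and is only reachable from inside Google Cloud:

import urllib.request

# Ask the metadata server for the current project ID.
METADATA_URL = 'http://metadata.google.internal/computeMetadata/v1/project/project-id'

metadata_request = urllib.request.Request(METADATA_URL, headers={'Metadata-Flavor': 'Google'})
with urllib.request.urlopen(metadata_request) as response:
    project_id = response.read().decode('utf-8')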

In this case, we chose to pass the project ID as an environment variable for simplicity and to avoid an additional API call.

ASGI web server with Uvicorn:

The FastAPI application is run with Uvicorn:

import uvicorn

if __name__ == "__main__":
    uvicorn.run(app, host='0.0.0.0', port=int(os.environ.get('PORT', 8080)))

The service and its POST request return a Pydantic object. Behind the scenes, FastAPI uses Pydantic, which brings advantages like field and type validation. Usually the created object is returned, but for simplicity we return a Response object.

return Response(message="Load Team Domain Data to BigQuery")
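
As an illustration of that validation, if the request body omits the required team_slogans field, FastAPI rejects the call with a 422 response along these lines (the exact wording depends on the FastAPI and Pydantic versions):

{
  "detail": [
    {
      "loc": ["body", "team_slogans"],
      "msg": "field required",
      "type": "value_error.missing"
    }
  ]
}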

The rest of the code logic:

  • Reads the raw file and data from Cloud Storage
  • Applies the business rules and transforms the raw data into domain data
  • Writes the resulting domain data to a BigQuery output table, using the JSON schema sketched below
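
The load job reads the table schema from schema/team_stats.json. That file is not reproduced here; based on the domain data shown above, a plausible sketch of it (the exact types and modes are assumptions) would be:

[
  {"name": "teamName", "type": "STRING", "mode": "REQUIRED"},
  {"name": "teamScore", "type": "INTEGER", "mode": "NULLABLE"},
  {"name": "teamTotalGoals", "type": "INTEGER", "mode": "NULLABLE"},
  {"name": "teamSlogan", "type": "STRING", "mode": "NULLABLE"},
  {"name": "topScorerStats", "type": "RECORD", "mode": "NULLABLE", "fields": [
    {"name": "firstName", "type": "STRING", "mode": "NULLABLE"},
    {"name": "lastName", "type": "STRING", "mode": "NULLABLE"},
    {"name": "goals", "type": "INTEGER", "mode": "NULLABLE"},
    {"name": "games", "type": "INTEGER", "mode": "NULLABLE"}
  ]},
  {"name": "bestPasserStats", "type": "RECORD", "mode": "NULLABLE", "fields": [
    {"name": "firstName", "type": "STRING", "mode": "NULLABLE"},
    {"name": "lastName", "type": "STRING", "mode": "NULLABLE"},
    {"name": "goalAssists", "type": "INTEGER", "mode": "NULLABLE"},
    {"name": "games", "type": "INTEGER", "mode": "NULLABLE"}
  ]},
  {"name": "ingestionDate", "type": "TIMESTAMP", "mode": "NULLABLE"}
]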

It’s worth noting that all the internal imports in the Python code are done from the module root folder team_league, for example:

from team_league.service.domain.team_stats import TeamStats
from team_league.service.domain.team_stats_raw import TeamStatsRaw
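
A consequence is that the service must be launched from the project root (the folder containing team_league) so that these imports resolve, for example:

# Run the service locally from the project root (illustrative;
# assumes the dependencies from the Pipfile are installed).
python -m uvicorn team_league.service.main:app --port 8080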

2.2.2 Cloud Build part and working directory

We want to keep the Docker working directory at the root of the project.

That is not natively possible with gcloud builds submit:

gcloud builds submit --tag "$LOCATION-docker.pkg.dev/$PROJECT_ID/$REPO_NAME/$SERVICE_NAME:IMAGE_TAG" ./team_league/service

In this case, we can pass the folder containing the Dockerfile, but the working directory in Docker would then be the service folder.

This is not the desired behavior: in the Docker COPY or ADD commands, we want the working directory to be the root of the project, in order to copy elements into the container more easily.

There is a solution to get this behaviour, with a Cloud Build YAML file using Docker commands instead of gcloud builds submit:

steps:
  - name: 'gcr.io/cloud-builders/docker'
    script: |
      docker build -f team_league/service/Dockerfile -t $SERVICE_NAME .
      docker tag $SERVICE_NAME $LOCATION-docker.pkg.dev/$PROJECT_ID/$REPO_NAME/$SERVICE_NAME:$IMAGE_TAG
      docker push $LOCATION-docker.pkg.dev/$PROJECT_ID/$REPO_NAME/$SERVICE_NAME:$IMAGE_TAG
    env:
      - 'PROJECT_ID=$PROJECT_ID'
      - 'LOCATION=$LOCATION'
      - 'REPO_NAME=$_REPO_NAME'
      - 'SERVICE_NAME=$_SERVICE_NAME'
      - 'IMAGE_TAG=$_IMAGE_TAG'
  - name: google/cloud-sdk:429.0.0
    args: [ './scripts/deploy_cloud_run_service.sh' ]
    env:
      - 'PROJECT_ID=$PROJECT_ID'
      - 'LOCATION=$LOCATION'
      - 'REPO_NAME=$_REPO_NAME'
      - 'SERVICE_NAME=$_SERVICE_NAME'
      - 'IMAGE_TAG=$_IMAGE_TAG'
      - 'OUTPUT_DATASET=$_OUTPUT_DATASET'
      - 'OUTPUT_TABLE=$_OUTPUT_TABLE'
      - 'INPUT_BUCKET=$_INPUT_BUCKET'
      - 'INPUT_OBJECT=$_INPUT_OBJECT'

The Docker commands can be used from the gcr.io/cloud-builders/docker image.

The following command allows us to specify the Dockerfile location while keeping the working directory at the root of the project:

docker build -f team_league/service/Dockerfile -t $SERVICE_NAME .

  • -f specifies the Dockerfile location
  • . sets the build context, the working directory, to the root of the project

We can then tag and publish the image to Artifact Registry:

docker tag $SERVICE_NAME $LOCATION-docker.pkg.dev/$PROJECT_ID/$REPO_NAME/$SERVICE_NAME:$IMAGE_TAG
docker push $LOCATION-docker.pkg.dev/$PROJECT_ID/$REPO_NAME/$SERVICE_NAME:$IMAGE_TAG

The second Cloud Build step launches a gcloud run deploy command, via a shell script, to deploy the Cloud Run service based on the image built previously:

gcloud run deploy "$SERVICE_NAME" \
--image "$LOCATION-docker.pkg.dev/$PROJECT_ID/$REPO_NAME/$SERVICE_NAME:$IMAGE_TAG" \
--region="$LOCATION" \
--allow-unauthenticated \
--set-env-vars PROJECT_ID="$PROJECT_ID" \
--set-env-vars OUTPUT_DATASET="$OUTPUT_DATASET" \
--set-env-vars OUTPUT_TABLE="$OUTPUT_TABLE" \
--set-env-vars INPUT_BUCKET="$INPUT_BUCKET" \
--set-env-vars INPUT_OBJECT="$INPUT_OBJECT"

2.2.3 The Dockerfile

FROM python:3.10-slim

ENV PYTHONUNBUFFERED True

COPY team_league/service/requirements.txt ./

RUN pip install -r requirements.txt

ENV APP_HOME /app
WORKDIR $APP_HOME
COPY team_league $APP_HOME/team_league

CMD ["uvicorn", "team_league.service.main:app", "--host", "0.0.0.0", "--port", "8080"]
  • The current image is built from python:3.10-slim image
  • We copy the requirements.txt file containing the needed Python package for our Cloud Run service and install them in the container
  • We copy the team_league Python module root folder to the container (in app)

We use the Uvicorn Python web server to expose our Cloud Run service:

CMD ["uvicorn", "team_league.service.main:app", "--host", "0.0.0.0", "--port", "8080"]

We have a Python module, and the convention is python_root.subfolder.main:app

  • We can navigate subfolders with .
  • main is the name of the Python entrypoint and the main file
  • app is the variable name used in the main.py file for the FastAPI object

app = FastAPI()

...

if __name__ == "__main__":
    uvicorn.run(app, host='0.0.0.0', port=int(os.environ.get('PORT', 8080)))

Uvicorn runs a single process, and can also be used with Gunicorn if you want several replicated processes to take advantage of multiple cores and handle more requests. Check this article for more details.
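
For reference, a common pattern (not used in this article) is to let Gunicorn manage several Uvicorn worker processes; a sketch of such a command:

# Gunicorn managing 4 Uvicorn worker processes (illustrative)
gunicorn -w 4 -k uvicorn.workers.UvicornWorker \
  --bind 0.0.0.0:8080 team_league.service.main:app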

Containers have their own isolated running processes, so in this context, and in a distributed container management system like Kubernetes, it's better to use Uvicorn directly. Check this article for more details.

3. Deployment of Cloud Run service

The deployment of the service is done with Cloud Build.

Set the following environment variables:

export PROJECT_ID={{PROJECT_ID}}
export LOCATION={{LOCATION}}
export SERVICE_NAME=load-and-transform-team-stats-to-bq-service
export REPO_NAME={{REPO_NAME}}
export IMAGE_TAG="latest"
export OUTPUT_DATASET={{OUTPUT_DATASET}}
export OUTPUT_TABLE="team_stat"
export INPUT_BUCKET={{INPUT_BUCKET}}
export INPUT_OBJECT="{{FOLDER}}/input_teams_stats_raw.json"

The deployment can be done with an execution from the local machine:

gcloud builds submit \
--project=$PROJECT_ID \
--region=$LOCATION \
--config deploy-cloud-run-service.yaml \
--substitutions _REPO_NAME="$REPO_NAME",_SERVICE_NAME="$SERVICE_NAME",_IMAGE_TAG="$IMAGE_TAG",_OUTPUT_DATASET="$OUTPUT_DATASET",_OUTPUT_TABLE="$OUTPUT_TABLE",_INPUT_BUCKET="$INPUT_BUCKET",_INPUT_OBJECT="$INPUT_OBJECT" \
--verbosity="debug" .

Or with a Cloud Build manual trigger:

gcloud beta builds triggers create manual \
--project=$PROJECT_ID \
--region=$LOCATION \
--name="deploy-cloud-run-service-team-league" \
--repo="https://github.com/tosun-si/teams-league-cloudrun-service-fastapi" \
--repo-type="GITHUB" \
--branch="main" \
--build-config="deploy-cloud-run-service.yaml" \
--substitutions _REPO_NAME="$REPO_NAME",_SERVICE_NAME="$SERVICE_NAME",_IMAGE_TAG="$IMAGE_TAG",_OUTPUT_DATASET="$OUTPUT_DATASET",_OUTPUT_TABLE="$OUTPUT_TABLE",_INPUT_BUCKET="$INPUT_BUCKET",_INPUT_OBJECT="$INPUT_OBJECT" \
--verbosity="debug"

As a reminder, the deploy command is:

gcloud run deploy "$SERVICE_NAME" \
--image "$LOCATION-docker.pkg.dev/$PROJECT_ID/$REPO_NAME/$SERVICE_NAME:$IMAGE_TAG" \
--region="$LOCATION" \
--allow-unauthenticated \
--set-env-vars PROJECT_ID="$PROJECT_ID" \
--set-env-vars OUTPUT_DATASET="$OUTPUT_DATASET" \
--set-env-vars OUTPUT_TABLE="$OUTPUT_TABLE" \
--set-env-vars INPUT_BUCKET="$INPUT_BUCKET" \
--set-env-vars INPUT_OBJECT="$INPUT_OBJECT"

The Cloud Build trigger:

  • Builds and publishes the Docker image to Artifact Registry
  • Deploys the Cloud Run service based on the previous image
  • The service is public and exposed without authentication

The service is correctly installed and visible in the Cloud Run page of the Google Cloud console:

The detail of the service gives the URL to execute it:

Open the documentation page generated by FastAPI:

Enter the service URL followed by /docs:

To execute the service, click on the Try it out button and execute with the request body:
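
Alternatively, the endpoint can be called from the command line, for example with curl, assuming the SERVICE_URL variable holds the URL shown in the console:

curl -X POST "$SERVICE_URL/teams/statistics" \
  -H "Content-Type: application/json" \
  -d '{"team_slogans": {"PSG": "Paris est magique", "Real": "Hala Madrid"}}'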

The result is:

Conclusion

This article showed a complete use case with a Cloud Run service based on a Python module and multiple files.

For a Python module, there is no example in the official Google Cloud documentation. Moreover, after checking some links, it seems harder to deal with Python modules with Gunicorn and Flask in a container.

FastAPI is a good alternative: it proposes more functionality than Flask, and it is compatible with ASGI web servers like Uvicorn.

FastAPI and Uvicorn are good candidates for Python APIs and Cloud Run services, and they make it easier to work with Python modules.

In real life, applications are split across multiple files to follow the Single Responsibility Principle and separation of concerns. We wanted to present this complete, real-world use case to help the Google Cloud community.

All the code shared in this article is accessible from my GitHub repository:

If you like my articles and videos and want to see my posts, follow me on:
