Google Cloud Data Fusion — Enterprise Configuration Part 2

Ryan Haney
2 min read · Dec 31, 2022

In Part 1, I outlined how to set up a namespace and RBAC to use Data Fusion as a secure, multi-tenant ETL development platform. In this continuation, I'll review how to create configuration objects (like a database connection) scoped to a namespace, and how to run the pipelines from each namespace in a corresponding dedicated project with a unique service account. This lets us separate permissions by namespace (really, by service account) and isolate billing for the actual ETL pipeline runs to a particular project, making FinOps-style cost management and chargebacks easier.

In this example, there are three projects we’re concerned with. They are:

  1. the Shared Analytics project
  2. the shared VPC host project
  3. the data domain project (continuing our configuration from part 1, this would be the dedicated project for HR).

Our first step is to go to the Service Account section of the data domain project, and create a service account with the Dataproc Worker role assigned. We’ll name this service account svc-hr-dataproc-dev.

Create the service account and assign the Dataproc Worker role
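The same step can be scripted with the gcloud CLI. This is a sketch only; the project ID `hr-analytics-dev` is a hypothetical stand-in for your data domain project:

```shell
# Hypothetical project ID for the HR data domain project.
PROJECT=hr-analytics-dev

# Create the Dataproc service account in the data domain project.
gcloud iam service-accounts create svc-hr-dataproc-dev \
  --project="$PROJECT" \
  --display-name="HR Dataproc worker (dev)"

# Grant it the Dataproc Worker role on the project.
gcloud projects add-iam-policy-binding "$PROJECT" \
  --member="serviceAccount:svc-hr-dataproc-dev@${PROJECT}.iam.gserviceaccount.com" \
  --role="roles/dataproc.worker"
```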

We also need the project number for our shared services project — to find it, select the project, then go to Cloud Overview -> Dashboard. The Data Fusion service account is named “service-<PROJECT_NUMBER>@gcp-sa-datafusion.iam.gserviceaccount.com”, where <PROJECT_NUMBER> is the project number of the shared services project where Data Fusion is running.
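If you prefer the CLI, the project number can be looked up directly. The project ID `shared-analytics-prj` below is a hypothetical placeholder for your shared services project:

```shell
# Print the project number for the shared services project.
gcloud projects describe shared-analytics-prj \
  --format='value(projectNumber)'
```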

On the Service Account creation screen, grant service-<PROJECT_NUMBER>@gcp-sa-datafusion.iam.gserviceaccount.com the Service Account User role:

Grant the Data Fusion service account the user role on the Dataproc service account.
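The equivalent grant via the CLI is sketched below. Both the project ID and the project number are hypothetical placeholders; note the binding is applied on the service account resource itself, not on the project:

```shell
# Hypothetical values; substitute your own project ID and number.
PROJECT=hr-analytics-dev
SHARED_SERVICES_PROJECT_NUMBER=123456789012

# Allow the Data Fusion service account to impersonate the Dataproc
# service account (Service Account User role on the SA resource).
gcloud iam service-accounts add-iam-policy-binding \
  "svc-hr-dataproc-dev@${PROJECT}.iam.gserviceaccount.com" \
  --member="serviceAccount:service-${SHARED_SERVICES_PROJECT_NUMBER}@gcp-sa-datafusion.iam.gserviceaccount.com" \
  --role="roles/iam.serviceAccountUser"
```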

This ensures that the Data Fusion service account can impersonate the Dataproc service account when it needs to launch a Dataproc cluster in the HR analytics project.

Note the name of the service account we just created, then hop over to the IAM console for the shared VPC host project. We need to grant the new Dataproc service account the Compute Network User role in the host project, so the service account can create and consume network resources in the shared VPC:

Add the new Dataproc service account to the shared VPC project with the Compute Network User role.
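This last grant can also be scripted. The project IDs below are hypothetical placeholders for your shared VPC host project and data domain project:

```shell
# Hypothetical project IDs; substitute your own.
HOST_PROJECT=shared-vpc-host
PROJECT=hr-analytics-dev

# Grant the Dataproc service account Compute Network User on the
# shared VPC host project so it can use the shared subnets.
gcloud projects add-iam-policy-binding "$HOST_PROJECT" \
  --member="serviceAccount:svc-hr-dataproc-dev@${PROJECT}.iam.gserviceaccount.com" \
  --role="roles/compute.networkUser"
```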


Ryan Haney

I'm a Senior Customer Engineer with Google Cloud, and a long-time cloud architect with the unachievable goal of learning everything about everything.