Minimal Downtime Cloud Spanner Migrations Using HarbourBridge

Priyakondapi
Google Cloud - Community
6 min readMar 7, 2023

--

Google Cloud Spanner is a fully managed, strongly consistent and highly available distributed database providing up to 99.999% availability.

The distributed scaling nature of Spanner’s architecture, transaction external consistency and zero downtime schema updates properties makes it an ideal solution for business with global presence, unpredictable workloads and strict consistency requirements such as Gaming, HealthCare, Financial and Retail services.

Migrating a fully stable & highly complex source database system into Spanner should be carefully planned through a proper discovery and risk analysis as it carries significant business risks due to the internal architecture differences between various traditional database systems and Spanner. The common challenges with transactional database migrations are ensuring minimal application downtime and high throughput of data transfer.

There are various tools available to facilitate and ease this migration process. However, these tools should only be considered as the starting point to get the basic working model on Spanner which later should be enhanced with the features ignored by the tools.

HarbourBridge is one such tool under the Cloud Spanner Ecosystem. It is a stand-alone open source tool for Cloud Spanner evaluation and migration. It can be used for various stages of migration like assessment, schema migration and data migration. The tool supports existing PostgreSQL, MySQL, SQL Server, Oracle or DynamoDB databases.

Latest release of HarbourBridge 2.0 has enhanced capabilities like gCloud Integration, improved schema assessment and recommendations and end to end migrations with minimal downtime.

Connecting HarbourBridge to Source

There are three ways to connect HarbourBridge to your source database:

  1. Direct connection to Database — for minimal downtime and continuous data migration for a certain time period
  2. Data dump — for a one time migration of the source database dump into Spanner (Oracle as source is not supported)
  3. Session file — to load from a previous HarbourBridge session

Migration Type

Data migration with HarbourBridge can be of 2 types:

  1. Minimal downtime migration
  2. Migration with downtime

Migration Mode

With HarbourBridge users can choose to migrate:

  1. Schema-only
  2. Data-only
  3. Both Schema and Data

Let’s take a closer look at migrating an Oracle database to Spanner with minimal downtime using Harbour Bridge 2.0 UI.

Pre-requisites

  • Create and set up Google Cloud Project and enable billing.
  • Enable Cloud Spanner API.For minimal downtime migrations Harbour Bridge uses GCP Datastream and Dataflow under the hood. So, enable the DataStream and Dataflow API from Google Cloud Console.
  • Oracle is installed and a database is set up with all the required DDLs for the source and a Spanner instance is created for the target.
  • Data validation queries for both Oracle and Spanner

For the purpose of this demo, CHINOOK online sample database is created on Oracle 11g XE edition installed on a GCP VM instance. Below are the high level steps involved in the migration.

  1. Install HarbourBridge from gCloud

gcloud components install harbourbridge (or)

sudo apt-get install google-cloud-sdk-harbourbridge

2. Run the HarbourBridge tool in GUI mode

gcloud alpha spanner migration web

3. Configure the Spanner database.

4. Select ‘Connect to Database’ and provide the Oracle connection details.

NOTE: HarbourBridge UI supports only Google SQL dialect as a Spanner destination right now.Support for PostgreSQL dialect will be added soon.

5. After a successful connection to the source and target database, HarbourBridge schema assessment page is displayed.

6. Harbour Bridge provides comprehensive assessment results and recommendations for improving the schema structure and performance. Click “View Assessment” on the page to view the assessment.

7. At this point, any schema changes like column types, primary keys, foreign keys, indexes etc.. can be performed.

For example, table Album can be interleaved in the Artist table thus enabling a parent child relationship with colocation of data. Edit the primary key of the Album table to include “ArtistId” in the first.

8. Confirm the schema changes before preparing for the migration.

9. Click the “Prepare Migration” button on the HarbourBridge page.

10. Select Migration Mode as “Schema and Data” and Migration Type as “Minimal Downtime Migration”

11. Set up Target Cloud Spanner Database.

12. Select the existing source connection profile or create a new by allow-listing the IP Addresses displayed on the “Create a new connection profile” screen to the source database. This is the connection to the Oracle data source.

13. Select the Target Connection Profile or create a new one.This is the connection to the Datastream job destination which is the Cloud Storage.

14. Once the details are provided, trigger the migration process by hitting the “Migrate” button.

15. Once the message “Schema migration completed successfully” is displayed on the same page, you can validate the schema created on the Spanner database.

16. To manage a low-downtime migration,HarbourBridge orchestrates the following processes under the hood.

  • Setting up a Cloud Storage bucket to store incoming change events on the source database while the snapshot migration progresses
  • Setting up a datastream job to bulk load a snapshot of the data and stream incremental writes.
  • Setting up the Dataflow job to migrate the change events into Spanner, which empties the Cloud Storage bucket over time

17. Monitor the dataflow and stream jobs running.

18. Validate the initial data by running the count sqls in source and the target.

19. To validate streaming data, insert a record(s) into any table at source.

20. Check if the Spanner database table got the updates in migration. Go to the Spanner table and query the same table.

21. Wait until all the data is migrated over and validated.For this purpose, GCP Data Validation Tool tool or any third party tool or SQL queries can be used.

22. End the migration and let Harbour Bridge remove the jobs and dependencies that were created in this process.

Advantages

  1. Single tool to perform the schema, bulk and incremental data migration
  2. Easy to use gCloud Integration
  3. Spanner schema customisation is possible to some extent
  4. Generates reports for migration result and validation
  5. Cost effective solution in maintaining a single code base

Constraints

  1. Harbour bridge is suitable for migrating small-sized databases (up to about 100GB) to Spanner without strict downtime requirements.
  2. Schema modifications like multi level table interleaving are not supported.
  3. Conversion to Spanner ARRAY type is currently not supported.
  4. As Datastream is used under the hood for low downtime migrations, the limitations of Datastream on Oracle version compatibility apply for this tool also.

--

--