GCP Routing Adventures vol. 2: enterprise multi-regional deployments in Google Cloud

Luca Prete
Google Cloud - Community
Mar 12, 2023 · 14 min read


I’m thrilled to share another article on advanced routing in GCP. If you haven’t read GCP routing adventures vol. 1 yet, I highly recommend doing so before continuing with this one. Otherwise, let’s dive in!

In this article, I discuss how to use Network Connectivity Center Router Appliance (NCC RA) to create multi-regional, hub-and-spoke architectures using Network Virtual Appliances (referred to as NVAs or appliances in the following sections). This approach allows you to avoid many of the constraints that users normally face, such as the use of tagged routes, the need for manual route programming during regional disasters, asymmetric routing, unnecessary cross-regional traffic, and cost increases.

I promise this article will be worth the read, even though it’s a bit long. By the end, you’ll know how to create multi-region architectures without too much hassle.

Want to jump straight to the solution? Go to the Leveraging NCC-RA section.

This is the architecture we’ll focus on:

The reference hub-and-spoke architecture we’ll focus on in this reading.

The diagram shows a hub-and-spoke architecture. The net-landing project hosts an untrusted VPC and a trusted VPC. These connect through Network Virtual Appliances (NVAs), which act as Intrusion Prevention Systems (IPSs). The spoke VPCs live in standalone projects and connect to the trusted hub VPC through network peerings (we’ll also see variants with Cloud VPNs). Each VPC has workloads (including NVAs) in two regions (region-1 and region-2). The NVAs need to be resilient to zonal failures, so each region hosts two of them, in different zones.

Moving forward, I’ll use europe-west1 (ew1) and europe-west4 (ew4) as regions, and a and b as zones.

The goals of this design are to:

  • Avoid using network tags to route traffic.
  • Route traffic symmetrically to prevent breaking stateful NVAs.
  • Avoid unnecessary NAT traffic at the NVAs.
  • Avoid cross-regional traffic unless absolutely necessary for disaster recovery.
  • Automatically send all traffic through the cross-regional NVAs if the ones in-region fail.
  • Keep the trusted hub VPC unique, rather than having one per region.

State of the art and alternatives

The main problem with legacy designs is that they rely on static routes. Google Cloud installs static routes with the same cost in each regional VPC routing table. To learn more about this, check out my previous article, GCP Routing Adventures vol. 1.

Here are some common design decisions that users make, and the issues they face with each one.

Two next hops for my traffic

To protect against zonal failures, users often cluster their NVAs in regional Instance Groups. Next, they configure regional Internal Load Balancers (ILBs), so that traffic is distributed among the backend VMs.

Note that GCP has recently added the ability to reference the VIPs of ILBs living in other (peered) VPCs as a next hop. To learn more about it, read this article from my colleague Osvaldo [Oz] Costa.

Finally, they configure default static routes in the trusted VPCs that point to the ILB VIPs, so that traffic from the trusted networks always goes through the appliances before reaching an untrusted destination. Google Cloud will configure these static routes with the same cost in both regions.

Using static routes, resources in each VPC have two next hops for the same prefix. This causes cross-regional traffic, thus increasing costs.

You’ll end up having two next hops for your traffic (to 0.0.0.0/0). Google Cloud will deterministically select one of the two next hops for all your traffic, based on an internal algorithm.

This will cause traffic to unnecessarily go cross-region, resulting in higher latencies and extra costs. Definitely a no-go.

Automatic regional failover

To work around the issue and route traffic through regional NVAs, you can configure tagged routes in your trusted VPCs as follows:

  1. Add network tags to your resources (e.g., VMs, GKE cluster nodes) representing the region where they live.
  2. Create static routes using these tags as a filter (aka tagged routes). These routes will forward traffic to the NVAs in that region.
  3. In each region, add backup tagged routes at a higher cost, pointing to the cross-regional NVAs (see the sketch after this list).
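
To make steps 2 and 3 concrete, here is a minimal sketch using the google-cloud-compute Python client. Treat it as an illustration rather than a drop-in configuration: project, network, and forwarding-rule names are hypothetical, and the exact client surface may differ slightly depending on the library version.

```python
# Sketch: a tagged primary route to the in-region NVA ILB, plus a pre-provisioned
# backup route to the cross-regional ILB at a higher cost. Names are hypothetical.
from google.cloud import compute_v1


def create_tagged_route(name: str, next_hop_ilb: str, priority: int) -> None:
    route = compute_v1.Route(
        name=name,
        network="projects/net-landing/global/networks/trusted",
        dest_range="0.0.0.0/0",
        tags=["ew1"],               # only instances tagged "ew1" match these routes
        next_hop_ilb=next_hop_ilb,  # forwarding rule (VIP) of the NVA ILB
        priority=priority,          # lower value = preferred
    )
    compute_v1.RoutesClient().insert(
        project="net-landing", route_resource=route
    ).result()


# Primary: in-region NVAs (europe-west1).
create_tagged_route(
    "ew1-default-primary",
    "projects/net-landing/regions/europe-west1/forwardingRules/ilb-nva-ew1",
    priority=1000,
)
# Backup: cross-regional NVAs (europe-west4). Remember: it only takes effect
# once the primary route is removed, which Google Cloud will not do for you.
create_tagged_route(
    "ew1-default-backup",
    "projects/net-landing/regions/europe-west4/forwardingRules/ilb-nva-ew4",
    priority=2000,
)
```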

The following diagram shows how this works for one of the spokes.

Tagged routes allow traffic to go through the in-region NVA. Users can also pre-provision backup routes, so if the in-region NVAs fail, a route to reach the cross-regional NVAs is already present. However, backup routes cannot be used until the primary routes are manually removed.

This approach allows you to avoid cross-region traffic during normal operations. However, it comes with two important limitations:

  • You’ll need to tag your resources. While you can create scripts to automatically tag them as they get created, this is still additional work. Additionally, some third-party vendor services do not support tags.
  • Static routes cannot monitor the health of the next hop. If a regional NVA fails, Google Cloud will not automatically withdraw the corresponding route. You will need to remove it manually or script the removal.

Despite these limitations, let’s finish configuring the system by adding routes in the untrusted VPC so that traffic is routed through the NVAs living in the same region as the destination. This ensures symmetric traffic flows.

  • One route for the ew1 trusted subnets, which sends traffic through the ew1 NVAs.
  • One route for the ew4 trusted subnets, which sends traffic through the ew4 NVAs.

Static routes programmed in the untrusted VPC, so that traffic can remain symmetric.

I have not included backup routes in this example for the sake of simplicity, but you should have them ready in case of regional NVA failures.

Untagged routes also allow Cloud VPN and Cloud Interconnect (included in the diagram) to work, since neither supports tagged routes.

Again, using static routes will not allow you to automatically route traffic to the cross-regional NVAs in the event of a regional NVA failure.

Spokes connected via VPN

If your spokes are connected via Cloud VPN instead of VPC peering, it will be more difficult to manage your routes. You won’t be able to have routes in the spokes that point to ILBs living in VPCs connected via VPN.

While you can still use tags to select the regional spokes traffic and send it through the corresponding regional VPN tunnel, you’ll need to create untagged routes in the trusted hub VPC so that traffic can go through the NVAs.

This will again result in two routes and two next hops.

Spokes connected with VPNs would generate cross-regional, asymmetric traffic, as traffic from the trusted VPC is routed through both NVA clusters.

Again, traffic would go unnecessarily cross-region, generating extra costs and breaking your (stateful) appliances.

Network Connectivity Center Router Appliance

Network Connectivity Center (NCC) allows you to create a central hub for your hybrid resources (NCC “spokes”), including VPNs, interconnects, and third-party appliances. In this architecture, we’ll focus on using third-party appliances.

NCC Router Appliance (RA) is a specialized NCC spoke that allows you to create BGP sessions between Cloud Routers and any third-party appliances that can speak BGP, such as virtual routers or SD-WAN appliances.

Once the sessions are established, the Cloud Routers announce the VPC routes to the appliances; vice versa, they receive the routes advertised by the appliances and program them in the VPC.

Leveraging NCC-RA

Dynamic routes allow resources in your VPCs to automatically leverage routes with different next hops, without the need for tags. When global routing is enabled in your VPCs, Cloud Routers program dynamic routes in each regional routing table, adding a cross-regional cost when the route is programmed in a region different from the one where the Cloud Router lives. This allows traffic to first follow the routes pointing to next hops within the same region, given they have a lower cost (BGP MED), and to fall back to next hops in other regions if the primary ones fail.
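
As a mental model of what Cloud Routers do here, the following self-contained Python sketch computes the effective cost of a dynamic route in each regional routing table: the advertised MED plus a cross-regional penalty when the advertising Cloud Router lives in another region. The numbers are purely illustrative, not GCP constants.

```python
# Illustrative model of dynamic route selection with global routing enabled.
from dataclasses import dataclass

CROSS_REGION_PENALTY = 300  # the dynamic cross-regional cost ("X" in the diagrams)


@dataclass
class DynamicRoute:
    prefix: str
    next_hop: str        # the NVA that advertised the route
    advertised_med: int
    origin_region: str   # region of the Cloud Router that learned the route


def effective_cost(route: DynamicRoute, table_region: str) -> int:
    penalty = 0 if route.origin_region == table_region else CROSS_REGION_PENALTY
    return route.advertised_med + penalty


routes = [
    DynamicRoute("0.0.0.0/0", "nva-ew1", 100, "europe-west1"),
    DynamicRoute("0.0.0.0/0", "nva-ew4", 100, "europe-west4"),
]

for region in ("europe-west1", "europe-west4"):
    best = min(routes, key=lambda r: effective_cost(r, region))
    print(f"{region} routing table: 0.0.0.0/0 -> {best.next_hop}")

# Each region prefers its in-region NVA. If nva-ew1 stops advertising, its route
# disappears from the list and europe-west1 automatically falls back to nva-ew4.
```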

As you’ll see in the diagrams below, you’ll no longer need ILBs, as the next hops of the BGP routes are the NVAs themselves.
At the moment of writing, Google Cloud doesn’t support announcing routes whose next hop is different from the IP of the appliance that announces the route.

The failover will happen automatically: in case of a failure, the routes pointing to the primary next hop will automatically be withdrawn.
In our scenario, if the in-region NVAs fail, the routes pointing to them will also be removed (since the NVAs themselves are the ones announcing the routes) and traffic will automatically flow through the cross-regional NVAs.

Implementation

To create a hub-and-spoke architecture using NCC, you will need to create four NCC hubs and four NCC spokes. You will bind a Cloud Router with two redundant interfaces to each spoke. Given we have two NVAs in each region to withstand zonal failures, each interface will form a BGP session with both NVAs within its network, in the region where the Cloud Router lives.

While you can avoid configuring the redundant BGP sessions, you still need to create both redundant interfaces to be able to associate a Cloud Router with an NCC spoke.

In summary, each Cloud Router will have four BGP sessions (two per interface) with the NVAs.

One NCC hub with four NCC spokes. Each spoke binds to a Cloud Router. Each Cloud Router has two redundant interfaces, each connecting to both the appliances in its VPC and in its region.
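
If it helps to see what this looks like in practice, here is a minimal sketch of creating a hub and attaching one of the Router Appliance spokes with the google-cloud-network-connectivity Python client. It covers a single spoke only; project, instance names, zones, and IPs are hypothetical, and the Cloud Router interfaces and BGP sessions still have to be configured separately.

```python
# Sketch: an NCC hub plus one Router Appliance spoke referencing the two
# trusted NVAs in europe-west1. All names, zones, and IPs are hypothetical.
from google.cloud import networkconnectivity_v1 as ncc

client = ncc.HubServiceClient()
project = "net-landing"

hub = client.create_hub(
    parent=f"projects/{project}/locations/global",
    hub_id="ncc-hub",
    hub=ncc.Hub(),
).result()

spoke = ncc.Spoke(
    hub=hub.name,
    linked_router_appliance_instances=ncc.LinkedRouterApplianceInstances(
        instances=[
            ncc.RouterApplianceInstance(
                virtual_machine=f"projects/{project}/zones/europe-west1-b/instances/nva-trusted-ew1-b",
                ip_address="10.128.0.10",
            ),
            ncc.RouterApplianceInstance(
                virtual_machine=f"projects/{project}/zones/europe-west1-c/instances/nva-trusted-ew1-c",
                ip_address="10.128.0.11",
            ),
        ],
        site_to_site_data_transfer=False,
    ),
)

client.create_spoke(
    parent=f"projects/{project}/locations/europe-west1",
    spoke_id="trusted-ew1",
    spoke=spoke,
).result()

# The Cloud Router (with its two redundant interfaces) and the four BGP
# sessions toward these instances are configured separately, and are not shown.
```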

Dropping this in our infrastructure, things should roughly look like this:

How Cloud Routers would link to our infrastructure, once NCC is configured.

For BGP routing, we want to keep things simple. I recommend using the following:

  • Three unique AS numbers: one for the untrusted Cloud Routers, one for the NVAs, and one for the trusted Cloud Routers.
  • Active/active appliances that can synchronize sessions: this allows you to advertise routes from the NVAs within the same cluster (region) to the Cloud Routers with the same priority. I’ll talk more about this in the following paragraphs.

Route Advertisements

To keep things simple, I recommend advertising IP ranges from the Cloud Routers only. The NVAs will simply re-advertise those ranges from one side to the other.

Reminder: you can only advertise routes from Cloud Routers with the same base cost.

For this example, I will use 0.0.0.0/0 as the default route. This means that all unknown/non-local traffic will be routed through the NVAs. You can opt for more specific ranges instead; the concepts we’ll go through still stand.

Let’s start looking at the routes advertised “left to right”: from the Cloud Routers in the untrusted VPC to the NVAs, and from the NVAs to the Cloud Routers in the trusted VPC.

In the diagrams below, I won’t draw the spokes for simplicity and I will add sample subnets in the hub VPCs.

Routes exchanged “left-to-right”: from the Cloud Routers in the untrusted VPC to the NVAs, and from the NVAs to Cloud Routers in the trusted VPC.

Let’s see what our trusted VPC regional routing tables would look like:

The trusted VPC regional routing tables after the default route has been received.

Each regional routing table has a default, in-region next hop, and a secondary cross-regional next hop for the same prefix, programmed by the Cloud Router living in the other region. The cross-regional backup routes have a higher cost, since the Cloud Routers in the other region installed them adding a dynamic cross-regional cost (X in the diagrams).

The default route will be the in-region next hop. This means that traffic for unknown/non-local destinations will be routed through the in-region NVAs.

The cross-regional routes will be used as backup routes in case the in-region route fails. This is because they have a higher cost, which means that they will only be used if the in-region route is unavailable.

Let’s now look at the routes advertised right to left: from the Cloud Routers in the trusted VPC to the NVAs, and from the NVAs to the Cloud Routers in the untrusted VPC.

Routes exchanged “right-to-left”: from the Cloud Routers in the trusted VPC to the NVAs, and from the NVAs to the Cloud Routers in the untrusted VPC.

Notice we’re announcing /22 ranges, as opposed to the two /24 trusted IP ranges. These represent the two “trusted side” regional aggregates. They include the /24 ranges of the trusted hub VPC and potentially other subnet ranges (e.g. those used for the spokes) in those regions. More on the motivation for this later in the article.

We announce again all the routes with cost 100. Each Cloud Router announces the routes of both regions: we indeed want each Cloud Router (and NVA) to be able to route our requests even if the cross-regional Cloud Router (or NVA) is down. Again, given we configured global routing, the Cloud Routers in the untrusted VPC program the routes in both regions.

Let’s see the result:

Untrusted VPC, regional routing tables, once Cloud Routers have programmed the dynamic routes they received from the NVAs.

Unfortunately, we’re still missing something! Can you spot it? We’ll see it in the next section.

Asymmetric routing hitting us again

Let’s try again to make a VM in the untrusted VPC in europe-west1 communicate with another VM in the trusted VPC in europe-west4. What would happen? Let’s go through the routing tables and check them out.

Untrusted/trusted cross-regional communication is still asymmetric, thus breaking our (stateful) NVAs.

We have previously encountered a similar issue when we discussed a possible solution using tagged routes. In this approach, the source VM in the untrusted VPC would forward the packets to its regional NVAs (1), which would forward the packets out of the trusted interface (2). The packets would then reach the destination VM in the trusted VPC in europe-west4. The destination VM would send the packets back through the europe-west4 regional NVAs (3, 4), which would then forward the packets back to the source (5).

The stateful appliances would break again, as traffic would hit different NVA clusters on the way to the destination and on the way back.

To work around this, we need to force the traffic to change region in the untrusted VPC, before moving to the trusted area.

You need to always change the region in the untrusted VPC.

This can be achieved by modifying the MED values advertised by the NVAs to the Cloud Routers in the untrusted VPC. Using route maps, each NVA needs to advertise the cross-regional trusted subnets with a much higher value. Given that cross-regional penalties may theoretically get up to 9999, we’ll set those MED values to 10000. Once the Cloud Routers program their routes in both regions, you’ll achieve the following result:

Acting on the costs of the cross-regional routes advertised by the NVAs to the Cloud Routers in the untrusted VPC, you can force traffic to always go through the same appliances.

Now, the source VM in the untrusted VPC forwards the packets to its cross-regional NVAs (1), which would forward the packets out of the trusted interface (2). The packets would then reach the destination VM in the trusted VPC in europe-west4. The destination VM would send the packets back again through the europe-west4 regional NVAs (3, 4), which would then forward the packet back to the source (5).

We fixed our routing, guaranteeing that traffic remains symmetric.
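
To recap the trick with numbers, here is a purely illustrative sketch of the untrusted regional routing tables after the change. The 100/10000 MEDs are the example values above; the cross-regional penalty is a placeholder for the dynamic cost GCP adds.

```python
# Illustrative only: with the cross-regional trusted aggregates advertised at
# MED 10000, each untrusted regional table prefers the NVA cluster that sits
# in the same region as the trusted destination.
CROSS_REGION_PENALTY = 300  # placeholder for the dynamic cross-regional cost

# MEDs advertised by each NVA cluster to the untrusted Cloud Routers.
advertisements = {
    "nva-ew1": {"trusted-aggregate-ew1": 100, "trusted-aggregate-ew4": 10000},
    "nva-ew4": {"trusted-aggregate-ew4": 100, "trusted-aggregate-ew1": 10000},
}
nva_region = {"nva-ew1": "europe-west1", "nva-ew4": "europe-west4"}

for table_region in ("europe-west1", "europe-west4"):
    for prefix in ("trusted-aggregate-ew1", "trusted-aggregate-ew4"):
        costs = {
            nva: meds[prefix]
            + (0 if nva_region[nva] == table_region else CROSS_REGION_PENALTY)
            for nva, meds in advertisements.items()
        }
        best = min(costs, key=costs.get)
        print(f"untrusted table {table_region}: {prefix} -> {best}")

# Both untrusted tables send traffic for the ew4 aggregate to nva-ew4 and
# traffic for the ew1 aggregate to nva-ew1: any region change happens in the
# untrusted VPC, and each flow traverses a single NVA cluster symmetrically.
```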

Spokes routing

Regardless of how you connected your spokes to the trusted VPC, they will be able to route traffic properly with almost no extra configuration.
If your spokes connect to the trusted VPC via Cloud VPN, the dynamic routes will be automatically imported. If you use VPC peering, you will need to configure it to export/import custom routes, which include the dynamic routes.
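
For the VPC peering case, this is roughly what enabling custom-route exchange looks like on the spoke side with the google-cloud-compute Python client. It’s a sketch under assumed names (project, network, peering); the hub side of the peering needs the mirrored export configuration.

```python
# Sketch: let a spoke VPC import the dynamic routes (NVA next hops) that the
# trusted hub exports over an existing peering. Names are hypothetical.
from google.cloud import compute_v1

compute_v1.NetworksClient().update_peering(
    project="spoke-1",
    network="spoke-1-vpc",
    networks_update_peering_request_resource=compute_v1.NetworksUpdatePeeringRequest(
        network_peering=compute_v1.NetworkPeering(
            name="peering-to-trusted-hub",
            import_custom_routes=True,  # learn the NVA dynamic routes from the hub
            export_custom_routes=True,  # optional here: spokes rarely have custom routes
        )
    ),
).result()

# On the trusted hub VPC, the peering toward each spoke must export custom
# routes so the dynamic routes programmed by the Cloud Routers propagate.
```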

Cross-spokes routing

Some users want their spokes to communicate with each other through the NVAs. While this may not be your priority, I recommend keeping it in mind from a routing perspective. Again, you’ll need to act on MED costs to avoid issues with your appliances. If you don’t, you’ll again end up generating asymmetric traffic.

Cross-regional, cross-spoke asymmetric traffic if we don’t add additional routes.

The most reasonable approach I found to go cross-region from one spoke to another and avoid asymmetric traffic is to send traffic through the NVAs of both regions.

The cross-region, cross-spoke traffic needs to pass through both NVAs.

You can achieve this by establishing additional eBGP sessions between the NVAs. Over these sessions, each NVA advertises the trusted routes of its own region at a lower cost than the other routes. For example, if we used 100 as the base MED cost for the previous routes, we will use 50 for these.

NVAs exchange trusted regional routes to allow the spokes to communicate.

Once the NVAs exchange the routes, this should be the result.

NVAs routing tables to support cross-regional, cross-spoke communication.

Congratulations! You completed your routing setup, and all your VMs and on-premises resources should be able to talk to each other with no issues.

Design your IP space the right way

I’d like to add an important consideration about the design of your IP space. You want to minimize the number of routes exchanged, avoid exceeding your VPC quotas, and minimize management overhead. To do this, I recommend segmenting your IP space so you can summarize your trusted routes per region, as in the sketch after this list. This allows you to:

  • Advertise fewer summarized trusted subnets as you move from the Cloud Routers in the trusted VPC to the NVAs, and from the NVAs to the Cloud Routers in the untrusted VPC.
  • Advertise fewer summarized trusted subnets between your NVAs.
  • Minimize the maintenance of IP prefix lists in your route maps on your NVAs.
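
As a hypothetical example of such a plan, each region gets one trusted /22 aggregate that covers the hub /24 and leaves room for spoke ranges; Python’s standard ipaddress module is enough to sanity-check it:

```python
# Hypothetical addressing plan: one trusted /22 aggregate per region.
import ipaddress

regional_aggregates = {
    "europe-west1": ipaddress.ip_network("10.128.0.0/22"),
    "europe-west4": ipaddress.ip_network("10.128.4.0/22"),
}
trusted_subnets = {
    "trusted-hub-ew1": ipaddress.ip_network("10.128.0.0/24"),
    "spoke-1-ew1": ipaddress.ip_network("10.128.1.0/24"),
    "trusted-hub-ew4": ipaddress.ip_network("10.128.4.0/24"),
    "spoke-1-ew4": ipaddress.ip_network("10.128.5.0/24"),
}

for name, subnet in trusted_subnets.items():
    region = next(
        r for r, agg in regional_aggregates.items() if subnet.subnet_of(agg)
    )
    print(f"{name} ({subnet}) is covered by {regional_aggregates[region]} in {region}")

# Advertising only the two /22s keeps the number of exchanged routes, the VPC
# quotas, and the prefix lists in the NVA route maps stable as spokes are added.
```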

Appliances and session sync

A word of caution on NVA session sync. In our architecture, load balancing is handled by BGP, meaning that MED values determine how traffic is distributed across the appliances in a region. Using the values of the example above, Google Cloud will do Equal Cost Multi Path (ECMP) routing to the appliances in the same region. As such, the NVAs will need to guarantee that sessions remain in sync. Working with NVAs that don’t provide such functionality means that:

  • Your traffic will need to go through one appliance at a time, de facto making them work as active/passive (aka active/standby).
  • You will need to advertise the same route with different priorities from the NVAs within the same region.

The following example shows how MED costs should be modified within a region to make this happen.

The NVA announcements towards Cloud Routers and other appliances, so that appliances without session sync can still work.

To make sure that traffic can still flow in case of a failure, I set the MED value of the first NVA to a lower value than the second NVA. This means that the first NVA will be the one serving traffic during normal operations, while the second NVA will be on standby and will take over if the first NVA fails.
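
As a hypothetical illustration of the values within one region (100 for the primary, 110 for the standby; the real numbers depend on your design):

```python
# Hypothetical per-NVA MEDs in europe-west1 when sessions cannot be synced:
# the primary advertises every prefix at a lower cost, so all traffic sticks
# to it, and the standby only takes over when the primary stops advertising.
advertised_meds_ew1 = {
    "nva-ew1-primary": 100,
    "nva-ew1-standby": 110,
}


def preferred_next_hop(meds: dict) -> str:
    return min(meds, key=meds.get)


print(preferred_next_hop(advertised_meds_ew1))   # nva-ew1-primary
del advertised_meds_ew1["nva-ew1-primary"]       # simulate a failure
print(preferred_next_hop(advertised_meds_ew1))   # nva-ew1-standby
```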

Using NVAs to balance traffic does add some complexity to the configuration, but it also limits the use of load balancers other than internal load balancers (ILBs). ILBs are the only load balancers that, when used in pairs on the two sides of your NVAs, can provide symmetric hashing: Google Cloud guarantees that each connection will pass through the same NVAs. This is not true for global external load balancers (XLBs) or Network Load Balancers (NLBs). In that case, you will have no choice but to source-NAT your traffic as it goes through the NVAs.

Conclusions and next steps

If you made it this far, it’s likely you found the article helpful and interesting.

This architecture is becoming more and more common, so I wanted to help you avoid the headaches I had to go through.

As of this writing, NCC-RA is, to my knowledge, still the only product that allows you to build multi-regional enterprise architectures using NVAs on GCP without giving you headaches. I know it’s not straightforward at first, but once you master the concept, I think you’ll be glad you made the decision. The product is reliable and well-tested.

I’ll be sharing a code lab soon that should help you get a more concrete idea of what configuring such an environment means, at least from the GCP perspective.

Thanks for reading, and I hope you’ll stay tuned for more articles on GCP! 😎🤟
