Let’s optimize costs with the new BigQuery pricing

pict3
Apr 8, 2023

New BigQuery pricing was unveiled at the Google Data Cloud & AI Summit today!

It can reduce your costs if you use it effectively.

This matters not only for heavy users but also for light users and beginners. So I’ll review the basics of BigQuery pricing, and then show how to estimate the cost change using an actual dataset that I’m currently using.

Review of pricing structure

First of all, let me describe the current pricing structure of BigQuery.

BigQuery’s architecture separates storage from computing. Accordingly, BigQuery’s price is the sum of storage pricing and analysis pricing.

Storage pricing is the cost of the data stored in BigQuery. Data is classified automatically as “Active Storage” or “Long-term Storage”. Active Storage contains tables whose data has changed within the last 90 days; everything else is Long-term Storage, which is cheaper than Active Storage.

Analysis pricing comes in two models. On-demand charges by the amount of data scanned. Flat Rate charges for reserved slots (virtual CPUs); within your reserved slots, you can scan any amount of data at no additional cost.

Today’s update covers both storage pricing and analysis pricing.

Changes to storage pricing

Under the established storage pricing, you are charged by logical data size.

But in fact, BigQuery stores data in compressed form. The compressed size, called the physical data size, is much smaller.

With this change, storage is billed by physical data size instead of logical data size. The unit price rises a bit, but the billed size becomes much smaller, so I expect storage costs to go down in general.

This feature launched in Preview last year; now it becomes GA as part of BigQuery Editions.

Along with this change, Time Travel storage and Fail-safe storage will be billed as Active Storage.

Time Travel lets you roll data back to any point within the Time Travel window (default: 7 days). Fail-safe is the backup tier behind Time Travel.

These costs are not large if your data is not changed or deleted very often. The Time Travel window is also customizable, so you can tune its cost.
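
For example, here is a minimal sketch of shortening the window with a DDL statement, assuming a dataset named `my_project.my_dataset` (the window can be set between 48 and 168 hours):

```sql
-- Shorten the Time Travel window from the default 168 hours (7 days)
-- to 48 hours to reduce Time Travel storage.
ALTER SCHEMA `my_project.my_dataset`
SET OPTIONS (max_time_travel_hours = 48);
```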

You can use physical storage billing if your organization matches one of the configurations below; in short, you must not be using Flat Rate. (A sketch of switching a dataset over follows the list.)

  • Editions only
  • On-demand only
  • Editions and On-demand
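
If your organization qualifies, switching a dataset over is a single DDL statement. A minimal sketch, assuming a dataset named `my_project.my_dataset`:

```sql
-- Bill this dataset's storage by physical (compressed) bytes
-- instead of logical bytes.
ALTER SCHEMA `my_project.my_dataset`
SET OPTIONS (storage_billing_model = 'PHYSICAL');
```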

This change affects only cost. All data is already stored compressed, so there is no performance impact.

Changes to analysis pricing

Analysis pricing also gets a big change!

First, the on-demand price will rise by 25%. But don’t be discouraged! The main point comes next.

Flat Rate will be discontinued, and the Editions plans will launch in its place. I believe these plans are a loss leader!

Editions come in three tiers: Standard, Enterprise, and Enterprise Plus.

The former Flat Rate was not cost-effective for small environments, because the monthly cost was far from cheap. For intermittent users there was a short-commitment plan called Flex Slots, but it required manual reservation and assignment operations, so it was not easy to make substantial use of it.

Editions make this easier for intermittent users, and they also offer long-term commitment plans. Using these, you can run BigQuery at a lower cost.

I’ll describe the dramatic cost optimization possible with Editions later.

First, let me describe BigQuery AutoScaler, an auto-scaling feature available with any Editions plan.

BigQuery AutoScaler

BigQuery AutoScaler is equivalent to the Slots Autoscaling feature that had been released as Pre-GA.

As described above, Flat Rate does not charge by scan volume because you pay for reserved slots, so depending on the use case it can deliver big cost savings. But it is hard to use for batch processes that run only in specific time windows: if you reserve enough slots for peak load, it is very expensive; if you reserve fewer slots, you have to wait a long time.

BigQuery AutoScaler addresses this use case.

It maintains both performance and cost by allocating slots dynamically, even when scan volume varies widely.

Here is an example. In the graph, the blue line shows slot usage.

The autoscaler scales slots elastically to follow usage. The minimum increment is just 100 slots.

Scaling up involves some lag for warm-up. The Baseline setting handles that: if you set a baseline, that many slots are always kept available, guaranteeing a minimum level of performance. In the graph above, the green dashed line is a candidate baseline.

On the other hand, if you use BigQuery only during limited time windows, you are better off setting the baseline to zero.

The maximum slots setting is also a powerful feature. I’ve heard of unintended scans causing BigQuery costs to spike. If you set a maximum number of slots, you can avoid cost spikes in exchange for longer scan times. In the graph above, the orange line is a candidate maximum.
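
As a rough sketch, a reservation with a zero baseline and a capped autoscaler can be created with the reservation DDL. The project name, region, and slot values here are assumptions for illustration:

```sql
-- Create a Standard Edition reservation in the Tokyo region with no
-- baseline slots and autoscaling capped at 300 slots.
CREATE RESERVATION `my_admin_project.region-asia-northeast1.my-reservation`
OPTIONS (
  edition = 'STANDARD',
  slot_capacity = 0,           -- baseline: no always-on slots
  autoscale_max_slots = 300);  -- cap to avoid cost bursts
```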

Estimating the cost

Now, let’s estimate a real environment under both the former and new pricing, and compare them.

Cost of storage

First, the cost of storage.

You can get storage usage from the “INFORMATION_SCHEMA.TABLE_STORAGE_BY_*” views.

The query I ran and its result are shown below. If you use it, adjust the prices for your region.
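
A sketch of such a query, comparing logical and physical billing for a whole project; the per-GiB unit prices here are illustrative placeholders, not official rates, so substitute your region’s actual prices:

```sql
-- Compare monthly storage cost under logical vs. physical billing.
-- Unit prices are placeholders (USD per GiB-month); check your region.
SELECT
  SUM(active_logical_bytes)     / POW(1024, 3) * 0.023 AS active_logical_cost,
  SUM(long_term_logical_bytes)  / POW(1024, 3) * 0.016 AS long_term_logical_cost,
  SUM(active_physical_bytes)    / POW(1024, 3) * 0.052 AS active_physical_cost,
  SUM(long_term_physical_bytes) / POW(1024, 3) * 0.026 AS long_term_physical_cost
FROM
  `region-asia-northeast1`.INFORMATION_SCHEMA.TABLE_STORAGE;
```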

“active_physical_bytes” already includes the bytes used for Time Travel, so you don’t need to count those separately. I couldn’t find information about Fail-safe. There is a possibility that Fail-safe adds some cost, but it should not be much; I guess it will be similar to the Time Travel bytes.
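
If you want to see how much of your physical storage Time Travel is holding, a sketch like the following works, assuming the time_travel_physical_bytes column of the TABLE_STORAGE view:

```sql
-- Top 10 tables by physical bytes held for Time Travel.
SELECT
  table_schema,
  table_name,
  time_travel_physical_bytes / POW(1024, 3) AS time_travel_gib
FROM
  `region-asia-northeast1`.INFORMATION_SCHEMA.TABLE_STORAGE
ORDER BY time_travel_physical_bytes DESC
LIMIT 10;
```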

What!? The storage cost becomes one tenth or less!!

My storage usage is not so large, so the absolute impact is small. But for heavy storage users, this price change will make a real difference.

Cost of computing

Next, I estimated the computing cost for one month under the on-demand model.

The query I ran is shown below. If you use it, adjust the price for your region.
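
Here is a sketch of such an estimate over the last 30 days, using INFORMATION_SCHEMA.JOBS_BY_PROJECT. The $7.50-per-TiB rate is an assumed placeholder, so check your region’s actual on-demand price:

```sql
-- Estimate one month of on-demand cost from billed bytes.
SELECT
  SUM(total_bytes_billed) / POW(1024, 4) * 7.50 AS on_demand_cost_usd
FROM
  `region-asia-northeast1`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE
  creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
  AND job_type = 'QUERY'
  AND statement_type != 'SCRIPT';  -- avoid double-counting script jobs
```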

The result is shown below. It is close to what I see on the Billing dashboard.

So, how do Editions affect this cost?

Let’s check the usage of slots.

The autoscaler expands slots in increments of 100, and the billing granularity is about one minute. So I wrote the query below.
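
A sketch of that query, assuming the JOBS_TIMELINE_BY_PROJECT view: summing period_slot_ms per minute and dividing by 60,000 ms gives the average slots used in that minute, which I round up to the 100-slot granularity:

```sql
-- Per-minute slot usage, rounded up to the autoscaler's 100-slot step.
SELECT
  TIMESTAMP_TRUNC(period_start, MINUTE) AS minute,
  CEIL(SUM(period_slot_ms) / (1000 * 60) / 100) * 100 AS slots_used
FROM
  `region-asia-northeast1`.INFORMATION_SCHEMA.JOBS_TIMELINE_BY_PROJECT
WHERE
  job_creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
  AND job_type = 'QUERY'
GROUP BY minute
ORDER BY minute;
```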

I found some irregular scans, but most scans are regular and don’t run very long.

So I think it is better to set the baseline to zero.

(I didn’t zero-pad the gaps, so the line graph is not strictly accurate.)

I used 6,443,900 slot-milliseconds in the month.

At Standard Edition pricing in the Tokyo region ($0.051 per slot-hour), the computing cost comes to just $0.091!!

6,443,900 / (1,000 × 60 × 60) × 0.051 ≈ $0.091

In reality, warm-up takes time and scale-down is slower than scale-up, so this estimate is optimistic. Still, BigQuery Editions have the potential for big cost optimization.

Summary

In most cases, the new storage pricing will bring storage costs down. On top of that, BigQuery AutoScaler, a feature of Editions, opens the possibility of a substantial markdown.

In my case, the cost estimates came out as follows.

Storage cost: $1.82 → $0.17 (about 90% down)

Analysis cost: $64.94 → $0.091 (about 99.9% down)

Altogether, a 99.6% markdown!!

Of course, it depends on your use case. In some cases the former on-demand model is still the better fit, but many other cases can benefit even more.

I strongly recommend taking this chance to review your costs!

This time I wrote about BigQuery, but all Google Cloud services are updated quite often. Keep up with them, and let’s keep optimizing your costs!

Reference

Google Data Cloud & AI Summit

BigQuery overview

BigQuery pricing

Forecast storage billing
