If you use BigQuery, especially for large amounts of data, you can optimize your work by partitioning your tables - that is, dividing them into smaller segments. This helps to improve query performance and reduce costs because each query would then read a lesser number of bytes.
BigQuery supports three types of partitioning:
-
Integer range partitioning: a table is divided based on ranges of values in a specific INTEGER column.
-
Time-unit partitioning: a table is partitioned on a DATE, TIMESTAMP, or DATETIME column.
-
Ingestion time partitioning: when a table partitioned by ingestion time is created, BigQuery automatically assigns rows to partitions based on the time it ingested the data.
One important limitation is that BigQuery only supports partitioning by a single column.
In this article, we’ll describe two ways to convert a non-partitioned table into a partitioned one: for integer range and time-unit partitioning and for ingestion time partitioning.
Creating an integer range or time-unit partitioned table
If you want to partition a table based on the integer range or time unit, the first step is pausing the connector in your dashboard.
When the connector is paused, go to the BiqQuery console and execute one of the following SQL statements, depending on the column you want to partition your table on:
- INTEGER:
CREATE TABLE [schema-name].copy
PARTITION BY range_bucket([integer-column], GENERATE_ARRAY([start],[end],[interval]))
AS SELECT * FROM [schema-name].[table-name];
DROP TABLE [schema-name].[table-name];
- DATE:
CREATE TABLE [schema-name].copy
PARTITION BY [date-column]
AS SELECT * FROM [schema-name].[table-name];
DROP TABLE [schema-name].[table-name];
- TIMESTAMP:
CREATE TABLE [schema-name].copy
PARTITION BY date([timestamp-column])
AS SELECT * FROM [schema-name].[table-name];
DROP TABLE [schema-name].[table-name];
- DATETIME:
CREATE TABLE [schema-name].copy
PARTITION BY [datetime-column]
AS SELECT * FROM [schema-name].[table-name];
DROP TABLE [schema-name].[table-name];
As a result, a partitioned table will be created and populated with data from the original table, which will then be dropped.
Next, open the BigQuery user interface in the console, go to the navigation bar, and select your project:
Next, select the dataset. In the dataset, select the copy table created before:
Then, rename it to the original table’s name and click ‘Copy':
In the ‘Table name’ field, enter the original table’s name and click ‘Copy’:
Finally, drop the copy table by running the following statement in the BigQuery console:
DROP TABLE [schema-name].copy;
Almost done! Now go to your dashboard and enable the connector again.
Creating an ingestion time partitioned table
To partition a table based on ingestion time, go to the BigQuery user interface in the console, and select your project in the navigation bar:
Next, find the dataset to be used as the destination or create a new dataset. Don’t forget to grant your service account the bigquery.dataOwner access for this dataset so it could create, delete, and update tables in it.
Next, click ‘+ Create table’ and retain the default values for the ‘Source’, ‘Destination’, and ‘Schema’ sections. In the ‘Table name’ field, enter the same table name as in your source.
Next, go to ‘Partition and cluster settings', select ‘Partition by ingestion time’, and click ‘Create table’:
That’s it, you’re all set!
Comments
0 comments
Please sign in to leave a comment.