Skip to main content
Pentaho Documentation

Google BigQuery Loader

The Google BigQuery Loader job entry enables you to load data into Google BigQuery from a Google Cloud Storage account. The Google BigQuery Loader supports the following formats:

  • Comma-separated values (CSV)
  • JSON (newline-delimited)
  • Avro.

Before You Begin

You must have a Google account and must create service account credentials in the form of a key file in JSON format to use the Google BigQuery Loader job entry. You must also set permissions for your BigQuery and Google Cloud accounts. To create service account credentials, see https://cloud.google.com/storage/docs/authentication

Perform the following steps to set up your system to use Google BigQuery:

  1. Download the service account credentials file that you have created using the Google API Console to your local machine. 
  2. Create a new system environmental variable on your operating system named ‘GOOGLE_APPLICATION_CREDENTIALS‘.
  3. Set the path to the downloaded JSON service account credentials file as the value of the GOOGLE_APPLICATION_CREDENTIALS variable.
  4. Reboot your local machine.

The environment variable and credentials must be set up on each machine that runs the BigQuery Loader job. The Google BigQuery Loader dialog box will not open for editing in the job canvas until this procedure is completed.

General

Enter the following information in the job step fields:

  • Entry name Specifies the unique name of the Google BigQuery Loader step on the canvas. You can customize the name or leave it as the default.

Options

The Google BigQuery Loader job entry features two tabs with fields. Each tab is described below.

Setup Tab

PDI_.Google_BigQuery Loader_dlg_SetupTab.png

This tab includes the following fields:

Field Description
Storage source URL Specify the Google Cloud Storage URL of the data to import. The URL can point to a file or a folder in Google Cloud Storage. The URL must begin with gs:// and must specify the bucket and object you want to load. You must specify the file type in the File tab.
Dataset Specify the dataset where you want to import a table. The dataset drop-down is automatically populated when you select the Storage source URL, but you can enter a new dataset name in the field. If the dataset does not exist it will be created at runtime.
Table name Specify the table name in the dataset where you want to import data.
Column name Specify the column name in the datatset table where you want to import data.
Column type Specify the column type in the datatset table where you want to import data.
Overwrite existing table Select to overwrite existing data with imported data.

File Tab

PDI_.Google_BigQuery Loader_dlg_FileTab.png

Field Description
Type Specify the file type. The values are CSV (default), JSON, and Avro. You must specify the correct File type that is associated with the Storage source URL field in the Setup tab.
Leading rows to skip Specify how many rows of the CSV file to skip.
Delimiter Specify the delimiter character used by the CSV file.
Quote character Specify the escape (quote) character used for values that have the delimiter character in them. For example, when the delimiter character is a comma, and a field contains a comma, and the quote character is a backslash, inserting a backslash before the comma in the field will prevent that field from being evaluated as the beginning of a new field value.