Google BigQuery Loader

The Google BigQuery Loader job entry enables you to load data into Google BigQuery from a Google Cloud Storage account.

The Google BigQuery Loader supports the following formats:

  • Comma-separated values (CSV)
  • JSON (newline-delimited)
  • Avro
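
For reference, these formats correspond to the following source format constants in Google's BigQuery load API, shown here with the google-cloud-bigquery Python client (illustrative only; the job entry itself selects the format through the File tab):

    from google.cloud import bigquery

    # BigQuery load API equivalents of the three supported formats
    print(bigquery.SourceFormat.CSV)                     # "CSV"
    print(bigquery.SourceFormat.NEWLINE_DELIMITED_JSON)  # "NEWLINE_DELIMITED_JSON"
    print(bigquery.SourceFormat.AVRO)                    # "AVRO"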

Before you begin

To use the Google BigQuery Loader job entry, you must have a Google account and create service account credentials in the form of a JSON key file. You must also set permissions for your BigQuery and Google Cloud accounts. To create service account credentials, see https://cloud.google.com/storage/docs/authentication.

Perform the following steps to set up your system to use Google BigQuery:

Procedure

  1. Download the service account credentials file that you have created using the Google API Console to your local machine.

  2. Create a new system environment variable on your operating system named GOOGLE_APPLICATION_CREDENTIALS.

  3. Set the path to the downloaded JSON service account credentials file as the value of the GOOGLE_APPLICATION_CREDENTIALS variable.

  4. Reboot your local machine.
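
To confirm the setup before running a job, a quick check such as the following Python sketch (a hypothetical verification script, not part of Pentaho) can verify that the variable resolves to a readable JSON key file:

    import json
    import os

    # The loader reads the key file path from this variable.
    path = os.environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        raise SystemExit("GOOGLE_APPLICATION_CREDENTIALS is not set")

    with open(path, encoding="utf-8") as f:
        key = json.load(f)  # fails fast if the file is not valid JSON

    print("Credentials found for service account:", key.get("client_email"))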

Results

Note: The environment variable and credentials must be set up on each machine that runs the BigQuery Loader job. The Google BigQuery Loader dialog box will not open for editing in the job canvas until this procedure is completed.

General

Enter the following information in the job entry field:

  • Entry name: Specifies the unique name of the Google BigQuery Loader entry on the canvas. You can customize the name or leave it as the default.

Options

The Google BigQuery Loader job entry has two tabs, Setup and File, each described below.

Setup tab

This tab includes the following fields:

  • Storage source URL: Specify the Google Cloud Storage URL of the data to import. The URL can point to a file or a folder in Google Cloud Storage. It must begin with gs:// and must specify the bucket and object you want to load. You must specify the file type in the File tab.
  • Dataset: Specify the dataset where you want to import a table. The dataset drop-down is automatically populated when you select the Storage source URL, but you can enter a new dataset name in the field. If the dataset does not exist, it is created at runtime.
  • Table name: Specify the name of the table in the dataset where you want to import data.
  • Column name: Specify the column name in the dataset table where you want to import data.
  • Column type: Specify the column type in the dataset table where you want to import data.
  • Overwrite existing table: Select to overwrite existing data with the imported data.
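
Conceptually, the entry performs a BigQuery load job from Cloud Storage. As an illustration only, the following Python sketch shows roughly equivalent behavior with the google-cloud-bigquery client, using hypothetical bucket, dataset, and table names:

    from google.cloud import bigquery

    # The client picks up GOOGLE_APPLICATION_CREDENTIALS automatically.
    client = bigquery.Client()

    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        # Column name / Column type of the target table
        schema=[bigquery.SchemaField("id", "INTEGER"),
                bigquery.SchemaField("name", "STRING")],
        # "Overwrite existing table" replaces any existing data
        write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
    )

    # Storage source URL -> Dataset + Table name (hypothetical names)
    load_job = client.load_table_from_uri(
        "gs://my-bucket/sales.csv",
        "my-project.my_dataset.sales",
        job_config=job_config,
    )
    load_job.result()  # block until the load completes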

File tab

This tab includes the following fields:

  • Type: Specify the file type. The values are CSV (default), JSON, and Avro. The file type must match the data referenced by the Storage source URL field in the Setup tab.
  • Leading rows to skip: Specify how many leading rows of the CSV file to skip.
  • Delimiter: Specify the delimiter character used by the CSV file.
  • Quote character: Specify the quote (escape) character used for values that contain the delimiter character. For example, if the delimiter is a comma and the quote character is a backslash, placing a backslash before a comma within a field value prevents that comma from being read as the start of a new field.
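
As a rough guide to how these CSV fields map onto BigQuery's load options, the following sketch uses the corresponding google-cloud-bigquery settings (the mapping is an assumption for illustration; the job entry's internal behavior may differ):

    from google.cloud import bigquery

    # File tab fields and their BigQuery load-option counterparts
    config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,  # Type
        skip_leading_rows=1,                      # Leading rows to skip
        field_delimiter=",",                      # Delimiter
        quote_character='"',                      # Quote character
    )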