Skip to main content

Pentaho+ documentation has moved!

The new product documentation portal is here. Check it out now at docs.hitachivantara.com

 

Hitachi Vantara Lumada and Pentaho Documentation

JSON Input

Parent article

This step reads data from JSON structures, files, or incoming fields using a JSONPath expression to extract data and output rows. JSONPath expressions can use either dot notation or square bracket notation.

General

Enter the following information in the transformation step name field:

  • Step name: Specifies the unique name of the JSON input transformation step on the canvas. You can customize the name or leave it as the default.

You can use Preview rows to display the rows generated by this step. The JSON input step determines what rows to input based on the information you provide in the option tabs. This preview function helps you to decide if the information provided accurately models the rows you are trying to retrieve.

Options

The JSON input step features several tabs with fields. Each tab is described below.

File tab

JSON input step

Use the File tab to enter the following connection information for your source:

Option Description
Source is from a previous step

Select to retrieve the source from a previously defined field. When selected, the following fields are available:

  • Select field
  • Use field as file names
  • Read source as URL
  • Do not pass field downstream

When this option is cleared, the following fields are available:

  • File or directory
  • Regular Expression
  • Exclude Regular Expression
  • Selected files
Select fieldSpecify the field name to use as a source from a previous step.
Use field as file namesSelect to indicate the source is a filename.
Read source as URLSelect to indicate if the source should be accessed as a URL.
Do not pass field downstreamSelect to remove the source field from the output stream. This action improves performance and memory utilization with large JSON fields.
File or directorySpecify the source location if the source is not defined in a field. Click Browse to navigate to your source file or directory. Click Add to include the source in the Selected files table.
Regular expressionSpecify a regular expression to match filenames within a specified directory.
Exclude regular expressionSpecify a regular expression to exclude filenames within a specified directory.

Selected files table

The Selected files table shows files or directories to use as source locations for input. This table is populated by clicking Add after you specify a File or directory. The JSON input step tries to connect to the specified file or directory when you click Add to include it into the table.

The table contains the following columns:

ColumnDescription
File/DirectoryThe source location indicated by clicking Add after specifying it in File or directory.
Wildcard (RegExp)Wildcards as specified in Regular expression.
Exclude wildcardExcluded wildcards as specified in Exclude regular expression.
RequiredRequired source location for input.
Include subfoldersWhether subfolders are included within the source location.

Click Delete to remove a source from the table. Click Edit to remove a source from the table and return it back to the File or directory option.

Use Show filename(s) to display the file names of the sources successfully connected to the JSON Input step.

Content tab

Content tab

The Content tab contains the following options for configuring which data to retrieve:

OptionDescription
Ignore empty fileSelect to skip empty files. When cleared, empty files will cause the process to fail and stop.
Do not raise an error if no filesSelect to continue when no files are available to process.
Ignore missing pathSelect to continue processing files when an error occurs that (1) no fields match the JSON path or (2) that all the values are null. When cleared, no further rows are processed when an error occurs.
Default path leaf to nullSelect to return a null value for missing paths.
LimitSpecify a limit on the number of records generated from the step. Results are not limited when set to zero.
Include filename in outputSelect to add a string field with the filename in the result.
Rownum in outputSelect to add an integer field with the row number in the result.
Add filenames to resultSelect to add processed files to the result file list.

Fields tab

Fields tab

The Fields tab displays field definitions to extract values from the JSON structure. The table in this tab contain the following columns:

ColumnDescription
NameName of field that maps to the corresponding field in the JSON input stream.
PathComplete path of the field name in the JSON input stream. All records can be retrieved by adding the asterisk * in the path. For example, $.mydata.*
TypeData type of the input field.
FormatAn optional mask for converting the format of the original field. See Common Formats for information on common valid date and numeric formats you can use in this step.
LengthLength of the field.
PrecisionNumber of floating point digits for number-type fields.
CurrencyCurrency symbol ($ or , for example).
DecimalA decimal point can be a . (5,000.00 for example) or , (5.000,00 for example).
GroupA grouping can be a , (10,000.00 for example) or . (5.000,00 for example).
Trim typeThe trim method to apply to a string.
RepeatThe corresponding value from the last row repeated if a row is empty.

Click Get Fields to have the step populate the table with fields derived from the source file. All fields identified by this step will be added to the table.

See Understanding PDI data types and field metadata to maximize the efficiency of your transformation and job results.

Select fields

Click Get Fields in the Fields tab to open the Select Fields window. Select the checkbox next to each field in your source file that you want include in your output. All the fields selected in this step are added to the table. You can search for a field name by entering the field name in the Search box.

Select fields

Additional output fields tab

Additional output fields

The Additional output fields tab contains the following options to specify additional information about the file to process:

OptionDescription
Short filename fieldSpecify the field that contains the filename without path information but with an extension.
Extension fieldSpecify the field that contains the extension of the filename.
Path fieldSpecify the field that contains the path in operating system format.
Size fieldSpecify the field that contains the size of the data.
Is hidden fieldSpecify the field indicating if the file is hidden or not (Boolean).
Last modification fieldSpecify the field indicating the date of the last time the file was modified.
Uri fieldSpecify the field that contains the URI.
Root uri fieldSpecify the field that contains only the root part of the URI.

Examples

The design-tools/data-integration/samples/transformations directory contains the following example KTRs that show how to use this step:

  • JsonInput - read a dynamic file.ktr
  • JsonInput - read a file.ktr
  • JsonInput - read incoming stream.ktr

A sample JSON file that displays the Select Field dialog box is data-integration/samples/transformations/files/JSON - read nested fields.js.

Metadata injection support

All fields of this step support metadata injection. You can use this step with ETL metadata injection to pass metadata to your transformation at runtime.