The Avro Input step decodes binary or JSON Avro data and extracts fields from the structure it defines. Apache Avro is a data serialization system; this step reads the data from an Avro file so that it can be used in the PDI stream.
When using the Avro Input step with the Adaptive Execution Layer (AEL), the following factors affect performance and results:
- Spark processes null values differently than the Pentaho engine. You will need to adjust your transformation to successfully process null values according to Spark's processing rules.
- Metadata injection is not supported for steps running on AEL.
The following fields and button are general to this transformation step:
|Step name||Specifies the unique name of the Avro Input step on the canvas. You can customize the name or leave it as the default.|
Indicates the file system or specific cluster where the source file you want to input is located. For the supported file system types, see Virtual File System Browser.
The fully qualified URL of the source file name for the input fields.
|Preview||Displays the rows generated by this step.|
The Avro Input transformation step features several tabs with fields. Each tab is described below.
The table in the Fields tab defines the following input fields from the Avro source:
|Avro path (Avro type)||The location of the Avro source (and its format type).|
|Name||The name of the input field.|
|Type||The type of the input field, such as ‘String’ or ‘Date’.|
After you have provided a path to an Avro data file or Avro schema, click Get Fields to populate the fields.
These fields represent the Avro schema. When the fields are retrieved from the schema, each Avro type is converted to an appropriate PDI type. You can change the PDI type afterward. Below is the Avro-to-PDI data type conversion table.
|Avro Type||PDI Type|
The default format mask for the date type is yyyy-MM-dd. The default format mask for the timestamp type is yyyy-MM-dd HH:mm:ss.SSS. If the data was stored as a string data type in any other format, the column data cannot be retrieved, and null is returned for that column.
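The effect of the default masks can be illustrated with a small sketch. This is not PDI's parsing code. It assumes that the Python `strptime` patterns below are reasonable equivalents of the two masks, and the helper name `parse_or_none` and the sample values are hypothetical.

```python
from datetime import datetime
from typing import Optional

# Assumed Python equivalents of the default masks quoted above.
DATE_MASK = "%Y-%m-%d"                   # yyyy-MM-dd
TIMESTAMP_MASK = "%Y-%m-%d %H:%M:%S.%f"  # yyyy-MM-dd HH:mm:ss.SSS


def parse_or_none(value: str, mask: str) -> Optional[datetime]:
    """Parse a string with the given mask; return None if it does not match.

    Returning None mirrors the documented behavior of yielding null
    for string data stored in a different format.
    """
    try:
        return datetime.strptime(value, mask)
    except ValueError:
        return None


# Hypothetical sample values.
ok_date = parse_or_none("2021-03-15", DATE_MASK)
bad_date = parse_or_none("15/03/2021", DATE_MASK)  # wrong format
ok_ts = parse_or_none("2021-03-15 08:30:00.250", TIMESTAMP_MASK)
bad_ts = parse_or_none("2021-03-15", TIMESTAMP_MASK)  # time part missing
```

In this sketch, `bad_date` and `bad_ts` are `None`. If string dates in your source use another layout, a check like this on a sample of the data can show in advance which columns are likely to come back as null.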
This tab includes the following field to define the source for your Avro schema:
- File name: Specify the Avro schema file by entering its path as a fully qualified URL (for example, file:///C:/avro-output-schema) or by clicking Browse. A separate schema file is not required. If you do not specify a schema file, PDI attempts to retrieve the fields from the schema embedded in the Avro data file.
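An Avro schema is a JSON document, so you can inspect a schema file outside PDI before pointing the step at it. The sketch below is only an illustration of what a record schema contains, not how PDI retrieves fields. The record name, the field names, and the helper `list_fields` are hypothetical.

```python
import json

# Hypothetical schema text, as it might appear in a separate schema file.
SCHEMA_TEXT = """
{
  "type": "record",
  "name": "Customer",
  "fields": [
    {"name": "id", "type": "long"},
    {"name": "name", "type": "string"},
    {"name": "signup_date", "type": ["null", "string"], "default": null}
  ]
}
"""


def list_fields(schema_text):
    """Return (field name, Avro type, nullable) for each field of a record schema."""
    schema = json.loads(schema_text)
    rows = []
    for field in schema["fields"]:
        avro_type = field["type"]
        if isinstance(avro_type, list):
            # A JSON array is an Avro union; a "null" branch makes the field nullable.
            nullable = "null" in avro_type
            branches = [t for t in avro_type if t != "null"]
        else:
            nullable = False
            branches = [avro_type]
        # Complex types are JSON objects; render them as compact JSON text.
        label = " | ".join(t if isinstance(t, str) else json.dumps(t) for t in branches)
        rows.append((field["name"], label, nullable))
    return rows


fields = list_fields(SCHEMA_TEXT)
for name, avro_type, nullable in fields:
    print(f"{name}: {avro_type} (nullable={nullable})")
```

Listing the fields this way gives you something to compare against what Get Fields returns. It also shows which fields are declared as nullable unions.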