Skip to main content
Pentaho Documentation

Parquet Input

Parent article

The Parquet Input step decodes Parquet data formats and extracts fields using the schema defined in the Parquet source files. The Parquet Input and the Parquet Output transformation steps gather data from various sources and move that data into the Hadoop ecosystem in the Parquet format.

Before using the Parquet Input step, you must configure a named connection for your distribution, even if your Location is set to Local. For information named connections, see Connecting to a Hadoop cluster with the PDI client.

Select an Engine

You can run the Parquet Input step on the Pentaho engine or on the Spark engine. Depending on your selected engine, the transformation runs differently. Select one of the following options to view how to set up the Parquet Input step for your selected engine.

For instructions on selecting an engine from your transformation, see Run configurations.