Pentaho Documentation

Hadoop File Input

The Hadoop File Input step reads data from a variety of text-file types stored on a Hadoop cluster. The most commonly used formats are comma-separated values (CSV) files generated by spreadsheets and fixed-width flat files.

You can use this step to specify a list of files to read, or a list of directories with wildcards written as regular expressions. You can also accept file names from a previous step.
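Because the wildcards are regular expressions rather than shell-style globs, a pattern such as `*.csv` is written as `.*\.csv`. The following Python sketch only illustrates that difference. It is not how the step itself is implemented; the `select_files` helper and the directory listing are hypothetical, and it assumes the expression must match the whole file name.

```python
import re

def select_files(file_names, wildcard_regex):
    """Return the names that the regular expression matches in full."""
    pattern = re.compile(wildcard_regex)
    return [name for name in file_names if pattern.fullmatch(name)]

# Hypothetical directory listing, for illustration only.
listing = ["sales_2023.csv", "sales_2024.csv", "sales_2024.csv.bak", "notes.txt"]

# ".*" matches any run of characters; "\." matches a literal dot.
csv_files = select_files(listing, r".*\.csv")

# Every file whose name starts with "sales_2024." regardless of extension.
sales_2024 = select_files(listing, r"sales_2024\..*")
```

In this sketch, `csv_files` excludes `sales_2024.csv.bak` because the whole name has to end in `.csv`, whereas `sales_2024` includes it.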

Select an engine

You can run the Hadoop File Input step on the Pentaho engine or on the Spark engine. The transformation runs differently depending on the engine you select. Select one of the following options to view how to set up the Hadoop File Input step for your selected engine.

For instructions on selecting an engine for your transformation, see Run configurations.