The ORC Input step reads field data from an Apache ORC (Optimized Row Columnar) file into the PDI data stream.
Before using the ORC Input step, you must configure a named connection for your distribution, even if you set your Location to Local. For information on named connections, see Set up the Pentaho Server to connect to a Hadoop cluster.
Select an engine
You can run the ORC Input step on the Pentaho engine or on the Spark engine. The transformation runs differently depending on the engine you select. Select one of the following options to learn how to set up the ORC Input step for your selected engine.
- Using the ORC Input step on the Pentaho engine: Learn how to set up this step when using the Pentaho engine.
- Using the ORC Input step on the Spark engine: Learn how to set up this step when using the Spark engine.
For instructions on selecting an engine for your transformation, see Run configurations.