Pentaho Documentation

MapReduce Output

This step defines the key/value pairs that a transformation emits as Hadoop output. Where those pairs go depends on how you have configured the transformation. You can include this step in a transformation used as a Mapper, Combiner, or Reducer.

When this step is included in a Mapper transformation and a combiner, a reducer, or both are configured, its output pairs become the input pairs for the combiner or the reducer. If neither a combiner nor a reducer is configured, the output is written in the format set by the submitting Hadoop job.

When this step is included in a Combiner transformation and a reducer is configured, its output pairs become the input pairs for the reducer. If no reducer is configured, the output is written in the format set by the submitting Hadoop job.

When this step is included in a Reducer transformation, the output is written in the format set by the submitting Hadoop job. Define the data types of the key and value fields before this step.
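
The routing described above can be illustrated with a toy model in plain Python. This is only a sketch of the data flow, not Pentaho or Hadoop code: the names `run_job`, `mapper`, `sum_values`, and `group_by_key` are hypothetical, and the word-count logic is an assumed example. Each function returns a list of (key, value) pairs, which stands in for the rows that reach a MapReduce Output step.

```python
from collections import defaultdict


def mapper(line):
    # Hypothetical mapper: emit one (key, value) pair per word.
    return [(word, 1) for word in line.split()]


def sum_values(key, values):
    # Hypothetical combiner/reducer: emit a single (key, total) pair.
    return [(key, sum(values))]


def group_by_key(pairs):
    # Collect all values that share a key, keeping first-seen key order.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def run_job(splits, mapper, combiner=None, reducer=None):
    # Toy model of the routing: each stage's output pairs feed the next
    # configured stage, or become the job output if no stage follows.
    map_side_output = []
    for split in splits:
        pairs = [pair for line in split for pair in mapper(line)]
        if combiner is not None:
            # Mapper output pairs are the combiner's input pairs.
            grouped = group_by_key(pairs)
            pairs = [out for k, vs in grouped.items() for out in combiner(k, vs)]
        map_side_output.extend(pairs)

    if reducer is None:
        # No reducer configured: the pairs are the job output.
        return map_side_output

    # Mapper (or combiner) output pairs are the reducer's input pairs.
    grouped = group_by_key(map_side_output)
    return [out for k, vs in sorted(grouped.items()) for out in reducer(k, vs)]


if __name__ == "__main__":
    splits = [["a b a"], ["b c"]]
    print(run_job(splits, mapper))
    print(run_job(splits, mapper, combiner=sum_values))
    print(run_job(splits, mapper, combiner=sum_values, reducer=sum_values))
```

Calling `run_job` with only a mapper returns the mapper's pairs unchanged. Adding a combiner, a reducer, or both changes which stage produces the final pairs, mirroring the three cases above.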

AEL Considerations

When using the MapReduce Output step with the Adaptive Execution Layer, the following factor affects performance and results:

  • Spark processes null values differently than the Pentaho engine. Adjust your transformation so that it handles null values according to Spark's processing rules.

Options

Enter the following information in the transformation step fields.

  • Step name: Specifies the unique name of the MapReduce Output step on the canvas. A MapReduce Output step can be placed on the canvas several times; however, it represents the same MapReduce Output step. You can customize the name or leave it as the default.
  • Key field: The Hadoop output field of the MapReduce key.
  • Value field: The Hadoop output field of the MapReduce value.

Metadata injection support

All fields of this step support metadata injection. You can use this step with ETL metadata injection to pass metadata to your transformation at runtime.