The Unique Rows step removes duplicate rows from the input stream and passes only the unique rows to its output.
Select an engine
You can run the Unique Rows step on the Pentaho engine or on the Spark engine.
The input stream must be sorted in a step before the Unique Rows step; otherwise, only consecutive duplicate rows are detected and removed. However, the rows do not have to be pre-sorted if you use the Unique Rows (HashSet) step, or if you run the transformation on the Spark engine.
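The following Python sketch illustrates why the sort matters. It is not Pentaho code. The function names `unique_consecutive` and `unique_hashset` are hypothetical and only model the two behaviors described above, assuming each row is a hashable tuple of field values.

```python
from itertools import groupby


def unique_consecutive(rows):
    """Model of comparing each row only with the row before it:
    keeps one row per run of identical adjacent rows."""
    return [key for key, _ in groupby(rows)]


def unique_hashset(rows):
    """Model of a hash-set based approach: remembers every row seen so far,
    so the input order does not matter."""
    seen = set()
    result = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            result.append(row)
    return result


# Illustrative input: ("a", 1) appears three times, but not all adjacent.
rows = [("a", 1), ("b", 2), ("a", 1), ("a", 1)]

unsorted_result = unique_consecutive(rows)         # a duplicate of ("a", 1) survives
sorted_result = unique_consecutive(sorted(rows))   # sorting first removes all duplicates
hashset_result = unique_hashset(rows)              # no sort needed
```

On the unsorted input, the consecutive comparison keeps a second copy of `("a", 1)` because the two occurrences are separated by another row. Sorting first, or tracking seen rows in a set, removes it.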
Depending on your selected engine, the transformation runs differently. Select one of the following options to view how to set up the Unique Rows step for your selected engine:
- Using the Unique Rows step on the Pentaho engine: Learn how to set up this step when using the Pentaho engine.
- Using the Unique Rows step on the Spark engine: Learn how to set up this step when using the Spark engine.
For instructions on selecting an engine for your transformation, see Run configurations.