Skip to main content
Pentaho Documentation

Memory Group By

The Memory Group By step groups rows in memory from a source step. The resulting rows are grouped based on a specified field or collection of fields. A new row is generated for each group. This step differs from the Group By step by processing all rows in memory, and is designed to handle non-sorted input. If the number of rows you want to group is too large to fit into memory, you must use a combination of the Sort Rows and Group By steps.

General

Enter the information in the transformation step name field as shown in the following table:

Option Definition
Step Name Specifiy the unique name of the Memory Group by step on the canvas. You can customize the name or leave it as the default.
Always give back a result row Select to return a result row, even when there is no input row.  If no input rows exist, this option returns a count of zero (0).

 

PDI_Memory_Group_By_dlg.png

The Fields that Make up the Group Table

Use this table to specify the fields you want to group. Click Get Fields to add all fields from the PDI input stream. Right-click a row in the table to edit that single row or all rows in the table. 

Leave this table blank to calculate aggregate functions over the entire dataset.

Aggregates Table 

Use this table to specify the group you want to aggregate, the aggregation method, and the name of the resulting new field. Click Get lookup fields to specify the fields to include in a group. See the Group by step for more details.

Column Description
Name The name of the aggregate field.
Subject The subject on which you want to use an aggregation method.
Type

The type of aggregation method for grouping the rows. The following methods are available:

  • Sum
  • Average (Mean)
  • Median
  • Percentile
  • Minimum
  • Maximum
  • Number of values (N)
  • Concatenate strings separated by , (comma)
  • First non-null value
  • Last non-null value
  • First value (including null)
  • Last value (including null)
  • Standard deviation
  • Concatenate strings that are separated by the character specified in the Value column
  • Number of distinct values 
  • Number of rows (without field argument).
Value The aggregate value.

Metadata Injection Support

All fields of this step support metadata injection. You can use this step with ETL Metadata Injection to pass metadata to your transformation at runtime.