Hitachi Vantara Lumada and Pentaho Documentation

Transformation Executor

The Transformation Executor step allows you to execute a Pentaho Data Integration (PDI) transformation. It is similar to the Job Executor step, but works with transformations.

Depending on your data transformation needs, the Transformation Executor step can be set up to function in any of the following ways:

  • By default, the specified transformation will be executed once for each input row. You can use the input row to set parameters and variables. The executor step then passes this row to the transformation in the form of a result row.
  • You can also pass a group of rows based on the value in a field. Rows are collected while the field value stays the same, and the specified transformation is executed each time the value changes. In these cases, the first row in the group of rows is used to set parameters or variables in the transformation.
    Note: This function is not currently supported by the Adaptive Execution Layer (AEL).
  • You can launch multiple copies of this step to assist in parallel transformation processing.

AEL considerations

The Transformation Executor step can be used to run a sub-transformation, called a child transformation, with Spark on the Adaptive Execution Layer. When using AEL with the Transformation Executor step, note the following exceptions:

  • The Variable / Parameter to use field on the Parameters tab is not supported when the step is executed on Spark in AEL.
  • The execution results do not show the detailed log of the child transformation in AEL.
  • When you want to send the results from the child transformation back to the parent transformation as the output from the Transformation Executor step, you must enter information in the layout table in the Result rows tab. The layout information is required and the table cannot be left blank when running the step on Spark in AEL.

Error handling and parent transformation logging notes

Keep the following notes in mind when building transformations with the Transformation Executor step:

  • This step does not abort when the transformation it calls ends with errors. To control the flow or to abort the transformation in case of errors, specify the fields and a target step in the Execution results tab to log the number of errors.
  • At run time, the log of the parent transformation contains only the last processed batch of data, which lessens the strain on the back-end logging. You can obtain a detailed log of the child transformation by viewing the execution results. To do so, define a target step in the Execution results tab and read the field that holds the execution logging text (by default, ExecutionLogText).
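
As a rough illustration of the error check described above, the following Python sketch mimics the decision a downstream target step might make on one execution-results row. It is not PDI code: the `should_abort` helper and the dictionary row are hypothetical, and the field names are the defaults listed for the Execution results tab.

```python
def should_abort(result_row: dict, errors_field: str = "ExecutionNrErrors") -> bool:
    """Return True when the child run reported at least one error."""
    return int(result_row.get(errors_field, 0)) > 0


# Hypothetical execution-results row, as a target step might receive it.
row = {"ExecutionNrErrors": 2, "ExecutionLogText": "child log text"}
if should_abort(row):
    print("abort parent:", row["ExecutionLogText"])
```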

Samples

These sample transformations demonstrate the capabilities of this step. The samples are available in the distribution package and are located in the design-tools/data-integration/samples/transformations/transformation-executor folder.

  • trans-executor-child.ktr

    Adds a sequence to input rows.

  • trans-executor-parent.ktr

    Passes rows to a transformation which is then executed three times. You can preview the results, result files, and result rows steps to view the output.

General

Enter the following information in the transformation step fields.

Option | Description
Step name | Specifies the unique name of the Transformation Executor step on the canvas. A step can be placed on the canvas several times; however, it represents the same step. The Step name is set to Transformation Executor by default.
Transformation

Specify your transformation to execute by entering its path or clicking Browse.

If you select a transformation that has the same root path as the current transformation, the variable ${Internal.Entry.Current.Directory} is automatically inserted in place of the common root path. For example, if the current transformation's path is /home/admin/transformation.ktr and you select the transformation /home/admin/path/sub.ktr, then the path is automatically converted to ${Internal.Entry.Current.Directory}/path/sub.ktr.

If you are working with a repository, specify the name of the transformation. If you are not working with a repository, specify the XML file name of the transformation.

Transformations previously specified by reference are automatically converted to be specified by the transformation name within the Pentaho Repository.
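
The path conversion described above can be sketched in Python as follows. This is only an illustration of the substitution rule, not how PDI implements it; the `to_relative_reference` helper is hypothetical, and the sketch assumes POSIX-style paths and that the common root is the directory of the current transformation.

```python
from pathlib import PurePosixPath

CURRENT_DIR_VAR = "${Internal.Entry.Current.Directory}"


def to_relative_reference(current_ktr: str, selected_ktr: str) -> str:
    """Replace the shared root of selected_ktr with the directory variable."""
    current_dir = PurePosixPath(current_ktr).parent
    try:
        remainder = PurePosixPath(selected_ktr).relative_to(current_dir)
    except ValueError:
        # No common root: keep the path exactly as selected.
        return selected_ktr
    return f"{CURRENT_DIR_VAR}/{remainder}"


print(to_relative_reference("/home/admin/transformation.ktr",
                            "/home/admin/path/sub.ktr"))
# prints ${Internal.Entry.Current.Directory}/path/sub.ktr
```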

Options

The Transformation Executor step features several tabs with fields for setting parameters and defining results. Each tab is described below.

Parameters tab

Transformation Executor step showing the Parameters tab

On this tab, you can define or pass variables and parameters to the transformation. If multiple rows are passed to the transformation, the first row is used to set the parameters or variables. Use the table to set a variable to a static value, to another variable's value, or to the value of a column from the input stream (for a single row).

For each variable or parameter name that is added to the table, you must assign a value in either the Variable / Parameter to use field or the Static input value field. You cannot use both a variable and a static value.

Option | Description
Variable / Parameter name

Specify a unique name of the variable or parameter to pass to the transformation. For example, you can enter strings such as ParameterOne, ParameterTwo, etc.

This name must be unique. Otherwise, you may adversely affect your data.

Variable / Parameter to use

The Variable / Parameter to use column sets the names of the variables or parameters that are defined and passed to the child transformation.

For this column, enter a value using one of the following methods:

  • Select an incoming field using the drop-down menu.
  • Manually enter a variable name.
  • Use CTRL+SPACE to select a value from a list of PDI environment variables.

Specify which field to set as a parameter or variable value. You can specify the variable using the ${} notation to automatically insert it in the child transformation instead of the field value it resolves to.

For example, if you enter the variable ${Internal.Kettle.Version}, which resolves to Pentaho_9.0, then everywhere the specified parameter is referenced in the child transformation, the field variable is replaced with Pentaho_9.0.

If you specify a field or value in Variable / Parameter to use, the Static input value column is disabled.

When Variable / Parameter to use contains a valid value, the Inherit all variables from transformation check box does not affect the Variable / Parameter to use value.

Static input value

Specify a static input value for the variable/parameter name.

For example, you may specify the Variable/Parameter name as ParameterOne and in the same row specify the Static input value as StaticInputValueOne. When the step runs the child transformation, the environment variable ${ParameterOne} resolves to StaticInputValueOne.

Entering a value in the Static input value field disables the Variable / Parameter to use field.

Inherit all variables from transformation (check box)

Use this check box to determine which variables take precedence when the Transformation Executor step is run.

Select this check box to include all the variables defined in the current transformation. When the step runs, the transformation processes:

  • the Parameters tab values from the parent transformation,
  • then the Parameters tab values from the Transformation Executor step,
  • and finally the Parameters tab values from the child transformation.

Note: If you specify a value in the Variable / Parameter to use field for a variable/parameter row, do not modify the value after selecting this check box. Otherwise, you may adversely affect your data.

Clear this check box to ignore the variables defined in the Parameters tab of the parent transformation. When the step runs, the transformation processes:

  • the Parameters tab values from the Transformation Executor step,
  • and then the Parameters tab values from the child transformation.

See Order of processing for more information.

Get Parameters (button)

Click this option to insert all the defined parameters of the specified transformation. The description of the parameter is inserted into the Static input value field.
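
The rules for the Parameters table (unique names, and exactly one of Variable / Parameter to use or Static input value per row) can be sketched in Python as below. This is a simplified model, not PDI code: `ParameterRow`, `build_variables`, `substitute`, and the `customer_id` field are hypothetical, and the sketch assumes the Variable / Parameter to use column always names an incoming field.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class ParameterRow:
    name: str                           # Variable / Parameter name
    field: Optional[str] = None         # Variable / Parameter to use (an incoming field here)
    static_value: Optional[str] = None  # Static input value


def build_variables(rows: list, input_row: dict) -> dict:
    """Validate the table rows and resolve each name to a single value."""
    variables = {}
    for row in rows:
        if row.name in variables:
            raise ValueError(f"duplicate name: {row.name}")
        if (row.field is None) == (row.static_value is None):
            raise ValueError(f"{row.name}: set exactly one of field or static value")
        if row.static_value is not None:
            variables[row.name] = row.static_value
        else:
            variables[row.name] = str(input_row[row.field])
    return variables


def substitute(text: str, variables: dict) -> str:
    """Replace each ${name} with its value; unknown names are left as-is."""
    return re.sub(r"\$\{([^}]+)\}",
                  lambda m: variables.get(m.group(1), m.group(0)), text)


rows = [
    ParameterRow("ParameterOne", static_value="StaticInputValueOne"),
    ParameterRow("ParameterTwo", field="customer_id"),
]
variables = build_variables(rows, {"customer_id": 42})
print(substitute("${ParameterOne} / ${ParameterTwo}", variables))
# prints StaticInputValueOne / 42
```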

Order of processing

The order in which variables and parameters are processed depends on whether the child transformation inherits all variables from the parent transformation. Your selection for the Inherit all variables from transformation check box in the Parameters tab determines the processing order, as described below:

  • When Inherit all variables from transformation is selected, the processing order is:

Parent Transformation [Parameter] >> Transformation Executor [Variable/Parameter name] >> Child Transformation [Parameter]

  1. First, the parameters and variables defined in the Parameters tab in the parent transformation.
  2. Then the parameters and variables defined in the Parameters tab in the Transformation Executor step.
    Note: If a variable defined in the parent transformation has the same name as one defined in the Parameters tab of the Transformation Executor step, the system selects the value defined in the Transformation Executor step.
  3. Finally, the parameters and variables defined in the Parameters tab in the child transformation.
    Note: If a variable defined in the Parameters tab of the Transformation Executor step has the same name as one defined in the child transformation, the system selects the value defined in the child transformation.

  • When Inherit all variables from transformation is cleared, the processing order is:

Transformation Executor [Variable/Parameter name] >> Child Transformation [Parameter]

  1. First, the parameters and variables defined in the Parameters tab in the Transformation Executor step.
  2. Then the parameters and variables defined in the Parameters tab in the child transformation, such that any variables defined in the Parameters tab of the Transformation Executor step pass to the child transformation.
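
The two processing orders above can be modeled as successive dictionary updates, where a later source overrides an earlier one on a name clash. The Python sketch below is only a model of the documented precedence, not PDI's implementation; the `effective_variables` helper and the sample names and values are hypothetical.

```python
def effective_variables(parent: dict, executor: dict, child: dict,
                        inherit_all: bool) -> dict:
    """Apply each source in processing order; later sources win on a clash."""
    merged = {}
    if inherit_all:
        merged.update(parent)   # 1. parent transformation parameters
    merged.update(executor)     # 2. Transformation Executor step parameters
    merged.update(child)        # 3. child transformation parameters
    return merged


parent = {"ENV": "dev", "REGION": "eu"}
executor = {"ENV": "test", "BATCH": "7"}
child = {"BATCH": "1"}

print(effective_variables(parent, executor, child, inherit_all=True))
# prints {'ENV': 'test', 'REGION': 'eu', 'BATCH': '1'}
print(effective_variables(parent, executor, child, inherit_all=False))
# prints {'ENV': 'test', 'BATCH': '1'}
```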

Execution results tab

You can define the result fields for the specified transformation and the target step to send them to. If you do not need a certain result, leave its input field blank.

Option | Description | Default value
Target step for the execution results | Use the drop-down menu to select a step in the current transformation as the target step to receive the results from the specified transformation. | N/A
Execution time (ms) | Specify the field name for the transformation execution time. | ExecutionTime
Execution result | Specify the field name for the transformation execution result. | ExecutionResult
Number of errors | Specify the field name for the number of errors during the execution of the transformation. | ExecutionNrErrors
Number of rows read | Specify the field name for the total number of rows read during the execution of the transformation. | ExecutionLinesRead
Number of rows written | Specify the field name for the total number of rows written during the execution of the transformation. | ExecutionLinesWritten
Number of rows input | Specify the field name for the total number of input rows during the execution of the transformation. | ExecutionLinesInput
Number of rows output | Specify the field name for the total number of output rows during the execution of the transformation. | ExecutionLinesOutput
Number of rows rejected | Specify the field name for the total number of rows rejected during the execution of the transformation. | ExecutionLinesRejected
Number of rows updated | Specify the field name for the total number of rows updated during the execution of the transformation. | ExecutionLinesUpdated
Number of rows deleted | Specify the field name for the total number of rows deleted during the execution of the transformation. | ExecutionLinesDeleted
Number of files retrieved | Specify the field name for the total number of files retrieved during the execution of the transformation. | ExecutionFilesRetrieved
Exit status | Specify the field name for the exit status of the execution of the transformation. | ExecutionExitStatus
Execution logging text | Specify the field name for the logging text from the execution of the transformation. | ExecutionLogText
Log channel ID | Specify the field name for the log channel ID used during the execution of the transformation. | ExecutionLogChannelID

Row grouping tab

Specify how to group the result rows by one of the following methods:

  • Specific number of rows.
  • Specific field.
  • Specified duration of time.

You can use the result rows in a transformation or job entry, or you can get the records themselves by using the Get rows from result step in a transformation.

To access the Field to group rows on or Duration time when collecting rows options, delete the default value in the Number of rows to send to transformation option.

Option | Description
Number of rows to send to transformation | Specify a number. After every N rows, the transformation is executed and these N rows are passed to it.
Field to group rows on | Specify a field for grouping rows. Rows are collected in a group as long as the field value stays the same. If the value changes, the transformation is executed and the accumulated rows are passed to it.
Duration time when collecting rows | Specify a time in milliseconds. This value is the amount of time the step spends collecting rows before executing the transformation.
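
The first two grouping modes in the table above can be sketched in Python as follows. This models the batching behavior only and is not PDI code: the helper names and the sample rows are hypothetical, the sketch assumes that a final partial batch is still sent, and it omits the duration-based mode.

```python
from itertools import groupby, islice


def batches_by_count(rows, n):
    """Yield consecutive batches of at most n rows."""
    iterator = iter(rows)
    while batch := list(islice(iterator, n)):
        yield batch


def batches_by_field(rows, field):
    """Yield a new batch each time the value of `field` changes."""
    for _, group in groupby(rows, key=lambda row: row[field]):
        yield list(group)


rows = [{"region": "eu"}, {"region": "eu"}, {"region": "us"}, {"region": "eu"}]
print([len(b) for b in batches_by_count(rows, 3)])         # prints [3, 1]
print([len(b) for b in batches_by_field(rows, "region")])  # prints [2, 1, 1]
```

Because `groupby` only merges adjacent rows, the last `eu` row forms its own batch, which matches grouping on a value change rather than on distinct values.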

Result rows tab

In this tab, you can specify the destination of the result rows from the transformation execution as well as the layout of the expected result rows.

This step verifies that the data types of the result row fields are identical to what is specified. If there is a difference, an error message is displayed.

Note: When you want to send the results from the child transformation back to the parent transformation as the output from the Transformation Executor step, you must enter information in the layout table in the Result rows tab. The layout information is required and the table cannot be left blank when running the step on Spark in AEL.

Option | Description
Target step for result rows | Use the drop-down menu to select a step in the current transformation as the target step.
Field name | Specify the name of the field.
Data type | Use the drop-down menu to specify the data type of the field, such as number, date, or string.
Length | If applicable, specify the length of the data type specified.
Precision | If applicable, specify the precision to use.
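
The layout check on result rows can be pictured with the Python sketch below. It is an illustration only: the `verify_row` helper, the sample layout, and the mapping from PDI data type names to Python types are assumptions made for this example, and the real step may compare types differently.

```python
from datetime import datetime

# Assumed mapping from PDI data type names to Python types (illustration only).
PYTHON_TYPES = {"String": str, "Integer": int, "Number": float, "Date": datetime}


def verify_row(row: dict, layout: list) -> None:
    """Raise an error if a field is missing or has an unexpected type."""
    for field_name, data_type in layout:
        if field_name not in row:
            raise KeyError(f"missing result field: {field_name}")
        expected = PYTHON_TYPES[data_type]
        if not isinstance(row[field_name], expected):
            actual = type(row[field_name]).__name__
            raise TypeError(f"{field_name}: expected {data_type}, got {actual}")


# Hypothetical layout, as (field name, data type) pairs.
layout = [("id", "Integer"), ("name", "String")]
verify_row({"id": 1, "name": "alpha"}, layout)  # passes silently
```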

Result files tab

In this tab, you can specify the destination of the result files.

Option | Description
Target step for result files information | Use the drop-down menu to select a step in the transformation as the target step.
Result file name field | Specify the name of the field for the result files.