Pentaho Documentation

Running a Transformation

Overview

This article explains how to run a transformation.

When you are ready to run your transformation, you can:

  • Click the Run icon on the toolbar.
  • Select Run from the Action menu.
  • Press F9.

When you do any of these actions, the Run Options window appears.

Run Options Window for Transformations

In the Run Options window, you can specify whether to run the transformation locally, on a remote server, or in a clustered environment. You can also specify logging and other options, or experiment by passing temporary values for defined parameters and variables on each run.

Always show dialog on run is selected by default. Deselect this option if you want to use the same run options every time you execute your transformation. After you deselect Always show dialog on run, you can open the Run Options window again through the dropdown menu next to the Run icon in the toolbar, through the Action main menu, or by pressing F8.

Environment Type

Some ETL activities are lightweight, such as loading a small text file and writing it out to a database, or filtering a few rows to trim down your results. For these activities, you can run your transformation locally in Spoon. Other ETL activities are more demanding and contain many steps that call other steps or a network of transformation modules. For these activities, you can set up a separate Data Integration (DI) Server dedicated to running transformations. For even greater scalability, or to reduce your execution times, you can cluster an environment of master and slave servers to run your transformations. Such clustered environments might process big data in parallel.

Choose an Environment Type based on your execution scenario:

Local – Runs on your local machine.

Server – Runs on a Data Integration (DI) Server. A Carte server environment must be set up for this option to be available. Use Carte Clusters to Run Transformations and Jobs describes how to set up this environment.
  • Server – Choose the DI Server to run your transformation. The list shows the available Carte server environments.
  • Send resources to this server – Select this option to send your transformation to the specified server before running it, so that the transformation runs locally on the server. Any related resources, such as other referenced files, are also included in the information sent to the server.
Settings (such as variables defined in the kettle.properties file) used to run your transformation are sent from your local environment to your DI Server for execution. To run your transformation with the settings defined on the server instead, see Use the Schedule Perspective to Schedule Transformations and Jobs.

Clustered – Runs in a clustered environment. When you set up a clustered environment, you establish master and slave servers to run your file. A clustered environment must already be set up for this option to be available. Execute Transformations and Jobs on a Carte Cluster describes how to set up this environment.
  • Show Transformations – Another transformation is generated when you run your transformation in a clustered environment. Select this option to show this generated transformation upon execution.

Options

Errors, warnings, and other information generated as the transformation runs are stored in logs. In the Options section of this window, you can specify how much information is logged and whether the log is cleared before each run. You can also enable safe mode and specify whether PDI should gather performance metrics. Logging and Monitoring Operations describes the logging methods available in PDI.
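As an illustration of how a log level limits what ends up in the log, the following Python sketch models the levels from the Options section as an ordered scale and keeps only the messages at or below the selected level. It is not PDI code: the names LogLevel, is_visible, and filter_log are hypothetical, and the numeric ordering is an assumption based on the Nothing to Rowlevel order given in this section.

```python
from enum import IntEnum
from typing import Iterable, List, Tuple


class LogLevel(IntEnum):
    """Hypothetical model of the log levels, least to most verbose."""
    NOTHING = 0
    ERROR = 1
    MINIMAL = 2
    BASIC = 3      # documented as the default level
    DETAILED = 4
    DEBUG = 5
    ROWLEVEL = 6


def is_visible(message_level: LogLevel, selected_level: LogLevel) -> bool:
    """A message is kept when its level does not exceed the selected level."""
    # Assumption: no message is ever emitted at the NOTHING level.
    return message_level != LogLevel.NOTHING and message_level <= selected_level


def filter_log(entries: Iterable[Tuple[LogLevel, str]],
               selected_level: LogLevel) -> List[str]:
    """Return the text of every entry visible at the selected level."""
    return [text for level, text in entries if is_visible(level, selected_level)]


# Made-up log entries, one per level of interest.
SAMPLE_ENTRIES = [
    (LogLevel.ERROR, "step failed"),
    (LogLevel.BASIC, "transformation started"),
    (LogLevel.DEBUG, "opened connection"),
    (LogLevel.ROWLEVEL, "read row 1"),
]
```

With these sample entries, selecting LogLevel.BASIC keeps only the first two messages, while LogLevel.ROWLEVEL keeps all four. This is why the more verbose levels produce much larger logs.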

Clear log before running – Indicates whether to clear all your logs before you run your transformation. If your log is large, you might need to clear it before the next execution to conserve space.

Log level – Specifies how much information is logged, from Nothing (least) to Rowlevel (most):
  • Nothing – No logging occurs
  • Error – Only errors are logged
  • Minimal – Only minimal logging is used
  • Basic – The default logging level
  • Detailed – Detailed logging output
  • Debug – Very detailed output, for debugging purposes
  • Rowlevel – Logging at the row level, which generates a lot of log data
The Debug and Rowlevel logging levels can include information that you consider too sensitive to be shown. Consider the sensitivity of your data when selecting these logging levels.
Performance Monitoring and Logging describes how best to use these logging methods.

Enable safe mode – Checks every row passed through your transformation and ensures that all layouts are identical. If a row does not have the same layout as the first row, an error is generated and reported.

Gather performance metrics – Gathers metrics that monitor the performance of your transformation execution. Using Performance Graphs shows how to visually analyze these metrics.
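The safe mode check described in this section can be pictured with a short Python sketch. This is a simplified illustration, not PDI's implementation: it assumes that a row is a dictionary and that a row's layout is the ordered list of its field names and value types. The names layout_of, RowLayoutError, and check_rows are hypothetical.

```python
from typing import Any, Dict, Iterable, Iterator, Optional, Tuple

# Assumption: a layout is the ordered (field name, type name) pairs of a row.
Layout = Tuple[Tuple[str, str], ...]


class RowLayoutError(Exception):
    """Raised when a row's layout differs from the first row's layout."""


def layout_of(row: Dict[str, Any]) -> Layout:
    """Return the ordered (field name, value type name) pairs for one row."""
    return tuple((name, type(value).__name__) for name, value in row.items())


def check_rows(rows: Iterable[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    """Pass rows through unchanged, failing on the first layout mismatch."""
    reference: Optional[Layout] = None
    for index, row in enumerate(rows):
        layout = layout_of(row)
        if reference is None:
            # The first row defines the layout every later row must match.
            reference = layout
        elif layout != reference:
            raise RowLayoutError(
                f"row {index} has layout {layout}, expected {reference}"
            )
        yield row
```

Because every row is inspected, a check like this adds work to each run. It is best suited to runs where you suspect that the rows reaching a step do not all share one layout.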

Parameters and Variables

You can temporarily modify parameters and variables for each execution of your transformation to experimentally determine their best values. The values you enter into these tables are only used when you run the transformation from the Run Options window.  The values you originally defined for these parameters and variables are not permanently changed by the values you specify in these tables.

Parameters – Set parameter values pertaining to your transformation during runtime. A parameter is a local variable. The parameters you define while creating your transformation are shown in the table under the Parameters tab.
  • Arguments – Set argument values passed to your transformation through the Arguments dialog.

Variables – Set values for user-defined and environment variables pertaining to your transformation during runtime.
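The relationship between the values you originally defined and the temporary values you enter for a single run can be sketched in Python as follows. This is only an illustration of the behavior described in this section, not PDI code. The function effective_values and the sample names INPUT_DIR and BATCH_SIZE are hypothetical, and treating an empty entry as "use the defined value" is an assumption.

```python
from typing import Dict


def effective_values(defined: Dict[str, str],
                     run_overrides: Dict[str, str]) -> Dict[str, str]:
    """Combine the defined values with the temporary values for one run."""
    # Assumption: an empty override means "keep the originally defined value".
    overrides = {name: value for name, value in run_overrides.items() if value != ""}
    # A new dictionary is built, so the defined values are never modified.
    return {**defined, **overrides}


# Values defined when the transformation was created (hypothetical names).
defined = {"INPUT_DIR": "/data/in", "BATCH_SIZE": "1000"}

# Temporary values entered for a single experimental run.
this_run = effective_values(defined, {"BATCH_SIZE": "50", "INPUT_DIR": ""})
```

Here this_run uses a BATCH_SIZE of 50 and the defined INPUT_DIR, while defined still holds the original BATCH_SIZE of 1000 for later runs.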