Pentaho Documentation

Work with Transformations

In the PDI client (Spoon), you can develop transformations, which are data workflows representing your ETL activities. The steps used in your transformations define the individual ETL activities (building blocks). The transformations containing your steps are stored in .ktr files. You can access these .ktr files through the PDI client.
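As a rough illustration of what lives inside a .ktr file, the sketch below parses a heavily simplified transformation document and lists its steps. Real .ktr files contain many more elements, and the sample document and step names here are hypothetical:

```python
# Sketch: inspecting a simplified .ktr file outside the PDI client.
# A real .ktr contains many more elements; this mirrors only the
# <transformation>/<step>/<name>/<type> nesting.
import xml.etree.ElementTree as ET

KTR_SAMPLE = """<transformation>
  <info><name>read_and_filter</name></info>
  <step><name>Read CSV</name><type>CsvInput</type></step>
  <step><name>Filter rows</name><type>FilterRows</type></step>
</transformation>"""

def list_steps(ktr_xml: str):
    """Return (step name, step type) pairs from a .ktr-style document."""
    root = ET.fromstring(ktr_xml)
    return [(s.findtext("name"), s.findtext("type")) for s in root.findall("step")]

print(list_steps(KTR_SAMPLE))  # -> [('Read CSV', 'CsvInput'), ('Filter rows', 'FilterRows')]
```

In practice you would always edit these files through the PDI client rather than by hand; the sketch only shows that each step in the canvas corresponds to an element in the file.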

Open a Transformation

The way you open an existing transformation depends on whether you are using PDI locally on your machine or if you are connected to a repository. If you are connected to a repository, you are remotely accessing your file on the Pentaho Server. Another option is to open a transformation using HTTP with the Visual File System (VFS) Browser.

If you get a message indicating that a plugin is missing, see the Troubleshooting Transformation Steps and Job Entries section for more details.

If you recently had a file open, you can also use File > Open Recent.

On Your Local Machine

Follow these instructions to open a transformation on your local machine.

  1. In the PDI client, perform one of the following actions:
  • Select File > Open.
  • Click the Open file icon in the toolbar.
  • Press CTRL+O.
  2. Select the file from the Open window, then click Open.

The Open window closes when your transformation appears in the canvas.

In the Pentaho Repository

Follow these instructions to access a transformation in the Pentaho Repository.

  1. Make sure you are connected to a repository.
  2. In the PDI client, perform one of the following actions to access the Open repository browser window:
  • Select File > Open.
  • Click the Open file icon in the toolbar.
  • Press CTRL+O.
  3. If you recently opened a file, use Recents to navigate to your transformation.
  4. Use either the search box to find your transformation, or use the left panel to navigate to a repository folder containing your transformation.
  5. Perform one of the following actions:
  • Double-click your transformation.
  • Select it and press the Enter key.
  • Select it and click Open.

The Open window closes when your transformation appears in the canvas.

If you select a folder or file in the Open window, you can click it again to rename it.

With the VFS Browser

Select File > Open URL to access files using HTTP with the VFS browser. The URL you specify identifies the protocol to use in the browser.
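As a small illustration of how the front of the URL names the protocol, using a hypothetical address:

```python
# The scheme at the front of the URL is what identifies the protocol
# the VFS browser should use. The address below is a made-up example.
from urllib.parse import urlparse

url = "http://files.example.com/etl/load_sales.ktr"
print(urlparse(url).scheme)  # -> http
```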


Save a Transformation

The way you save a transformation depends on whether you are using PDI locally on your machine or if you are connected to a repository. If you are connected to a repository, you are remotely saving your file on the Pentaho Server.

On Your Local Machine

Follow these instructions to save a transformation on your local machine.

  1. In the PDI client, perform one of the following actions:
  • Select File > Save.
  • Click the Save current file icon in the toolbar.
  • Press CTRL+S.

If you are saving your transformation for the first time, the Save As window appears.

  2. In the Save As window, specify the transformation's name and select the location.
  3. Either press the Enter key or click Save.

The Save window closes when your transformation is saved.

In the Pentaho Repository

Follow these instructions to save a transformation to the Pentaho Repository.

  1. Make sure you are connected to a repository.
  2. In the PDI client, perform one of the following actions:
  • Click File > Save.
  • Click the Save current file icon in the toolbar.
  • Press CTRL+S.

If you are saving your transformation for the first time, the Save repository browser window appears.

  3. Navigate to the repository folder where you want to save your transformation.
  4. Specify the transformation's name in the File name field.
  5. Either press the Enter key or click Save.

The Save window closes when your transformation is saved.

Run Your Transformation

After creating a transformation as a network of steps (a data workflow) that performs your ETL tasks, run it in the PDI client to test how it performs in various scenarios. With the Run Options window, you can apply and adjust different run configurations, options, parameters, and variables. By defining multiple run configurations, you can choose to run your transformation locally or on a server using the Pentaho engine, or to run it on a Hadoop cluster using the Spark engine.

When you are ready to run your transformation, you can perform any of the following actions to access the Run Options window:

  • Click the Run icon on the toolbar.

  • Select Run from the Action menu.
  • Press F9.

The Run Options window appears.

Run Options Window in the PDI Client

In the Run Options window, you can specify a Run configuration to define whether the transformation runs on the Pentaho engine or a Spark client. If you choose the Pentaho engine, you can run the transformation locally, on the Pentaho Server, or on a slave (remote) server. To set up run configurations, see Run Configurations.

The default Pentaho local configuration runs the transformation using the Pentaho engine on your local machine. You cannot edit this default configuration.

The Run Options window also lets you specify logging and other options, or experiment by passing temporary values for defined parameters and variables during each iterative run.

Always show dialog on run is selected by default. Deselect this option if you want to use the same run options every time you execute your transformation. If you have deselected Always show dialog on run, you can access the Run Options window again through the drop-down menu next to the Run icon in the toolbar, through the Action main menu, or by pressing F8.

After running your transformation, you can use the Execution Panel to analyze the results.

Run Configurations 

Some ETL activities are lightweight, such as loading in a small text file to write out to a database or filtering a few rows to trim down your results. For these activities, you can run your transformation locally using the default Pentaho engine. Some ETL activities are more demanding, containing many steps calling other steps or a network of transformation modules. For these activities, you can set up a separate Pentaho Server dedicated for running transformations using the Pentaho engine. Other ETL activities involve large amounts of data on network clusters requiring greater scalability and reduced execution times. For these activities, you can run your transformation using the Spark engine in a Hadoop cluster.

Run configurations allow you to select when to use either the Pentaho (Kettle) or Spark engine. You can create or edit these configurations through the Run configurations folder in the View tab as shown below:

Run Configurations Folder in the View Tab of the PDI Client Explore Pane

To create a new run configuration, right-click on the Run Configurations folder and select New. To edit or delete a run configuration, right-click on an existing configuration.

Pentaho local is the default run configuration. It runs transformations with the Pentaho engine on your local machine. You cannot edit this default configuration.

Selecting New or Edit opens the Run configuration dialog box that contains the following fields:

  • Name – Specify the name of the run configuration.
  • Description – Optionally, specify details of your configuration.
  • Engine – Select the type of engine for running a transformation. You can run a transformation with either a Pentaho or a Spark engine. The fields displayed in the Settings section of the dialog box depend on which engine you select.
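As an illustrative model of these fields (not a PDI API), a run configuration can be thought of as a named choice of engine:

```python
# Hypothetical sketch of a run configuration's Name/Description/Engine
# fields; class and constant names here are illustrative, not PDI's.
from dataclasses import dataclass

VALID_ENGINES = ("Pentaho", "Spark")

@dataclass
class RunConfiguration:
    name: str
    engine: str
    description: str = ""

    def __post_init__(self):
        # The engine choice determines which Settings fields apply.
        if self.engine not in VALID_ENGINES:
            raise ValueError(f"engine must be one of {VALID_ENGINES}")

cfg = RunConfiguration(name="local-dev", engine="Pentaho")
print(cfg.engine)  # -> Pentaho
```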

Select an Engine 

You can select from the following two engines:

  • Pentaho Engine: runs transformations in the default Pentaho (Kettle) environment.
  • Spark Engine: runs big data transformations through the Adaptive Execution Layer (AEL). AEL builds transformation definitions for Spark, which moves execution directly to your Hadoop cluster, leveraging Spark’s ability to coordinate large amounts of data over multiple nodes. See Adaptive Execution Layer for details on how AEL works.
Pentaho Engine 

PDI Client Pentaho Engine Run Configuration Option

The Settings section of the Run configuration dialog box contains the following options when Pentaho is selected as the Engine for running a transformation:

  • Local – Select this option to use the Pentaho engine to run a transformation on your local machine.
  • Pentaho server – Select this option to run your transformation on the Pentaho Server. This option only appears if you are connected to a Pentaho Repository.
  • Slave server – Select this option to send your transformation to a slave (remote) server or Carte cluster.
  • Location – If you select Slave server, specify its location. If you have set up a Carte cluster, you can specify Clustered. See Using Carte Clusters for more details.
  • Send resources to the server – If you specified a slave server for your remote Location, select this option to send your transformation, along with any related resources such as other referenced files, to that server before running it, so the transformation runs locally on the server.
  • Log remote execution locally – If you specified Clustered for your remote Location, select this option to show the logs from the cluster nodes.
  • Show transformations – If you specified Clustered for your remote Location, select this option to show the other transformations that are generated when you run on a cluster.
Spark Engine 

PDI Client Spark Engine Run Configuration Option

The Settings section of the Run configuration dialog box contains the following options when Spark is selected as the Engine for running a transformation:

  • Protocol – Select whether the Spark host URL uses standard HTTP or HTTPS (SSL encryption).
  • Spark host URL – Specify the address and port of the AEL daemon.

Ask your Pentaho or IT administrator which Protocol to use and what value to specify for Spark host URL. Your administrator must set up the Adaptive Execution Layer (AEL) before you can use the Spark engine.

See Troubleshooting if issues occur while trying to use the Spark engine.

Options 

Errors, warnings, and other information generated as the transformation runs are stored in logs. You can specify how much information is in a log and whether the log is cleared each time through the Options section of this window. You can also enable safe mode and specify whether PDI should gather performance metrics. Logging and Monitoring Operations describes the logging methods available in PDI.

  • Clear log before running – Indicates whether to clear all your logs before you run your transformation. If your log is large, you might need to clear it before the next execution to conserve space.
  • Log level – Specifies how much logging is performed and the amount of information captured:
      • Nothing – No logging occurs.
      • Error – Only errors are logged.
      • Minimal – Only minimal logging is performed.
      • Basic – The default level.
      • Detailed – Detailed logging output.
      • Debug – Very detailed output for debugging purposes.
      • Row Level – Logging at a row level, which generates a large amount of log data.

    The Debug and Row Level logging levels may expose information you consider too sensitive to be shown. Consider the sensitivity of your data when selecting these levels. Performance Monitoring and Logging describes how best to use these logging methods.

  • Enable safe mode – Checks every row passed through your transformation to ensure all layouts are identical. If a row does not have the same layout as the first row, an error is generated and reported.
  • Gather performance metrics – Monitors the performance of your transformation execution through these metrics. Using Performance Graphs shows how to visually analyze these metrics.
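The safe mode check can be sketched as a simple row-layout comparison: every row must share the first row's field names and order. This is only an illustration of the idea, not PDI's actual implementation:

```python
# Illustrative sketch of the safe-mode idea: rows are modeled as dicts,
# and each row's field layout must match the first row's exactly.
def check_layouts(rows):
    """Raise ValueError if any row's field layout differs from the first row's."""
    if not rows:
        return
    expected = list(rows[0].keys())
    for i, row in enumerate(rows[1:], start=2):
        if list(row.keys()) != expected:
            raise ValueError(f"row {i} layout {list(row.keys())} != {expected}")

check_layouts([{"id": 1, "name": "a"}, {"id": 2, "name": "b"}])  # passes silently
```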

Parameters and Variables 

You can temporarily modify parameters and variables for each execution of your transformation to experimentally determine their best values. The values you enter into these tables are only used when you run the transformation from the Run Options window. The values you originally defined for these parameters and variables are not permanently changed by the values you specify in these tables.

  • Parameters – Set parameter values pertaining to your transformation during runtime. A parameter is a local variable. The parameters you define while creating your transformation are shown in the table under the Parameters tab.
  • Arguments – Set argument values passed to your transformation through the Arguments dialog.
  • Variables – Set values for user-defined and environment variables pertaining to your transformation during runtime.
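PDI references variables in step fields using `${NAME}` tokens that are resolved at runtime. As a rough sketch of that substitution idea (not PDI's implementation), with hypothetical variable names:

```python
# Sketch: resolve PDI-style ${NAME} tokens against a dict of variable
# values, leaving unknown tokens untouched.
import re

def substitute(text, variables):
    """Replace ${NAME} tokens with values from the variables mapping."""
    return re.sub(r"\$\{(\w+)\}",
                  lambda m: variables.get(m.group(1), m.group(0)),
                  text)

print(substitute("sales_${YEAR}.csv", {"YEAR": "2024"}))  # -> sales_2024.csv
```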

Adjust Transformation Properties

You can adjust the parameters, logging options, dates, dependencies, monitoring, settings, and data services for transformations. To view the transformation properties, press CTRL+T or right-click the canvas and select Properties from the menu that appears.

Use the Transformation Menu

Right-click any step in the transformation canvas to view the Transformation menu.


Stop Your Transformation

There are two different methods you can use to stop transformations running in the PDI client. The method you use depends on the processing requirements of your ETL task. Most transformations can be stopped immediately without concern. However, some transformations ingest records from messaging or streaming sources; such incoming data may need to be stopped safely to avoid potential data loss.

To stop a transformation running in the PDI client:

  • Use Stop if your ETL task should stop processing all data immediately.
  • Use Stop input processing if your ETL task needs to finish any records already initiated or retrieved before stopping.
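The difference between the two methods can be sketched as a drain pattern: stop accepting new records, but finish what is already in flight before shutting down. The code below only illustrates that idea and is not PDI's implementation:

```python
# Sketch of "stop input processing": no new records are accepted, but
# records already retrieved (queued) are processed before stopping.
from queue import Queue

def drain(in_flight: Queue, process):
    """Finish already-retrieved records, then stop."""
    processed = []
    while not in_flight.empty():
        processed.append(process(in_flight.get()))
    return processed

q = Queue()
for record in ("rec1", "rec2"):   # records retrieved before the stop request
    q.put(record)
print(drain(q, str.upper))  # -> ['REC1', 'REC2']
```

A plain Stop, by contrast, would abandon the queue immediately, which is why streaming transformations favor the safe variant.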