Pentaho Documentation

Run Your Transformation

Overview

Explains how to run your transformation.

Pentaho Data Integration provides several deployment options to suit the requirements of your ETL project, such as performance and batch load window. The three most common approaches are:

Approach | Description
Local execution | Allows you to execute a transformation or job from within the Spoon design environment on your local machine. This is ideal for designing and testing transformations or for lightweight ETL activities.
Execute remotely | For more demanding ETL activities, consider setting up a dedicated Enterprise Edition Data Integration Server and using the Execute remotely option in the run dialog. The Enterprise Edition Data Integration Server also enables you to schedule execution in the future or on a recurring basis.
Execute clustered | For even greater scalability, or to reduce your execution times, Pentaho Data Integration also supports clustered execution, which lets you distribute the load across a number of Data Integration servers.

This final part of the transformation creation exercise focuses exclusively on the local execution option.

  1. In the Spoon window, select Action > Run This Transformation.
  2. The Execute a transformation window appears. You can run a transformation locally, remotely, or in a clustered environment. For the purposes of this exercise, keep the default as Local Execution.
  3. Click Launch. The transformation runs, and the Execution Results panel opens below the canvas.
  4. The Execution Results section of the window contains several different tabs that help you to see how the transformation executed, pinpoint errors, and monitor performance.

  • The Step Metrics tab provides statistics for each step in your transformation, including how many records were read, written, or caused an error, the processing speed (rows per second), and more. This tab also indicates whether an error occurred in a transformation step. We did not intentionally put any errors in this tutorial, so it should run correctly. If a mistake had occurred, however, the steps that caused the transformation to fail would be highlighted in red. In the example below, the Lookup Missing Zips step caused an error.

Executing Transformation.png

 

  • The Logging tab displays the logging details for the most recent execution of the transformation. It also allows you to drill deeper to determine where errors occur. Error lines are highlighted in red. In the example below, the Lookup Missing Zips step caused an error because it attempted to look up values on a field called POSTALCODE2, which did not exist in the lookup stream.

ExecuteTransformationLogging.png

 

  • The Execution History tab provides access to the Step Metrics and log information from previous executions of the transformation. This feature works only if you have configured your transformation to log to a database through the Logging tab of the Transformation Settings dialog. For more information on configuring logging or viewing the execution history, see Create DI Solutions.
  • The Performance Graph allows you to analyze the performance of steps based on a variety of metrics, including how many records were read, written, or caused an error, the processing speed (rows per second), and more. Like the Execution History, this feature requires you to configure your transformation to log to a database through the Logging tab of the Transformation Settings dialog box.
  • The Metrics tab allows you to see a Gantt chart after the transformation or job has run. This shows you information such as how long it takes to connect to a database, how much time is spent executing a SQL query, or how long it takes to load a transformation.

metrics.png

  • The Preview Data tab displays a preview of the data.
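
The statistics reported on the Step Metrics tab can be pictured with a small standalone sketch. The Python below is not part of Pentaho and does not call any PDI API. The StepMetric record, its field names, the "Example Input" step, and all of the numbers are hypothetical, chosen only to mirror the columns described above (records read, records written, errors, and rows per second). It illustrates one way a per-step processing speed might be derived and how steps that reported errors could be singled out, much as Spoon highlights them in red.

```python
from dataclasses import dataclass


@dataclass
class StepMetric:
    """Hypothetical record mirroring one row of the Step Metrics tab."""
    name: str
    rows_read: int
    rows_written: int
    errors: int
    seconds: float  # assumed time the step spent running

    def rows_per_second(self) -> float:
        """Speed based on rows read; 0.0 when no time has elapsed."""
        if self.seconds <= 0:
            return 0.0
        return self.rows_read / self.seconds


def failed_steps(metrics):
    """Return the names of steps that reported at least one error."""
    return [m.name for m in metrics if m.errors > 0]


# Made-up sample values, for illustration only.
sample = [
    StepMetric("Example Input", 1000, 1000, 0, 2.0),
    StepMetric("Lookup Missing Zips", 1000, 0, 1, 0.5),
]

for m in sample:
    status = "ERROR" if m.errors else "ok"
    print(f"{m.name}: {m.rows_per_second():.0f} rows/s, {status}")

print("Failed steps:", failed_steps(sample))
```

In Spoon itself you do not need to compute any of this; the Step Metrics tab shows the values directly. The sketch is only meant to make the relationship between the columns concrete.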