Pentaho Documentation

Working with Transformations

Overview

Explains how to create and save a transformation. Also explains how to build a job.

Create a Transformation

Follow these instructions to create your transformation.

  1. Click File > New > Transformation or press CTRL+N.
  2. Go to the Design tab. Expand the folders or use the Steps field to search for a specific step.
  3. Either drag a step to the Spoon canvas or double-click it.  
  4. Double-click the step to open its properties window. For help filling out the window, click the Help button in the step's properties window.
  5. To add another step, either drag the step to the Spoon canvas or double-click it.  
  • If you drag the step to the canvas, add a hop by holding down the SHIFT key and dragging from one step to the other.
  • If you double-click it, the step appears on the canvas with a hop already connected to your previous step.
  6. When finished, save the transformation. See Save a Transformation Locally or Save a Transformation Remotely for more details.

Adjust Transformation Properties

You can adjust the parameters, logging options, dates, dependencies, monitoring, settings, and data services for transformations. To view the transformation properties, press CTRL+T or right-click the canvas and select Transformation settings from the menu that appears.

Save a Transformation Locally

Follow these instructions to save a transformation locally on your file system.

  1. In Spoon, select File > Save.
  2. Enter the transformation name in the Save As window and select the location.
  3. Click OK.  The transformation is saved.

Save a Transformation Remotely

Follow these instructions to save a transformation remotely on the Pentaho Server.

  1. Connect to a repository.
  2. In Spoon, click File > Save As. The Transformation Properties window appears.
  3. In the Transformation Name field, enter the transformation name.
  4. In the Directory field, click the Folder Icon to select a repository folder where you will save your transformation.
  5. Click OK to exit the Transformation Properties dialog box. The Enter Comment dialog box appears.
  6. Enter a comment, then click OK.  The transformation is saved.

Open a Transformation

Follow these instructions to open a transformation.

  1. In Spoon, select File > Open.
  2. If connected to a repository, select the file from the Select Repository Object window, then click OK. Otherwise, select the file from the Open window, then click OK.
  3. The transformation appears on the canvas.

If you recently had a file open, you can also use File > Open Recent.

If you get a message indicating that a plugin is missing, see the Troubleshooting Transformation Steps and Job Entries section for more details.

Using the Transformation Menu

Right-click any step on the transformation canvas to view the transformation menu.

Menu Item - Description
New Hop - Creates a new hop.
Edit... - Opens the configuration window for the step or transformation.
Description... - Lets you add a description to the step.
Open Referenced Object - Lets you map a sub-transformation. Mapping a sub-transformation is covered in detail in Reusing Transformation Mapping Flows Between Steps.
Inspect Data - Lets you inspect the data stream of a step after the transformation has run. This option is not available until you run your transformation.
Run and Inspect Data - Runs your transformation, then lets you inspect the data of a step.
Data Movement - Specifies how a step sends rows to the next steps when it has more than one outgoing hop. There are three options:
  • Round Robin - Distributes the output rows across the outgoing hops in turn, so that each hop receives a share of the rows.
  • Load Balance - Checks the output row sets to see how much room is left in each buffer, and sends the row to the emptiest one. If the next steps take very little processing time per row, or the same amount of time per row, Load Balance behaves the same as Round Robin. But if one path takes a long time to process rows, such as a path through a Sort or Group By step, and another path processes rows more quickly, the quick path will likely receive more rows, because it empties its buffer before the slow path does. This option is typically used for clustered transformations, where the same processing occurs on different nodes. By default, the row buffer is set to 100000. To change the row buffer size, open the Transformation Settings window, then select Miscellaneous > Number of Rows.
  • Copy Data to Next Steps - Sends a copy of every row to each subsequent step.
Change Number of Copies to Start - Starts several copies of the step that run in parallel.
Copy - Copies the selected items to the clipboard.
Duplicate - Makes a copy of the selected items, then pastes it onto the canvas.
Delete - Deletes the selected items from the canvas.
Hide - Hides the step or entry on the Spoon canvas. Caution: if you hide a step or entry, you must open the transformation or job XML file and edit it by hand to make the step or entry visible again. For more details, see the troubleshooting section.
Detach - Detaches the step or entry from the transformation or job.
Input Fields - Shows metadata, such as the field name and type, for fields that come into the step.
Output Fields - Shows metadata, such as the field name and type, for fields that go out of the step.
Sniff Test During Execution - The sniff test displays data as it travels from one step to another in the stream. To use it, right-click a step while the transformation is running and select Sniff Test During Execution. This menu has three options:
  • Sniff test input rows - Shows the rows coming into the step.
  • Sniff test output rows - Shows the rows going out of the step.
  • Sniff test error handling - Shows error handling data.

For more information on how to use this tool, see the Sniff Test Tool article.

Check Selected Step(s) - Checks transformation steps for problems that could prevent the transformation from running successfully. Right-click the transformation step that you want to check and click Check Selected Step(s). Warnings and errors appear in the Results of transformation checks window.
Error Handling - Specifies how to apply error handling for a step. When you select this option, the Step error handling settings window appears.
Preview - Lets you preview the results of the transformation by launching the Transformation Debug Dialog.
Align/Distribute - Arranges steps or entries on the canvas so that they are aligned or evenly distributed, which makes the transformation or job easier to read.

Align places steps or entries along a common x (horizontal) or y (vertical) coordinate. Distribute makes the horizontal or vertical spacing between steps or entries consistent. Typically, you turn on the grid, move the steps or entries on the canvas so that they roughly form a pattern, such as a straight or branching line, then select the steps or entries and apply the following options as needed.

  • Align Left - Positions all selected steps or entries so their left sides are on the same "x" (horizontal) coordinate as the left-most step or entry. Afterward, the steps or entries form a straight vertical line. The spacing between them does not change.
  • Align Right - Positions all selected steps or entries so their right sides are on the same "x" (horizontal) coordinate as the right-most step or entry. Afterward, the steps or entries form a straight vertical line. The spacing between them does not change.
  • Align Top - Positions all selected steps or entries so their top sides are on the same "y" (vertical) coordinate as the step or entry closest to the top of the canvas. Afterward, the steps or entries form a straight horizontal line. The spacing between them does not change.
  • Align Bottom - Positions all selected steps or entries so their bottom sides are on the same "y" (vertical) coordinate as the step or entry closest to the bottom of the canvas. Afterward, the steps or entries form a straight horizontal line. The spacing between them does not change.
  • Distribute Horizontally - Positions all selected steps or entries so that they are evenly spaced horizontally. Their alignment does not change.
  • Distribute Vertically - Positions all selected steps or entries so that they are evenly spaced vertically. Their alignment does not change.
  • Snap to Grid - Aligns steps or entries to the canvas grid. If grid markers do not appear on the canvas, select Tools > Options > Look & Feel > Show Canvas Grid. See Customize Spoon Options for more information on how to customize Spoon.
Data Services - Lets you create, edit, delete, or test a Pentaho Data Service. A Pentaho Data Service lets others obtain the results of a transformation, even if they do not have Spoon or the Pentaho Server installed. Pentaho Data Services are discussed in detail in Use Pentaho Data Services.
Mapping... - Lets you map source fields from the step to target columns in a database table. When you click this option, the Mapping window appears. It contains these fields:
  • Source Fields lists the field names from the incoming stream.
  • Target Fields lists the column names in a target table.
  • Auto Target Selection automatically selects the matching target column when you select a source field.
  • Auto Source Selection automatically selects the matching source field when you select a target column.
  • Add button allows you to move the mapped target and source information to the mappings grid.
  • Guess button creates mappings automatically by matching source field names to target column names.
  • Hide assigned source fields and Hide assigned target fields remove fields from the Source Fields and Target Fields lists once those fields are added to the mapping grid.
  • Delete button removes mappings from the mapping grid so that they reappear in the Target Fields and Source Fields lists again.

When you click OK, the Mapping window closes and a Select/Rename Values step appears on the canvas. (It is usually named after the step you right-clicked.) The Select/Rename Values window contains the mappings. If you did not create any mappings, the step still appears, but its properties are blank.

Partitions... - Partitions split data into subsets according to a rule that is applied to each row of data. Partitions are discussed in detail in the Partitioning Data article.
Clusters... - Lets you create Carte clusters. For more information, see Using Carte Clusters.

Model - Generates a model of the data in your transformation. The data should contain a dimension or a measure. Right-click a step or entry that has data that can be modeled, such as the Table Output step. The data appears in Pentaho Metadata Editor.
Visualize - Generates a visualization of the data in your transformation. Right-click a step or entry that has data that can be visualized, such as the Table Output step. Two sub-options appear when you select this menu option:
  • Analyzer - Visualizes the data in Analyzer, which is an analytic visualization tool that allows you to filter and drill down into your data.  For more information on Analyzer, see Get Started with Analyzer Reports and Use Pentaho Analyzer.
  • Report Wizard - Starts the Report Design Wizard, which configures a visualization of the data in Pentaho Report Designer.  Report Designer is a report creation tool that allows you to create highly detailed, print-quality reports.  For more information on Pentaho Report Designer, see Getting Started with Report Designer and Design Print-Quality Reports with Report Designer.
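The three Data Movement options in the menu above can be illustrated with a small Java sketch (PDI itself is written in Java). This is not the Kettle implementation: RowDistribution and RowSet are hypothetical names, each outgoing hop is modeled as a bounded queue standing in for the row buffer, and, unlike a real engine, the sketch never waits when a buffer is full.

```java
import java.util.ArrayDeque;
import java.util.List;

// Hypothetical sketch of the three Data Movement options. RowDistribution and
// RowSet are illustrative names, not part of the Kettle/PDI API. A real engine
// would block when a buffer is full; this sketch does not.
class RowDistribution {

    // Stand-in for the bounded row buffer ("row set") on one outgoing hop.
    static final class RowSet {
        final ArrayDeque<Object> rows = new ArrayDeque<>();
        final int capacity;

        RowSet(int capacity) {
            this.capacity = capacity;
        }

        int freeSpace() {
            return capacity - rows.size();
        }
    }

    private int next = 0; // hop that receives the next row under Round Robin

    // Round Robin: each row goes to the next hop in turn.
    void roundRobin(Object row, List<RowSet> hops) {
        hops.get(next).rows.add(row);
        next = (next + 1) % hops.size();
    }

    // Load Balance: each row goes to the hop whose buffer has the most free
    // space; on a tie, the first such hop wins.
    static void loadBalance(Object row, List<RowSet> hops) {
        RowSet emptiest = hops.get(0);
        for (RowSet hop : hops) {
            if (hop.freeSpace() > emptiest.freeSpace()) {
                emptiest = hop;
            }
        }
        emptiest.rows.add(row);
    }

    // Copy Data to Next Steps: every hop receives the same row.
    static void copyToAll(Object row, List<RowSet> hops) {
        for (RowSet hop : hops) {
            hop.rows.add(row);
        }
    }
}
```

For example, if one hop's buffer already holds 5 of 10 rows because the step behind it is slow, and a second hop's buffer is empty, loadBalance keeps sending rows to the second hop until both buffers have the same free space, which matches the "quick path receives more rows" behavior described for Load Balance.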
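The Align/Distribute options in the menu above come down to simple coordinate arithmetic. The following Java sketch shows that arithmetic under stated assumptions: CanvasLayout and Icon are hypothetical names rather than Spoon classes, positions are top-left corners in pixels with y growing downward, and Distribute Horizontally is assumed to space left edges evenly between the outermost icons. Spoon's own implementation may measure spacing differently.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of some Align/Distribute options. CanvasLayout and Icon
// are illustrative names, not Spoon classes. x grows to the right, y downward.
class CanvasLayout {

    // A step or entry icon: top-left corner plus size, in pixels.
    static final class Icon {
        int x;
        int y;
        final int width;
        final int height;

        Icon(int x, int y, int width, int height) {
            this.x = x;
            this.y = y;
            this.width = width;
            this.height = height;
        }
    }

    // Align Left: every left edge moves to the left-most x; y is unchanged.
    static void alignLeft(List<Icon> icons) {
        int minX = icons.stream().mapToInt(ic -> ic.x).min().orElse(0);
        for (Icon icon : icons) {
            icon.x = minX;
        }
    }

    // Align Right: every right edge (x + width) moves to the right-most edge.
    static void alignRight(List<Icon> icons) {
        int maxRight = icons.stream().mapToInt(ic -> ic.x + ic.width).max().orElse(0);
        for (Icon icon : icons) {
            icon.x = maxRight - icon.width;
        }
    }

    // Align Top: every top edge moves to the smallest y (closest to the top).
    static void alignTop(List<Icon> icons) {
        int minY = icons.stream().mapToInt(ic -> ic.y).min().orElse(0);
        for (Icon icon : icons) {
            icon.y = minY;
        }
    }

    // Distribute Horizontally (assumed rule): the left-most and right-most
    // icons stay put; the others get evenly spaced left edges between them.
    static void distributeHorizontally(List<Icon> icons) {
        if (icons.size() < 3) {
            return; // nothing to redistribute
        }
        List<Icon> sorted = new ArrayList<>(icons);
        sorted.sort(Comparator.comparingInt((Icon ic) -> ic.x));
        int first = sorted.get(0).x;
        int last = sorted.get(sorted.size() - 1).x;
        double gap = (double) (last - first) / (sorted.size() - 1);
        for (int k = 1; k < sorted.size() - 1; k++) {
            sorted.get(k).x = first + (int) Math.round(k * gap);
        }
    }

    // Snap to Grid: round each corner to the nearest multiple of gridSize.
    static void snapToGrid(List<Icon> icons, int gridSize) {
        for (Icon icon : icons) {
            icon.x = Math.round((float) icon.x / gridSize) * gridSize;
            icon.y = Math.round((float) icon.y / gridSize) * gridSize;
        }
    }
}
```

Note that the align methods change only one coordinate, which is why the spacing along the other axis is preserved, as the option descriptions state.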

Run Your Transformation

See Running a Transformation for instructions on how to execute a transformation. If you are using a Hadoop cluster for big data transformations, see Adaptive Execution Layer (AEL) to learn how to use AEL to run your transformations with the Spark engine.