Pentaho Data Integration (PDI) provides several ways to monitor the performance of jobs and transformations. Logging gives you summarized information about a job or transformation, such as the number of records inserted and the total elapsed time spent in a transformation. Logging also provides detailed information about exceptions, errors, and debugging.
You may want to enable logging and step performance monitoring for several reasons. Logging lets you determine whether a job completed with errors and review the errors that were encountered during processing. In headless environments, most production ETL is not run from the graphical user interface, so you need a place to watch the results of initiated jobs. Finally, performance monitoring provides useful information for both diagnosing current performance problems and capacity planning.
To see what effect your transformation will have on the data sources it includes, go to the Action menu and click Impact. PDI performs an impact analysis to determine how your data sources will be affected by the transformation if it completes successfully.
If you are an administrative user and want to monitor jobs and transformations, you must first set up logging and performance monitoring in Spoon. For more information about monitoring jobs and transformations, see the Monitoring System Performance section.
Set Up Logging
Follow the instructions below to create a log table that keeps a history of the information associated with the fields you choose to log.
- Have your system administrator create a database or table space called pdi_logging.
- Right-click in the workspace (canvas) where you have an open transformation and select Properties, or press <CTRL + T>. The Transformation Properties dialog box appears.
- In the Transformation Properties dialog box, click the Logging tab. Select which type of logging you want to use in the navigation pane on the left.
- Under Logging enter the following information:
|Option||Description|
|Log Connection||Specifies the database connection you are using for logging. You can configure a new connection by clicking New.|
|Log table schema||Specifies the schema name, if supported by your database.|
|Log table name||Specifies the name of the log table.|
|Logging interval (seconds)||Specifies the interval in which logs are written to the table. This property only applies to Transformation and Performance logging types.|
|Log record timeout (in days)||Specifies the number of days to keep log entries in the table before they are deleted.|
|Log size limit in lines||Limits the number of lines that are stored in the LOG_FIELD. PDI stores logging for the transformation in a long text field (CLOB). This property only applies to the Transformation logging type.|
- Select the fields you want to log in the Fields to log pane, or keep the default selections.
- Click SQL to open the Simple SQL Editor.
- Enter your SQL statements in the Simple SQL Editor.
- Click Execute to execute the SQL code to create your log table, then click OK to exit the Results dialog box.
- Click Close to exit the Simple SQL Editor.
- Click OK to exit the Transformation Properties dialog box.
The next time you run your transformation, logging information will be displayed under the Execution History tab.
Monitoring the LOG_FIELD field can negatively impact Pentaho Server performance. However, if you don't select all fields, including LOG_FIELD, when configuring transformation logging, you will not see information about this transformation in the Operations Mart logging.
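Besides the Execution History tab, the log table can be queried directly, which is useful in headless environments. The following is a minimal sketch only: it uses an in-memory SQLite database as a stand-in for the logging database, and the table name `trans_log`, the column subset, and the sample rows are all assumptions made for illustration. The real columns depend on what you selected in the Fields to log pane and on the SQL that was executed to create the table.

```python
import sqlite3

# Stand-in for the logging database. The table name (trans_log), the column
# subset, and the rows below are assumptions for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE trans_log (
        ID_BATCH INTEGER,
        TRANSNAME TEXT,
        LINES_OUTPUT INTEGER,
        ERRORS INTEGER,
        STARTDATE TEXT,
        ENDDATE TEXT
    )
    """
)
conn.executemany(
    "INSERT INTO trans_log VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "load_customers", 1200, 0, "2024-01-01 02:00:00", "2024-01-01 02:00:45"),
        (2, "load_orders", 0, 3, "2024-01-01 02:01:00", "2024-01-01 02:01:10"),
        (3, "load_customers", 1250, 0, "2024-01-02 02:00:00", "2024-01-02 02:01:30"),
    ],
)


def runs_with_errors(connection):
    """Return (ID_BATCH, TRANSNAME, ERRORS) for every logged run with errors."""
    cur = connection.execute(
        "SELECT ID_BATCH, TRANSNAME, ERRORS FROM trans_log "
        "WHERE ERRORS > 0 ORDER BY ID_BATCH"
    )
    return cur.fetchall()


def elapsed_seconds(connection):
    """Return {ID_BATCH: elapsed seconds} computed from STARTDATE and ENDDATE."""
    cur = connection.execute(
        "SELECT ID_BATCH, "
        "CAST(strftime('%s', ENDDATE) AS INTEGER) "
        "- CAST(strftime('%s', STARTDATE) AS INTEGER) "
        "FROM trans_log ORDER BY ID_BATCH"
    )
    return dict(cur.fetchall())


failed_runs = runs_with_errors(conn)
durations = elapsed_seconds(conn)
print(failed_runs)
print(durations)
```

Against a real logging database you would replace the SQLite connection with a driver for that database and adjust the date arithmetic, which is SQLite-specific here.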
When you run a job or transformation that has logging enabled, you can choose one of the following log verbosity levels in the Run Options window:
|Nothing||Do not record any logging output.|
|Error||Only show errors.|
|Minimal||Only use minimal logging.|
|Basic||This is the default level.|
|Detailed||Give detailed logging output.|
|Debug||For debugging purposes, very detailed output.|
|Row Level||Logging at a row level. This will generate a lot of log data.|
If the Enable time option is selected, all lines in the logging will be preceded by the time of day.
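When a transformation runs outside Spoon, the verbosity level is typically passed on the command line instead of being chosen in the Run Options window. The sketch below only builds the argument list for Pan and does not execute anything. It assumes Pan's `-file` and `-level` options and the spelling `Rowlevel` for the Row Level setting; the script location `./pan.sh` and the `.ktr` path are hypothetical. Check the option names against the command-line reference for your PDI version before relying on them.

```python
# Map the level names shown in the Run Options window to the value passed on
# the command line. The spelling "Rowlevel" is an assumption to verify against
# your PDI version; the other names are passed through unchanged.
CLI_LEVELS = {
    "Nothing": "Nothing",
    "Error": "Error",
    "Minimal": "Minimal",
    "Basic": "Basic",
    "Detailed": "Detailed",
    "Debug": "Debug",
    "Row Level": "Rowlevel",
}


def pan_command(ktr_path, level="Basic", pan="./pan.sh"):
    """Build (but do not run) an argument list for a Pan invocation.

    ktr_path and pan are hypothetical locations supplied by the caller.
    """
    if level not in CLI_LEVELS:
        raise ValueError(f"unknown log level: {level!r}")
    return [pan, f"-file={ktr_path}", f"-level={CLI_LEVELS[level]}"]


command = pan_command("/opt/etl/load_orders.ktr", level="Row Level")
print(" ".join(command))
```

The resulting list can be handed to `subprocess.run` once the paths point at a real installation. Building the list separately keeps the level validation easy to test.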
There are a few ways that you can monitor step performance in PDI. Two tools are particularly helpful: the Sniff Test tool and the Monitoring tab. You can also use graphs to view performance.
Sniff Test Tool
The sniff test displays data as it travels from one step to another in the stream.
To use the sniff test, complete these steps.
- Right-click a step in the transformation as it runs and select Sniff Test During Execution. There are three options in this menu:
  - Sniff test input rows - Shows the data that enters the step.
  - Sniff test output rows - Shows the data that leaves the step.
  - Sniff test error handling - Shows error handling data.
- After you've selected an option, values in the data stream appear. You are also able to observe throughput.
The sniff test is designed to be used as a supplement to logs so that you can debug complex situations. Applying a sniff test slows transformation run speed, so use it with care.
Pentaho Data Integration provides a tool for tracking the performance of individual steps in a transformation. By identifying the slowest step in the transformation, you can fine-tune and enhance the performance of your transformations.
You enable step performance monitoring in the Transformation Properties dialog box. To access the dialog box, right-click in the workspace that is displaying your transformation and choose Transformation Settings. You can also access this dialog box by pressing <CTRL + T>.
As shown in the sample screen capture above, the option to track performance (Enable step performance monitoring?) is not selected by default. Step performance monitoring may cause memory consumption problems in long-running transformations. By default, a performance snapshot is taken for all the running steps every second. This is not a CPU-intensive operation and, in most instances, does not negatively impact performance unless you have many steps in a transformation or you take a lot of snapshots (several per second, for example). You can control the number of snapshots in memory by changing the default value next to Maximum number of snapshots in memory.
In addition, if you run in Spoon locally you may consume a fair amount of CPU power when you update the JFreeChart graphics under the Performance tab. Running in "headless" mode (Kitchen, Pan, Pentaho Server [slave server], Carte, Pentaho BI platform, and so on) does not have this drawback and should provide you with accurate performance statistics.
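To illustrate why capping the number of snapshots bounds memory use, here is a small conceptual model. It is not PDI's implementation: the `SnapshotBuffer` class, its methods, and the sample numbers are hypothetical, and the sketch assumes that the oldest snapshot is discarded once the maximum is reached.

```python
from collections import deque


class SnapshotBuffer:
    """Hypothetical model of a capped, in-memory list of performance snapshots."""

    def __init__(self, max_snapshots):
        # A deque with maxlen silently drops the oldest entry when full
        # (assumed eviction policy, chosen for illustration).
        self._snapshots = deque(maxlen=max_snapshots)

    def record(self, timestamp_s, lines_written_by_step):
        """Store one snapshot: a timestamp plus cumulative rows written per step."""
        self._snapshots.append((timestamp_s, dict(lines_written_by_step)))

    def __len__(self):
        return len(self._snapshots)

    def rows_per_second(self, step):
        """Average rows per second for a step over the retained snapshots."""
        if len(self._snapshots) < 2:
            return None
        (t0, first), (t1, last) = self._snapshots[0], self._snapshots[-1]
        return (last[step] - first[step]) / (t1 - t0)


buffer = SnapshotBuffer(max_snapshots=3)
for second in range(5):
    # One snapshot per second, mirroring the default interval described above.
    buffer.record(second, {"Table input": 1000 * second, "Sort rows": 400 * second})

print(len(buffer))
print(buffer.rows_per_second("Table input"))
print(buffer.rows_per_second("Sort rows"))
```

Only the three most recent snapshots are retained after five have been recorded, so memory stays constant however long the run lasts. The trade-off is that rates can only be computed over the retained window.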