Skip to main content
Pentaho Documentation

Create a Pentaho Data Service

Overview

Explains how to create, test, and optimize a Pentaho Data Service.

The Pentaho Data Service allows you to retrieve the query results of a transformation through a JDBC service.   These instructions explain how you can create and test a data service.  It also describes techniques for optimizing the performance of your Data Service.  Applying optimizations to a Data Service can significantly improve performance by leveraging caching, as well as adding filters to the input data source.

As you work through these instructions to create a Pentaho Data Service, we recommend that you use the Verify feature (Actions > Verify). The verify feature helps you validate your transformation, including the Pentaho Data Service, as you build it.

Create a Pentaho Data Service

To create the data service, complete the following steps.

  1. In Spoon, create or open a transformation that you’d like to turn into a data service.
  2. Right-click a step at the end of the transformation which outputs the desired data, and select Data Service > New....

DataService.png

  1. In the Data Service window, enter a name for the data service in the Service Name (Virtual Table Name) field.  When you run the data service, the results will be stored in a virtual table with the name you enter in this field.  Your data service name should be unique within your transformation. The data service should also not have the same name as another data service in another transformation saved locally or in the repository.
  2. By default, data service results are cached in memory on the DI Server for an hour. If you want to change the duration, click the Service Cache tab, and modify the Cache Duration (seconds) field.  (If you want to disable the cache, deselect the Enable Caching checkbox.)
  3.  Test the data service.
  4. Save and close the data service.

Test the Pentaho Data Service

 To test the Pentaho Data Service, complete these steps.

  1. Select the data service that you want to test from the View panel.  The Data Service window appears.
  2. Click the Test Data Service button.  The Data Service Test window appears.

dataservicetest.png

  1. Adjust the logging level, if needed.
  2. Indicate the maximum number of rows that you’d like to preview in the Max Rows field.
  3. Click the Execute SQL button.  (You can also press [CTRL]+[ENTER] to query.)  Test results span several tabs.
Tab Name Description
Query Results 
 
Shows processing information and the result of the query.
Optimized Queries 
 
Displays processing information and the result of optimizations.
Service Transformation Logging 
 
Shows the logs from the transformation as it runs.  Note that there are two transformations that run: one that you created  (Service Transformation) and the one that PDI creates (Generated Transformation) when you run the data service.  This tab shows service transformation logs.
Generated Transformation Logging 
 
Displays the logs from the transformation as it runs.  Note that there are two transformations that run: one that you created  (Service Transformation) and the one that PDI creates (Generated Transformation) when you run the data service.  This tab shows generated transformation logs.
Service Metrics 
 
Shows a GANTT-chart like representation of the amount of time, in milliseconds it took to run the different parts of the transformation.
SQL Trans Metrics Shows a GANTT-chart like representation of the amount of time, in milliseconds it took to run the SQLquery made to the Data Service.

Optimize a Data Service

Optimizations are optional, but can greatly improve the performance of your data service.  Optimizations can be applied to the Table Input and MongoDB Input steps.

 If caching is enabled for the data service, the cache will be checked before applying optimizations. If query results cannot be derived from cache, the service transformation will be run, applying optimizations where possible. The results of that execution will then be cached.

Set Input Step Optimization Values

To set up an input step optimization value, complete these steps.

  1. Create and test a Pentaho Data Service for a transformation that has a Table Input or MongoDB Input step.
  2. In the Pentaho Data Service window, click the Input Step Optimizations tab.

DataService_InputStepOptimization.png

  1. Click the + button.
  2. In the Create Parameter window type the name of an optimization parameter.  You will be using this parameter name later when you add it to your input step query.  Click OK.
  3. Select the step that contains the parameter in the Step Name field.
  4. In the Definitions area of the window, enter the Data Service Field and the Step Field that you want to parameterize.
  • The Data Service Field contains the name of the transformation's name of the field you want to parameterize.  For example, you might have renamed the cty field from your data source as country.  You would enter country in the Data Service Field.
  • The Step Field should contain the data source's name of the field you want to parameterize.  For example, if you wanted to  parameterize the cty field in a MySQL database, enter cty in the Step Field.

Optionally, you can click the Get Optimizations button to automatically generate input step optimizations based on the output fields for the step on which you've created the data service and input fields.

  1. Click the OK button to save and close the window.
  2. Add the optimization parameter you created to the input step query.

Add the Optimization Parameter to the Input Step Query

The input step allows you to query the data source.  These instructions explain how to add a parameter that will act as a WHERE clause in the SQL query or the equivalent type of clause in the MongoDB query.

  1. In the transformation that contains the data service, double-click the input step (Table Input or MongoDB Input) that contains the query you want to optimize.
  2. Add the parameter you created in the Set Input Step Optimization Values instructions to the place where the WHERE clause value would be, like this:
  • SQL Query Example: SELECT * FROM media WHERE  ${countryParam} 
  • MongoDB Query Example: {$match : ${mongoDbParam}}
  1. Press [CTRL]+[SPACE] to view the list of parameters. Click on a parameter from the list to add it to the query.
  2. Click the Replace Variables in Script? checkbox.
  3. Click the OK button.
  4. Test your optimization.

Test the Optimization

To test the Pentaho Data Service optimization, complete these steps.

  1. Select the data service that you want to test from the View panel.  The Data Service window appears.
  2. Click the Test Data Service button.  The Data Service Test window appears.  The SQL section shows a query that has the optimization variable that you added to it, like this:

SELECT * FROM myNewParam 

  1. Adjust the logging level, if desired.
  2. Indicate the maximum number of rows that you’d like to return in the Max Rows field.
  3. In the SQL section of the window set the WHERE clause value using SQL, like this:

SELECT * FROM myNewParam WHERE cty='Japan' 

  1. Click the Preview Optimization button.
  2.  The query results appear in the Optimized Query tab.  This tab shows the starting query as well as the input step queries after push down optimizations have been applied.  There could be several input queries displayed in the tab: one for each input step that supports push down.

Edit or Delete a Data Service

To edit or delete a data service, do these things.

  • Edit a Data Service: In the View tab, right-click the data service and select Edit.
  • Delete a Data Service: In the View tab, right-click the data service and select Delete.
  • Disable an Optimization: Click the Disable an Optimization checkbox in the Data Services window.

Monitor the Data Service

To monitor the data service from a browser, use one of the following commands.  Replace localhost:9080 with the hostname and port number of the DI Server that is running the data service.

To list the services on carte, use http://localhost:9080/kettle/listServices.  To list the status of the service on carte, use http://localhost:9080/kettle/Status.

Share the Data Service with Others

To share the data service with others, do two things.

  1. Grant the person who will use the data service access to the data service transformation and the DI Server.  For more information, read Assign Permissions to Use or Manage Database Connections.
  2. Follow the instructions in the Connect to PDI Data Services article.

Troubleshooting

Having problems with data services?  Check out our Resolve Pentaho Data Service Issues article to get help.