Skip to main content
Pentaho Documentation

Connect to a Pentaho Data Service

Overview

Explains how to connect to and query the Pentaho Data Service from a Pentaho or non-Pentaho tool.

A Pentaho Data Service is a virtual table that contains the output of a PDI transformation step.  You can connect to and query the Pentaho Data Service from any Pentaho tool, such as Report Designer, the PDI client (Spoon), and Analyzer.  You can also connect to and query it from a non-Pentaho tool, like RStudio or SQuirreL.  To learn more about the Pentaho Data Service, refer to the Turn Transformation Step Results Into a Pentaho Data Service and Develop and Optimize a Pentaho Data Service articles.

To connect and query the Pentaho Data Service, you need to have permission to run the transformation and to access the Pentaho Server where it is published.

Connect to the Pentaho Data Service from a Pentaho Tool

Connecting to the data service from another Pentaho tool is similar to connecting to a database. For information on connecting to a database, refer to Specify Data Connections for the DI Server.  The following table provides values for the typical parameters that you'll need to connect.

Required Parameters Description
Connection Name Name that you specify.
Connection Type Pentaho Data Services
Access Native (JDBC)
Hostname Hostname of the DI Server or IP Address.  The default is localhost if running the Pentaho Server locally.
Port Number Port number of the Pentaho Server the data service will run on.  The default is 8080.
Username Name of a user who has permission to run the data service.
Password Password for a user who has permission to run the data service.
webappname

The name of the web application. For the DI Server, the webappname is typically pentaho-di.  It is a parameter in the Options sections of the Kettle database connection dialog.

You can also set the following optional parameters.

Optional Parameters Description
proxyhostname Proxy server for HTTP connection(s).
proxyport Proxy server port.
nonproxyhosts Hosts that do not use the proxy server.  If there is more than one host name, separate them with commas.
debugtrans Optional name of the file where the generated transformation is stored.  This transformation is generated to debug it.  Example: /tmp/debug.ktr.  Specify the name of the transformation or a path plus the name of the transformation.
PARAMETER_[optionname]=value Sets the value for a parameter in the transformation. [optionname] is the name of the parameter, and [value] is the value assigned to it.  PARAMETER_ is placed before the option name. For example, if the name of the parameter is “model”, set the parameter:   PARAMETER_model=E6530.
secure Set this parameter to TRUE to use the HTTPS secure protocol connect to the data service.  If you omit this parameter or set it to FALSE, the standard HTTP unsecure protocol is used.

 

Connect to the Pentaho Data Service from a Non-Pentaho Tool

To connect to and run a Pentaho Data Service from a non-Pentaho tool, like SQuirreL or Beaker, you need to install the service driver files, then create a connection to the data service from your tool.

Download and Install the Pentaho Data Service JDBC Driver

Before you can connect to and run a Pentaho Data Service from a non-Pentaho tool like SQuirreL or Beaker, you need to download the PDI Data Service driver and install it. The driver is bundled with PDI.  If you want someone who doesn’t have PDI to connect to your data service, you will need to download the driver and give it to them so that they can install it.

Use the PDI client to Download the Pentaho Data Service JDBC Driver

To download the Pentaho Data Service JDBC Driver using the PDI client, perform the following steps.  If you'd prefer to download the files manually instead, refer to Manually Download the Pentaho Data Service JDBC Driver

  1. Do one of the following:
  • Right-click the step, then select Data Service > Driver Details....
  • In the transformation's View tab, select the data service from the Data Services folder, then right-click any data service and select Driver Details....
  • In the Data Services window, click Driver Details....  
  1. In the Driver Details window click Get Driver....
  2. In the Save As window, indicate where you want to save the Pentaho-Data-Service-Driver.zip file, then click Save.
  3. Click Close to exit the window.
  4. Follow the instructions to Install the Pentaho Data Services JDBC Driver.

Manually Download the Pentaho Data Service JDBC Driver

To manually download the Pentaho Data Service JDBC Driver perform the following steps.  To use the PDI client to download the Pentaho Data Service JDBC Driver, refer to Use Spoon to Download the Pentaho Data Service JDBC Driver.

  1. Go to the <pentaho installation directory>/design-tools/data-integration/Data Service JDBC Driver directory and download the zipped file in it. 
  2. Follow the instructions to Install the Pentaho Data Service JDBC Driver.

Install the Pentaho Data Service JDBC Driver

To install the Pentaho Data Service JDBC Driver, complete these steps.

  1. Extract the driver files that you downloaded in the Download the Pentaho Data Service JDBC Driver instructions and paste them into the JDBC directory in your application.  For example, if you want to access the data service from RStudio, paste the driver files in the same directory as the other JDBC drivers.  If you aren’t sure where this is, check the documentation for that application.
  2. Start and stop the application.
  3. In the application, make a connection to the data service, using the instructions in Connect to the Pentaho Data Service from a Non-Pentaho Tool.

Create a Connection from a Non-Pentaho Tool

Once the driver is installed, you will need to create the connection to the Pentaho Data Service.  For many tools, you'll do this by specifying a connection object.  Review the connection details and optional options in Connect to the Pentaho Data Service from a Pentaho Tool.

You'll probably also need the JDBC Driver class from the following table.

Parameter Value
JDBC Driver Class org.pentaho.di.trans.dataservice.jdbc.ThinDriver
Example of JDBC Connection String

The JDBC connection string uses this format:

jdbc:pdi://<Pentaho Server Hostname:Port>/kettle?option=value&option=value

Here is an example of a connection string.  The webappname is required if the data service is running on the Pentaho Server.   

jdbc:pdi://localhost:8080/kettle?webappname=pentaho-di

Monitor a Pentaho Data Service

To monitor a data service, type one of the following commands into your browser.  Replace the <Pentaho Server Host:Port> of the Pentaho Server with the host name or IP address and the port of the Pentaho Server you are running the data service on.  You will need to have access to the Pentaho Server and be authenticated to run the following commands.

List Names of Data Services on the Pentaho Server List Status of Data Services on the Pentaho Server

http://<Pentaho Server Host:Port>/pentaho-di/kettle/listServices

http://<Pentaho Server Host:Port>/pentaho-di/kettle/status

Query a Pentaho Data Service

You can use SQL to query a data service. You can also add a conditional to the query if the data service transformation uses a parameter. This means that the ETL Developer who created or modified the data service transformation, must have already added the parameter. Refer to Develop and Optimize a Pentaho Data Service for more details. You can then assign a value to the parameter in your query. 

To find the name of the table to query, you can connect to the data service, then use explorer to find the name of the table.  The name of table is usually the same as the name of the data service.

Here is an example of a SQL query that include a parameter. The example queries the employeeList data service for records, and when the data service transformation is executed, any use of the parameter employeeRegion in it will be substituted with the value USA EAST. This parameter could be in any transformation step that allows variable or parameter substitution.

You can also assign a value to the parameter in the Connection Properties window.

Here is the syntax:

SELECT * FROM <data service name> WHERE PARAMETER('<parameter_name>') = '<parameter_value>'

Here is an example.

view sourceprint
   SELECT * FROM employeeList WHERE PARAMETER('employeeRegion')='USA EAST'