Explains how to connect to and query the Pentaho Data Service from a Pentaho or non-Pentaho tool.
A Pentaho Data Service is a virtual table that contains the output of a PDI transformation step. You can connect to and query the Pentaho Data Service from any Pentaho tool, such as Report Designer, the PDI client (Spoon), and Analyzer. You can also connect to and query it from a non-Pentaho tool, like RStudio or SQuirreL. To learn more about the Pentaho Data Service, refer to the Turn Transformation Step Results Into a Pentaho Data Service and Develop and Optimize a Pentaho Data Service articles.
To connect and query the Pentaho Data Service, you need to have permission to run the transformation and to access the Pentaho Server where it is published.
Connecting to the data service from another Pentaho tool is similar to connecting to a database. For information on connecting to a database, refer to Specify Data Connections for the DI Server. The following table provides values for the typical parameters that you'll need to connect.
|Connection Name||Name that you specify.|
|Connection Type||Pentaho Data Services|
|Hostname||Hostname of the DI Server or IP Address. The default is localhost if running the Pentaho Server locally.|
|Port Number||Port number of the Pentaho Server the data service will run on. The default is 8080.|
|Username||Name of a user who has permission to run the data service.|
|Password||Password for a user who has permission to run the data service.|
The name of the web application. For the DI Server, the webappname is typically pentaho-di. It is a parameter in the Options sections of the Kettle database connection dialog.
You can also set the following optional parameters.
|proxyhostname||Proxy server for HTTP connection(s).|
|proxyport||Proxy server port.|
|nonproxyhosts||Hosts that do not use the proxy server. If there is more than one host name, separate them with commas.|
|debugtrans||Optional name of the file where the generated transformation is stored. This transformation is generated to debug it. Example:
|PARAMETER_[optionname]=value||Sets the value for a parameter in the transformation.
|secure||Set this parameter to TRUE to use the HTTPS secure protocol connect to the data service. If you omit this parameter or set it to FALSE, the standard HTTP unsecure protocol is used.|
To connect to and run a Pentaho Data Service from a non-Pentaho tool, like SQuirreL or Beaker, you need to install the service driver files, then create a connection to the data service from your tool.
Before you can connect to and run a Pentaho Data Service from a non-Pentaho tool like SQuirreL or Beaker, you need to download the PDI Data Service driver and install it. The driver is bundled with PDI. If you want someone who doesn’t have PDI to connect to your data service, you will need to download the driver and give it to them so that they can install it.
To download the Pentaho Data Service JDBC Driver using the PDI client, perform the following steps. If you'd prefer to download the files manually instead, refer to Manually Download the Pentaho Data Service JDBC Driver.
- Do one of the following:
- Right-click the step, then select Data Service > Driver Details....
- In the transformation's View tab, select the data service from the Data Services folder, then right-click any data service and select Driver Details....
- In the Data Services window, click Driver Details....
- In the Driver Details window click Get Driver....
- In the Save As window, indicate where you want to save the Pentaho-Data-Service-Driver.zip file, then click Save.
- Click Close to exit the window.
- Follow the instructions to Install the Pentaho Data Services JDBC Driver.
To manually download the Pentaho Data Service JDBC Driver perform the following steps. To use the PDI client to download the Pentaho Data Service JDBC Driver, refer to Use Spoon to Download the Pentaho Data Service JDBC Driver.
- Go to the <pentaho installation directory>/design-tools/data-integration/Data Service JDBC Driver directory and download the zipped file in it.
- Follow the instructions to Install the Pentaho Data Service JDBC Driver.
To install the Pentaho Data Service JDBC Driver, complete these steps.
- Extract the driver files that you downloaded in the Download the Pentaho Data Service JDBC Driver instructions and paste them into the JDBC directory in your application. For example, if you want to access the data service from RStudio, paste the driver files in the same directory as the other JDBC drivers. If you aren’t sure where this is, check the documentation for that application.
- Start and stop the application.
- In the application, make a connection to the data service, using the instructions in Connect to the Pentaho Data Service from a Non-Pentaho Tool.
Once the driver is installed, you will need to create the connection to the Pentaho Data Service. For many tools, you'll do this by specifying a connection object. Review the connection details and optional options in Connect to the Pentaho Data Service from a Pentaho Tool.
You'll probably also need the JDBC Driver class from the following table.
|JDBC Driver Class||org.pentaho.di.trans.dataservice.jdbc.ThinDriver|
Example of JDBC Connection String
The JDBC connection string uses this format:
jdbc:pdi://<Pentaho Server Hostname:Port>/kettle?option=value&option=value
Here is an example of a connection string. The webappname is required if the data service is running on the Pentaho Server.
To monitor a data service, type one of the following commands into your browser. Replace the <Pentaho Server Host:Port> of the Pentaho Server with the host name or IP address and the port of the Pentaho Server you are running the data service on. You will need to have access to the Pentaho Server and be authenticated to run the following commands.
|List Names of Data Services on the Pentaho Server||List Status of Data Services on the Pentaho Server|
http://<Pentaho Server Host:Port>/pentaho-di/kettle/listServices
http://<Pentaho Server Host:Port>/pentaho-di/kettle/status
You can use SQL to query a data service. You can also add a conditional to the query if the data service transformation uses a parameter. This means that the ETL Developer who created or modified the data service transformation, must have already added the parameter. Refer to Develop and Optimize a Pentaho Data Service for more details. You can then assign a value to the parameter in your query.
To find the name of the table to query, you can connect to the data service, then use explorer to find the name of the table. The name of table is usually the same as the name of the data service.
- There are SQL limitations for queries. Refer to the Pentaho Data Source SQL Reference and Other Development Considerations article for more details.
- Although you can use any Pentaho-supported input source, data services can only be queried with SQL.
Here is an example of a SQL query that include a parameter. The example queries the employeeList data service for records, and when the data service transformation is executed, any use of the parameter employeeRegion in it will be substituted with the value USA EAST. This parameter could be in any transformation step that allows variable or parameter substitution.
You can also assign a value to the parameter in the Connection Properties window.
Here is the syntax:
SELECT * FROM <data service name> WHERE PARAMETER('<parameter_name>') = '<parameter_value>'
Here is an example.
view sourceprint SELECT * FROM employeeList WHERE PARAMETER('employeeRegion')='USA EAST'