Explains how to use the Pentaho Data Service in a Pentaho or non-Pentaho Tool
A Pentaho Data Service is a virtual table that contains the output of a PDI transformation step. You can connect to and query the Pentaho Data Service from any Pentaho tool, such as Report Designer, the PDI client (Spoon), and Analyzer. You can also connect to and query it from a non-Pentaho tool, like RStudio or SQuirreL. To learn more about the Pentaho Data Service, refer to the Turn Transformation Step Results Into a Pentaho Data Service and Create a Pentaho Data Service articles.
To connect to and query the Pentaho Data Service, you need permission to run the transformation and to access the Pentaho Server where it is published.
Access a Pentaho Data Service
Once you've created a Pentaho Data Service, you can share it with others. Here is how they can connect to the service.
Connect to the Pentaho Data Service from a Pentaho Tool
Connecting to the data service from another Pentaho tool is similar to connecting to a database. For information on connecting to a database, refer to Specify Data Connections for the DI Server. The following table provides values for the typical parameters that you'll need to connect.
|Connection Name||Name that you specify.|
|Connection Type||Pentaho Data Services|
|Hostname||Hostname or IP address of the DI Server. The default is localhost if you are running the Pentaho Server locally.|
|Port Number||Port number of the Pentaho Server the data service will run on. The default is 9080.|
|Username||Name of a user who has permission to run the data service.|
|Password||Password for a user who has permission to run the data service.|
|webappname||Name of the web application, typically pentaho-di. Specify this in the Options section of the Kettle database connection window if you want to connect with PDI to a Pentaho Server.|
You can also set the following optional parameters.
|proxyhostname||Proxy server for HTTP connection(s).|
|proxyport||Proxy server port.|
|nonproxyhosts||Hosts that do not use the proxy server. If there is more than one host name, separate them with commas.|
|debugtrans||Optional name of the file where the generated transformation is stored, so that you can debug it. Example:|
|debuglog||Set this parameter to "true" if you want the log data from the transformation to be written to the general logging channel that appears in Spoon.|
|PARAMETER_[optionname]=value||Sets the value for a parameter in the transformation.|
|secure||Set this parameter to TRUE to use the secure HTTPS protocol to connect to the data service. If you omit this parameter or set it to FALSE, the standard, unsecured HTTP protocol is used.|
Install Pentaho Data Service JDBC Driver Files on a Non-Pentaho Tool
To connect to and run a Pentaho Data Service from a non-Pentaho tool, like SQuirreL or Beaker, you need to install the data service driver files, then create a connection to the data service. Pentaho Data Service JDBC driver files are available with installations of Pentaho.
- Go to the pentaho/data-integration/Data Service JDBC Driver directory on a computer that has Pentaho Data Integration installed, and copy the files in it.
- Paste the files to the directory in your application where driver files are kept.
- If necessary, stop and restart the application.
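If you prefer to script the copy step, the steps above can be sketched in Java as follows. This is a minimal sketch, not an official installer: the class `DriverInstaller`, the method `copyDriverFiles`, and the `drivers` target directory are hypothetical names, and the sketch assumes the driver files sit directly in the source directory as regular files.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical helper: copies the driver files into a tool's driver directory.
final class DriverInstaller {

    /** Copies every regular file in sourceDir into targetDir; returns the sorted file names. */
    static List<String> copyDriverFiles(Path sourceDir, Path targetDir) throws IOException {
        Files.createDirectories(targetDir);
        List<String> copied = new ArrayList<>();
        try (Stream<Path> entries = Files.list(sourceDir)) {
            for (Path entry : (Iterable<Path>) entries::iterator) {
                // Subdirectories are skipped; only top-level files are copied.
                if (Files.isRegularFile(entry)) {
                    Files.copy(entry, targetDir.resolve(entry.getFileName()),
                            StandardCopyOption.REPLACE_EXISTING);
                    copied.add(entry.getFileName().toString());
                }
            }
        }
        Collections.sort(copied);
        return copied;
    }

    public static void main(String[] args) throws IOException {
        // Source directory as named in the steps above, relative to the install location.
        Path source = Paths.get("pentaho", "data-integration", "Data Service JDBC Driver");
        // Assumption: the tool's driver directory is passed as the first argument.
        Path target = Paths.get(args.length > 0 ? args[0] : "drivers");
        System.out.println("Copied: " + copyDriverFiles(source, target));
    }
}
```

Restart the target application afterwards if it only scans its driver directory at startup.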
Connect to the Pentaho Data Service from a Non-Pentaho Tool
Once the driver is installed, you need to create the connection to the Pentaho Data Service. For many tools, you do this by specifying a connection object. Review the connection details and optional parameters in Connect to the Pentaho Data Service from a Pentaho Tool.
You'll probably also need the JDBC Driver class from the following table.
|JDBC Driver Class||org.pentaho.di.trans.dataservice.jdbc.ThinDriver|
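Before configuring a connection, it can help to confirm that the driver class is actually visible to your application. The following is a minimal sketch using only standard Java; `DriverCheck` and `isOnClasspath` are hypothetical names, and the check only works if the driver files are on the classpath of the JVM that runs it.

```java
// Hypothetical helper: reports whether the data service driver class can be loaded.
final class DriverCheck {

    // Driver class name taken from the table above.
    static final String THIN_DRIVER = "org.pentaho.di.trans.dataservice.jdbc.ThinDriver";

    /** Returns true if the named class can be loaded by the current class loader. */
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // The class, or something it depends on, is missing or failed to load.
            return false;
        }
    }

    public static void main(String[] args) {
        String status = isOnClasspath(THIN_DRIVER) ? "is available" : "was not found on the classpath";
        System.out.println(THIN_DRIVER + " " + status);
    }
}
```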
Example of JDBC Connection String
The JDBC connection string uses this format:
Here is an example of a connection string. The webappname is required.
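As an illustration of how the connection parameters combine, the sketch below assembles a connection string from a host, a port, and a set of options. The `jdbc:pdi://host:port/kettle?option=value` shape used here is an assumption about the driver's URL scheme, so verify it against the documentation for your driver version. `DataServiceUrl` and `build` are hypothetical names, and the host, port, and webappname values are the defaults mentioned earlier in this article.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: assembles a connection string from its parts.
final class DataServiceUrl {

    /**
     * Builds a URL of the assumed form jdbc:pdi://host:port/kettle?name=value&name=value.
     * Option values are appended as-is; no URL encoding is applied.
     */
    static String build(String host, int port, Map<String, String> options) {
        String base = "jdbc:pdi://" + host + ":" + port + "/kettle";
        if (options.isEmpty()) {
            return base;
        }
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> option : options.entrySet()) {
            query.add(option.getKey() + "=" + option.getValue());
        }
        return base + "?" + query;
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the options in insertion order.
        Map<String, String> options = new LinkedHashMap<>();
        options.put("webappname", "pentaho-di"); // required, per this article
        System.out.println(build("localhost", 9080, options));
    }
}
```

Optional parameters such as `secure` or `debuglog` can be added to the map in the same way.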
Query a Pentaho Data Service
You can query the data service using SQL. Note that you cannot query a Pentaho Data Service with MongoDB aggregation pipelines: MongoDB can be used as an input source, but the data service itself accepts only SQL. Otherwise, query the data service as you normally would in the tool you are using.
- To find the name of the table to query, connect to the data service, then use the tool's explorer to find the name of the table. The name of the table is usually the same as the name of the data service.
- Note that there are SQL limitations for queries.
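For tools that let you write Java, the lookup and query described above can be sketched with standard JDBC calls. This is a sketch under stated assumptions, not a definitive client: `DataServiceQuery` and its methods are hypothetical names, the connection string, username, and password are passed as arguments, and the sketch assumes the driver answers standard `DatabaseMetaData` table lookups and that the data service name is a plain identifier.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

// Hypothetical client: lists the available tables and prints their rows.
final class DataServiceQuery {

    /** Builds a simple SELECT, rejecting names that are not plain identifiers. */
    static String selectAllFrom(String tableName) {
        if (tableName == null || !tableName.matches("[A-Za-z_][A-Za-z0-9_]*")) {
            throw new IllegalArgumentException("Unexpected table name: " + tableName);
        }
        return "SELECT * FROM " + tableName;
    }

    /** Lists table names through JDBC metadata, much as a tool's explorer would. */
    static List<String> tableNames(Connection connection) throws SQLException {
        List<String> names = new ArrayList<>();
        try (ResultSet tables = connection.getMetaData().getTables(null, null, "%", null)) {
            while (tables.next()) {
                names.add(tables.getString("TABLE_NAME"));
            }
        }
        return names;
    }

    /** Runs the query and prints each row as column=value pairs. */
    static void printRows(Connection connection, String sql) throws SQLException {
        try (Statement statement = connection.createStatement();
             ResultSet rows = statement.executeQuery(sql)) {
            ResultSetMetaData meta = rows.getMetaData();
            while (rows.next()) {
                StringBuilder line = new StringBuilder();
                for (int i = 1; i <= meta.getColumnCount(); i++) {
                    if (i > 1) {
                        line.append(" | ");
                    }
                    line.append(meta.getColumnLabel(i)).append('=').append(rows.getObject(i));
                }
                System.out.println(line);
            }
        }
    }

    public static void main(String[] args) throws SQLException {
        // Arguments: connection string, username, password.
        try (Connection connection = DriverManager.getConnection(args[0], args[1], args[2])) {
            for (String table : tableNames(connection)) {
                printRows(connection, selectAllFrom(table));
            }
        }
    }
}
```

A plain `SELECT *` is used here because it avoids relying on SQL features that the data service may not support.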