Pentaho Documentation

Advanced Settings for Using the SDR

Overview

The Streamlined Data Refinery (SDR) information described here is provided as a set of building blocks you can use to create your own data refinery solutions.


Install Vertica JDBC Driver

If you are using Vertica, follow these steps to install the Vertica JDBC driver.

  1. Exit Spoon, if you have it running.
  2. Stop the DI Server.
    1. Windows: Go to Start > Pentaho Enterprise Edition > Server Management and double-click the Stop DI Server icon.
    2. Mac OS: From the command line, run this script to stop the DI Server:
      ./ctlscript.sh stop data-integration-server
  3. Copy the Vertica JDBC driver to these three directories:
    INSTALL_DIR/server/biserver-ee/tomcat/webapps/pentaho/WEB-INF/lib
    INSTALL_DIR/server/data-integration-server/tomcat/webapps/pentaho-di/WEB-INF/lib
    INSTALL_DIR/design-tools/data-integration/lib
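On Mac OS or Linux, the copy in step 3 can be scripted. The following is a minimal POSIX shell sketch, not a Pentaho-supplied tool: the function name install_vertica_driver is hypothetical, and you supply both the driver JAR path and your INSTALL_DIR. Run it only after Spoon and the DI Server are stopped.

```sh
# Hypothetical helper (not shipped with Pentaho): copy the Vertica JDBC
# driver JAR into the three lib directories listed in step 3.
install_vertica_driver() {
  ivd_jar=$1    # path to the Vertica JDBC driver JAR you downloaded
  ivd_root=$2   # your Pentaho INSTALL_DIR

  if [ ! -f "$ivd_jar" ]; then
    echo "driver not found: $ivd_jar" >&2
    return 1
  fi

  for ivd_dir in \
    "$ivd_root/server/biserver-ee/tomcat/webapps/pentaho/WEB-INF/lib" \
    "$ivd_root/server/data-integration-server/tomcat/webapps/pentaho-di/WEB-INF/lib" \
    "$ivd_root/design-tools/data-integration/lib"
  do
    # Stop at the first missing directory instead of creating it:
    # a missing lib directory usually means INSTALL_DIR is wrong.
    if [ ! -d "$ivd_dir" ]; then
      echo "missing directory: $ivd_dir" >&2
      return 1
    fi
    cp "$ivd_jar" "$ivd_dir/" || return 1
  done
}
```

For example: install_vertica_driver ~/Downloads/vertica-jdbc.jar /opt/pentaho (both paths are placeholders for your own driver file and install directory).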
      

Next, we are going to download and install the SDR sample.

Configure .KTR Files for Your Environment

If you are not using PostgreSQL with a default installation, you will need to configure a few .ktr files in Spoon before you can use the SDR form.

  1. Click File > Open and navigate to SDR_data.ktr in this directory: pentaho/server/biserver-ee/pentaho-solutions/system/SDR/endpoints/kettle. Right-click the Set Local Variables step to edit it, and enter the URL for your BA Server. Click OK.

  2. In the same transformation, right-click the Call_ML_SDR job and select Open Referenced Object > Job to open _ML_SDR.job. Right-click the Create Table step to edit it, and point it to your staging database. Click OK.

  3. Next, right-click the Publish Model step to edit it, and enter your user ID and password for the BA Server. Click Test Connection, then click OK.
  4. Open ML_SDR_REFINERY.ktr. Right-click the Out to Staging DB step to edit it, and select your staging database from the drop-down menu.

  5. Save all of these files and exit Spoon.
  6. Restart the BA and DI servers.
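On Mac OS, step 6 can be done from the command line with the same ctlscript.sh used earlier to stop the DI Server. This is a sketch under assumptions: it assumes ctlscript.sh accepts a start command alongside stop, the function name restart_services is hypothetical, and data-integration-server is the only service name taken from this page, so check the BA Server service name for your own installation.

```sh
# Hypothetical helper: stop and then start the named Pentaho services
# through the installer's ctlscript.sh (assumes it supports "start").
restart_services() {
  rs_ctl=$1   # path to ctlscript.sh in your Pentaho install directory
  shift       # the remaining arguments are service names

  # Stop every service first. A failed stop (for example, a service that
  # is already stopped) is reported by ctlscript.sh but does not abort.
  for rs_svc in "$@"; do
    "$rs_ctl" stop "$rs_svc"
  done

  # Start them again, returning non-zero on the first failed start.
  for rs_svc in "$@"; do
    "$rs_ctl" start "$rs_svc" || return 1
  done
}
```

For example, from the directory that contains ctlscript.sh: restart_services ./ctlscript.sh data-integration-server BA_SERVICE_NAME, where BA_SERVICE_NAME is a placeholder. On Windows, use the Server Management shortcuts instead.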

Use Hadoop with SDR

Complete the following steps before you begin using SDR with Hadoop.

  1. Open the ML_SDR_REFINERY.ktr and locate the Hadoop File Input step in the upper-left.
  2. Click to activate the hop between the Hadoop File Input and Parse weblog steps.
  3. Click to deactivate all of the hops between the Log File List and Read Weblog Files steps.

  4. Right-click the Out to Staging DB step to edit it, and select your staging database from the drop-down menu.
  5. Click OK to close the window and save the transformation.

Update the SDR Sample to Call the DI Server

You will need to configure the repository location to run jobs through HTTP before you can use the SDR sample with the DI Server.

  1. Locate the data-integration-server\pentaho-solutions\system\kettle directory and open the slave-server-config.xml file with any text editor.
  2. Import the SDR KJB/KTR into the DI Server.
    1. Find the KJB and KTR files in the biserver-ee\pentaho-solutions\system\SDR\endpoints\kettle\ML_SDR directory.
    2. Import them into the DI Server.
  3. Configure the SDR KJB/KTR to run from the DI Server.
    1. Update the job so that it runs the KTR from the repository, instead of the filesystem.
    2. Update the transformation so that it can find the weblog and lookup files.
  4. Update the SDR endpoint to call the DI Server to run the job.
    1. Switch to the HTTP Client Lookup step to make the HTTP call.

App Endpoints for Forms

You can also use the following API endpoints to run the app. To call an endpoint, adapt this URL pattern to your host, plugin ID, and command:

http://{host}/pentaho/plugin/{pluginID}/api/{command}
Endpoint          Description
genre             Populates the options for the genre selector.
gender            Populates the options for the gender selector.
occupation        Populates the options for the occupation selector.
income            Populates the options for the income selector.
firstdate         Returns the date limits for the data to be processed.
data_source_name  Returns the names of all data sources available on the server.
latest_requests   Returns the 10 most recent requests in a table instead of in a pop-up.
sdr_data          Processes the request and returns the status of the data.
refresh           Refreshes the Kettle and dashboard elements to reflect any saved changes, and clears the cache for all Kettle endpoints.
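The URL pattern above can be filled in and checked against the endpoint table with a small shell helper. This is a minimal sketch, not part of the SDR plugin: sdr_endpoint_url is a hypothetical function name, and the host, port, and plugin ID in the usage example are placeholders for your own values.

```sh
# Hypothetical helper: build http://{host}/pentaho/plugin/{pluginID}/api/{command}
# and reject commands that are not listed in the endpoint table.
sdr_endpoint_url() {
  seu_host=$1     # BA Server host, optionally with a port, e.g. localhost:8080
  seu_plugin=$2   # plugin ID of your SDR app
  seu_cmd=$3      # one of the endpoints in the table

  case $seu_cmd in
    genre|gender|occupation|income|firstdate|data_source_name|latest_requests|sdr_data|refresh)
      ;;
    *)
      echo "unknown SDR endpoint: $seu_cmd" >&2
      return 1
      ;;
  esac

  printf 'http://%s/pentaho/plugin/%s/api/%s\n' "$seu_host" "$seu_plugin" "$seu_cmd"
}
```

For example, to fetch the genre options with curl: curl -u USER:PASSWORD "$(sdr_endpoint_url localhost:8080 YOUR_PLUGIN_ID genre)". Sending BA Server credentials with HTTP basic authentication (-u) is an assumption; use whatever authentication your server is configured to accept.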