Data Integration Operations Mart

The PDI Operations Mart is a centralized data mart that stores job and transformation log data for auditing, reporting, and analysis. It enables you to collect and query Data Integration log data and then use the Pentaho Server tools to examine that data in reports, charts, and dashboards. The data mart is a collection of tables organized as a data warehouse using a star schema: together, the dimension tables and a fact table represent the logging data. These tables must be created in the PDI Operations Mart database, and Pentaho provides SQL scripts to create them for each supported database type. A Data Integration job populates the time and date dimensions.
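
For example, once the mart is populated, a typical star-schema query joins the fact table to its dimensions. The table and column names below (fact_execution, dim_date, dim_executor, and their keys) are illustrative assumptions only; see the Data Integration Operations Mart Reference for the actual schema.

    -- Illustrative sketch: count failed executions per day and executor.
    -- Table and column names are assumptions, not the guaranteed schema.
    SELECT d.date_field, e.executor_name, COUNT(*) AS failed_runs
    FROM fact_execution f
    JOIN dim_date d ON f.start_date_tk = d.date_tk
    JOIN dim_executor e ON f.executor_tk = e.executor_tk
    WHERE f.failed > 0
    GROUP BY d.date_field, e.executor_name
    ORDER BY d.date_field;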

Note: For optimal performance, be sure to clean the Operations Mart periodically.

Getting started

Installation of the Data Integration Operations Mart depends on the following conditions and prerequisites:

Database requirement

Before proceeding with the Data Integration Operations Mart installation steps below, ensure that your Pentaho Server and Pentaho Repository are configured with one of the following database types:
  • PostgreSQL
  • MySQL
  • Oracle
  • MS SQL Server

If you need to review the Pentaho Server installation method, see Pentaho installation.

Existing 8.3 installation

If you have an existing 8.3 installation of the PDI client (Spoon) and the Pentaho Server, you must configure them for the Data Integration Operations Mart by following the Installation Steps below.

Installation Steps

To install the Data Integration Operations Mart, you will perform the following steps:

  • Step 1: Get the Data Integration Operations Mart files
  • Step 2: Run the setup script
  • Step 3: Set the global Kettle logging variables
  • Step 4: Add logging and Operations Mart connections
  • Step 5: Add a JNDI connection for the Pentaho Server
  • Step 6: Add the Data Integration Operations Mart ETL solution and sample reports to the Pentaho Repository
  • Step 7: Initialize the Data Integration Operations Mart
  • Step 8: Verify the Data Integration Operations Mart is working

Step 1: Get the Data Integration Operations Mart files

The Data Integration Operations Mart files are available for download from the Pentaho Customer Support Portal.

Procedure

  1. On the Customer Portal home page, sign in using the Pentaho support user name and password provided in your Pentaho Welcome Packet.

  2. Click Downloads, then click Pentaho 8.3 GA Release in the 8.x list.

  3. At the bottom of the Pentaho 8.3 GA Release page, browse the folders in the Box widget to find the file you need, located in the Operations Mart folder:

    • pentaho-operations-mart-8.3.0-dist.zip
  4. Unzip the Pentaho Operations Mart file.

    Inside is the packaged Operations Mart installation file.
  5. Unpack the installation file by running the installer file for your environment.

  6. In the IzPack window, read the license agreement, select I accept the terms of this license agreement, and then click Next.

  7. In the Select the installation path text box, browse to or enter the directory location where you want to unpack the files, then click Next.

  8. If you chose an existing directory, a warning message that the directory already exists appears. Click Yes.

    Any existing files in the directory will be retained.
  9. When the installation progress is complete, click Quit.

    Your directory will contain the setup scripts and files used to create the default content in the following steps.

Step 2: Run the setup script

Depending on your database repository type, run the following scripts to create the tables that capture the activity of transformations and jobs.

The pentaho-operations-mart-ddl-8.3.x.yy.zip file contains a folder for each database type listed below, with the scripts that are needed.

PostgreSQL (scripts located in /pentaho-server/data/postgresql):
  • pentaho_logging_postgresql.sql
  • pentaho_mart_postgresql.sql
  • pentaho_mart_upgrade_postgresql.sql

MySQL (scripts located in /pentaho-server/data/mysql5):
  • pentaho_logging_mysql.sql
  • pentaho_mart_mysql.sql
  • pentaho_mart_upgrade_mysql.sql

Oracle (scripts located in /pentaho-server/data/oracle10g):
  • pentaho_logging_oracle.sql
  • pentaho_mart_oracle.sql
  • pentaho_mart_upgrade_oracle.sql

Microsoft SQL Server (scripts located in /pentaho-server/data/sqlserver):
  • pentaho_logging_sqlserver.sql
  • pentaho_mart_sqlserver.sql
  • pentaho_mart_upgrade_sqlserver.sql
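
For example, on PostgreSQL the logging and mart scripts might be run with psql as sketched below. The host, port, and superuser are placeholder defaults, not guaranteed values; adjust them for your environment.

    # Run as a database user with privileges to create databases and schemas.
    # Host, port, and user shown are typical defaults and may differ for you.
    psql -h localhost -p 5432 -U postgres -f pentaho_logging_postgresql.sql
    psql -h localhost -p 5432 -U postgres -f pentaho_mart_postgresql.sql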

Step 3: Set the global Kettle logging variables

Perform this step on the computer where you have installed your Pentaho Data Integration (PDI) client and Pentaho Server.

When you run PDI for the first time, the kettle.properties file is created and stored in the $USER_HOME/.kettle directory.

Procedure

  1. In the PDI client, select Edit > Edit the kettle.properties file.

  2. Add or edit the variables and values to reflect the values shown in the following table:

    Note: For Oracle and Microsoft SQL Server, leave the Value blank for variables that contain SCHEMA in the name.

    Variable                                 Value
    KETTLE_CHANNEL_LOG_DB                    live_logging_info
    KETTLE_CHANNEL_LOG_TABLE                 channel_logs
    KETTLE_CHANNEL_LOG_SCHEMA                pentaho_dilogs
    KETTLE_JOBENTRY_LOG_DB                   live_logging_info
    KETTLE_JOBENTRY_LOG_TABLE                jobentry_logs
    KETTLE_JOBENTRY_LOG_SCHEMA               pentaho_dilogs
    KETTLE_JOB_LOG_DB                        live_logging_info
    KETTLE_JOB_LOG_TABLE                     job_logs
    KETTLE_JOB_LOG_SCHEMA                    pentaho_dilogs
    KETTLE_METRICS_LOG_DB                    live_logging_info
    KETTLE_METRICS_LOG_TABLE                 metrics_logs
    KETTLE_METRICS_LOG_SCHEMA                pentaho_dilogs
    KETTLE_STEP_LOG_DB                       live_logging_info
    KETTLE_STEP_LOG_TABLE                    step_logs
    KETTLE_STEP_LOG_SCHEMA                   pentaho_dilogs
    KETTLE_TRANS_LOG_DB                      live_logging_info
    KETTLE_TRANS_LOG_TABLE                   trans_logs
    KETTLE_TRANS_LOG_SCHEMA                  pentaho_dilogs
    KETTLE_TRANS_PERFORMANCE_LOG_DB          live_logging_info
    KETTLE_TRANS_PERFORMANCE_LOG_TABLE       transperf_logs
    KETTLE_TRANS_PERFORMANCE_LOG_SCHEMA      pentaho_dilogs
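
After editing, the logging entries in kettle.properties are plain key=value lines. The channel and job groups are shown below as a sketch; repeat the same pattern for the remaining variable groups in the table.

    # Sketch of kettle.properties entries (channel and job groups only).
    # For Oracle and Microsoft SQL Server, leave the *_SCHEMA values empty.
    KETTLE_CHANNEL_LOG_DB=live_logging_info
    KETTLE_CHANNEL_LOG_TABLE=channel_logs
    KETTLE_CHANNEL_LOG_SCHEMA=pentaho_dilogs
    KETTLE_JOB_LOG_DB=live_logging_info
    KETTLE_JOB_LOG_TABLE=job_logs
    KETTLE_JOB_LOG_SCHEMA=pentaho_dilogs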

Step 4: Add logging and Operations Mart connections

This section explains how to add the logging (live_logging_info) and Operations Mart (PDI_Operations_Mart) connections for a PDI client.

Procedure

  1. Navigate to the pentaho/design-tools/data-integration/simple-jndi directory.

  2. Open the jdbc.properties file with a text editor.

  3. Depending on your repository database type, update the values accordingly (URL, user, password) as shown in the samples:

    PostgreSQL:

    PDI_Operations_Mart/type=javax.sql.DataSource
    PDI_Operations_Mart/driver=org.postgresql.Driver
    PDI_Operations_Mart/url=jdbc:postgresql://localhost:5432/hibernate?searchpath=pentaho_operations_mart
    PDI_Operations_Mart/user=hibuser
    PDI_Operations_Mart/password=password
    live_logging_info/type=javax.sql.DataSource
    live_logging_info/driver=org.postgresql.Driver
    live_logging_info/url=jdbc:postgresql://localhost:5432/hibernate?searchpath=pentaho_dilogs
    live_logging_info/user=hibuser
    live_logging_info/password=password

    MySQL:

    PDI_Operations_Mart/type=javax.sql.DataSource
    PDI_Operations_Mart/driver=com.mysql.jdbc.Driver
    PDI_Operations_Mart/url=jdbc:mysql://localhost:3306/pentaho_operations_mart
    PDI_Operations_Mart/user=hibuser
    PDI_Operations_Mart/password=password
    live_logging_info/type=javax.sql.DataSource
    live_logging_info/driver=com.mysql.jdbc.Driver
    live_logging_info/url=jdbc:mysql://localhost:3306/pentaho_dilogs
    live_logging_info/user=hibuser
    live_logging_info/password=password

    Oracle:

    PDI_Operations_Mart/type=javax.sql.DataSource
    PDI_Operations_Mart/driver=oracle.jdbc.OracleDriver
    PDI_Operations_Mart/url=jdbc:oracle:thin:@localhost:1521/XE
    PDI_Operations_Mart/user=pentaho_operations_mart
    PDI_Operations_Mart/password=password
    live_logging_info/type=javax.sql.DataSource
    live_logging_info/driver=oracle.jdbc.OracleDriver
    live_logging_info/url=jdbc:oracle:thin:@localhost:1521/XE
    live_logging_info/user=pentaho_dilogs
    live_logging_info/password=password

    Microsoft SQL Server:

    PDI_Operations_Mart/type=javax.sql.DataSource
    PDI_Operations_Mart/driver=com.microsoft.sqlserver.jdbc.SQLServerDriver
    PDI_Operations_Mart/url=jdbc:sqlserver://10.0.2.15:1433;DatabaseName=pentaho_operations_mart
    PDI_Operations_Mart/user=pentaho_operations_mart
    PDI_Operations_Mart/password=password
    live_logging_info/type=javax.sql.DataSource
    live_logging_info/driver=com.microsoft.sqlserver.jdbc.SQLServerDriver
    live_logging_info/url=jdbc:sqlserver://10.0.2.15:1433;DatabaseName=pentaho_dilogs
    live_logging_info/user=dilogs_user
    live_logging_info/password=password

Step 5: Add a JNDI connection for the Pentaho Server

This section explains how to add a JNDI connection for the Pentaho Server. Perform this task on the computer where you have installed the Pentaho Server.

Procedure

  1. Navigate to the pentaho/server/pentaho-server/tomcat/webapps/pentaho/META-INF/ folder.

  2. Open the context.xml file with a text editor.

  3. Depending on your database type, edit the file to reflect the values in the applicable example:

    PostgreSQL:

     <Resource name="jdbc/PDI_Operations_Mart" auth="Container" type="javax.sql.DataSource"
                factory="org.apache.tomcat.jdbc.pool.DataSourceFactory" maxActive="20" minIdle="0" maxIdle="5" initialSize="0"
                maxWait="10000" username="hibuser" password="password"
                driverClassName="org.postgresql.Driver" 
                url="jdbc:postgresql://localhost:5432/hibernate"
                validationQuery="select 1"/>
               
     <Resource name="jdbc/pentaho_operations_mart" auth="Container" type="javax.sql.DataSource"
                factory="org.apache.tomcat.jdbc.pool.DataSourceFactory" maxActive="20" minIdle="0" maxIdle="5" initialSize="0"
                maxWait="10000" username="hibuser" password="password"
                driverClassName="org.postgresql.Driver" 
                url="jdbc:postgresql://localhost:5432/hibernate"
                validationQuery="select 1"/>
               
     <Resource name="jdbc/live_logging_info" auth="Container" type="javax.sql.DataSource"
                factory="org.apache.tomcat.jdbc.pool.DataSourceFactory" maxActive="20" minIdle="0" maxIdle="5" initialSize="0"
                maxWait="10000" username="hibuser" password="password"
                driverClassName="org.postgresql.Driver" 
                url="jdbc:postgresql://localhost:5432/hibernate?searchpath=pentaho_dilogs"            
                validationQuery="select 1"/>

    MySQL:

      <Resource name="jdbc/PDI_Operations_Mart" auth="Container" type="javax.sql.DataSource"
                factory="org.apache.tomcat.jdbc.pool.DataSourceFactory" maxActive="20" maxIdle="5"
                maxWait="10000" username="hibuser" password="password"
                driverClassName="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost:3306/pentaho_operations_mart"
                jdbcInterceptors="ConnectionState" defaultAutoCommit="true" validationQuery="select 1"/>
               
      <Resource name="jdbc/pentaho_operations_mart" auth="Container" type="javax.sql.DataSource"
                factory="org.apache.tomcat.jdbc.pool.DataSourceFactory" maxActive="20" maxIdle="5"
                maxWait="10000" username="hibuser" password="password"
                driverClassName="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost:3306/pentaho_operations_mart"
                jdbcInterceptors="ConnectionState" defaultAutoCommit="true" validationQuery="select 1"/>
                
      <Resource name="jdbc/live_logging_info" auth="Container" type="javax.sql.DataSource"
                factory="org.apache.tomcat.jdbc.pool.DataSourceFactory" maxActive="20" maxIdle="5"
                maxWait="10000" username="hibuser" password="password"
                driverClassName="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost:3306/pentaho_dilogs"            
                jdbcInterceptors="ConnectionState" defaultAutoCommit="true" validationQuery="select 1"/>

    Oracle:

    <Resource 
        validationQuery="select 1 from dual"
        url="jdbc:oracle:thin:@localhost:1521/orcl"
        driverClassName="oracle.jdbc.OracleDriver"
        password="password"
        username="pentaho_operations_mart"
        initialSize="0"
        maxActive="20"
        maxIdle="10"
        maxWait="10000"
        factory="org.apache.tomcat.jdbc.pool.DataSourceFactory"
        type="javax.sql.DataSource"
        auth="Container"
        connectionProperties="oracle.jdbc.J2EE13Compliant=true"
        name="jdbc/pentaho_operations_mart"/>
    
    <Resource 
        validationQuery="select 1 from dual"
        url="jdbc:oracle:thin:@localhost:1521/orcl"
        driverClassName="oracle.jdbc.OracleDriver"
        password="password"
        username="pentaho_operations_mart"
        initialSize="0"
        maxActive="20"
        maxIdle="10"
        maxWait="10000"
        factory="org.apache.tomcat.jdbc.pool.DataSourceFactory"
        type="javax.sql.DataSource"
        auth="Container"
        connectionProperties="oracle.jdbc.J2EE13Compliant=true"
        name="jdbc/PDI_Operations_Mart"/>
    
    <Resource validationQuery="select 1 from dual" url="jdbc:oracle:thin:@localhost:1521/XE" 
         driverClassName="oracle.jdbc.OracleDriver" password="password" 
         username="pentaho_dilogs" maxWaitMillis="10000" maxIdle="5" maxTotal="20" 
         jdbcInterceptors="ConnectionState" defaultAutoCommit="true" 
         factory="org.apache.commons.dbcp.BasicDataSourceFactory" type="javax.sql.DataSource" 
         auth="Container" name="jdbc/live_logging_info"/>

    Microsoft SQL Server:

    <Resource name="jdbc/PDI_Operations_Mart" auth="Container" type="javax.sql.DataSource"
          factory="org.apache.tomcat.jdbc.pool.DataSourceFactory" maxTotal="20" maxIdle="5"
          maxWaitMillis="10000" username="pentaho_operations_mart" password="password" 
          jdbcInterceptors="ConnectionState" defaultAutoCommit="true"
          driverClassName="com.microsoft.sqlserver.jdbc.SQLServerDriver" 
          url="jdbc:sqlserver://10.0.2.15:1433;DatabaseName=pentaho_operations_mart"
          validationQuery="select 1"/>
                
    <Resource name="jdbc/pentaho_operations_mart" auth="Container" type="javax.sql.DataSource"
           factory="org.apache.tomcat.jdbc.pool.DataSourceFactory" maxTotal="20" maxIdle="5"
           maxWaitMillis="10000" username="pentaho_operations_mart" password="password" 
           jdbcInterceptors="ConnectionState" defaultAutoCommit="true"
           driverClassName="com.microsoft.sqlserver.jdbc.SQLServerDriver" 
           url="jdbc:sqlserver://10.0.2.15:1433;DatabaseName=pentaho_operations_mart"
           validationQuery="select 1"/>
     
    <Resource name="jdbc/live_logging_info" auth="Container" type="javax.sql.DataSource"
            factory="org.apache.tomcat.jdbc.pool.DataSourceFactory" maxTotal="20" maxIdle="5"
            maxWaitMillis="10000" username="dilogs_user" password="password" 
            jdbcInterceptors="ConnectionState" defaultAutoCommit="true"
            driverClassName="com.microsoft.sqlserver.jdbc.SQLServerDriver" 
            url="jdbc:sqlserver://10.0.2.15:1433;DatabaseName=pentaho_dilogs"            
            validationQuery="select 1"/>

Step 6: Add the Data Integration Operations Mart ETL solution and sample reports to the Pentaho Repository

  1. Stop the Pentaho Server.

  2. Depending on your repository database type, copy the following ETL solution and sample reports (downloaded in Step 1: Get the Data Integration Operations Mart files) to $PENTAHO_HOME/pentaho-server/pentaho-solutions/default-content.

    Note: The file pentaho-operations-mart-etl-8.3.0-dist.zip may already be in this directory. If you are using a repository database type other than PostgreSQL, remove it.
    • PostgreSQL: pentaho-operations-mart-etl-8.3.0-dist.zip
    • MySQL: pentaho-operations-mart-etl-mysql5-8.3.0-dist.zip
    • Oracle: pentaho-operations-mart-etl-oracle10g-8.3.0-dist.zip
    • Microsoft SQL Server: pentaho-operations-mart-etl-mssql-8.3.0-dist.zip
  3. Place these two files in the directory as well:

    • DI Operations Mart sample reports: pentaho-operations-mart-operations-di-8.3.0-dist.zip
    • BA Operations Mart sample reports: pentaho-operations-mart-operations-bi-8.3.0-dist.zip
  4. Start the Pentaho Server.
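
On a Linux install with a PostgreSQL repository, steps 1 through 4 might look like the following sketch; the paths assume the default layout and the stop-pentaho.sh/start-pentaho.sh scripts shipped with the server.

    # Sketch for a PostgreSQL repository on Linux; adjust paths and
    # package names for your database type and install location.
    $PENTAHO_HOME/pentaho-server/stop-pentaho.sh
    cp pentaho-operations-mart-etl-8.3.0-dist.zip \
       pentaho-operations-mart-operations-di-8.3.0-dist.zip \
       pentaho-operations-mart-operations-bi-8.3.0-dist.zip \
       $PENTAHO_HOME/pentaho-server/pentaho-solutions/default-content/
    $PENTAHO_HOME/pentaho-server/start-pentaho.sh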

Step 7: Initialize the Data Integration Operations Mart

  1. Launch the PDI client (Spoon).

  2. Connect to the Pentaho Repository via the Pentaho Server.

  3. At the main menu, select File > Open.

  4. Select Browse Files > Public > Pentaho Operations Mart > DI Ops Mart ETL, open the Fill_in_DIM_DATE_and_DIM_TIME job file, and run it.

  5. At the main menu, select File > Open.

  6. Select Public > Pentaho Operations Mart > DI Ops Mart ETL, open the Update_Dimensions_then_Logging_Datamart job file, and run it.
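
Alternatively, the same jobs can be run from the command line with Kitchen. The repository name and credentials below are placeholders; the directory path matches where Step 6 installed the ETL solution.

    # Placeholder repository name and credentials; run from the
    # pentaho/design-tools/data-integration directory.
    ./kitchen.sh -rep=PentahoRepository -user=admin -pass=password \
        -dir="/public/Pentaho Operations Mart/DI Ops Mart ETL" \
        -job="Fill_in_DIM_DATE_and_DIM_TIME"
    ./kitchen.sh -rep=PentahoRepository -user=admin -pass=password \
        -dir="/public/Pentaho Operations Mart/DI Ops Mart ETL" \
        -job="Update_Dimensions_then_Logging_Datamart"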

Step 8: Verify the Data Integration Operations Mart is working

  1. From the Pentaho User Console, select Browse Files > Public > Pentaho Operations Mart > DI Audit Reports > Last_Run and open it.

  2. You should see the jobs and transformations that were run in Step 7.
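
You can also check the mart directly with a quick query. The fact table name below is an assumption (see the Data Integration Operations Mart Reference), and the schema prefix applies to PostgreSQL; drop or adjust it for other databases.

    -- Should return a non-zero count once the Step 7 jobs have run.
    -- fact_execution is an assumed table name.
    SELECT COUNT(*) FROM pentaho_operations_mart.fact_execution;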

Give users access to the PDI Operations Mart

By default, only users who have the Admin role can access the Pentaho Operations Mart. The Admin role has access to all capabilities within all Pentaho products, including the Pentaho Operations Mart. If you want to allow users to view and run the Pentaho Operations Mart only, you can assign them the Pentaho Operations role. For example, a user who has been assigned the Pentaho Operations user role is able to open and view a report within the PDI Operations Mart, but does not have the ability to delete it.

To give users access to view the PDI Operations Mart, assign the Pentaho Operations role to those users as follows:

Procedure

  1. From within the Pentaho User Console, select the Administration tab.

  2. From the left panel, select Security > Users/Roles.

  3. Select the Roles tab.

  4. Add the new role called Pentaho Operations by following the instructions in Adding Roles.

  5. Assign the appropriate users to the new role, as described in Adding Users to Roles.

  6. Advise these users to log in to the Pentaho User Console, create a Pentaho Analyzer report or a Pentaho Interactive Report, and ensure that they can view the Pentaho Operations Mart in the Select a Data Source dialog box.

Charts, reports, and dashboards using PDI Operations Mart data

Once you have created and populated your Pentaho Data Integration Operations Mart with log data, the features of the Pentaho User Console enable you to examine this data and create reports, charts, and dashboards. We provide many pre-built reports, charts, and dashboards that you can modify.

To help understand the contents of the log, see Data Integration Operations Mart Reference.

Clean up operations mart tables

Cleaning the PDI Operations Mart consists of running either a job or a transformation that deletes data older than a specified maximum age. The transformation and job for cleaning up the PDI Operations Mart can be found in the etl folder.

Perform the following steps to clean up the PDI Operations Mart:

Procedure

  1. Using the PDI client (Spoon), open either the Clean_up_PDI_Operations_Mart.kjb job or the Clean_up_PDI_Operations_Mart_fact_table.ktr transformation.

  2. Set the following parameters:

    • max.age.days (required)

      The maximum age, in days, of the data to keep.

    • schema.prefix (optional)

      For PostgreSQL databases, enter the schema name followed by a period (.); this prefix is applied to the SQL statements. For other databases, leave the value blank.

  3. Run the job or transformation.

    This deletes job and transformation data older than the maximum age from the data mart.
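
For example, to run the cleanup job from the command line with a 30-day retention on a PostgreSQL mart (the file path and schema prefix are environment-specific):

    # -param passes named parameters to the job; schema.prefix includes
    # the trailing period and is only needed for PostgreSQL.
    ./kitchen.sh -file="Clean_up_PDI_Operations_Mart.kjb" \
        -param:max.age.days=30 \
        -param:schema.prefix=pentaho_operations_mart.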

Next steps

To schedule regular clean up of the PDI Operations Mart, see Schedule perspective in the PDI client.