Pentaho Documentation

Cluster the Application Server

Overview

These sections explain the requirements for clustering servers, how to initialize and configure the repository, how to configure the Jackrabbit journal and Quartz, and how to test the cluster.

A Pentaho node consists of a Tomcat web application server and one BA Server. Multiple nodes joined together make up a cluster. You can create a cluster using any version of Pentaho BA Suite 5.x.

Prerequisites for Clustering

Before you begin clustering your servers, you must complete a few tasks and meet some specific requirements in order to successfully implement a Pentaho deployment on a Tomcat or JBoss cluster.

 
Requirement | Description
Make sure that all of your application nodes are set up with identical configurations and BA deployments. | All application nodes must already have the same configurations and BA deployments installed for clustering to work.
Establish a load balancer. | The load balancer spreads computing resources evenly among the nodes.
Each node and the load balancer must be time-synchronized via NTP. | All machines that make up the cluster must have the same system time. If they do not, execution times of objects will be affected.
You must run only one node per machine (or NIC). | It is possible to run multiple application servers on each node with a modified configuration, but this scenario does not offer any benefit for load balancing (performance) or hardware failover (redundancy), and therefore is not covered in this guide. Refer to your application server's clustering documentation for more information.
You must use either Tomcat 6.0/7.0 or JBoss EAP 6.1.x/6.2.x. | You may be able to use this guide as a basic blueprint for configuring other application servers or versions of Tomcat and JBoss for a clustered environment, but Pentaho support will not be able to assist you if you run into any problems with the BA or DI Servers.
You must have permission to install software and modify service configurations. | If you do not have these permissions, you must have access to someone at your company who has the correct permission levels, typically root access.
Only the Pentaho BA Server will be deployed to the cluster. | It is possible to modify the configuration to deploy other WARs or EARs. However, for ease of testing and support, Pentaho only supports deployment of the pentaho and pentaho-style WARs to the cluster.
You must use a single repository location. | Most people use a database-based solution repository. Keep in mind that you are not clustering the database server in this procedure, only the application server.
You must have sticky sessions enabled. | Sticky sessions tie each user session to a single node.

Initialize and Configure Repository

After confirming that your systems meet all of the requirements listed above, initialize and then configure the repository for clustering. Finally, complete a few cleanup steps before you move on to setting up the Jackrabbit journal.

  1. Initialize your database using the steps in the appropriate article for your system. Initialize Repository has sections for PostgreSQL, MySQL, MS SQL Server, and Oracle databases.
  2. After you have initialized your database, you will need to configure the data connections to the BA Repository. Specify Connections walks you through the steps for JDBC and JNDI connections for PostgreSQL, MySQL, and Oracle.
  3. The next step is to configure your repository using the appropriate tasks in the Configure Repository article.
  4. After you have initialized and configured your repository, clean up the following folders:
    • Locate the ...biserver-ee/tomcat directory and remove all files and folders from the temp folder.

    • Locate the ...biserver-ee/tomcat directory and remove all files and folders from the work folder.

    • Locate the ...biserver-ee/pentaho-solutions/system/jackrabbit/repository directory and remove all files and folders from the final repository folder.

    • Locate the ...biserver-ee/pentaho-solutions/system/jackrabbit/repository directory and remove all files and folders from the workspaces folder.
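The cleanup in step 4 could be scripted along the following lines. This is a minimal sketch, not a Pentaho-provided tool: the function names and the `server_home` parameter (standing in for the elided ...biserver-ee path) are hypothetical, and it assumes that the repository and workspaces folders sit directly inside the jackrabbit/repository directory. Stop the server before running anything like this.

```python
import shutil
from pathlib import Path


def empty_directory(folder: Path) -> int:
    """Delete everything inside `folder` but keep the folder itself.

    Returns the number of top-level entries removed (0 if the folder is missing).
    """
    removed = 0
    if not folder.is_dir():
        return removed
    for entry in folder.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)   # remove a subfolder and its contents
        else:
            entry.unlink()         # remove a regular file or a symlink
        removed += 1
    return removed


def clean_node(server_home: Path) -> dict:
    """Empty the four folders named in step 4 under a hypothetical server home."""
    jackrabbit_repo = (
        server_home / "pentaho-solutions" / "system" / "jackrabbit" / "repository"
    )
    targets = [
        server_home / "tomcat" / "temp",
        server_home / "tomcat" / "work",
        jackrabbit_repo / "repository",   # assumed location of the repository folder
        jackrabbit_repo / "workspaces",   # assumed location of the workspaces folder
    ]
    return {str(target): empty_directory(target) for target in targets}
```

Calling `clean_node(Path("/path/to/biserver-ee"))` on each node returns a mapping from folder to the number of entries removed, which makes it easy to confirm that every node was cleaned.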

You now have a configured repository and are ready to move to the next step for clustering.

Configure Jackrabbit Journal

These directions explain how to set up the Jackrabbit journal for your cluster. Make sure that each node has a unique ID, which is set in the id attribute of the Cluster element.

  1. Locate the repository.xml file in the .../bi-server/pentaho-solutions/system/jackrabbit directory and open it with any text editor.
  2. Scroll to the bottom of the file and replace the section that begins with <!-- Run with a cluster journal --> with the correct code for your type of database repository.
  3. Save and close the file.

For PostgreSQL only:


<!-- Run with a cluster journal -->
<Cluster id="Unique_ID">
    <Journal class="org.apache.jackrabbit.core.journal.DatabaseJournal">
      <param name="revision" value="${rep.home}/revision.log"/>
      <param name="url" value="jdbc:postgresql://HOSTNAME:PORT/jackrabbit"/>
      <param name="driver" value="org.postgresql.Driver"/>
      <param name="user" value="jcr_user"/>
      <param name="password" value="password"/>
      <param name="databaseType" value="postgresql"/>
      <param name="janitorEnabled" value="true"/>
      <param name="janitorSleep" value="86400"/>
      <param name="janitorFirstRunHourOfDay" value="3"/>
    </Journal>
</Cluster>

For MySQL only:

<!-- Run with a cluster journal -->
<Cluster id="Unique_ID">
    <Journal class="org.apache.jackrabbit.core.journal.DatabaseJournal">
      <param name="revision" value="${rep.home}/revision.log"/>
      <param name="url" value="jdbc:mysql://HOSTNAME:PORT/jackrabbit"/>
      <param name="driver" value="com.mysql.jdbc.Driver"/>
      <param name="user" value="jcr_user"/>
      <param name="password" value="password"/>
      <param name="schema" value="mysql"/>
      <param name="databaseType" value="mysql"/>
      <param name="janitorEnabled" value="true"/>
      <param name="janitorSleep" value="86400"/>
      <param name="janitorFirstRunHourOfDay" value="3"/>
    </Journal>
</Cluster>

For Oracle only:

<!-- Run with a cluster journal -->
<Cluster id="Unique_ID">
    <Journal class="org.apache.jackrabbit.core.journal.OracleDatabaseJournal">
      <param name="revision" value="${rep.home}/revision.log"/>
      <param name="url" value="jdbc:oracle:thin://HOSTNAME:PORT/di_jackrabbit"/>
      <param name="driver" value="oracle.jdbc.OracleDriver"/>
      <param name="user" value="jcr_user"/>
      <param name="password" value="password"/>
      <param name="schema" value="oracle"/>
      <param name="janitorEnabled" value="true"/>
      <param name="janitorSleep" value="86400"/>
      <param name="janitorFirstRunHourOfDay" value="3"/>
    </Journal>
</Cluster>

For MS SQL Server only:

<!-- Run with a cluster journal -->
<Cluster id="Unique_ID">
    <Journal class="org.apache.jackrabbit.core.journal.MSSqlDatabaseJournal">
      <param name="revision" value="${rep.home}/revision.log"/>
      <param name="url" value="jdbc:sqlserver://HOSTNAME:PORT;databaseName=jackrabbit"/>
      <param name="driver" value="com.microsoft.sqlserver.jdbc.SQLServerDriver"/>
      <param name="user" value="jcr_user"/>
      <param name="password" value="password"/>
      <param name="schema" value="mssql"/>
      <param name="janitorEnabled" value="true"/>
      <param name="janitorSleep" value="86400"/>
      <param name="janitorFirstRunHourOfDay" value="3"/>
    </Journal>
</Cluster>
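
Because every node needs its own Cluster id, it may help to check the edited repository.xml files before starting the servers. The sketch below is one possible check, not part of Pentaho or Jackrabbit: the function names are hypothetical, and it assumes you have already collected the contents of each node's repository.xml into a dictionary keyed by a node label of your choosing.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def cluster_id(repository_xml: str) -> str:
    """Return the id attribute of the Cluster element in a repository.xml document."""
    root = ET.fromstring(repository_xml)
    cluster = root if root.tag == "Cluster" else root.find(".//Cluster")
    if cluster is None or cluster.get("id") is None:
        raise ValueError("no <Cluster id=...> element found")
    return cluster.get("id")


def duplicate_cluster_ids(configs: dict) -> dict:
    """Map each Cluster id used by more than one node to the nodes that share it.

    `configs` maps a node label (hypothetical, e.g. a hostname) to the text of
    that node's repository.xml. An empty result means all ids are unique.
    """
    nodes_by_id = defaultdict(list)
    for node, xml_text in configs.items():
        nodes_by_id[cluster_id(xml_text)].append(node)
    return {cid: nodes for cid, nodes in nodes_by_id.items() if len(nodes) > 1}
```

An empty dictionary from `duplicate_cluster_ids` means no two nodes share an ID. Any entry in the result names the nodes whose repository.xml needs a different id value.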

Jackrabbit journaling is now set up for your BA cluster. The Jackrabbit Wiki has additional information about journaling. Next, you need to cluster the Quartz tables to avoid duplicate scheduling on each node.

Configure Quartz

There are a few edits that you will need to make in the quartz.properties file to configure Quartz to work with your cluster.

  1. Locate the quartz.properties file in the .../bi-server/pentaho-solutions/system/quartz directory and open it with any text editor.
  2. Find the org.quartz.scheduler.instanceId = INSTANCE_ID line and change INSTANCE_ID to AUTO.

    org.quartz.scheduler.instanceId = AUTO

  3. Find the #_replace_jobstore_properties section and change the default value of org.quartz.jobStore.isClustered to true as shown.

    #_replace_jobstore_properties
    org.quartz.jobStore.misfireThreshold = 60000
    org.quartz.jobStore.driverDelegateClass = org.quartz.impl.jdbcjobstore.PostgreSQLDelegate
    org.quartz.jobStore.useProperties = false
    org.quartz.jobStore.dataSource = myDS
    org.quartz.jobStore.tablePrefix = QRTZ5_
    org.quartz.jobStore.isClustered = true

  4. Add this line just after the org.quartz.jobStore.isClustered = true line.

    org.quartz.jobStore.clusterCheckinInterval = 20000
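
If you maintain many nodes, the three quartz.properties edits described above could be applied with a small script instead of by hand. The following is a minimal sketch under that assumption. The function name `cluster_quartz_properties` is hypothetical, and the code works on the text of the file only, so reading and writing the file is left to the caller.

```python
import re


def cluster_quartz_properties(text: str, checkin_ms: int = 20000) -> str:
    """Return quartz.properties text with the clustering edits applied.

    - sets org.quartz.scheduler.instanceId to AUTO
    - sets org.quartz.jobStore.isClustered to true
    - adds org.quartz.jobStore.clusterCheckinInterval right after isClustered,
      unless the file already defines it
    """
    has_checkin = re.search(
        r"^\s*org\.quartz\.jobStore\.clusterCheckinInterval\s*=", text, re.M
    ) is not None

    out = []
    for line in text.splitlines():
        # Commented-out lines start with '#', so their key never matches below.
        key = line.split("=", 1)[0].strip()
        if key == "org.quartz.scheduler.instanceId":
            out.append("org.quartz.scheduler.instanceId = AUTO")
        elif key == "org.quartz.jobStore.isClustered":
            out.append("org.quartz.jobStore.isClustered = true")
            if not has_checkin:
                out.append(
                    f"org.quartz.jobStore.clusterCheckinInterval = {checkin_ms}"
                )
        else:
            out.append(line)
    return "\n".join(out) + "\n"
```

Running the function a second time on its own output leaves the text unchanged, so it is safe to reapply after an upgrade or a partial manual edit.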
    

Quartz is now configured for your cluster. The Quartz Configuration Reference has additional information about clustering with Quartz.

Start and Test the Cluster

Follow the instructions below to start the cluster and verify that it is working properly.

  1. Start the solution database.
  2. Start the application server on a node.
  3. Make sure that the load balancer is able to ping the node.
  4. Repeat steps 2 and 3 for each node that you have set up.
  5. Test the cluster by accessing the BA Server through the load balancer's IP address, hostname, or domain name. Begin whatever test procedure you have designed for this scenario.
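
As a rough pre-test, you could confirm from the load balancer machine that each node accepts connections. The sketch below uses a plain TCP connection attempt rather than an ICMP ping, so it checks that the application server port is open, not just that the host is up. The function names are hypothetical, and the host names and port in the usage note are placeholders for your own values.

```python
import socket


def node_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def unreachable_nodes(nodes, timeout: float = 3.0) -> list:
    """Return the (host, port) pairs from `nodes` that did not accept a connection."""
    return [
        (host, port)
        for host, port in nodes
        if not node_reachable(host, port, timeout)
    ]
```

For example, `unreachable_nodes([("node1.example.com", 8080), ("node2.example.com", 8080)])` returns an empty list when both placeholder nodes answer. A successful connection only shows that the port is open, so the functional test in step 5 is still needed.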