
Set up Pentaho to Connect to a MapR Cluster

Overview

Learn how to configure Pentaho to connect to a MapR cluster.

These instructions explain how to configure Pentaho's MapR shim so Pentaho can connect to a working MapR cluster.

Before You Begin

Do these things before you configure the shim.

Task Description
Verify Support Check the Component Reference to verify that your Pentaho version supports your version of the MapR cluster.
Set Up a MapR Cluster Pentaho can connect to secured and unsecured MapR clusters.
  • Configure a MapR cluster.  See MapR's documentation if you need help.
  • Install any required services and service client tools.
  • Test the cluster.
Set up MapR Client
  • Install the MapR client, then test to make sure it is properly installed on your computer and is able to connect to and browse your MapR cluster. For more information on how to do this, visit the MapR site.
  • Set the MAPR_HOME environment variable to the installation location of the MapR client. 

If you are installing MapR 4.0.1 on Windows, use version 4.0.1.31009GA or later as your MapR client.  If you are using MapR 4.1.0, use version 4.1.0.31175GA  of the MapR client.  The software can be obtained from MapR.

Review the Special Topics Section Read the Special Topics section to review special configuration instructions for your version of MapR.

If you are connecting to a secured MapR cluster, there are a few more things you need to do.

Task Description
Secure the MapR Cluster with Kerberos Pentaho supports Kerberos authentication.  You will need to:
  • Configure Kerberos security on the cluster, including the Kerberos Realm, Kerberos KDC, and Kerberos Administrative Server. 
  • Configure the name, data, secondary name, job tracker, and task tracker nodes to accept remote connection requests.
  • Set up Kerberos for name, data, secondary name, job tracker, and task tracker nodes if you have deployed Hadoop using an enterprise-level program.
  • Add the user account credential for each Pentaho user that should have access to the Hadoop cluster to the Kerberos database.  Make sure there is an operating system user account on each node in the Hadoop cluster for each user that you want to add to the Kerberos database. Add operating system user accounts if necessary. Note that the user account UIDs must be greater than the minimum user ID value (min.user.id). Usually, the minimum user ID value is set to 1000.
Set up Kerberos on your Pentaho computers Instructions for how to do this appear in Set up Kerberos on Your Pentaho Computer.
Set up Impersonation
  • If you will be using impersonation, you will also need to complete the steps in the MapR Impersonation article.
  • If you plan to use spoofing or impersonation to connect to the MapR client, specify the appropriate User ID (UID), Group ID (GID), and name as indicated in the MapR documentation. (NOTE: Make sure that the account that you use for spoofing is created on the client and on each node.  Each "spoofing" account should have the same UID and GID as the one on the client; see the example after this table.)
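
A quick way to confirm that the UIDs and GIDs line up is to compare the output of the id command on the client and on each cluster node. The account name pdiuser and the ID value 2000 below are only illustrative; substitute your own spoofing or impersonation account.

EXAMPLE:

# Run on the MapR client machine and on every cluster node; the uid and gid values must match
# and must be greater than min.user.id (usually 1000).
id pdiuser
uid=2000(pdiuser) gid=2000(pdiuser) groups=2000(pdiuser)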

Edit Configuration Files on Cluster

There are no edits that need to be made to the *-site.xml configuration files on the cluster.

Configure Pentaho Component Shims

You must configure the shim in each of the following components that you want to connect to the MapR cluster.

  • Spoon (PDI Client)
  • Pentaho Data Integration (DI) Server
  • Business Analytics (BA) Server (including Analyzer and Pentaho Interactive Reporting).
  • Pentaho Report Designer (PRD)
  • Pentaho Metadata Editor (PME)

We recommend you configure and test the Spoon shim first since it has handy features for testing your configuration.  Then, you can either copy the tested Spoon configuration files to the other components, making changes as necessary, or you can go through these instructions again for each component. 

If you do not plan to connect to the cluster from Spoon, you can configure the connection to another component first instead.

Here are the shim configuration steps.

  • Locate the Shim Directories
  • Select Correct Shim
  • Download Shim from Support Portal (Optional Step)
  • Copy Configuration Files from Cluster to Shim
  • Edit Shim Configuration Files
  • Connect to MapR Cluster from Spoon

Locate the Pentaho Big Data Plugin and Shim Directories

Shims and other parts of the Pentaho Adaptive Big Data Layer are in the Pentaho Big Data Plugin directory.  The path to this directory differs by component. You need to know the locations of this directory, in each component, to complete shim configuration and testing tasks.

<pentaho home> is the directory where Pentaho is installed.

Components Location of Pentaho Big Data Plugin Directory
Spoon <pentaho home>/design-tools/data-integration/plugins/pentaho-big-data-plugin
DI Server <pentaho home>/server/data-integration-server/pentaho-solutions/system/kettle/plugins/pentaho-big-data-plugin
BA Server <pentaho home>/server/biserver-ee/pentaho-solutions/system/kettle/plugins/pentaho-big-data-plugin
Pentaho Report Designer <pentaho home>/design-tools/report-designer/plugins/pentaho-big-data-plugin
Pentaho Metadata Editor <pentaho home>/design-tools/metadata-editor/plugins/pentaho-big-data-plugin

Shims are located in the pentaho-big-data-plugin/hadoop-configurations directory.  Shim directory names consist of a three or four letter Hadoop Distribution abbreviation followed by the Hadoop Distribution's version number.  The version number does not contain a decimal point.  For example, the shim directory named cdh54 is the shim for the CDH (Cloudera Distribution for Hadoop), version 5.4.  Here is a list of the shim directory abbreviations.

Abbreviation Shim
cdh Cloudera's Distribution of Apache Hadoop
emr Amazon Elastic Map Reduce
hdp Hortonworks Data Platform
mapr MapR
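
For example, assuming a default Spoon installation, you can list the shim directories that shipped with your version like this (the directory names shown are only illustrative; your installation may contain different ones):

EXAMPLE:

cd <pentaho home>/design-tools/data-integration/plugins/pentaho-big-data-plugin/hadoop-configurations
ls
cdh54  mapr401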

Select the Correct Shim

For the location of the pentaho-big-data-plugin directory listed in these instructions, see Locate the Shim Directories.

Although Pentaho often supports one or more versions of a Hadoop distribution, the download of the Pentaho BA suite only contains the latest supported, Pentaho-certified version of the shim.  The other supported versions of shims can be downloaded from the Pentaho Customer Support Portal.

Before you begin, verify that the shim you want is supported by your version of Pentaho shown in the Components Reference.

  1. In a shell tool, go to the pentaho-big-data-plugin/hadoop-configurations directory.  Shim directories are listed there. 
  2. If the shim you want to use is already there, skip the rest of these steps and go to Copy the Configuration Files from Cluster to Shim.
  3. Go to the Pentaho Customer Support Portal Knowledge Base's Downloads page.  You are prompted to log in if you have not done so already.

 knowledge_base_downloads.png

  4. Enter the name of the shim you want in the search box.  Select the shim from the search results.
  5. Read the instructions, then download the shim.  You might need to scroll down to see the download link.
  6. Unzip the downloaded shim package to the pentaho-big-data-plugin/hadoop-configurations directory (see the example after these steps).
  7. Go to Copy the Configuration Files from Cluster to Shim.
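
For reference, the unzip step on Linux might look like the sketch below. The archive name is hypothetical; use the file name of the package you actually downloaded, and adjust the target path for the component you are configuring.

EXAMPLE:

cd <pentaho home>/design-tools/data-integration/plugins/pentaho-big-data-plugin/hadoop-configurations
unzip ~/Downloads/mapr-shim-package.zip -d .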

Copy the Configuration Files from Cluster to Shim

If you are using a cluster, copying configuration files from the cluster to the shim keeps the configuration files in sync and reduces troubleshooting errors.

The location of the pentaho-big-data-plugin directory listed in these instructions is referenced in the Locate the Shim Directories section of this document.

  1. Back up the existing MapR shim files in the pentaho-big-data-plugin/hadoop-configurations/maprxx directory. 
  2. Copy the following configuration files from the MapR cluster to pentaho-big-data-plugin/hadoop-configurations/maprxx. They will overwrite the existing files.
  • hbase-site.xml
  • hdfs-site.xml
  • hive-site.xml
  3. Copy the following configuration files from the MapR cluster to the Hadoop directory under the MapR client installed on your computer (see the example after these steps).

The Windows path to the MapR client is usually C:\opt\mapr\hadoop\hadoop-2.x.x\etc\hadoop.  In Linux, the path is usually /opt/mapr/hadoop/hadoop-2.x.x/etc/hadoop.

  • core-site.xml
  • mapred-site.xml
  • yarn-site.xml
  4. Edit the shim configuration files.
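
The following is a rough sketch of the copy steps on Linux using scp. The host name clusternode.example.com and the staging directory /tmp/cluster-config (where you have already gathered the files on a cluster node) are assumptions; adjust them, and the maprxx directory name, for your environment.

EXAMPLE:

# Pull the cluster files into the shim directory
scp user@clusternode.example.com:/tmp/cluster-config/{hbase-site.xml,hdfs-site.xml,hive-site.xml} \
  <pentaho home>/design-tools/data-integration/plugins/pentaho-big-data-plugin/hadoop-configurations/maprxx/

# Pull the cluster files into the local MapR client's Hadoop directory
scp user@clusternode.example.com:/tmp/cluster-config/{core-site.xml,mapred-site.xml,yarn-site.xml} \
  /opt/mapr/hadoop/hadoop-2.x.x/etc/hadoop/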

Edit the Shim Configuration Files

The location of the pentaho-big-data-plugin directory listed in these instructions is referenced in the Locate the Shim Directories section of this document.

You need to verify or change authentication, Oozie, Hive, MapReduce, and YARN settings. Changes are made in these shim configuration files:

  • config.properties
  • mapred-site.xml
  • yarn-site.xml

Edit config.properties (Windows)

By default, the config.properties file is configured for unsecured clusters.  Verify that these values are properly set.

  1. Go to pentaho-big-data-plugin/hadoop-configurations/maprxx and open config.properties.
  2. Verify these values are set.  If they are not set, change them so that they match what follows.
Parameter Values
windows.classpath This value should match your local MapR client tools installation directory.  Set the windows.classpath parameter to include:
  • Hadoop classpath
  • Pentaho installation directory path
  • MapR shim directory path

The MapR shim might fail to load correctly if the drive letter in the Windows classpath or library path has a capital letter. This is a known issue with MapR software.  If this happens, use the lower case instead, like this: file:///c:/opt/mapr.

The value of the windows.classpath parameter should include lib/hadoop2-windows-patch-08072014.jar as the first entry in the string, the Hadoop classpath of the MapR client on the current machine, the full directory path to the MapR shim under each Pentaho component, and this entry: file:///c:/opt/mapr/lib. To determine your Hadoop classpath, run the hadoop classpath command and use the values it returns. Convert any directory paths to Windows URL format.  The following is an example. 

EXAMPLE:

windows.classpath=lib/hadoop2-windows-patch-08072014.jar,file:///C:/opt/mapr/hadoop/hadoop-2.4.1/etc/hadoop,file:///C:/opt/mapr/hadoop/hadoop-2.4.1/etc/hadoop,file:///C:/opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/common/lib,file:///C:/opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/common,file:///C:/opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/hdfs,file:///C:/opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/hdfs/lib,file:///C:/opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/yarn/lib,file:///C:/opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/yarn,file:///C:/opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/mapreduce/lib,file:///C:/opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/mapreduce,file:///C:/opt/mapr/sqoop/sqoop-1.4.5,file:///C:/opt/mapr/sqoop/sqoop-1.4.5/lib,file:///C:/contrib/capacity-scheduler,file:///C:/opt/Pentaho/design-tools/data-integration/plugins/pentaho-big-data-plugin/hadoop-configurations/mapr401,file:///C:/opt/Pentaho/design-tools/data-integration/plugins/pentaho-big-data-plugin/hadoop-configurations/mapr401/lib,file:///C:/opt/mapr/lib
windows.library.path
windows.library.path=C:\\opt\\mapr\\lib
pentaho.oozie.proxy.user You do not need to verify this unless you plan to access the Oozie service through a proxy.  If so, add the proxy user's name here.
  3. Save and close the file.
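
Pulled together, a minimal config.properties sketch for Windows looks like the following, assuming the MapR client in c:/opt/mapr and the mapr401 shim under a default Spoon installation. The classpath is abbreviated here; build the full value from the hadoop classpath command as described above. The Oozie proxy user name is hypothetical and only needed if you access Oozie through a proxy.

EXAMPLE:

windows.classpath=lib/hadoop2-windows-patch-08072014.jar,file:///c:/opt/mapr/hadoop/hadoop-2.4.1/etc/hadoop,...,file:///c:/opt/Pentaho/design-tools/data-integration/plugins/pentaho-big-data-plugin/hadoop-configurations/mapr401,file:///c:/opt/mapr/lib
windows.library.path=C:\\opt\\mapr\\lib
pentaho.oozie.proxy.user=oozieproxy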

Edit config.properties (Linux and Mac)

To configure the config.properties file, do these things.

  1. Go to pentaho-big-data-plugin/hadoop-configurations/maprxx and open config.properties.
  2. Verify these values are set.  If they are not set, change them so that they match what follows.
Parameter Values
linux.classpath Edit this value to match your local MapR client tools installation directory. Set the linux.classpath parameter to include:
  • Hadoop classpath
  • Pentaho installation directory path
  • MapR shim directory path

The linux.classpath should contain the Hadoop classpath of the MapR client on the current machine, the full directory path to the MapR shim under each Pentaho component, and this entry: /opt/mapr/lib. To determine your Hadoop classpath, run the hadoop classpath command and use the values it returns.  The following is an example.

EXAMPLE:

linux.classpath=/opt/mapr/hadoop/hadoop-2.4.1/etc/hadoop,/opt/mapr/hadoop/hadoop-2.4.1/etc/hadoop,/opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/common/lib,/opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/common,/opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/hdfs,/opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/hdfs/lib,/opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/yarn/lib,/opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/yarn,/opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/mapreduce/lib,/opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/mapreduce,/opt/mapr/sqoop/sqoop-1.4.5,/opt/mapr/sqoop/sqoop-1.4.5/lib,/contrib/capacity-scheduler,/opt/Pentaho/design-tools/data-integration/plugins/pentaho-big-data-plugin/hadoop-configurations/mapr401,/opt/Pentaho/design-tools/data-integration/plugins/pentaho-big-data-plugin/hadoop-configurations/mapr401/lib,/opt/mapr/lib
linux.library.path
linux.library.path=/opt/mapr/lib
pentaho.oozie.proxy.user You do not need to verify this unless you plan to access the Oozie service through a proxy.  If so, add the proxy user's name here.
  3. Save and close the file.
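
To build the classpath value, you can run the hadoop classpath command from the MapR client and convert its colon-separated output into the comma-separated form the shim expects. This is a sketch, assuming the client is installed in /opt/mapr; remove any trailing /* wildcards that you do not want in the list.

EXAMPLE:

/opt/mapr/hadoop/hadoop-2.4.1/bin/hadoop classpath
# prints colon-separated entries such as /opt/mapr/hadoop/hadoop-2.4.1/etc/hadoop:...
/opt/mapr/hadoop/hadoop-2.4.1/bin/hadoop classpath | tr ':' ','
# prints the same entries separated by commas, ready to paste into linux.classpath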

Edit mapred-site.xml

Make changes to indicate where job history logs are stored and to allow MapReduce jobs submitted from one platform to run on another.

  1. Go to the Hadoop directory in your MapR Client and open mapred-site.xml.
  2. Make the following changes.
Parameter Value
mapreduce.jobhistory.address Set this to the place where job history logs are stored.
mapreduce.app-submission.cross-platform

This property allows MapReduce jobs submitted from a Windows client to run on a Linux cluster, and vice versa.

<property>
  <name>mapreduce.app-submission.cross-platform</name>
  <value>true</value>
</property>

 

  3. Save and close the file.
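
For reference, the job history setting takes the same form as the cross-platform property above. The host name jobhistory.example.com and port 10020 are placeholders; use the values for your cluster's job history server.

<property>
  <name>mapreduce.jobhistory.address</name>
  <value>jobhistory.example.com:10020</value>
</property>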

Edit yarn-site.xml

Make changes to these YARN parameters, if necessary.

  1. Go to the Hadoop directory in your MapR Client and open yarn-site.xml.
  2. Make the following changes, if needed. 
Parameter Values
yarn.application.classpath
<property>
<name>yarn.application.classpath</name>
<value>$HADOOP_CONF_DIR:$HADOOP_COMMON_HOME/share/hadoop/common/*
:$HADOOP_COMMON_HOME/share/hadoop/common/lib/*:$HADOOP_HDFS_HOME/share/hadoop/hdfs/*
:$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*:$HADOOP_YARN_HOME/share/hadoop/yarn/*
:$HADOOP_YARN_HOME/share/hadoop/yarn/lib/*:/usr/share/aws/emr/emrfs/lib/*
:/usr/share/aws/emr/lib/*:/usr/share/aws/emr/auxlib/*:$PWD/*:%PWD%/*
</value>
</property>
yarn.resourcemanager.hostname Update the hostname for your environment.
yarn.resourcemanager.address Update the hostname and port for your environment.
yarn.resourcemanager.admin.address Update the hostname and port for your environment.
  3. Save and close the file.
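
As a sketch, the three ResourceManager settings might look like the following. The host name resourcemanager.example.com and the ports 8032 and 8033 are placeholders; use the values for your environment.

<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>resourcemanager.example.com</value>
</property>
<property>
  <name>yarn.resourcemanager.address</name>
  <value>resourcemanager.example.com:8032</value>
</property>
<property>
  <name>yarn.resourcemanager.admin.address</name>
  <value>resourcemanager.example.com:8033</value>
</property>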

Set MAPR_HOME

Set the MAPR_HOME environment variable to the installation location of the MapR client, then restart your computer.
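
For example, assuming the default client locations used elsewhere in this article:

EXAMPLE:

# Linux or Mac: add to your shell profile (for example, ~/.bashrc), then restart
export MAPR_HOME=/opt/mapr

# Windows: run in a command prompt, then restart
setx MAPR_HOME "C:\opt\mapr"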

Next Step

Go to Connect Pentaho Components to MapR Cluster for instructions on how to configure and test your connection.

Connect Pentaho Components to MapR Cluster

Creating a connection to the cluster involves setting an active shim, then configuring and testing the connection.  Making a shim active means it is used by default when you access a cluster.  When you initially install Pentaho, no shim is active.  You must make a shim active before you can connect to a cluster.  Only one shim can be active at a time.  The way you make a shim active, as well as the way you configure and test the cluster connection, differs by Pentaho component.

Create and Test a Connection to the Cluster in Spoon

Connecting to the MapR cluster from Spoon involves two tasks:

  • Set the Active Shim in Spoon
  • Configure and Test the Cluster Connection

Set the Active Shim in Spoon

Set the active shim when you want to connect to a Hadoop cluster the first time, or when you want to switch clusters.  Only one shim can be active at a time.

  1. Start Spoon.
  2. Select Hadoop Distribution... from the Tools menu.

HadoopDistribution.png

  3. In the Hadoop Distribution window, select the Hadoop distribution you want.
  4. Click OK.
  5. Stop, then restart Spoon.

Configure and Test the Cluster Connection

Provide connection details for the cluster and services you will use, such as the hostname for HDFS or the URL for Oozie.  Then, you can use a built-in tool to test your configuration to find and troubleshoot common configuration issues, such as wrong hostnames and user permission errors.

Connection settings are set in the Hadoop cluster window.  You can get to the settings from several places, but in these instructions, you will get the Hadoop cluster window from the View tab in a transformation or job.

View Tab

  1. In Spoon, create a new job or transformation or open an existing one.
  2. Click the View tab.

clusterss.png

  3. Right-click the Hadoop cluster folder, then click New.  The Hadoop cluster window appears.  
  4. Configure and test the Hadoop cluster connection.

Configure and Test Connection

Once you have opened the Hadoop cluster window from a step or entry, the View tab, or the Repository Explorer window, configure the connection.

  1. Enter information in the Hadoop cluster window.  You can get most of the information you need from your Hadoop Administrator.

As a best practice, use Kettle variables for each connection parameter value to mitigate risks associated with running jobs and transformations in environments that are disconnected from the repository (see the sketch after these steps). 

HadoopClusterWindow.png

Option Definition
Cluster Name Name that you assign the cluster connection.
Use MapR Client Indicates that this connection is for a MapR cluster.  If this box is checked, the fields in the HDFS and JobTracker sections are disabled because those parameters are not needed to configure MapR.
Hostname (in HDFS section) Hostname for the HDFS node in your Hadoop cluster.
Port (in HDFS section) Port for the HDFS node in your Hadoop cluster.  
Username (in HDFS section) Username for the HDFS node.
Password (in HDFS section) Password for the HDFS node.
Hostname (in JobTracker section) Hostname for the JobTracker node in your Hadoop cluster.  If you have a separate job tracker node, type in the hostname here. Otherwise use the HDFS hostname.
Port (in JobTracker section) Port for the JobTracker node in your Hadoop cluster.  This cannot be the same as the HDFS port number.
Hostname (in ZooKeeper section) Hostname for the Zookeeper node in your Hadoop cluster.  Supply this only if you want to connect to a Zookeeper service.
Port (in Zookeeper section) Port for the Zookeeper node in your Hadoop cluster.  Supply this only if you want to connect to a Zookeeper service.
URL (in Oozie section) Oozie client address.  Supply this only if you want to connect to the Oozie service.
  2. Click the Test button.  Test results appear in the Hadoop Cluster Test window.  If you have problems, see Troubleshoot Connection Issues to resolve the issues, then test again.

HadoopClusterTest.png

  3. If there are no more errors, congratulations!  The connection is properly configured.  Click the Close button to close the Hadoop Cluster Test window.
  4. When complete, click the OK button to close the Hadoop cluster window.
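
The following is a sketch of the Kettle-variable approach mentioned above: define the values once in kettle.properties (in your .kettle directory), then enter the variables, such as ${HDFS_HOSTNAME}, in the Hadoop cluster window fields instead of literal values. The variable names, host name, and port numbers below are only examples.

EXAMPLE:

# kettle.properties
HDFS_HOSTNAME=clusternode.example.com
HDFS_PORT=8020
JOBTRACKER_HOSTNAME=clusternode.example.com
JOBTRACKER_PORT=8032
ZOOKEEPER_HOSTNAME=clusternode.example.com
ZOOKEEPER_PORT=5181
OOZIE_URL=http://clusternode.example.com:11000/oozie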

Copy Spoon Shim Files to Other Pentaho Components

Once your connection has been properly configured on Spoon, copy configuration files to the shim directories in other Pentaho components. 

The location of the pentaho-big-data-plugin directory listed in these instructions is referenced in the Locate the Shim Directories section of this document.

  1. Copy the following configuration files from the pentaho-big-data-plugin/hadoop-configurations/maprxx directory in Spoon to the pentaho-big-data-plugin/hadoop-configurations/maprxx directory on the DI Server, BA Server, PRD, or PME (see the example after these steps). 
  • hbase-site.xml
  • hdfs-site.xml
  • hive-site.xml
  2. Copy the core-site.xml, mapred-site.xml, and yarn-site.xml files from the Hadoop directory under the MapR Client on your computer to the same place in the MapR Client directory structure on the DI Server, BA Server, PRD, or PME.
  3. Complete the tasks in the Create and Test a Connection to the Cluster in Other Components section to connect and test.
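
For example, to push the Spoon shim configuration files to a DI Server running on another Linux machine (the host name and installation paths are placeholders):

EXAMPLE:

scp <pentaho home>/design-tools/data-integration/plugins/pentaho-big-data-plugin/hadoop-configurations/maprxx/{hbase-site.xml,hdfs-site.xml,hive-site.xml} \
  user@diserver.example.com:<pentaho home>/server/data-integration-server/pentaho-solutions/system/kettle/plugins/pentaho-big-data-plugin/hadoop-configurations/maprxx/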

Create and Test a Connection to the Cluster in Other Components

These instructions explain how to connect the DI Server, BA Server, PRD, and PME to the MapR Cluster. 

  • Set the Active Shim on PRD, PME, and the DI and BA Servers
  • Create and test the cluster connections.

Set the Active Shim on PRD, PME, and the DI and BA Servers

Modify a properties file to set the active shim for the DI Server, BA Server, PRD, and PME.

The location of the pentaho-big-data-plugin directory listed in these instructions is referenced in the Locate the Shim Directories section of this document.

  1. Stop the component.
  2. Locate the pentaho-big-data-plugin directory for your component. 
  3. Go to the hadoop-configurations directory.  For more information on directory names, see Locate the Pentaho Big Data Plugin and Shim Directories.
  4. Go back to the pentaho-big-data-plugin directory and open the plugin.properties file.
  5. Set the active.hadoop.configuration property to the directory name of the shim you want to make active.  Here is an example:
active.hadoop.configuration=mapr401
  6. Save and close the plugin.properties file.
  7. Restart the component.

Create and Test Connections

Connection tests appear in the following table.

Component Test
DI Server Create a transformation in Spoon and run it remotely.
BA Server Create a connection to the cluster in the Data Source Wizard.
PME Create a connection to the cluster in PME.
PRD Create a connection to the cluster in PRD.

 

Once you've connected to the cluster and its services properly, provide connection information to users who need access to the cluster and its services.  Those users can only obtain access from computers that have been properly configured to connect to the cluster.

Here is what they need to connect.

  • Hadoop Distribution and Version of the Cluster
  • HDFS, JobTracker, Zookeeper, and Hive2/Impala Hostnames, IP Addresses and Port Numbers
  • Oozie URL (if used)
  • Users also require the appropriate permissions to access the directories they need on HDFS.  This typically includes their home directory and any other required directories.

Depending on the job entries, transformation steps, and services they use, your users might also need additional information from you.

General Notes

Sqoop "Unsupported major.minor version" Error

If you are using Pentaho 6.0 and the Java version on your cluster is older than the Java version that Pentaho uses, you must change Pentaho's JDK so it is the same major version as the JDK on the cluster. The JDK that you install for Pentaho must meet the requirements in the Supported Components matrix. To learn how to download and install the JDK, read this article.
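
To compare versions, you can run java -version on a cluster node and on the machine where Pentaho is installed, and make sure the major versions match (for example, both 1.7 or both 1.8):

EXAMPLE:

java -version
java version "1.7.0_79"    (sample output; compare the major version on both machines)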

Version-Specific Notes

The following are special topics for MapR.

Drive Letter Casing Issue (Windows)

The MapR shim might fail to load correctly if the drive letter in the Windows classpath or library path has a capital letter. This is a known issue with MapR software.  If this happens, use the lower case instead, like this: file:///c:/opt/mapr.

MapR 4.1 Notes

The following notes address issues with MapR 4.1.

Impala Support Note

Pentaho does not support connections to Impala on a secured MapR 4.1 cluster.

Troubleshoot Cluster and Service Configuration Issues

General Configuration Problems

The issues in this section explain how to resolve common configuration problems. 

Shim and Configuration Issues

Symptoms Common Causes Common Resolutions

No shim

  • Active shim was not selected.
  • Shim was installed in the  wrong place.
  • Shim name was not entered correctly in the plugin.properties file.
  • Verify that the plugin name that is in the plugin.properties file matches the directory name in the pentaho-big-data-plugin/hadoop-configurations directory.
  • Make sure the shim is installed in the correct place.
  • Check the instructions for your Hadoop distribution in the Set Up Pentaho to Connect to an Apache Hadoop Cluster article for more details on how to verify the plugin name and shim installation directory.
Shim doesn't load
  • Required licenses are not installed.
  • You tried to load a shim that is not supported by your version of Pentaho.
  • If you are using MapR, the client might not have been installed correctly. 
  • Configuration file changes were made incorrectly.
  • Verify the required licenses are installed and have not expired.
  • Verify that the shim is supported by your version of Pentaho. Find your version of Pentaho, then look for the corresponding support matrix for more details. For example, if you are running Pentaho 6.0, then see this Components Reference topic which is the support matrix for Pentaho 6.0.
  • Verify that configuration file changes were made correctly.  Contact your Hadoop Administrator or see the Set Up Pentaho to Connect to an Apache Hadoop Cluster article.
  • If you are connecting to MapR, verify that the client was properly installed.  See MapR documentation for details.
  • Restart Spoon, then test again.
  • If this error continues to occur, files might be corrupted.  Download a new copy of the shim from the Pentaho Customer Support Portal.
The file system's URL does not match the URL in the configuration file. Configuration files (*-site.xml files) were not configured properly.  Verify that the configuration files, particularly core-site.xml, are configured correctly.  See the instructions for your Hadoop distribution in the Set Up Pentaho to Connect to an Apache Hadoop Cluster article for details.

Connection Problems

Symptoms Common Causes Common Resolutions
Hostname incorrect or not resolving properly.
  • No hostname has been specified.
  • Hostname/IP Address is incorrect.
  • Hostname is not resolving properly in the DNS.
  • Verify that the Hostname/IP address is correct.
  • Check the DNS to make sure the Hostname is resolving properly. 
Port name is incorrect.
  • No port number has been specified.
  • Port  number is incorrect.
  • Port number is not numeric.
  • Verify that the port number is correct.
  • If you don't have a port number, determine whether your cluster has been enabled for high availability. If it has, then you do not need a port number.
Can't connect.
  • Firewall is a barrier to connecting.
  • Other networking issues are occurring.
  • Verify that a firewall is not impeding the connection and that there aren't other network issues. 

Directory Access or Permissions Issues

Symptoms Common Causes Common Resolutions

Can't access directory.

  • Authorization and/or authentication issues.
  • Directory is not on the cluster.
  • Make sure the user has been granted read, write, and execute access to the directory. 
  • Ensure security settings for the cluster and shim allow access.
  • Verify the hostname and port number are correct for the Hadoop File System's namenode. 

Can't create, read, update, or delete files or directories

Authorization and/or authentication issues.

  • Make sure the user has been granted execute access to the directory. 
  • Ensure security settings for the cluster and shim allow access.
  • Verify that the hostname and port number are correct for the Hadoop File System's namenode. 
Test file cannot be overwritten.  Pentaho test file is already in the directory.
  • A file with the same name as the Pentaho test file is already in the directory.  The test file is used to make sure that the user can create, write, and delete in the user's home directory.
  • The test was run, but the file was not deleted.  You will need to manually delete the test file.  Check the log for the test file name.

Oozie Issues

Symptoms Common Causes Common Resolutions

Can't connect to Oozie.

  • Firewall issue.
  • Other networking issues.
  • Oozie URL is incorrect.
  • Verify that the Oozie URL was correctly entered.
  • Verify that a firewall is not impeding the connection. 

Zookeeper Problems

Symptoms Common Causes Common Resolutions

Can't connect to Zookeeper.

  • Firewall is hindering connection with the Zookeeper service.
  • Other networking issues.
  • Verify that a firewall is not impeding the connection. 

Zookeeper hostname or port not found or doesn't resolve properly.  

  • Hostname/IP Address and Port number is missing or is incorrect.
  • Try to connect to the Zookeeper nodes using ping or another method.
  • Verify that the Hostname/IP Address and Port numbers are correct.