Pentaho Documentation

Use Kerberos Authentication to Access a MapR Cluster

Overview

Learn how to configure impersonation and spoofing to access MapR cluster components.

Setting up impersonation and spoofing is part of configuring Pentaho to connect to a Hadoop cluster.

By default, the Pentaho Server admin user executes transformations and jobs. However, if your transformation or job needs to run on a MapR cluster or access its resources, the Pentaho Server admin user might not have an account on the cluster or the permissions needed to access it.

Impersonation helps solve this issue. With impersonation, you indicate that a transformation should run using the permissions of a different Hadoop user. Impersonation uses that Hadoop user’s existing permissions to provide access to components that run on MapR clusters, such as MapReduce, Pig, Oozie, Sqoop, Hive, or a directory on HDFS.

There are a few limitations.

  • Only the MapR super user can impersonate other users.
  • With Linux, you use impersonation to specify different proxy users for HDFS, MapReduce, Pig, Sqoop, Oozie, and Pentaho Map Reduce components.
  • With Windows, you can specify only a single, spoofed user.

Instructions for impersonation or spoofing depend on your Spoon client’s operating system.

Prerequisites for Impersonation and Spoofing

Prerequisites for impersonation and spoofing include setting up and configuring the MapR distribution, setting up user accounts, then storing authorization provider credentials and overriding the Kerberos ticket cache.

Set Up MapR Nodes

Set up a MapR cluster and apply MapR security.  See MapR documentation for installation and security instructions.  We also recommend that you review the instructions on setting up a MapR client.

Make MapR the Active Hadoop Distribution

Make MapR your active Hadoop distribution and configure it. See Set Active Hadoop Distribution and Additional Configuration for MapR Shims for more detail.

Set Up User Accounts

  • Set up user accounts for MapR and client nodes.
  • Set up accounts for the users you want to impersonate or spoof on each MapR node. The usernames, passwords, UIDs, and GIDs should be the same on each node.
  • For Linux Spoon client and Pentaho Server nodes, set up accounts for the users you want to impersonate. The usernames, passwords, UIDs, and GIDs should be the same on each node. You do not have to do this for Windows Spoon client or Pentaho Server nodes.
  • On both Windows and Linux nodes, add the impersonated or spoofed users to additional groups, if needed.  Do this if users require access to resources restricted to members of a group.  Ensure the group names and GIDs are correct and are the same for each node.
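
The requirement that an account match on every node can be checked with a short script. The following is a minimal sketch, assuming you have collected the user's /etc/passwd entry from each node yourself. The node names, the jdoe account, and the function names are hypothetical and are not part of Pentaho or MapR.

```python
def parse_passwd_entry(line):
    """Return (username, uid, gid) from one /etc/passwd-style line."""
    fields = line.strip().split(":")
    return fields[0], int(fields[2]), int(fields[3])


def find_mismatched_nodes(entries_by_node):
    """Return the nodes whose (username, uid, gid) differs from the first node's."""
    items = list(entries_by_node.items())
    reference = parse_passwd_entry(items[0][1])
    return [node for node, line in items[1:]
            if parse_passwd_entry(line) != reference]


# Hypothetical sample data: the jdoe entry as collected from three nodes.
sample = {
    "mapr-node1": "jdoe:x:1001:1001::/home/jdoe:/bin/bash",
    "mapr-node2": "jdoe:x:1001:1001::/home/jdoe:/bin/bash",
    "pentaho-server": "jdoe:x:1002:1001::/home/jdoe:/bin/bash",
}
print(find_mismatched_nodes(sample))  # prints ['pentaho-server']
```

The sketch compares only usernames, UIDs, and GIDs. Passwords and group memberships still need to be verified separately.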

Store Authorization Provider Credentials

Store authorization provider credentials so you do not have to retype usernames, passwords, or other credential information each time you need them for a transformation step or job entry. 

  1. On the Pentaho Server, open the config.properties file in server/pentaho-server/pentaho-solutions/system/kettle/plugins/pentaho-big-data-plugin/hadoop-configurations/[hadoop distribution].

[hadoop distribution] is the name of the Hadoop distribution, such as mapr31.

  2. Set the Kerberos principal property.
authentication.kerberos.principal=user@omnicorp.com
  3. Decide whether to authenticate using a password or a keytab file.
     • To authenticate with a password, set the authentication.kerberos.password property.
authentication.kerberos.password=userPassword

Use Kettle encryption to store the password more securely.

     • To authenticate with a keytab file, set the authentication.kerberos.keytabLocation property to the keytab file path.
authentication.kerberos.keytabLocation=/home/Server14/Kerberos/username.keytab

If both the authentication.kerberos.password and authentication.kerberos.keytabLocation properties are set, the authentication.kerberos.password property takes precedence.

  4. Assign an ID to the authentication credentials that you just specified (Kerberos principal and password or keytab) by setting the authentication.kerberos.id property.
authentication.kerberos.id=mapr-kerberos
  5. To use the authentication credentials you just specified, set the authentication.superuser.provider property to the value of authentication.kerberos.id.
authentication.superuser.provider=mapr-kerberos
  6. Save and close the file.
  7. Repeat this process for Spoon. The config.properties file is in design-tools/data-integration/plugins/pentaho-big-data-plugin/hadoop-configurations/[hadoop distribution].
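
Because these properties depend on one another, a quick consistency check can catch mistakes before you restart the server. The following is a minimal sketch, not a Pentaho tool. The function names are hypothetical, and the parser handles only plain key=value lines and # comments, not the full Java properties syntax.

```python
def load_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def check_kerberos_settings(props):
    """Return a list of problems found in the Kerberos-related properties."""
    problems = []
    if not props.get("authentication.kerberos.principal"):
        problems.append("authentication.kerberos.principal is not set")
    has_password = bool(props.get("authentication.kerberos.password"))
    has_keytab = bool(props.get("authentication.kerberos.keytabLocation"))
    if not (has_password or has_keytab):
        problems.append("neither a password nor a keytab location is set")
    elif has_password and has_keytab:
        problems.append("both password and keytab are set; the password takes precedence")
    kerberos_id = props.get("authentication.kerberos.id")
    if not kerberos_id:
        problems.append("authentication.kerberos.id is not set")
    elif props.get("authentication.superuser.provider") != kerberos_id:
        problems.append("authentication.superuser.provider does not match authentication.kerberos.id")
    return problems


# Sample values taken from the steps above, using keytab authentication.
sample = """\
authentication.kerberos.principal=user@omnicorp.com
authentication.kerberos.keytabLocation=/home/Server14/Kerberos/username.keytab
authentication.kerberos.id=mapr-kerberos
authentication.superuser.provider=mapr-kerberos
"""
print(check_kerberos_settings(load_properties(sample)))  # prints []
```

To check a real file, read its contents and pass the text to load_properties. Run the check against both the Pentaho Server copy and the Spoon copy of config.properties.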

Override Kerberos Ticket Cache

If you are logged in to the Spoon host machine and your account has already been authenticated using Kerberos, indicate that you want to use the authentication information in the config.properties file instead of the information already saved in the Kerberos ticket cache.

  1. Open the mapr.login.conf file on the host. By default, the file is located in /opt/mapr/conf.
  2. In the hadoop_hybrid section, set the useTicketCache and renewTGT variables to false, like this:
hadoop_hybrid{
org.apache.hadoop.security.login.KerberosBugWorkAroundLoginModule optional
      useTicketCache=false
      renewTGT=false
  3. Save and close the mapr.login.conf file.
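
To confirm the change, you can scan the hadoop_hybrid section for the two options. The following is a minimal sketch that uses regular expressions rather than a real JAAS parser, so it assumes the section contains no comments or nested braces. The function names and the trimmed sample section are hypothetical; a real mapr.login.conf contains more modules and options.

```python
import re


def hybrid_section_options(conf_text, names=("useTicketCache", "renewTGT")):
    """Map each option name to every value it takes in the hadoop_hybrid section."""
    match = re.search(r"\bhadoop_hybrid\s*\{(.*?)\}\s*;", conf_text, re.DOTALL)
    if match is None:
        raise ValueError("no hadoop_hybrid section found")
    body = match.group(1)
    return {
        name: re.findall(rf'\b{name}\s*=\s*"?(\w+)"?', body)
        for name in names
    }


def ticket_cache_disabled(conf_text):
    """True when both options are present and every occurrence is false."""
    options = hybrid_section_options(conf_text)
    return all(values and all(v.lower() == "false" for v in values)
               for values in options.values())


# Hypothetical, trimmed-down section used only for illustration.
sample = """
hadoop_hybrid {
  org.apache.hadoop.security.login.KerberosBugWorkAroundLoginModule optional
      useTicketCache=false
      renewTGT=false;
};
"""
print(ticket_cache_disabled(sample))  # prints True
```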

Set Up Impersonation on Linux Client Node

To set up impersonation on a Linux client node, specify proxy users in the core-site.xml file on the Pentaho Server and the core-site.xml file in Spoon.

  1. On the Pentaho Server node, open the core-site.xml file in pentaho-solutions/system/kettle/plugins/pentaho-big-data-plugin/hadoop-configurations/[hadoop distribution].

[hadoop distribution] is the name of the Hadoop distribution, such as mapr31.

  2. Set the usernames in the <value> tag for the proxy users as needed. Every node in the MapR cluster should recognize the username you use.

     Component    Proxy User Property
     HDFS         pentaho.hdfs.proxy.user
     MapReduce    pentaho.mapreduce.proxy.user
     Pig          pentaho.pig.proxy.user
     Sqoop        pentaho.sqoop.proxy.user
     Oozie        pentaho.oozie.proxy.user

Here is an example of a modified core-site.xml file.

<configuration>
  <property>
    <name>pentaho.hdfs.proxy.user</name>
    <value>jdoe</value>
  </property>
  <property>
    <name>pentaho.mapreduce.proxy.user</name>
    <value>bmichaels</value>
  </property>
  <property>
    <name>pentaho.pig.proxy.user</name>
    <value>jdoe</value>
  </property>
  <property>
    <name>pentaho.sqoop.proxy.user</name>
    <value>cclarke</value>
  </property>
  <property>
    <name>pentaho.oozie.proxy.user</name>
    <value>jdoe</value>
  </property>
</configuration>

  3. Remove the comment tags from the proxy properties you want to use.
  4. Save and close the file.
  5. Repeat this process (Steps 1 through 4) in Spoon. The core-site.xml file is located in data-integration/plugins/pentaho-big-data-plugin/hadoop-configurations/[hadoop distribution].
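
A proxy property that is still inside comment tags has no effect, which is easy to overlook. The following is a minimal sketch that lists the proxy users an XML parser actually sees in a core-site.xml file. The function name and the sample file contents are hypothetical; the property names come from the table in this section.

```python
import xml.etree.ElementTree as ET

PROXY_PROPERTIES = (
    "pentaho.hdfs.proxy.user",
    "pentaho.mapreduce.proxy.user",
    "pentaho.pig.proxy.user",
    "pentaho.sqoop.proxy.user",
    "pentaho.oozie.proxy.user",
)


def active_proxy_users(core_site_xml):
    """Return {property name: username} for proxy properties that are not commented out."""
    root = ET.fromstring(core_site_xml)  # XML comments are dropped by the default parser
    found = {}
    for prop in root.iter("property"):
        name = prop.findtext("name", default="").strip()
        if name in PROXY_PROPERTIES:
            found[name] = prop.findtext("value", default="").strip()
    return found


# Hypothetical file contents: the Pig property is still commented out.
sample = """<configuration>
  <property>
    <name>pentaho.hdfs.proxy.user</name>
    <value>jdoe</value>
  </property>
  <!--
  <property>
    <name>pentaho.pig.proxy.user</name>
    <value>jdoe</value>
  </property>
  -->
</configuration>"""
print(active_proxy_users(sample))  # prints {'pentaho.hdfs.proxy.user': 'jdoe'}
```

Running the same check against the Pentaho Server and Spoon copies of core-site.xml shows whether the two files agree.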

Set Up Spoofing on Windows Client Node

To set up spoofing on a Windows client node, indicate the spoofed user in the core-site.xml file. 

  1. On the Pentaho Server node, open the core-site.xml file in pentaho-solutions/system/kettle/plugins/pentaho-big-data-plugin/hadoop-configurations/[hadoop distribution].

[hadoop distribution] is the name of the Hadoop distribution, such as mapr31.

  2. Add the following text to the file:
<property>
  <name>hadoop.spoofed.user.uid</name>
  <value>{UID}</value>
</property>
<property>
  <name>hadoop.spoofed.user.gid</name>
  <value>{GID}</value>
</property>
<property>
  <name>hadoop.spoofed.user.username</name>
  <value>{id of user who has UID}</value>
</property>
     • Replace {id of user who has UID} with the username of the principal in the config.properties file.
     • Replace {UID} with the UID of the hadoop.spoofed.user.username user.
     • Replace {GID} with the GID of the hadoop.spoofed.user.username user.
  3. Save and close the file.
  4. Repeat this process (Steps 1 through 3) for Spoon. In Spoon, the core-site.xml file is in data-integration/plugins/pentaho-big-data-plugin/hadoop-configurations/[hadoop distribution].
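
If you maintain several client nodes, generating the three spoofing properties from one set of values helps keep the files consistent. The following is a minimal sketch; the function name is hypothetical, and the jdoe username and the 1001 UID and GID are placeholder values, not defaults.

```python
import xml.etree.ElementTree as ET


def spoofed_user_properties(username, uid, gid):
    """Build the three hadoop.spoofed.user.* property elements as XML text."""
    settings = {
        "hadoop.spoofed.user.uid": str(uid),
        "hadoop.spoofed.user.gid": str(gid),
        "hadoop.spoofed.user.username": username,
    }
    chunks = []
    for name, value in settings.items():
        prop = ET.Element("property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
        chunks.append(ET.tostring(prop, encoding="unicode"))
    return "\n".join(chunks)


# Placeholder values: use the username of the principal in config.properties
# and that user's real UID and GID.
print(spoofed_user_properties("jdoe", 1001, 1001))
```

The output is one property element per line, ready to paste inside the configuration element of core-site.xml.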