Pentaho Documentation

Use Impersonation to Access a MapR Cluster

Overview

Learn how to configure impersonation and spoofing to access MapR cluster components.

By default, the DI Server admin user executes transformations and jobs. However, if your transformation or job needs to run on a MapR cluster or access its resources, the DI Server admin might not have an account there, or might lack the required permissions and access rights.

Impersonation solves this problem. With impersonation, you specify that a transformation should run with the permissions and access rights of a different Hadoop user. Impersonation uses that Hadoop user’s existing permissions to access components that run on MapR clusters, such as MapReduce, Pig, Oozie, Sqoop, Hive, or a directory on HDFS.

Note the following limitations.

  • Only the MapR superuser can impersonate other users.
  • On Linux, you can use impersonation to specify different proxy users for the HDFS, MapReduce, Pig, Sqoop, Oozie, and Pentaho MapReduce components.
  • On Windows, you can specify only a single spoofed user.

Instructions for impersonation or spoofing depend on your Spoon client’s operating system.

Prerequisites for Impersonation and Spoofing

Prerequisites for impersonation and spoofing include setting up and configuring the MapR distribution, setting up user accounts, storing authorization provider credentials, overriding the Kerberos ticket cache, and, if you access Hive, setting Hive database connection parameters.

Set Up MapR Nodes

Set up a MapR cluster and apply MapR security. See the MapR documentation for installation and security instructions. We also recommend that you review the instructions for setting up a MapR client.

Make MapR the Active Hadoop Distribution

Make MapR your active Hadoop distribution and configure it. For more detail, see Set Active Hadoop Distribution and Additional Configuration for MapR Shims.

Set Up User Accounts

  • Set up user accounts for MapR and client nodes.
  • Set up accounts for the users you want to impersonate or spoof on each MapR node. The usernames, passwords, UIDs, and GIDs should be the same on each node.
  • For Linux Spoon client and DI Server nodes, set up accounts for the users you want to impersonate. The usernames, passwords, UIDs, and GIDs should be the same on each node. You do not have to do this for Windows Spoon client or DI Server nodes.
  • On both Windows and Linux nodes, add the impersonated or spoofed users to additional groups, if needed.  Do this if users require access to resources restricted to members of a group.  Ensure the group names and GIDs are correct and are the same for each node.
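To check that the accounts match before you continue, you can compare the account entries collected from each node. The following Python sketch assumes you have gathered one passwd-format line per node (for example, the output of getent passwd jdoe); the node names, entries, and function names are hypothetical. It compares only usernames, UIDs, and GIDs; it does not check passwords.

```python
# Sketch: compare passwd-format entries for one user across nodes.
# Assumes each entry was collected on its node, for example with "getent passwd jdoe".
# Only the username, UID, and GID are compared; passwords are not checked.
from typing import Dict, List


def parse_passwd_line(line: str) -> Dict[str, str]:
    """Split one /etc/passwd-format line into its username, UID, and GID fields."""
    fields = line.strip().split(":")
    if len(fields) < 4:
        raise ValueError(f"not a passwd-format line: {line!r}")
    return {"username": fields[0], "uid": fields[2], "gid": fields[3]}


def find_mismatches(entries_by_node: Dict[str, str]) -> List[str]:
    """Report every node whose username, UID, or GID differs from the first node listed."""
    parsed = {node: parse_passwd_line(line) for node, line in entries_by_node.items()}
    if not parsed:
        return []
    reference_node, reference = next(iter(parsed.items()))
    problems = []
    for node, entry in parsed.items():
        if node == reference_node:
            continue
        for key in ("username", "uid", "gid"):
            if entry[key] != reference[key]:
                problems.append(
                    f"{node}: {key}={entry[key]} but {reference_node} has {key}={reference[key]}"
                )
    return problems


# Hypothetical node names and entries, for illustration only.
sample = {
    "mapr-node1": "jdoe:x:5001:5001::/home/jdoe:/bin/bash",
    "mapr-node2": "jdoe:x:5001:5001::/home/jdoe:/bin/bash",
    "di-server": "jdoe:x:5002:5001::/home/jdoe:/bin/bash",
}
for problem in find_mismatches(sample):
    print(problem)
```

An empty result means the entries agree on every node you supplied; any reported node needs its account corrected before impersonation or spoofing will behave consistently.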

Store Authorization Provider Credentials

Store authorization provider credentials so you do not have to retype usernames, passwords, or other credential information each time you need them for a transformation step or job entry. 

  1. On the DI Server, open the config.properties file in server/data-integration-server/pentaho-solutions/system/kettle/plugins/pentaho-big-data-plugin/hadoop-configurations/[hadoop distribution].

[hadoop distribution] is the name of the Hadoop distribution, such as mapr31.

  2. Set the Kerberos principal property.
authentication.kerberos.principal=user@omnicorp.com
  3. Decide whether to authenticate using a password or a keytab file.
  • To authenticate with a password, set the authentication.kerberos.password property.
authentication.kerberos.password=userPassword

Use Kettle encryption to store the password more securely.

  • To authenticate with a keytab file, set the authentication.kerberos.keytabLocation property to the keytab file path.
authentication.kerberos.keytabLocation=/home/Server14/Kerberos/username.keytab

If both the authentication.kerberos.password and authentication.kerberos.keytabLocation properties are set, the authentication.kerberos.password property takes precedence.

  4. Assign an ID to the authentication credentials that you just specified (the Kerberos principal and the password or keytab) by setting the authentication.kerberos.id property.
authentication.kerberos.id=mapr-kerberos
  5. To use the authentication credentials you just specified, set the authentication.superuser.provider property to the value of authentication.kerberos.id.
authentication.superuser.provider=mapr-kerberos
  6. Save and close the file.
  7. Repeat this process for Spoon. The config.properties file is in design-tools/data-integration/plugins/pentaho-big-data-plugin/hadoop-configurations/[hadoop distribution].
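To catch typos in the edited files, you can check them with a short script. The following Python sketch is a minimal example under stated assumptions: it parses only plain key=value lines (no line continuations or escapes, so it is not a full java.util.Properties parser), and its checks simply restate the steps above: a principal is set, a password or keytab is set, and authentication.superuser.provider matches authentication.kerberos.id. The function names are hypothetical.

```python
# Sketch: sanity-check the Kerberos settings in a config.properties file.
# The parser handles only simple key=value lines and '#'/'!' comments; it is not a
# full java.util.Properties implementation (no line continuations or escapes).
from pathlib import Path
from typing import Dict, List


def read_properties(text: str) -> Dict[str, str]:
    """Parse simple key=value lines, skipping blank lines and comments."""
    props: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def review_kerberos_settings(props: Dict[str, str]) -> List[str]:
    """Return warnings about the settings; an empty list means they look consistent."""
    warnings = []
    if not props.get("authentication.kerberos.principal"):
        warnings.append("authentication.kerberos.principal is not set")
    password = props.get("authentication.kerberos.password")
    keytab = props.get("authentication.kerberos.keytabLocation")
    if not password and not keytab:
        warnings.append("set authentication.kerberos.password or authentication.kerberos.keytabLocation")
    elif password and keytab:
        # When both are set, the password takes precedence over the keytab.
        warnings.append("both a password and a keytab are set; the password takes precedence")
    kerberos_id = props.get("authentication.kerberos.id")
    provider = props.get("authentication.superuser.provider")
    if not kerberos_id:
        warnings.append("authentication.kerberos.id is not set")
    elif provider != kerberos_id:
        warnings.append(
            f"authentication.superuser.provider ({provider}) does not match "
            f"authentication.kerberos.id ({kerberos_id})"
        )
    return warnings


def review_file(path: Path) -> List[str]:
    """Read a config.properties file (ISO-8859-1, as Java properties files are) and review it."""
    return review_kerberos_settings(read_properties(path.read_text(encoding="latin-1")))
```

Run review_file against both the DI Server copy and the Spoon copy of config.properties; an empty list means the two settings blocks are internally consistent, not that the Kerberos credentials themselves are valid.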

Override Kerberos Ticket Cache

If you are logged in to the Spoon host machine and your account has already been authenticated with Kerberos, configure MapR to use the authentication information in the config.properties file instead of the credentials already saved in the Kerberos ticket cache.

  1. Open the mapr.login.conf file on the host. By default, the file is located in /opt/mapr/conf.
  2. In the hadoop_hybrid section, set the useTicketCache and renewTGT options to false, like this:
hadoop_hybrid {
  org.apache.hadoop.security.login.KerberosBugWorkAroundLoginModule optional
      useTicketCache=false
      renewTGT=false
      // ...leave the remaining options and login modules in this section unchanged
};
  3. Save and close the mapr.login.conf file.

Set Hive Database Connection Parameters

To access Hive, you need to set several database connection parameters from within Spoon.

  1. Open the hive-site.xml file on the Hive server host. Note the values of the Kerberos principal and sasl.qop properties (for example, hive.server2.authentication.kerberos.principal and hive.server2.thrift.sasl.qop).
  2. Close the hive-site.xml file.
  3. Start Spoon.
  4. In Spoon, open the Database Connection window.
  5. Click Options. Add the following parameters and set them to the values that you noted in the hive-site.xml file.
  • sasl.qop

  • principal

The principal typically has a mapr prefix before the name, for example: mapr/mapr31.pentaho@mydomain

  6. Click OK to close the window.
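If you prefer not to read hive-site.xml by eye, the following Python sketch extracts the two values to enter under Options. It assumes the standard HiveServer2 property names hive.server2.authentication.kerberos.principal and hive.server2.thrift.sasl.qop; your file may use different names, so confirm them before relying on the output. The function name is hypothetical.

```python
# Sketch: read the values to enter under Options in the Database Connection window.
# Assumes the standard HiveServer2 property names below; adjust them if your
# hive-site.xml uses different ones.
import xml.etree.ElementTree as ET
from typing import Dict, Union

PRINCIPAL_PROPERTY = "hive.server2.authentication.kerberos.principal"
QOP_PROPERTY = "hive.server2.thrift.sasl.qop"


def hive_connection_options(hive_site_xml: Union[str, bytes]) -> Dict[str, str]:
    """Map the Spoon option names (principal, sasl.qop) to values from hive-site.xml."""
    root = ET.fromstring(hive_site_xml)
    values = {}
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        values[name] = (prop.findtext("value") or "").strip()
    options = {}
    if PRINCIPAL_PROPERTY in values:
        options["principal"] = values[PRINCIPAL_PROPERTY]
    if QOP_PROPERTY in values:
        options["sasl.qop"] = values[QOP_PROPERTY]
    return options
```

Passing the file as bytes (for example, Path("hive-site.xml").read_bytes()) lets the parser honor the file's own encoding declaration. A missing key in the result means the corresponding property was not found under the assumed name.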

Set Up Impersonation on Linux Client Node

To set up impersonation on a Linux client node, specify proxy users in the core-site.xml file.

  1. On the DI Server node, open the core-site.xml file in pentaho-solutions/system/kettle/plugins/pentaho-big-data-plugin/hadoop-configurations/[hadoop distribution].

[hadoop distribution] is the name of the Hadoop distribution, such as mapr31.

  2. Set the usernames in the <value> tags of the proxy user properties as needed. Each username should be recognized by every node in the MapR cluster.

Component    Proxy User Property
HDFS         pentaho.hdfs.proxy.user
MapReduce    pentaho.mapreduce.proxy.user
Pig          pentaho.pig.proxy.user
Sqoop        pentaho.sqoop.proxy.user
Oozie        pentaho.oozie.proxy.user

Here is an example of the modified properties.

<configuration>
  <property>
    <name>pentaho.hdfs.proxy.user</name>
    <value>jdoe</value>
  </property>
  <property>
    <name>pentaho.mapreduce.proxy.user</name>
    <value>bmichaels</value>
  </property>
  <property>
    <name>pentaho.pig.proxy.user</name>
    <value>jdoe</value>
  </property>
  <property>
    <name>pentaho.sqoop.proxy.user</name>
    <value>cclarke</value>
  </property>
  <property>
    <name>pentaho.oozie.proxy.user</name>
    <value>jdoe</value>
  </property>
</configuration>
  3. Remove the comment tags from the proxy properties you want to use.
  4. Save and close the file.
  5. Repeat this process for Spoon. The core-site.xml file is located in data-integration/plugins/pentaho-big-data-plugin/hadoop-configurations/[hadoop distribution].
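If you maintain several nodes, you might script this edit. The following Python sketch sets the proxy user properties listed above in a core-site.xml document. It is an illustration, not a Pentaho tool: the component keys (hdfs, mapreduce, pig, sqoop, oozie) and function name are hypothetical, existing comments are kept (this requires Python 3.8 or later), missing properties are appended as new active properties instead of being uncommented, and anything before the <configuration> element, such as the XML declaration, is not preserved.

```python
# Sketch: set pentaho.*.proxy.user values in a core-site.xml document.
# Comments inside <configuration> are kept (TreeBuilder(insert_comments=True),
# Python 3.8+), so commented-out defaults stay as comments and missing active
# properties are appended at the end of <configuration>.
import xml.etree.ElementTree as ET
from typing import Dict

PROXY_PROPERTIES = {
    "hdfs": "pentaho.hdfs.proxy.user",
    "mapreduce": "pentaho.mapreduce.proxy.user",
    "pig": "pentaho.pig.proxy.user",
    "sqoop": "pentaho.sqoop.proxy.user",
    "oozie": "pentaho.oozie.proxy.user",
}


def set_proxy_users(core_site_xml: str, users: Dict[str, str]) -> str:
    """Return the XML text with a proxy user set for each component in `users`."""
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
    root = ET.fromstring(core_site_xml, parser=parser)
    for component, username in users.items():
        name = PROXY_PROPERTIES[component]  # raises KeyError for an unknown component
        prop = next((p for p in root.findall("property") if p.findtext("name") == name), None)
        if prop is None:
            prop = ET.SubElement(root, "property")
            ET.SubElement(prop, "name").text = name
        value = prop.find("value")
        if value is None:
            value = ET.SubElement(prop, "value")
        value.text = username
    return ET.tostring(root, encoding="unicode")
```

For example, set_proxy_users(text, {"hdfs": "jdoe", "mapreduce": "bmichaels"}) reproduces the first two properties of the example above. Calling it again with a different username updates the existing property rather than adding a duplicate.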

Set Up Spoofing on Windows Client Node

To set up spoofing on a Windows client node, indicate the spoofed user in the core-site.xml file. 

  1. On the DI Server node, open the core-site.xml file in pentaho-solutions/system/kettle/plugins/pentaho-big-data-plugin/hadoop-configurations/[hadoop distribution].

[hadoop distribution] is the name of the Hadoop distribution, such as mapr31.

  2. Add the following properties to the file, inside the <configuration> element.
<property>
  <name>hadoop.spoofed.user.uid</name>
  <value>{UID}</value>
</property>
<property>
  <name>hadoop.spoofed.user.gid</name>
  <value>{GID}</value>
</property>
<property>
  <name>hadoop.spoofed.user.username</name>
  <value>{id of user who has UID}</value>
</property>
  • Replace {id of user who has UID} with the username of the principal in the config.properties file.
  • Replace {UID} with the UID of the user named in hadoop.spoofed.user.username.
  • Replace {GID} with the GID of the user named in hadoop.spoofed.user.username.
  3. Save and close the file.
  4. Repeat these steps for Spoon. In Spoon, the core-site.xml file is in data-integration/plugins/pentaho-big-data-plugin/hadoop-configurations/[hadoop distribution].
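The following Python sketch builds the three spoofed-user properties from the principal in config.properties and the account entry for that user on a MapR node (for example, the output of getent passwd user). Deriving the username by dropping everything after the first / or @ in the principal is a simplification of Kerberos principal-to-user mapping, and the function names are hypothetical.

```python
# Sketch: build the hadoop.spoofed.user.* properties for a Windows client.
# Assumes the principal's short name is the part before any '/' or '@'
# (a simplification of Kerberos principal-to-user mapping) and that the
# passwd-format line comes from a MapR node, for example "getent passwd user".
import xml.etree.ElementTree as ET


def principal_short_name(principal: str) -> str:
    """Return the short name, e.g. 'user' for 'user@omnicorp.com'."""
    return principal.split("@", 1)[0].split("/", 1)[0]


def spoofed_user_properties(principal: str, passwd_line: str) -> str:
    """Return the three <property> elements, one per line, for core-site.xml."""
    fields = passwd_line.strip().split(":")
    if len(fields) < 4:
        raise ValueError(f"not a passwd-format line: {passwd_line!r}")
    username, uid, gid = fields[0], fields[2], fields[3]
    if username != principal_short_name(principal):
        raise ValueError(f"{username!r} is not the user named by principal {principal!r}")
    if not (uid.isdigit() and gid.isdigit()):
        raise ValueError("UID and GID must be numeric")
    lines = []
    for name, value in (
        ("hadoop.spoofed.user.uid", uid),
        ("hadoop.spoofed.user.gid", gid),
        ("hadoop.spoofed.user.username", username),
    ):
        prop = ET.Element("property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
        lines.append(ET.tostring(prop, encoding="unicode"))
    return "\n".join(lines)


print(spoofed_user_properties("user@omnicorp.com", "user:x:5001:5000::/home/user:/bin/bash"))
```

The username check guards against copying the UID and GID of the wrong account, which is an easy mistake when the same properties must be entered on both the DI Server and Spoon.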