Secure impersonation can be implemented when you connect to a Hadoop cluster with the PDI client, depending on the options you select. This article explains optional manual and advanced configurations for secure impersonation on the Pentaho Server. For an overview of secure impersonation, refer to Setting Up Big Data Security.
The following sections guide you through the optional manual setup and advanced configurations:
- Manually configuring secure impersonation parameters
- Configuring MapReduce jobs (Windows-only)
- Connecting to a Cloudera Impala database (Cloudera-only)
- Next Steps
The following requirements must be met to use secure impersonation:
- The cluster must be secured with Kerberos, and the Kerberos server used by the cluster must be accessible to the Pentaho Server.
- The Pentaho computer must have Kerberos installed and configured. See Set Up Kerberos for Pentaho for instructions.
- A Pentaho driver for your Hadoop cluster must be installed and a named connection in the PDI client created. See Connecting to a Hadoop cluster with the PDI client for instructions.
Manually configuring secure impersonation parameters
The mapping type value (pentaho.authentication.default.mapping.impersonation.type) in the config.properties file turns secure impersonation on or off. The Pentaho Server supports two mapping types: disabled and simple. When the value is disabled or left blank, the Pentaho Server does not use authentication. When the value is simple, Pentaho users can connect to the Hadoop cluster as a proxy user.
Perform the following steps to manually set up secure impersonation for your Hadoop cluster and PDI:
Stop the Pentaho Server.
Navigate to the <username>/.pentaho/metastore/pentaho/NamedCluster/Configs/<user-defined connection name> directory and open the config.properties file with a text editor.
Note: This file path and the config.properties file are created when you set up your named connection. See Connecting to a Hadoop cluster with the PDI client for instructions.
Modify the config.properties file with the values in the following table:
| Parameter | Value |
| --- | --- |
| pentaho.authentication.default.kerberos.principal | exampleUser@EXAMPLE.COM |
| pentaho.authentication.default.kerberos.keytabLocation | Set the Kerberos keytab. You only need to set the password or the keytab, not both. |
| pentaho.authentication.default.kerberos.password | Set the Kerberos password. You only need to set the password or the keytab, not both. |
| pentaho.authentication.default.mapping.impersonation.type | simple |
| pentaho.authentication.default.mapping.server.credentials.kerberos.principal | exampleUser@EXAMPLE.COM |
| pentaho.authentication.default.mapping.server.credentials.kerberos.keytabLocation | You only need to set the password or the keytab, not both. |
| pentaho.authentication.default.mapping.server.credentials.kerberos.password | You only need to set the password or the keytab, not both. |
| pentaho.oozie.proxy.user | Add the proxy user's name if you plan to access the Oozie service through a proxy. Otherwise, leave it set to oozie. |
In this table, exampleUser@EXAMPLE.COM is provided as a sample of how you would specify your proxy user. If you have key-value pairs in your existing config.properties file that are not security related, merge those settings into the file.
Save and close the config.properties file.
Restart the Pentaho Server.
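Before restarting, it can help to sanity-check the edited file. The following Python sketch is a hypothetical helper, not part of Pentaho: it reads simple key=value lines and applies one reading of the table above (a principal plus either a keytab or a password for each of the two Kerberos entries when the mapping type is simple). The keytab path in the sample is a placeholder, and the parser deliberately ignores the more exotic Java properties syntax such as line continuations.

```python
# Hypothetical helper: a simplified check of the security keys in config.properties.
PREFIX = "pentaho.authentication.default."


def parse_properties(text):
    """Parse simple key=value lines; skips blank lines and # or ! comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def check_impersonation(props):
    """Return a list of findings; an empty list means nothing was flagged."""
    findings = []
    mode = props.get(PREFIX + "mapping.impersonation.type", "")
    if mode not in ("", "disabled", "simple"):
        findings.append(f"unsupported mapping type: {mode!r}")
    if mode != "simple":
        return findings  # impersonation is off, so the Kerberos keys are not checked
    for scope in ("kerberos.", "mapping.server.credentials.kerberos."):
        base = PREFIX + scope
        if not props.get(base + "principal"):
            findings.append(f"{base}principal is not set")
        has_keytab = bool(props.get(base + "keytabLocation"))
        has_password = bool(props.get(base + "password"))
        if not (has_keytab or has_password):
            findings.append(f"{base}: set a keytab or a password")
        elif has_keytab and has_password:
            findings.append(f"{base}: keytab and password both set; only one is needed")
    return findings


# Sample content; the keytab path is a placeholder, not a real location.
SAMPLE = """\
pentaho.authentication.default.kerberos.principal=exampleUser@EXAMPLE.COM
pentaho.authentication.default.kerberos.keytabLocation=/path/to/exampleUser.keytab
pentaho.authentication.default.kerberos.password=
pentaho.authentication.default.mapping.impersonation.type=simple
pentaho.authentication.default.mapping.server.credentials.kerberos.principal=exampleUser@EXAMPLE.COM
pentaho.authentication.default.mapping.server.credentials.kerberos.keytabLocation=/path/to/exampleUser.keytab
pentaho.authentication.default.mapping.server.credentials.kerberos.password=
pentaho.oozie.proxy.user=oozie
"""

if __name__ == "__main__":
    for finding in check_impersonation(parse_properties(SAMPLE)):
        print(finding)
```

To check a real file, read its contents and pass them to parse_properties in place of SAMPLE. The check only looks at which keys are present, so it cannot confirm that the principal or keytab is valid on the cluster.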
Configuring MapReduce jobs
Perform the following steps to modify the mapred-site.xml file for secure impersonation:
Navigate to the <username>/.pentaho/metastore/pentaho/NamedCluster/Configs/<user-defined connection name> directory and open the mapred-site.xml file with a text editor.
Add the following two properties to the mapred-site.xml file:
<property>
  <name>mapreduce.app-submission.cross-platform</name>
  <value>true</value>
</property>
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
Save and close the file.
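If you prefer to script this change, the steps above can be sketched in Python with the standard library's ElementTree module. This is an illustrative helper rather than a Pentaho tool; the function names are made up for the example, and it assumes the file uses the usual Hadoop layout of property elements under a configuration root element.

```python
import xml.etree.ElementTree as ET

# The two properties named in the procedure above.
REQUIRED = {
    "mapreduce.app-submission.cross-platform": "true",
    "mapreduce.framework.name": "yarn",
}


def ensure_properties(root, required):
    """Set each required property under the root, adding it when it is absent."""
    existing = {
        (p.findtext("name") or "").strip(): p for p in root.findall("property")
    }
    for name, value in required.items():
        prop = existing.get(name)
        if prop is None:
            # Property is missing: create <property><name/><value/></property>.
            prop = ET.SubElement(root, "property")
            ET.SubElement(prop, "name").text = name
            ET.SubElement(prop, "value").text = value
        else:
            # Property exists: overwrite its value, creating <value> if needed.
            value_el = prop.find("value")
            if value_el is None:
                value_el = ET.SubElement(prop, "value")
            value_el.text = value
    return root


def update_file(path):
    """Apply the required properties to the mapred-site.xml file at path, in place."""
    tree = ET.parse(path)
    ensure_properties(tree.getroot(), REQUIRED)
    tree.write(path, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    demo = ET.fromstring(
        "<configuration><property><name>mapreduce.framework.name</name>"
        "<value>local</value></property></configuration>"
    )
    ensure_properties(demo, REQUIRED)
    print(ET.tostring(demo, encoding="unicode"))
```

ElementTree's default parser does not keep XML comments, so back up mapred-site.xml before calling update_file on it.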
Connecting to a Cloudera Impala database
Perform the following steps to update your connection to the secure Cloudera Impala database:
Download the Cloudera Impala JDBC driver for your operating system from the Cloudera website: https://www.cloudera.com/downloads/connectors/impala/jdbc/2-6-15.html
Note: Secure impersonation with Impala is only supported with the Cloudera Impala JDBC driver. You may have to create an account with Cloudera to download the driver file.
Extract the ImpalaJDBC41.jar file from the downloaded zip file into the <username>/.pentaho/metastore/pentaho/NamedCluster/Configs/cdh61/lib folder. This is the only file you need to extract from the download.
Connect to a secure CDH cluster. If you have not set up a secure cluster, complete the procedure in the article Set up Pentaho to Connect to a Cloudera Cluster.
Start the PDI client and create a new transformation.
Click the View tab, then right-click Database Connections and choose New.
In the Database Connection dialog box, enter the values from the following table:
| Field | Value |
| --- | --- |
| Connection Name | User-defined name |
| Connection Type | Cloudera Impala |
| Host Name | Hostname |
| Database Name | default |
| Port Number | 21050 |
Click Options in the left pane of the Database Connection dialog box and enter the parameter values as shown in the following table:
| Parameter | Value |
| --- | --- |
| KrbHostFQDN | The fully qualified domain name of the Impala host |
| KrbServiceName | The service principal name of the Impala server |
| KrbRealm | The Kerberos realm used by the cluster |
Click Test when your settings are entered.
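PDI assembles the connection from the dialog box values, so you do not need to write a URL yourself. For orientation only, the Python sketch below shows how such options are conventionally appended to an Impala JDBC URL as semicolon-separated key=value pairs. The function name, the host impala.example.com, the service name impala, and the realm EXAMPLE.COM are assumptions made for illustration, and the URL that PDI actually generates may contain additional settings.

```python
def impala_jdbc_url(host, port, database, options):
    """Build a jdbc:impala URL with options appended as ;key=value pairs."""
    parts = [f"jdbc:impala://{host}:{port}/{database}"]
    for key, value in options.items():
        # Reject characters that would be read as option delimiters.
        if any(ch in str(value) for ch in ";="):
            raise ValueError(f"option {key} contains a reserved character")
        parts.append(f"{key}={value}")
    return ";".join(parts)


if __name__ == "__main__":
    # Placeholder values; substitute the ones for your own cluster.
    print(
        impala_jdbc_url(
            "impala.example.com",
            21050,
            "default",
            {
                "KrbHostFQDN": "impala.example.com",
                "KrbServiceName": "impala",
                "KrbRealm": "EXAMPLE.COM",
            },
        )
    )
```

Seeing the three Kerberos parameters side by side makes it easier to spot a mismatch, such as a short host name where the fully qualified domain name is expected.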
Next Steps
After you save your changes in the repository and connect your Hadoop cluster to the Pentaho Server, you are ready to use secure impersonation to run your transformations and jobs from the Pentaho Server.
See Set up the Pentaho Server to connect to a Hadoop cluster for instructions on any further advanced configurations you may need to perform to connect your Hadoop cluster to the Pentaho Server.