Pentaho Documentation

Set Up the Adaptive Execution Layer (AEL)

Pentaho uses the Adaptive Execution Layer (AEL) for running transformations in different engines. AEL adapts steps from a transformation developed in PDI to native operators in the engine you select for your environment, such as Spark in a Hadoop cluster. The AEL daemon builds a transformation definition in Spark, which moves execution directly to the cluster.

Your installation of Pentaho 8.0 includes the AEL daemon, which you can set up for production to run on your clusters. After you configure the AEL daemon, which runs on a node of your cluster, the PDI client communicates with both your Spark cluster and the AEL daemon to launch and run transformations.

Before you can select the Spark engine through run configurations, you will need to configure AEL for your system and your workflow. Depending on your deployment, you may need to perform additional configuration tasks, such as setting up AEL in a secure cluster.

Before You Begin

You must meet the following requirements for using the AEL daemon and operating the Spark engine for transformations:

The dependency on ZooKeeper has been removed in Pentaho 8.0. If you installed AEL for Pentaho 7.1, you must delete the adaptive-execution folder and follow the Pentaho 8.0 installation instructions below to use AEL with Pentaho 8.0.

Pentaho 8.0 Installation

When you install the Pentaho Server, the AEL daemon is installed in the folder data-integration/adaptive-execution. This folder will be referred to as 'PDI_AEL_DAEMON_HOME'.

Spark Client

The Spark client is required for the operation of the AEL daemon. The Apache Spark 2.x client is recommended. Perform the following steps to install the Spark client.

  1. Download the Spark client, spark-2.1.0-bin-hadoop2.7.tgz, from http://spark.apache.org/downloads.html.
  2. Extract it to a folder where the daemon can access it. This folder will be referred to as the variable 'SPARK_HOME'.
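As a concrete sketch of these two steps (all paths below are examples, not requirements):

```shell
# Hypothetical locations -- adjust for your environment. The tarball is
# the spark-2.1.0-bin-hadoop2.7.tgz download from spark.apache.org.
SPARK_TGZ=$HOME/Downloads/spark-2.1.0-bin-hadoop2.7.tgz

# Extract it to a folder the AEL daemon can access, e.g.:
#   tar -xzf "$SPARK_TGZ" -C /opt
SPARK_HOME=/opt/spark-2.1.0-bin-hadoop2.7
echo "SPARK_HOME=$SPARK_HOME"
```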

Pentaho Spark Application

The Pentaho Spark application is built upon PDI's Kettle engine, which allows transformations to run unaltered within a Hadoop cluster. Some third-party plugins, such as those plugins available in the Pentaho Marketplace, may not be included by default within the Pentaho Spark application. To address this issue, we include the Spark Application builder tool so you can customize the Pentaho Spark application by adding or removing components to fit your needs. 

After running the Spark application builder tool, copy the resulting pdi-spark-driver.zip file to an edge node in your Hadoop cluster and unzip it. The unpacked contents consist of the data-integration folder and the pdi-spark-executor.zip file. The pdi-spark-executor.zip file contains only the libraries that the Spark nodes themselves need to execute a transformation when the AEL daemon is configured to run in YARN mode. Because this zip file must be accessible by all nodes in the cluster, it must be copied into HDFS.

Perform the following steps to run the Spark application build tool and manage the resulting files.

  1. Ensure that you have configured your PDI client with all the plugins that you will use.
  2. Navigate to the design-tools/data-integration folder and locate the spark-app-builder.bat (Windows) or the spark-app-builder.sh (Linux).
  3. Execute the Spark application builder tool script. A console window will display and the pdi-spark-driver.zip file will be created in the data-integration folder (unless otherwise specified by the -outputLocation parameter described below). 

    The following parameters can be used when running the script to build the pdi-spark-driver.zip.

    Parameter                Action
    -h or --help             Displays the help.
    -e or --exclude-plugins  Specifies plugins from the data-integration/plugins folder to exclude from the assembly.
    -o or --outputLocation   Specifies the output location.

  4. The pdi-spark-driver.zip file contains a data-integration folder and pdi-spark-executor.zip file. Copy the data-integration folder to the edge node where you want to run the AEL daemon.  
  5. Copy the pdi-spark-executor.zip file to the HDFS node where you will run Spark and extract the contents. This folder will be referred to as 'HDFS_SPARK_EXECUTOR_LOCATION'.
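The build-and-stage sequence above can be sketched as below. The run helper only echoes each command, so the sketch is safe to execute as-is; every path is an example, and you would drop the run prefix to perform the steps for real.

```shell
# Dry-run sketch: each command is printed, not executed.
run() { echo "+ $*"; }

# Build the Pentaho Spark application on the PDI client machine:
run ./design-tools/data-integration/spark-app-builder.sh -o /tmp

# Unzip the driver on the edge node that will host the AEL daemon:
run unzip /tmp/pdi-spark-driver.zip -d /opt/pentaho

# Put the executor zip where all Spark nodes can reach it:
run hdfs dfs -put /opt/pentaho/pdi-spark-executor.zip /opt/pentaho/pdi-spark-executor.zip
```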

For the cluster nodes to use the functionality provided by PDI plugins when executing a transformation, they must be installed into the PDI client prior to generating the Pentaho Spark application. If you install other plugins later, you must regenerate the Pentaho Spark application.

Configuring the AEL Daemon for Local Mode

Configuring the AEL daemon to run in Spark local mode is not supported for production use, but it can be useful for development and debugging.

You can configure the AEL daemon to run in Spark local mode for development or demonstration purposes. This lets you build and test a Spark application on your desktop with sample data, then reconfigure the application to run on your clusters. To configure the AEL daemon for local mode, complete the following steps:

  1. Navigate to the data-integration/adaptive-execution/config directory and open the application.properties file.
  2. Set the following properties for your environment:
  • Set the sparkHome property to the Spark 2 filepath on your local machine.
  • Set the sparkApp property to the data-integration directory.
  • Set the hadoopConfDir property to the directory containing the *site.xml files.
  3. Save and close the file.
  4. Add a SPARK_HOME/kettleConf directory. If you are not using Apache Spark, then you will have to create this folder.
  5. Run the daemon.sh command from the command line interface.
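Taken together, the three property settings from step 2 might look like this in application.properties (all paths are hypothetical Linux examples):

```properties
# data-integration/adaptive-execution/config/application.properties
sparkHome=/opt/spark-2.1.0-bin-hadoop2.7
sparkApp=/opt/pentaho/design-tools/data-integration
hadoopConfDir=/etc/hadoop/conf
```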

Configuring the AEL Daemon in YARN Mode

Typically, the AEL daemon is run in YARN mode for production purposes. In YARN mode, the driver application launches and delegates work to the YARN cluster. The pdi-spark-executor application must be installed on each of the YARN nodes.

The daemon.sh script is only supported in UNIX-based environments.

To configure the AEL daemon for a YARN production environment, complete the following steps.

  1. Navigate to the adaptive-execution/config directory and open the application.properties file.
  2. Set the following properties for your environment:
Property       Value
sparkHome      The Spark 2 file path on your cluster.
sparkApp       The data-integration directory.
hadoopConfDir  The directory containing the *site.xml files. This property tells Spark which Hadoop/YARN cluster to use. You can download the directory containing the *site.xml files using the cluster management tool, or you can set the hadoopConfDir property to its location in the cluster.
hadoopUser     The user ID the Spark application will use, if you are not using security.
sparkMaster    yarn
assemblyZip    hdfs:$HDFS_SPARK_EXECUTOR_LOCATION
  3. Save and close the file.
  4. Add a SPARK_HOME/kettleConf directory. If you are not using Apache Spark, then you will need to create this folder.
  5. Copy the pdi-spark-executor.zip file to your HDFS cluster, as in the example below.

$ hdfs dfs -put pdi-spark-executor.zip /opt/pentaho/pdi-spark-executor.zip

  6. Run the pdi-daemon startup script, daemon.sh, from the command line interface.
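Putting the table values together, a YARN-mode application.properties might contain lines like these (every value is an example; substitute your own paths, user, and HDFS location):

```properties
# adaptive-execution/config/application.properties -- YARN mode example
sparkHome=/opt/spark-2.1.0-bin-hadoop2.7
sparkApp=/opt/pentaho/data-integration
hadoopConfDir=/etc/hadoop/conf
hadoopUser=devuser
sparkMaster=yarn
assemblyZip=hdfs:/opt/pentaho/pdi-spark-executor.zip
```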

You can manually start the AEL daemon by running the daemon.sh. By default, this startup script is installed in the folder data-integration/adaptive-execution, which is referred to as the variable 'PDI_AEL_DAEMON_HOME'.

Perform the following steps to manually start the AEL daemon.

  1. Navigate to the PDI_AEL_DAEMON_HOME directory.
  2. Run the daemon.sh script.

The startup script supports the following commands:

Command           Action
daemon.sh         Starts the daemon as a foreground process.
daemon.sh start   Starts the daemon as a background process. Logs are written to the PDI_AEL_DAEMON_HOME/daemon.log file.
daemon.sh stop    Stops the daemon.
daemon.sh status  Reports the status of the daemon.

Configuring AEL with Spark in a Secure Cluster

The AEL daemon works in an unsecured cluster by default. You can secure communication channels between the PDI client and the AEL daemon server and also between the AEL daemon server and the Spark driver using SSL (Secure Sockets Layer), Kerberos, or both. If your AEL daemon server and your cluster machines are in a secure environment like a data center, you may only want to configure a secure connection between the PDI client and the AEL daemon server.

Authentication with Kerberos

To enable security, you can configure the AEL daemon to work in a secure cluster using impersonation. Kerberos authentication can be used with AEL in two ways: with the connection from the client to the AEL daemon and with the Spark submit process.

Set Up a Secure Client Connection

Complete the following steps to set up a secure connection from the PDI client to the AEL daemon:

  1. Download and install a Kerberos server. Refer to Set Up Kerberos for Pentaho for further details on installing the Kerberos server.
  2. Create a keytab and principal to use for your client access.
  3. Open the PDI client and choose Edit > Edit the kettle.properties file.
  4. Add the properties KETTLE_AEL_PDI_DAEMON_KEYTAB and KETTLE_AEL_PDI_DAEMON_PRINCIPAL and set the values to the location of the keytab and principal, respectively.
  5. Restart the PDI client.
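The two entries from step 4 might look like this in kettle.properties (the keytab path and principal below are placeholders for your own):

```properties
KETTLE_AEL_PDI_DAEMON_KEYTAB=/home/pentaho/security/pdi-user.keytab
KETTLE_AEL_PDI_DAEMON_PRINCIPAL=pdi-user@EXAMPLE.COM
```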

Set Up a Secure Server Connection

Complete the following steps to set up a secure connection from the AEL daemon to the cluster:

  1. Create a keytab and server principal to use for your server access.
  2. Navigate to the adaptive-execution/config/application.properties file and open it with a text editor. Set the values for your environment as in the following table:
Parameter          Value
keytabLocation     Path to the keytab used for the Kerberos principal.
kerberosPrincipal  The Kerberos service principal that has the authority to impersonate another user.
disableProxyUser   The AEL daemon can impersonate a proxy user when authenticating to your secure cluster. Set to true to disable the proxy user; the acting user will then be the Kerberos service principal. The default value is false.
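For example (the keytab path and principal are placeholders for your own):

```properties
# adaptive-execution/config/application.properties -- example values
keytabLocation=/opt/pentaho/security/ael-service.keytab
kerberosPrincipal=ael-service@EXAMPLE.COM
disableProxyUser=false
```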

You can now test your AEL configuration by creating a run configuration using the Spark engine. Refer to Run Configurations for more details. 

Using SSL Encryption

Complete the following steps to set up SSL connections for the PDI client and the Pentaho Server:

  1. Set up SSL security by following the instructions in the article Enable SSL in the Pentaho Server with a Certificate Authority.
  2. Import your certificate to the Java keystore on the machine where the PDI client is installed. If the Pentaho Server is installed on a different machine, import the certificate to the Java keystore on that machine.
  3. At the following prompts, enter the keystore password and enter 'Y':

Enter keystore password: 
Trust this certificate?

The certificate is now trusted by the PDI client and the Pentaho Server.

Configure the Daemon for SSL

Complete the following to configure the AEL daemon for SSL:

  1. Navigate to the adaptive-execution/config/application.properties file and open it with a text editor.
  2. Set the values for your environment as in the following table:
Parameter                      Value
server.ssl.enabled             true
server.ssl.key-store           /users/myusername/pentaho/mycertificate.p12
server.ssl.key-store-type      PKCS12
server.ssl.key-store-password  Changeit
server.ssl.key-password        Changeit

The first time you start the AEL daemon, it will prompt you to enter the SSL keystore and key passwords. 

 You can now test your AEL configuration by creating a run configuration using the Spark engine. Refer to Run Configurations for more details.