Troubleshooting AEL

Steps cannot run in parallel

If you use the Spark engine to run a transformation that contains a step that cannot run in parallel, errors are generated in the log.

Some steps cannot run in parallel (on multiple nodes in a cluster) and produce unexpected results when they do. However, these steps can run as a coalesced dataset on a single node in the cluster. To enable a step to run as a coalesced dataset, add its step ID as a property value in the Spark engine configuration file.

Get the step ID

Each PDI step has a step ID, a globally unique identifier of the step. Use either of the following two methods to get the ID of a step:

Method 1: Retrieve the ID from the log

You can retrieve a step ID through the PDI client with the following steps:

Procedure

In the PDI client, create a new transformation and add the step to the transformation.
For example, if you need to know the ID of the Select values step, add that step to the new transformation.
Set the log level to debug.
Execute the transformation using the Spark engine.
The step ID is displayed in the Logging tab of the Execution Results pane. For example, the log displays "Selected the SelectValues step to run in parallel as a GenericSparkOperation", where SelectValues is the step ID.
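If you save the contents of the Logging tab to a text file, a short shell function can pull the step IDs out of it. This is a minimal sketch that assumes every relevant line uses the wording shown in the example message ("Selected the ... step to run in parallel as a GenericSparkOperation"); the function name extract_step_ids and the log file path are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: list the step IDs reported in a saved PDI debug log.
# Assumes each relevant line contains text such as:
#   Selected the SelectValues step to run in parallel as a GenericSparkOperation
extract_step_ids() {
  # $1: path to a text file holding the saved log (assumption)
  # Print only the word between "Selected the " and " step", one ID per line,
  # with duplicates removed.
  sed -n 's/.*Selected the \([^ ]*\) step to run in parallel.*/\1/p' "$1" | sort -u
}

# Usage (hypothetical path): extract_step_ids /tmp/pdi-debug.log
```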

Method 2: Retrieve the ID from the PDI plugin registry

If you are a developer, you can retrieve the step ID from the PDI plugin registry as described in Dynamically build transformations.

Note: If you have created your own PDI transformation step plugin, the step ID is one of the annotation attributes that the developer supplies.

Add the step ID to the configuration file

The configuration file, org.pentaho.pdi.engine.spark.cfg, contains the forceCoalesceSteps property. The property is a pipe-delimited listing of all the IDs for the steps that should run with a coalesced dataset. Pentaho supplies a default set to which you can add IDs for steps that generate errors.

Perform the following steps to add another step ID to the configuration file:

Procedure

Navigate to the data-integration/system/karaf/etc folder and open the org.pentaho.pdi.engine.spark.cfg file.
Append your step ID to the forceCoalesceSteps property value list, using a pipe character separator between the step IDs.
Save and close the file.
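If you prefer to script this edit, the following is a minimal sketch that appends an ID to forceCoalesceSteps and keeps a .bak copy of the original file. It assumes the property sits on a single line written as forceCoalesceSteps=ID1|ID2 with no spaces around the equal sign, and that step IDs contain no characters that are special to sed (such as / or &). The function name append_coalesce_step is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: append a step ID to the pipe-delimited forceCoalesceSteps list.
# Assumptions: one line of the form forceCoalesceSteps=ID1|ID2|..., no spaces
# around "=", and step IDs made only of letters, digits, and underscores.
append_coalesce_step() {
  cfg=$1      # path to org.pentaho.pdi.engine.spark.cfg
  step_id=$2  # step ID to add
  if ! grep -q '^forceCoalesceSteps=' "$cfg"; then
    echo "forceCoalesceSteps not found in $cfg" >&2
    return 1
  fi
  value=$(sed -n 's/^forceCoalesceSteps=//p' "$cfg")
  # Leave the file untouched when the ID is already in the list
  case "|$value|" in
    *"|$step_id|"*) return 0 ;;
  esac
  cp "$cfg" "$cfg.bak"  # keep a backup of the original file
  if [ -z "$value" ]; then
    # Empty list: the new ID becomes the only entry
    sed "s/^forceCoalesceSteps=\$/forceCoalesceSteps=$step_id/" "$cfg.bak" > "$cfg"
  else
    # Non-empty list: add a pipe separator, then the new ID
    sed "s/^forceCoalesceSteps=.*/&|$step_id/" "$cfg.bak" > "$cfg"
  fi
}
```

For example, append_coalesce_step data-integration/system/karaf/etc/org.pentaho.pdi.engine.spark.cfg MyStepId, where MyStepId is a placeholder for the ID you found.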

Table Input step fails

If you run a transformation that uses the Table Input step against a large database, the step does not complete. Use one of the following methods to resolve the issue:

Method 1: Load the data to HDFS before running the transformation

Run a separate transformation with the Pentaho engine to move the data to the HDFS cluster.
Then run your transformation with the Spark engine, using HDFS Input to read the data.

Method 2: Increase the driver side memory configuration

Navigate to the config/ folder and open the application.properties file.
Increase the value of the sparkDriverMemory parameter, then save and close the file.
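The edit above can also be scripted. This is a minimal sketch that assumes application.properties uses key=value lines and that sparkDriverMemory takes a Spark-style memory string such as 4g; both the value and the function name set_driver_memory are illustrative assumptions, so size the memory for your own data and driver host.

```shell
#!/bin/sh
# Hypothetical helper: set sparkDriverMemory in application.properties.
# Assumes key=value lines; "4g" in the usage note is only an example value.
set_driver_memory() {
  props=$1  # path to application.properties
  mem=$2    # new driver memory value, e.g. 4g (assumed format)
  cp "$props" "$props.bak"  # keep a backup of the original file
  if grep -q '^sparkDriverMemory=' "$props.bak"; then
    # Replace the existing value
    sed "s/^sparkDriverMemory=.*/sparkDriverMemory=$mem/" "$props.bak" > "$props"
  else
    # Add the property when it is not present yet
    printf 'sparkDriverMemory=%s\n' "$mem" >> "$props"
  fi
}
```

For example, set_driver_memory config/application.properties 4g. Restart the AEL daemon afterward so the new value is read.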

User ID below minimum allowed

If you are using the Spark engine in a secured cluster and an error about minimum user ID occurs, the user ID of the proxy user is below the minimum user ID required by the cluster. See Cloudera documentation for details.

To resolve, change the ID of the proxy user to be higher than the minimum user ID specified for the cluster.
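Before changing anything, you can compare the proxy user's numeric ID with the cluster minimum. This is a minimal sketch: it assumes the minimum comes from the min.user.id setting in YARN's container-executor.cfg (commonly 1000 on Cloudera clusters, but confirm the value on your cluster), and it treats an ID equal to the minimum as acceptable. The function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical check: is the proxy user's UID at or above the cluster minimum?
# Assumption: the minimum is YARN's min.user.id; read the real value from your cluster.
uid_meets_minimum() {
  uid=$1
  min_uid=$2
  [ "$uid" -ge "$min_uid" ]
}

check_proxy_user() {
  user=$1
  min_uid=$2
  uid=$(id -u "$user") || return 2  # the user does not exist on this host
  if uid_meets_minimum "$uid" "$min_uid"; then
    echo "$user (uid $uid) meets the minimum of $min_uid"
  else
    echo "$user (uid $uid) is below the minimum of $min_uid" >&2
    return 1
  fi
}

# Usage (hypothetical user name and minimum): check_proxy_user devuser 1000
```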

Hadoop version conflict

On an HDP cluster, if you receive the following message, your Hadoop libraries are in conflict, and the AEL daemon and the PDI client might stop working:

command hdp-select is not found, please manually export HDP_VERSION in spark-env.sh or current environment

To resolve the issue, you must export the HDP_VERSION variable using a command like the following example:

export HDP_VERSION=${HDP_VERSION:-2.6.0.3-8}

The HDP version number should match the HDP version number of the distribution on the cluster. You can check your HDP version with the hdp-select status hadoop-client command.
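Because the error means hdp-select is not available where the daemon runs, you can read the version on a cluster node instead and copy the result. This is a minimal sketch that assumes hdp-select status hadoop-client prints a single line such as hadoop-client - 2.6.0.3-8, with the version as the last field; the function name hdp_version_from_status is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: extract the version from "hdp-select status hadoop-client".
# Assumes output like: hadoop-client - 2.6.0.3-8
hdp_version_from_status() {
  # Print the last field of the first non-empty line read from stdin
  awk 'NF { print $NF; exit }'
}

# On a cluster node: print an export line to copy into spark-env.sh
# or into the environment of the AEL daemon host.
if command -v hdp-select >/dev/null 2>&1; then
  echo "export HDP_VERSION=\${HDP_VERSION:-$(hdp-select status hadoop-client | hdp_version_from_status)}"
fi
```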

Hadoop libraries are missing

If you use the Spark libraries packaged with Cloudera and Hortonworks’ distributions, you must add the Hadoop libraries to the classpath with the SPARK_DIST_CLASSPATH environment variable because these distributions are not packaged with the Hadoop libraries.

Add the class path

The following command will add the libraries to the classpath:

export SPARK_DIST_CLASSPATH=$(hadoop classpath)

You can add this command to the daemon.sh file so you do not have to run it every time you start the AEL daemon.
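A guarded version of that command is sketched below for use in daemon.sh. Where exactly it belongs in daemon.sh is an assumption, as is the function name set_spark_dist_classpath. It sets the variable only when the hadoop command is on the PATH and when SPARK_DIST_CLASSPATH has not already been set, so a value you exported yourself (for example, a filtered classpath) is left alone.

```shell
#!/bin/sh
# Hypothetical lines for daemon.sh (placement is an assumption).
# Set SPARK_DIST_CLASSPATH from "hadoop classpath" unless it is already set
# or the hadoop command is not available on this host.
set_spark_dist_classpath() {
  if [ -z "${SPARK_DIST_CLASSPATH:-}" ] && command -v hadoop >/dev/null 2>&1; then
    SPARK_DIST_CLASSPATH=$(hadoop classpath)
    export SPARK_DIST_CLASSPATH
  fi
}

set_spark_dist_classpath
```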

Set Spark home variable

If you are using the Spark client from the Cloudera or Hortonworks Hadoop distributions, you may also receive the following log error:

Exception in thread "main" java.lang.NoSuchFieldError: TOKEN_KIND

If you receive this error, you must also complete the following steps for your Hadoop distribution:

Procedure

Download the Spark client for your Hadoop cluster distribution (Cloudera or Hortonworks).
Navigate to the adaptive-execution/config directory and open the application.properties file.
Set the sparkHome location to where Spark 2 is located on your machine.

Example for Cloudera:

sparkHome=/opt/cloudera/parcels/SPARK2/lib/spark2

Example for Hortonworks:

sparkHome=/opt/horton/SPARK2/lib/spark2
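To confirm the setting points at a usable Spark client, you can read it back and look for the spark-submit launcher. This is a minimal sketch that assumes a sparkHome key=value line (spaces around the equal sign allowed) and that a Spark 2 client directory contains bin/spark-submit; the function name check_spark_home is hypothetical.

```shell
#!/bin/sh
# Hypothetical check: read sparkHome from application.properties and verify
# that it contains an executable bin/spark-submit.
check_spark_home() {
  props=$1  # path to adaptive-execution/config/application.properties
  spark_home=$(sed -n 's/^sparkHome[[:space:]]*=[[:space:]]*//p' "$props")
  if [ -z "$spark_home" ]; then
    echo "sparkHome is not set in $props" >&2
    return 1
  fi
  if [ ! -x "$spark_home/bin/spark-submit" ]; then
    echo "no executable bin/spark-submit under $spark_home" >&2
    return 1
  fi
  echo "$spark_home"
}
```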

Spark libraries conflict with Hadoop libraries

In some cases, library versions contained in JARs from PDI, Spark, Hadoop, AEL, and/or Kettle plugins may conflict with one another. These conflicts can cause general problems where Spark libraries conflict with Hadoop libraries, and can also create AEL-specific problems. To read more about this issue, including how to address it, see the article AEL and Spark Library Conflicts on the Pentaho Community Wiki.

JAR file conflict in Kafka Consumer step

When using the Kafka Consumer step with HDP 3.x on AEL Spark, there is a known conflict with the following JAR file:

/usr/hdp/3.x/hadoop-mapreduce/kafka-clients-0.8.2.1.jar

Use one of the following solutions to resolve the JAR conflict.

Solution 1: On HDP 3.x, do not set the SPARK_DIST_CLASSPATH variable before running the Adaptive Execution Layer daemon. Be aware that leaving this variable unset may cause issues in other AEL components.

Solution 2: Exclude the JAR file from SPARK_DIST_CLASSPATH with a spark-dist-classpath.sh script. Create the script with any text editor and include the following code:

#!/bin/sh
##
## Helper script for setting up SPARK_DIST_CLASSPATH for AEL.
## Removes conflicting JAR files that exist in HDP 3.x.
## Usage: call it the same way you use the hadoop classpath command, for example:
## export SPARK_DIST_CLASSPATH=$(spark-dist-classpath.sh)

# Grab the Hadoop classpath
HCP=$(hadoop classpath)

## Expand the classpath so that individual JAR files can be filtered out
(
  ## Split on ":" and read one entry per line (avoids shell globbing of "dir/*")
  echo "$HCP" | tr ':' '\n' | while IFS= read -r entry ; do
     ## Clean up directory entries ending with *
     entryCleaned=${entry%\*}
     ## If the entry is a directory, expand it
     if test -d "$entryCleaned" ; then
       find "$entryCleaned"
     else
       echo "$entry"
     fi
  done
) | grep -v -F kafka-clients-0.8.2.1.jar | paste -s -d: -

exit
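After exporting SPARK_DIST_CLASSPATH with the script, you can check that the conflicting JAR is really gone before starting the daemon. This is a minimal sketch; the function name classpath_has_entry is hypothetical, and it only matches the exact file name kafka-clients-0.8.2.1.jar, either as a bare entry or as the last component of a path.

```shell
#!/bin/sh
# Hypothetical check: does a colon-separated classpath contain a given JAR file name?
classpath_has_entry() {
  # $1: classpath, $2: JAR file name to look for
  case ":$1:" in
    *"/$2:"*|*":$2:"*) return 0 ;;
  esac
  return 1
}

if classpath_has_entry "${SPARK_DIST_CLASSPATH:-}" kafka-clients-0.8.2.1.jar; then
  echo "kafka-clients-0.8.2.1.jar is still on SPARK_DIST_CLASSPATH" >&2
fi
```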

Internet Address data type fails

When running an AEL transformation using an input step with the data type 'Internet Address' selected for a URL field, your transformation may not complete properly.

When you are using the Spark engine to run an AEL transformation, do not use the data type 'Internet Address' when entering a URL in a step. Instead, use the data type 'String' for the URL.

Message size exceeded

If you are using the Spark engine to run an AEL transformation and an error is generated indicating a decoded message was too big for the output buffer, you need to increase the maximum size (2 MB by default) of the message buffers for your AEL environment.

Perform the following steps to increase the message buffer limit:

Procedure

Stop the AEL daemon.
Navigate to the data-integration/adaptive-execution/config directory and open the application.properties file using a text editor.

Enter the following incoming WebSocket message buffer properties, setting the same value for each property:

Property	Description
daemon.websocket.maxMessageBufferSize	The maximum size, in bytes, of the message buffer on the AEL daemon. For example, to allocate a 4 MB limit, set daemon.websocket.maxMessageBufferSize=4000000
driver.websocket.maxMessageBufferSize	The maximum size, in bytes, of the message buffer on the AEL Spark driver. For example, to allocate a 4 MB limit, set driver.websocket.maxMessageBufferSize=4000000

Save and close the file.
Restart the AEL daemon.
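Because both properties must carry the same value, a small script can set them together. This is a minimal sketch, run while the AEL daemon is stopped, that assumes application.properties uses key=value lines; the function name set_buffer_size is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: set both WebSocket message buffer limits to one value.
# Assumes key=value lines in data-integration/adaptive-execution/config/application.properties.
set_buffer_size() {
  props=$1  # path to application.properties
  bytes=$2  # limit in bytes, e.g. 4000000 for 4 MB
  for key in daemon.websocket.maxMessageBufferSize driver.websocket.maxMessageBufferSize; do
    # Escape the dots so they match literally in the regular expressions
    key_re=$(printf '%s\n' "$key" | sed 's/\./\\./g')
    cp "$props" "$props.bak"
    if grep -q "^$key_re=" "$props.bak"; then
      # Replace the existing value
      sed "s/^$key_re=.*/$key=$bytes/" "$props.bak" > "$props"
    else
      # Add the property when it is missing
      printf '%s=%s\n' "$key" "$bytes" >> "$props"
    fi
  done
}
```

For example, set_buffer_size data-integration/adaptive-execution/config/application.properties 4000000, then restart the daemon.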

Results

When the AEL daemon submits the AEL Spark driver application, it passes the driver’s maximum message buffer size as part of the submission. When the driver application starts, it receives the maximum buffer size value sent by the daemon.

Running sub-transformations on Spark

To use a sub-transformation with Spark, you must use the Transformation Executor step. Transformations containing the following steps are not supported on the Spark engine:

Simple Mapping (sub-transformation)
Mapping (sub-transformation)

