Pentaho Documentation

PDI Hadoop Job Workflow

PDI enables you to execute a Java class from within a PDI/Spoon job to perform operations on Hadoop data. You approach this the same way you would any other PDI job. The job entry specifically designed to handle the Java class is Hadoop Job Executor. In this illustration it is used in the WordCount - Advanced entry.

hadoop_job_workflow.jpg

The Hadoop Job Executor dialog box enables you to configure the entry with a jar file that contains the Java class.

hadoopjobexecutor_wordcount.png
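The jar you point the entry at must contain a main class that Hadoop can invoke. As a rough illustration of that entry-point shape, here is a hypothetical, simplified word-count class (plain Java only; a real Hadoop job would implement Mapper and Reducer classes against the MapReduce API, and the class and method names here are illustrative assumptions, not Pentaho's):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical example: a minimal word-count entry point of the kind
// you would package into a jar for the Hadoop Job Executor to run.
// A real MapReduce job would instead define Mapper/Reducer classes.
public class WordCount {

    // Count occurrences of each whitespace-separated token.
    public static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    // The executor invokes the jar's main class with the arguments
    // configured in the dialog box.
    public static void main(String[] args) {
        countWords(String.join(" ", args))
            .forEach((word, n) -> System.out.println(word + "\t" + n));
    }
}
```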

If you are using the Amazon Elastic MapReduce (EMR) service, you can use the Amazon EMR Job Executor job entry to execute the Java class. This entry differs from the standard Hadoop Job Executor in that it contains connection information for Amazon S3 and configuration options for EMR.

AmazonEMREntry.png