Using Table input to Table output steps with AEL for managed tables in Hive

If you are using managed tables in Hive and want to join a Table input step to a Table output step, use the following workflow when executing on AEL to ensure correct processing. This workflow includes creating separate transformations for the steps and then joining the transformations using a job entry.

See Hive for further configuration information when using Hive with Spark on AEL.

Create separate input and output KTRs

Follow the steps below to create separate transformations to process managed tables in Hive.

Note: Depending on the size of your managed tables, use the noted alternative PDI transformation steps to maximize processing efficiency.

Procedure

  1. Select File > New > Transformation in the PDI client window to create a new transformation.

    A new canvas opens.
  2. On the Design tab, click Input and then double-click Table input.

    The Table input step appears on the canvas.
    Note: As a best practice for smaller managed input tables, use the Copy rows to result step. For larger managed input tables, use the Set files in result step instead.
  3. Enter your connection and option information in the Table input step.

  4. Select File > Save As, enter a name for the file, such as Table_In, and save the file.

  5. Select File > New > Transformation in the PDI client window to create a new transformation.

    A new canvas opens.
  6. On the Design tab, click Output and then double-click Table output.

    The Table output step appears on the canvas.
    Note: As a best practice for smaller managed input tables, use the Get rows from result step. For larger managed input tables, use the Get files from result step instead.
  7. Enter your configuration information for the target table in the Table output step.

  8. Select File > Save As, enter a name for the file, such as Table_Out, and save the file.

Results

You have now created separate transformations (KTRs) for the Table input and Table output steps. Proceed to Create a job to join the KTRs.
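
The handoff between the two KTRs can be pictured with a small conceptual model. The Python sketch below is not PDI code and does not use any Pentaho API. The `Result` class, the `table_in` and `table_out` functions, and the sample rows are all hypothetical stand-ins, assuming the smaller-table variant where the first transformation ends with Copy rows to result and the second begins with Get rows from result.

```python
from dataclasses import dataclass, field
from typing import Dict, List

Row = Dict[str, object]


@dataclass
class Result:
    """Hypothetical stand-in for the result a job passes between entries."""
    rows: List[Row] = field(default_factory=list)
    ok: bool = True


def table_in(_previous: Result) -> Result:
    # Stands in for the Table_In KTR: Table input -> Copy rows to result.
    # The rows below are made-up sample data, not read from Hive.
    source_rows = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
    return Result(rows=list(source_rows))


def table_out(previous: Result, target: List[Row]) -> Result:
    # Stands in for the Table_Out KTR: Get rows from result -> Table output.
    target.extend(previous.rows)
    return Result(rows=[], ok=True)


target_table: List[Row] = []          # hypothetical target table
handoff = table_in(Result())          # first transformation finishes completely
final = table_out(handoff, target_table)  # second one starts from its result
print(len(target_table), final.ok)
```

The point of the sketch is only the ordering: the input transformation completes and publishes its rows before the output transformation begins, which is what the job in the next section enforces.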

Create a job to join the KTRs

Follow the steps below to create a job to process managed tables in Hive.

Procedure

  1. Select File > New > Job in the PDI client window to create a new job.

    A new canvas opens.
  2. On the Design tab, click General and then double-click Start.

    The Start entry appears on the canvas.
  3. Under General, double-click Transformation.

    The Transformation entry appears on the canvas and is connected by a hop from the Start entry.
  4. Double-click the Transformation entry.

    The Transformation entry dialog box appears.
  5. Browse to your saved Table input KTR file, then enter a name, such as Table_In. Click OK to save the entry.

  6. Under General, double-click Transformation.

    The Transformation entry appears on the canvas and is connected by a hop from the previous Transformation entry.
  7. Double-click the Transformation entry.

    The Transformation entry dialog box appears.
  8. Browse to your saved Table output KTR file, then enter a name such as Table_Out. Click OK to save the entry.

  9. (Optional) Add a Dummy entry joined by an error hop to each Transformation entry to handle any false results.

  10. Under General, double-click Success.

    The Success entry appears on the canvas and is connected by a hop from the Transformation entry.
  11. Select File > Save As, enter a name for the file, and save it.

  12. Press Run to execute the job.

    The following example illustrates the job on the canvas:

    [Figure: PDI sample job]

Results

The content of your managed table in Hive was correctly processed using the job.
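
As a rough mental model of the job built above, the following Python sketch walks a chain of entries connected by success and error hops. It is a simplified illustration, not how PDI executes jobs internally. The `run_job` function and the `entries` and `hops` dictionaries are hypothetical names introduced here, and each entry is reduced to a function that returns True or False.

```python
from typing import Callable, Dict, List, Optional, Tuple

Entry = Callable[[], bool]
# Each entry maps to (next entry on success, next entry on failure).
Hops = Dict[str, Tuple[Optional[str], Optional[str]]]


def run_job(entries: Dict[str, Entry], hops: Hops, start: str = "Start") -> List[str]:
    """Follow hops from the start entry and return the entries visited."""
    visited: List[str] = []
    current: Optional[str] = start
    while current is not None:
        visited.append(current)
        succeeded = entries[current]()
        on_success, on_failure = hops.get(current, (None, None))
        current = on_success if succeeded else on_failure
    return visited


# Entries mirror the canvas: Start, two Transformation entries,
# the optional Dummy entry for false results, and Success.
entries: Dict[str, Entry] = {
    "Start": lambda: True,
    "Table_In": lambda: True,
    "Table_Out": lambda: True,
    "Dummy": lambda: True,
    "Success": lambda: True,
}
hops: Hops = {
    "Start": ("Table_In", None),
    "Table_In": ("Table_Out", "Dummy"),
    "Table_Out": ("Success", "Dummy"),
}

path = run_job(entries, hops)
print(path)

# Simulate a failing input transformation to see the error hop taken.
failing = dict(entries, Table_In=lambda: False)
error_path = run_job(failing, hops)
print(error_path)
```

In this model, Table_Out is reached only after Table_In reports success, and a false result from either Transformation entry is routed to the Dummy entry instead.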