Pentaho 8.2 Enterprise Edition delivers a wide range of features and improvements, from new streaming and Spark capabilities in PDI to big data enhancements and cloud data security. This release also continues to improve the Pentaho platform experience.
New Python Executor Step
The Python Executor step incorporates the robust scripting capabilities and algorithms of the CPython scripting language into your transformations. This new PDI step is especially useful for data scientists and data engineers who want to leverage machine learning and deep learning methods, model management strategies, and integration with data science notebooks.
With native support for pandas DataFrames and NumPy arrays, the Python Executor step can read data from various sources, modify and derive values from the data, then provide the output as a set of PDI fields. The step offers two ways to execute a script: running a script file from a local or hosted location, or embedding the script directly in the step.
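As a rough illustration of the kind of script this step could run, the sketch below derives two new columns from a pandas DataFrame. The column names (`quantity`, `unit_price`, `line_total`, `log_total`), the `derive_fields` function, and the `incoming` sample data are hypothetical; how the step maps PDI fields to Python variables is configured in the step itself and is not shown here.

```python
import numpy as np
import pandas as pd


def derive_fields(rows: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of the incoming rows with two derived columns."""
    out = rows.copy()
    # Derive a value from two existing fields.
    out["line_total"] = out["quantity"] * out["unit_price"]
    # NumPy functions operate directly on the column's underlying array.
    out["log_total"] = np.log1p(out["line_total"].to_numpy())
    return out


# Stand-in for the rows the step would pass to the script (hypothetical data).
incoming = pd.DataFrame(
    {"quantity": [2, 5, 1], "unit_price": [9.5, 3.0, 120.0]}
)
result = derive_fields(incoming)
```

Each column of `result` would then correspond to one output field of the step.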
Access to HCP from PDI
You can now access the Hitachi Content Platform (HCP) distributed storage system from PDI's Virtual File System (VFS) browser. Within HCP, access control lists (ACLs) grant user privileges to perform various file operations. Namespaces are used for logical groupings, access, and object metadata (such as retention and shred settings). Learn more about how to set up access to HCP from PDI.
Streaming Data Improvements
Pentaho Data Integration (PDI) features new steps adapted to the Spark engine in the Adaptive Execution Layer (AEL) and access to Advanced Message Queuing Protocol (AMQP) streaming data.
Increased Spark Capabilities in PDI. The Spark steps are now customized to use the native Spark APIs. These APIs take advantage of the Spark engine, which is designed for both faster processing and distribution of work across hardware resources. Learn more about Spark on AEL in PDI.
AMQP Enhancements in PDI. The Advanced Message Queuing Protocol (AMQP) provides powerful connectivity for producing or consuming live streaming data in Pentaho. You can use the new AMQP Consumer and AMQP Producer transformation steps to build transformations and message queues for IoT data processing as events occur. These steps feature integration with, and secure connectivity to, AMQP message sources, data streams, or monitor alerts, whether on-site or in the cloud.
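The produce-and-consume pattern behind these steps can be sketched in plain Python. This is only an in-memory analogy: a `queue.Queue` stands in for the message broker, no AMQP library is used, and the names `produce`, `consume`, `readings`, and the 30.0 degree alert threshold are all hypothetical.

```python
import queue
import threading

SENTINEL = None  # marks the end of the stream in this sketch


def produce(events, channel):
    """Publish each event to the channel, then signal completion."""
    for event in events:
        channel.put(event)
    channel.put(SENTINEL)


def consume(channel, handled):
    """Process events one at a time, as they arrive."""
    while True:
        event = channel.get()
        if event is SENTINEL:
            break
        # Hypothetical rule: flag readings above 30.0 degrees.
        handled.append({**event, "alert": event["temperature"] > 30.0})


channel = queue.Queue()  # stand-in for a message queue on a broker
handled = []
readings = [
    {"sensor": "s1", "temperature": 21.5},
    {"sensor": "s2", "temperature": 34.0},
]

consumer = threading.Thread(target=consume, args=(channel, handled))
consumer.start()
produce(readings, channel)
consumer.join()
```

In a PDI transformation, the AMQP Producer and AMQP Consumer steps would take the place of `produce` and `consume`, with the broker handling delivery between them.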
Push-based Streaming for Dashboards. You can now create a Pentaho streaming data service. With CTools, you can use this data service to develop a dashboard to display your streaming data. The streaming data is pushed through the data service into your dashboard. Learn more about streaming analytics, streaming data services, and streaming dashboard development.
Improved Data Operation
PDI 8.2 includes more custom data analyzers, an updated execution status interface, and OpenJDK support.
New Data Lineage Analyzers. PDI now includes the following custom metaverse step and entry analyzers for data lineage tracking:
- Hadoop File Input
- Hadoop File Output
- Spark Submit
To view the full list of steps and entries with custom data lineage analyzers, see Data Lineage.
Improved Execution Status Monitoring Window. The PDI Status page, used for viewing details of remotely executed and scheduled transformations and jobs, has been improved for ease of use. The page now has clear graphics featuring controls for running, resuming, pausing, and stopping a transformation or job.
OpenJDK Support. Pentaho now supports both Oracle JDK 8 and OpenJDK 8. This support extends to the Adaptive Execution Layer (AEL). When using AEL with Amazon EMR, you no longer need to install Oracle JDK 8, because AEL can run on OpenJDK 8. See Pentaho software requirements for Java Runtime Environment (JRE) to learn more.
Minor Business Analytics and Data Integration Improvements
In Analyzer, new filters let you compare datasets. In PDI, additional support for metadata injection and an improved JSON step contribute to platform stability and better overall usability.
Analyzer Comparison Filters on Numeric Levels. You can now use comparison filters on numeric dimension levels to filter data for a more focused view. These filters include Greater Than, Less Than, Greater Than or Equals, Less Than or Equals, and Between. For example, you can display the Sales measure with a focus on just customers aged between 20 and 40. See Create a Comparison Filter on a Numeric Level for more information.
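The logic of these filters can be sketched in a few lines of Python. The filter names come from the list above; the mapping to Python operators, the `matches` helper, the sample rows, and the assumption that Between includes both bounds are illustrative only and do not describe how Analyzer evaluates filters internally.

```python
import operator

# Filter names from the text, mapped to Python operators (an assumption).
COMPARISONS = {
    "Greater Than": operator.gt,
    "Less Than": operator.lt,
    "Greater Than or Equals": operator.ge,
    "Less Than or Equals": operator.le,
}


def matches(value, filter_name, *bounds):
    """Return True if value passes the named comparison filter."""
    if filter_name == "Between":
        low, high = bounds
        return low <= value <= high  # assumed to include both bounds
    (bound,) = bounds
    return COMPARISONS[filter_name](value, bound)


# Hypothetical rows: a numeric level (customer age) and a Sales measure.
rows = [
    {"customer_age": 18, "sales": 120.0},
    {"customer_age": 25, "sales": 300.0},
    {"customer_age": 40, "sales": 80.0},
    {"customer_age": 52, "sales": 210.0},
]

# Focus the Sales measure on customers aged between 20 and 40.
focused = [r for r in rows if matches(r["customer_age"], "Between", 20, 40)]
total_sales = sum(r["sales"] for r in focused)
```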
Expanded Metadata Injection Support. You can now inject metadata into any field in additional Pentaho Data Integration (PDI) steps.
Learn more about PDI steps supporting metadata injection.
JSON Enhancements. The JSON Input step now features a new Select Fields window for specifying what fields you want to extract from your source file. The window displays the structure of the source JSON file. Each field in the structure is displayed with a checkbox for you to indicate if it should be extracted from the file. You can also search within the structure for a specific field.
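The idea of showing a file's structure, selecting fields, and searching within it can be approximated as follows. This is a minimal sketch, not the step's implementation: the `flatten` helper, the sample document, and the JSONPath-like notation used for the paths are assumptions made for illustration.

```python
import json


def flatten(node, prefix="$"):
    """Yield a (path, value) pair for every leaf in a parsed JSON document."""
    if isinstance(node, dict):
        for key, child in node.items():
            yield from flatten(child, f"{prefix}.{key}")
    elif isinstance(node, list):
        for index, child in enumerate(node):
            yield from flatten(child, f"{prefix}[{index}]")
    else:
        yield prefix, node


# Hypothetical source document.
source = json.loads(
    '{"order": {"id": 17, "customer": {"name": "Ada", "age": 36}},'
    ' "items": [{"sku": "A1"}, {"sku": "B2"}]}'
)

# The structure a selection window could display: one entry per field.
structure = dict(flatten(source))

# "Checking" two fields, then extracting only those.
selected = {"$.order.id", "$.items[1].sku"}
extracted = {path: value for path, value in structure.items() if path in selected}

# Searching the structure for a specific field name.
search_hits = [path for path in structure if "sku" in path]
```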
PDI Steps Deprecated. PDI transformation steps and job entries for SAP, Palo, and OpenERP are deprecated in Pentaho 8.2. You can now find these steps and entries in the Deprecated folder of the Design tab in the Explore pane of the PDI client.