The Pentaho 8.0 Enterprise Edition delivers a wide range of new features, from near real-time monitoring using streaming ingestion in PDI to enhanced Adaptive Execution Layer (AEL) functionality and expanded data exploration capabilities. Pentaho 8.0 also continues to build on the investments made in big data security, run configurations and ease-of-use improvements to the ways you can set up and use your data.
AEL: Enhanced and Simplified
The Adaptive Execution Layer (AEL) is now compatible with the Spark libraries packaged with Cloudera, Hortonworks and Apache distributions. Additionally, AEL now features a faster startup time, enhanced security features and improved performance when executing transformation (.ktr) files.
Kafka and Streaming Ingestion in PDI
PDI now supports Kafka streaming with both data ingestion (Kafka Consumer step) and data publishing (Kafka Producer step). You can leverage Kafka streaming for analysis, monitoring, and near real-time alerting. You can connect to a streaming data source such as Kafka, then continually ingest streaming data. For example, PDI with Kafka can consume clickstream events from web applications, trade data, and point-of-sale records. These streaming ingestion features are aimed at data architects and engineers, ETL developers, and IT administrators. PDI also supports secure communication and user authentication with Kafka.
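As a conceptual illustration of near real-time alerting on continually ingested events, the following Python sketch raises an alert when a given event type occurs too often within a trailing time window. It is not PDI or Kafka code: the `alerts` function, the `(timestamp, event_type)` event shape, and the in-memory `sample` list are all hypothetical stand-ins for a stream that a real pipeline would read from a Kafka topic.

```python
from collections import deque
from typing import Iterable, Iterator, Tuple

# Hypothetical event shape: (timestamp_seconds, event_type).
Event = Tuple[float, str]


def alerts(events: Iterable[Event], event_type: str,
           threshold: int, window_seconds: float) -> Iterator[float]:
    """Yield the timestamp of each matching event that brings the number of
    matching events inside the trailing window to the threshold or above."""
    recent = deque()  # timestamps of matching events still inside the window
    for ts, kind in events:
        if kind != event_type:
            continue
        recent.append(ts)
        # Drop timestamps that have fallen out of the trailing window.
        while recent and ts - recent[0] > window_seconds:
            recent.popleft()
        if len(recent) >= threshold:
            yield ts


# In-memory stand-in for a stream (assumption for illustration only).
sample = [(0.0, "error"), (1.0, "click"), (2.0, "error"),
          (3.0, "error"), (20.0, "error")]
fired = list(alerts(sample, "error", threshold=3, window_seconds=10.0))
print(fired)
```

Because `alerts` is a generator, it processes each event as it arrives rather than waiting for the whole data set, which is the property that makes this style of check suitable for streaming input.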
Big Data Security: Named Clusters and Knox Support
Pentaho now supports Apache Knox on the Hortonworks distribution of Hadoop, providing a secure, single point of access to Hadoop components on a cluster. This new integration allows customers to securely and transparently leverage PDI when they are using Knox-protected Hortonworks clusters for their big data processing. When you use Knox in conjunction with Apache Ranger, you can control user-level access in your cluster. In addition to access control, Ranger lets you create an audit trail of user access and actions.
Create Pentaho Server Run Configurations
Some ETL activities are lightweight, such as loading a small text file and writing it out to a database. For these activities, you can run your transformation locally using the default Pentaho engine. Other ETL activities are more demanding, containing many steps calling other steps or a network of transformation modules. For these activities, you can set up a run configuration dedicated to running transformations on the Pentaho Server.
Using worker nodes, you can now elastically scale PDI transformations and jobs (i.e. work items) easily and securely, while coordinating and monitoring them at the same time. To monitor the work item status, you can check a web-based interface or use the existing functionality of Pentaho Operations Mart or SNMP log data. This functionality allows PDI workloads to run effectively at scale, coordinating and monitoring the items sent to the worker nodes. When the execution is complete, the completed work items are returned to the Pentaho Server.
Filters for Inspecting Your Data
When inspecting your data, you can further explore transformation data by applying filters and viewing the results in your visualizations. You can add filters by dragging fields to the Filters panel or by performing actions within the visualization. You can also add multiple filters, with each individual filter further refining the data. The Filters panel tooltip displays a summary of the applied filters. Filters can be applied in both the Stream and Model Views. You can also apply filters to flat tables, charts, and pivot tables. When you exit the Data Explorer, all of your visualizations and their corresponding filters are remembered.
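The way multiple filters progressively refine the data can be pictured with a short Python sketch. This is a conceptual illustration only, not how the Data Explorer is implemented: it assumes that filters combine conjunctively (a row must pass every filter), and the `apply_filters` function and the sample rows are hypothetical.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Filter = Callable[[Row], bool]


def apply_filters(rows: List[Row], filters: List[Filter]) -> List[Row]:
    """Keep only the rows that satisfy every filter (logical AND)."""
    return [row for row in rows if all(f(row) for f in filters)]


# Hypothetical sample data for illustration.
rows = [
    {"region": "East", "sales": 120},
    {"region": "East", "sales": 40},
    {"region": "West", "sales": 300},
]

# One filter narrows the data; adding a second narrows it further.
east_only = apply_filters(rows, [lambda r: r["region"] == "East"])
east_large = apply_filters(
    rows,
    [lambda r: r["region"] == "East", lambda r: r["sales"] > 100],
)
print(len(east_only), len(east_large))
```

Under this assumption, each added filter can only keep or shrink the result set, never enlarge it.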
Additional Big Data Formats
Pentaho 8.0 adds support for Avro and Parquet data formats in PDI. For big data users, the improved Avro and Parquet input/output transformation steps ease the process of gathering raw data from various sources and moving that data into the Hadoop ecosystem to create a useful, summarized data set for analysis. Organizations can now design big data pipelines with optimal read performance and storage usage by leveraging the familiar, easy-to-use PDI drag-and-drop interface. The Avro and Parquet steps can be used in transformations running on the Kettle engine or the Spark engine via the Adaptive Execution Layer (AEL).
Use the following links to learn more about these steps: