Pentaho 7.1 delivers a wide range of new capabilities, from enhanced big data features to time-saving execution and advanced data exploration functionality. The highlight of the release is the ability to run transformations with a Spark engine in Pentaho Data Integration (PDI). Pentaho 7.1 also continues to build on the investments made in big data security and ease-of-use improvements to the enterprise platform.
Smarter Data Processing using the Adaptive Execution Layer (AEL)
Pentaho uses the Adaptive Execution Layer (AEL) for running transformations on different engines. On CDH 5.10 and later clusters, you can now run your transformations on a Spark execution engine. AEL builds a transformation definition for Spark, which moves execution directly to the cluster, leveraging Spark's ability to coordinate large amounts of data over multiple nodes. Learn more about the Adaptive Execution Layer.
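Conceptually, the Spark engine gains its speed by partitioning a transformation's data and processing the partitions in parallel across cluster nodes, then combining the per-partition results. The following standalone Python sketch illustrates that partition-and-combine pattern only; it uses a local thread pool and a made-up doubling step, not the AEL or Spark APIs:

```python
from concurrent.futures import ThreadPoolExecutor

def partition(rows, n):
    # Split rows into n roughly equal partitions. Spark does this across
    # HDFS blocks and cluster nodes rather than inside one process.
    return [rows[i::n] for i in range(n)]

def transform_partition(rows):
    # Stand-in for one transformation step applied to a single partition.
    return sum(r * 2 for r in rows)

rows = list(range(1, 101))
with ThreadPoolExecutor(max_workers=4) as pool:
    partials = list(pool.map(transform_partition, partition(rows, 4)))

# Combine per-partition results into the final answer.
total = sum(partials)
```

In AEL's case, the engine generates the Spark execution plan from the transformation definition, so the same transformation can run either locally in the Pentaho engine or distributed on the cluster.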
Drill Down Deeper on Your Data In-Flight
Easily customize your model by drilling down into the data directly in the visualization. When you double-click a drill-down field in your visualization, that field appears in the Applied Filters panel. When you exit the inspection environment in PDI, your tabs are remembered when you re-open the step or log on to PDI in a later session. Learn more about these new features in Inspecting Your Data.
Azure HDInsight Cluster Configuration Now Supported
You can now use PDI to connect to a Microsoft Azure HDInsight cluster. Azure HDInsight is a Microsoft web service for big data processing and analysis that offers an alternative to hosting a cluster in-house. Pentaho supports both HDFS and WASB (Windows Azure Storage Blob) storage. Learn more about connecting to Azure HDInsight.
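Paths on WASB storage use the standard `wasb://` (or `wasbs://` over TLS) URI scheme, in which the container and storage account appear in the URI authority. A small illustrative helper for building such URIs (the function name and arguments are my own, not a Pentaho API; the URI layout is the standard WASB form):

```python
def wasb_uri(container: str, account: str, path: str, secure: bool = True) -> str:
    """Build a WASB URI: wasb[s]://<container>@<account>.blob.core.windows.net/<path>."""
    scheme = "wasbs" if secure else "wasb"
    return f"{scheme}://{container}@{account}.blob.core.windows.net/{path.lstrip('/')}"

print(wasb_uri("raw-data", "mystorageacct", "/input/sales.csv"))
# → wasbs://raw-data@mystorageacct.blob.core.windows.net/input/sales.csv
```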
Expanded Big Data Security
Pentaho Server support for Kerberos impersonation has been expanded to Hortonworks Hadoop clusters. This allows multiple PDI users to access Kerberos-enabled Hortonworks Hadoop clusters as multiple authenticated Hadoop users. The Pentaho Server controls access to specific data within Hadoop, based on a user's role in the organization, to enforce enterprise data authorization, Kerberos authentication, and secure impersonation on Hadoop clusters. Learn more in Use Secure Impersonation to Access a Hortonworks Cluster.
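The pattern behind this is the proxy-user ("secure impersonation") model: the server authenticates once with its own Kerberos credentials, then performs each request on behalf of the end user, so authorization is enforced against that user's permissions rather than the server's. A minimal sketch of the idea in plain Python (the class, role names, and paths are hypothetical; real impersonation goes through Kerberos and Hadoop's proxy-user configuration):

```python
# Hypothetical role-to-path grants; in practice authorization is
# enforced by the Hadoop cluster itself, not an in-memory dict.
ROLE_GRANTS = {
    "analyst": {"/warehouse/sales"},
    "hr_admin": {"/warehouse/sales", "/warehouse/hr"},
}

class ImpersonatingServer:
    """Service that authenticates once, then acts on behalf of end users."""

    def __init__(self, service_principal: str):
        # In practice the server would log in via a Kerberos keytab (assumed here).
        self.service_principal = service_principal

    def read_as(self, end_user: str, role: str, path: str) -> str:
        # Access is decided by the *end user's* role, not by the
        # service account's own (broader) privileges.
        if path not in ROLE_GRANTS.get(role, set()):
            raise PermissionError(f"{end_user} ({role}) may not read {path}")
        return f"{path} read on behalf of {end_user}"

server = ImpersonatingServer("pentaho/server@EXAMPLE.COM")
ok = server.read_as("alice", "analyst", "/warehouse/sales")
```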
Simpler Specification of PDI Sub-Transformations
The following PDI entries and steps now have a simpler way of specifying related sub-transformations in the PDI client:
- Transformation (Job Entry)
- Job (Job Entry)
- Pentaho MapReduce
- ETL Metadata Injection
- Transformation Executor
- Job Executor
- Simple Mapping
- Single Threader
Previously, sub-transformations were specified through three separate fields, depending on whether your files were in a repository or on a file system. Now, sub-transformations are specified through a single field with a Browse button that lets you browse either a repository or a file system.
Improved Monitoring of System Performance with DI Operations Mart
Monitor your Pentaho Data Integration activities with the DI Operations Mart. The DI Operations Mart collects logging data from your transformations and jobs into a PostgreSQL data warehouse organized as a star schema. You can access this data warehouse with Pentaho Server tools to examine log reports, charts, and dashboards. Learn more about Data Integration Operations Mart.
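A star schema keeps one central fact table of execution measurements joined to small dimension tables that describe each run. The sketch below builds a toy version in SQLite to show the shape of such a query; the table and column names are invented for illustration, and the actual Operations Mart schema in PostgreSQL is considerably richer:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Dimension table: one row per transformation (descriptive attributes).
cur.execute("CREATE TABLE dim_transformation (trans_id INTEGER PRIMARY KEY, trans_name TEXT)")
# Fact table: one row per execution (measurements keyed to the dimension).
cur.execute("CREATE TABLE fact_execution (trans_id INTEGER, status TEXT, duration_sec REAL)")

cur.executemany("INSERT INTO dim_transformation VALUES (?, ?)",
                [(1, "load_sales"), (2, "clean_customers")])
cur.executemany("INSERT INTO fact_execution VALUES (?, ?, ?)",
                [(1, "success", 12.5), (1, "failed", 3.1), (2, "success", 8.0)])

# Typical operations-mart question: how often did each transformation
# run, and how long did it take on average?
rows = cur.execute("""
    SELECT d.trans_name, COUNT(*) AS runs, AVG(f.duration_sec) AS avg_sec
    FROM fact_execution f JOIN dim_transformation d USING (trans_id)
    GROUP BY d.trans_name ORDER BY d.trans_name
""").fetchall()
```

The same join-fact-to-dimensions pattern is what the Pentaho Server reporting tools run against the Operations Mart database.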