Pentaho 7.0 delivers a wide range of new capabilities, from enhanced big data security features to time-saving advanced data exploration functionality. The highlight of the release is the ability to visually explore your data anywhere in the pipeline while working in Pentaho Data Integration (PDI). Pentaho 7.0 also continues to build on the investments made in big data security, a unified Pentaho Server, metadata injection, and many other improvements to the enterprise platform.
Inspect Your Data Anywhere in the Pipeline
You can now spot check your transformation data in-flight using PDI, without having to switch in and out of tools. By previewing the data and using an array of available visualizations, you can test, refine, and prep the data set for specific data models before publishing it as a source for analytical reports. Visualizing data in-flight and during the data-prep process is unique to Pentaho. Learn more about Inspecting Your Data.
Improved Big Data Security
The new Pentaho Server supports more secure big data integration through Kerberos impersonation, which allows multiple PDI users to access Kerberos-enabled Cloudera Hadoop clusters as distinct authenticated Hadoop users. The Pentaho Server also works with Cloudera Sentry to control access to specific data within Hadoop based on a user's role in the organization, enforcing enterprise data authorization, Kerberos authentication, and secure impersonation on Hadoop clusters. Learn more about Setting Up User Security.
Easier Installation with the Single Pentaho Server
Pentaho’s Business Analytics (BA) Server and Data Integration (DI) Server have been merged and renamed the Pentaho Server. Since the BA and DI components are now integrated, the complexity of installing and configuring two separate servers is eliminated. Learn more about Installation and Configuration of the Pentaho Server.
Enhanced Spark Support
- The Spark Submit job entry now supports Spark code written in Scala and Python. Learn more about the Spark Submit job entry.
- The Spark Submit job entry now supports Kerberos authentication on Cloudera (CDH) distributions of Hadoop clusters. Learn more about Using Kerberos Authentication with Spark Submit.
- Spark SQL databases can now be used as data sources when working with Hortonworks (HDP) distributions of Hadoop clusters. Learn more about Spark SQL data source support.
Metadata Injection Support
You can now inject metadata into any field of the following Pentaho Data Integration steps:
- Big Data - Avro Input, Cassandra Input, Cassandra Output, CouchDB Input, Hadoop File Input, Hadoop File Output, HBase Input, HBase Output, HBase Row Decoder, MapReduce Input, MapReduce Output, MongoDB Input, MongoDB Output
- Bulk Loading - Greenplum Load, MySQL Bulk Loader, Oracle Bulk Loader, Vertica Bulk Loader
- Data Warehouse - Combination Lookup / Update
- Flow - Annotate Stream, Append Streams, ETL Metadata Injection, Filter Rows, Shared Dimension, Switch / Case
- Input - Get Table Names, JSON Input
- Job - Get Variables
- Joins - Join Rows (Cartesian product), Merge Join, Merge Rows, Multiway Merge Join, Sorted Merge, XML Join
- Output - Insert / Update, Synchronize After Merge, Update
- Statistics - Memory Group By
- Transform - Add XML, Replace in String
- Utility - If Field Value is Null, Null If
- Validation - Data Validator
Learn more about the ETL Metadata Injection step.
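Conceptually, metadata injection lets a driving transformation supply another transformation's step settings at run time instead of hard-coding them at design time. The sketch below illustrates that idea in plain Python; the names (`make_csv_input_step`, `inject`) and the dictionary layout are purely illustrative and are not Pentaho APIs.

```python
# Conceptual sketch of metadata injection (illustrative only; not the Pentaho API).
# A "template" step is defined with unresolved settings, and a driver supplies
# the real metadata at run time, e.g. from a config table or a file header.

def make_csv_input_step():
    # Template step: settings are left unresolved until injection.
    return {"type": "CSV Input", "filename": None, "delimiter": None, "fields": []}

def inject(step, metadata):
    # Driver-side injection: fill the template's settings from metadata
    # discovered at run time, leaving the template itself unchanged.
    resolved = dict(step)
    resolved.update(metadata)
    return resolved

template = make_csv_input_step()
runtime_metadata = {"filename": "sales.csv", "delimiter": ";", "fields": ["id", "amount"]}
step = inject(template, runtime_metadata)
print(step["filename"], step["delimiter"])
```

In PDI, the ETL Metadata Injection step plays the role of the driver, and the fields listed above are the settings it can now populate in each supported step.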
We have added documentation on how to add metadata injection support to steps you have created as plugins. For more information, see Add Metadata Injection Support to Your Step.
Improved Repository Management
Now, with just a single click from the PDI client, you can create and manage your repositories using easy-to-follow steps. Learn more about Work with Repositories.
Agile BI Plugin
Agile BI will no longer ship in a Pentaho distribution. Agile BI is still available as a Marketplace plugin.