Learn about R Script Executor step improvements, new DI Server administration features, enhanced Kerberos support for HDP and Cloudera, and Upgrade Utility changes.
Pentaho Data Integration 5.2 delivers many exciting and powerful features that help you quickly and securely access, blend, transform, and explore data.
New Streamlined Data Refinery Feature
The Streamlined Data Refinery (SDR) is a simplified, ad hoc ETL refinery composed of a series of PDI jobs. These jobs take raw data, augment and blend it according to the choices made in a request form, and then publish it to the BA Server for report designers to use in Analyzer.
R Script Executor Step Improvements
The R Script Executor, Weka Forecasting, and Weka Scoring steps form the core of the Data Science Pack and transform PDI into a powerful predictive analytics tool. The R Script Executor step lets you incorporate R scripts into your transformation so that you can include R-based statistical programming in your data flow. In PDI version 5.2, you can "plug and play" R scripts without extra customization. You can also pass incoming field metadata through to the output field metadata, use a more intuitive user interface to run scripts by rows or by batches, and test scripts.
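Whether a script runs by rows or by batches can change its results, not just its speed, because a script that computes statistics sees only the rows it is given in each call. The sketch below illustrates that difference in Python rather than R. It is a conceptual model under stated assumptions, not the step's implementation: `run_by_rows`, `run_by_batches`, `add_zscore`, and the list-of-dicts "frame" standing in for an R data frame are all hypothetical names chosen for illustration.

```python
from typing import Callable, Iterable, Iterator

Row = dict    # hypothetical: one incoming row as a field-name -> value mapping
Frame = list  # hypothetical stand-in for an R data frame: a list of rows


def run_by_rows(rows: Iterable[Row], script: Callable[[Frame], Frame]) -> Iterator[Row]:
    """Invoke the script once per incoming row, passing a one-row frame."""
    for row in rows:
        yield from script([row])


def run_by_batches(
    rows: Iterable[Row], script: Callable[[Frame], Frame], batch_size: int
) -> Iterator[Row]:
    """Buffer rows and invoke the script once per batch of up to batch_size rows."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: Frame = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            yield from script(batch)
            batch = []
    if batch:  # flush the final, possibly partial, batch
        yield from script(batch)


def add_zscore(frame: Frame) -> Frame:
    """Hypothetical 'script': standardizes field x within the frame it receives."""
    xs = [r["x"] for r in frame]
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    sd = var ** 0.5 or 1.0  # avoid dividing by zero for one-row frames
    # Pass the incoming fields through unchanged and append a new output field z.
    return [{**r, "z": (r["x"] - mean) / sd} for r in frame]
```

With the rows `x = 1, 2, 3, 4`, running by rows yields `z = 0.0` for every row (each call sees a single value), while running in batches of two yields `z = -1.0, 1.0, -1.0, 1.0` (each call standardizes within its own pair). Batch mode is the natural choice when the script needs to see many rows at once, as most statistical models do.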
New DI Server Administration Features
Porting content from one environment to another and performing general DI Repository maintenance are easier with the new Purge Utility. The Purge Utility permanently removes versions of shared objects, such as database connection information, jobs, and transformations, from the repository. You can also turn DI Repository versioning and comment capturing on and off.
Kerberos Security Support for CDH 5.1 and HDP 2.1
If you already use Kerberos to authenticate access to a Cloudera's Distribution Including Apache Hadoop (CDH) 5.1 or Hortonworks Data Platform (HDP) 2.1 cluster, a little extra configuration lets you also use Kerberos to authenticate Pentaho DI users who need to access those clusters.
New Marketplace Plugins
Pentaho Marketplace continues to grow with your contributions. It is a home for community-developed plugins and a place where you can contribute, learn from and benefit from the work of others, and connect with other users. New contributions include:
- LookupTimeDimensionStep: Looks up and creates entries in a data warehouse time dimension table and returns the ID.
- Probabilistic Row Distributions: Contains a collection of Row Distribution plugins for PDI that use probabilistic methods for determining the distribution of rows.
- PDI Groovy Console: Adds a Groovy console to the Help menu that has helper methods and classes that interact with the PDI environment.
- Gremlin Script Step: Provides a Gremlin script step for graph pipeline processing.
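A time dimension step of the kind LookupTimeDimensionStep describes typically follows a "lookup, else insert" pattern: find the row for a given time, create it if it is missing, and return its surrogate key. The sketch below shows that general pattern using Python's built-in `sqlite3` module. It is not the plugin's code; the `dim_time` table, its minute-level granularity, its columns, and the functions `create_dim_time` and `get_time_id` are hypothetical.

```python
import sqlite3
from datetime import datetime


def create_dim_time(conn: sqlite3.Connection) -> sqlite3.Connection:
    """Create a hypothetical minute-grain time dimension table."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS dim_time (
               time_id INTEGER PRIMARY KEY AUTOINCREMENT,
               hour    INTEGER NOT NULL,
               minute  INTEGER NOT NULL,
               label   TEXT,
               UNIQUE (hour, minute)
           )"""
    )
    return conn


def get_time_id(conn: sqlite3.Connection, ts: datetime) -> int:
    """Return the surrogate key for ts's hour and minute, inserting a row if none exists."""
    hour, minute = ts.hour, ts.minute
    # Lookup: reuse the existing dimension row when there is one.
    row = conn.execute(
        "SELECT time_id FROM dim_time WHERE hour = ? AND minute = ?", (hour, minute)
    ).fetchone()
    if row is not None:
        return row[0]
    # Create: insert the missing row and return its newly assigned key.
    cur = conn.execute(
        "INSERT INTO dim_time (hour, minute, label) VALUES (?, ?, ?)",
        (hour, minute, f"{hour:02d}:{minute:02d}"),
    )
    return cur.lastrowid
```

For example, `13:45` on two different days maps to the same `time_id`, while `08:05` receives a new key. The `UNIQUE (hour, minute)` constraint guards against duplicate rows if two writers race on the same insert.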
Improved Upgrade Experience
Upgrading PDI is easier because it is no longer a manual process. You can now upgrade from 5.1.x to 5.2 using the same upgrade utility used for patch releases.
There is now only one upgrade guide instead of two.
Minor Functionality Changes
To learn more about minor functionality changes that might impact your upgrade or migration experience, see the PDI 5.1 to 5.2 Functionality Changes article.