Learn about new features in Pentaho 5.4.
Pentaho 5.4 delivers many exciting and powerful features that help you quickly and securely access, blend, transform, and explore data. Highlights include new features and improvements for the Streamlined Data Refinery, Analyzer APIs and documentation, improvements to the Pentaho Operations Mart, new scheduling PDI APIs, POST methods for Carte, new support for SAP HANA, Sqoop, and Spark, clustering improvements for Hadoop, and some minor functionality improvements around Pentaho Interactive Reports and PDI steps and entries.
Pentaho Business Analytics 5.4
Our new features and improvements will help you work with Analyzer APIs, effectively use the Streamlined Data Refinery, and monitor system performance with the Pentaho Operations Mart.
Streamlined Data Refinery - New Features
We have added two new features to the Streamlined Data Refinery: the Shared Dimension step and the Create Link Dimension annotation type. The Shared Dimension step allows you to share groups of annotations and dimensions with other users. The Create Link Dimension annotation lets you reuse a previously created Shared Dimension.
SDR Documentation Updates
The SDR documentation has been restructured to make it more accessible. The PDI steps and entries that are used for creating models for the SDR have been incorporated into the official documentation, and expanded to include definitions of key terms, discussion on the use of each step, and example workflows for the job entry or step. These can be found under the section called Building Blocks for the SDR Model.
New Analyzer APIs & Documentation Updates
We have added new APIs to provide more control over Analyzer when working in an embedded fashion. These APIs allow for more fine-grained interaction with the Analyzer reports and data. The Analyzer extensibility APIs live in a single place, and include introductory material, as well as samples.
Pentaho Operations Mart
Improvements have been made to the Pentaho BA Operations Mart, which helps you monitor system performance information such as how long it takes to run certain reports. Several bugs have been fixed, and MySQL, MS SQL, and Oracle are now fully supported. We’ve also included a cleanup job so that you can schedule when Pentaho Operations Mart data is deleted.
Pentaho Data Integration 5.4
These new and powerful features will help you quickly and securely access, blend, transform, and explore data with Pentaho Data Integration.
New Support for Spark
If you are already using Spark, you can use the new Spark Submit job entry to orchestrate the execution of a Spark Java application on a CDH 5.3 cluster. Spark is an open-source cluster computing framework that is an up-and-coming alternative to the Hadoop MapReduce paradigm.
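The job entry ultimately drives a spark-submit command line. Here is a minimal sketch of the equivalent invocation, assuming a CDH 5.3 parcel layout and a hypothetical application jar and main class (none of these values come from the Pentaho documentation):

```shell
# Sketch of the spark-submit call the Spark Submit entry assembles.
# SPARK_HOME, the jar name, and the class name are illustrative assumptions.
SPARK_HOME=/opt/cloudera/parcels/CDH/lib/spark   # assumed CDH 5.3 parcel path
APP_JAR=my-spark-app.jar                         # your compiled Spark Java application
MAIN_CLASS=com.example.MySparkJob                # the application's entry-point class

# yarn-cluster was the Spark 1.x master value for running on a YARN cluster.
CMD="$SPARK_HOME/bin/spark-submit --master yarn-cluster --class $MAIN_CLASS $APP_JAR"
echo "$CMD"
```

In the job entry you supply the same pieces (Spark installation path, application jar, main class, and master) through dialog fields rather than on the command line.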
Support for SAP HANA
If you are an SAP HANA user, you can now connect to your SAP HANA database in Pentaho Data Integration. The SAP HANA database has been added to the list of available database connections. Users, such as data scientists, can use this new connection to both access data from SAP HANA and load data into SAP HANA.
When connecting to your SAP HANA database, note the following settings information:
- For SAP customers, the JDBC driver is part of your client tools.
- The default port number is '30015', where '00' is the two-digit instance number of the machine you are connecting to. For example, you can connect to instances 00, 01, and 10 on the same machine using ports '30015', '30115', and '31015', respectively.
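The port pattern above amounts to simple string substitution: the two-digit instance number sits in the middle of the port. A small sketch, where the hostname in the JDBC URL is a placeholder:

```shell
# SAP HANA SQL port pattern: 3<NN>15, where NN is the two-digit instance number.
instance=00
port="3${instance}15"
echo "$port"          # instance 00 -> port 30015

instance=01
port2="3${instance}15"
echo "$port2"         # instance 01 -> port 30115

# The resulting JDBC URL ('hana-host' is a placeholder hostname):
jdbc_url="jdbc:sap://hana-host:${port}"
echo "$jdbc_url"
```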
New Hadoop Configurations Supported
You can now use PDI to connect to Amazon Elastic MapReduce (EMR) version 3.4. EMR is an Amazon Web Services (AWS) offering for big data processing and analysis that is a popular alternative to hosting an in-house computing cluster. We’ve also rolled the 5.3 patch release Hadoop Configurations support into this release. We now support the MapR 4.0.1, HDP 2.2, and CDH 5.3 configurations, right out of the box.
It is now easier to automate the deployment of a job that runs on a Kettle cluster on YARN. You can now indicate that you want your local configuration files to be copied to Kettle Cluster nodes from the Start a YARN Kettle Cluster job entry. You can also copy other files to the cluster from a centralized YARN folder.
Call BA Platform Endpoints in your Transformations
The Call Endpoint step helps users who want to build custom solutions for the BA Platform using the Pentaho API. The Pentaho API is REST-based, so it can be accessed through standard HTTP calls. This step exposes the Pentaho public API and offers an interface that allows advanced users to build custom solutions that interact with a running BA Server. This step is supported by two additional steps, Get Session Variables and Set Session Variables.
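Because the Pentaho API is REST over HTTP, the calls the Call Endpoint step makes can also be issued with any HTTP client. The sketch below only builds and prints such a request; the server address, credentials, and endpoint path are illustrative assumptions, not documented values:

```shell
# Build a sample HTTP request against a running BA Server.
# BASE_URL, the credentials, and ENDPOINT are hypothetical placeholders.
BASE_URL="http://localhost:8080/pentaho"
ENDPOINT="api/session/userName"            # illustrative endpoint path
REQUEST="curl -u admin:password $BASE_URL/$ENDPOINT"
echo "$REQUEST"                            # printed here rather than executed
```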
Named Hadoop Cluster Configuration Improvements
The Named Hadoop Configurations feature is now supported by the following steps and entries: Sqoop Input, Sqoop Output, Hadoop File Input, Hadoop File Output, and Hadoop Copy Files.
Sqoop bug fixes have been made so that the Sqoop steps work more reliably. Improved support for command-line arguments and Hadoop configurations has also been included.
PDI API Improvements
We have added new APIs to provide more control over PDI scheduling and have added many POST methods for Carte. The PDI APIs are gathered into a single location, and include introductory material as well as samples.
Scheduling APIs for PDI
We have implemented APIs for creating, reading, updating, deleting, and listing schedules for the DI Server.
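Carte's web services follow the same pattern: each method is an HTTP endpoint on the slave server, and the new POST methods can be exercised from any HTTP client. The sketch below assembles (without executing) one such POST; the host, port, and payload file are illustrative assumptions, and 'cluster' is the well-known Carte default credential:

```shell
# Assemble a sample POST to a Carte slave server.
# Host, port, and the XML payload file are hypothetical placeholders.
CARTE_URL="http://localhost:8081/kettle/registerTrans/?xml=Y"
REQUEST="curl -u cluster:cluster -X POST -d @trans-config.xml $CARTE_URL"
echo "$REQUEST"                            # printed here rather than executed
```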
New PDI Look and Feel
PDI icons have been redesigned and are now more modern and intuitive. The PDI Embed and Extend guide also contains a handy section for developers and artists who want to create new icons for their plugins.
Minor Functionality Changes
Minor functionality changes cover changes to the software that might impact your upgrade or migration experience. If you are migrating from a version earlier than 5.4, check the What's New and Minor Functionality Changes articles for each intermediate version of the software.
Google Analytics Step Changes
Google Analytics APIs now use the OAuth 2.0 protocol for authentication and authorization. To accommodate this change, we have updated the Google Analytics step. The Google Analytics step can be found under the Input category in Spoon.
A similar step named Google Analytics Input is under the Deprecated category in Spoon. The Google Analytics Input step no longer works. Use the new Google Analytics step instead.
If you have existing transformations and jobs created in pre-5.4.0 versions of PDI, and they contain the Google Analytics step, you will need to make changes so that the transformations and jobs can still run. If you do not, and you attempt to run the transformation, an error will occur.
Here is what you need to do.
- Follow the instructions here (http://wiki.pentaho.com/display/EAI/Google+Analytics) to generate a private key, create a service account, and to add the service account's email as a user on your Google Analytics account. You only need to set this up once.
- For each existing transformation or job from a pre-5.4.0 version of PDI that contains the Google Analytics step, do the following.
- Open the transformation in Spoon.
- Open the Google Analytics step’s window and enter the Google Developer Service Account's email address in the OAuth Server Email field.
- Enter the path to the P12 private key in the Key File field.
- Click OK to save and close the Google Analytics step’s window.
- Save the transformation.
Text File Output Step's Split Row Fix
Previously, if you wanted output files in the Text File Output step to contain a specific number of rows, each file would contain one less row than you specified. For example, if you indicated you wanted to output 50 rows of data per file, 49 rows of data per file were generated instead.
This bug has been fixed so that output files contain the exact number of rows you indicate. If you are migrating or upgrading to 5.4 and you have transformations that use the Text File Output step, adjust the Split every ... rows field so that each output file contains the intended number of rows.
Roll Back to Previous System Row Limit for PIR
We have provided a way for administrators to designate the system maximum row limit for Pentaho Interactive Reports at the previous default level, if desired.
Help Site Improvements
This section covers changes to the Pentaho Help site that impact your user experience.
New Search Tool
Our new search provides a decision tree interface that filters results based on your selections. Begin by entering your search query into the search box.
At the top of the page, you'll see a new filtering tool called a carousel. Typically, the carousel features a row of buttons corresponding to each Pentaho software version included in the help site. Click the version you want to search.
The next set of buttons in the carousel will be category or guide pages along with their corresponding images. Click the arrows on either side of the row to scroll and find more categories or guides to further narrow your search results.
If there are no results matching your search term under a specific category or guide page, that page will not show in the carousel, thus providing you with a more focused list of search results.
This new search tool is also auto-faceted. If you begin your search on a specific category page, the results returned will only be in that category. You can use the carousel to expand your results to other versions, categories, and guides.