What's new in Pentaho 8.3
Pentaho 8.3 Enterprise Edition delivers a range of features and enhancements, from improved access to data stored in Snowflake and Hitachi Content Platform (HCP) to expanded Spark capabilities in Pentaho Data Integration (PDI). It also continues to improve the overall Pentaho platform experience.
You can now connect to Snowflake as a Pentaho data source. Snowflake is a fully relational ANSI SQL data warehouse-as-a-service running in the cloud. You can now use your Snowflake data in ETL activities with PDI or visualize data with Analyzer, Pentaho Interactive Reports, and Pentaho Report Designer. For information about connecting Pentaho to your Snowflake data warehouse, see the Components reference and JDBC drivers reference.
With the Pentaho Snowflake plugin, you can now use PDI to manage additional Snowflake tasks, including bulk loading data into a Snowflake data warehouse.
Using the Snowflake plugin job entries in PDI, data engineers can set up virtual warehouses (compute clusters), bulk load data, and start and stop virtual warehouses. You can scale Snowflake virtual warehouses up and down, reconfigure them, or suspend them when not in use to reduce costs.
Download and install the Snowflake plugin for Pentaho 8.3 to access the following job entries:
- Bulk load into Snowflake
- Create Snowflake warehouse
- Modify Snowflake warehouse
- Delete Snowflake warehouse
- Start Snowflake warehouse
- Stop Snowflake warehouse
For more information, including how to download and install the plugin, see PDI and Snowflake.
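Each warehouse job entry corresponds to an operation that Snowflake also exposes through SQL. The following Python sketch builds the rough SQL equivalent of several entries. It assumes standard Snowflake statements and does not describe what the plugin actually sends; the function names and all warehouse, table, and stage names are hypothetical.

```python
"""Sketch: Snowflake SQL roughly equivalent to the warehouse job entries.

Assumption: standard Snowflake statements chosen for illustration; the PDI
plugin's generated SQL may differ. Names are interpolated without quoting,
so do not pass untrusted input.
"""

VALID_SIZES = {"XSMALL", "SMALL", "MEDIUM", "LARGE", "XLARGE"}


def _check_size(size: str) -> str:
    size = size.upper()
    if size not in VALID_SIZES:
        raise ValueError(f"unsupported warehouse size: {size}")
    return size


def create_warehouse(name: str, size: str = "XSMALL", auto_suspend_s: int = 300) -> str:
    # AUTO_SUSPEND stops the warehouse after it has been idle this many seconds;
    # AUTO_RESUME restarts it when a query arrives.
    return (
        f"CREATE WAREHOUSE IF NOT EXISTS {name} "
        f"WITH WAREHOUSE_SIZE = '{_check_size(size)}' "
        f"AUTO_SUSPEND = {auto_suspend_s} AUTO_RESUME = TRUE"
    )


def modify_warehouse(name: str, size: str) -> str:
    # Scaling up or down means resizing the existing warehouse.
    return f"ALTER WAREHOUSE {name} SET WAREHOUSE_SIZE = '{_check_size(size)}'"


def start_warehouse(name: str) -> str:
    return f"ALTER WAREHOUSE {name} RESUME"


def stop_warehouse(name: str) -> str:
    # A suspended warehouse does not consume compute credits.
    return f"ALTER WAREHOUSE {name} SUSPEND"


def delete_warehouse(name: str) -> str:
    return f"DROP WAREHOUSE IF EXISTS {name}"


def bulk_load(table: str, stage: str, file_type: str = "CSV") -> str:
    # COPY INTO loads files that were already placed in a named stage.
    return f"COPY INTO {table} FROM @{stage} FILE_FORMAT = (TYPE = {file_type})"


if __name__ == "__main__":
    for stmt in (
        create_warehouse("etl_wh"),
        modify_warehouse("etl_wh", "large"),
        bulk_load("sales", "sales_stage"),
        stop_warehouse("etl_wh"),
    ):
        print(stmt + ";")
```

Creating a warehouse with auto-suspend, resizing it only for heavy loads, and suspending it afterward is the cost-saving pattern the job entries are meant to automate.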
The Bulk load into Amazon Redshift entry is now available in Pentaho Data Integration to enable greater productivity and automation while populating your Amazon Redshift data warehouse, eliminating the need for repetitive SQL scripting. The new job entry leverages the Redshift COPY command to take advantage of parallel loading and other capabilities.
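COPY loads in parallel when the source data is split into several files that share an S3 key prefix. The following Python sketch illustrates that pattern. The table name, S3 prefix, and IAM role ARN are hypothetical, uploading the files to S3 is omitted, and the command the job entry generates may differ.

```python
"""Sketch: prepare a parallel Redshift bulk load with one COPY command.

Assumptions: names and ARN are hypothetical placeholders; uploading the CSV
payloads to S3 (for example as part_0.csv, part_1.csv, ...) is omitted.
"""
import csv
import io


def split_into_files(rows, n_files):
    """Distribute rows round-robin across n_files CSV payloads."""
    buffers = [io.StringIO() for _ in range(n_files)]
    writers = [csv.writer(buf, lineterminator="\n") for buf in buffers]
    for i, row in enumerate(rows):
        writers[i % n_files].writerow(row)
    return [buf.getvalue() for buf in buffers]


def copy_command(table, s3_prefix, iam_role_arn):
    """Build one COPY that loads every S3 object whose key starts with s3_prefix."""
    return (
        f"COPY {table} FROM '{s3_prefix}' "
        f"IAM_ROLE '{iam_role_arn}' "
        "FORMAT AS CSV"
    )


if __name__ == "__main__":
    payloads = split_into_files([(1, "a"), (2, "b"), (3, "c")], n_files=2)
    print(payloads)
    print(copy_command(
        "sales",
        "s3://example-bucket/sales/part_",
        "arn:aws:iam::123456789012:role/ExampleRedshiftRole",
    ))
```

AWS recommends making the number of files a multiple of the number of slices in the cluster, so that each slice receives an equal share of the load.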
The new Kinesis Consumer and Kinesis Producer transformation steps leverage the real-time processing capabilities of Amazon Kinesis Data Streams from within PDI:
- Kinesis Consumer: Collect real-time data, such as monitored events, user consumption of data streams, and monitored alerts, from specific Amazon Kinesis data streams for processing in PDI transformations.
- Kinesis Producer: Push data from application logs, website clickstreams, and IoT telemetry to specific Amazon Kinesis data streams as it arrives.
Hitachi Content Platform (HCP) is a distributed object storage system from Hitachi Vantara. It provides a scalable, easy-to-use repository for all types of fixed-content data, from simple text files to images, video, and multi-gigabyte database images.
You can use three new PDI steps to access the metadata and properties of objects in your HCP repository:
- Query HCP: Locate data objects by searching system and custom metadata annotations.
- Read metadata from HCP: Identify and select an HCP object by its URL path, then select a specific target annotation name to read.
- Write metadata to HCP: Identify and select an HCP object by its URL, then write custom metadata annotations to that object.
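The read and write steps address an object URL plus an annotation name. As a hedged illustration of what that pair identifies, the following Python sketch builds an HCP REST request for one named custom-metadata annotation. It assumes HCP REST conventions that you should verify against the HCP REST API documentation for your release: a `type=custom-metadata` query with an `annotation` parameter, and an `HCP` authorization header. The function names and the namespace, tenant, domain, and annotation values are hypothetical.

```python
"""Sketch: address one named custom-metadata annotation on an HCP object.

Assumptions (verify against your HCP REST API docs): objects live under
https://<namespace>.<tenant>.<domain>/rest/<path>, annotations are selected
with ?type=custom-metadata&annotation=<name>, and requests carry an
"Authorization: HCP <base64 user>:<md5 password>" header.
"""
import base64
import hashlib
from urllib.parse import quote, urlencode


def hcp_auth_header(username, password):
    """Return the assumed HCP authorization header for a data-access account."""
    user_b64 = base64.b64encode(username.encode("utf-8")).decode("ascii")
    password_md5 = hashlib.md5(password.encode("utf-8")).hexdigest()
    return {"Authorization": f"HCP {user_b64}:{password_md5}"}


def annotation_url(namespace, tenant, domain, object_path, annotation):
    """Return the URL of one custom-metadata annotation on an object."""
    path = quote(object_path.lstrip("/"))  # keeps "/" separators, escapes spaces
    query = urlencode({"type": "custom-metadata", "annotation": annotation})
    return f"https://{namespace}.{tenant}.{domain}/rest/{path}?{query}"


if __name__ == "__main__":
    # Hypothetical values. A GET on this URL would read the annotation;
    # a PUT with the annotation content would write it.
    print(annotation_url("images", "finance", "hcp.example.com", "/2019/scan 01.tif", "ocr"))
    print(hcp_auth_header("myuser", "password"))
```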
Pentaho 8.3 includes improvements to the Adaptive Execution Layer (AEL) for Spark, including enhancements for working with the Apache Spark Dataset API. Two PDI steps also have Spark-related improvements: Switch-Case and Merge rows (diff).
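Merge rows (diff) compares a reference stream with a compare stream by key and flags each row as identical, changed, new, or deleted. The following plain Python sketch shows only those comparison semantics, not how the step or AEL on Spark executes them. It assumes dict rows with unique, sortable keys, and `merge_rows_diff` and `flag_field` are hypothetical names.

```python
"""Sketch of the comparison semantics of the PDI "Merge rows (diff)" step.

Assumptions: rows are dicts, keys are unique within each stream and sortable,
and merge_rows_diff / flag_field are hypothetical names. This shows what the
step computes, not how PDI or AEL on Spark executes it.
"""


def merge_rows_diff(reference, compare, key_fields, value_fields, flag_field="flag"):
    """Flag each key as "identical", "changed", "new", or "deleted"."""

    def key(row):
        return tuple(row[f] for f in key_fields)

    ref_by_key = {key(r): r for r in reference}
    cmp_by_key = {key(r): r for r in compare}
    result = []
    for k in sorted(ref_by_key.keys() | cmp_by_key.keys()):
        ref_row, cmp_row = ref_by_key.get(k), cmp_by_key.get(k)
        if ref_row is None:
            flag, row = "new", cmp_row  # key only in the compare stream
        elif cmp_row is None:
            flag, row = "deleted", ref_row  # key only in the reference stream
        elif all(ref_row[f] == cmp_row[f] for f in value_fields):
            flag, row = "identical", cmp_row
        else:
            flag, row = "changed", cmp_row
        result.append({**row, flag_field: flag})
    return result


if __name__ == "__main__":
    reference = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}, {"id": 3, "v": "c"}]
    compare = [{"id": 1, "v": "a"}, {"id": 2, "v": "B"}, {"id": 4, "v": "d"}]
    for row in merge_rows_diff(reference, compare, ["id"], ["v"]):
        print(row)
```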
The Pentaho Server Upgrade Installer is an easy-to-use graphical tool that automatically applies the minor release version to your Pentaho installation. You can upgrade Pentaho 8.1 or 8.2 directly to Pentaho 8.3 with this simplified process, using either the Upgrade Installer's graphical interface or its command line for automated deployment scenarios. If you are upgrading from any other version, use the existing manual upgrade process.
For more information about the Upgrade Installer, and to download it, visit the Pentaho Customer Portal.
For general upgrade instructions, see the Pentaho upgrade documentation.
Pentaho 8.3 includes the following minor Data Integration improvements:
Stored VFS connections
Pentaho Data Integration (PDI) now stores VFS connection properties, so you can reuse the connection information whenever you access your Google Cloud Storage or Hitachi Content Platform content. The new Query HCP, Read metadata from HCP, and Write metadata to HCP steps also use these stored VFS connections.
Python list support
The Python Executor step now supports a Python list of dictionaries as an input data type. With All Rows processing, this input type converts each row in the PDI stream to a Python dictionary and collects all of the dictionaries into a single Python list.
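The following sketch shows the shape of that input and a script that consumes it. It assumes the step's input is mapped to a variable named `rows`, a hypothetical name you would choose when configuring the step. Outside PDI the list is defined by hand so the sketch runs on its own, and mapping results back to PDI output fields is not shown.

```python
# Sketch of a Python Executor script consuming "All Rows" input as a
# Python list of dictionaries.
# Assumption: the step maps its input to a variable named `rows` (hypothetical).
# Outside PDI we define it by hand so the sketch runs on its own.
rows = [
    {"customer": "Acme", "amount": 120.0},
    {"customer": "Globex", "amount": 80.5},
    {"customer": "Acme", "amount": 30.0},
]

# Each PDI row arrives as one dict keyed by field name.
totals = {}
for row in rows:
    totals[row["customer"]] = totals.get(row["customer"], 0.0) + row["amount"]

# Aggregated result, again as a list of dicts, sorted by customer name.
summary = [{"customer": c, "total": t} for c, t in sorted(totals.items())]
print(summary)
```

Because every row is a dict keyed by field name, the script can refer to fields by name instead of by column position.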
PDI lineage improvements
Pentaho 8.3 adds custom metaverse analyzers that support data lineage tracking for the following transformation steps and job entries:
- AMQP Consumer and AMQP Producer
- JMS Consumer and JMS Producer
- Kafka Consumer and Kafka Producer
- MQTT Consumer and MQTT Producer
To view the full list of steps and entries with custom data lineage analyzers, see Data lineage.
PDI expanded metadata injection support
Metadata injection passes metadata to transformation templates at runtime, which can greatly increase the productivity, reusability, and automation of transformation workflows. With this capability, you can support use cases such as onboarding data from many files and tables into data lakes. In addition to the steps that already supported metadata injection, as of 8.3 you can inject metadata into any field of the following Pentaho Data Integration (PDI) steps:
- Table Output (added Connection field)
- Salesforce Input
- Strings cut
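Conceptually, metadata injection fills in the unset properties of a template at runtime, once per set of metadata. The following Python sketch models that idea with plain dictionaries. The step and property names are hypothetical stand-ins rather than PDI's actual injectable attributes, and in PDI the mapping is configured in the ETL Metadata Injection step instead of written in code.

```python
"""Conceptual sketch of metadata injection with plain dictionaries.

Assumption: step and property names are hypothetical stand-ins, not PDI's
real injectable attributes.
"""
from copy import deepcopy

# Template: step properties stay None until metadata is injected.
TEMPLATE = {
    "Table Output": {"connection": None, "target_table": None},
    "Strings cut": {"in_field": None, "cut_from": None, "cut_to": None},
}


def inject(template, metadata):
    """Return a filled copy of template; metadata maps (step, property) to a value."""
    filled = deepcopy(template)  # the template itself stays reusable
    for (step, prop), value in metadata.items():
        if prop not in filled.get(step, {}):
            raise KeyError(f"{step!r} has no injectable property {prop!r}")
        filled[step][prop] = value
    return filled


if __name__ == "__main__":
    # One metadata set per source table being onboarded (hypothetical values).
    sources = [
        {("Table Output", "connection"): "lake_db", ("Table Output", "target_table"): "raw_orders"},
        {("Table Output", "connection"): "lake_db", ("Table Output", "target_table"): "raw_customers"},
    ]
    for metadata in sources:
        print(inject(TEMPLATE, metadata)["Table Output"])
```

Reusing one template for many sources is what makes the data lake onboarding use case practical.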
Pentaho 8.3 includes the following minor Business Analytics improvements:
Analyzer improvements
Pentaho 8.3 enhances Analyzer to improve the experience of exporting reports. You can now:
- Unmerge Analyzer cells when exporting to Excel.
- Change the separator character used for CSV exports.
- Export content through a REST API call, or invoke an export through a URL.
Interactive Reports improvements
Pentaho 8.3 includes the following improvements to Interactive Reports. You can now:
- Search for fields using the Find text box.
- Set whether the Select distinct option in the Query Settings dialog box is enabled or disabled by default.
Visualization API 3.0 support
Visualization API 3.0 is now supported. It provides a simple, powerful, tested, and documented approach to developing and configuring visualizations. As a reminder, new Pentaho installations are configured to use visualizations based on Visualization API 3.0. Both versions of the Visualization API are supported in Analyzer, so you can convert visualizations from Visualization API 2.0 to Visualization API 3.0 as needed.