Pentaho Documentation

Pentaho Data Integration

Pentaho Data Integration (PDI) provides the Extract, Transform, and Load (ETL) capabilities that facilitate capturing, cleansing, and storing data in a uniform and consistent format that is accessible and relevant to end users and IoT technologies.

Common uses of Pentaho Data Integration include:

  • Data migration between different databases and applications
  • Loading huge data sets into databases, taking full advantage of cloud, clustered, and massively parallel processing environments
  • Data cleansing with steps ranging from very simple to very complex transformations
  • Data integration, including the ability to leverage real-time ETL as a data source for Pentaho Reporting
  • Data warehouse population with built-in support for slowly changing dimensions and surrogate key creation
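
The last item in the list above mentions slowly changing dimensions and surrogate keys. As a rough illustration of the idea only, the following Python sketch keeps a Type 2 history for a hypothetical customer dimension. The class names, fields, and sample data are assumptions made for this example, and the code does not reflect how PDI implements these features internally.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class DimensionRow:
    surrogate_key: int                # warehouse-generated key, independent of the source
    customer_id: str                  # natural (business) key from the source system
    city: str                         # attribute whose history is tracked
    valid_from: date
    valid_to: Optional[date] = None   # None marks the current version


class CustomerDimension:
    """Hypothetical in-memory dimension table with Type 2 history."""

    def __init__(self) -> None:
        self.rows: List[DimensionRow] = []
        self._next_key = 1

    def _current(self, customer_id: str) -> Optional[DimensionRow]:
        for row in self.rows:
            if row.customer_id == customer_id and row.valid_to is None:
                return row
        return None

    def upsert(self, customer_id: str, city: str, as_of: date) -> int:
        """Return the surrogate key of the version that is current after the update."""
        current = self._current(customer_id)
        if current is not None and current.city == city:
            return current.surrogate_key   # nothing changed, reuse the key
        if current is not None:
            current.valid_to = as_of       # close the previous version
        new_row = DimensionRow(self._next_key, customer_id, city, as_of)
        self._next_key += 1
        self.rows.append(new_row)
        return new_row.surrogate_key


dim = CustomerDimension()
k1 = dim.upsert("C-100", "Lisbon", date(2023, 1, 1))
k2 = dim.upsert("C-100", "Lisbon", date(2023, 6, 1))   # unchanged: same key
k3 = dim.upsert("C-100", "Porto", date(2024, 1, 1))    # changed: new version
print(k1, k2, k3, len(dim.rows))
```

Each change to a tracked attribute closes the old row and adds a new one with a fresh surrogate key, so fact rows that reference the old key still point at the values that were valid at the time.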

Using the PDI Client 

The PDI client (Spoon) is a desktop application, installed on your workstation, that enables you to build transformations and to schedule and run jobs:

Using the Data Integration Perspective

PDI workflows are built using steps or entries joined by hops that pass data from one item to the next. These workflows are stored in two basic file types:

  • Transformations perform ETL tasks.
  • Jobs orchestrate ETL activities such as defining the flow, dependencies, and execution preparation.
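
The step-and-hop model described above can be pictured as a chain in which each step consumes the rows produced by the previous one. The Python sketch below is a conceptual analogy only. The step functions and sample rows are hypothetical, and the code does not use PDI or its file formats.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, object]
Step = Callable[[Iterable[Row]], Iterator[Row]]


def trim_names(rows: Iterable[Row]) -> Iterator[Row]:
    """Hypothetical cleansing step: strip whitespace from the 'name' field."""
    for row in rows:
        yield {**row, "name": str(row["name"]).strip()}


def drop_missing_ids(rows: Iterable[Row]) -> Iterator[Row]:
    """Hypothetical filter step: discard rows that have no 'id' value."""
    for row in rows:
        if row.get("id") is not None:
            yield row


def run_transformation(source: Iterable[Row], steps: List[Step]) -> List[Row]:
    """Chain the steps so each one reads the previous step's output (a 'hop')."""
    stream: Iterable[Row] = source
    for step in steps:
        stream = step(stream)
    return list(stream)


source_rows = [
    {"id": 1, "name": "  Ada "},
    {"id": None, "name": "ghost"},
    {"id": 2, "name": "Grace"},
]
result = run_transformation(source_rows, [trim_names, drop_missing_ids])
print(result)
```

Because the steps are generators, rows flow through the chain one at a time rather than being fully materialized between steps, which loosely mirrors how data moves along hops.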

Using Transformations and Jobs

Additional Features

Step and Entry Reference

Using the Schedule Perspective in PDI

Schedule transformations and jobs to run at specific times.

All about Scheduling

Learn how to Schedule Transformations and Jobs

PDI Administration

Learn about system requirements, the permissions needed for license and security management, and how to build ETL solutions and perform data analytics tasks in PDI and Pentaho Business Analytics.

Supported Technologies

View the full list of hardware and software requirements for PDI and Pentaho Business Analytics:

Installation and Licenses

Use one of the following methods to install PDI and Pentaho Business Analytics:

Configuration and Management

Get started creating ETL solutions and data analytics tasks, manage servers, and fine-tune performance:

PDI Tools and User Management

Server Management

Performance Improvement 

Advanced PDI Concepts

Learn about developing custom plugins to extend or embed PDI functionality, sharing plugins, streamlining the data modeling process, connecting to Big Data sources, ways to maintain meaningful data, and more.

Use the Command Line with PDI

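Kitchen, Pan, and Carte are command line tools for executing jobs and transformations modeled in Spoon. Pan runs transformations, Kitchen runs jobs, and Carte is a lightweight server for executing them remotely.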
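
As a rough sketch of how a script might assemble a Pan or Kitchen invocation, the Python helper below builds an argument list without running it. The script names (`pan.sh`, `kitchen.sh`) and the `-file`, `-level`, and `-param:` option spellings follow the commonly documented Linux syntax, but treat them as assumptions and check them against the command line reference for your PDI version. The file path and parameter name are hypothetical.

```python
import shlex
from typing import Dict, List, Optional


def build_pdi_command(
    tool: str,
    file_path: str,
    level: str = "Basic",
    params: Optional[Dict[str, str]] = None,
) -> List[str]:
    """Build an argument list for Pan (transformations) or Kitchen (jobs)."""
    # Assumed script locations, relative to the PDI installation directory.
    scripts = {"pan": "./pan.sh", "kitchen": "./kitchen.sh"}
    if tool not in scripts:
        raise ValueError(f"unsupported tool: {tool!r}")
    args = [scripts[tool], f"-file={file_path}", f"-level={level}"]
    # Named parameters are passed one option per parameter.
    for name, value in (params or {}).items():
        args.append(f"-param:{name}={value}")
    return args


cmd = build_pdi_command("pan", "/tmp/example.ktr", params={"RUN_DATE": "2024-01-01"})
print(shlex.join(cmd))
```

Returning a list rather than a single string keeps the arguments safe to hand to `subprocess.run` without shell quoting problems. Carte is left out because it runs as a server and is not invoked once per job or transformation.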

Adaptive Execution Layer

Pentaho uses the Adaptive Execution Layer (AEL) for running transformations in different engines.

Embed and Extend PDI

Learn how to develop custom plugins that extend PDI functionality or embed the engine into your own Java applications.

Data Services

Use a Data Service to query the output of a step as if the data were stored in a physical table. Read about how to turn a transformation into a data service.

Marketplace

Use the Marketplace to download, install, and share plugins developed by Pentaho and members of the user community.

Data Lineage

Use Data Lineage to track your data from source systems to target applications, and take advantage of third-party tools, such as Meta Integration Technology (MITI) and yEd, to track and view specific data.

Big Data and Streamlined Data Refinery

Use transformation steps to connect to a variety of Big Data sources, including Hadoop, NoSQL databases such as MongoDB, and analytical databases. Work through step-by-step tutorials, move beyond the basics, and learn how to edit transformations and metadata models.