Pentaho Data Integration (PDI) provides the Extract, Transform, and Load (ETL) capabilities that facilitate capturing, cleansing, and storing data in a uniform, consistent format that is accessible and relevant to end users and IoT technologies.
Common uses of Pentaho Data Integration include:
- Data migration between different databases and applications
- Loading huge data sets into databases, taking full advantage of cloud, clustered, and massively parallel processing environments
- Data cleansing, with steps ranging from very simple to very complex transformations
- Data integration, including the ability to leverage real-time ETL as a data source for Pentaho Reporting
- Data warehouse population, with built-in support for slowly changing dimensions and surrogate key creation
Using the PDI Client
The PDI client (Spoon) is a desktop application, installed on your workstation, that enables you to build transformations and to schedule and run jobs:
Work with Repositories
We recommend that you use the Pentaho Repository for enterprise deployments.
Using the Data Integration Perspective
PDI workflows are built using steps or entries joined by hops that pass data from one item to the next. This workflow is built within two basic file types:
- Transformations perform ETL tasks.
- Jobs orchestrate ETL activities such as defining the flow, dependencies, and execution preparation.
Using Transformations and Jobs
Step and Entry Reference
Learn about system requirements, the permissions needed for license and security management, and how to build ETL solutions and perform data analytics tasks in PDI and Pentaho Business Analytics.
View the full list of hardware and software requirements for PDI and Pentaho Business Analytics:
Installation and Licenses
Use one of the following methods to install PDI and Pentaho Business Analytics:
Configuration and Management
Get started creating ETL solutions and data analytics tasks, manage servers, and fine-tune performance:
PDI Tools and User Management
Advanced PDI Concepts
Learn about developing custom plugins to extend or embed PDI functionality, sharing plugins, streamlining the data modeling process, connecting to Big Data sources, ways to maintain meaningful data and more.
Use the Command Line with PDI
Kitchen, Pan, and Carte are command-line tools for executing jobs and transformations modeled in Spoon:
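As a quick illustration, the commands below sketch typical invocations from the PDI client's install directory. The `.kjb`/`.ktr` file paths, the parameter name, and the port are placeholders, not values from this document:

```shell
# Run a job file with Kitchen, passing a named parameter and basic logging.
./kitchen.sh -file=/opt/pentaho/jobs/load_warehouse.kjb \
             -param:INPUT_DIR=/data/incoming \
             -level=Basic

# Run a single transformation file with Pan.
./pan.sh -file=/opt/pentaho/trans/cleanse_customers.ktr -level=Minimal

# Start a Carte server on this host so jobs and transformations
# can be executed remotely on it.
./carte.sh localhost 8081
```

On Windows, use the equivalent `Kitchen.bat`, `Pan.bat`, and `Carte.bat` scripts.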
Adaptive Execution Layer
Pentaho uses the Adaptive Execution Layer (AEL) for running transformations in different engines.
Embed and Extend PDI
Learn how to develop custom plugins that extend PDI functionality or embed the engine into your own Java applications.
Use the Marketplace to download, install, and share plugins developed by Pentaho and members of the user community.
Use Data Lineage to track your data from source systems to target applications and take advantage of third-party tools, such as Meta Integration Technology (MITI) and yEd, to track and view specific data.
Big Data and Streamlined Data Refinery
Use transformation steps to connect to a variety of big data sources, including Hadoop, NoSQL databases such as MongoDB, and analytical databases. Work through step-by-step tutorials, move beyond the basics, and learn how to edit transformations and metadata models.