Pentaho Documentation


To perform the tutorials in this section, you must have the following components installed.

PDI—The primary development environment for the tutorials. If you have not already installed PDI, see the Data Integration Installation Options.

Apache Hadoop 0.20.X—A single-node local cluster is sufficient for these exercises, but a larger or remote configuration also works. If you are using a different distribution of Hadoop, see Configure Your Big Data Environment. You need to know the addresses and ports of your Hadoop installation.

*Hive—A supported version of Hive. Hive is a MapReduce abstraction layer that provides SQL-like access to Hadoop data. For instructions on installing and using Hive, see the Hive Getting Started Guide.

*HBase—A supported version of HBase. HBase is an open source, non-relational, distributed database that runs on top of HDFS. For instructions on installing or using HBase, see the Getting Started section of the Apache HBase Reference Guide.

*This component is required only for its corresponding tutorial.
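As a sketch of where the Hadoop addresses and ports mentioned above typically live, an Apache Hadoop 0.20.x installation records the HDFS NameNode address in core-site.xml and the JobTracker address in mapred-site.xml. The hostnames and port numbers below are placeholders for illustration; substitute the values from your own cluster.

```xml
<!-- core-site.xml: HDFS NameNode address and port (placeholder values) -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
```

```xml
<!-- mapred-site.xml: JobTracker address and port (placeholder values) -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
```

These are the same host/port pairs you will enter into PDI when configuring Hadoop job and HDFS steps in the tutorials.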