Pentaho Documentation

Support Statement for Analyzer on Impala

Using Analyzer on Impala

The following are the minimum requirements and recommendations for using Analyzer with Impala.

  1. Use Pentaho BA Suite EE 5.1 or later.
  2. Use Cloudera CDH 5.x with Impala 1.3.x or later.
  3. Store Impala tables in the compressed Parquet file format (recommended).
  4. Use the JDBC driver recommended for your Pentaho and CDH versions, either the Apache Hive JDBC driver or the Cloudera Simba driver:
     Scenario                              Recommended driver
     Pentaho 5.4 with CDH 5.3 or earlier   Apache Hive JDBC driver distributed as part of the CDH shim
     Pentaho 6.0 with CDH 5.4 shim         Impala JDBC Connector 2.5.24 (Cloudera Simba driver)
     Pentaho 6.1 with CDH 5.5 shim         Impala JDBC Connector 2.5.29 (Cloudera Simba driver)
     Pentaho 6.1 with CDH 5.7 shim         Impala JDBC Connector 2.5.31 (Cloudera Simba driver)
  5. Copy the JDBC driver into the appropriate BA Server and Schema Workbench directories.
  6. Turn off connection pooling in the Pentaho BA Server.
  7. Set a global ORDER BY limit for Impala in Cloudera Manager.
  8. In Mondrian schemas, split high-cardinality dimensions into several levels.
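The driver table and the driver-installation and connection-pooling steps above can be sketched in Java, the language the JDBC drivers themselves use. This is a minimal sketch under stated assumptions, not a Pentaho API: the class `ImpalaConnectionSketch`, its methods, and the host name `impala-host` are hypothetical. The driver class names (`org.apache.hive.jdbc.HiveDriver`, `com.cloudera.impala.jdbc41.Driver`), the URL formats, and port 21050 (the Impala daemon port that JDBC clients usually connect to) are assumptions to check against the documentation for your driver version.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper, not part of Pentaho: encodes the driver recommendation table
// and opens an unpooled JDBC connection. Class names and URL formats are assumptions.
public class ImpalaConnectionSketch {

    /** One row of the driver recommendation table. */
    static final class Recommendation {
        final String driverName;
        final String driverClass;
        final String urlTemplate; // placeholders: host, port, database

        Recommendation(String driverName, String driverClass, String urlTemplate) {
            this.driverName = driverName;
            this.driverClass = driverClass;
            this.urlTemplate = urlTemplate;
        }

        String url(String host, int port, String database) {
            return String.format(urlTemplate, host, port, database);
        }
    }

    // Assumed values; auth=noSasl assumes an unsecured (non-Kerberos) cluster.
    static final String HIVE_CLASS = "org.apache.hive.jdbc.HiveDriver";
    static final String HIVE_URL = "jdbc:hive2://%s:%d/%s;auth=noSasl";
    static final String SIMBA_CLASS = "com.cloudera.impala.jdbc41.Driver";
    static final String SIMBA_URL = "jdbc:impala://%s:%d/%s";

    // Pentaho/CDH pairs (as "major.minor") that call for the Cloudera Simba connector.
    static final Map<String, Recommendation> SIMBA_SCENARIOS = new HashMap<>();
    static {
        SIMBA_SCENARIOS.put("6.0/5.4",
            new Recommendation("Impala JDBC Connector 2.5.24", SIMBA_CLASS, SIMBA_URL));
        SIMBA_SCENARIOS.put("6.1/5.5",
            new Recommendation("Impala JDBC Connector 2.5.29", SIMBA_CLASS, SIMBA_URL));
        SIMBA_SCENARIOS.put("6.1/5.7",
            new Recommendation("Impala JDBC Connector 2.5.31", SIMBA_CLASS, SIMBA_URL));
    }

    /** Looks up the recommended driver; empty if the pair is not in the table. */
    static Optional<Recommendation> recommend(String pentaho, String cdh) {
        String[] parts = cdh.split("\\.");
        int major = Integer.parseInt(parts[0]);
        int minor = parts.length > 1 ? Integer.parseInt(parts[1]) : 0;
        // Pentaho 5.4 with CDH 5.3 or earlier uses the Hive driver from the CDH shim.
        if (pentaho.equals("5.4") && major == 5 && minor <= 3) {
            return Optional.of(new Recommendation(
                "Apache Hive JDBC (from the CDH shim)", HIVE_CLASS, HIVE_URL));
        }
        return Optional.ofNullable(SIMBA_SCENARIOS.get(pentaho + "/" + cdh));
    }

    /** Returns true if the driver class loads, i.e. its JAR is on the classpath. */
    static boolean driverOnClasspath(String driverClass) {
        try {
            Class.forName(driverClass);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    /** Opens a new physical connection on every call; nothing is pooled or reused. */
    static Connection openUnpooled(Recommendation rec, String host, int port, String db)
            throws SQLException {
        return DriverManager.getConnection(rec.url(host, port, db));
    }

    public static void main(String[] args) {
        Recommendation rec = recommend("6.1", "5.7").orElseThrow(IllegalStateException::new);
        System.out.println(rec.driverName + " -> " + rec.url("impala-host", 21050, "default"));
        System.out.println("Driver on classpath: " + driverOnClasspath(rec.driverClass));
    }
}
```

Because `openUnpooled` goes through `DriverManager`, every call creates a fresh physical connection, which mirrors the behavior you get when pooling is turned off. Calling `driverOnClasspath` first also loads the driver class, which by JDBC convention registers the driver with `DriverManager`.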

As with any data source, the performance of Pentaho Analyzer on Impala depends on the shape of the data, Impala’s configuration, and the types of queries. For best practices for Pentaho Analyzer on Impala, see https://support.pentaho.com/hc/en-us/articles/208652846.

Compiled results from the Mondrian automated test suite are available for Analyzer on Impala with the OEM Simba driver and with the community Apache Hive driver: