Pentaho Documentation

Support Statement for Analyzer on Impala

Using Analyzer on Impala

These are the minimum requirements for Analyzer to work with Impala.

  1. Use Pentaho BA Suite EE 5.1 or later.
  2. Use Cloudera CDH5.x with Impala 1.3.x or later.
  3. Use the Parquet compressed file format for tables in Impala (recommended).
  4. Follow these recommendations for the Hive and Simba drivers:
     • If you are using Pentaho 5.4 or CDH 5.3 or earlier, use the Apache Hive JDBC driver that is distributed as part of the CDH shim.
     • If you are using Pentaho 6.0 or later with the CDH 5.4 shim, the shim supports the Cloudera JDBC Simba driver: Cloudera JDBC Driver for Impala 2.5.24.
  5. Make sure that the JDBC driver is placed in both the BA Server and Schema Workbench directories.
  6. Turn off connection pooling in the Pentaho BA Server.
  7. Set the global order-by limit in Cloudera Manager.
  8. In Mondrian schemas, split dimension tables with high cardinality into several levels.
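
As a rough illustration of the driver recommendations above, the two drivers use different JDBC URL prefixes. The sketch below only builds the URL strings and does not open a connection. The class name, host name, and schema are hypothetical placeholders. Port 21050 and the `auth=noSasl` setting assume an unsecured Impala daemon on its default HiveServer2-protocol port. Check the documentation for your driver version before relying on either form.

```java
// Sketch only: the class name, host, and schema are placeholders and are not
// taken from the Pentaho documentation.
final class ImpalaJdbcUrls {
    private ImpalaJdbcUrls() {}

    // URL form for the Apache Hive JDBC driver (HiveServer2 protocol).
    // Assumes an unsecured Impala daemon, hence auth=noSasl.
    static String hiveDriverUrl(String host, int port) {
        return "jdbc:hive2://" + host + ":" + port + "/;auth=noSasl";
    }

    // URL form for the Cloudera (Simba) JDBC Driver for Impala.
    static String clouderaDriverUrl(String host, int port, String schema) {
        return "jdbc:impala://" + host + ":" + port + "/" + schema;
    }

    public static void main(String[] args) {
        // 21050 is assumed here as the Impala daemon's JDBC port.
        System.out.println(hiveDriverUrl("impala-host.example.com", 21050));
        System.out.println(clouderaDriverUrl("impala-host.example.com", 21050, "default"));
    }
}
```

Either string would go into the custom connection URL field of the Pentaho database connection, with the matching driver JAR present in the BA Server and Schema Workbench directories.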

As with any data source, the performance of Pentaho Analyzer on Impala depends on the shape of the data, Impala’s configuration, and the types of queries. For best practices for Pentaho Analyzer on Impala, see https://support.pentaho.com/hc/en-us/articles/208652846.

Compiled results from the Mondrian automated test suite are available for Analyzer on Impala with the OEM Simba driver, as well as with the community Apache Hive driver: