This section contains advice and procedures for testing and improving Pentaho Analyzer (Mondrian) performance. There are two facets of Pentaho Analyzer performance to consider: query speed and execution speed.
Query speed is the amount of time it takes to retrieve data from your data warehouse or data mart, and execution speed is the amount of time it takes to manipulate or perform calculations with that data after it has been retrieved. With these two performance factors in mind, review the following outline of questions to ask during a typical performance-tuning process:
- Locate the performance problem. Does the problem involve query speed (retrieving a result set) or execution speed (calculations done client-side and in the Mondrian engine)? Most commonly, the performance problem is in your data structure, not the Analysis engine or client machine.
- If query speed is slow, you must reconsider your data warehouse design and implementation.
- If your data warehouse is soundly designed, are you using an analytic database to achieve maximum query performance?
- If execution speed is slow, you may need to do some tuning of the Mondrian or Pentaho Reporting engine.
- If high-cardinality dimensions are unavoidable, you may need to partition them and streamline your schema to support table partitioning.
Optimize your infrastructure
The guidelines and advice in this section are specific to changes that you can make with your in-house infrastructure. None of the performance-tuning tips in this section have specifically to do with modifying Pentaho software. Before you get to the point where you can confidently tune the Analysis engine and Pentaho Analyzer, you must ensure that everything on your side of the equation is properly optimized.
Redesign your data warehouse
- Be open to modifying the original design to meet adjusted requirements from business users (iterative design).
- Remove data that is not actually used by business users.
- Optimize for the right purpose. There are basically two use cases to consider: analysis (slice/dice/pivot) and static reporting. You could also use a data warehouse to cleanse and consolidate transactional data for data mining, but this model would almost certainly be inappropriate for analysis or reporting.
- Avoid creating high-cardinality dimensions (putting too many records into fact tables). High-cardinality dimensions will never perform well.
- If there is a lot of unrelated information in your data warehouse, consider breaking it up into more topic-specific data marts.
- Create indexes for large fact tables.
- Create aggregate tables for frequently-computed views.
Switch to analytic database
If you are not currently using an analytic database as your ROLAP data source, and you are experiencing poor query performance, then switching to an analytic database should be among your first considerations for improving Pentaho Analyzer performance.
Optimize Pentaho Analyzer
Once you've properly tuned your data warehouse, you can move on to tuning your ROLAP schema, the Mondrian engine, and the Analyzer client tool.
Partitioning high-cardinality dimensions
To instruct the Analyzer engine to properly address a (partitioned) high-cardinality dimension, you must modify the ROLAP schema and explicitly set the highCardinality property of the ElementCubeDimension element to true on each applicable dimension. This will streamline SQL generation for partitioned tables; ultimately, only the relevant partitions will be queried, which could greatly increase query performance.
Mondrian log analysis
Base all of your performance tuning on this data; it will tell you everything that you need to know about bottlenecks in your data structure.
You can also determine the causes behind hanging queries in an Analyzer report by viewing Mondrian log information directly through the Analyzer interface:
Log into the BA Server as an administrator.
Create or load an Analyzer report.
Click the More actions and options icon in the report toolbar, and select . Click Ok.
Click the icon again and select Ok.. Click
Click the icon again and select.
- If each SQL query is reported twice. The first time is for Mondrian to get the first record and the second time is to retrieve all records
- SQL queries with high execution times
- SQL queries that return large volumes of data (more than 1000 rows)
- SQL queries that don't join tables
- SQL queries that don't include filters
- This log entry: WARN mondrian.rolap.RolapUtil Unable to use native SQL evaluation for 'NonEmptyCrossJoin'; reason: arguments not supported. If you see this, try switching the contains filter into an includes filter, or make the contains filter more selective
Configuring Pentaho Analyzer for large data warehouses
These analyzer.properties settings are explained in the following table:
|filter.members.max.count||Controls the maximum number of values to show in the filter dialogue, such as include/exclude filters and date range dropdowns.|
|filter.dialog.apply.report.context||If set to true, when showing available members in the filter dialog, Analyzer will limit those members to the existing filters or measures on the report. This means that when retrieving the list of members, Analyzer will perform the join in the fact table and then apply dimension filters. For a high-cardinality dimension, this may significantly reduce the list of members loaded into memory.|
|filter.dialog.useTopCount||If both this and mondrian.native.topcount.enable in mondrian.properties are set to true, when showing the first set of members in the filter dialogue, Analyzer will only show that set of members sorted within hierarchy. For high-cardinality dimensions, this is required to avoid loading all members into memory. However, if a user uses the Find box in the filter dialogue or if you have filter.dialog.apply.report.context set to true, then the TopCount will not be used.|
|report.request.service.result.expire.time.seconds||Report results are released after this amount of time has
Analyzer report requests are processed asynchronously and immediately cleaned up after the first download. While this is efficient because clients usually don't need to download a report more than once, it causes issues with popup blockers that will block the first download and re-submit the download after prompting the user. If you expire the request after 30 seconds, you will work around the popup blocker issues while also enabling people to refresh the browser to re-download a report. This only applies to PDF, Excel or CSV downloads.
|report.request.service.result.cleanup.time.seconds||Report result cleanup occurs after this amount of time.|
Configuring the Mondrian engine for large data warehouses
|mondrian.result.limit||Controls the largest cross join size that Mondrian will handle in-memory. Ideally, no queries should involve large cross joins in-memory; instead, they should be handled by the database.|
|mondrian.rolap.iterationLimit||This is similar to mondrian.result.limit, except this applies to calculating aggregates in-memory such as SUM, MAX, AGGREGATE, etc. This should be set to the same value as mondrian.result.limit.|
|mondrian.rolap.queryTimeout||If any query runs past this number of seconds, then the query is immediately cancelled. The total sum of all SQL statements to process a single MDX statement must be less than this timeout. Setting this to zero disables query timeout, which is not recommended because runaway queries can deprive system resources from other necessary processes.|
|mondrian.native.crossjoin.enable||If this is set to true, when Mondrian needs to cross join multiple dimensions in a report, if the cross join is non-empty, meaning a fact table relationship has been defined, then the join operation is done using SQL. The resultant SQL query returns only combined dimension members that actually have fact data. This reduces the amount of tuples that need to be processed and is critical for performance on high-cardinality dimensions.|
|mondrian.native.topcount.enable||If set to true, when fetching the first set of records for the filter dialog, Mondrian will only read that set of records into memory. If set to false, all records from the dimension level will be read into memory.|
|mondrian.native.nonempty.enable||If set to true, when fetching the first set of records for the filter dialog, Mondrian will only read that set of records into memory. If set to false, all records from the dimension level will be read into memory.|
|mondrian.rolap.maxConstraints||This should be set to the largest number of values that the data warehouse database supports in an IN list.|
|mondrian.native.ExpandNonNative||Works in conjunction with native evaluation of cross joins. If set to true, Mondrian will expand cross join inputs to simple member lists that are candidates for pushdown.|
Redesigning Analyzer reports for maximum performance
When you re-create your reports, follow this process for best performance:
Add and filter by low-cardinality dimensions first
Add measures to the report
Add high-cardinality dimensions lastNoteWhen filtering, always choose include/exclude over contains/doesn't contain.
PentahoAnalyzer configuration files
|/pentaho/server/pentaho-server/pentaho-solutions/system/mondrian/mondrian.properties||Contains low-level configuration options for the PentahoAnalyzer(Mondrian) engine.|
|/pentaho/server/pentaho-server/pentaho-solutions/system/analyzer/analyzer.properties||Contains low-level configuration options for Pentaho Analyzer. These are not options that can be set through Analyzer's graphical user interface.|