Pentaho Documentation

Troubleshooting MongoDB and Analyzer

Improve Performance for High Cardinality Attributes

Large MongoDB collections often have one or more attributes with high cardinality; that is, attributes with many distinct values. For example, a collection of customer transactions could have tens or hundreds of thousands of unique customer identifiers.

Analyzer for MongoDB optimizes the aggregation pipeline queries it uses to retrieve dimensional metadata so that it can better handle these high cardinality attributes. Where possible, it does this by pushing applicable query constraints down into the $match stage of the pipeline. This allows Analyzer for MongoDB to load a smaller subset of the attribute's total values.

As an example, consider a report which displays all customers who have purchased "Carlson Chocolate Milk" at stores in Tacoma:

[Image: AnaMongoHighCardAttScreen.png]

Rather than retrieving the metadata for all customers, Analyzer for MongoDB loads just those customers who actually have transactions in Tacoma for Carlson Chocolate Milk. Additionally, Analyzer for MongoDB loads the required metadata for [Store City] and [Product Name] in the same pipeline query. This produces more efficient MongoDB queries and, just as importantly, reduces the total overhead of data processing within Analyzer for MongoDB.
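
To make the pushdown concrete, the following Python sketch builds the kind of aggregation pipeline described above. It is an illustration only: the field names (store_city, product_name, customer_id) and the helper build_member_pipeline are hypothetical, and the pipeline that Analyzer for MongoDB actually generates may look different.

```python
import json


def build_member_pipeline(constraints, attributes):
    """Build an aggregation pipeline that filters first, then collects
    the distinct value combinations of the requested attributes.

    `constraints` maps a field name to the value it must equal.
    `attributes` lists the fields whose values the report needs.
    All field names are hypothetical, for illustration only.
    """
    return [
        # Constraints go into $match so later stages see fewer documents.
        {"$match": dict(constraints)},
        # One output document per distinct combination of attribute values.
        {"$group": {"_id": {name: "$" + name for name in attributes}}},
    ]


pipeline = build_member_pipeline(
    constraints={"store_city": "Tacoma", "product_name": "Carlson Chocolate Milk"},
    attributes=["customer_id", "store_city", "product_name"],
)
print(json.dumps(pipeline, indent=2))
```

Because the filter comes first, only the Tacoma transactions for Carlson Chocolate Milk reach the grouping stage, and the values for all three attributes come back from a single query. With the PyMongo driver, a pipeline of this shape could be run with collection.aggregate(pipeline).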

This feature is also important for large cross-joins, even when the individual cross-joined attributes have lower cardinality. If several attributes are cross-joined on a report, Analyzer for MongoDB needs to determine which combinations of attribute values have associated data. By querying for all requested attributes in a single pipeline query, with constraints in place, Analyzer for MongoDB reduces the amount of processing required for this evaluation.
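
The gap between the full cross-join and the combinations that actually have data can be sketched in a few lines of Python. The records and field names below are made up for illustration, and the in-memory logic only mimics the idea; it is not how Analyzer for MongoDB is implemented.

```python
from itertools import product

# Hypothetical transaction records; field names are illustrative only.
transactions = [
    {"store_city": "Tacoma", "product_name": "Carlson Chocolate Milk", "year": 2013},
    {"store_city": "Tacoma", "product_name": "Carlson Chocolate Milk", "year": 2014},
    {"store_city": "Seattle", "product_name": "Carlson Whole Milk", "year": 2014},
]
attributes = ["store_city", "product_name", "year"]


def distinct_values(rows, name):
    """Return the sorted distinct values of one attribute."""
    return sorted({row[name] for row in rows})


def full_crossjoin(rows, names):
    """Every combination of attribute values, whether or not data exists."""
    return list(product(*(distinct_values(rows, n) for n in names)))


def nonempty_crossjoin(rows, names):
    """Only the combinations that occur in at least one record."""
    return sorted({tuple(row[n] for n in names) for row in rows})


all_combos = full_crossjoin(transactions, attributes)
used_combos = nonempty_crossjoin(transactions, attributes)
print(len(all_combos), "possible combinations,", len(used_combos), "with data")
```

Even with only two values per attribute, the full cross-join has eight combinations while just three have data. Retrieving the populated combinations directly avoids evaluating the empty ones.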

The properties which control this feature are enabled by default and, as a general rule, should be left on. They are contained in the configuration file at .../biserver-ee/pentaho-solutions/system/osgi/bundles/mondrian.cfg.

mondrian.native.crossjoin.enable=true
mondrian.native.nonempty.enable=true
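
If you want to confirm that neither property has been switched off, a small script can scan the contents of the file. The Python sketch below is one possible approach, not a Pentaho tool: the helper names are hypothetical, the parser handles only simple key=value lines, and treating a missing key as "left at its default" is an assumption.

```python
REQUIRED_FLAGS = (
    "mondrian.native.crossjoin.enable",
    "mondrian.native.nonempty.enable",
)


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # or ! comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")) or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def disabled_flags(text):
    """Return the flags explicitly set to something other than true.

    Keys that are absent are not reported (assumed to keep their default).
    """
    props = parse_properties(text)
    return [f for f in REQUIRED_FLAGS if f in props and props[f].lower() != "true"]


# Sample file contents; in practice, read the text of mondrian.cfg instead.
sample = """\
mondrian.native.crossjoin.enable=true
mondrian.native.nonempty.enable=false
"""
print(disabled_flags(sample))
```

An empty list means both properties are still enabled.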

Troubleshoot Data Limit Errors

If your Analyzer report contains an unusually large number of attributes, you may see an error message alerting you to the problem.

MongoAnaErrorMsg.jpg

There are a couple of things that you can try to work around this issue.

  • If you are using an "Excludes" or "Doesn't Contain" filter, try changing the filter to an "Includes" or "Contains" filter.
  • Try turning off one or more of the following: subtotals, grand totals, or empty cells.

After you try one of these tips, refresh the report to see if it is able to run.

Clear the Cache for Analyzer on MongoDB

  1. Log in to the BA Server as an administrator.
  2. Click Tools in the toolbar.
  3. Select Refresh, then select Mondrian Schema Cache.
  4. An info box pops up when the cache is cleared successfully. Click OK.
  5. The cache is now cleared.