Note: The advice in this section applies to building and optimizing data warehouses in general and is not specific to Analysis. However, since poor data warehouse design is so frequently a significant source of performance loss for Pentaho Analysis, it is listed in this section.
A data warehouse exists to consolidate and partition transactional data into one streamlined, organized, canonical source for reporting and analysis. Some guidelines to follow in data warehouse design are:
- Be open to modifying the original design to meet adjusted requirements from business users (iterative design).
- Remove data that is not actually used by business users.
- Optimize for the right purpose. There are basically two use cases to consider: analysis (slice/dice/pivot) and static reporting. You could also use a data warehouse to cleanse and consolidate transactional data for data mining, but this model would almost certainly be inappropriate for analysis or reporting.
- Avoid creating high-cardinality dimensions (putting too many records into fact tables). High-cardinality dimensions will never perform well.
- If there is a lot of unrelated information in your data warehouse, consider breaking it up into more topic-specific data marts.
- Create indexes for large fact tables.
- Create aggregate tables for frequently-computed views.