Dimensional modeling is the process of transforming data from multiple sources, often stored in formats that are hard for people to read, into a single data source that is organized to support business analytics. Below is a typical workflow for developing a dimensional model:
- Collect user requirements for business logic and processes
- Considering the entirety of your data, break it down into subjects
- Isolate groups of facts into one or more fact tables
- Design dimensional tables that draw relationships between levels (fact groups)
- Determine which members of each level are useful for each dimensional table
- Build and publish a Mondrian (Pentaho Analysis) schema and collect feedback from users
- Refine your model based on user feedback, and continue iterating through this list until users are productive
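To make the fact-table and dimension-table steps above more concrete, here is a minimal sketch of a star schema built in an in-memory SQLite database. Everything in it is an assumption made for illustration: the table names (`fact_sales`, `dim_time`, `dim_product`), the columns, and the sample rows do not come from Pentaho or from any particular data source, and a real model would be built from your own requirements.

```python
import sqlite3


def build_star_schema():
    """Create a tiny, hypothetical star schema and load sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Dimension tables: descriptive attributes, arranged as levels
        CREATE TABLE dim_time (
            time_id INTEGER PRIMARY KEY,
            year INTEGER, month INTEGER, day INTEGER
        );
        CREATE TABLE dim_product (
            product_id INTEGER PRIMARY KEY,
            product_line TEXT, product_name TEXT
        );
        -- Fact table: numeric measures plus one key per dimension
        CREATE TABLE fact_sales (
            time_id INTEGER REFERENCES dim_time(time_id),
            product_id INTEGER REFERENCES dim_product(product_id),
            quantity INTEGER, amount REAL
        );
    """)
    conn.executemany(
        "INSERT INTO dim_time VALUES (?, ?, ?, ?)",
        [(1, 2003, 1, 15), (2, 2004, 6, 1)],
    )
    conn.executemany(
        "INSERT INTO dim_product VALUES (?, ?, ?)",
        [(1, "Classic Cars", "Model A"), (2, "Trains", "Model B")],
    )
    conn.executemany(
        "INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
        [(1, 1, 2, 200.0), (1, 2, 1, 50.0), (2, 1, 3, 300.0)],
    )
    return conn


def sales_by_year(conn):
    """Aggregate the amount measure along the year level of the time dimension."""
    return conn.execute("""
        SELECT t.year, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_time AS t ON f.time_id = t.time_id
        GROUP BY t.year
        ORDER BY t.year
    """).fetchall()


conn = build_star_schema()
print(sales_by_year(conn))  # [(2003, 250.0), (2004, 300.0)]
```

The point of the layout is that the fact table holds only measures and keys, so the same facts can be summarized by any level of any dimension with a join and a `GROUP BY`.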
Or, expressed as a series of questions:
- What topics or subjects are important to the users who are analyzing the data? What do your users need to learn from the data?
- What are the important details your users will need to examine in the data?
- How should each data column relate to other data columns?
- How should datasets be grouped and organized?
- What are some useful short descriptions for each dimensional level in a hierarchy? For each element, decide what is useful within that element. For instance, in a dimensional table representing time, your levels might be year, month, and day, and your members for the year level might be 2003, 2004, and 2005.
- How effective is this dimensional model for the intended user base? How can it be improved?
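The distinction between levels and members in the time example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample dates and the helper names `members` and `months_in_year` are hypothetical, and the code is not part of any Pentaho API.

```python
from datetime import date

# Levels of a hypothetical time dimension, from coarsest to finest
LEVELS = ("year", "month", "day")


def members(dates, level):
    """Return the sorted, distinct members of one level."""
    return sorted({getattr(d, level) for d in dates})


def months_in_year(dates, year):
    """Drill down: month members that occur under one year member."""
    return sorted({d.month for d in dates if d.year == year})


# Assumed sample data for illustration only
sample_dates = [
    date(2003, 1, 15),
    date(2003, 3, 2),
    date(2004, 6, 1),
    date(2005, 6, 30),
]

print(members(sample_dates, "year"))       # [2003, 2004, 2005]
print(members(sample_dates, "month"))      # [1, 3, 6]
print(months_in_year(sample_dates, 2003))  # [1, 3]
```

Here the levels are the fixed structure of the hierarchy, while the members are the values that actually occur in the data at each level.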
The Agile BI tools in Pentaho Data Integration (PDI) make dimensional modeling much easier than traditional methods. Through PDI, you can quickly adjust your business logic, the granularity of your fact tables, and the attributes of your dimension tables, then generate a new model and push it out to a test environment for evaluation.
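As a rough illustration of what adjusting the granularity of a fact table means, the sketch below rolls hypothetical daily sales facts up to a monthly grain. It is plain Python rather than a PDI transformation, and the data, the `roll_up_to_month` helper, and the ISO date format are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical fact rows at daily grain: (ISO date string, sales amount)
daily_facts = [
    ("2004-01-05", 100.0),
    ("2004-01-20", 50.0),
    ("2004-02-03", 75.0),
]


def roll_up_to_month(facts):
    """Coarsen the grain from day to month by summing the measure."""
    totals = defaultdict(float)
    for day, amount in facts:
        month = day[:7]  # "YYYY-MM" prefix of an ISO date string
        totals[month] += amount
    return dict(totals)


print(roll_up_to_month(daily_facts))  # {'2004-01': 150.0, '2004-02': 75.0}
```

A coarser grain yields fewer rows and faster queries, but it also removes the day level from any analysis, which is why the choice is worth revisiting as user feedback comes in.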