Skip to main content
Hitachi VantaraPentaho Documentation
Pentaho Documentation

Work with Physical Schemas

A physical schema defines a set of MongoDB collections. The physical model is the source of the data which is presented through the logical model. MongoDB allows each document in a collection to have a different schema. Analyzer for MongoDB requires all the documents in your collection to be uniform. The details for working with physical schemas are covered here.

Here is an example MongoDB document taken from the Foodmart database.


{
"_id" : ObjectId("527ab2396a65f6f4b153cee5"),
"product_subcategory" : "Nuts",
"product_category" : "Specialty",
"product_department" : "Produce",
"product_family" : "Food",
"product_id" : 195,
"product_class_id" : 1,
"the_year" : 1997,
"the_month" : "June",
"the_day" : "Sunday",
"quarter" : "Q2",
"country" : "USA",
"state_province" : "WA",
"city" : "Sedro Woolley",
"marital_status" : "S",
"gender" : "M",
"marital_status" : "S",
"store_sales" : 1.77,
"store_cost" : 0.5664,
"unit_sales" : 1
} 
 

Tables

A table is a named use of a MongoDB Collection. Tables are declared using the <Table> element, as shown below.


<Table name="sales"/>

Columns

Inside a table, you must specify each attribute which will be used within your schema by defining <CalculatedColumnDef> elements.


<Table name="sales">
<ColumnDefs>
<CalculatedColumnDef name="unit_sales" type="Integer"/>
          <ExpressionView >
            <SQL dialect="generic">
              $unit_sales
            </SQL>
          </ExpressionView>
</CalculatedColumnDef>
<CalculatedColumnDef name="month_of_year" type="String"/>
          <ExpressionView >
            <SQL dialect="generic">
              $month_of_year
            </SQL>
          </ExpressionView>
</CalculatedColumnDef>
<CalculatedColumnDef name="the_year" type="Integer"/>
          <ExpressionView >
            <SQL dialect="generic">
              $the_year
            </SQL>
          </ExpressionView>
</CalculatedColumnDef>
         </ColumnDefs>
</Table>

The <CalculatedColumnDef name="the_year" type="Integer"> declares that the sales table has a column called the_year and that you want to use it as an integer. This affects how the column is sorted, and affects the MDX type of expressions built from this column.  The <ExpressionView> defines $the_year as the MongoDB attribute to use for the column.  There is no requirement that the MongoDB attribute name match the CalculatedColumnDef name, but in most cases that is a reasonable convention to follow.

Nested Attributes

Dot notation can be used to reference attributes that are nested within a MongoDB collection.  For example, consider the following document, which groups related attributes under common parents:


{
"_id" : ObjectId("527ab2396a65f6f4b153cee5"),
“products” : { "product_subcategory" : "Nuts",
 "product_category" : "Specialty",
  "product_department" : "Produce",
  "product_family" : "Food" },
“time” : { "the_year" : 1997,
  "the_month" : "June",
  "the_day" : "Sunday",
  "quarter" : "Q2"}
}

To define the quarter column, use a <CalculatedColumnDef> element as above, specifying $time.quarter as the referenced attribute under <ExpressionView>.


<CalculatedColumnDef name="quarter" type="Integer"/>
          <ExpressionView >
            <SQL dialect="generic">
              $time.quarter
            </SQL>
          </ExpressionView>
    </CalculatedColumnDef>

Apply MongoDB Aggregation Framework Operators

Valid string, arithmetic and conditional operators are permitted within <CalculatedColumnDef> elements.  For example, to construct the full_name column as the concatenation of $fname and $lname, use a definition like the following:


<CalculatedColumnDef name="full_name" type="String"/>
          <ExpressionView >
            <SQL dialect="generic">
               {$concat : ['$fname', ' ', '$lname']}
            </SQL>
          </ExpressionView>
    </CalculatedColumnDef>

The promotion_sales calculation gives the store_sales value when promotion_id is not 0, and a value of 0 when promotion_id equals 0.  That can be expressed as:


<CalculatedColumnDef name="promotion_sales" type="Numeric" >
          <ExpressionView >
            <SQL dialect="generic">
              {$cond : [{ $eq: ['$promotion_id', 0]}, 0, '$store_sales']}
            </SQL>
          </ExpressionView>
        </CalculatedColumnDef>

The requirement is the expression must be valid within a MongoDB $project.  Note that complex or deeply nested expressions can have performance impact.

MongoDB Arrays

MongoDB arrays can be referenced  using square brackets.  For example, to specify that the emails array should be unwound, use $emails[*] as shown below:


<CalculatedColumnDef name="emails" type="Integer"/>
          <ExpressionView >
            <SQL dialect="generic">
              $emails[*]
            </SQL>
          </ExpressionView>
    </CalculatedColumnDef>

Use of arrays within a schema has some current limitations:

  1. Only one array attribute can be unwound.
  2. Array ordinals (e.g. $emails[1]) are not supported.
  3. Arrays which introduce many-to-many relationships can produce incorrect results.