Pentaho Documentation

Inspect Your Data

When working with your transformation, you can gain valuable insights by visualizing and interacting with your data. You can quickly inspect step data, which reduces the iterative work needed while building your transformation. You can then publish a data source to share with your team or across your organization.

Depending on your operating system, you may need to upgrade your web browser for the full experience. See the list of supported components for details.

Get Started

Begin inspecting your data by clicking on a step in the transformation.

[Image: transformation canvas]

The fly-out inspection bar appears at the top of the transformation canvas. The bar displays the name of the step selected and offers two options:

  • Run and Inspect Data - Runs the transformation up to the selected step, then lets you inspect your data.
  • Inspect Data - Lets you inspect the data of a step once the transformation has run.
    Note: This option runs your transformation only if it has not already been run.

After the transformation runs, a flat table of your step data is displayed with all the available fields selected in Stream View.

[Image: Stream View with the Filters panel]

Additionally, you can begin inspecting data using these other methods:

  • Step Context Menu - Right-click on a step and choose either Inspect Data or Run and Inspect Data.

  • Preview Data Panel - Select the Preview Data tab. Click the Inspect Data button located at the top right of the Preview Data bar.
  • Actions Menu - Select a step. From the Menu bar, click Action>Inspect Data or Action>Run and Inspect Data.
  • Keyboard Shortcuts - Select a step. Then using your keyboard, do the following:
    • In Windows and Ubuntu, press either Shift+Ctrl+F9 (Inspect Data) or Ctrl+F9 (Run and Inspect Data).
    • In OS X, press Shift+Command+F9 (Inspect Data) or Command+F9 (Run and Inspect Data).

Tour the Environment

The following illustration shows selected data visualized as a bar chart in Model View.

[Illustration: inspection environment in Model View, with numbered callouts]

Use the numbered callouts in the preceding illustration to locate the sections of the inspection environment described in the table below.

Key Feature Description
Circle 1 Header bar

Use the Header bar to access:

  • The title of the step being inspected.
  • The row count of the data sampled, up to a maximum default of 50,000 rows.
  • The Publish button, used to create a data source for collaborative use later through a data service.
  • The Exit button, used to return to the transformation canvas.
Circle 2

Stream View / Model View

Toggle between the Stream View and Model View modes to inspect data and build visualizations based on the data sampled.

  • Use Stream View to inspect the data using a flat table or visualization types that do not require modeling.
  • Use Model View to inspect the data in a dimensional model so that you can view measures, hierarchies, and annotations.

When a visualization mode is not supported, the unsupported view is disabled.

Search Box Use the Search Box to find a specific field in the list of available fields. This feature is especially useful in Stream View where the order of the fields is solely determined by the transformation.
Available Fields panel

The Available Fields panel lists all available fields from the subset of data being inspected. Field types are automatically assigned as the step data are ingested, including:

  • Default fields (no icon provided), which contain default data depending upon the view:
    • In Stream View, data that are neither numeric nor date or timestamp values, including strings, Booleans, and other types.
    • In Model View, data that are non-measure and not annotated as location or time hierarchies.
  • Date fields (date icon), which contain date data. (Stream View only)
  • Numeric fields (numerics icon), which contain discrete categorical data. (Stream View only)
  • Geographic fields (geography icon), which contain location data. (Model View only)
  • Measure fields (measures icon), which contain quantitative data. (Model View only)
  • Time fields (clock icon), which contain time data. (Model View only)

You can select the specific fields you want to inspect from this list. Click a field to select or clear it, or drag a field into the Layout panel. Selected fields display with a blue disk icon to the left of their names.

  • Select Clear All to remove all fields from the Layout panel, clear all filters from the Filters panel, and clear the canvas.
  • For a flat table in Stream View, click Select All to include all fields in the flat table in the order they are listed.
Circle 3 Visualization Selector Use the Visualization Selector to choose a visualization type. Selecting a visualization from the drop-down menu displays it in the Canvas area.
Circle 4 Layout panel Displays the available drop zones and associated field types needed for the selected visualization.
Circle 5 Filters panel

Displays all filters applied to a visualization. To apply a filter, you can drag a field from the Available Fields panel into the Filters panel. Also, some specific filtering actions can be applied by clicking on the visualization. See the Use Filters to Explore Your Data article for more information.

Circle 6 Canvas The Canvas displays the selected visualization.
Circle 7 Tabs bar

Use the Tabs bar to manage and navigate the tabs:

  • The active tab is always indicated with a blue highlight.
  • Create tabs for different data visualizations, by duplicating existing tabs or by adding new tabs.
  • Scroll through multiple tabs.
  • Delete tabs you no longer need.
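
As a rough illustration of the field typing described for the Available Fields panel, the following Python sketch sorts sample values into the three Stream View categories: date, numeric, and default. It is not PDI code. The function name `stream_view_field_type` and the sample row are hypothetical, and the detection logic PDI actually uses is not shown here and may differ.

```python
from datetime import date, datetime
from numbers import Number


def stream_view_field_type(value):
    """Return 'date', 'numeric', or 'default' for one sample value.

    Hypothetical helper that mirrors the Stream View categories only.
    """
    # bool is a subclass of int in Python, so test it before Number;
    # Booleans belong with the default fields.
    if isinstance(value, bool):
        return "default"
    # datetime is a subclass of date, so this covers timestamps too.
    if isinstance(value, (date, datetime)):
        return "date"
    if isinstance(value, Number):
        return "numeric"
    # Strings and any other types fall through to the default category.
    return "default"


# Illustrative sample row; the field names are invented.
sample_row = {
    "order_date": date(2016, 1, 5),
    "amount": 120.5,
    "region": "East",
    "shipped": True,
}

field_types = {
    name: stream_view_field_type(value) for name, value in sample_row.items()
}
print(field_types)
```

The Boolean check comes first on purpose: without it, `True` would be classified as numeric, which would contradict the rule that Booleans are default fields in Stream View.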

Use Visualizations

Data visualizations have two modes: Stream View and Model View. You can switch between these modes to inspect data and shape visualizations based on the sampled set. Stream View is the default mode. It generates SQL queries, as used in entity-relational modeling, that are executed in a relational database. Model View builds on the same tables as Stream View, laying a dimensional model over them to allow multidimensional queries, which run in the background as MDX queries against a Mondrian engine.
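
To make the difference between the two modes more concrete, here is a minimal sketch using Python's built-in `sqlite3` module. It does not reproduce the queries PDI generates. The `sales` table, its columns, and its rows are invented for illustration, and the grouped query is only an SQL analogue of a dimensional query, since Model View itself issues MDX to Mondrian.

```python
import sqlite3

# Hypothetical step output; the table and column names are illustrative only.
ROWS = [
    ("East", "2016-01-05", 120.0),
    ("East", "2016-01-06", 80.0),
    ("West", "2016-01-05", 200.0),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, order_date TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", ROWS)

# Flat-table style: one result row per sampled row, selected fields only.
flat_rows = conn.execute(
    "SELECT region, amount FROM sales ORDER BY region, order_date"
).fetchall()

# Dimensional style (SQL analogue): a measure aggregated by a level.
by_region = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()

print(flat_rows)
print(by_region)
conn.close()
```

The first query returns every sampled row, as a flat table does. The second returns one row per region, which is closer to what you see when a measure is placed against a hierarchy level in a modeled visualization.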

The first view provided during data inspection is a Stream View of your step data in a flat table on the Canvas. To reduce the number of data fields selected, click anywhere on the field name in the Available Fields panel. The blue disk icon to the left of the name disappears, indicating that the field is no longer selected. To change the visualization type in Stream View or Model View, use the Visualization Selector. If you select a visualization that requires a model, the mode automatically switches to Model View. Otherwise, it remains in Stream View and, if Model View is available, you can select it manually.

Drag the fields you want to visualize from the Available Fields panel and drop them into the drop zones of the Layout panel. The drop zones and the data they accept are determined by the visualization type. To explore your data with additional visualization types, create additional tabs. Note that when exploring visualization types, the drop zone types and the data they accept change according to the requirements of each visualization.

You can further customize your visualization by keeping or excluding fields, by drilling down into data points in the visualization (including the legend or axis labels of a chart), and by using other filtering options. When you filter, the filtering action is applied to the data, and the Filters panel and the visualization update automatically based on the selected filter. For more information, see the Filters article.

Once you are satisfied with your step data and model, you can make the content available for collaboration by publishing a data source.

Save Your Inspection Session

You can save your data inspection session for later use and sharing. After you make changes to the generated data and exit the application, the inspection icon appears on the step in the transformation canvas to indicate that it has a remembered session. When you save, this session is stored as a Kettle transformation (.ktr) file. You can then restore the session by reopening the saved file and re-inspecting the step.

Files saved in an older format are automatically updated to the current format when you open them. After this conversion, the files can be opened only in the current version of PDI.

Use Tabs to Create Multiple Visualizations

A tab is denoted by its associated visualization type. A tab is created when you run and inspect your data, add a new tab, or duplicate a tab. By using multiple tabs, you can create unique visualizations to inspect differences, spot trends, and develop insights regarding your data. You can add a new tab to build a new visualization, or you can duplicate an existing tab to investigate the results of small changes to your data. Tabs remain open between sessions, so you can return to the inspection canvas at any time to fine-tune your transformation until you are satisfied with the results.

Note that tabs can become invalid when you reopen a remembered inspection session, if, for example, some of the selected fields in the transformation or step were removed, renamed, or changed in relation to the hierarchy. Additionally, tabs can become invalid when the metadata of the field changes for the filters being used. To revalidate those tabs, you can clear the fields from the visualization in the inspection canvas, or exit your session and add the fields back to the transformation or to the step itself. In the flat table, all invalid fields are removed automatically.
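
The first cause of invalid tabs, fields that were removed or renamed, can be pictured as a simple membership check. The following Python sketch is an assumption about the idea only, not PDI's implementation. The function name `invalid_fields` and the field names are hypothetical, and the sketch does not cover hierarchy or filter metadata changes.

```python
def invalid_fields(tab_fields, step_fields):
    """Return the tab's fields that are missing from the step, in order.

    Hypothetical helper: a renamed field shows up as missing because
    its old name no longer exists in the step output.
    """
    available = set(step_fields)
    return [name for name in tab_fields if name not in available]


# Fields a remembered tab was built with (illustrative names).
remembered_tab = ["region", "amount", "discount"]
# Fields the step produces now, after "discount" was removed.
current_step = ["region", "amount", "order_date"]

missing = invalid_fields(remembered_tab, current_step)
tab_is_valid = not missing
print(missing, tab_is_valid)
```

An empty result means every remembered field still exists. Any name in the result is a field you would either clear from the visualization or add back to the transformation or step.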

Publish for Collaboration

When you’re ready to make your content available for others, you can publish it as a data source. The data source uses a data service that is automatically created on the step, and is available to other tools. Note that to publish the data source, you must be connected to your repository.

Perform the following steps to publish your data:

  1. Click the Publish button at the top right of the Header bar to open the Publish Data Source window.
  2. Click Get Started to open the Publish Details window.

Enter the data source information in the following fields:

  • Data Source Name - The name used by other Pentaho applications when accessing your data source.
  • Server - The default value for this field is your current repository. You can select other repository connections, if you have created them, through the Repository Manager.
  • URL - The base URL string used to connect to the server.
  • User Name - The user name required to access the server. The user must also have publishing permissions.
  • Password - The password associated with the provided user name.

  3. When you are done, click Finish.
  4. After your data source is created, a confirmation appears. The data source should now be available on the server. Click Close to continue inspecting your data, or click View this in User Console to open a new browser window and work with the data source in Analyzer.

Learn More

For more information on inspecting your data, see the following articles: