Hitachi Vantara Lumada and Pentaho Documentation

Write Metadata

You can use the Write Metadata step to add new metadata to the metadata already associated with specific data resources in Lumada Data Catalog. You can add a description and tags to a data resource. You can also add metadata for any new data resources that your transformation adds to Data Catalog.

The Write Metadata step includes options to identify, locate, and append the metadata associated with existing Resource IDs in the Data Catalog.

For more information about accessing Lumada Data Catalog in PDI, see PDI and Lumada Data Catalog.

Note: This step is supported on the PDI engine but not on the Spark engine. Only CSV text file and Parquet data formats are currently supported. You must have role permissions in Data Catalog to read the data resources.

Before you begin

Before using the Write Metadata step, you must have an established VFS connection to Data Catalog. For more information, see Access to Lumada Data Catalog. In addition, you must have role permissions set in Data Catalog to create and write tag descriptions for data resources registered in Data Catalog.

General

The following fields are general to this transformation step.

Field | Description
Step Name | Specify the unique name of the Write Metadata step on the canvas. You can customize the name or leave it as the default.
Connection | Use the list to select the name of your connection to Data Catalog. See Connecting to Virtual File Systems for details.

Options

Use the Write Metadata step to select the ID for an existing data resource in Data Catalog. This step appends new tags and descriptions of the selected tags to any existing tags for the resource.
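The append behavior described above can be pictured with a small sketch. This is an illustrative model only, not the step's implementation: the `append_tags` function and the representation of tags as a list of strings are hypothetical, and the choice to skip tags that are already present is an assumption made here for clarity.

```python
def append_tags(existing_tags, new_tags):
    """Return the existing tags followed by any new tags not already present.

    Hypothetical helper that models "append" semantics: existing tags are
    never removed or reordered. Skipping duplicates is an assumption of
    this sketch, not documented behavior.
    """
    merged = list(existing_tags)
    seen = set(existing_tags)
    for tag in new_tags:
        if tag not in seen:
            merged.append(tag)
            seen.add(tag)
    return merged


# Example with made-up tag names.
existing = ["finance", "pii"]
print(append_tags(existing, ["pii", "quarterly"]))
```

In this model, the tags already on the resource stay in place and only tags that are not yet present are added at the end.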

Input tab

Use the Input tab to describe where a data resource ID (Data Catalog’s identification key) originates as input from within your transformation.

If your transformation only needs a specific data resource, type the ID into the Resource ID field. The Write Metadata step then adds new tags only to that specific data resource.

If the transformation is designed to work with multiple resource IDs, you can supply them as input from a previous step in the transformation, for example, the Read Metadata step. Specify one of the following options.

Resource ID option | Description
Accept resource ids from previous step | Select this option if the exact resource IDs are the incoming data from a previous step in the transformation.
Pass through fields from previous step | Select this option if the resource IDs are located in a specific field that is incoming from a previous step in the transformation.
Field in the input to use as resource id | If you select Pass through fields from previous step, enter the name of the field that contains the resource IDs.
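The two ways of supplying resource IDs on the Input tab can be sketched as follows. This is a simplified model under stated assumptions, not PDI code: the `resolve_resource_ids` function, its parameters, the dictionary shape of the incoming rows, and the sample IDs are all hypothetical.

```python
def resolve_resource_ids(rows, fixed_id=None, id_field=None):
    """Return the resource IDs to act on in this simplified model.

    rows     -- incoming rows from a previous step, as dicts (hypothetical shape)
    fixed_id -- a single ID typed into the Resource ID field, if any
    id_field -- name of the incoming field that holds resource IDs, if any
    """
    if fixed_id is not None:
        # A specific data resource was typed in, so the rows are ignored.
        return [fixed_id]
    if id_field is None:
        raise ValueError("provide either fixed_id or id_field")
    ids = []
    for row in rows:
        value = row.get(id_field)
        if value:  # skip rows with a missing or empty ID
            ids.append(value)
    return ids


# Made-up rows, as they might arrive from a previous step.
rows = [
    {"resource_id": "res-001", "path": "/data/a.csv"},
    {"resource_id": "", "path": "/data/b.csv"},
    {"resource_id": "res-003", "path": "/data/c.parquet"},
]
print(resolve_resource_ids(rows, id_field="resource_id"))
print(resolve_resource_ids(rows, fixed_id="res-042"))
```

Skipping rows with an empty ID is a design choice of this sketch; the step's actual handling of missing IDs is not described in this article.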

Metadata tab

In this tab, specify any of the existing metadata tags you want to associate with the data resources identified in the Input tab. This PDI option matches the Add a tag feature for a data resource in the Data Catalog.

For example, if your transformation uses the Catalog Input step to create or update a data resource in Data Catalog, you could use this step to inject resource IDs into your transformation and add tags to those data resources.

Use the drop-down menu to select a tag, or several tags, from the existing tags that are available through your Data Catalog connection. Enter a description for this tag or group of tags in the Description field, or edit the existing description.

Field | Description
Description | Type or revise the description for the tag or group of tags you selected with the Tags option.
Tags | Select any existing tags you want associated with, and assigned to, the resource ID and click ADD.

Note: If missing or incomplete data is returned, you may need to change the default limit for returned results. See Lumada Data Catalog searches returning incomplete or missing data for information.