A row in PDI is represented by a Java object array, Object[]. Each field value is stored at an index in the row. While the array representation is efficient to pass around, it is not immediately clear how to determine the field names and types that go with the array. The row array itself does not carry this metadata. Also, an object array representing a row usually has empty slots towards its end, so a row can accommodate additional fields efficiently. Consequently, the length of the row array does not equal the number of fields in the row. The following sections explain how to safely access fields in a row array.
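The gap between array length and field count can be sketched in plain Java. This is a minimal illustration that runs on the JDK alone: the class `RowArraySketch`, the field names, and the number of spare slots are all hypothetical, and the list of field names only stands in for the metadata that PDI keeps outside the row array.

```java
import java.util.Arrays;
import java.util.List;

class RowArraySketch {
    // Hypothetical field names; in PDI this information lives in separate row metadata.
    static final List<String> FIELD_NAMES = Arrays.asList("id", "name");

    // Builds a row whose array is larger than its field count.
    // The 8 spare slots are an arbitrary choice for illustration.
    static Object[] sampleRow() {
        Object[] row = new Object[FIELD_NAMES.size() + 8];
        row[0] = 42L;
        row[1] = "Alice";
        return row;
    }

    public static void main(String[] args) {
        Object[] row = sampleRow();
        System.out.println("array length: " + row.length);
        System.out.println("field count:  " + FIELD_NAMES.size());
        // Iterate over the field count, never over row.length.
        for (int i = 0; i < FIELD_NAMES.size(); i++) {
            System.out.println(FIELD_NAMES.get(i) + " = " + row[i]);
        }
    }
}
```

Looping up to `row.length` instead would walk into the unused trailing slots, which is why the field count has to come from the metadata.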
PDI uses internal objects that implement RowMetaInterface to describe and manipulate row structure. Inside processRow(), a step can retrieve the structure of incoming rows by calling getInputRowMeta(), which is provided by the BaseStep class. The step clones the RowMetaInterface object and passes the clone to getFields() of its meta class, which applies any changes in row structure caused by the step itself. The step then has RowMetaInterface objects describing both the input and output rows, and can use them to inspect row structure.
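The clone-then-modify pattern can be sketched without the PDI libraries. In the sketch below a plain list of field names stands in for a RowMetaInterface object, and the one-argument `getFields()` is a simplified, hypothetical stand-in for the meta class method of the same name (the real method takes more parameters). The added field name `row_hash` is also made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class RowMetaCloneSketch {
    // Simplified stand-in for the meta class's getFields():
    // it appends the one field this hypothetical step adds to each row.
    static void getFields(List<String> rowMeta) {
        rowMeta.add("row_hash");
    }

    // Derives the output structure from the input structure.
    // Copying first keeps the input description untouched.
    static List<String> deriveOutputMeta(List<String> inputRowMeta) {
        List<String> outputRowMeta = new ArrayList<>(inputRowMeta); // plays the role of the clone
        getFields(outputRowMeta);
        return outputRowMeta;
    }

    public static void main(String[] args) {
        List<String> input = new ArrayList<>(Arrays.asList("id", "name"));
        List<String> output = deriveOutputMeta(input);
        System.out.println("input:  " + input);
        System.out.println("output: " + output);
    }
}
```

Working on a copy matters because the input description is still needed to read incoming rows after the output description has been changed.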
There is a similar object that holds information about individual row fields. PDI uses internal objects that implement ValueMetaInterface to describe and manipulate field information, such as field name, data type, format mask, and the like.
A step looks up the indexes and types of relevant fields upon first execution of processRow(). The following methods of RowMetaInterface are useful for this.
| indexOfValue(String valueName) | Given a field name, determine the index of the field in the row. |
| getFieldNames() | Returns an array of field names. The index of a field name matches the field index in the row array. |
| searchValueMeta(String valueName) | Given a field name, determine the metadata for the field. |
| getValueMeta(int index) | Given a field index, determine the metadata for the field. |
| getValueMetaList() | Returns a list of all field descriptions. The index of the field description matches the field index in the row array. |
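Looking up a field index once and reusing it for every later row might look like the following sketch. It uses only the JDK: `FieldLookupSketch` and its `lookups` counter are hypothetical, the list of names stands in for the row metadata, and the local `indexOfValue()` is a stand-in for a name-to-index lookup. Returning -1 for an unknown name is an assumption of this sketch.

```java
import java.util.Arrays;
import java.util.List;

class FieldLookupSketch {
    private final List<String> fieldNames; // stand-in for the row metadata
    private final String wantedField;
    private boolean first = true;          // true until the first row has been seen
    private int fieldIndex = -1;           // cached index of the wanted field
    int lookups = 0;                       // counts name lookups, for demonstration only

    FieldLookupSketch(List<String> fieldNames, String wantedField) {
        this.fieldNames = fieldNames;
        this.wantedField = wantedField;
    }

    // Stand-in for a name-to-index lookup; yields -1 when the name is unknown.
    private int indexOfValue(String name) {
        lookups++;
        return fieldNames.indexOf(name);
    }

    // Resolves the index on the first call only, then reads by index.
    Object processRow(Object[] row) {
        if (first) {
            first = false;
            fieldIndex = indexOfValue(wantedField);
            if (fieldIndex < 0) {
                throw new IllegalArgumentException("Field not found: " + wantedField);
            }
        }
        return row[fieldIndex];
    }

    public static void main(String[] args) {
        FieldLookupSketch step = new FieldLookupSketch(Arrays.asList("id", "name"), "name");
        System.out.println(step.processRow(new Object[] {1L, "Alice", null, null}));
        System.out.println(step.processRow(new Object[] {2L, "Bob", null, null}));
        System.out.println("lookups: " + step.lookups);
    }
}
```

Resolving the name once avoids a string search per row, and failing early on a missing field gives a clearer error than reading from index -1.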
If a step needs to create copies of rows, use the cloneRow() methods of RowMetaInterface to create proper copies. If a step needs to add or remove fields in the row array, use the static helper methods of RowDataUtil. For example, if a step is adding a field to the row, call resizeArray() to make sure the array has room for the new field. If the array has enough slots, the original array is returned as is. If the array does not have enough slots, a resized copy of the array is returned. If a step needs to create new rows from scratch, use allocateRowData(), which returns a somewhat over-allocated object array to fit the desired number of fields.
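The resize and allocation behavior described above can be modeled in a few lines of plain Java. The methods below are hypothetical stand-ins named after the RowDataUtil helpers, not the PDI implementation. The `EXTRA_SLOTS` value and the `addField()` helper are assumptions made for this sketch, and the amount PDI actually over-allocates may differ.

```java
import java.util.Arrays;

class RowResizeSketch {
    // Hypothetical number of spare slots; chosen only for illustration.
    static final int EXTRA_SLOTS = 10;

    // Stand-in for allocateRowData(): a new array with room to grow.
    static Object[] allocateRowData(int fieldCount) {
        return new Object[fieldCount + EXTRA_SLOTS];
    }

    // Stand-in for resizeArray(): returns the same array when it is large
    // enough, otherwise a larger copy holding the same values.
    static Object[] resizeArray(Object[] row, int requiredSize) {
        if (row.length >= requiredSize) {
            return row;
        }
        Object[] resized = new Object[requiredSize + EXTRA_SLOTS];
        System.arraycopy(row, 0, resized, 0, row.length);
        return resized;
    }

    // Hypothetical helper: appends one value after the existing fields.
    static Object[] addField(Object[] row, int fieldCount, Object value) {
        Object[] out = resizeArray(row, fieldCount + 1);
        out[fieldCount] = value;
        return out;
    }

    public static void main(String[] args) {
        Object[] tight = new Object[] {1L, "Alice"};           // no spare slots
        Object[] grown = addField(tight, 2, "new value");      // forces a copy
        System.out.println(Arrays.toString(grown));

        Object[] roomy = allocateRowData(2);                   // spare slots available
        roomy[0] = 2L;
        roomy[1] = "Bob";
        Object[] same = addField(roomy, 2, "new value");       // reuses the array
        System.out.println(same == roomy);
    }
}
```

Because the helper may return either the original array or a copy, the caller should always continue with the returned reference rather than the one it passed in.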
Summary Table of Classes and Interfaces for Row Processing