|Java Interface|StepInterface|
|Base class|BaseStep|
The class implementing
StepInterface is responsible for the actual row processing when the transformation runs.
The implementing class can rely on the base class and has only three important methods it implements itself. The three methods implement the step life cycle during transformation execution: initialization, row processing, and clean-up.
During initialization, PDI calls the
init() method of the step once. After all steps have initialized, PDI calls
processRow() repeatedly until the step signals that it is done processing all rows. After the step has finished processing rows, PDI calls
dispose().
Aside from the methods it needs to implement, there is one additional and very important rule: the class must not declare any fields. All variables must be kept as part of the class implementing
StepDataInterface. In practice this is not a problem, since the object implementing
StepDataInterface is passed in to all relevant methods, and its fields are used instead of local ones. The reason for this rule is the need to decouple step variables from instances of
StepInterface. This enables PDI to implement different threading models to execute a transformation.
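The life cycle and the no-fields rule can be illustrated with a minimal sketch. It does not use the real PDI classes: `LifecycleData`, `LifecycleStep`, and `LifecycleSketch` are hypothetical stand-ins, and the method signatures are simplified assumptions chosen so the example runs on its own.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a StepDataInterface implementation:
// every piece of mutable state lives here, not in the step class.
class LifecycleData {
    int rowsSeen;
    final List<String> log = new ArrayList<>();
}

// Hypothetical stand-in for a step class. It declares no fields of its own.
class LifecycleStep {
    boolean init(LifecycleData data) {
        data.rowsSeen = 0;
        data.log.add("init");
        return true;
    }

    // Returns true while there is more work, false once the step is done.
    boolean processRow(LifecycleData data, int totalRows) {
        if (data.rowsSeen >= totalRows) {
            data.log.add("done");
            return false;
        }
        data.rowsSeen++;
        return true;
    }

    void dispose(LifecycleData data) {
        data.log.add("dispose");
    }
}

class LifecycleSketch {
    // Drives one step through initialization, row processing, and clean-up.
    static LifecycleData run(int totalRows) {
        LifecycleStep step = new LifecycleStep();
        LifecycleData data = new LifecycleData();
        if (step.init(data)) {
            while (step.processRow(data, totalRows)) {
                // keep calling until the step signals completion
            }
        }
        step.dispose(data);
        return data;
    }
}
```

Because the step object holds no state, the same data object could be handed to it from whichever thread the engine chooses, which is the decoupling the rule is meant to allow.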
The init() method is called when a transformation is preparing to start execution.
public boolean init()
Every step is given the opportunity to do one-time initialization tasks, such as opening files or establishing database connections. For any steps derived from
BaseStep, it is mandatory that
super.init() is called to ensure correct behavior. The method returns
true if the step initialized correctly and
false if there was an initialization error. PDI aborts the execution of a transformation if any step returns
false upon initialization.
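The initialization contract might look like the following sketch. The classes `InitSketchData`, `InitSketchBase`, and `InitSketchStep` are hypothetical stand-ins rather than PDI types, and the single-argument `init` signature is a simplification; only the pattern (call the base class first, then return `true` or `false`) is taken from the text.

```java
import java.io.BufferedReader;
import java.io.StringReader;

// Hypothetical stand-in for the step data object.
class InitSketchData {
    String source;            // assumed configuration value for this sketch
    BufferedReader reader;    // resource opened during init
    boolean baseInitialized;  // set by the stand-in base class
}

// Hypothetical stand-in for BaseStep; the real class does much more.
class InitSketchBase {
    boolean init(InitSketchData data) {
        data.baseInitialized = true;
        return true;
    }
}

class InitSketchStep extends InitSketchBase {
    @Override
    boolean init(InitSketchData data) {
        // Let the base class initialize first and stop if it fails.
        if (!super.init(data)) {
            return false;
        }
        // Signal an initialization error by returning false.
        if (data.source == null) {
            return false;
        }
        // One-time set-up: open the resource and keep it on the data object.
        data.reader = new BufferedReader(new StringReader(data.source));
        return true;
    }
}
```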
Once the transformation starts, it enters a tight loop, calling
processRow() on each step until the method returns
false. In most cases, each step reads a single row from the input stream, alters the row structure and fields, and passes the row on to the next step. Some steps, such as input, grouping, and sorting steps, read rows in batches, or can hold on to the read rows to perform other processing before passing them on to the next step.
public boolean processRow()
A PDI step queries for incoming input rows by calling
getRow(), which is a blocking call that returns a row object or
null in case there is no more input. If there is an input row, the step does the necessary row processing and calls
putRow() to pass the row on to the next step. If there are no more rows, the step calls
setOutputDone() and returns false.
The method must conform to these rules:
- If the step is done processing all rows, the method calls setOutputDone() and returns false.
- If the step is not done processing all rows, the method returns true. PDI calls processRow() again in this case.
The sample step plugin project shows an implementation of
processRow() that is commonly used in data processing steps.
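As a rough illustration of the read, transform, and pass-on pattern, consider the sketch below. It is not the sample project's code: `RowSketchBase` is a hypothetical stand-in for BaseStep that reads from an in-memory queue instead of a blocking row set, and its `getRow()`, `putRow()`, and `setOutputDone()` use simplified signatures assumed for this example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for BaseStep with simplified row handling.
class RowSketchBase {
    final Deque<Object[]> input = new ArrayDeque<>();
    final List<Object[]> output = new ArrayList<>();
    boolean outputDone;

    Object[] getRow() { return input.poll(); }       // null when there is no more input
    void putRow(Object[] row) { output.add(row); }   // hands the row to the next step
    void setOutputDone() { outputDone = true; }
}

class UpperCaseStepSketch extends RowSketchBase {
    boolean processRow() {
        Object[] row = getRow();
        if (row == null) {
            // No more input: signal completion and stop the loop.
            setOutputDone();
            return false;
        }
        // Work on a copy so the incoming row is left untouched.
        Object[] out = Arrays.copyOf(row, row.length);
        out[0] = String.valueOf(out[0]).toUpperCase(Locale.ROOT);
        putRow(out);
        // More rows may follow, so ask to be called again.
        return true;
    }
}
```

Each call handles at most one row, so the calling loop decides how often the method runs.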
In contrast, input steps do not usually expect any incoming rows from previous steps. They are designed to execute
processRow() exactly once, fetching data from the outside world and putting it into the row stream by calling
putRow() repeatedly until done. Examining existing PDI steps is a good guide for designing your own processRow() method.
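The input-step variant can be sketched in the same simplified style. `InputSketchBase` and `NumberInputStepSketch` are hypothetical names, the `count` parameter is an assumption standing in for whatever external source a real step would read, and the signatures are not those of the PDI API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for BaseStep, reduced to the output side.
class InputSketchBase {
    final List<Object[]> output = new ArrayList<>();
    boolean outputDone;

    void putRow(Object[] row) { output.add(row); }
    void setOutputDone() { outputDone = true; }
}

class NumberInputStepSketch extends InputSketchBase {
    // Runs once: generates every row, then reports that the step is done.
    boolean processRow(int count) {
        for (int i = 1; i <= count; i++) {
            putRow(new Object[] { i });
        }
        setOutputDone();
        return false;
    }
}
```

Returning `false` from the first call is what keeps the method from being invoked a second time.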
The row structure object is used during the first invocation of
processRow() to determine the indexes of fields on which the step operates. The
BaseStep class already provides a convenient first flag to help implement special processing on the first invocation of
processRow(). Since the row structure is the same for all input rows, steps cache field index information in variables on their StepDataInterface object.
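A minimal sketch of the caching idea follows. The types are hypothetical stand-ins: `IndexSketchBase` imitates a base class that exposes a `first` flag, a plain list of field names plays the role of the row structure object, and the `lookups` counter exists only to show that the index is resolved once.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Hypothetical stand-in for the step data object that caches the field index.
class IndexSketchData {
    int fieldIndex = -1;
    int lookups; // demonstration only: how often the index was resolved
}

// Hypothetical stand-in for BaseStep, including the first flag.
class IndexSketchBase {
    boolean first = true;
    final List<String> inputFieldNames = new ArrayList<>(); // stand-in for the row structure
    final Deque<Object[]> input = new ArrayDeque<>();
    final List<Object[]> output = new ArrayList<>();

    Object[] getRow() { return input.poll(); }
    void putRow(Object[] row) { output.add(row); }
    void setOutputDone() { }
}

class TrimStepSketch extends IndexSketchBase {
    boolean processRow(IndexSketchData data, String fieldName) {
        Object[] row = getRow();
        if (row == null) {
            setOutputDone();
            return false;
        }
        if (first) {
            // First row only: resolve the field position and cache it.
            first = false;
            data.fieldIndex = inputFieldNames.indexOf(fieldName);
            data.lookups++;
            if (data.fieldIndex < 0) {
                throw new IllegalArgumentException("Unknown field: " + fieldName);
            }
        }
        // Later rows reuse the cached index instead of searching by name.
        Object[] out = Arrays.copyOf(row, row.length);
        out[data.fieldIndex] = String.valueOf(out[data.fieldIndex]).trim();
        putRow(out);
        return true;
    }
}
```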
Once the transformation is complete, PDI calls
dispose() on all steps.
public void dispose()
Steps are required to deallocate resources allocated during
init() or subsequent row processing. Your implementation should clear all fields of the
StepDataInterface object, and ensure that all open files or connections are properly closed. For any steps derived from
BaseStep, it is mandatory that
super.dispose() is called to ensure correct deallocation.
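The clean-up step could be sketched as follows. `DisposeSketchData`, `DisposeSketchBase`, and `DisposeSketchStep` are hypothetical stand-ins with a simplified `dispose` signature; the sketch only shows the pattern of closing resources, clearing the data object's fields, and delegating to the base class.

```java
import java.io.BufferedReader;
import java.io.IOException;

// Hypothetical stand-in for the step data object.
class DisposeSketchData {
    BufferedReader reader;  // resource opened during init
    Object[] cachedRow;     // state accumulated during row processing
    boolean baseDisposed;   // set by the stand-in base class
}

// Hypothetical stand-in for BaseStep.
class DisposeSketchBase {
    void dispose(DisposeSketchData data) {
        data.baseDisposed = true;
    }
}

class DisposeSketchStep extends DisposeSketchBase {
    @Override
    void dispose(DisposeSketchData data) {
        if (data.reader != null) {
            try {
                data.reader.close();
            } catch (IOException e) {
                // This sketch ignores close failures; a real step would likely log them.
            }
            data.reader = null;
        }
        // Clear remaining fields so the data object holds no stale references.
        data.cachedRow = null;
        // Let the base class release its own resources.
        super.dispose(data);
    }
}
```

The null check makes the method safe to call even when initialization failed before the resource was opened.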