PDI uses partitioner plugins for its partitioning feature. Each partitioner plugin implements a specific partitioning method.
For most applications, the Remainder of Division partitioner works well. On the rare occasion that an application would benefit from an additional partitioning method, this section explains how to implement one.
This section explains the architecture and programming concepts for creating your own partitioner plugin. We recommend that you open and refer to the sample partitioner plugin sources while following these instructions.
A partitioner plugin integrates with PDI by implementing two distinct Java interfaces. Each interface represents a set of responsibilities performed by a PDI partitioner. Each of the interfaces has a base class that implements the bulk of the interface in order to simplify plugin development.
|Package||Interface||Base Class||Main Responsibilities|
Implementing the Partitioner Interface
Partitioner is the main Java interface that a plugin implements.
Keep Track of Partitioner Settings
The implementing class keeps track of partitioner settings using private fields with corresponding set methods. The dialog class implementing StepDialogInterface uses these methods to copy the user-supplied configuration in and out of the dialog.
public Object clone()
This method is called when a step containing partitioning configuration is duplicated in Spoon. It needs to return a deep copy of this partitioner object. It is essential that the implementing class creates proper deep copies if the configuration is stored in modifiable objects, such as lists or custom helper objects. The copy is created by calling super.clone() and deep-copying any fields the partitioner may have declared.
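The deep-copy pattern can be sketched in plain Java as follows. This is a minimal illustration, not the sample plugin's code: the class LengthPartitionerSettings and its fields fieldName and ignoredValues are hypothetical, and the class implements Cloneable directly so that it runs without PDI on the classpath. A real partitioner would extend the PDI base class instead.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a partitioner class (assumption: not a PDI type).
class LengthPartitionerSettings implements Cloneable {
    private String fieldName;                                // immutable, safe to share
    private List<String> ignoredValues = new ArrayList<>();  // mutable, must be copied

    public String getFieldName() { return fieldName; }
    public void setFieldName(String fieldName) { this.fieldName = fieldName; }
    public List<String> getIgnoredValues() { return ignoredValues; }

    @Override
    public Object clone() {
        try {
            // super.clone() produces a shallow copy: both objects share the same list.
            LengthPartitionerSettings copy = (LengthPartitionerSettings) super.clone();
            // Give the copy its own list so later edits do not leak into the original.
            copy.ignoredValues = new ArrayList<>(ignoredValues);
            return copy;
        } catch (CloneNotSupportedException e) {
            // Cannot happen because this class implements Cloneable.
            throw new IllegalStateException(e);
        }
    }
}
```

Without the extra list copy, editing the settings of a duplicated step would silently change the settings of the step it was copied from.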
public Partitioner getInstance()
This method is required to return a new instance of the partitioner class, with the plugin id and plugin description inherited from the instance upon which this method is called.
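As a rough sketch of that contract, the following plain Java class returns a fresh object that carries over only the plugin id and description. The class name InstanceSketchPartitioner and the fieldName setting are hypothetical, and leaving user settings at their defaults in the new instance is an assumption of this sketch rather than something the interface description states.

```java
// Hypothetical stand-in for a partitioner class (assumption: not a PDI type).
class InstanceSketchPartitioner {
    private String id;           // plugin id
    private String description;  // plugin description
    private String fieldName;    // a user setting, illustrative only

    public String getId() { return id; }
    public void setId(String id) { this.id = id; }
    public String getDescription() { return description; }
    public void setDescription(String description) { this.description = description; }
    public String getFieldName() { return fieldName; }
    public void setFieldName(String fieldName) { this.fieldName = fieldName; }

    public InstanceSketchPartitioner getInstance() {
        // Create a new object and inherit the plugin identity from this one.
        InstanceSketchPartitioner fresh = new InstanceSketchPartitioner();
        fresh.setId(id);
        fresh.setDescription(description);
        // Assumption: user settings such as fieldName are not carried over.
        return fresh;
    }
}
```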
Serialize Partitioner Settings
The plugin serializes its settings to both XML and a PDI repository.
public String getXML()
This method is called by PDI whenever the plugin needs to serialize its settings to XML. It is called when saving a transformation in Spoon. The method returns an XML string containing the serialized settings. The string contains a series of XML tags, one tag per setting. The helper class org.pentaho.di.core.xml.XMLHandler constructs the XML string.
public void loadXML()
This method is called by PDI whenever a plugin reads its settings from XML. The XML node containing the plugin settings is passed in as an argument. Again, the helper class org.pentaho.di.core.xml.XMLHandler is used to read the settings from the XML node.
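The round trip between getXML() and loadXML() can be illustrated with a small self-contained sketch. To keep it runnable without PDI, it builds and reads the XML with JDK classes only; a real plugin would use XMLHandler for both directions. The class XmlSettingsSketch, the field_name tag, and the parse() helper are hypothetical names chosen for this example.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical stand-in for a partitioner class (assumption: not a PDI type).
class XmlSettingsSketch {
    private String fieldName = "";

    public String getFieldName() { return fieldName; }
    public void setFieldName(String fieldName) { this.fieldName = fieldName; }

    // Serialize: one XML tag per setting.
    public String getXML() {
        return "<field_name>" + escape(fieldName) + "</field_name>";
    }

    // Deserialize: read each setting back from the node passed in.
    public void loadXML(Node settingsNode) {
        NodeList children = settingsNode.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if ("field_name".equals(child.getNodeName())) {
                fieldName = child.getTextContent();
            }
        }
    }

    // Escape the characters that would break element content.
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Helper for the sketch only: parse a string and return its root element.
    static Node parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Checking that a value written by getXML() comes back unchanged from loadXML() is a quick way to catch a setting that was added to the dialog but never serialized.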
public void saveRep()
This method is called by PDI whenever a plugin saves its settings to a PDI repository. The repository object passed in as the first argument provides a convenient set of methods for serializing settings. The transformation id and step id passed in are used as identifiers when calling the repository serialization methods.
public void readRep()
This method is called by PDI whenever a plugin needs to read its configuration from a PDI repository. The step id given in the arguments should be used as the identifier when using the repository's serialization methods.
When developing plugins, make sure the serialization code is in sync with the settings available from the partitioner plugin dialog. When testing a partitioned step in Spoon, PDI internally saves and loads a copy of the transformation before it is executed, so a setting that is not serialized correctly is lost at run time.
Provide the Name of the Dialog Class
PDI needs to know which class will take care of the settings dialog for the plugin. The interface method getDialogClassName() must return the name of the class implementing the StepDialogInterface for the partitioner.
Partition Incoming Rows During Runtime
The class implementing Partitioner executes the actual logic that distributes the rows to available partitions.
public int getPartition()
This method is called with the row structure and the actual row as arguments. It returns the partition to which this row is sent. The total number of partitions is available in the inherited field nrPartitions, and the return value must be between zero (inclusive) and the total number of partitions (exclusive).
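A minimal sketch of such a method is shown below. It is self-contained and does not use PDI types: the class StringLengthPartitionerSketch is hypothetical, the fieldNames array stands in for the row structure argument, and nrPartitions is passed to the constructor instead of being inherited. Taking the string length modulo the number of partitions is one simple way to stay inside the valid range and is not necessarily the logic of the sample plugin.

```java
import java.util.Arrays;

// Hypothetical stand-in for a partitioner class (assumption: not a PDI type).
class StringLengthPartitionerSketch {
    private final int nrPartitions;  // inherited from the base class in a real plugin
    private final String fieldName;  // the field whose string length drives partitioning

    StringLengthPartitionerSketch(int nrPartitions, String fieldName) {
        this.nrPartitions = nrPartitions;
        this.fieldName = fieldName;
    }

    // fieldNames stands in for the row structure; row holds the actual values.
    public int getPartition(String[] fieldNames, Object[] row) {
        int index = Arrays.asList(fieldNames).indexOf(fieldName);
        if (index < 0) {
            throw new IllegalArgumentException("Unknown field: " + fieldName);
        }
        Object value = row[index];
        // Treat a missing value as an empty string.
        int length = (value == null) ? 0 : value.toString().length();
        // The modulo keeps the result in the range 0 .. nrPartitions - 1.
        return length % nrPartitions;
    }
}
```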
Interface with the PDI plugin system
In order for PDI to recognize the plugin, the class implementing the
Partitioner interface must also be annotated with the Java annotation
Supply these annotation attributes:
||A globally unique ID for the plugin|
||A short label for the plugin|
||A longer description for the plugin|
Implementing the Partitioner Settings Dialog Box
StepDialogInterface is the Java interface that the settings dialog class of a partitioner plugin implements.
Maintain the Dialog for Partitioner Settings
The dialog class is responsible for constructing and opening the settings dialog for the partitioner. When you open the partitioning settings in Spoon, the system instantiates the dialog class, passing in a StepPartitioningMeta object. Retrieve the Partitioner object by calling getPartitioner() and call the open() method on the dialog. SWT is the native windowing environment of Spoon and the framework used for implementing dialogs.
public String open()
This method returns only after the dialog has been confirmed or cancelled. The method must conform to these rules.
- If the dialog is confirmed
  - The Partitioner object must be updated to reflect the new settings
  - If you changed any settings, the StepPartitioningMeta object Changed flag must be set to true
  - open() returns the name of the step to which the partitioning is applied; use the inherited stepname field
- If the dialog is cancelled
  - The Partitioner object must not be changed
  - The StepPartitioningMeta object Changed flag must be set to the value it had at the time the dialog opened
The StepPartitioningMeta object has an internal Changed flag that is accessible using setChanged(). Spoon decides whether the transformation has unsaved changes based on the Changed flag, so it is important for the dialog to set the flag appropriately.
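The confirm and cancel rules can be modelled without any SWT code, as in the sketch below. Everything in it is hypothetical: MetaStub and PartitionerStub stand in for StepPartitioningMeta and the partitioner, and the ok() and cancel() methods stand in for the two ways open() can end. Returning null on cancel is an assumption of this sketch, since the rules above do not state the return value for a cancelled dialog.

```java
// Hypothetical model of the dialog contract (assumption: no PDI or SWT types).
class DialogContractSketch {
    static class MetaStub {
        private boolean changed;
        boolean hasChanged() { return changed; }
        void setChanged(boolean changed) { this.changed = changed; }
    }

    static class PartitionerStub {
        String fieldName = "";
    }

    private final MetaStub meta;
    private final PartitionerStub partitioner;
    private final String stepname;
    private final boolean changedAtOpen;  // remembered so that cancel can restore it

    DialogContractSketch(MetaStub meta, PartitionerStub partitioner, String stepname) {
        this.meta = meta;
        this.partitioner = partitioner;
        this.stepname = stepname;
        this.changedAtOpen = meta.hasChanged();
    }

    // Dialog confirmed: update the settings and flag the change if there was one.
    String ok(String editedFieldName) {
        if (!editedFieldName.equals(partitioner.fieldName)) {
            partitioner.fieldName = editedFieldName;
            meta.setChanged(true);
        }
        return stepname;
    }

    // Dialog cancelled: leave the settings alone and restore the original flag.
    String cancel() {
        meta.setChanged(changedAtOpen);
        return null;  // assumption, see the paragraph above
    }
}
```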
The sample Partitioner plugin project has an implementation of the dialog class that is consistent with these rules and is a good basis for creating your own dialogs.
Deploying Partitioner Plugins
To deploy your plugin, follow these steps.
- Create a jar file containing your plugin class(es)
- Create a new folder, give it a meaningful name, and place your jar file inside the folder
- Place the plugin folder you just created in a specific location for PDI to find. Depending on how you use PDI, you need to copy the plugin folder to one or more locations as per the following list.
Deploying to Spoon or Carte:
Copy the plugin folder into this location: design-tools/data-integration/plugins/steps
Restart Spoon. After restarting Spoon, the new partitioner is available in the partitioning settings.
Deploying to Pentaho Server for Data Integration:
Copy the plugin folder to this location: server/pentaho-server/pentaho-solutions/system/kettle/plugins/steps
Restart the server. After restarting the Pentaho Server, the plugin is available to the server.
Deploying to Pentaho Server for Business Analytics:
Restart the server. After restarting the Pentaho Server, the plugin is available to the server.
Sample Partitioner Plugin
The sample Partitioner plugin project is designed to show a minimal functional implementation of a partitioner plugin that you can use as a basis to develop your own custom plugins.
The sample Partitioner plugin distributes rows to partitions based on the value of a string field, or more precisely the string length. The sample shows a partitioner executing on five partitions, assigning longer strings to higher partition numbers.
Follow these steps in order to build and deploy the sample plugin.
- Obtain the Sample Plugin Source
The plugin source is available in the download package. Download the package and unzip it. The partitioner plugin resides in the kettle-sdk-partitioner-plugin folder.
- Configure the Build
Open kettle-sdk-partitioner-plugin/build/build.properties and set the kettle-dir property to the base directory of your PDI installation.
- Build and Deploy
You may choose to build and deploy the plugin from the command line, or work with the Eclipse IDE instead. Both options are described below.
Build and Deploy From the Command Line
The plugin is built using Apache Ant. Build and deploy the plugin from the command line by invoking the install target from the build directory.
kettle-sdk-partitioner-plugin $ cd build
build $ ant install
The install target compiles the source, creates a jar file, creates a plugin folder, and copies the plugin folder into the plugins/steps directory of your PDI installation.
Build and Deploy From Eclipse
Import the plugin sources into Eclipse:
- From the menu, select File > Import > Existing Projects Into Workspace.
- Browse to the kettle-sdk-partitioner-plugin folder and choose the project to be imported.
To build and install the plugin, follow these steps:
- Open the Ant view in Eclipse by selecting Window > Show View from the main menu and select Ant. You may have to select Other > Ant if you have not used the Ant view before.
- Drag the file build/build.xml from your project into the Ant view, and execute the install target by double-clicking it.
- After the plugin has been deployed, restart Spoon.
- You can test the new plugin using the transformation from the demo_transform folder.
Exploring Existing Partitioners
PDI sources are useful if you want to investigate the implementation of the standard modulo partitioner. The main class is available as org.pentaho.di.trans.ModPartitioner. The corresponding dialog class is located in
A complete explanation of partitioning in Kettle, including sample transformations, is available at http://type-exit.org/adventures-with-open-source-bi/2011/09/partitioning-in-kettle/.