The SSTable Output step writes to a filesystem directory as an Apache Cassandra SSTable using CQL (Cassandra Query Language) version 3.x.
When using the SSTable Output step with the Adaptive Execution Layer, the following factor affects performance and results:
- Spark processes null values differently than the Pentaho engine. You will need to adjust your transformation to successfully process null values according to Spark's processing rules.
The following options are available for the SSTable Output transformation step.
|Step name||Specify the unique name of the SSTable Output step on the canvas. You can customize the name or leave it as the default.|
|Cassandra yaml file||Specify the location of YAML file. A cassandra.yaml file is the main configuration file for Cassandra. It defines node and cluster configuration details.|
|Directory||Specify where to write the output. This directory points to the target table to load to and must match the Keyspace and Table fields.|
|Keyspace||Specify the keyspace (database) name of the target table to load. The name specified must match the Directory field.|
|Table||Specifies the table (column family) to upload. It assumes the metadata for this table was previously defined in Cassandra. The table specified must match the Directory field.|
|Incoming fields to use as the key||Specify which incoming row to use as the key. You can use Set Fields to specify the key from the names of incoming PDI transformation fields.|
|Set Fields||Select from a list of incoming PDI transformation fields to specify as the Incoming fields to use as the key.|
|Buffer (MB)||Specify buffer size to use. A new table file is written every time the buffer is full.|