Steps using Dataset tuning options

As part of Spark Tuning, you can use the Dataset tuning options with the following steps.

Step category	Step names
Agile	MonetDB Agile Mart, Table Agile Mart
Big Data	Avro output, Cassandra Input, Cassandra Output, CouchDB Input, Hadoop file output, HBase output, HBase row decoder, MapReduce Input, MapReduce Output, MongoDB Input, MongoDB Output, Orc output, Parquet output, SSTable Output
Bulk loading	ElasticSearch Bulk Insert, Greenplum Load, Infobright Loader, Ingres VectorWise Bulk Loader, MonetDB Bulk Loader, MySQL Bulk Loader, Oracle Bulk Loader, PostgreSQL Bulk Loader, SAP HANA Bulk Loader, Teradata Fastload Bulk Loader, Teradata TPT Insert Upsert Bulk Loader, Vertica Bulk Loader
Cryptography	Decrypt files with PGP, Encrypt Files with PGP, Secret Key Generator, Symmetric Cryptography
Data Mining	ARFF Output, Knowledge Flow, Weka Forecasting, Weka Scoring
Data Warehouse	Combination lookup/update, Dimension lookup/update
Deprecated	Aggregate Rows, Example plugin, Get previous row fields, Greenplum Bulk loader, IBM WebSphere MQ Consumer, IBM WebSphere MQ Producer, JMS Consumer (deprecated), JMS Producer (deprecated), LucidDB Bulk Loader, LucidDB Streaming Loader, OpenERP Object Delete, OpenERP Object Input, OpenERP Object Output, Palo Cell Input, Palo Cell Output, Palo Dimension Input, SAP Input, Text file output (deprecated)
Experimental	Script, SFTP Put
Flow	Abort, Annotate stream, Blocking step, Detect empty stream, Dummy, ETL metadata injection, Filter rows, Identify last row in a stream, Java filter, Shared Dimension, Switch / Case, Transformation executor
Inline	Injector, Socket reader, Socket writer
Input	CSV file input, Data Grid, De-serialize from file, Email messages input, ESRI Shapefile Reader, Fixed file input, Generate random credit card numbers, Generate random value, Generate rows, Get data from XML, Get File Names, Get File Rows Count, Get repository names, Get SubFolder names, Get System Info, Get table names, Google Analytics, Google Docs Input, GZIP CSV Input, HL7 Input, JMS Consumer, JSON Input, LDAP Input, LDIF Input, Load file content in memory, Microsoft Access Input, Microsoft Excel Input, Mondrian Input, OLAP Input, Property Input, RSS Input, Salesforce Input, SAS Input, XBase Input, XML Input Stream (StAX), Yaml Input
Job	Copy rows to result, Get files from result, Get rows from result, Get Session Variables, Set files in result, Set Session Variables
Joins	Join rows, Merge join, Merge rows (diff), Multiway Merge Join, Sorted Merge, XML Join
Lookup	Call DB Procedure, Check if a column exists, Check if file is locked, Check if webservice is available, Database join, Database lookup, Dynamic SQL row, File exists, Fuzzy match, HTTP client, HTTP Post, MaxMind GeoIP Lookup, REST Client, Stream lookup, Table exists, Web services lookup
Mapping	Mapping, Mapping input specification, Mapping output specification, Simple mapping
N/A	Spark Special - FileInputResolver, Spark Special - GenericSparkOperation, Spark Special - RecordsFromStreamSparkOperation
Output	Automatic Documentation Output, Delete, Insert / Update, JSON output, LDAP Output, Microsoft Access Output, Microsoft Excel Output, Microsoft Excel Writer, Pentaho Reporting Output, Properties Output, RSS Output, Salesforce Delete, Salesforce Insert, Salesforce Update, Salesforce Upsert, Serialize to file, SQL File Output, Synchronize after merge, Table output, Text file output, Update, XML Output
Pentaho Server	Call Endpoint, Get Session Variables, Set Session Variables
Scripting	Execute row SQL script, Execute SQL script, Formula, Modified Java Script Value, Python Executor, Regex Evaluation, Rule Accumulator, Rule Executor, User Defined Java Class, User Defined Java Expression
Statistics	Analytic Query, Group by, Memory group by, Output steps metrics, R script executor, Reservoir Sampling, Sample rows, Univariate Statistics
Streaming	AMQP Producer, JMS Producer, Kafka Producer, Get records from stream, MQTT producer
Transform	Add a Checksum, Add constants, Add sequence, Add value fields changing sequence, Add XML, Calculator, Closure Generator, Concat Fields, Get ID from slave server, Number range, Replace in string, Row denormaliser, Row flattener, Row Normaliser, Select values, Set field value, Set field value to a constant, Sort rows, Split field to rows, Split Fields, Splunk Input, Splunk Output, String operations, Strings cut, Unique rows, Unique rows (Hashset), Value Mapper, XSL Transformation
Utility	Change file encoding, Clone row, Delay row, Edi to XML, Execute a process, If field value is null, Mail, Metadata structure of stream, Null if..., Process files, Run SSH commands, Send message to Syslog, Table Compare, Write to log, Zip File
Validation	Credit card validator, Data Validator, Mail Validator, XSD Validator
