Hitachi Vantara Lumada and Pentaho Documentation

Steps using Dataset tuning options

As part of Spark Tuning, you can use the Dataset tuning options with the following steps.

Step category
  • Step name
Agile
  • MonetDB Agile Mart
  • Table Agile Mart
Big Data
  • Avro output
  • Cassandra Input
  • Cassandra Output
  • CouchDB Input
  • Hadoop file output
  • HBase output
  • HBase row decoder
  • MapReduce Input
  • MapReduce Output
  • MongoDB Input
  • MongoDB Output
  • ORC output
  • Parquet output
  • SSTable Output
Bulk loading
  • ElasticSearch Bulk Insert
  • Greenplum Load
  • Infobright Loader
  • Ingres VectorWise Bulk Loader
  • MonetDB Bulk Loader
  • MySQL Bulk Loader
  • Oracle Bulk Loader
  • PostgreSQL Bulk Loader
  • SAP HANA Bulk Loader
  • Teradata Fastload Bulk Loader
  • Teradata TPT Insert Upsert Bulk Loader
  • Vertica Bulk Loader
Cryptography
  • Decrypt files with PGP
  • Encrypt Files with PGP
  • Secret Key Generator
  • Symmetric Cryptography
Data Mining
  • ARFF Output
  • Knowledge Flow
  • Weka Forecasting
  • Weka Scoring
Data Warehouse
  • Combination lookup/update
  • Dimension lookup/update
Deprecated
  • Aggregate Rows
  • Example plugin
  • Get previous row fields
  • Greenplum Bulk loader
  • IBM WebSphere MQ Consumer
  • IBM WebSphere MQ Producer
  • JMS Consumer (deprecated)
  • JMS Producer (deprecated)
  • LucidDB Bulk Loader
  • LucidDB Streaming Loader
  • OpenERP Object Delete
  • OpenERP Object Input
  • OpenERP Object Output
  • Palo Cell Input
  • Palo Cell Output
  • Palo Dimension Input
  • SAP Input
  • Text file output (deprecated)
Experimental
  • Script
  • SFTP Put
Flow
  • Abort
  • Annotate stream
  • Blocking step
  • Detect empty stream
  • Dummy
  • ETL metadata injection
  • Filter rows
  • Identify last row in a stream
  • Java filter
  • Shared Dimension
  • Switch / Case
  • Transformation executor
Inline
  • Injector
  • Socket reader
  • Socket writer
Input
  • CSV file input
  • Data Grid
  • De-serialize from file
  • Email messages input
  • ESRI Shapefile Reader
  • Fixed file input
  • Generate random credit card numbers
  • Generate random value
  • Generate rows
  • Get data from XML
  • Get File Names
  • Get File Rows Count
  • Get repository names
  • Get SubFolder names
  • Get System Info
  • Get table names
  • Google Analytics
  • Google Docs Input
  • GZIP CSV Input
  • HL7 Input
  • JMS Consumer
  • JSON Input
  • LDAP Input
  • LDIF Input
  • Load file content in memory
  • Microsoft Access Input
  • Microsoft Excel Input
  • Mondrian Input
  • OLAP Input
  • Property Input
  • RSS Input
  • Salesforce Input
  • SAS Input
  • XBase Input
  • XML Input Stream (StAX)
  • Yaml Input
Job
  • Copy rows to result
  • Get files from result
  • Get rows from result
  • Get Session Variables
  • Set files in result
  • Set Session Variables
Joins
  • Join rows
  • Merge join
  • Merge rows (diff)
  • Multiway Merge Join
  • Sorted Merge
  • XML Join
Lookup
  • Call DB Procedure
  • Check if a column exists
  • Check if file is locked
  • Check if webservice is available
  • Database join
  • Database lookup
  • Dynamic SQL row
  • File exists
  • Fuzzy match
  • HTTP client
  • HTTP Post
  • MaxMind GeoIP Lookup
  • REST Client
  • Stream lookup
  • Table exists
  • Web services lookup
Mapping
  • Mapping
  • Mapping input specification
  • Mapping output specification
  • Simple mapping
N/A
  • Spark Special - FileInputResolver
  • Spark Special - GenericSparkOperation
  • Spark Special - RecordsFromStreamSparkOperation
Output
  • Automatic Documentation Output
  • Delete
  • Insert / Update
  • JSON output
  • LDAP Output
  • Microsoft Access Output
  • Microsoft Excel Output
  • Microsoft Excel Writer
  • Pentaho Reporting Output
  • Properties Output
  • RSS Output
  • Salesforce Delete
  • Salesforce Insert
  • Salesforce Update
  • Salesforce Upsert
  • Serialize to file
  • SQL File Output
  • Synchronize after merge
  • Table output
  • Text file output
  • Update
  • XML Output
Pentaho Server
  • Call Endpoint
  • Get Session Variables
  • Set Session Variables
Scripting
  • Execute row SQL script
  • Execute SQL script
  • Formula
  • Modified Java Script Value
  • Python Executor
  • Regex Evaluation
  • Rule Accumulator
  • Rule Executor
  • User Defined Java Class
  • User Defined Java Expression
Statistics
  • Analytic Query
  • Group by
  • Memory group by
  • Output steps metrics
  • R script executor
  • Reservoir Sampling
  • Sample rows
  • Univariate Statistics
Streaming
  • AMQP Producer
  • JMS Producer
  • Kafka Producer
  • Get records from stream
  • MQTT producer
Transform
  • Add a Checksum
  • Add constants
  • Add sequence
  • Add value fields changing sequence
  • Add XML
  • Calculator
  • Closure Generator
  • Concat Fields
  • Get ID from slave server
  • Number range
  • Replace in string
  • Row denormaliser
  • Row flattener
  • Row Normaliser
  • Select values
  • Set field value
  • Set field value to a constant
  • Sort rows
  • Split field to rows
  • Split Fields
  • Splunk Input
  • Splunk Output
  • String operations
  • Strings cut
  • Unique rows
  • Unique rows (Hashset)
  • Value Mapper
  • XSL Transformation
Utility
  • Change file encoding
  • Clone row
  • Delay row
  • Edi to XML
  • Execute a process
  • If field value is null
  • Mail
  • Metadata structure of stream
  • Null if...
  • Process files
  • Run SSH commands
  • Send message to Syslog
  • Table Compare
  • Write to log
  • Zip File
Validation
  • Credit card validator
  • Data Validator
  • Mail Validator
  • XSD Validator