Explains how to use the ElasticSearch Bulk Insert step.
Elastic is a platform that consists of products that search, analyze, and visualize data. The Elastic platform includes ElasticSearch, which is a Lucene-based, multi-tenant, distributed search and analytics engine.
Use this step if you have records that you want to submit to an ElasticSearch server to be indexed. When record data flows out of the ElasticSearch Bulk Insert step, PDI sends it to ElasticSearch along with the metadata that you specify, such as the index and type. This step is commonly used when you want to send a batch of data to an ElasticSearch server and create new indexes of a certain type (category). It is also used when you want to add a batch of data to an existing index or category.
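To make the index and type metadata concrete, here is a minimal sketch of the newline-delimited body that the ElasticSearch Bulk API accepts: one action line naming the index, type, and optional ID, followed by one line holding the document itself. It illustrates the API format only and is not the code PDI runs. The function name `build_bulk_body`, the sample rows, and the `twitter`/`tweet` values are assumptions made for illustration.

```python
import json


def build_bulk_body(rows, index, doc_type, id_field=None):
    """Build a newline-delimited Bulk API body for the given rows.

    Hypothetical helper for illustration; PDI builds its requests internally.
    """
    lines = []
    for row in rows:
        # Action line: tells ElasticSearch where the next document goes.
        action = {"index": {"_index": index, "_type": doc_type}}
        if id_field is not None:
            # Use the value of the chosen field as the document ID.
            action["index"]["_id"] = str(row[id_field])
        lines.append(json.dumps(action))
        # Source line: the document to index.
        lines.append(json.dumps(row))
    # The Bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


# Sample rows (made up for this sketch).
rows = [
    {"id": 1, "user": "alice", "message": "first tweet"},
    {"id": 2, "user": "bob", "message": "second tweet"},
]
body = build_bulk_body(rows, index="twitter", doc_type="tweet", id_field="id")
print(body)
```

Each row produces two lines in the body, so a batch of two rows yields four lines.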
Because this is an output step, it is often placed at the end of the transformation.
Because ElasticSearch has a REST web interface, you can also use the REST Client step to send data to an ElasticSearch server and to perform other REST functions.
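As a rough illustration of the REST interface mentioned above, the sketch below builds (but does not send) an HTTP request that would index a single document at `/{index}/{type}/{id}`. The server address `http://localhost:9200`, the helper name `build_index_request`, and the sample document are assumptions; substitute the values for your own server.

```python
import json
import urllib.request


def build_index_request(base_url, index, doc_type, doc_id, document):
    """Build a PUT request that indexes one document (hypothetical helper)."""
    url = f"{base_url}/{index}/{doc_type}/{doc_id}"
    data = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Assumed server address and sample document, for illustration only.
request = build_index_request(
    "http://localhost:9200",
    "twitter",
    "tweet",
    1,
    {"user": "alice", "message": "first tweet"},
)
print(request.get_method(), request.full_url)

# To actually send the request, a reachable ElasticSearch server is required:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

The send call is left commented out so the sketch runs without a server.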
- A working server that has ElasticSearch version 1.5.2 already installed. You should be able to connect to ElasticSearch from the computer that you are running PDI on.
- Insert, Update, and Create privileges for the directories on the ElasticSearch server that you need to access.
- Files or data you want ElasticSearch to index.
This step consists of four tabs: General, Servers, Fields, and Settings.
|Step name||Indicates the name given to this step.|
|Help||Displays help documentation.|
|OK||Saves the information you entered, then closes the window.|
|Cancel||Discards the changes you entered, then closes the window.|
|Index||Specifies the name of the index you want to add data to. If an index with that name doesn't yet exist in ElasticSearch, one is created.|
|Type||Indicates the category the data should be placed in. You define the category. In general practice, the type sometimes describes the data. For example, if the index is "twitter" the type might be "tweet."|
|Test Index||Checks whether the index exists in ElasticSearch.|
|Batch Size||Indicates the number of items in the batch. (If the batch size is set to one, it is not a bulk insert; setting it to a higher number makes it one.)|
|Stop on Error||Stops processing if there is an error, such as a problem with adding the document or the bulk push to the index or if the JSON is not well-formed. If this option is not selected, and an error occurs, the row is not processed, but the transformation keeps running so that other rows are processed.|
|Batch Timeout||Indicates how long a batch should be processed before it times out and processing ends.|
|ID Field||Indicates the name of the ID field in the file.|
|Overwrite if exists||If the output file exists because this transformation was run before, allows the output to be overwritten.|
|Output Rows||Sends the rows that are successfully processed by ElasticSearch to the next step (or the output). If you've checked Stop on Error, the rows that were successful up until the time the error occurs are sent to the next step (or the output). Otherwise, all rows successfully processed by ElasticSearch are sent to the next step (or the output).|
|ID Output Field||Indicates the name of the ID field that is in the output. If this is left blank, the value in the ID Field is used instead.|
|JSON Input||Indicates whether the input is a JSON file.|
|JSON Field||Indicates the JSON node from which processing should begin.|
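The interaction between Batch Size, Stop on Error, and Output Rows can be sketched as a small simulation. This is one reading of the option descriptions above, not PDI's actual implementation: the functions `batches` and `process`, the stand-in `fake_send`, and the sample rows are all hypothetical.

```python
def batches(rows, batch_size):
    """Yield consecutive lists of at most batch_size rows."""
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]


def process(rows, batch_size, stop_on_error, send_batch):
    """Send rows in batches and return the rows that were accepted.

    send_batch is a hypothetical callable that returns one boolean per row,
    True if the row was indexed successfully.
    """
    succeeded = []
    for batch in batches(rows, batch_size):
        results = send_batch(batch)
        for row, ok in zip(batch, results):
            if ok:
                succeeded.append(row)
            elif stop_on_error:
                # Stop on Error: end processing, keep rows accepted so far.
                return succeeded
            # Otherwise the failed row is skipped and processing continues.
    return succeeded


def fake_send(batch):
    # Stand-in for the server: rejects any row without a "user" key.
    return ["user" in row for row in batch]


rows = [
    {"user": "a"},
    {"user": "b"},
    {"message": "no user"},
    {"user": "c"},
    {"user": "d"},
]

kept_running = process(rows, batch_size=2, stop_on_error=False, send_batch=fake_send)
stopped = process(rows, batch_size=2, stop_on_error=True, send_batch=fake_send)
print(len(kept_running), len(stopped))
```

With a batch size of 2, the five rows are sent as three batches. Without Stop on Error, the four valid rows pass through; with it, only the two rows accepted before the failure do.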
Elastic, which is the company that makes ElasticSearch, has an API as well as user documentation that can give you more background on the fields in this step.
- ElasticSearch reference information can be found here: https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html.
- The Bulk API is here: https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html.