Pentaho Documentation

Splunk Input

The Splunk Input step connects to a Splunk server, runs a Splunk query, and streams the results into your transformation. To learn more about Splunk, see the Splunk online documentation.

Prerequisites

Before using the Splunk Input step, you must have read access to a Splunk server.  Contact your Splunk system administrator for host and port details.

General

PDI_TransStep_Tab_Input_Connection.png

Enter the following information in the transformation step name field:

  • Step name: Specifies the unique name of the Splunk Input step on the canvas. The step name is set to 'Splunk Input' by default.

Options

The Splunk Input step features two tabs with fields and options for defining a Splunk connection and the fields to retrieve. Each tab is described below.

Connection Tab

In this tab, you can define the following connection properties, as described in the table below.

Option Description
Host name(s) or IP address(es)

Specifies the network name or address of the Splunk instance or instances.

Port

Indicates the port number of the Splunk (splunkd) server. The default value is 8089, but your administrator may have changed the port number.

User name

Specifies the user name required to access the Splunk server.

Password

Specifies the password associated with the user name.

Test connection

Tests the connection after you define it.

Preview

Provides a first look at the data. Clicking Preview causes the Enter preview size window to appear. Enter the maximum number of records that you want to preview, then click OK. The preview data appears in the Examine preview data window.
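To show how the Connection tab values fit together, the following minimal sketch assembles a request for the splunkd REST export endpoint (/services/search/jobs/export) from a host, port, user name, and password. It is an illustration only: the helper name buildExportRequest, the host splunk.example.com, and the credentials are hypothetical, and this page does not state which endpoint or authentication scheme the step itself uses. Nothing is sent over the network.

```javascript
// Hypothetical helper (not part of PDI): combines Connection tab values into
// the pieces of a splunkd REST request. It only builds the request; it does not send it.
function buildExportRequest(conn, query) {
  // Fall back to 8089, the default splunkd port named in the table above.
  const port = conn.port === undefined ? 8089 : conn.port;
  // HTTP Basic authentication is assumed here for illustration.
  const credentials = Buffer.from(conn.username + ":" + conn.password).toString("base64");
  return {
    url: "https://" + conn.host + ":" + port + "/services/search/jobs/export",
    method: "POST",
    headers: { Authorization: "Basic " + credentials },
    // Form-encoded body carrying the query and the requested output format.
    body: new URLSearchParams({ search: query, output_mode: "json" }).toString(),
  };
}

// Placeholder connection values; replace them with the details from your administrator.
const request = buildExportRequest(
  { host: "splunk.example.com", username: "admin", password: "changeme" },
  "search * | head 100"
);
console.log(request.url);
console.log(request.body);
```

If your administrator changed the port, pass it explicitly as the port property instead of relying on the default.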

Fields Tab

PDI_TransStep_Tab_Input_Fields.png

In this tab, you can define the following properties and fields, as described in the table below.

Option Description
Splunk query expression

Defines the Splunk query. Unlike queries entered in the Splunk user interface, the query must start with the term: search

For example: search * | head 100

One capability of Splunk search is field selection, which gives you access to Splunk-parsed fields within the _raw column. To select specific fields, use this syntax at the end of your search query:

... | fields index source OpCode

Execute for each row

If checked, a new query is issued for each row of data coming into the step. You can reference incoming fields using the ?{<Field>} syntax. For example, to use the incoming field Size to limit the number of results returned, type this:

search * | head ?{Size}

Name

Specifies the name of the field.

Splunk name

Indicates the Splunk name for the field.

Type

Specifies the data type of the field.

Length

Indicates the length of the field.

Format

Specifies the format of the field.

Get fields

Retrieves the field metadata and displays it in the Fields tab. After you have retrieved the field metadata with the Get fields button, you may choose to delete fields that are not relevant to your query. Because each field must be translated to its mapped data type, removing unused fields should improve performance.

Preview

Provides a first look at the data. Clicking Preview causes the Enter preview size window to appear. Enter the maximum number of records that you want to preview, then click OK. The preview data appears in the Examine preview data window. 
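The ?{<Field>} substitution described for Execute for each row can be pictured as a simple template replacement that runs once per incoming row. The sketch below is a hypothetical re-creation in plain JavaScript, not the step's actual implementation. The function names withSearchPrefix and substituteRowFields and the sample rows are assumptions made for illustration.

```javascript
// Hypothetical helper: make sure a query starts with the required "search" term.
function withSearchPrefix(query) {
  const trimmed = query.trim();
  return /^search\b/.test(trimmed) ? trimmed : "search " + trimmed;
}

// Hypothetical helper: replace each ?{Field} placeholder with the value of
// that field in the incoming row. Unknown field names raise an error.
function substituteRowFields(query, row) {
  return query.replace(/\?\{([^}]+)\}/g, function (match, fieldName) {
    if (!(fieldName in row)) {
      throw new Error("No incoming field named " + fieldName);
    }
    return String(row[fieldName]);
  });
}

// One query is produced for each incoming row.
const template = withSearchPrefix("* | head ?{Size}");
const incomingRows = [{ Size: 10 }, { Size: 250 }];
const queries = incomingRows.map(function (row) {
  return substituteRowFields(template, row);
});
console.log(queries);
```

Because every incoming row triggers its own query, a large input stream can put significant load on the Splunk server.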

Raw Field Parsing

The input step automatically attempts to parse the raw field into a number of child fields, each named _raw.<Field Name>.

It parses the raw field on the assumption that it contains name-value pairs separated by newline characters, like this:

<Name1>=<Value1>\n<Name2>=<Value2>\n

If raw field data is not formatted like this, you must post-process those fields with other steps in the transformation flow.  Note that your secondary steps may include String variables.
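The parsing rule above can be sketched in plain JavaScript as follows. This is a hypothetical re-creation and not the step's own code. The function name parseRawField and the sample event are assumptions, as is the handling of edge cases: here, lines without a name before the equals sign are skipped, and only the first equals sign on a line separates the name from the value.

```javascript
// Hypothetical re-creation of the rule: one <Name>=<Value> pair per line,
// each exposed as a child field named _raw.<Name>.
function parseRawField(raw) {
  const fields = {};
  raw.split("\n").forEach(function (line) {
    const trimmed = line.trim();
    const separator = trimmed.indexOf("=");
    if (separator <= 0) {
      return; // skip blank lines and lines with no name before "="
    }
    const name = trimmed.slice(0, separator);
    // Everything after the first "=" is kept as the value.
    fields["_raw." + name] = trimmed.slice(separator + 1);
  });
  return fields;
}

// Made-up sample event in the expected format.
const parsed = parseRawField("OpCode=Info\nsource=WinEventLog:System\nMessage=a=b\n");
console.log(parsed);
```

Raw data in any other layout, such as free-form log lines, produces no child fields in this sketch, which is the case where post-processing in later steps is needed.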

Date Handling

Kettle does not parse ISO-8601 dates, which is the format Splunk uses to pass date values through web services. However, you can convert the date string returned from Splunk with the Modified Java Script Value step. Use this script to parse the date:

// Insert "GMT" before the UTC offset so that the string matches the "z" time zone pattern.
var dateobj = str2date((substr(_time, 0, 23) + "GMT" + substr(_time, 23)).trim(), "yyyy-MM-dd'T'HH:mm:ss.SSSz");
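To make the string edit in that script easier to follow, the sketch below repeats it in plain JavaScript outside of PDI. The function name toGmtOffsetString and the sample _time value are hypothetical. The sketch assumes that the timestamp has exactly 23 characters (date, time, and milliseconds) before the UTC offset.

```javascript
// Plain-JavaScript illustration of the string edit performed by the script above.
function toGmtOffsetString(splunkTime) {
  // The first 23 characters hold yyyy-MM-ddTHH:mm:ss.SSS; the remainder is the UTC offset.
  return (splunkTime.slice(0, 23) + "GMT" + splunkTime.slice(23)).trim();
}

const sample = "2013-05-01T12:34:56.000-07:00"; // hypothetical _time value
console.log(toGmtOffsetString(sample)); // 2013-05-01T12:34:56.000GMT-07:00

// For comparison, a standard JavaScript Date parses the ISO-8601 string directly.
console.log(new Date(sample).toISOString()); // 2013-05-01T19:34:56.000Z
```

The rewritten string carries a time zone in the GMT-07:00 form, which is what the z pattern letter in the script's format mask expects.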

Metadata Injection Support

All fields of this step support metadata injection. You can use this step with ETL Metadata Injection to pass metadata to your transformation at runtime.