Hitachi Vantara Lumada and Pentaho Documentation

Splunk Input

The Splunk Input transformation step enables you to connect to a Splunk server, run a Splunk query, and stream the results into your transformation. To learn more about Splunk, see the Splunk online documentation.

Prerequisites

Before using the Splunk Input step, you must have read access to a Splunk server. Contact your Splunk system administrator for host and port details.

AEL considerations

When using the Splunk Input step with the Adaptive Execution Layer, the following factor affects performance and results:

  • Spark processes null values differently than the Pentaho engine. You will need to adjust your transformation to successfully process null values according to Spark's processing rules.

General

Enter the following information in the transformation step name field:

  • Step name: Specifies the unique name of the Splunk Input step on the canvas. The step name is set to Splunk Input by default.

Options

The Splunk Input step features two tabs with fields and options for defining a Splunk connection and database fields. Each tab is described below.

Connection tab

In this tab, you can define the following connection properties, as described in the table below.

Option | Description
Host name(s) or IP address(es) | Specifies the network name or address of the Splunk instance or instances.
Port | Indicates the port number of the Splunk (splunkd) server. The default value is 8089, but your administrator may have changed the port number.
User name | Specifies the user name required to access the Splunk server.
Password | Indicates the password associated with the User name.
Test connection | After you define the connection, you can test it by clicking this button.
Preview | Provides a first look at the data. Clicking Preview causes the Enter preview size window to appear. Enter the maximum number of records that you want to preview, then click OK. The preview data appears in the Examine preview data window.
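
If you want to confirm the host, port, and credentials outside Pentaho, one option is Splunk's REST API on the management port. The sketch below only builds the request for the `/services/search/jobs/export` endpoint and does not send it. It is an assumption about how you might check access, not a description of how the Splunk Input step connects internally. The function name `buildExportRequest` and the sample host and credentials are hypothetical.

```javascript
// Hypothetical helper: builds (but does not send) a Splunk REST export request.
// Assumes basic authentication is enabled on the splunkd management port.
function buildExportRequest({ host, port = 8089, user, password, query }) {
  // The REST search endpoint expects the query to begin with "search".
  const body = new URLSearchParams({ search: query, output_mode: "json" });
  const credentials = Buffer.from(`${user}:${password}`).toString("base64");
  return {
    url: `https://${host}:${port}/services/search/jobs/export`,
    method: "POST",
    headers: {
      Authorization: "Basic " + credentials,
      "Content-Type": "application/x-www-form-urlencoded",
    },
    body: body.toString(),
  };
}

// Example with placeholder values; replace them with your own details.
const request = buildExportRequest({
  host: "splunk.example.com",
  user: "admin",
  password: "changeme",
  query: "search * | head 100",
});
console.log(request.url);
```

The returned object can be passed to an HTTP client of your choice. The default port of 8089 mirrors the default described in the Connection tab.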

Fields tab

In this tab, you can define the following properties and fields, as described in the table below.

Option | Description
Splunk query expression

This field defines the Splunk query. Note that unlike the queries defined in the Splunk user interface, you must start the query with the term: search

For example:

search * | head 100

One capability of Splunk search is field selection. This allows you to get access to Splunk-parsed fields within the _raw column. To select specific fields, use this syntax at the end of your defined search query:

... | fields index source OpCode

Execute for each row

If checked, a new query is issued for each row of data coming into the step. You can reference incoming fields of data using the ?{<Field>} syntax. For example, if you want to use the incoming field Size to drive the limit of results coming in, type this:

search * | head ?{Size}

Name | Specifies the name of the field.
Splunk name | Indicates the Splunk name for the field.
Type | Specifies the data type of the field.
Length | Indicates the length of the field.
Format | Specifies the format of the field.
Get fields | Retrieves the field metadata and displays it in the Fields tab. After you have detected the field metadata using the Get fields button, you may choose to delete metadata fields that are not relevant to your specific query. Because each field must be translated to its mapped data type, removing unused fields should improve performance.
Preview | Provides a first look at the data. Clicking Preview causes the Enter preview size window to appear. Enter the maximum number of records that you want to preview, then click OK. The preview data appears in the Examine preview data window.
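
The Execute for each row option described in the table can be pictured as a simple template expansion: each `?{<Field>}` placeholder is replaced with the value of the matching incoming field before the query is issued. The following is a minimal sketch of that idea in plain JavaScript. The function name `expandQuery` and the row object are hypothetical, and the step's actual substitution logic may differ.

```javascript
// Hypothetical illustration of ?{<Field>} substitution; not the step's own code.
function expandQuery(template, row) {
  // Match every ?{Name} placeholder and look the name up in the incoming row.
  return template.replace(/\?\{([^}]+)\}/g, (match, name) => {
    if (!(name in row)) {
      throw new Error("No incoming field named " + name);
    }
    return String(row[name]);
  });
}

// An incoming row with Size = 25 limits the result to 25 events.
console.log(expandQuery("search * | head ?{Size}", { Size: 25 }));
// prints: search * | head 25
```

Because a new query is issued for every incoming row, a large input stream produces an equally large number of Splunk searches.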

Raw field parsing

The input step automatically attempts to parse the raw field into a number of child fields denoted by:

_raw.<Field Name>

It parses the raw field on the assumption that the field contains name-value pairs, each terminated by a newline character, like this:

<Name1>=<Value1>\n <Name2>=<Value2>\n

If raw field data is not formatted like this, you must post-process those fields with other steps in the transformation flow. Note that your secondary steps may include String variables.
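
To make the expected layout concrete, the sketch below mimics the documented behavior in plain JavaScript: it splits a raw value on newlines and turns each `<Name>=<Value>` pair into a `_raw.<Name>` entry. The function name `parseRaw` and the sample event are hypothetical, and this is not the step's own parser.

```javascript
// Hypothetical illustration of name=value parsing of the _raw field.
function parseRaw(raw) {
  const fields = {};
  for (const line of raw.split("\n")) {
    const trimmed = line.trim();
    const eq = trimmed.indexOf("=");
    // Skip blank lines and lines that have no name before the "=".
    if (eq <= 0) continue;
    const name = trimmed.slice(0, eq).trim();
    const value = trimmed.slice(eq + 1).trim();
    fields["_raw." + name] = value;
  }
  return fields;
}

// A made-up event with two name=value pairs.
console.log(parseRaw("OpCode=Info\n source=WinEventLog\n"));
```

Lines that do not follow the `<Name>=<Value>` layout are skipped here, which corresponds to the case where you would post-process the raw value in later steps.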

Date handling

Kettle does not support parsing ISO-8601 dates, which is the format Splunk uses to pass date objects through web services. However, you can edit the date string returned from Splunk using the Modified Java Script Value step. Use this script to parse the date:

var dateobj = str2date((substr(_time, 0, 23) + "GMT" + substr(_time, 23)).trim(), "yyyy-MM-dd'T'HH:mm:ss.SSSz");
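
The string manipulation in that script can be tried outside Pentaho. The sketch below reproduces it in plain JavaScript: the first 23 characters cover the date and time down to milliseconds, and "GMT" is inserted in front of the remaining UTC offset so that the `z` pattern letter can read it. The function name `insertGmtMarker` and the sample timestamp are hypothetical, and `str2date` itself is only available inside the Modified Java Script Value step.

```javascript
// Hypothetical stand-in for the string handling in the Kettle script.
function insertGmtMarker(time) {
  // Characters 0-22 hold "yyyy-MM-ddTHH:mm:ss.SSS"; the rest is the UTC offset.
  return (time.substring(0, 23) + "GMT" + time.substring(23)).trim();
}

// A made-up timestamp in the ISO-8601 layout described above.
const sample = "2012-01-05T14:23:45.000-08:00";

console.log(insertGmtMarker(sample));
// prints: 2012-01-05T14:23:45.000GMT-08:00

// Plain JavaScript can parse the original ISO-8601 string directly.
console.log(new Date(sample).toISOString());
// prints: 2012-01-05T22:23:45.000Z
```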

Metadata injection support

All fields of this step support metadata injection. You can use this step with ETL metadata injection to pass metadata to your transformation at runtime.