Skip to main content
Pentaho Documentation

Microsoft Excel Writer

The Microsoft Excel Writer step writes incoming rows from PDI out to an MS Excel file and supports both the .xls and .xlsx file formats. The .xls files use a binary format which is better suited for simple content, while the .xlsx files use the Open XML format which works well with templates since it can better preserve charts and miscellaneous objects.

General

Enter the following information for the step.

  • Step Name: Specifies the unique name of the step on the canvas. The Step name is set to Microsoft Excel writer by default.

Options

The Microsoft Excel Writer transformation step features two tabs with fields. Each tab is described below.

File & Sheet Tab

File Sheet Tab to the Microsoft Excel Writer PDI Transformation Step Properties Dialog Box

This tab includes the following panels and fields:

File Panel

This panel includes fields for choosing the result file name, extension, and timestamp information. If the data is split over a number of rows, multiple files are created. You can also preview the results from this panel.

Field Description
Filename

Specify name and location of the Excel file in which to write the incoming rows from PDI. Variables or parameters can be used in this field.

Do not include a file extension when the date and time are to be appended and the Extension option is specified.

Extension

Select the Excel file extension to append to the file name. Options include:

  • .xls [Excel 97 and above]
  • .xlsx [Excel 2007 and above]

The default extension is .xls.

Stream XSLX data (check box)

Select this option when writing large .xlsx files to the output file. 

  • Select this option if you want to stream .xslx files. When you select this check box, the system uses a streaming API to write large files without any memory restrictions (not exceeding MS Excel's limit of 1,048,575 rows and 16,384 columns). 

  • Clear this check box if you do not want to stream .xlsx files. 

This option is only available if you select the '.xlsx [Excel 2007 and above]' option in the Extension field above.

Split every ... data rows Specify a positive number to create a new output file every 'n' data rows. For example, if you want to split the file every 200 data rows, then enter '200' in this field. The files will be numbered.
Include stepnr in filename? (check box)

Select this option to include the copy number in the file name ('_0', for example) when you run the step in multiple copies, such as launching several copies of a step simultaneously.

  • Select this option to append a copy number to the file name when multiple copies are launched.

  • Clear this option to keep the file name as is when multiple copies are launched. A copy number will not be appended.
Include date in filename? (check box)

Select this option to include the system date in the file name, such as in the format '_20181231', for example.

  • Select this option to append the system date to the file name.

  • Clear this option to keep the file name as is. The system date will not be appended.
Include time in filename? (check box)

Select this option to include the system time in the file name, such as  in the format '_235959', for example.

  • Select this option to append the system time to the file name.

  • Clear this option to keep the file name as is. The system time will not appended.
Specify Date time format? (check box)

Select this option to include the date and time in the file name using a format from the Date time format field below.

  • Select this option to append the system date and time to the file name.

  • Clear this option to keep the file name as is. The system date and time will not be appended.
Date time format From the list, select the date and time format you want to include in the file name. See Common Formats for information on valid date and time formats you can use in this step.
Show filename(s)... (button) Click to display a list of the files which will be generated.
If output file exists

If the output file already exists, decide if you want to replace it with the new file or keep the original file. 

  • Use existing file for writing: Keep the existing output file and write the new data to it. 
  • Replace with new output file: Overwrite the existing output file with the new output file created in this transformation step.
Wait for first row before creating file (check box)

If this option is selected, the step creates the file only after it has detected a row. If this option is cleared, the output file is always created, regardless of whether rows are actually written to the file or not. 

  • Select this option if you want this step to create the output file only after the system detects a row for writing to the file. If no rows are detected, no file will be created. 

  • Clear this option if you want to always create the output file, regardless of whether rows are actually written to the file.
Add filenames to result (check box)

Using this option adds file names to the MS Excel output file.

  • Select if you want to add file names to the Excel output file.

  • Clear if you do not want to add file names to the Excel output file.

Sheet Panel

This panel includes fields for setting security for the spreadsheet in the MS Excel output file. 

Field Description
Sheet name (max. 31 characters)

Specify the name of the worksheet within the MS Excel file in which to write the incoming rows from PDI. Variables or parameters can be used in this field.

Make this the active sheet (check box)

If checked, the MS Excel file will open by default on the above sheet when the file is opened in Excel. 

  • Select this option to make the specified sheet the default sheet when the file opens in Excel.
  • Clear this option if you do not want to set a default sheet.
If sheet exists in output file

If the sheet already exists in the output file, decide if you want to replace it with the new sheet or keep the original sheet. For example, you may be using a template sheet or writing to existing files, and therefore may choose to overwrite or replace the sheet.

  • Write to existing sheet: Keep the existing sheet and write the new data to it.
  • Replace with new sheet: Overwrite the existing sheet with the new sheet in the output file.
Protect sheet? (XLS format only) (check box)

The .xls file format allows to protect an entire sheet from changes. If selected, you need to provide a password. MS Excel will indicate that the sheet was protected by the user you specify here. 

  • Select to protect the Excel sheet from editing with a specified user and password. 
  • Clear to leave your Excel sheet unprotected. Any user can make changes to it.
Protected by user

Specify the name of the user you want shown in MS Excel as password-protecting this sheet. Variables or parameters can be used in this field.

Password

Specify the password for unlocking the editing features for this sheet in MS Excel. Variables or parameters can be used in this field.

Template Panel

When you create new files or replace existing files, you may choose to create a copy of an existing template file instead. Please make sure that the template file is of the same type as the output file: both must be either .xls or .xlsx. This panel includes fields for using template files and sheets in your Excel output file.

Field Description
Use template when creating new files (check box)

Select this option if you want to use a specified MS Excel template to create the output file.

  • Select this option to specify an Excel template for use in creating the output file. If you select this option, you must also specify the template file name in the Template file field below.
  • Clear this option if you do not want to specify a template file.
Template file

If you selected the Use template when creating new files check box above, then enter the template file name you want to use, or click the Browse button and navigate to it.

Use template when creating new sheets (check box)

Select this option if you want to use a specified template sheet to create the output sheet.

  • Select this option to specify a template sheet for use in creating the output sheet. If you select this option, you must also specify the name of the template sheet in the Template sheet field below.
  • Clear this option if you do not want to specify a template sheet.
Template sheet

If you selected the Use template when creating new sheets check box above, then enter the template sheet name you want to use, or click the Browse button and navigate to it.

Hide Template sheet (check box)

Decide if you want the template sheet to be visible when the MS Excel output file is opened.

  • Select this option to hide the template sheet from users who open the output file.
  • Clear this option to show the template sheet to users who open the output file.

Content Tab

Content Tab to the Microsoft Excel Writer PDI Transformation Step Properties Dialog Box

This tab includes the following panels and fields:

Content Options Panel

This panel includes options for customizing the format of the spreadsheet.

Field Description
Start writing at cell

Specify the cell in the MS Excel spreadsheet (letter column, number row) to start writing to in Excel notation, for example, 'A1'

When writing rows

Specify how to write in new rows when the spreadsheet contains existing rows with data. Select one of the following options:

  • Overwrite existing cells: Keep the existing cells and write the new data to them. 
  • Shift existing cells down: Append the new rows to the top of the sheet, shifting the existing cells down.
Write Header (check box)

Decide whether you want to use the first row as the header row in the output spreadsheet.

  • Select to use the first row of cells as the header row names for the output spreadsheet.
  • Clear to keep the first row formatted as is. 
Write Footer (check box)

Decide whether you want to use the last row as the footer row in the output spreadsheet.

  • Select to use the last row of cells as the footer row names for the output spreadsheet
  • Clear to keep the last row formatted as is.
Auto size columns (check box)

Decide whether the columns should automatically resize to fit the content. Note that auto-sizing is not a feature that the .xls and .xlsx file formats directly support, so results may vary. 

  • Select this option to auto-size columns to fit the content.
  • Clear this option to have columns maintain their width, despite the size of the content.

To ensure the most accurate column-width calculations when using the auto-size feature, install the appropriate fonts on your server environment.

Force formula recalculation (check box)

Decide whether you want the step to attempt to update all the formula fields in the output file. 

When this option is selected for .xls file formats, a flag is set by the step. The formulas are recalculated as soon as the file is opened in MS Excel. When this option is selected for .xlsx file formats, the step itself must attempt to recalculate the formula fields. The underlying POI library does not support the full set of Excel formulas, which may contribute to errors if the step cannot recalculate the formulas.

  • Select this option to force the step to update all the formula fields in the output file. 
  • Clear this option to keep the formula fields as is and not update them.
Leave styles of existing cells unchanged (check box)

Decide whether you want the step to set the style of existing cells it is writing to. This option is useful when you are writing to pre-formatted template sheets. 

  • Select this option to leave the style information of existing cells unchanged.
  • Clear this option to overwrite the style of existing cells.

When Writing to Existing Sheet Panel

This panel includes options for customizing how you want content to appear when it is written to an existing spreadsheet.

Field Description
Start writing at end of sheet (appending lines) (check box)

Decide if you want the output to begin at the last defined row of the spreadsheet. 

  • Select this option if you want the step to append rows after the last defined row of the spreadsheet.
  • Clear this option if you want the step to overwrite existing defined rows.
Offset by ... rows

Specify any non-zero number to cause the step to move this amount of rows down (positive numbers) or up (negative numbers) before writing rows. Negative numbers may be useful if you need to append to a sheet, but still preserve a pre-styled footer. 

Begin by writing ... empty lines

Specify any positive number to cause the step to leave this amount of rows empty.

When writing to an existing sheet, you may decide to write a few empty lines first. This option is useful if you want to append a few rows to an existing sheet, but offset them to see the bounds of each generated block of rows.   

Omit header (check box)

Decide if you want to include a header when writing to an existing spreadsheet. Omitting a header is useful if you want to generate new files with a header, but you do not want to repeat the header in subsequent runs where you append to the same file.

  • Select this option to omit the header.
  • Clear this option to include the header.

Fields Panel

This panel includes Fields table for specifying the fields that are written to the Excel files. The Fields table contains the following columns:

Column Description
Name

The name of the field.

Type

The data type of a field: String, Date, or Number.

Format

The Excel format to use in the sheet. Please consult the Excel manual for valid formats. There are some online references as well. 

Style from cell

A cell in the Excel spreadsheet (letter column, number row) such as 'A1', 'B4', etc., to copy the styling from for this column. This value is usually a pre-styled cell in a template.

Field title

If set, this is used for the Header/Footer instead of the Kettle field name.

Header/Footer style from cell

A cell to copy the styling from for the Header/Footer (usually some pre-styled cell in a template). 

field contains formula

Set to 'Yes', if the field contains an Excel formula.

You do not need to include the notation '=' before your field value. 

Hyperlink

A field that contains the target to link to. Supported targets can be links to other Excel spreadsheet cells, website URL's, ftp's, email addresses, or local documents. 

Cell comment (XLSX)

The .xlsx format allows to put comments on cells. If you would like to generate comments, you may specify fields holding the comment and author for a given column. 

Cell comment author (XLSX)

The .xlsx format allows to put comments on cells. If you would like to generate comments, you may specify fields holding the comment and author for a given column. 

Metadata Injection Support

You can use the Metadata Injection supported fields with ETL Metadata Injection step to pass metadata to your transformation at runtime. The following Value fields of the Microsoft Excel Writer step support metadata injection:

  • Name, Type, Format, Style from Cell, Field Title, Header/Footer Style from Cell, Field Contains Formula, Hyperlink, Cell Comment (XLSX), and Cell Comment Author (XLSX)