Pentaho Documentation

Retrieving Data from a Flat File

Follow the instructions below to retrieve data from a flat file.

  1. Select File > New > Transformation in the upper left corner of the Spoon window to create a new transformation.

  1. Under the Design tab, expand the Input node; then, select and drag a Text File Input step onto the canvas.
  2. Double-click the Text File Input step. The Text file input window appears. This window allows you to set the properties for this step.

TextFileInput_File.png

  1. In the Step Name field, type Read Sales Data. This renames the Text file input step to Read Sales Data.
  2. Click Browse to locate the source file, sales_data.csv, available at ...\design-tools\data-integration\samples\transformations\files. (The Browse button is near the top right of the window, next to the File or directory field.) Click Open. The path to the source file appears in the File or directory field.
  3. Click Add. The path to the file appears under Selected Files.
  4. To look at the contents of the sample file:
    1. Click the Content tab, then set the Format field to Unix.
    2. Click the File tab again, then click Show file content near the bottom of the window.
    3. The Nr of lines to view window appears. Click OK to accept the default.
    4. The Content of first file window displays the file. Examine the file to see how it is delimited, what enclosure character is used, and whether a header row is present. In the sample, the input file is comma (,) delimited, the enclosure character is a quotation mark ("), and there is a single header row containing the field names.
    5. Click the Close button to close the window.
  5. To provide information about the content:
    1. Click the Content tab. The fields under the Content tab allow you to define how your data is formatted.
    2. Make sure that the Separator is set to comma (,) and that the Enclosure is set to quotation mark ("). Enable Header because the file contains a single header row.
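
To make the Separator, Enclosure, and Header settings concrete, here is a minimal Python sketch of what they mean when a delimited file is parsed. It is an illustration only, not how the Text File Input step is implemented. The sample rows, the column names, and the read_rows helper are hypothetical and are not the contents of sales_data.csv.

```python
import csv
import io

# Hypothetical sample in the shape described above: comma separator,
# double-quote enclosure, one header row. NOT the real sales_data.csv.
SAMPLE = (
    '"order_id","customer","amount"\n'
    '"1001","Acme, Inc.","250.00"\n'
    '"1002","Globex","99.50"\n'
)


def read_rows(text, separator=",", enclosure='"', header=True):
    """Parse delimited text using an explicit separator, enclosure, and header flag."""
    reader = csv.reader(io.StringIO(text), delimiter=separator, quotechar=enclosure)
    rows = list(reader)
    if not rows:
        return [], []
    if header:
        # The first line supplies the field names; the rest is data.
        return rows[0], rows[1:]
    # Without a header row, fall back to placeholder names (hypothetical scheme).
    names = [f"field_{i}" for i in range(1, len(rows[0]) + 1)]
    return names, rows


names, data = read_rows(SAMPLE)
print(names)
for row in data:
    print(row)
```

The value "Acme, Inc." contains a comma, and it stays in one field only because the enclosure character is set. That is the reason to confirm both the separator and the enclosure before reading fields.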

textfileinput_content.png

  1. Click the Fields tab, then click Get Fields to retrieve the input fields from your source file. When the Nr of lines to sample window appears, enter 0 in the field, then click OK.

  2. If the Scan Result window appears, click Close to close it.
  3. To verify that the data is being read correctly:
    1. Click Preview Rows.
    2. In the Enter preview size window, click OK. The Examine preview data window appears.
    3. Review the data, then click Close.
  4. Click OK to save the information that you entered in the step.
  5. To save the transformation:
    1. Select File > Save to save the transformation.
    2. The Transformation Properties window appears. In the Transformation Name field, type Getting Started Transformation. (The Transformation Properties window appears because you are connected to a repository. If you were not connected to a repository, the standard save window would appear instead.)

TransformationProperties.png

  1. In the Directory field, click the folder icon.
  2. Expand the Home directory and double-click the folder in which you want to save the transformation.  Your transformation is saved in the Pentaho Repository.
  3. Click OK to close the Transformation Properties window. When prompted for a comment, enter one, then click OK. Your comment is stored for version control purposes in the Pentaho Repository.
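
As a rough illustration of the Get Fields and Preview Rows steps earlier in this procedure, the Python sketch below guesses a type for each column by scanning sample rows, then returns the first few rows for inspection. It is a simplified stand-in, not the algorithm Spoon uses. It assumes that a sample size of 0 means "scan every row". The data, the column names, and the guess_type, get_fields, and preview helpers are all hypothetical.

```python
import csv
import io
from itertools import islice

# Hypothetical data; NOT the contents of sales_data.csv.
SAMPLE = (
    '"order_id","customer","amount"\n'
    '"1001","Acme, Inc.","250.00"\n'
    '"1002","Globex","99.50"\n'
    '"1003","Initech","1200"\n'
)


def guess_type(values):
    """Return 'Integer', 'Number', or 'String' for a column of raw strings."""
    def all_parse(cast):
        try:
            for value in values:
                cast(value)
            return True
        except ValueError:
            return False

    if values and all_parse(int):
        return "Integer"
    if values and all_parse(float):
        return "Number"
    return "String"


def get_fields(text, sample_lines=0):
    """Guess a type per column; sample_lines=0 is assumed to mean all rows."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    rows = list(reader) if sample_lines == 0 else list(islice(reader, sample_lines))
    columns = list(zip(*rows))  # transpose rows into columns
    return {name: guess_type(col) for name, col in zip(header, columns)}


def preview(text, size=2):
    """Return the first `size` data rows as dictionaries keyed by field name."""
    return list(islice(csv.DictReader(io.StringIO(text)), size))


fields = get_fields(SAMPLE)
print(fields)
for row in preview(SAMPLE):
    print(row)
```

Scanning every row is slower on large files but avoids mislabeling a column whose later rows contain non-numeric values. Previewing a few rows afterward is a quick check that the guessed types and field boundaries look right.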