The Table Input step reads information from a connected database using SQL statements. Basic SQL statements can be generated automatically by clicking the Get SQL select statement button.
When using the Table Input step to connect to a Hive database through the Adaptive Execution Layer (AEL), Hive must be on the same cluster as Spark and must be managed by YARN. The following factors will affect performance and results:
- Only Hive version 2.x.y is supported.
- The following options in the step are not supported:
- Enable Lazy conversion
- Execute for each row
- Pooling and Clustering database connection options.
- Structures, arrays, and user-defined data types are not supported.
The following fields are general to this transformation step:
|Step name||Specify the unique name of the Table Input step on the canvas. You can customize the name or leave it as the default.|
|Connection||Select the name of a connected database from the dropdown list.|
|Edit (button)||Click to edit your current database connection.|
|New (button)||Click to establish a new database connection.|
|Wizard (button)||Click to open a new database connection using the Wizard.|
The Execute SQL Script step has the following options:
|SQL||Specify a SQL statement to read information from the connected database. You can also click the Get SQL select statement button to browse tables and automatically generate a basic select statement.|
|Enable lazy conversion||Select this option to enable the lazy conversion algorithm. When selected, lazy conversion avoids unnecessary data type conversions when possible, which can significantly improve performance.|
|Replace variables in script?||Select this option to replace variables in the script. This feature provides testing capabilities with or without variable substitutions.|
|Insert data from step||Specify the input step name where PDI can expect information to come from. This information can then be inserted into the SQL statement. The locator where PDI inserts information is indicated with a question mark ( ? ).|
|Execute for each row?||Select this option to to execute the query for each individual row.|
|Limit size||Specify the number of lines to read from the database. A value of zero ('0') indicates to read all lines.|
|Preview (button)||Click Preview to open a new window and view an execution log derived from a temporary transformation with two steps: the Table Input step and the Dummy step. To see the log, click Logs in the Preview window that opens.|
Below is an SQL statement:
SELECT * FROM customers WHERE changed_date BETWEEN ? AND ?
This SQL statement requests two calendar dates, to create a range, that are read from the Insert data from step option. The target date range can be provided using the Get System Info step. For example, if you want to read all customers that have had their data changed yesterday, you can get a target range for yesterday and read the customer data.
Metadata Injection Support
You can use the Metadata Injection supported fields with the ETL Metadata Injection step to pass metadata to your transformation at runtime. The following Option fields of the Table Input Step support metadata injection:
- Options: Connection, SQL, Enable Lazy Conversion, Replace Variables in Script?, Insert Data from Step, and Limit Size.