Skip to main content
Hitachi VantaraPentaho Documentation
Pentaho Documentation

Define Native (JDBC) Database Connections

Overview

Explains how to define Native JDBC Database Connections.

Once you have chosen to use the Native (JDBC) access protocol, here are configuration and maintenance tasks you can perform.

Add Drivers

The DI Server and workstations need the appropriate driver to connect to the database that stores your data. Your database administrator, Chief Intelligence Officer, or IT manager should be able to provide the appropriate driver. If not, you can download drivers from your database vendor's website. See the Supported Technologies to ensure that your database and its driver are supported by Pentaho.

Note: If you are using a Microsoft SQL Server (MSSQL), you might need to use an alternative, non-vendor-supported driver called JTDS. Contact Pentaho support to ensure that you are adding the correct driver.

Installing Drivers

Once you have the correct driver, copy it to these directories.

DI Server: /pentaho/server/data-integration-server/tomcat/webapps/pentaho-di/WEB-INF/lib/ .

Spoon: data-integration/lib

You must restart Spoon for the driver to take effect.

There should be only one driver for your database in this directory. Ensure that there are no other versions of the same vendor's driver in this directory. If there are, back up the old driver files and remove them to avoid version conflicts. This is a concern when you are adding a driver for the same database type as your Pentaho solution repository. If you have any concerns about how to proceed, contact Pentaho support.

When the driver files are in place restart the server.

Connecting to a Microsoft SQL Server Using Integrated or Windows Authentication

If you are using a Microsoft SQL Server (MSSQL), you might need to use an alternative, non-vendor-supported driver called JTDS. Contact Pentaho support to ensure that you are adding the correct driver

For Microsoft Windows, most JDBC drivers support Type 2 integrated authentication through the integratedSecurity connection string property. To use integrated authentication, copy the sqljdbc_auth.dll file to all machines and directories to which you copied the JDBC driver. You can find this file in this location.

<installation directory>\sqljdbc_<version>\<language>\auth\
If running: Use the sqljdbc_auth.dll file here:
32-bit Java Virtual Machine (JVM) even if the operating system is version x64 x86 folder
64-bit JVM on a x64 processor x64 folder
64-bit JVM on an Itanium processor IA64 folder

Specify Native (JDBC) Connection Information

Before you can create the connection, you must have installed the appropriate JDBC driver for your particular data.

Pentaho Data Integration (PDI) allows you to define connections to multiple databases provided by multiple database vendors (MySQL, Oracle, PostgreSQL, and many more). PDI ships with the most suitable JDBC drivers for PostgreSQL, our default database.

Note:

Pentaho recommends that you avoid using ODBC connections. The ODBC to JDBC bridge driver does not always provide an exact match and adds another level of complexity that may affect performance. The only time you may have to use ODBC is if there is no available JDBC driver. For details, this article explains "Why you should avoid ODBC." http://wiki.pentaho.com/pages/viewpage.action?pageId=14850644.

When you define a database connection, the connection information (for example, the user name, password, and port number) is stored in the DI Repository and is available to other users when they connect to the repository. If you are not using the DI Repository, the database connection information is stored in the XML file associated with the transformation or job.

Connections that are available for use with a transformation or job are listed under the Database connections node in the View pane in Spoon.

You must have information about your database, such as your database type, port number, user name and password, before you define a JDBC connection. You can also set connection properties using variables. Variables provide you with the ability to access data from multiple database types using the same transformations and jobs.
Note: Make sure to use clean ANSI SQL that works on all used database types.

File:/ssSpoonDboxDatabaseConnection.png

  1. From within Spoon, navigate to the View tab of the Explorer pane. Double-click on the Database connections folder. The Database Connection dialog box appears.
    Section Name What to Do
    Connection Name Type name that uniquely identifies your new connection
    Connection Type Select the type of database to which you are connecting
    Access Select your method of access. Available access types depend on the connecting database type.
    Host Name Type the name of the server that hosts the database to which you are connecting. Alternatively, you can specify the host by IP address.
    Database Name Enter the name of the database to which you are connecting. If you are using a ODBC connection, enter the Data Source Name (DSN) in this field.
    Port Number Enter the TCP/IP port number if it is different from the default.
    User name Optionally, type the user name used to connect to the database.
    Password Optionally, type the password used to connect to the database.
  2. Click Test. A confirmation message displays if Spoon is able to establish a connection with the target database.
  3. Click OK to save your entries and exit the Database Connection dialog box.
  4. From within the View tab, right-click on the connection and select Share from the list that appears. This shares the connection with your users. They will be able to select the shared connection. From within the View tab, click Explore to open the Database Explorer for an existing connection. This shows you the schemas and tables inside the connection.

Add Database-Specific Options

Add database-specific options by adding parameters to the generated URL.

  1. From within the Database Connection dialog box, select Options.
  2. Select the first available row in the parameter table.
  3. Choose the database type and enter a valid parameter name and its corresponding value.
    Note: For more database-specific configuration help, click Help. A new browser opens and displays additional information about configuring the JDBC connection for the currently selected database type.
  4. Click OK to save your entries.

Advanced Configuration of Database Connections

The Advanced option in the Database Connection dialog box allows you to configure properties that are, for most part, associated with how SQL is generated. These options allow you to set a standard across all of your SQL tools, ETL tools and design tools. All database table names and column names are always upper case or lower case no matter what users do in the tools.

Feature Description
Supports boolean data types Instructs PDI to use native boolean data types if supported by the database.
Quote all in database Enables the databases to use a case-sensitive tablename (for example MySQL is case-sensitive on Linux but not case sensitive on Windows). If you quote the identifiers, the databases will use a case sensitive tablename.
Force all to lower case Enables all identifiers to lower case.
Force all to upper case Enables all identifiers to upper case.
Preferred schema name... Enter the preferred schema name (for example, MYSCHEMA).
Enter SQL name... Enter the SQL statement used to initialize a connection.

Pentaho has implemented a database-specific quoting system that allows you to use any name or character acceptable to the supported databases' naming conventions.

Pentaho Data Integration contains a list of reserved words for most of the supported databases. To ensure that quoting behaves correctly, Pentaho has implemented a strict separation between the schema (user/owner) of a table and the table name itself. Doing otherwise makes it impossible to quote tables or fields with one or more periods in them correctly. Placing periods in table and field names is common practice in some ERP systems (for example, fields such as "V.A.T.")

To avoid quoting-related errors, a rule stops the Pentaho Data Integration from performing quoting activity when there is a start or end quote in the table name or schema. This allows you to specify the quoting mechanism yourself.

Define Connection Pooling

Instead of having a connection open for each individual step in a transformation, you can set up a connection pool and define options like the initial pool size, maximum pool size, and connection pool parameters. For example, you might start with a pool of ten or fifteen connections, and as you run jobs or transformations, the unused connections drop off. Pooling helps control database access, especially if you have transformations that contain many steps and that require a large number of connections. Pooling can also be implemented when your database licensing restricts the number of active concurrent connections.

This table shows descriptions of the pooling options.

Feature Description
Enable connection pooling Enables connection pooling
Pool Size Sets the initial size of the connection pool; sets the maximum number of connections in the connection pool
Parameters Allows you to define additional custom pool parameters; click Restore Defaults when appropriate
Description Allows you to add a description for your parameters
  1. Select Enable Connection Pooling.
  2. Type the initial pool size in the Initial: area and the maximum pool size in the Maximum: area.
  3. Select the parameters you need from within the Parameters: area. A Description of the parameter appears in the Description: area when you select a check box.
  4. Click OK to save your selections and close the Database Connection dialog box.

Connect to Clusters

This option allows you to enable clustering for the database connection and create connections to the data partitions. To create a new data partition, enter a Partition ID and the Host Name, Port, Database, User Name, and Password for connecting to the partition.

Modify Connections

This table contains information about other database-related connection tasks you can perform.

Task Description
Edit a Connection Right-click on the connection name and select Edit.
Duplicate a Connection Right-click on the connection name and select Duplicate.
Copy to a Clipboard Allows you to copy the XML defining the step to the clipboard. You can then paste this step into another transformation. Double-click on the connection name in the tree or right-click on the connection name and select Copy to Clipboard.
Delete a Connection Double-click on the connection name in the tree or right-click on the connection name and select Delete.
SQL Editor To execute SQL command against an existing connection, right-click on the connection name and select SQL Editor.
Clear the Database Cache To speed up connections Pentaho Data Integration uses a database cache. When the information in the cache no longer represents the layout of the database, right-click on the connection in the tree and select Clear DB Cache.... This command is commonly used when databases tables have been changed, created or deleted.
Share a Connection Rather than redefining a connection each time you create a job or transformation on your local device, right-click and select Share to share the connection information among jobs and transformations.
Exploring the Database Double-click on the connection name in the tree or right-click on the connection name and select Explore.
Show dependencies Right-click a connection name and select Show dependencies to see all of the transformations and jobs that use this database connection.