The DI Server and workstations need the appropriate driver to connect to the database that stores your data. Your database administrator, Chief Intelligence Officer, or IT manager should be able to provide the appropriate driver. If not, you can download drivers from your database vendor's website. See the Supported Technologies to ensure that your database and its driver are supported by Pentaho.
Once you have the correct driver, copy it to these directories.
DI Server: /pentaho/server/data-integration-server/tomcat/webapps/pentaho-di/WEB-INF/lib/ .
You must restart Spoon for the driver to take effect.
There should be only one driver for your database in this directory. Ensure that there are no other versions of the same vendor's driver in this directory. If there are, back up the old driver files and remove them to avoid version conflicts. This is a concern when you are adding a driver for the same database type as your Pentaho solution repository. If you have any concerns about how to proceed, contact Pentaho support.
When the driver files are in place restart the server.
Connecting to a Microsoft SQL Server Using Integrated or Windows Authentication
If you are using a Microsoft SQL Server (MSSQL), you might need to use an alternative, non-vendor-supported driver called JTDS. Contact Pentaho support to ensure that you are adding the correct driver
For Microsoft Windows, most JDBC drivers support Type 2 integrated authentication through the integratedSecurity connection string property. To use integrated authentication, copy the sqljdbc_auth.dll file to all machines and directories to which you copied the JDBC driver. You can find this file in this location.
|If running:||Use the sqljdbc_auth.dll file here:|
|32-bit Java Virtual Machine (JVM) even if the operating system is version x64||x86 folder|
|64-bit JVM on a x64 processor||x64 folder|
|64-bit JVM on an Itanium processor||IA64 folder|
Create OCI Connections
- Start the web application and DI Servers, log into the Spoon, then click on Tools > Database > Explore. The Data Sources dialog box appears.
- Click the plus icon (+) on the right and select JDBC. The Database Connection dialog box appears with General highlighted in the left navigation pane.
- In the Connection Name field, enter a name that uniquely describes this connection. The name can have spaces, but it cannot have special characters, such as #, $, %, and alike.
- In the Database Type list, select Oracle.
- In the Access list, select OCI.
- Enter Settings as directed by the Oracle OCI documentation.
- In the SID field, enter the Oracle system ID that uniquely identifies the database on the system.
- In the Tablespace for Data field, enter the name of the tablespace where the data is stored.
- In the Tablespace for Indicies field, enter the name of the tablespace where the indicies are stored.
- Enter the User Name and Password required to access the database.
- Click Test. A success message appears if the connection is established.
- To save the connection, click OK twice. This connection name appears in the list of available data sources in the Data Sources dialog box. If you want to use Advanced, Options, or Pooling, refer to the Oracle OCI documentation to understand how to specify these settings.
Add Database-Specific Options
Add database-specific options by adding parameters to the generated URL.
- From within the Database Connection dialog box, select Options.
- Select the first available row in the parameter table.
- Choose the database type and enter a valid parameter name and its corresponding value. Note: For more database-specific configuration help, click Help. A new browser opens and displays additional information about configuring the JDBC connection for the currently selected database type.
- Click OK to save your entries.
Advanced Configuration of Database Connections
The Advanced option in the Database Connection dialog box allows you to configure properties that are, for most part, associated with how SQL is generated. These options allow you to set a standard across all of your SQL tools, ETL tools and design tools. All database table names and column names are always upper case or lower case no matter what users do in the tools.
|Supports boolean data types||Instructs PDI to use native boolean data types if supported by the database.|
|Quote all in database||Enables the databases to use a case-sensitive tablename (for example MySQL is case-sensitive on Linux but not case sensitive on Windows). If you quote the identifiers, the databases will use a case sensitive tablename.|
|Force all to lower case||Enables all identifiers to lower case.|
|Force all to upper case||Enables all identifiers to upper case.|
|Preferred schema name...||Enter the preferred schema name (for example, MYSCHEMA).|
|Enter SQL name...||Enter the SQL statement used to initialize a connection.|
Pentaho has implemented a database-specific quoting system that allows you to use any name or character acceptable to the supported databases' naming conventions.
Pentaho Data Integration contains a list of reserved words for most of the supported databases. To ensure that quoting behaves correctly, Pentaho has implemented a strict separation between the schema (user/owner) of a table and the table name itself. Doing otherwise makes it impossible to quote tables or fields with one or more periods in them correctly. Placing periods in table and field names is common practice in some ERP systems (for example, fields such as "V.A.T.")
To avoid quoting-related errors, a rule stops the Pentaho Data Integration from performing quoting activity when there is a start or end quote in the table name or schema. This allows you to specify the quoting mechanism yourself.
Define Connection Pooling
Instead of having a connection open for each individual step in a transformation, you can set up a connection pool and define options like the initial pool size, maximum pool size, and connection pool parameters. For example, you might start with a pool of ten or fifteen connections, and as you run jobs or transformations, the unused connections drop off. Pooling helps control database access, especially if you have transformations that contain many steps and that require a large number of connections. Pooling can also be implemented when your database licensing restricts the number of active concurrent connections.
This table shows descriptions of the pooling options.
|Enable connection pooling||Enables connection pooling|
|Pool Size||Sets the initial size of the connection pool; sets the maximum number of connections in the connection pool|
|Parameters||Allows you to define additional custom pool parameters; click Restore Defaults when appropriate|
|Description||Allows you to add a description for your parameters|
- Select Enable Connection Pooling.
- Type the initial pool size in the Initial: area and the maximum pool size in the Maximum: area.
- Select the parameters you need from within the Parameters: area. A Description of the parameter appears in the Description: area when you select a check box.
- Click OK to save your selections and close the Database Connection dialog box.
Connect to Clusters
This table contains information about other database-related connection tasks you can perform.
|Edit a Connection||Right-click on the connection name and select Edit.|
|Duplicate a Connection||Right-click on the connection name and select Duplicate.|
|Copy to a Clipboard||Allows you to copy the XML defining the step to the clipboard. You can then paste this step into another transformation. Double-click on the connection name in the tree or right-click on the connection name and select Copy to Clipboard.|
|Delete a Connection||Double-click on the connection name in the tree or right-click on the connection name and select Delete.|
|SQL Editor||To execute SQL command against an existing connection, right-click on the connection name and select SQL Editor.|
|Clear the Database Cache||To speed up connections Pentaho Data Integration uses a database cache. When the information in the cache no longer represents the layout of the database, right-click on the connection in the tree and select Clear DB Cache.... This command is commonly used when databases tables have been changed, created or deleted.|
|Share a Connection||Rather than redefining a connection each time you create a job or transformation on your local device, right-click and select Share to share the connection information among jobs and transformations.|
|Exploring the Database||Double-click on the connection name in the tree or right-click on the connection name and select Explore.|
|Show dependencies||Right-click a connection name and select Show dependencies to see all of the transformations and jobs that use this database connection.|