Some transformation steps and job entries use virtual file system (VFS) dialog boxes in place of traditional local file system windows. With VFS file dialog boxes, you can specify a VFS URL instead of a typical local path. Each URL begins with a scheme that identifies the protocol used to access the file. See http://commons.apache.org/vfs/apidocs/index.html for VFS scheme documentation. Your files can be local or remote, and they can reside inside compressed archives such as .tar or .zip files.
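As a minimal sketch of how the scheme portion of a VFS URL names the access protocol, the following Python snippet splits a few illustrative URLs (the hosts, buckets, and paths are placeholders, and PDI itself resolves these URLs through Apache Commons VFS rather than Python):

```python
from urllib.parse import urlsplit

# Placeholder URLs: the scheme before "://" identifies the protocol.
urls = [
    "file:///tmp/input.csv",           # local file system
    "hdfs://namenode:8020/input.csv",  # Hadoop distributed file system
    "s3://my-bucket/input.csv",        # Amazon S3
]
schemes = [urlsplit(u).scheme for u in urls]
print(schemes)  # ['file', 'hdfs', 's3']
```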
Perform the following steps to access your files with the VFS browser:
- Select File > Open URL in the PDI client to open the VFS browser as shown in the following figure:
- Choose the file system type of your Location. The following file systems are supported:
- Local – Opens files on your local machine. Use the folders in the Name panel of the Open File dialog box to select a resource.
- Hadoop Cluster – Opens files on any Hadoop cluster except S3. Click the Hadoop Cluster drop-down box to select your desired cluster, then the resource you want to access.
- S3 – Opens files on Amazon Simple Storage Service (S3) resources on Amazon Web Services. For instructions on setting up AWS credentials, see Working with AWS Credentials.
- HDFS – Opens files on any Hadoop distributed file system except MapR. Select your desired cluster for the Hadoop Cluster option, then select the resource you want to access.
- MapRFS – Opens files on the MapR file system. Use the folders in the Name panel of the Open File dialog box to select a MapR resource.
- Google Cloud Storage – Opens files on the Google Cloud Storage file system.
- Google Drive – Opens files on Google Drive. You must configure PDI the first time you access Google Drive. See Access to a Google Drive for more information.
The following addresses are VFS URL examples:
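These illustrative URLs follow standard Apache Commons VFS scheme syntax; the hosts, buckets, and paths are placeholders:
- file:///home/user/input.csv – a local file
- hdfs://namenode:8020/user/pentaho/input.csv – a file on a Hadoop distributed file system
- s3://my-bucket/sales/input.csv – a file on Amazon S3
- zip:file:///home/user/archive.zip!/input.csv – a file inside a .zip archive
- tar:gz:file:///home/user/files.tar.gz!/files.tar!/input.csv – a file inside a nested compressed archive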
Access to a Google Drive
Perform the following setup steps to access your Google Drive for the first time:
- Turn on the Google Drive API, which results in a credentials.json file. See https://developers.google.com/drive/api/v3/quickstart/java for details.
- Rename your credentials.json file to client_secret.json, copy it into the data-integration/plugins/pentaho-googledrive-vfs/credentials directory, and restart PDI. The Google Drive option does not appear as a Location until you copy the client_secret.json file into the credentials directory and restart PDI.
- Select Google Drive as your Location. You are prompted to log in to your Google account.
- Once you have logged in, the Google Drive permission screen displays.
- Click Allow to access your Google Drive Resources.
Pentaho then stores a security token called StoredCredential under the data-integration/plugins/pentaho-googledrive-vfs/credentials directory. With this token, you can access your Google Drive resources whenever you are logged into your Google account. If this security token is ever deleted, you will be prompted again to log into your Google account after restarting PDI. If you ever change your Google account permissions, you must delete the token and repeat the above steps to generate a new token.
If you want to access your Google Drive via a transformation running directly on your Pentaho server, copy the StoredCredential and client_secret.json files into the pentaho-server/pentaho-solutions/system/kettle/plugins/pentaho-googledrive-vfs/credentials directory on your Pentaho Server.
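The server-side copy described above can be sketched as the following shell commands. This is a hedged illustration only: it uses a temporary scratch directory and empty stand-in files in place of a real Pentaho Server installation and real credential files, so you would substitute your actual install root and credentials.

```shell
# SERVER_HOME stands in for your real Pentaho Server install root.
SERVER_HOME=$(mktemp -d)
CRED_DIR="$SERVER_HOME/pentaho-server/pentaho-solutions/system/kettle/plugins/pentaho-googledrive-vfs/credentials"
mkdir -p "$CRED_DIR"

# Stand-ins for the real StoredCredential and client_secret.json files.
touch StoredCredential client_secret.json

# Copy both credential files into the server plugin's credentials directory.
cp StoredCredential client_secret.json "$CRED_DIR/"
ls "$CRED_DIR"
```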
Add and Delete Folders or Files
You can also use the VFS browser to add or delete files and folders on your file system. A default filter is applied so that only Kettle transformation and job files are initially displayed. To view other files, click the Filter drop-down and select the type of file you want to view. To delete a file or folder, select it, then click the X in the upper-right corner of the VFS browser. To create a new folder, click the + in the upper-right corner of the VFS browser, enter the new folder name, and click OK.
Supported Steps and Entries
Supported transformation steps and job entries open the VFS browser instead of the traditional file open dialog box. With the VFS browser, you specify a VFS URL instead of a file path to access those resources.
The following steps and entries support the VFS browser:
- File Exists
- Mapping (sub-transformation)
- ETL Metadata Injection
- Hadoop Copy Files
- Hadoop File Input
- Hadoop File Output
- Avro Input
- Avro Output
- ORC Input
- ORC Output
- Parquet Input
- Parquet Output
VFS dialog boxes are configured through transformation parameters that act as VFS options. Refer to Configure SFTP VFS for more information on configuring options for SFTP.
Configure VFS Options
The VFS browser can be configured to set variables as parameters for use at runtime. A VFS Configuration Sample.ktr sample transformation containing some examples of the parameters you can set is located in the data-integration/samples/transformations directory. For more information on setting variables, see VFS Properties. For an example of configuring an SFTP VFS connection, see Configure SFTP VFS.
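As an assumed illustration only, SFTP-related VFS parameters generally follow a vfs.<scheme>.<option>.<host> naming pattern; the parameter names, host, and values below are placeholders, so verify the exact names against the VFS Configuration Sample.ktr transformation and Configure SFTP VFS before using them:
- vfs.sftp.identity.myserver.example.com = /home/pentaho/.ssh/id_rsa
- vfs.sftp.AuthKeyPassphrase.myserver.example.com = myPassphrase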