Skip to main content

Pentaho+ documentation is moving!

The new product documentation portal is here. Check it out now at docs.hitachivantara.com

 

Hitachi Vantara Lumada and Pentaho Documentation

Pentaho Data Service SQL support reference and other development considerations

Parent article

The Pentaho Data Service is designed to support a subset of SQL clauses and literals that are particularly useful for data blending, optimizations, and other scenarios. The following lists what we support. Limitations are noted near the end of this article.

Supported SQL literals

The Pentaho Data Service supports the following literals:

  • Strings have single quotation marks around them. The escape character for a single quote is another single quote, like this ''.
  • Dates have square brackets around them. The following formats are supported: [yyyy/MM/dd HH:mm:ss.SSS], [yyyy/MM/dd HH:mm:ss], and [yyyy/MM/dd].
  • For an IN list in a SQL statement, the date formats can have single quotation marks around them and dashes that replace slashes, like this: SELECT * FROM BUILDS WHERE BuildDate IN ('2015-03-18', '2015-03-22'). Note that you cannot surround a date format with a bracket date in an IN list.
  • Number and BigNumber should have no grouping symbols. Use a period to represent a decimal, like this: 123.45.
  • Integers should contain only digits.
  • Boolean values can be TRUE or FALSE.

Supported SQL clauses

The Pentaho Data Service supports the following clauses.

ClauseWhat is Supported
SELECT
  • COUNT(field)
  • COUNT(*)
  • COUNT(DISTINCT field)
  • DISTINCT <fields>
  • IIF (condition, true-value or field, false-value or field)
  • CASE WHEN condition THEN true-value ELSE false-value END
  • SUM
  • AVG
  • MIN
  • MAX
  • Aliases with both the AS keyword and with one or more spaces separated. For example: SUM(sales) AS "Total Sales" or SUM(sales) TotalSales
  • Constant expressions are possible. See the Supported SQL literals section for more details.
FROM
  • Only one Pentaho service name is permitted.
  • You can use aliases for the Pentaho service name.
  • You can omit the service name to query from an empty row or you can query from dual, for example SELECT 1 or SELECT 1 FROM dual are the same. (Dual is a special one row, one column table supported by some database vendors.)
WHERE
  • Nested brackets
  • AND, OR, NOT if followed by brackets. For example: NOT (A=5 OR C=3).
  • Precedence is considered.
  • Literals (String and Integer)
  • PARAMETER('parameter-name')='value'. Note that this always evaluates to TRUE in the condition.
  • =
  • <
  • >
  • <=, =<
  • >=, =>
  • <>
  • LIKE is supported. The standard % and ? wildcards are converted to .* and . regular expressions.
  • REGEX matches the regular expression.
  • IS NULL
  • IS NOT NULL
  • IN The syntax for multiple values is: value, value, value, ...
  • You can put a condition on the IIF expression or its alias if one is used. Use identical string literals for expressions.
  • DATE_TO_STR (date-field, <mask>). Masks are strings. For example: 'yyyy' and 'yyyy-MM-dd'. Note that the character '(apostrophe) should be escaped in masks by using two apostrophes instead of one.
GROUP BY
  • Group on fields are supported, not the IIF() function.
LIMIT
  • You can retrieve a specific amount of rows from a result set using the LIMIT keyword. For example, this query returns the first 10 rows (#1-10) from the result set: SELECT * FROM data_service LIMIT 10;
  • You can also specify an offset for returning as specific amount of rows using the LIMIT keyword or LIMIT/OFFSET keywords. This queries returns 10 rows from the result set, but the results are offset by five rows so you see results for rows #6 - 15: SELECT * FROM data_service LIMIT 5, 10;
  • You can also use the OFFSET keyword to retrieve the same results: SELECT * FROM data_service LIMIT 10 OFFSET 5;
HAVING
  • Conditions should be placed on the aggregate construct, not the alias.
  • Please use identical strings for the expressions including spaces. For example, if you use put spaces before and after the asterisk for COUNT( * ) in the SELECT clause, do the same for the HAVING clause.
  • Put HAVING conditions on aggregations that do not appear in the SELECT clause.
ORDER BY
  • You can order on any column even if it is not in the result.
  • You can order on IIF or CASE-WHEN expressions.

Other development considerations

There are a few considerations to keep in mind as you design your data service and transformation.

  • You cannot JOIN one data service virtual table to another.
  • Pentaho Data Services uses the Memory Group by step to group. This step keeps all the groups in memory to avoid sorts that can slow down the data service. But, if you plan to use many groups, watch your memory consumption on the server to make sure you don't exceed its limits.
  • You cannot specify the same field twice in the same SELECT clause.
  • Calculations and functions like string concatenation are not supported. But, you can do these things in the data service transformation.