WITH (Common Table Expression)
Syntax
WITH
  with_name AS (select_statement)
  [, ...]
select_statement;
Description
The WITH clause defines named subqueries that can be used as common expressions in other SELECT statements. These common expressions are referred to as Common Table Expressions (CTEs), and each one presents a temporary view of the data projected by its select_statement. As a result, CTEs modularize queries, making them more maintainable and versatile than subqueries.
The result of a CTE is effectively a DeltaStream Object that can be used just like any other relation defined with a DDL or a subquery. By definition, a CTE takes precedence over other relations defined using a DDL.
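As a minimal sketch of the precedence rule above, assume that a relation named users has already been defined with a DDL, alongside the pageviews Stream used in the examples on this page. Both relations and their columns are assumptions for illustration. Because a CTE takes precedence over a DDL-defined relation, the users referenced in the main SELECT below should resolve to the CTE rather than to the DDL-defined relation:

```sql
-- Hypothetical setup: `users` and `pageviews` are assumed to be defined with DDLs.
WITH
  users AS (SELECT userid, pageid FROM pageviews)  -- CTE that reuses the name `users`
SELECT userid, pageid FROM users;                  -- expected to read from the CTE
```

Reusing an existing relation name this way can make a query harder to read, so a distinct CTE name is usually the safer choice.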
Arguments
with_name
A name for the select_statement that defines the CTE.
select_statement
See SELECT.
Examples
Using CTEs from other CTEs
CTEs are just like any other relation, and they can be used within the SELECT statement of other CTEs. In this example, c1 projects the viewtime of each pageid from pageviews, and c2 adds a processing time to that result. The main SELECT statement then projects proc_time, pageid, and viewtime:
WITH
  c1 AS (SELECT pageid, viewtime FROM pageviews),
  c2 AS (SELECT NOW() AS proc_time, * FROM c1)
SELECT * FROM c2;
Joining a Stream CTE with a Changelog CTE
Each CTE represents a local relation, meaning that its grouping, aggregation, or projection determines how it presents the underlying data to the query it is part of.
In the following example, c1 represents a Stream over pageviews that keeps only the records where Page_6 has been visited by a user, and c2 represents a Changelog over users that groups the changes by the userid column. When joining these two CTEs, the JOIN operation treats this as a Stream-Changelog join and doesn't require a WITHIN window for the join criteria:
WITH
  c1 AS (SELECT * FROM pageviews WHERE pageid = 'Page_6'),
  c2 AS (
    SELECT userid, count(interests) AS interest_count
    FROM users
    GROUP BY userid)
SELECT
  p.userid AS pvid,
  u.userid AS uid,
  p.pageid,
  u.interest_count AS interest_count
FROM
  c1 p
  JOIN
  c2 u
  ON u.userid = p.userid;
Self-joining CTEs
In this example, a single CTE is written to reshape the pageviews stream, and it is used twice in the JOIN operation to self-join the reshaped data. The joined data can be used as projected by the CTE's SELECT statement, that is, with the user ID as an integer:
WITH
  c1 AS (
    SELECT
      viewtime,
      CAST(SUBSTRING(pageid FROM 6) AS INTEGER) AS pid,
      CAST(SUBSTRING(userid FROM 6) AS INTEGER) AS uid
    FROM pageviews)
SELECT
  pl.uid AS lid,
  pr.uid AS rid,
  pl.pid,
  pr.viewtime AS viewtime
FROM
  c1 pl
  JOIN
  c1 pr
  WITHIN 1 MINUTE
  ON pr.uid = pl.uid
WHERE pl.uid != 5;
Creating a new Stream from a CTE with MATCH_RECOGNIZE
This example shows a real-world pattern-matching query over bus trip updates (adapted, using CTEs, from our Analyzing NYC Bus Data blog). A local bus trip updates relation is defined in the trip_updates CTE, which is then used in the MATCH_RECOGNIZE clause to find trips whose delay keeps increasing across three consecutive updates, along with the vehicle and the average timestamp of each match:
CREATE STREAM trips_delay_increasing
WITH
  trip_updates AS (
    SELECT
      trip,
      "stopTimeUpdate",
      vehicle,
      CAST(FROM_UNIXTIME("timestamp") AS TIMESTAMP) AS ts,
      "timestamp" AS epoch_secs,
      delay
    FROM
      nyc_bus_trip_updates)
AS SELECT
  trip,
  vehicle,
  CAST(
    FROM_UNIXTIME((start_epoch_secs + end_epoch_secs) / 2)
    AS TIMESTAMP
  ) AS avg_ts
FROM trip_updates
  MATCH_RECOGNIZE(
    PARTITION BY trip
    ORDER BY "ts"
    MEASURES
      C.row_timestamp AS row_timestamp,
      C.row_key AS row_key,
      C.row_metadata AS row_metadata,
      C.vehicle AS vehicle,
      A.epoch_secs AS start_epoch_secs,
      C.epoch_secs AS end_epoch_secs
    ONE ROW PER MATCH
    AFTER MATCH SKIP TO LAST C
    PATTERN (A B C)
    DEFINE
      A AS delay > 0,
      B AS delay > A.delay + 30,
      C AS delay > B.delay + 30
  ) AS MR WITH ('timestamp'='ts');