WITH (Common Table Expression)
Last updated
Last updated
The WITH
clause is used for defining named subqueries that can be used as a common expression in other SELECT statements. These common expressions are referred to as Common Table Expressions (CTEs), and present a temporary view of the data that is projected from it's select_statement
. As a result, CTEs modularize queries, making them more maintenable and versatile than subqueries.
The result of a CTE is effectively a Relation that can be used just like any other Relation defined with a DDL or subquery, and by definition takes precedence over other Relations defined using a DDL.
CTEs may not be used within a .
A name for the select_statement
that defines the CTE.
See SELECT.
CTEs are just like any other Relations, and they can be used within the SELECT
statement of other CTEs. In this example, c1
projects the viewtime
of each pageid
from pageviews
, and c2
adds a processing time to the result of that before eventually projecting proc_time
, pageid
and viewtime
in the main SELECT
statement:
Each CTE represts a local Relation meaning that their grouping/aggregation or project reflects how they present the underlying data to the query that they're part of.
In the following example, c1
represents a Stream over pageviews
where Page_6
has been visited by a user, and c2
represents a Changelog over users
where its grouping the changes by the userid
column. When joining these two CTEs, the JOIN
operation treats this as a Stream-Changelog join, and doesn't require a WITHIN
window for the join criteria:
In this example, a single CTE is written to reshape the pageviews
Stream, but used twice in the JOIN
operation to self-join for the resulting exanded data. The result of the joined data can be used as projected by the CTE's SELECT
statement, i.e. user ID as an Integer:
This example shows a real-world query pattern matching over bus trip updates (redefined with CTEs from our Analyzing NYC Bus Data blog). A local bus trip updates Relation is defined in the trip_updates
CTE, which is then used in the MATCH_RECOGNIZE
update to find each vehicles average time at each stop: