APPLICATION
Syntax
Description
By using APPLICATION, you can create a set of DSQL statements (including DDL and Query statements) and run them as a unit of work with all-or-nothing effect:
In the case of DDL statements, either all of them succeed or none do, depending on whether there is a failure in processing statements. This means the metastore is updated only if all statements in the application succeed.
In the case of Query statements, all queries run in a single job. This helps achieve better efficiency and resource utilization at runtime.
Note The order of statements in an application matters.
Supported Statements
DeltaStream supports the below statement types in application:
USE (
USE database
,USE schema
,USE store
)CREATE STREAM AS SELECT (virtual and non-virtual)
CREATE CHANGELOG AS SELECT (virtual and non-virtual)
Benefits of Application
Applications help you achieve better efficiency and overall cost reduction in two ways:
What Happens During an Application Failure
If a failure occurs when processing a DDL or Query statement in an application (for example, due to a parse error), then the whole application fails. This means no entities are created and no query will start.
If a runtime failure occurs when the application job is running (for example, due to an authentication error to access a store), then all queries in the application fail. However, any relation(s) created at the beginning of application via DDL statements remain.
Virtual Relation
Here is the syntax to define a virtual relation:
Virtual relations are for defining intermediate computation results used in one or more subsequent queries in the same application. They help simplify the computation logic. They also perform some common computation among queries only once and use it multiple times, at no extra cost.
Examples
For the following examples, assume a stream named pageviews
has been created using the below CREATE STREAM statement:
Application with multiple statements
In the example below, two new relations are created:
Two queries start and both run in a single job. One query ingests data into pageviews_2
and the other into view_log
. Note that since both queries read data from the same source relation (such as pageviews
), the application job reads pageviews
records once from its entity and uses that for both queries. This improves the resource utilization. It also reduces the storage and network overhead for the store on which that pageviews
is defined.
Application with multiple related statements
The example application below has 5 statements. New streams are defined, some of which are used in other queries in the same application. For example, the INSERT INTO
statement reads data from pv_copy,
which you define using a CREATE STREAM AS SELECT statement and write into pageviews2
, which is in turn defined with another CSAS in the same application. (Note that the order of statements matter in application.) Therefore, statements defining pv_copy
and pageviews2
should appear before the INSERT INTO
statement that uses them, in the application body. Moreover, the users
stream is defined via a CREATE STREAM statement and is used in the JOIN query and last CSAS statement. Again, data records are read once from pageviews
and users
entities and are used in the queries that are referring to them in their FROM clause.
All the queries in the application run within a single job.
Application with virtual relations
In the example application below, we assume the pageviews
stream is already defined in DeltaStream and we have a topic named users
in our store. In this application, we first create a changelog on the users
topic to track changes in users' information. Given that we are interested in pageviews done by 3 specific users, we can create a virtual stream: virtual.public.v1
to filter records for those users from pageviews
and use them in subsequent queries.
We use a Tumble window function on virtual.public.v1
first to find pageviews done by these 3 users every 30 seconds. We write the results into a new topic visit_freq
. Next we create a new virtual stream virtual.public.v2
using a temporal join between virtual.public.v1
and users_log
changelog to extend each record with its user's information. We then use virtual.public.v2
records in two subsequent queries:
One query to find the
popular_pages
visited by female users among our users of interest.A second query to find pages visited by different users in a given duration and then enrich the results by latest information about the visiting users.
Note that although we have 5 CSAS statements in this application, only 3 new topics are created in the configured store:
visit_freq
popular_pages
cross_region_pages
Virtual relations' records do not persist in any topic and are only fed to subsequent queries that refer them.
Last updated