Creating and Querying Materialized Views
A Materialized View is a Relation that captures the result of a query and is continuously updated as new records are ingested in the query’s source Relation(s).
In this tutorial, we show how Materialized Views are created and how queries can be run on them.
We can use the CREATE MATERIALIZED VIEW AS statement to create a new Materialized View. Currently, the SELECT Query in the CREATE MATERIALIZED VIEW AS statement can only generate columns with Primitive Data Types.
In our first example, assume we create a Stream called
pageviews
on the data stored in the ds_pageviews
topic in our default store. We can use this DDL statement to create this Stream: CREATE STREAM pageviews (
viewtime BIGINT,
userid VARCHAR,
pageid VARCHAR
) WITH ('topic'='ds_pageviews', 'value.format'='JSON');
Now, imagine we are interested in keeping track of how many pages each user is visiting every 30 seconds. This can be done by creating a Materialized View using a query with a window-function. We call this Materialized View
visits_count
, and DeltaStream takes care of keeping its content up-to-date in a real-time manner.CREATE MATERIALIZED VIEW
visits_count
AS SELECT
window_start,
window_end,
userid,
COUNT(pageid) AS pgcnt
FROM TUMBLE(pageviews, size 30 SECONDS)
GROUP BY window_start, window_end, userid;
You can go to
Databases
> Relations
tab and check the details on the above two Relations:
The Databases tab, as located in the lefthand menu.
Now that the
visits_count
Materialized View is created, we can run a top-k query against it. This query retrieves the top-3 users who have visited the most number of pages so far. It uses GROUP BY
and the aggregate function SUM
to calculate the total number of pages each user has visited across all-time windows. Results are ordered based on the total page counts and the top-3 records that are returned: SELECT userid, SUM(pgcnt) AS total
FROM visits_count
GROUP BY userid
ORDER BY total DESC
LIMIT 3;
Here is a sample result for the query:

Result of the query.
In the second example, we use a Materialized View to run different queries on
JOIN
results. Assume a new Stream, called users
, is created using the DDL below. This Stream captures some information about each user, such as the time a given user has registered along with the user’s interests. CREATE STREAM users_info (
registertime BIGINT,
userid VARCHAR,
regionid VARCHAR,
gender VARCHAR,
interests ARRAY<VARCHAR>,
contactinfo STRUCT<phone VARCHAR, city VARCHAR, "state" VARCHAR, zipcode VARCHAR>
) WITH ('topic'='ds_users', 'value.format'='json');
Imagine we are interested in enriching the information we already captured in the
pageviews
stream by combining it with each user’s information from users
. For this purpose, we can run an interval-join-stream-stream query and store the result in a Materialized View called visits_info
. CREATE MATERIALIZED VIEW
visits_info
AS SELECT p.userid AS uid,
p.pageid,
u.registertime,
u.interests[1] AS top_interest
FROM pageviews p JOIN users u
WITHIN 1 minute
ON u.userid = p.userid;
Now that the
visits_info
Materialized View is created, its content keeps getting updated in real-time as new records are ingested in either the pageviews
or the users
. We can retrieve the latest records in the Materialized View by running queries directly on it.For example, the below query finds info on pages visited by users whose top interest is gaming.
SELECT *
FROM visits_info
WHERE top_interest = 'Game';
The output would look like this:

Output of a sample query.
As another example, we can find top interests of all the users who have visited a specific page (such as
Page_15
) so far and sort them according to the total users count per interest.SELECT top_interest, count(uid) AS cnt
FROM visits_info
WHERE pageid = 'Page_15'
GROUP BY top_interest
ORDER BY cnt DESC;
The query results would look like the below sample:

Sample query results.