Trial Quick Start
If you have signed up for DeltaStream’s free trial, you can use this guide to build an end-to-end streaming application with the provided Kafka store in DeltaStream’s web application. To simplify getting started for trial users, we automatically create an organization, trial-organization, along with a default store containing synthetic data.
Using the available store and synthetic data, you will accomplish the following steps in this guide:
1. Inspect the data in the streaming store.
2. Create your first database.
3. Create streams and changelogs for your Kafka topics.
4. Create new streams, changelogs, and materialized views using DeltaStream’s continuous queries.
Every trial user has a predefined store, trial_store, available as soon as they sign in. The predefined store is an AWS MSK cluster with several topics, and synthetic data producers continuously publish messages into these topics. Once you sign in to the trial account, you should see the trial store on the Stores page; clicking on trial_store shows the topics it contains.
You can inspect the data in each topic by clicking on it. This takes you to a page where you can click the play button in the lower right corner to see the live stream of data flowing into the topic.
Once you have confirmed store connectivity and inspected the contents of the topics, you can start declaring databases and relations and writing queries on the streaming data.
DeltaStream provides a relational model on top of your streaming data. Similar to other relational systems, DeltaStream uses databases and schemas for namespacing and organizing your data. To create a new database, go to the Databases page from the main menu and click the + button. Enter the database name, and click SAVE to create your first database.
For this guide, we named our database TestDB. Note that you can create as many databases as you need. Each new database also comes with a schema named public, and you can add more schemas if you wish.
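If you prefer the SQL page over the Databases page, the same step can be done with a DDL statement; a minimal sketch, assuming DeltaStream accepts a CREATE DATABASE statement in the SQL page:

```sql
-- Create the database used throughout this guide.
CREATE DATABASE TestDB;
```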
We will now create relations on top of our Kafka topics using DeltaStream’s DDL statements. If you want to treat the streaming data in a topic as an append-only stream, where each event is independent, you define it as a stream. In this guide, we declare a stream on the pageviews topic, since each pageview event is an independent event. To do so, go to the SQL page by choosing SQL from the main menu and write the DDL statement there. Here is the statement to create the pageviews stream:
```sql
CREATE STREAM pageviews (
  viewtime BIGINT,
  userid VARCHAR,
  pageid VARCHAR
) WITH ('topic' = 'pageviews', 'value.format' = 'JSON');
```
Note that the above stream will be created in the currently used database and schema, which are TestDB and public, respectively. Also, since no store is specified in the WITH clause, DeltaStream uses the default store declared above as the store where the pageviews topic resides.
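If your session’s current scope is something else, you can point it at the right database and schema before running DDL; a minimal sketch, assuming DeltaStream’s USE statements:

```sql
-- Make TestDB.public the current namespace for subsequent statements.
USE DATABASE TestDB;
USE SCHEMA public;
```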
We then declare a changelog for the users topic. A changelog indicates that we want to interpret the events in a topic as UPSERT events; the events must therefore have a primary key, and each event is interpreted as an insert or update for its primary key. Use the following statement in the SQL page to declare the users_log changelog:
```sql
CREATE CHANGELOG users_log (
  registertime BIGINT,
  userid VARCHAR,
  contactinfo STRUCT<phone VARCHAR, city VARCHAR, "state" VARCHAR, zipcode VARCHAR>,
  PRIMARY KEY (userid)
) WITH ('topic' = 'users', 'value.format' = 'JSON');
```
Once you declare the pageviews stream and the users_log changelog, you should be able to see them in the public schema of the TestDB database by navigating to the Databases page from the main menu.
Once you have declared streams and changelogs, you can write continuous queries in SQL to process the streaming data in real time. Let’s start with interactive queries, whose results are streamed back to the user. Such queries can be used to inspect your streams and changelogs, or to build queries iteratively by inspecting their results. As an example, let’s inspect the pageviews stream using the following interactive query:
```sql
SELECT * FROM pageviews;
```
Once you run this query, DeltaStream compiles it into a streaming job, runs it, and streams the results into the web app. You should see something like this on your screen:
While the results of interactive queries are streamed to the user, DeltaStream also provides persistent queries: continuous queries whose results are stored back in a store or materialized view. As a first step, let’s write a persistent query that joins the pageviews stream with the users_log changelog to create an enriched pageviews stream that includes user details for each pageview event. We will also convert the epoch times to timestamps with a time zone using the TO_TIMESTAMP_LTZ function:
```sql
CREATE STREAM enriched_pv AS
SELECT
  TO_TIMESTAMP_LTZ(viewtime, 3) AS viewtime,
  p.userid AS userid,
  pageid,
  TO_TIMESTAMP_LTZ(registertime, 3) AS registertime,
  contactinfo
FROM pageviews p
JOIN users_log u ON u.userid = p.userid;
```
Under the covers, the persistent query above creates a new topic in the trial store. When a new topic is created in the trial store, a prefix is added to its name. The prefix is derived from your trial email address plus some unique random characters. For example, the email [email protected] results in a topic prefix like t_testgmailcom_4evmsyg_, so creating the topic enriched_pv actually creates the topic t_testgmailcom_4evmsyg_enriched_pv, which can be viewed in the trial store’s topic list.
Note: This prefixing applies only to the trial store we have set up; prefixes are not added when you use any other store, such as your own Apache Kafka cluster or AWS Kinesis.
Once you write the above query in the SQL page and click RUN, you should see a confirmation of query success like the following:
Under the hood, DeltaStream compiles and launches the query as an Apache Flink streaming job. You should be able to see the query, along with its status, on the Query Management page by selecting Queries in the main menu. Once the query is running successfully, you will have a new Kafka topic named enriched_pv in your Kafka cluster, and a new stream will also be added to your database, TestDB. You can examine the contents of the new stream by running the following query from the SQL page in the web app:
```sql
SELECT * FROM enriched_pv;
```
The following image shows the result of running the above interactive continuous query. Note that the result of the interactive query is streamed to the client as shown below:
Now that we have the enriched pageviews stream, let’s build a materialized view that computes the number of pageviews per user. Type the following statement in the SQL page to create this materialized view:
```sql
CREATE MATERIALIZED VIEW user_view_count AS
SELECT
  userid,
  COUNT(*) AS view_count
FROM enriched_pv
GROUP BY userid;
```
Once you run the above query, DeltaStream launches a streaming job that runs the SELECT statement and materializes its result. The resulting materialized view can then be queried the same way you would query a materialized view in a traditional relational database; in DeltaStream, however, the data in the materialized view is kept continuously fresh by the streaming job. The following is a simple query to get the current view count for the user with the userid of User_2:
```sql
SELECT * FROM user_view_count WHERE userid = 'User_2';
```
The result of the above query will be one row as shown below:
As you can see, the number of pageviews for User_2 is 10 at the time the query runs. Now run the query again. You should see an updated pageview count for the user, which demonstrates that every time you query a materialized view you get the most up-to-date result: DeltaStream keeps the data in the view continuously updated using the continuous query that declared it. Here is an image that shows the same query run on the materialized view a few seconds later.
As you can see, the result has been updated from the previous value of 10.
You can see all of the relations, such as streams, changelogs, and materialized views, along with queries and their relationships to each other, in the Stream 360 section of the DeltaStream web app. As shown in the image below, we have a query that joins pageviews and users_log and creates a new stream called enriched_pv, and another query that creates a materialized view named user_view_count.
Now that we have seen a basic use case, let’s clean up our environment. To do so, we first terminate the queries and then drop the created streams, changelogs, and materialized views. To terminate your queries, go to the Queries page. Then go to the corresponding database and schema to drop the streams, changelogs, and materialized views. Note that if a query uses a stream, changelog, or materialized view, you must terminate that query before dropping the relation.
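The cleanup steps above can also be sketched as SQL from the SQL page. The TERMINATE QUERY form and the placeholder query ID below are assumptions, so use the actual IDs shown on your Queries page:

```sql
-- Stop the streaming jobs first (IDs come from the Queries page).
TERMINATE QUERY <query_id>;

-- Then drop relations, dependents before their sources.
DROP MATERIALIZED VIEW user_view_count;
DROP STREAM enriched_pv;
DROP STREAM pageviews;
DROP CHANGELOG users_log;
```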