Starting with the Web App
This guide steps you through the process of building an end-to-end streaming application with DeltaStream’s web application. By the end of this guide, you will have hands-on experience with foundational concepts in DeltaStream and be able to build similar applications.
The examples in this guide use topics in Apache Kafka. But the steps should be the same if you have your data in other streaming stores such as Amazon Kinesis or Redpanda.
Assumptions:
You already have a DeltaStream account and are signed in.
You already have created a DeltaStream organization or have joined an existing organization.
You will:
Connect to your streaming store (in this case, Apache Kafka) by creating a store in DeltaStream.
Create your first database.
Create streams and changelogs for your Kafka topics.
Create new streams, changelogs, and materialized views using DeltaStream’s continuous queries.
(Optional) Share your streaming data with other members of your organization.
In DeltaStream, a store is a streaming service, such as Apache Kafka or Amazon Kinesis, where your streaming data resides.
Before you write any queries, you must configure DeltaStream to connect to the store where your data resides. This is the first step to take before you process any data using DeltaStream SQL statements.
Note In addition to self-hosted services, for Kafka store types you can also configure Confluent Cloud and Amazon MSK in DeltaStream.
To create and configure a new store:
In the lefthand navigation, click Resources to display the Resources page.
Click + Add Store, and from the menu that displays click the store type you want (in this case, Kafka). Note Be sure to match the store type with the streaming store where your streaming data resides.
Choose a unique Name for the store. Use only alphanumeric characters, dashes, and underscores, and limit the name to a maximum of 255 characters. As this guide uses an Amazon MSK cluster, name the store mskconsumer.
Select the Amazon Availability Zone. Availability zone information tells DeltaStream where to run a query that uses this store. Tip In practice, it's best to select the same region as the one where your data is stored; this helps minimize data transfer costs.
Add at least one URI port to connect to, then click +. Separate multiple entries with a comma.
Select a Schema Registry.
Optionally, complete the authentication options as appropriate for the store. See Creating Stores for Streaming Data for more information.
Click Add.
Your new store displays in the list of existing stores on the Resources page.
To ensure your new store is set up correctly, click on it to expand it and display the Topics section. From here you can view the list of store entities.
In the list of topics, find the topic you want and click on it. Then click Print to see the messages that are coming to this topic in real time, as in the example below.
DeltaStream provides a relational model on top of your streaming data. Similar to other relational systems, DeltaStream uses databases and schemas for namespacing and organizing your data.
To create a new database:
In the lefthand navigation click Catalog.
At the top of the Catalog pane click + and then click Database.
At the prompt enter a unique name for the database and then click Add. In this guide the database is labeled TestDB.
Note You can create as many databases as you wish. Any database you create includes a schema labeled public, but you can add more schemas if you wish.
Now use DeltaStream’s DDL statements to create relations on top of your Kafka topics.
To work with your streaming data in an entity as an append-only stream, in which each event is independent, you define it as a Stream. In this guide you declare a stream on the ds_pageviews topic, since each pageview is an independent event. To do this, in the lefthand navigation click Workspace, and in the SQL pane write the DDL statement.
Here is the statement to create a pageviews stream:
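A minimal sketch of such a statement, assuming the ds_pageviews topic carries JSON events with viewtime, userid, and pageid fields (the column names and value format are assumptions; adjust them to match your topic's schema):

```sql
-- Declare an append-only stream over the ds_pageviews topic.
-- Column names and the JSON value format are assumptions; match them
-- to the actual schema of your topic.
CREATE STREAM pageviews (
  viewtime BIGINT,
  userid VARCHAR,
  pageid VARCHAR
) WITH (
  'topic' = 'ds_pageviews',
  'value.format' = 'JSON'
);
```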
This stream is created in the currently-used database and schema -- in this case, TestDB and public, respectively. Also, as there is no store specified in the WITH clause, DeltaStream uses the default store that you declared above as the store that contains the pageviews topic.
Next, declare a changelog for the ds_users topic. A changelog indicates you want to interpret events in an entity as UPSERT events. In this case the events should have a primary key; each event is interpreted as an insert or update for the given primary key. Use the following statement in the Workspace SQL pane to declare the users changelog:
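A sketch of this statement, assuming ds_users carries JSON events keyed by a userid field (the column list and value format are assumptions):

```sql
-- Declare a changelog over the ds_users topic, keyed by userid.
-- Each event is interpreted as an insert or update for its primary key.
-- Column names and the JSON value format are assumptions.
CREATE CHANGELOG users (
  registertime BIGINT,
  userid VARCHAR,
  regionid VARCHAR,
  gender VARCHAR,
  PRIMARY KEY (userid)
) WITH (
  'topic' = 'ds_users',
  'value.format' = 'JSON'
);
```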
When you declare the pageviews stream and users changelog, they display in the public schema of the TestDB database. To view them, click Catalog in the lefthand navigation and navigate to Catalog > TestDB > public.
When you have declared streams and changelogs, you can write continuous queries in SQL to process this streaming data in real time. To do this, start with interactive queries, in which the query results are streamed back to you. You can use these types of queries to inspect your streams and changelogs or build queries iteratively by inspecting the query result. Here's an example: inspect the pageviews stream using the following interactive query:
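A minimal example of such a query:

```sql
-- Stream the contents of the pageviews stream back to the client.
SELECT * FROM pageviews;
```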
When you run this query, DeltaStream compiles it into a streaming job, then runs the query and streams the result into the Web app. The results resemble this:
DeltaStream also provides persistent queries, in which the query results are stored back in a store or materialized view instead of streaming back to you. Try it out. Write a persistent query that joins the pageviews stream with the users changelog to create an enriched pageviews stream that includes user details for each pageview event. While you're at it, use the TO_TIMESTAMP_LTZ function to convert the epoch time to a timestamp with a time zone:
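A sketch of such a persistent query, reusing the column names assumed in the DDL sketches above (the selected column list is illustrative):

```sql
-- Persistent query: creates a new stream (and a backing Kafka topic)
-- named csas_enriched_pv by joining pageviews with the users changelog.
-- TO_TIMESTAMP_LTZ converts the epoch value in viewtime to a timestamp
-- with a local time zone; the precision argument of 3 (milliseconds)
-- is an assumption about the source data.
CREATE STREAM csas_enriched_pv AS
SELECT
  TO_TIMESTAMP_LTZ(p.viewtime, 3) AS viewtime,
  p.userid,
  p.pageid,
  u.gender,
  u.regionid
FROM pageviews p
JOIN users u ON u.userid = p.userid;
```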
Click RUN. In the background DeltaStream compiles and launches your query as an Apache Flink streaming job, and displays a confirmation similar to the one below when the query completes.
To view the query, along with its status, in the lefthand navigation click Queries. After the query runs successfully there's a new Kafka topic named csas_enriched_pv in your Kafka cluster, and a new stream added to the streams in your database, TestDB.
To examine the content of the new stream, run the following query from the SQL page in the Web app:
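For example:

```sql
-- Interactively inspect the enriched stream created by the persistent query.
SELECT * FROM csas_enriched_pv;
```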
The result of running the above interactive query streams back to the client, as shown below:
With the pageviews stream enriched, you can build a materialized view to compute the number of pageviews per user. To do this, enter the following statement in the SQL pane of your workspace to create the materialized view:
Note Materialized views are not available to trial users.
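A sketch of such a statement, using an illustrative view name (user_view_count) and count column (pageview_count); both names are assumptions:

```sql
-- Materialized view that continuously maintains a pageview count per user.
-- The view and column names are illustrative.
CREATE MATERIALIZED VIEW user_view_count AS
SELECT
  userid,
  COUNT(*) AS pageview_count
FROM csas_enriched_pv
GROUP BY userid;
```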
When you run this query, DeltaStream launches a streaming job that runs the SELECT statement and materializes the result of the query. You can query the resulting materialized view the same way you would query a materialized view in traditional relational databases. But the difference here is that DeltaStream leverages the streaming job to always keep the data in the materialized view fresh.
The following is a simple query to get the current view count for a user with the userid of User_2.
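Assuming the view and column names sketched above:

```sql
-- Query the materialized view for the current count for User_2.
SELECT * FROM user_view_count WHERE userid = 'User_2';
```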
The result of this query displays in one row, as shown below:
Note that at the time of running the above query, the number of pageviews for User_2 is 3. Now run the query again. This time you should see an updated result for the pageview count for the user. This indicates that every time you run a query on a materialized view, you receive the most up-to-date result. DeltaStream ensures the data in the view is continuously updated, using the continuous query that declared the materialized view.
Here is an image of the same query run on the materialized view just a few seconds later:
The result is updated again -- in this case, to 11 from the previous value of 3.
It's always good to practice good hygiene when you're done! To clean up your DeltaStream environment:
Terminate the queries.
Navigate to the corresponding database and schema and drop the created streams, changelogs, and materialized views.
Tip If there's a query that uses a stream, changelog, or materialized view, be sure to terminate the query before you drop the relation.