Starting with the Web App

How to get started with streaming analytics using the DeltaStream user interface

This guide steps you through the process of building an end-to-end streaming application with DeltaStream’s Web application. By the end of this guide, you will have hands-on experience with foundational concepts in DeltaStream and be able to build your own similar applications.

Here's what you'll do:

Connect to your streaming store (in this case, Apache Kafka) by creating a store in DeltaStream.
Create your first database.
Create streams and changelogs for your Kafka topics.
Create new streams, changelogs, and materialized views using DeltaStream’s continuous queries.
(Optional) Share your streaming data with other members of your organization.

Before you begin:

Create a DeltaStream account and sign in to the Web app.
Created a DeltaStream organization or join an existing organization.

Note The examples in this guide use topics in Apache Kafka. But the steps should be the same if you have your data in other streaming stores such as Amazon Kinesis or Redpanda.

1. Create a Data Store

In DeltaStream, a data store is a streaming service, such as Apache Kafka or Amazon Kinesis, where your streaming data resides.

Before you write any queries, you must configure DeltaStream to connect to the data store where data resides. This is the first step to take before you process any data using DeltaStream SQL statements.

To create and configure a new data store:

In the lefthand navigation, click Resources ( ) to display the Resources page.
Click + Add Data Store, and from the menu that displays click the store type you want (in this case, Kafka).

Notes Be sure to match the data store type with the streaming data store where streaming data resides. Also, in addition to self-hosted services, for Kafka data store types you can also configure Confluent Cloud and Amazon MSK in DeltaStream.

Choose a unique Name for the data store. Use only alphanumeric characters, dashes, and underscores, and limit the name to a maximum of 255 characters. To follow along here, name the store mskconsumer. This guide uses an Amazon MSK cluster.

Tip In practice, it's best to select the same region as the one where your data is stored. That helps minimize data transfer costs.

Add at least one URI port to which to connect, then click +. Separate multiple entries with “,”
Select a Schema Registry.
Optionally, complete the authentication options as appropriate for the store. For details, see Create Data Stores for Streaming Data #Authentication Parameters.
Click Add.

Your new data store displays in the list of existing stores on the Resources page

To ensure your new data store is set up correctly, click on it to expand it and display the Topics section. From here you can view the list of data store entities.

In the list of topics, find the topic you want and click on it. Then click Print to see the messages coming in to this topic in real time, as in the example below.

2. Create a Database

DeltaStream provides a relational model on top of your streaming data. Similar to other relational systems, DeltaStream uses databases and namespaces for organizing your data.

To create a new Database

In the lefthand navigation click Databases ( ). The Databases pane displays.
At the top of the Database pane click + and then click Database.
At the prompt enter a unique name for the database and then click Add.

In this guide the database is labeled DemoDB.

Note You can create as many databases as you wish. Any database you create includes a namespace labeled public. But you can add more namespaces if you wish.

3. Create Streams and Changelogs

Now use DeltaStream’s DDL statements to create relations on top of your Kafka topics .

To work with your streaming data in an entity as an append-only stream, in which each event is an independent event in your stream, you define it as a Stream. In this guide you declare a stream on the pageviews topic, since each pageview is an independent event. To define a stream, in the lefthand navigation click Workspace ( ), and in the SQL pane write the DDL statement.

Here is the statement to create a pageviews stream; you can copy it and paste it into your work:

CREATE STREAM pageviews (
    viewtime BIGINT, 
    userid VARCHAR, 
    pageid VARCHAR
)WITH (
    'topic'='pageviews', 
    'value.format'='JSON'
);

This stream is created in the currently-used database and namespace -- in this case, DemoDB and public, respectively. Also, there is no store specified in the WITH clause; instead, DeltaStream uses the default store you declared above as the store that contains the pageviews topic.

Next, declare a changelog for the users topic. A changelog indicates you want to interpret events in an entity as UPSERT events. In this case the events must have a primary key; each event is interpreted as an insert or update for the given primary key.

Use the following statement in the Workspace SQL pane to declare the users changelog:

CREATE CHANGELOG users_log(
    registertime BIGINT, 
    userid VARCHAR, 
    regionid VARCHAR, 
    gender VARCHAR, 
    interests ARRAY<VARCHAR>, 
    contactinfo STRUCT<phone VARCHAR, city VARCHAR, "state" VARCHAR, zipcode VARCHAR>, 
    PRIMARY KEY(userid)
)WITH (
    'topic'='users', 
    'key.format'='json', 
    'key.type'='STRUCT<userid VARCHAR>', 
    'value.format'='json'
);

When you declare the pageviews stream and users changelog, the events display in the public namespace of the DemoDB database. To view them, click Databases ( ) in the lefthand navigation, then go to DemoDB > Public.

4. Run Queries

After you declare streams and changelogs, you can write continuous queries in SQL to process this streaming data in real time.

You can run 2 types of queries:

Interactive queries
Streaming or continuous queries

In interactive queries, the query results stream back to the you. Use these types of queries to inspect your streams and changelogs or build queries iteratively by inspecting the query result.

In a Query, the query results are stored back in a store or materialized view instead of streaming back to you.

Let's start with an interactive query. Here's an example: inspect the pageviews stream using the following interactive query:

SELECT * FROM pageviews;

When you run this query, DeltaStream compiles it into a streaming job, then runs the query and streams the result into the Web app. The results resemble this:

Now try a persistent query that joins the pageviews stream with the users changelog to create an enriched pageviews stream that includes user details for each pageview event. While you're at it, use the TO_TIMESTAMP_LTZ function to convert the epoch time to a timestamp with a time zone:

CREATE STREAM csas_enriched_pv AS 
SELECT 
    TO_TIMESTAMP_LTZ(viewtime, 3) AS viewtime,  
    p.userid AS userid, 
    pageid, 
    TO_TIMESTAMP_LTZ(registertime, 3) AS registertime, 
    regionid, 
    gender, 
    interests, 
    contactinfo
FROM pageviews p
    JOIN users_log u ON u.userid = p.userid;

Click RUN. In the background DeltaStream compiles and launches your query as an Apache Flink streaming job, and displays a confirmation similar to the below when the query completes.

To view the query, along with its status, in the lefthand navigation click Queries ( ). After the query successfully runs there's a new Kafka topic named csas_enriched_pv in your Kafka cluster, and a new stream added to the streams in your database, TestDB.

To examine the content of the new stream, run the following interactive continuous query from the SQL page in the Web app:

SELECT * FROM csas_enriched_pv;

The following image shows the result of running the above query. The result streams to the client as shown below:

With the pageviews stream enriched, you can build a materialized view to compute the number of pageviews per user. To do this, enter the following statement in the SQL pane of your workspace to stream this materialized view:

CREATE MATERIALIZED VIEW user_view_count AS 
SELECT
    userid, 
    COUNT(*) AS view_count 
FROM csas_enriched_pv 
GROUP BY userid;

Note Materialized views are not available to free trial users.

When you run this query, DeltaStream launches a streaming job that runs the SELECT statement and materializes the query result. You can query this materialized view the same way you would query a materialized view in a traditional relational database. But the difference here is that DeltaStream leverages the streaming job to always keep the data in the materialized view fresh.

The following is a simple query to get the current view count for a user with the userid of User_2.

SELECT * FROM user_view_count WHERE userid = 'User_2';

The result of this query displays in one row, as shown below:

Note that at the time of running the above query the number of pageviews for User_2 is 3. Now run the query again. This time you should see an updated result for the pageview count for the user. This demonstrates that every time you run a query on a materialized view, you receive the most up-to-date result. DeltaStream ensures the data in the view is continuously updated, using the continuous query that declared the materialized view.

Here is an image of the same query run on the materialized view just a few seconds later:

The result is updated again -- in this case, to 11 from the previous value of 3.

5. Clean Up

It's important to practice good hygiene when you're done! To clean up your DeltaStream environment:

In the lefthand navigation, click Queries ( ) to display the Queries page.
Click the query or queries you wish to terminate.
When the Query Details page displays, towards the top right click Terminate. Follow the prompt in the window that displays, and then click Terminate.
Navigate to the corresponding database and schema and drop the created streams, changelogs, and materialized views.

Tip If there's is a query that uses an object (stream, changelog, or materialized view), be sure to terminate the query before you drop the object.

PreviousFree Trial Quick Start NextStarting with the CLI

Last updated 1 month ago