What is DeltaStream?

DeltaStream is a serverless stream processing platform that integrates with streaming storage services such as Apache Kafka, AWS Kinesis, Confluent Cloud, AWS MSK, and Redpanda. Think of it as the compute layer on top of your streaming storage.

DeltaStream is a relational platform on top of your streaming data stores that lets you process, organize, secure, and share your streaming data.

DeltaStream provides a SQL-based interface where you can easily create stream processing applications such as streaming pipelines, materialized views, and microservices. However, DeltaStream is more than just a query processing layer on top of Kafka or Kinesis. It brings relational database concepts to the data streaming world, including namespacing and role-based access control, enabling you to securely access, process, and share your streaming data regardless of where it is stored. Unlike existing solutions that focus mainly on processing capabilities, DeltaStream provides a holistic solution for both processing and operational management of your streaming data.

DeltaStream’s primary capabilities make it uniquely suited for processing and managing data streams:

  • DeltaStream is serverless. You no longer need to worry about provisioning, architecting, or scaling clusters and servers to run real-time applications. Gone are the days of sizing clusters, keeping track of which cluster your queries run in, or deciding how many tasks to allocate to your applications. In DeltaStream, queries run in isolation, scale up and down independently, and seamlessly recover from failures. DeltaStream takes care of all those complexities so you can focus on building the core products that bring value to you and your organization.

  • SQL as the primary interface. SQL is the primary interface for DeltaStream. From creating databases and streams to running continuous queries and building materialized views on those streams, you can do it all in a simple, familiar SQL interface. DeltaStream provides SQL extensions that let you express streaming concepts that have no equivalent in traditional SQL, and if your compute logic requires more than SQL, you can use DeltaStream’s UDFs/UDAFs to define and perform such computations. A short sketch of these statements appears after this list.

  • Always up-to-date materialized views. Materialized views are a native capability in DeltaStream. You use continuous queries to build “always up-to-date” materialized views, and once you create a materialized view you can query it the same way you query materialized views in relational databases.

  • Unified view over multiple streaming stores. DeltaStream gives you a single view into all your streaming data across all your streaming stores. Whether you use one Kafka cluster, several, or multiple platforms such as Kafka and Kinesis, DeltaStream provides a unified view of the streaming data, and you can write queries on these streams regardless of where they are stored.

  • Intuitive namespacing. Streaming storage systems such as Apache Kafka have a flat namespace; think of it as a file system with no folders, which makes it very challenging to organize streams. By providing namespacing, DeltaStream enables you to organize your streams into databases and schemas, similar to the way you'd organize your tables in relational databases. And with the storage abstraction described above, you can organize your streaming data across all your streaming storage systems; see the sketch after this list.

  • Fine-grained security that you know and love. You can define fine-grained access privileges to determine who can access and perform which operations on objects in DeltaStream. With DeltaStream’s role-based access control (RBAC) you define roles and assign them to users. And you can do it all in SQL that you know and love. For instance, you can give read privileges on a specific stream to a given role with a one-line statement!

  • Break down silos for your streaming data with secure sharing. With namespacing, storage abstraction, and role-based access control, DeltaStream breaks down silos for your streaming data and enables you to share it securely across multiple teams in your organization.

  • Push notifications. You can create notifications on the results of your continuous queries and push them to a variety of services such as Slack, email, PagerDuty, or custom API calls. For instance, with a stream of sensor data from vehicles, you can write a query that computes the average speed of each vehicle and sends a notification to the driver if the average exceeds a threshold over a given time window; a sketch of such a query follows this list.
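
To make the namespacing and SQL-first workflow above concrete, here is a minimal sketch in DeltaStream SQL. All object names (analytics_db, vehicles, vehicle_positions, prod_kafka, analyst) are hypothetical, and the WITH properties on CREATE STREAM are assumptions that may differ from the exact DDL options in your environment; the GRANT statement mirrors the form used later on this page.

-- Hypothetical names, for illustration only.
CREATE DATABASE analytics_db;
CREATE SCHEMA vehicles;

-- Declare a stream over an existing topic in a store you have already defined;
-- the 'topic' and 'value.format' properties are assumptions and may differ.
CREATE STREAM vehicle_positions (
    vid VARCHAR, lat DOUBLE, lon DOUBLE, speed DOUBLE, pii VARCHAR)
WITH ('store'='prod_kafka', 'topic'='vehicle_positions', 'value.format'='JSON');

-- Grant read access on the stream to a role.
GRANT USAGE, SELECT PRIVILEGE ON vehicle_positions TO analyst;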
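
And here is a rough sketch of the kind of continuous query that could back the push-notification example in the last bullet. The TUMBLE windowing form, the 5-minute window size, the threshold of 120, and the speeding_vehicles name are all assumptions for illustration; attaching the actual notification to the query’s results is done with DeltaStream’s notification capability and is not shown here.

-- Sketch only: average speed per vehicle over a tumbling window; the
-- TUMBLE(...) windowing syntax and all names and values are assumptions.
CREATE STREAM speeding_vehicles AS
SELECT
    window_start, window_end, vid, AVG(speed) AS avg_speed
FROM TUMBLE(vehicle_positions, SIZE 5 MINUTES)
GROUP BY window_start, window_end, vid
HAVING AVG(speed) > 120;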

Interacting with DeltaStream

You can interact with DeltaStream through its REST API, a web application, or the CLI. The following figure shows a screenshot of the DeltaStream web application. You can also use the REST API to have your own application, or a tool such as GitHub Actions, submit a set of statements that define an application or pipeline.

DeltaStream Web App workspace

When should you use DeltaStream?

With the aforementioned capabilities you can quickly and easily build streaming applications and pipelines on your streaming data. If you are already using a streaming storage service such as Apache Kafka, AWS Kinesis, Confluent Cloud, AWS MSK, or Redpanda, consider using DeltaStream.

Here are a couple of use cases:

Assume you have a vehicle information topic in your production Kafka cluster into which you ingest real-time information such as GPS coordinates, speed, and other vehicle data. Suppose you want to share this stream in real time with another team, but only the information from vehicles in a certain geographic region, with some of the data obfuscated. Furthermore, you don’t want to give the team access to the production Kafka cluster; instead, you want to provide the shared information in a topic in a new Kafka cluster. In DeltaStream, you can write a SQL query such as the one shown below that reads the original stream, performs the desired projection, transformations, and filtering, and continuously writes the result into a new stream backed by a topic in the new Kafka cluster, which you have already declared as test_kafka.

CREATE STREAM resultStream WITH('store'='test_kafka') AS
SELECT
    vid, lat, lon, mask(pii, '*')
FROM vehicleStream
WHERE isInGeoFence(lat, lon) = true;

Once you have the result stream, you can use the following statement to grant read privileges to the team. They only see the result stream and never see the source stream or the production Kafka cluster!

GRANT USAGE, SELECT PRIVILEGE ON resultStream TO analyst;

As another example, consider a wiki service where all user interactions with every wiki page are streamed into a Kinesis stream. Assume you wish to provide real-time page statistics, such as the number of edits per wiki page. You can easily build a materialized view in DeltaStream using an aggregate query such as the following:

CREATE MATERIALIZED VIEW wiki_edit_count AS 
SELECT 
    page_id, count(*) AS edit_count 
FROM wiki_events 
WHERE wiki_event_type = 'edit' 
GROUP BY page_id;

This creates a materialized view in DeltaStream that holds the edit count per wiki page; every time an edit event is appended to the wiki_events stream, the view updates in real time. To show the up-to-date edit count whenever a wiki page is loaded, simply query the materialized view and include the count in the page. DeltaStream ensures that every time someone opens a wiki page, they see the latest edit count for that page.
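
For instance, the page-rendering service can fetch the current count with an ordinary query against the view; the page_id value below is just a placeholder.

SELECT edit_count
FROM wiki_edit_count
WHERE page_id = 'some_page_id';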
