What is DeltaStream?
DeltaStream is a serverless stream processing platform that integrates with streaming storage services including Apache Kafka, AWS Kinesis, Confluent Cloud, AWS MSK, and Redpanda. Think of it as the compute layer on top of your streaming storage.
DeltaStream provides a SQL-based interface through which you can create stream processing applications such as streaming pipelines, materialized views, microservices, and many more.
DeltaStream is more than simply a query processing layer on top of Kafka or Kinesis. It brings relational database concepts to the data streaming world, including namespacing and role-based access controls that enable you to securely access, process, and share your streaming data regardless of where it is stored. Unlike existing solutions that focus primarily on processing capabilities, DeltaStream provides a holistic solution for both processing and operational management of your streaming data.
DeltaStream’s primary capabilities make it uniquely suited for processing and managing data streams:
DeltaStream is serverless. You no longer need to worry about clusters or servers, or about architecting and scaling infrastructure to run real-time applications. Gone are the days of cluster sizing, keeping track of which cluster a query runs in, or knowing how many tasks to allocate to your applications. DeltaStream removes much of this complexity; queries can:
run in isolation
scale up/down independently
seamlessly recover from failures.
This enables you to focus just on building the core products that bring value to you and your organization.
SQL as the primary interface. Do all you need to do in a simple and familiar SQL interface:
Create databases and streams
Run continuous queries
Build materialized views on these streams.
DeltaStream provides SQL extensions that enable you to express streaming concepts that don’t have equivalents in traditional SQL. Additionally, if your compute logic requires more than SQL, you can use DeltaStream’s user-defined functions (UDFs) and user-defined aggregate functions (UDAFs) to define and perform such computations.
Always up-to-date materialized views. Materialized views are a native capability in DeltaStream. You use continuous queries to build “always up-to-date” materialized views. Once you create a materialized view, you can query it the same way you query materialized views in relational databases.
Unified view over multiple streaming stores. DeltaStream gives you a single view into all your streaming data across all your streaming stores. Whether you are using one or multiple Kafka clusters or multiple platforms such as Kafka and Kinesis, DeltaStream provides a unified view of the streaming data. Further, you can write queries on these streams regardless of where they are stored.
Intuitive namespacing. Streaming storage systems such as Apache Kafka have a flat namespace — roughly analogous to a file system with no folders. This makes it challenging to organize streams in such systems. By providing namespacing, DeltaStream enables you to organize your streams in databases and schemas similar to the way you'd organize your tables in relational databases. Such storage abstraction enables you to organize your streaming data across all your streaming storage systems.
Fine-grained security that is familiar and straightforward. You can define fine-grained access privileges to determine who can access and perform which operations on objects in DeltaStream. With DeltaStream’s role-based access control (RBAC) you define roles and assign them to users. And you can do it all in familiar SQL. For instance, with just a one-line statement you can give read privileges on a specific stream to a given role.
Break down silos for your streaming data with secure sharing. With namespacing, storage abstraction, and role-based access control, DeltaStream breaks down silos for your streaming data and enables you to share streaming data securely across multiple teams in your organization.
Push notifications. You can create notifications on results of your continuous queries and push them to a variety of services such as Slack, email, PagerDuty, or custom API calls. For instance, with a stream of sensor data from vehicles, you can write a query to compute the average speed of each vehicle and send a notification to the driver if the average is higher than a threshold for a given time window.
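The vehicle-speed example above can be sketched in plain Python to show what such a query computes. This is not DeltaStream syntax: the function name, the `(timestamp, vehicle_id, speed)` tuple layout, the fixed (tumbling) windows, and the sample values are all assumptions made for illustration, and the sketch processes a finished list rather than a live stream.

```python
from collections import defaultdict


def speeding_alerts(readings, window_seconds, threshold):
    """Return (window_start, vehicle_id, avg_speed) for each window whose
    average speed exceeds the threshold.

    `readings` is an iterable of (timestamp_seconds, vehicle_id, speed).
    """
    # Group speeds by (tumbling window start, vehicle).
    buckets = defaultdict(list)
    for ts, vehicle_id, speed in readings:
        window_start = ts - (ts % window_seconds)
        buckets[(window_start, vehicle_id)].append(speed)

    # Compute the per-window average and keep only those above the threshold.
    alerts = []
    for (window_start, vehicle_id), speeds in sorted(buckets.items()):
        avg = sum(speeds) / len(speeds)
        if avg > threshold:
            alerts.append((window_start, vehicle_id, avg))
    return alerts


# Hypothetical sample data: (timestamp in seconds, vehicle id, speed).
readings = [
    (0, "v1", 60.0),
    (20, "v1", 80.0),
    (30, "v2", 50.0),
    (65, "v1", 90.0),
    (70, "v1", 100.0),
]
alerts = speeding_alerts(readings, window_seconds=60, threshold=75.0)
```

Each entry in `alerts` corresponds to one notification that would be pushed to the configured service. In a streaming engine the same aggregation runs continuously and emits a result as each window closes.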
You can interact with DeltaStream through its REST API, a Web application, or the CLI. The following figure displays a screenshot of the DeltaStream Web application. With the REST API, your own application can call DeltaStream directly, or tools such as GitHub Actions can submit a set of statements that define an application or pipeline.
With the aforementioned capabilities you can quickly and easily build streaming applications and pipelines on your streaming data. If you are already using a streaming storage service such as Apache Kafka, AWS Kinesis, Confluent Cloud, AWS MSK, or Redpanda, consider using DeltaStream.
Here are two use cases:
You have a vehicle information topic in your production Kafka cluster where you ingest real-time information such as GPS coordinates, speed, and other vehicle data. You need to share this stream in real time with another team, but only wish to share information from vehicles in a certain geographic region while also obfuscating some of the data. Further, you don’t want to give access to the production Kafka cluster and wish to provide the shared information in a topic in a new Kafka cluster.
To do this you can write a SQL query in DeltaStream that reads the original stream and performs the desired projection, transformations, and filtering. You can continuously write the result into a new stream backed by a topic in the new Kafka cluster that you have already declared as test_kafka.
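To make the three steps concrete, here is a minimal Python sketch of the filter, projection, and obfuscation that such a query performs on each event. It is not DeltaStream SQL, and every field name (`vehicle_id`, `driver_id`, `vin`), the bounding-box region test, and the masking rule are hypothetical choices for illustration.

```python
def share_regional_view(events, bbox):
    """Yield a filtered, projected, and obfuscated copy of each in-region event.

    `bbox` is (min_lat, max_lat, min_lon, max_lon).
    """
    min_lat, max_lat, min_lon, max_lon = bbox
    for e in events:
        # Filtering: keep only vehicles inside the bounding box.
        if not (min_lat <= e["lat"] <= max_lat and min_lon <= e["lon"] <= max_lon):
            continue
        # Projection: copy only the fields to be shared (the VIN is dropped).
        # Transformation: mask the driver id so it cannot be read downstream.
        yield {
            "vehicle_id": e["vehicle_id"],
            "lat": e["lat"],
            "lon": e["lon"],
            "speed": e["speed"],
            "driver_id": "*" * len(e["driver_id"]),
        }


# Hypothetical source events from the production topic.
events = [
    {"vehicle_id": "v1", "lat": 37.77, "lon": -122.42, "speed": 42.0,
     "driver_id": "D1234", "vin": "VIN-0001"},
    {"vehicle_id": "v2", "lat": 40.71, "lon": -74.00, "speed": 55.0,
     "driver_id": "D9876", "vin": "VIN-0002"},
]
bay_area = (37.0, 38.5, -123.0, -121.5)
shared = list(share_regional_view(events, bay_area))
```

The consumer of `shared` only ever sees the reduced records, which mirrors how the result stream in the new cluster exposes none of the source fields that were projected away.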
When you have the result stream, you can grant the team read privilege on it with a single SQL statement. They can access only the result stream and never see the source stream or the production Kafka cluster.
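The access model in this use case can be illustrated with a small role-based check in Python. This is a toy model and not how DeltaStream implements RBAC; the class, the privilege name `read`, and the role, user, and stream names are all hypothetical.

```python
class AccessControl:
    """Toy role-based access control: privileges go to roles, roles to users."""

    def __init__(self):
        self._grants = set()      # set of (role, privilege, object_name)
        self._user_roles = {}     # user -> set of role names

    def grant(self, privilege, object_name, role):
        # Record that `role` holds `privilege` on `object_name`.
        self._grants.add((role, privilege, object_name))

    def assign_role(self, user, role):
        self._user_roles.setdefault(user, set()).add(role)

    def can(self, user, privilege, object_name):
        # A user is allowed if any of their roles holds the privilege.
        return any(
            (role, privilege, object_name) in self._grants
            for role in self._user_roles.get(user, ())
        )


acl = AccessControl()
acl.grant("read", "shared_vehicle_stream", "partner_team")
acl.assign_role("alice", "partner_team")

can_read_shared = acl.can("alice", "read", "shared_vehicle_stream")
can_read_source = acl.can("alice", "read", "production_vehicle_stream")
```

Because the privilege is attached to the role rather than to the user, adding another team member only requires assigning them the role.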
Next example: a wiki service where all user interactions with every wiki page are streamed into a Kinesis stream.
In this case assume you wish to provide real-time page statistics such as the number of edits per wiki page. You can build a materialized view in DeltaStream using an aggregate query.
This creates a materialized view in DeltaStream that gives you the edit count per wiki page; every time an edit event is appended to the wiki_events stream, the view updates in real time. To display the up-to-date edit count for a wiki page every time it is loaded, query the materialized view and include the edit count in the wiki page. DeltaStream ensures that every time someone opens a wiki page they see the latest edit count for that page.
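The behavior of such a view can be sketched in Python as an aggregate that is updated once per incoming event and read on demand. This only illustrates the idea of incremental maintenance and is not DeltaStream's implementation; the class name and the event fields `type` and `page_id` are assumptions.

```python
from collections import Counter


class EditCountView:
    """Toy incrementally maintained view: edit count per wiki page."""

    def __init__(self):
        self._counts = Counter()

    def apply(self, event):
        # Called once per event appended to the stream; only edits count.
        if event["type"] == "edit":
            self._counts[event["page_id"]] += 1

    def edit_count(self, page_id):
        # Point lookup, as a page load would issue. Unknown pages return 0.
        return self._counts[page_id]


view = EditCountView()
# Hypothetical events from the wiki_events stream.
for event in [
    {"type": "edit", "page_id": "home"},
    {"type": "view", "page_id": "home"},
    {"type": "edit", "page_id": "home"},
    {"type": "edit", "page_id": "about"},
]:
    view.apply(event)

home_edits = view.edit_count("home")
```

The work is done when events arrive rather than at read time, which is why a lookup on page load stays cheap no matter how many events the stream has carried.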