Database
Data organization in DeltaStream
Databases are the foundation for organizing data in DeltaStream. They are the building blocks of its namespacing model.
You create databases to form logical groupings for different teams or projects. For instance, you can create one database for a logging project and another for an ads team.
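As a sketch, creating the two databases above could look like the following. The statements assume DeltaStream's SQL-style DDL, and the database names are illustrative:

CREATE DATABASE logging;
CREATE DATABASE ads;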
A schema is a logical grouping of relational objects such as streams, changelogs, materialized views, and tables. Schemas are grouped in a database. Together, databases and schemas enable you to organize your streams, changelogs, and other database objects hierarchically in DeltaStream. These hierarchies also serve as one of the bases for role-based access control (RBAC) in DeltaStream, just as they do in other relational databases.
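For example, a schema for an analytics team could be created inside the current database along these lines (a sketch; the statement form and schema name are assumptions, so consult the DDL reference for the exact syntax):

CREATE SCHEMA analytics;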
DeltaStream provides a relational model for streaming data wherein data is stored in relations. DeltaStream supports the following relation types:
Stream
Changelog
Materialized View
Table
In DeltaStream, these relations are the building blocks of your applications and pipelines. You can specify relation names as fully or partially qualified names by including a database and/or schema name in the format [<database_name>.<schema_name>.]<relation_name>, like this:
db1.public.pageviews
Otherwise, DeltaStream uses the client's current database and schema to identify the relation.
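For instance, if the client's current database is db1 and its current schema is public, the following two queries address the same relation (pageviews is the illustrative relation from above):

SELECT * FROM db1.public.pageviews;
SELECT * FROM pageviews;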
Stream
A stream is a sequence of immutable, partitioned, and partially-ordered events.
Tip: DeltaStream uses the terms "events" and "records" synonymously.
A stream is a relational representation of data in streaming stores, such as the data in a Kafka topic or a Kinesis stream.
The records in a stream are independent of each other; there is no correlation between two records in a stream.
A stream declares the schema of its records, including each column's name, type, and optional constraints.
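As a sketch, a stream over a Kafka topic of JSON-encoded pageview events might be declared as follows; the column list and WITH properties are illustrative assumptions rather than a canonical definition:

CREATE STREAM pageviews (
  viewtime BIGINT,
  userid VARCHAR,
  pageid VARCHAR
) WITH ('topic' = 'pageviews', 'value.format' = 'JSON');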
Changelog
As with a stream, a changelog is:
a sequence of partitioned and partially-ordered events
a relational representation of data in streaming stores, such as the data in a Kafka topic or a Kinesis stream
A changelog defines a PRIMARY KEY, which is used to represent the change over time for records that share the same primary key. Records in a changelog correlate with each other based on the PRIMARY KEY. This means a record in a changelog is either an insert (if it is the first record appended to the changelog with the given PRIMARY KEY) or an upsert (if a previous record with the same PRIMARY KEY has already been inserted into the changelog).
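A changelog declaration looks much like a stream declaration plus a PRIMARY KEY. The following sketch assumes a Kafka topic of JSON-encoded user records keyed by userid; all names and properties are illustrative:

CREATE CHANGELOG users_log (
  registertime BIGINT,
  userid VARCHAR,
  regionid VARCHAR,
  PRIMARY KEY (userid)
) WITH ('topic' = 'users', 'value.format' = 'JSON');

With this definition, a second record for an existing userid is treated as an upsert rather than an independent event.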
Materialized View
A materialized view creates a snapshot of a streaming query result and continuously updates that snapshot as records arrive at the query input(s). A materialized view is queryable in DeltaStream; when you query it, the results are computed from the data in the snapshot at query runtime.
For more details, see Row Key Definition.
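For example, a continuously maintained aggregate over the pageviews stream might be declared along these lines (a sketch; the view and column names are illustrative):

CREATE MATERIALIZED VIEW user_pageview_counts AS
  SELECT userid, COUNT(*) AS pageview_count
  FROM pageviews
  GROUP BY userid;

A later SELECT * FROM user_pageview_counts; then computes its results from the snapshot as of query runtime.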
Table
A table is similar to a materialized view in that it stores records from a streaming source. Unlike materialized views, however, tables do not support upserts. Instead, DeltaStream stores all records from a source or an upstream query operation (such as a JOIN or aggregation) as a sequence of records, as they are provided to the sink that writes to the table. When you use a table with records that have a primary key, for example a changelog, the resulting rows in the table represent the incremental changes to each record key.
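As a sketch, a table can be populated from the users_log changelog defined earlier, so that every change to a given userid is appended as its own row (this assumes a CREATE TABLE ... AS form; the table name is illustrative):

CREATE TABLE users_history AS
  SELECT registertime, userid, regionid
  FROM users_log;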
Row Key Definition
Each record in a stream or changelog can have a row key. (Defining a row key is optional for a relation.) The value of a key for a given record is extracted from its corresponding message, which is read from the source relation's entity. For example, if you use a Kafka topic as the relation's entity, the Kafka messages' key bytes assign row key values to the relation's records, based on the relation's row key definition (if any).
When writing query results to a sink, the records' keys are written as the messages' keys into the sink relation's entity. For example, when the result of a join query is written into a Kafka topic, the row keys of the resulting records are set as the Kafka messages' keys.
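To illustrate, consider a join whose result is written to a Kafka-backed sink relation (a sketch; the relation and column names are illustrative):

CREATE STREAM enriched_pageviews AS
  SELECT p.userid, p.pageid, u.regionid
  FROM pageviews p
  JOIN users_log u ON p.userid = u.userid;

If the result records carry a row key, that key is written as the key of the corresponding Kafka message in the topic backing enriched_pageviews.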