Database
Databases are the foundation for organizing data in DeltaStream. They are the building blocks of its namespacing model.
You create databases as logical groupings for different teams or projects. For instance, you can create one database for a logging project and another for an ads team.
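As a minimal sketch (assuming DeltaStream's SQL-style CREATE DATABASE statement; the database names are illustrative):

-- Create one database per project or team.
CREATE DATABASE logging;
CREATE DATABASE ads;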
Schema
A schema is a logical grouping of relational objects such as streams, changelogs, materialized views, and tables. Schemas are grouped within a database. Together, databases and schemas enable you to organize your streams, changelogs, and other database objects hierarchically in DeltaStream. These hierarchies are also one of the bases for providing role-based access control (RBAC) in DeltaStream, just as they are in other relational databases.
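For instance, a minimal sketch of adding a schema to a database (assuming DeltaStream's USE DATABASE and CREATE SCHEMA statements; the names are illustrative):

-- Make db1 the current database, then create a schema inside it.
USE DATABASE db1;
CREATE SCHEMA analytics;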
Relation
DeltaStream provides a relational model for streaming data wherein data is stored in relations. DeltaStream supports the following relation types:
Stream
Changelog
Materialized View
Table
In DeltaStream, these relations are the building blocks of your applications and pipelines. You can specify relation names as fully- or partially-qualified names by including a database and/or schema name in the format [<database_name>.<schema_name>.]<relation_name>, like this:
db1.public.pageviews
Otherwise, DeltaStream uses the current database and schema in the scope of a client to identify a relation.
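For example, the following two ways of referring to the pageviews relation resolve to the same object (a sketch; the statements for switching scope are assumptions):

-- A fully-qualified name works regardless of the current scope.
SELECT * FROM db1.public.pageviews;

-- After setting the current database and schema, the bare name resolves identically.
USE DATABASE db1;
USE SCHEMA public;
SELECT * FROM pageviews;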
Stream
A stream is a sequence of immutable, partitioned, and partially-ordered events.
Tip DeltaStream uses the terms "events" and "records" synonymously.
A stream is a relational representation of data in streaming stores, such as the data in a Kafka topic or a Kinesis stream.
The records in a stream are independent of each other; there is no correlation between two records in a stream.
A stream declares the schema of the records; this includes the column name, the column type, and optional constraints.
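For illustration, a stream over a Kafka topic might be declared as follows (a sketch assuming DeltaStream's CREATE STREAM DDL; the topic, columns, and WITH parameters are illustrative):

-- Declare the schema of the records in the pageviews topic:
-- column names, column types, and the backing topic and format.
CREATE STREAM pageviews (
  viewtime BIGINT,
  userid VARCHAR,
  pageid VARCHAR
) WITH ('topic' = 'pageviews', 'value.format' = 'JSON');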
Changelog
As with a stream, a changelog is:
a sequence of partitioned and partially-ordered events
a relational representation of data in streaming stores, such as the data in a Kafka topic or a Kinesis stream.
A changelog defines a PRIMARY KEY, which is used to represent the change over time for records with the same primary key. Records in a changelog correlate with each other based on the PRIMARY KEY. This means a record in a changelog is either an insert (if it’s the first time a record with the given PRIMARY KEY is appended to the changelog) or an upsert (if a previous record with the same PRIMARY KEY has already been inserted into the changelog).
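As a sketch (assuming DeltaStream's CREATE CHANGELOG DDL; the names and parameters are illustrative), a changelog keyed on userid could be declared as:

-- Records sharing the same userid represent changes to that key over time:
-- the first record for a userid is an insert, later ones are upserts.
CREATE CHANGELOG users_log (
  registertime BIGINT,
  userid VARCHAR,
  gender VARCHAR,
  PRIMARY KEY (userid)
) WITH ('topic' = 'users', 'value.format' = 'JSON');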
Materialized View
A materialized view creates a snapshot of a streaming query result and continuously updates the snapshot as new records arrive at the query’s input(s). A materialized view is queryable in DeltaStream; when you query it, the results are computed from the data in the snapshot at query runtime.
Note Queries on a materialized view are not streaming queries. They are the same as the queries on tables and materialized views in traditional relational databases.
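For instance, a continuously maintained count of views per page might look like this (a sketch assuming a CREATE MATERIALIZED VIEW ... AS form; the names are illustrative):

-- Maintain a continuously updated view count per page.
CREATE MATERIALIZED VIEW pageviews_count AS
  SELECT pageid, COUNT(*) AS cnt
  FROM pageviews
  GROUP BY pageid;

-- Querying the view computes results from the current snapshot;
-- this is a regular query, not a streaming one.
SELECT * FROM pageviews_count WHERE cnt > 100;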
Table
A table is similar to a materialized view in that it stores records from a streaming source. Unlike materialized views, however, tables do not support upserts. Instead, DeltaStream stores all records from a source or an upstream query operation (such as a JOIN or aggregation) as a sequence of records, in the order they are provided to the sink that writes to the table. When you use a table with records that have a primary key -- for example, a changelog -- the resulting rows in the table represent the incremental changes to each record key.
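A hedged sketch of the difference (the CREATE TABLE ... AS form and all names here are assumptions): writing a keyed changelog into a table keeps every change as its own row rather than upserting.

-- Each change to a given userid in users_log lands as a separate row,
-- so the table records the incremental changes per key.
CREATE TABLE users_history AS
  SELECT registertime, userid, gender
  FROM users_log;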
Row Key
Each record in a stream or changelog can have a row key. (Defining a row key is optional for a relation.) The value of the key for a given record is extracted from its corresponding message, which is read from the source relation’s entity. For example, if a relation’s entity is a Kafka topic, the key bytes of the Kafka messages provide the row key values for the relation’s records, according to the relation’s row key definition (if any).
Note Some operations, such as GROUP BY and JOIN, impact the row key definition and add row keys to their results’ records.
When writing query results to a sink, the records’ keys are written as the messages’ keys into the sink relation’s entity. For example, when the result of a join query is written into a Kafka topic, the row keys of the resulting records are set as Kafka messages’ keys.
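As an illustration (a sketch; the query shape and names are illustrative), an aggregation adds a row key to its result, and that key becomes the Kafka message key when the result is written out:

-- GROUP BY pageid makes pageid the row key of the result records;
-- when the sink is backed by a Kafka topic, pageid is written as each message's key.
CREATE CHANGELOG pageviews_per_page AS
  SELECT pageid, COUNT(*) AS cnt
  FROM pageviews
  GROUP BY pageid;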
For more details, see Row Key Definition.