Database
Data organization in DeltaStream
Databases are the foundation for organizing data in DeltaStream. They are the building blocks of its namespacing model.
You create databases to form logical groupings for different teams or projects. For instance, you can create one database for a logging project and another for an ads team.
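As a sketch, creating the two databases above could look like the following. The statements assume DeltaStream's SQL-style DDL, and the database names are illustrative:

CREATE DATABASE logging;
CREATE DATABASE ads;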
A schema is a logical grouping of relational objects such as streams, changelogs, materialized views, and tables. Schemas are grouped in a database. Together, databases and schemas enable you to organize your streams, changelogs, and other database objects hierarchically in DeltaStream. These hierarchies also serve as one of the bases for role-based access control (RBAC) in DeltaStream, just as they do in other relational databases.
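For example, a schema for an analytics team could be created inside the current database along these lines (a sketch; the statement form and schema name are assumptions, so consult the DDL reference for the exact syntax):

CREATE SCHEMA analytics;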
DeltaStream provides a relational model for streaming data wherein data is stored in relations. DeltaStream supports the following relation types:
Stream
Changelog
Materialized View
Table
In DeltaStream, these relations are the building blocks of your applications and pipelines. You can specify relation names as fully or partially qualified names by including a database and/or schema name in the format [<database_name>.<schema_name>.]<relation_name>, like this:
db1.public.pageviews
Otherwise, DeltaStream uses the client's current database and schema to identify the relation.
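For instance, if the client's current database is db1 and its current schema is public, the following two queries address the same relation (pageviews is the illustrative relation from above):

SELECT * FROM db1.public.pageviews;
SELECT * FROM pageviews;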
Stream
A stream is a sequence of immutable, partitioned, and partially-ordered events.
Tip: DeltaStream uses the terms "events" and "records" synonymously.
A stream is a relational representation of data in streaming stores, such as the data in a Kafka topic or a Kinesis stream.
The records in a stream are independent of each other; there is no correlation between two records in a stream.
A stream declares the schema of its records, including each column's name, type, and optional constraints.
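As a sketch, a stream over a Kafka topic of JSON-encoded pageview events might be declared as follows; the column list and WITH properties are illustrative assumptions rather than a canonical definition:

CREATE STREAM pageviews (
  viewtime BIGINT,
  userid VARCHAR,
  pageid VARCHAR
) WITH ('topic' = 'pageviews', 'value.format' = 'JSON');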
Changelog
As with a stream, a changelog is:
a sequence of partitioned and partially-ordered events
a relational representation of data in streaming stores, such as the data in a Kafka topic or a Kinesis stream
A changelog defines a PRIMARY KEY, which is used to represent the change over time for records that share the same primary key. Records in a changelog correlate with each other based on the PRIMARY KEY. This means a record in a changelog is either an insert (if it is the first record appended to the changelog with the given PRIMARY KEY) or an upsert (if a previous record with the same PRIMARY KEY has already been inserted into the changelog).
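A changelog declaration looks much like a stream declaration plus a PRIMARY KEY. The following sketch assumes a Kafka topic of JSON-encoded user records keyed by userid; all names and properties are illustrative:

CREATE CHANGELOG users_log (
  registertime BIGINT,
  userid VARCHAR,
  regionid VARCHAR,
  PRIMARY KEY (userid)
) WITH ('topic' = 'users', 'value.format' = 'JSON');

With this definition, a second record for an existing userid is treated as an upsert rather than an independent event.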
Materialized View
A materialized view creates a snapshot of a streaming query result and continuously updates that snapshot as records arrive at the query input(s). A materialized view is queryable in DeltaStream; when you query it, the results are computed from the data in the snapshot at query runtime.
For more details, see Row Key Definition.
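For example, a continuously maintained aggregate over the pageviews stream might be declared along these lines (a sketch; the view and column names are illustrative):

CREATE MATERIALIZED VIEW user_pageview_counts AS
  SELECT userid, COUNT(*) AS pageview_count
  FROM pageviews
  GROUP BY userid;

A later SELECT * FROM user_pageview_counts; then computes its results from the snapshot as of query runtime.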
Table
A table is similar to a materialized view in that it stores records from a streaming source. Unlike materialized views, however, tables do not support upserts. Instead, DeltaStream stores all records from a source or an upstream query operation (such as a JOIN or aggregation) as a sequence of records, as they are provided to the sink that writes to the table. When you use a table with records that have a primary key, for example a changelog, the resulting rows in the table represent the incremental changes to each record key.
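As a sketch, a table can be populated from the users_log changelog defined earlier, so that every change to a given userid is appended as its own row (this assumes a CREATE TABLE ... AS form; the table name is illustrative):

CREATE TABLE users_history AS
  SELECT registertime, userid, regionid
  FROM users_log;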
Row Key Definition
Each record in a stream or changelog can have a row key. (Defining a row key is optional for a relation.) The value of a key for a given record is extracted from its corresponding message, which is read from the source relation's entity. For example, if you use a Kafka topic as the relation's entity, the Kafka messages' key bytes assign row key values to the relation's records, based on the relation's row key definition (if any).
When writing query results to a sink, the records' keys are written as the messages' keys into the sink relation's entity. For example, when the result of a join query is written into a Kafka topic, the row keys of the resulting records are set as the Kafka messages' keys.
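To illustrate, consider a join whose result is written to a Kafka-backed sink relation (a sketch; the relation and column names are illustrative):

CREATE STREAM enriched_pageviews AS
  SELECT p.userid, p.pageid, u.regionid
  FROM pageviews p
  JOIN users_log u ON p.userid = u.userid;

If the result records carry a row key, that key is written as the key of the corresponding Kafka message in the topic backing enriched_pageviews.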