CREATE CHANGELOG
Syntax
Description
A Changelog is a sequence of partitioned and partially ordered events (we use events and records synonymously). A Changelog is a relational representation of data in the streaming Stores, such as the data in a Apache Kafka topic or an Amazon Kinesis stream.
A Changelog defines a PRIMARY KEY
for records that is used to represent the change over time for records with the same PRIMARY KEY
. Records in a Changelog correlate with each other based on the PRIMARY KEY
. This means that a record in a Changelog either is an insert record if it’s the first time the record with the given PRIMARY KEY
is appended to the Changelog or upsert records if a previous record with the same PRIMARY KEY
has been inserted into the Changelog.
A Changelog is a type of Relation. Each Relation belongs to a Schema in a Database, so the fully qualified name of the Relation would be <database>.<schema>.<relation>
.
Arguments
changelog_name
This specifies the name of the new Changelog. For case-sensitive names, the name must be wrapped in double quotes; otherwise, the lowercase name will be used.
column_name
This is the name of a column to be created in the new Changelog. For case-sensitive names, the name must be wrapped in double quotes; otherwise, the lowercase name will be used.
data_type
This refers to the data type of the column, which can include array specifiers. For more information on the data types supported by DeltaStream, refer to the data types reference.
NOT NULL
Defines a constraint on the column, ensuring it cannot contain NULL
values.
PRIMARY KEY (column_name [, …])
The PRIMARY KEY
constraint specifies that column(s) of a table can contain only unique (non-duplicate), non-null values.
WITH (changelog_parameter = value [, … ])
This clause specifies Changelog Parameters.
Changelog Parameters
Kafka Specific Parameters
Parameters to be used if the associated Store is type KAFKA
:
Kinesis Specific Parameters
Parameters to be used if the associated Store is type KINESIS
:
Format Specific Parameters
Avro
Parameters to be used when writing records into a Changelog if associated key.format
or value.format
is avro
and the default Avro schema generation needs to be changed using a base schema for the key and/or value.
When generating an Avro schema for a column using a base schema:
if the base schema has a field with the same name and data type as the column's, then the field's definition from the base is used in the generated schema. This includes retaining base schema's
doc
andlogicalType
for the field.if the base schema has a field with the same name as the column's, but a different data type, then an Avro schema type definition is generated from the column's data type with the field's
doc
taken from the its corresponding field in the base schema.
Currently supported Schema Registries are Confluent Cloud and Confluent Platform.
Known Limitation: Confluent Schema Registry must use the default TopicNameStrategy for creating subject names.
Check CREATE SCHEMA_REGISTRY for more details.
Examples
Create a new Changelog
The following creates a new Changelog, user_last_page
. This Changelog reads from a topic named pageviews
and has a value.format
of JSON
. Note that this query also specifies userid
as the PRIMARY KEY
for the Changelog:
Create a new Changelog for an existing Entity
Create a new Changelog with a multi-column Primary Key
The following creates a new Changelog, pagevisits
. This Changelog reads from an Entity named pageviews
and has a value.format
of JSON
. Note that this query also specifies (userid, pageid)
as the PRIMARY KEY
for the Changelog:
Create a new Changelog with specifying key and timestamp
The following creates a new Changelog, LatestPageVisitor
, in the Database, DataBase
, and Schema, Schema2
. This Changelog reads from a topic named case_sensitive_pageviews
from the store OtherStore
and has a value.format
of Avro and a key.format
of PROTOBUF
. Since the key.format
is included, it is required that the key.type
is also provided and the value in this example is STRUCT<pageid VARCHAR>
. Note that this query also specifies PageId
as the PRIMARY KEY
for the Changelog, and many of the columns are in quotes, indicating they are case-sensitive. The case-insensitive column named CaseInsensitiveCol
will be lowercase as caseinsensitivecol
when the Relation is created. In the parameters, the timestamp
for this Relation is also specified so queries processing data using this Relation as the source will refer to the timestamp
column ViewTime
as the event’s timestamp:
Create a new Changelog specifying Kafka delivery guarantee
The following creates a new Changelog, user_exactly_once
. This Changelog reads from an Entity named users
and has a delivery.guarantee
of exactly_once
. By specifying the delivery.guarantee
, we are overriding the default value of at_least_once
. A user may want to use this configuration if their application can tolerate higher latencies but cannot tolerate duplicate records. When this Changelog is used as the sink in an INSERT INTO query, the query will use the delivery.guarantee
specified here.
Create a new Changelog with `NOT NULL` column
The following creates a new Changelog, users_log
. Two columns in this Changelog are defined with the NOT NULL
constraint: registertime
and contactinfo
. This means in any valid record from this Changelog, these two columns are not allowed to contain null values.
Create a new Changelog with format specific properties for Avro
The following creates a new Changelog, usersInfo,
whose records' key and value are in avro
format. It uses subjects from a Store called sr_store
as the base Avro schema to generate Avro schema for usersInfo
's key and value. users_data-key
subject is used to generate key's Avro schema and users_data-value
subject is used to generate value's Avro schema for the records written into usersInfo.
Last updated