CREATE CHANGELOG

Syntax

CREATE CHANGELOG changelog_name (
   column_name data_type [NOT NULL] [, ...],
   PRIMARY KEY (column_name [, ...])
) WITH (changelog_parameter = value [, ...]);

CREATE CHANGELOG changelog_name 
PRIMARY KEY (column_name [, ...])
WITH (changelog_parameter = value [, ...]);

Description

A changelog is a sequence of partitioned and partially ordered events. It's a relational representation of data in streaming stores, such as the data in an Apache Kafka topic or an Amazon Kinesis stream.

Note: DeltaStream uses the terms events and records synonymously.

A changelog defines a PRIMARY KEY for records that is used to represent the change over time for records with the same PRIMARY KEY. Records in a changelog correlate with each other based on the PRIMARY KEY. This means a record in a changelog is either an insert record or an upsert record.

It's an insert record if it's the first time a record with the given PRIMARY KEY is appended to the changelog.
It's an upsert record if a previous record with the same PRIMARY KEY has already been appended to the changelog.

In DeltaStream, a changelog is a type of relation. Each relation belongs to a schema in a database, so the fully-qualified name of the relation is <database>.<schema>.<relation>.
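
As a rough illustration of the insert/upsert distinction and of fully-qualified names, the sketch below declares a changelog keyed on userid and annotates, in comments, how three sample records would be classified. The database name demo_db, the schema name analytics, the changelog name, and the sample records are all hypothetical and exist only for this illustration.

```sql
-- Hypothetical names: demo_db (database), analytics (schema), user_last_page_demo (changelog).
CREATE CHANGELOG demo_db.analytics.user_last_page_demo (
   viewtime BIGINT,
   userid VARCHAR,
   pageid VARCHAR,
   PRIMARY KEY(userid)
) WITH (
   'topic' = 'pageviews',
   'value.format' = 'json'
);

-- Sample records in arrival order (illustrative only):
--   viewtime=1000, userid='User_1', pageid='Page_10'  -> insert (first record for User_1)
--   viewtime=2000, userid='User_2', pageid='Page_11'  -> insert (first record for User_2)
--   viewtime=3000, userid='User_1', pageid='Page_12'  -> upsert (User_1 already appeared)
```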

Arguments

changelog_name

This specifies the name of the new changelog. If the name is case sensitive you must wrap it in double quotes; otherwise the system uses the lowercase name.

column_name

This is the name of a column to be created in the new changelog. If the name is case sensitive you must wrap it in double quotes; otherwise the system uses the lowercase name.

data_type

This refers to the data type of the column, which can include array specifiers. For more information on the data types supported by DeltaStream, refer to the data types page.

NOT NULL

Defines a constraint on the column, ensuring it cannot contain NULL values.

PRIMARY KEY (column_name [, …])

The PRIMARY KEY constraint specifies that column(s) of a table can contain only unique (non-duplicate), non-null values.

WITH (changelog_parameter = value [, … ])

This clause specifies Changelog Parameters.

Changelog Parameters

Parameter Name

Description

topic

Name of the entity in the Data Store that has the data for this changelog. If the entity doesn’t exist, an entity with this name is created in the corresponding store.

Required: No Default value: Lowercase changelog_name Type: String

store

Name of the store that hosts the entity for this changelog.

Required: No Default value: User’s default store name Type: String Valid values: See LIST STORES.

value.format

Format of message value in the Data Store. See Data Formats (Serialization) for more information regarding serialization formats.

Required: Yes Type: String Valid values: JSON, AVRO, PROTOBUF, PRIMITIVE

Valid values without column definition: AVRO, PROTOBUF

timestamp

Name of the column in the changelog to use as the timestamp. If not set, the timestamp of the message is used for time-based operations such as window aggregations and joins. If the type of this timestamp field is BIGINT, DeltaStream expects the values in epoch milliseconds UTC.

Required: No Default value: Record’s timestamp Type: String Valid values: Must be of type BIGINT or TIMESTAMP. See Data Types.

timestamp.format

The format to use for TIMESTAMP typed fields. See Data Types.

Required: No Default value: sql Type: String Valid values: sql, iso8601
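
A minimal sketch of the timestamp and timestamp.format parameters described above. It assumes a hypothetical, already existing sensor_health topic in the default store whose JSON records carry an ISO 8601 event_time field. The changelog, topic, and column names are illustrative only.

```sql
-- Hypothetical changelog; assumes the sensor_health entity already exists in the default store.
CREATE CHANGELOG sensor_health (
   event_time TIMESTAMP,
   sensor_id VARCHAR,
   health VARCHAR,
   PRIMARY KEY(sensor_id)
) WITH (
   'topic' = 'sensor_health',
   'value.format' = 'json',
   -- Use the event_time column, not the record's own timestamp, for time-based operations.
   'timestamp' = 'event_time',
   -- Parse TIMESTAMP fields as ISO 8601 instead of the default sql format.
   'timestamp.format' = 'iso8601'
);
```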

Kafka-Specific Parameters

Parameters to be used if the associated Data Store is of type KAFKA:

Parameter Name

Description

topic.partitions

The number of partitions to use when creating the entity, if applicable. If the entity already exists, then this value must be equal to the number of partitions in the existing Kafka entity.

Required: Yes, unless entity already exists Default value: Leftmost source relation Entity’s partition count Type: Integer Valid values: [1, ...]

topic.replicas

The number of replicas to use when creating the entity, if applicable. If the entity already exists, then this value must be equal to the number of replicas in the existing Kafka entity.

Required: Yes, unless entity already exists Default value: Leftmost source relation Entity's replica count Type: Integer Valid values: [1, ...]

kafka.topic.*

A configuration specific to the topic being created (for example, Kafka Entity Configuration for Confluent Platform).

Required: No Default value: None Type: String Valid values: Kafka topic configuration specific to the underlying Data Store type.

key.format

Format of the message key in the Data Store. This value can be the same as or different from the value.format. See Data Formats (Serialization) for more information regarding serialization formats.

Required: No, unless key.type is set Default value: None Type: String Valid values: JSON, AVRO, PROTOBUF, PRIMITIVE

Valid values without column definition: AVRO, PROTOBUF

key.type

Declares the names and data types of key columns. The type is a STRUCT when key.format is a non-primitive value, for example 'key.type'='STRUCT<id BIGINT, name VARCHAR>'. For primitive values, the type is one of the Primitive Data Types, for example 'key.type'='VARCHAR'.

Required: No, unless Default value: None Type: String Valid values: See STRUCT in Data Types.

key.columns

Specifies the name(s) of the value columns, separated by commas, that are used to construct the record key.

For a non-primitive key.format, the record key is created as a STRUCT whose fields are the columns listed in this property.
For a primitive key.format, this property must contain exactly one column with a primitive data type.
Note: key.columns cannot be set if key.type is already defined.

Required: No Default: None Type: String Valid values: One or more valid column names from the relation’s column list, separated by commas.

value.columns.exclude

Specifies the name(s) of the columns, separated by commas, that should be excluded from the record’s value and included only in its key.

You can only set this property if key.columns is already defined.
The excluded columns must appear at the end of the object’s column list and must also be listed in key.columns.

Required: No Default: None Type: String Valid values: One or more valid column names from the relation’s column list, separated by commas.

delivery.guarantee

The fault tolerance guarantees applied when producing to this changelog.

Required: No Default value: at_least_once Type: String Valid values:

exactly_once: Produces to the changelog using Kafka transactions. These transactions are committed when the query takes a checkpoint. On the consumer side, when setting the Kafka consumer isolation.level configuration to read_committed, only the committed records are displayed. Since records aren’t committed until the query takes a checkpoint, there is some additional delay when using this setting.
at_least_once: Ensures that records are output to the changelog at least once. During query checkpointing, the query waits to receive a confirmation of successful writes from the Kafka broker. If there are issues with the query then duplicate records are possible as the query will try to reprocess old data.
none: There is no fault tolerance guarantee when producing to the changelog. If there are issues on the Kafka broker, then records may be lost, and if there are issues with the query then output records may be duplicated.

sink.timestamp.column

Specifies the name of the value column to be used to set the Kafka record’s timestamp when writing to the Kafka sink’s entity. If no timestamp column is specified, the Kafka producer record is created without an explicit timestamp, allowing the sink’s store to assign a timestamp according to its configured policy.

Required: No Default value: None Type: String Valid values: One of the column names from the sink relation’s column list. Must be of type BIGINT, TIMESTAMP, or TIMESTAMP_LTZ. See Data Types.
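
The following sketch combines several of the Kafka-specific parameters above: it sets topic.partitions and topic.replicas for a topic that is assumed not to exist yet, and uses sink.timestamp.column so that the order_ts column supplies the Kafka record timestamp on write. The store name kafka_store, the changelog, and its columns are hypothetical.

```sql
-- Hypothetical changelog; kafka_store is assumed to be an existing store of type KAFKA.
CREATE CHANGELOG orders_log (
   order_ts BIGINT,
   order_id VARCHAR,
   order_status VARCHAR,
   PRIMARY KEY(order_id)
) WITH (
   'store' = 'kafka_store',
   'topic' = 'orders_log',
   -- Needed here because the topic is assumed not to exist yet.
   'topic.partitions' = 3,
   'topic.replicas' = 1,
   'value.format' = 'json',
   -- order_ts is BIGINT, so its values are expected in epoch milliseconds.
   'sink.timestamp.column' = 'order_ts'
);
```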

Kinesis-Specific Parameters

Parameters to be used if the associated Data Store is of type KINESIS:

Parameter Name

Description

topic.shards

The number of shards to use when creating the entity, if applicable. If the entity already exists, this value must be equal to the number of shards in the existing Kinesis data stream.

Required: Yes, unless entity already exists Default value: Leftmost source relation topic’s shard count Type: Integer Valid values: [1, ...] Alias: kinesis.shards

Kinesis stores provide a delivery guarantee of at_least_once when producing events into a sink Data Store.
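
A minimal sketch of a changelog backed by a Kinesis data stream, assuming a hypothetical store named kinesis_store of type KINESIS and a stream that does not exist yet, which is why topic.shards is set. All names are illustrative.

```sql
-- Hypothetical changelog; kinesis_store is assumed to be an existing store of type KINESIS.
CREATE CHANGELOG device_firmware (
   updated_at BIGINT,
   device_id VARCHAR,
   firmware VARCHAR,
   PRIMARY KEY(device_id)
) WITH (
   'store' = 'kinesis_store',
   'topic' = 'device_firmware',
   -- Number of shards for the new Kinesis data stream.
   'topic.shards' = 2,
   'value.format' = 'json'
);
```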

Format-Specific Parameters

Avro

Parameters to be used when writing records into a changelog if the associated key.format or value.format is avro and the default Avro schema generation must be changed using a base schema for the key and/or value.

When generating an Avro schema for a column using a base schema:

If the base schema has a field with the same name and data type as those of the column, then the field's definition from the base schema is used in the generated schema. This includes retaining the base schema's doc and logicalType for the field.
If the base schema has a field with the same name as that of the column but a different data type, then an Avro schema type definition is generated from the column's data type, with the field's doc taken from its corresponding field in the base schema.

Notes

Currently supported schema registries are Confluent Cloud and Confluent Platform.
Known limitation: Confluent schema registry must use the default TopicNameStrategy for creating subject names.

Check CREATE SCHEMA_REGISTRY for more details.

Parameter Name

Description

avro.base.schema.store

Name of the store whose schema registry contains the Avro schema subject(s) to be used as the base schema for generating the Avro schema for the changelog's key and/or value.

Required: No Default value: Current session's store name Type: Identifier Valid values: See LIST STORES.

avro.base.subject.key

Name of the subject in the schema registry to obtain the base schema for generating Avro schema for changelog's key.

Required: No, unless key.format is set to avro and key.type is defined. Type: String

avro.base.subject.value

Name of the subject in the schema registry to obtain the base schema for generating Avro schema for changelog's value columns.

Required: No, unless value.format is set to avro. Type: String

Examples

Create a new changelog

The following creates a new changelog, user_last_page. This changelog reads from a topic named pageviews and has a value.format of JSON. Note that this query also specifies userid as the PRIMARY KEY for the changelog:

CREATE CHANGELOG user_last_page (
   viewtime BIGINT,
   userid VARCHAR,
   pageid VARCHAR,
   PRIMARY KEY(userid)
)
WITH (
   'topic'='pageviews',
   'value.format'='json'
);

Create a new changelog for an existing entity

The following creates a new users changelog for the existing users entity in the current Data Store. Because no topic is specified, this DDL implies that the name of the changelog should be used as the name of the entity that hosts the records. This DDL also declares the original structure of the users entity, with a PRIMARY KEY for updates:

CREATE CHANGELOG "users" (
    registertime BIGINT,
    userid VARCHAR,
    regionid VARCHAR,
    gender VARCHAR,
    interests ARRAY<VARCHAR>,
    contactinfo STRUCT<
        phone VARCHAR,
        city VARCHAR,
        "state" VARCHAR,
        zipcode VARCHAR>,
    PRIMARY KEY(userid)
) WITH ( 'value.format'='json' );

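Create a new changelog with passthrough configuration for retention

The following creates a new changelog, customers_log, in the store kafka_store. The kafka.topic.retention.ms parameter is passed through to the Kafka topic configuration; 172800000 ms corresponds to a retention of 2 days:

CREATE CHANGELOG customers_log (
   ts BIGINT,
   customer_id VARCHAR,
   full_name VARCHAR,
   region VARCHAR,
   PRIMARY KEY(customer_id)
) WITH (
   'store' = 'kafka_store',
   'value.format' = 'json',
   'topic.partitions' = 1,
   'topic.replicas' = 2,
   'kafka.topic.retention.ms' = '172800000'
);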

Create a new changelog with a multi-column primary key

The following creates a new changelog, pagevisits. This changelog reads from an entity named pageviews and has a value.format of JSON. Note that this query also specifies (userid, pageid) as the PRIMARY KEY for the changelog:

CREATE CHANGELOG pagevisits (
   viewtime BIGINT,
   userid VARCHAR,
   pageid VARCHAR,
   PRIMARY KEY(userid, pageid)
) WITH ( 'topic'='pageviews', 'value.format'='json' );

Create a new changelog specifying key and timestamp

The following creates a new changelog, LatestPageVisitor, in the database DataBase and the schema Schema2. This changelog reads from a topic named case_sensitive_pageviews in the store OtherStore and has a value.format of AVRO and a key.format of PROTOBUF. Because key.format is set, key.type is also required; in this example it is STRUCT<"PageId" VARCHAR>. This query also specifies PageId as the PRIMARY KEY for the changelog. Many of the columns are in double quotes, indicating that they are case-sensitive. The unquoted column CaseInsensitiveCol is stored in lowercase as caseinsensitivecol when the relation is created. The timestamp parameter is also set, so queries that use this relation as a source treat the ViewTime column as the event’s timestamp:

CREATE CHANGELOG "DataBase"."Schema2"."LatestPageVisitor" (
   "ViewTime" BIGINT,
   "userID" VARCHAR,
   "PageId" VARCHAR,
   "CaseSensitiveCol" BIGINT,
   CaseInsensitiveCol BIGINT,
   PRIMARY KEY("PageId")
) WITH (
   'topic'='case_sensitive_pageviews',
   'store'='OtherStore',
   'value.format'='avro',
   'key.format'='protobuf',
   'key.type'='STRUCT<"PageId" VARCHAR>',
   'timestamp'='ViewTime'
);
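
For comparison with the STRUCT key above, the following is a minimal sketch of a primitive key, where key.type names a single primitive data type instead of a STRUCT. The changelog name page_latest_view is hypothetical, and the sketch assumes the pageviews topic already exists with VARCHAR record keys.

```sql
-- Hypothetical changelog; assumes the pageviews entity exists and its record key is a plain string.
CREATE CHANGELOG page_latest_view (
   viewtime BIGINT,
   userid VARCHAR,
   pageid VARCHAR,
   PRIMARY KEY(pageid)
) WITH (
   'topic' = 'pageviews',
   'value.format' = 'json',
   -- A primitive key.format takes a primitive key.type rather than a STRUCT.
   'key.format' = 'primitive',
   'key.type' = 'VARCHAR'
);
```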

Create a new changelog specifying Kafka delivery guarantee

The following creates a new changelog, user_exactly_once. This changelog reads from an entity named users and has a delivery.guarantee of exactly_once. By specifying the delivery.guarantee, you override the default value of at_least_once. You may wish to use this configuration if your application can tolerate higher latencies but cannot tolerate duplicate records. When you use this changelog as the sink in an INSERT INTO query, the query uses the delivery.guarantee specified here.

CREATE CHANGELOG user_exactly_once (
   viewtime BIGINT,
   userid VARCHAR,
   pageid VARCHAR,
   PRIMARY KEY(userid)
)
WITH (
   'topic'='users',
   'value.format'='json',
   'delivery.guarantee'='exactly_once'
);

Create a new changelog with `NOT NULL` column

The following creates a new changelog, users_log. Two columns in this changelog are defined with the NOT NULL constraint: registertime and contactinfo. As a result, these two columns cannot contain NULL values in any valid record from this changelog.

CREATE CHANGELOG users_log (
    registertime BIGINT NOT NULL,
    userid VARCHAR, 
    interests ARRAY<VARCHAR>,
    contactinfo STRUCT<phone VARCHAR, city VARCHAR, "state" VARCHAR, zipcode VARCHAR> NOT NULL,
    PRIMARY KEY(userid)
)
WITH (
   'topic'='users', 
    'key.format'='json', 
    'key.type'='STRUCT<userid VARCHAR>', 
    'value.format'='json'
);

Create a new changelog with format-specific properties for Avro

The following creates a new changelog, usersInfo, whose record keys and values are in avro format. It uses subjects from a store called sr_store as the base Avro schemas for generating the Avro schemas of usersInfo's key and value. The users_data-key subject is used to generate the key's Avro schema, and the users_data-value subject is used to generate the value's Avro schema for the records written into usersInfo.

CREATE CHANGELOG "usersInfo" (
    registertime BIGINT NOT NULL,
    userid VARCHAR, 
    interests ARRAY<VARCHAR>,
    contactinfo STRUCT<phone VARCHAR, city VARCHAR, "state" VARCHAR, zipcode VARCHAR> NOT NULL,
    PRIMARY KEY(userid)
)
WITH (
    'topic'='usersInfo', 
    'key.format'='avro',
    'key.type'='STRUCT<userid VARCHAR>', 
    'value.format'='avro',
    'avro.base.schema.store' = sr_store,
    'avro.base.subject.key' = 'users_data-key',
    'avro.base.subject.value' = 'users_data-value'
);

Create a changelog with data in S3

The following creates a changelog with data backed by an S3 store:

CREATE CHANGELOG pageviews_s3 (
    viewtime BIGINT,
    userid VARCHAR,
    pageid VARCHAR,
    PRIMARY KEY(userid)
) WITH (
    'store' = 's3_store',
    's3.uri'='s3://ctan-playground-data/jsonl/',
    's3.discovery.interval.seconds'=15,
    'value.format'='jsonl'
);

Notes:

s3.uri: required.
value.format: jsonl is the only valid value.
s3.discovery.interval.seconds: optional. Default value: 10.

Create a new changelog with key columns

The following creates a pageviews changelog. The key and value of records in this changelog are both in json format.

Value consists of 3 columns:

viewtime
userid
pageid

Key is a STRUCT with two fields:

userid
pageid

The values of these fields come from the corresponding columns.

CREATE CHANGELOG pageviews (
    viewtime BIGINT, 
    userid VARCHAR, 
    pageid VARCHAR,
    PRIMARY KEY (userid)
) WITH (
    'store' = 'kafka_store',
    'topic' = 'pageviews',
    'value.format' = 'json',
    'key.format' = 'json', 
    'key.columns'='userid,pageid'
);

Create a new changelog with key columns and value exclude columns

The following creates a pageviews changelog. The key and value of records in this changelog are both in json format. Key is a STRUCT with two fields:

userid
pageid

The values of these fields come from the corresponding columns. Since pageid is set to be excluded from the value, the value in each record consists of 2 columns:

viewtime
userid

CREATE CHANGELOG pageviews (
    viewtime BIGINT, 
    userid VARCHAR, 
    pageid VARCHAR,
    PRIMARY KEY (userid)
) WITH (
    'store' = 'kafka_store',
    'topic' = 'pageviews',
    'value.format' = 'json',
    'key.format' = 'json', 
    'key.columns'='userid,pageid',
    'value.columns.exclude'='pageid'
);
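
The second syntax form, which omits the column list, has no example above. The following is a minimal sketch of it, assuming a hypothetical users_avro entity that already exists and whose Avro value schema is available to the store and contains a userid field. As noted under value.format, only AVRO and PROTOBUF are valid without a column definition.

```sql
-- Hypothetical changelog; assumes the users_avro entity and its Avro schema already exist
-- and that the schema has a userid field to serve as the PRIMARY KEY.
CREATE CHANGELOG users_avro
PRIMARY KEY (userid)
WITH (
   'topic' = 'users_avro',
   'value.format' = 'avro'
);
```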
