Working with ProtoBuf Serialized Data and DeltaStream Descriptors


Last updated 5 months ago

In streaming stores such as Apache Kafka and Amazon Kinesis, producers send data events as bytes that consumers of the data must interpret. The most popular formats for data serialization include JSON, ProtoBuf, and Apache Avro, and DeltaStream supports all of these. This tutorial focuses on ProtoBuf and how to create and use a Descriptor for data serialization/deserialization.

Begin with a Store with entities whose data records are serialized with ProtoBuf. This means you also have ProtoBuf messages and file descriptors to serialize and deserialize these data events. In DeltaStream, you can create a DeltaStream Descriptor, which is a wrapper around your ProtoBuf file descriptor, and associate it with any entity that requires the ProtoBuf file descriptor for serialization/deserialization.

Create a descriptor

When working with ProtoBuf, you first define a ProtoBuf message and then generate a ProtoBuf file descriptor from that message. DeltaStream then uses this ProtoBuf file descriptor to generate any code necessary for serializing and deserializing data that conforms to the ProtoBuf message structure.

In this example, the ProtoBuf message, which lives in the file p.proto, resembles the following:

syntax = "proto3";

message Pageviews {
  int64 viewtime = 1;
  string userid = 2;
  string pageid = 3;
}
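To make concrete what "serialized with ProtoBuf" means for this message, the following is a minimal Python sketch that hand-encodes one Pageviews record using the standard ProtoBuf wire format. The function names and sample values are hypothetical and chosen only for illustration; in practice you would use generated ProtoBuf code rather than encoding by hand.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a ProtoBuf base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_pageviews(viewtime: int, userid: str, pageid: str) -> bytes:
    """Hand-encode a Pageviews record (hypothetical helper, illustration only).

    Note: real proto3 serializers omit fields that hold default values
    (0 or an empty string); this sketch always writes all three fields.
    """
    out = bytearray()
    # Field 1 (viewtime), wire type 0 (varint): tag = (1 << 3) | 0
    out += encode_varint((1 << 3) | 0)
    # int64 values are written as 64-bit two's complement varints
    out += encode_varint(viewtime & 0xFFFFFFFFFFFFFFFF)
    # Fields 2 and 3 (userid, pageid), wire type 2 (length-delimited)
    for field_number, text in ((2, userid), (3, pageid)):
        data = text.encode("utf-8")
        out += encode_varint((field_number << 3) | 2)
        out += encode_varint(len(data))
        out += data
    return bytes(out)


# Sample values are made up for this example.
record = encode_pageviews(150, "User_1", "Page_7")
print(record.hex())
```

The encoded bytes carry only field numbers and wire types, not the field names viewtime, userid, or pageid. This is why a consumer needs the message definition, packaged as a file descriptor, to interpret the record.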

You can generate a ProtoBuf descriptor in the file pageviews_value.desc from this ProtoBuf message in the file p.proto (see the ProtoBuf documentation for more details):

$ protoc --descriptor_set_out pageviews_value.desc p.proto

Now create a DeltaStream Descriptor from this ProtoBuf file descriptor. In the CLI you can do this using the CREATE DESCRIPTOR_SOURCE DDL. In the UI, follow these steps to add a descriptor:

  1. In the lefthand navigation, click Resources. When the Resources page displays, click Descriptor Sources and then click + Add Descriptor Source.

  2. Choose the file containing your ProtoBuf file descriptor (pageviews-descriptor in this example). When prompted, name your descriptor, and then click UPLOAD.

  3. Now you can click on the descriptor to view the message names it contains (in this example, Pageviews).
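If you want to check which message names a descriptor file contains before uploading it, the sketch below lists them with plain Python. It assumes the file is a serialized FileDescriptorSet (what protoc writes with --descriptor_set_out) and relies on the field numbers defined in ProtoBuf's descriptor.proto. The function names are hypothetical, and the sketch reports only top-level messages without their package prefix.

```python
from typing import Iterator, List, Tuple


def read_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode a varint starting at pos; return (value, next position)."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def iter_fields(buf: bytes) -> Iterator[Tuple[int, int, object]]:
    """Yield (field_number, wire_type, value) for each field in a message."""
    pos = 0
    while pos < len(buf):
        tag, pos = read_varint(buf, pos)
        field_number, wire_type = tag >> 3, tag & 0x07
        if wire_type == 0:    # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 1:  # fixed 64-bit
            value, pos = buf[pos:pos + 8], pos + 8
        elif wire_type == 2:  # length-delimited
            length, pos = read_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        elif wire_type == 5:  # fixed 32-bit
            value, pos = buf[pos:pos + 4], pos + 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        yield field_number, wire_type, value


def message_names(descriptor_set: bytes) -> List[str]:
    """List top-level message names in a serialized FileDescriptorSet."""
    names = []
    for num, wt, file_proto in iter_fields(descriptor_set):
        if num != 1 or wt != 2:        # FileDescriptorSet.file = 1
            continue
        for fnum, fwt, msg in iter_fields(file_proto):
            if fnum != 4 or fwt != 2:  # FileDescriptorProto.message_type = 4
                continue
            for mnum, mwt, val in iter_fields(msg):
                if mnum == 1 and mwt == 2:  # DescriptorProto.name = 1
                    names.append(val.decode("utf-8"))
    return names


def _ld(field_number: int, payload: bytes) -> bytes:
    """Demo helper: length-delimited field (field number < 16, payload < 128 bytes)."""
    return bytes([(field_number << 3) | 2, len(payload)]) + payload


# A tiny hand-built descriptor set standing in for a real file.
sample = _ld(1, _ld(1, b"p.proto") + _ld(4, _ld(1, b"Pageviews")))
print(message_names(sample))
```

To inspect the real file from this tutorial, you could pass the contents of pageviews_value.desc instead, for example message_names(open("pageviews_value.desc", "rb").read()).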

Update an Entity with the Descriptor

  1. Click the store you want, and when the store page displays, click the topic you want. In the example below, we selected the KafkaStore containing the ProtoBuf entity pageviews_key_descriptor:

  2. As this is a Kafka store, it allows for keys and enables you to assign a Key Descriptor and/or a Value Descriptor. For stores that don’t allow for keys, including Kinesis stores, you can only add a value descriptor.

  3. Click + Add Descriptors, and from the menus that display click the relevant descriptors to assign to this entity. In this example the Pageviews value descriptor is assigned, and the Key Descriptor is empty.

  4. That's it. You've assigned your descriptor to the relevant entity; now you can use this entity to run commands (such as PRINT ENTITY) and queries with relations.

Queries with Descriptors and ProtoBuf

With descriptors added, you can now create a relation that specifies a key.format or value.format of PROTOBUF as shown in the below DDL example. See CREATE STREAM for more details.

CREATE STREAM "pageviewsPB" (viewtime BIGINT, userid VARCHAR, pageid VARCHAR)
    WITH ('topic'='pageviews_pb', 'value.format'='PROTOBUF');

You can also create new relations using CREATE STREAM AS SELECT or CREATE CHANGELOG AS SELECT, specifying PROTOBUF as the data format for the sink relation. The below example shows how you can easily convert the JSON stream pageviews_json to a stream called pageviews_converted_to_proto with a ProtoBuf key and value format.

CREATE STREAM pageviews_converted_to_proto WITH (
  'value.format' = 'PROTOBUF', 'key.format' = 'PROTOBUF'
) AS
SELECT * FROM pageviews_json;

When the sink relation has a key or value format of PROTOBUF, the descriptor for the sink relation is automatically created and assigned to the entity. You can easily view your descriptors in the Descriptors tab or use the LIST DESCRIPTORS command in the CLI. To use the descriptor outside of DeltaStream, you can download the ProtoBuf descriptor via the COPY DESCRIPTOR_SOURCE command.

Finally, with regard to the PRINT ENTITY command:

  • If an entity in a store has a descriptor, DeltaStream uses the descriptor for deserialization even if the store has a schema registry.

  • If the entity does not have a descriptor, the store checks whether the schema registry contains a schema for the entity, and uses it for deserialization.

  • If the entity doesn’t have a descriptor and the store doesn’t have a schema registry—or it has a schema registry, but there is no corresponding schema in the registry—DeltaStream attempts to deserialize the data in the entity as JSON.
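The fallback order in the list above can be sketched as a small decision function. This is only an illustration of the precedence described here, not DeltaStream's implementation; the function name, parameters, and return labels are hypothetical.

```python
def choose_deserializer(
    has_descriptor: bool,
    has_schema_registry: bool,
    registry_has_schema: bool,
) -> str:
    """Return which source PRINT ENTITY would use, per the rules above.

    Hypothetical helper: the labels "descriptor", "schema_registry", and
    "json" are names chosen for this sketch only.
    """
    if has_descriptor:
        # A descriptor wins even when the store has a schema registry.
        return "descriptor"
    if has_schema_registry and registry_has_schema:
        # No descriptor, but the registry holds a schema for the entity.
        return "schema_registry"
    # No descriptor and no usable schema: attempt JSON.
    return "json"


print(choose_deserializer(True, True, True))
print(choose_deserializer(False, True, True))
print(choose_deserializer(False, True, False))
print(choose_deserializer(False, False, False))
```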

To associate your descriptor with any relevant entity that needs it for serialization/deserialization from the CLI, see UPDATE ENTITY. In the UI, click Resources in the lefthand navigation to display the Resources page, then follow the steps under Update an Entity with the Descriptor.
