Working with ProtoBuf Serialized Data and DeltaStream Descriptors

In streaming data stores such as Apache Kafka and Amazon Kinesis, producers send data events as bytes that consumers of the data must interpret. The most popular data serialization formats include JSON, ProtoBuf, and Apache Avro, and DeltaStream supports all of them. This article focuses on ProtoBuf and how to create and use a DeltaStream descriptor for data serialization/deserialization.

Begin with a Data Store with entities whose data records are serialized with ProtoBuf. This means you also have ProtoBuf messages and file descriptors to serialize and deserialize these data events. In DeltaStream, you can create a descriptor -- which is a wrapper around your ProtoBuf file descriptor -- and associate it with any entity that requires the ProtoBuf file descriptor for serialization/deserialization.

Create a descriptor

When working with ProtoBuf, you first define a ProtoBuf message and then generate a ProtoBuf file descriptor from that message. DeltaStream then uses this ProtoBuf file descriptor to generate any code necessary for serializing and deserializing data that conforms to the ProtoBuf message structure.

In this example, the ProtoBuf message, which lives in the file p.proto, resembles the following:

message Pageviews {
  int64 viewtime = 1;
  string userid = 2;
  string pageid = 3;
}
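To make the byte-level encoding concrete, here is a minimal Python sketch of how one Pageviews record is laid out in the standard ProtoBuf wire format. This is for illustration only; in practice the protoc-generated classes handle this encoding for you:

```python
def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as a ProtoBuf base-128 varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)

def encode_pageviews(viewtime: int, userid: str, pageid: str) -> bytes:
    """Hand-encode a Pageviews record. Each field starts with a tag byte:
    tag = (field_number << 3) | wire_type."""
    buf = bytearray()
    buf.append(0x08)              # field 1 (viewtime), wire type 0 (varint)
    buf += encode_varint(viewtime)
    u = userid.encode("utf-8")
    buf.append(0x12)              # field 2 (userid), wire type 2 (length-delimited)
    buf += encode_varint(len(u)) + u
    p = pageid.encode("utf-8")
    buf.append(0x1A)              # field 3 (pageid), wire type 2 (length-delimited)
    buf += encode_varint(len(p)) + p
    return bytes(buf)
```

For instance, encode_pageviews(1, "u", "p") yields b'\x08\x01\x12\x01u\x1a\x01p', the same bytes a protoc-generated Pageviews class would produce for those values.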

You can generate a ProtoBuf descriptor in the file pageviews_value.desc from this ProtoBuf message in the file p.proto (see the ProtoBuf documentation for more details):

$ protoc --descriptor_set_out pageviews_value.desc p.proto

Now create a DeltaStream descriptor from this ProtoBuf file descriptor. In the CLI you can do this using the CREATE DESCRIPTOR_SOURCE DDL. In the UI, follow these steps to add a descriptor:

  1. In the lefthand navigation click Resources. When the Resources page displays, click Descriptor Sources and then click + Add Descriptor Source.

  2. Choose the file containing your ProtoBuf file descriptor (pageviews-descriptor in this example). When prompted, name your descriptor, and then click UPLOAD.

  3. Now you can click on the descriptor to view the message names it contains (in this example, Pageviews).
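As a CLI alternative to the upload flow above, a CREATE DESCRIPTOR_SOURCE statement along these lines registers the compiled descriptor file. This is a sketch only: the source name, the 'file' property, and the path convention are assumptions, so consult the CREATE DESCRIPTOR_SOURCE reference for the exact syntax.

```sql
-- Illustrative sketch; the property name and path convention are assumptions.
CREATE DESCRIPTOR_SOURCE "pageviews_descriptor"
  WITH ('file' = '@/path/to/pageviews_value.desc');
```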

Update an Entity with the Descriptor

  1. Click the data store you want, and when the data store page displays, click the topic you want. In this example, that's the KafkaStore containing the ProtoBuf entity pageviews_key_descriptor.

  2. As this is a Kafka data store, it allows for keys and enables you to assign a Key Descriptor and/or a Value Descriptor. For data stores that don’t allow for keys, including Kinesis data stores, you can only add a value descriptor.

  3. Click + Add Descriptors, and from the menus that display, click the relevant descriptors to assign to this entity. In this example the Pageviews value descriptor is assigned, and the Key Descriptor is empty.

  4. That's it. You've assigned your descriptor to the relevant entity; now you can use this entity to run commands (such as PRINT ENTITY) and queries with DeltaStream objects.

For more information, see Serializing with Protobuf.

Queries with Descriptors and ProtoBuf

With descriptors added, you can now create a DeltaStream object that specifies a key.format or value.format of PROTOBUF, as shown in the DDL example below. See CREATE STREAM for more details.

CREATE STREAM "pageviewsPB" (viewtime BIGINT, userid VARCHAR, pageid VARCHAR)
    WITH ('topic'='pageviews_pb', 'value.format'='PROTOBUF');

You can also create new objects using CREATE STREAM AS SELECT or CREATE CHANGELOG AS SELECT, specifying PROTOBUF as the data format for the sink object. The example below shows how to convert the JSON stream pageviews_json to a stream called pageviews_converted_to_proto with a ProtoBuf key and value format.

CREATE STREAM pageviews_converted_to_proto WITH (
  'value.format' = 'protobuf', 'key.format' = 'PROTOBUF'
) AS 
SELECT * FROM pageviews_json;

When the sink object has a key or value format of PROTOBUF, the descriptor for the sink object is automatically created and assigned to the entity. You can easily view your descriptors in the Descriptors tab or use the LIST DESCRIPTORS command in the CLI. To use the descriptor outside of DeltaStream, you can download the ProtoBuf descriptor via the COPY DESCRIPTOR_SOURCE command.
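For example, the following CLI sketch lists descriptors and downloads one. The LIST statement matches the command named above; the TO clause and target URI in the COPY statement are assumptions, so check the COPY DESCRIPTOR_SOURCE reference for the exact syntax.

```sql
-- Show the descriptors DeltaStream currently knows about.
LIST DESCRIPTORS;

-- Illustrative sketch; the TO clause and path are assumptions.
COPY DESCRIPTOR_SOURCE "pageviews_descriptor" TO 'file:///tmp/pageviews_value.desc';
```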

Finally, with regard to the PRINT ENTITY command:

  • If an entity in a data store has a descriptor, the descriptor is used for deserialization, even if the data store has a schema registry.

  • If the entity does not have a descriptor, the data store checks whether the schema registry contains a schema for the entity, and uses it for deserialization.

  • If the entity doesn’t have a descriptor and the data store doesn’t have a schema registry—or it has a schema registry, but there is no corresponding schema in the registry—DeltaStream attempts to deserialize the data in the entity as JSON.
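The fallback order above can be sketched as a small decision function. This is a paraphrase of the documented behavior, not DeltaStream code:

```python
def choose_deserializer(entity_has_descriptor: bool,
                        registry_has_schema: bool) -> str:
    """Pick the deserializer PRINT ENTITY uses, per the documented order."""
    if entity_has_descriptor:
        # A descriptor on the entity always wins, even when the
        # data store also has a schema registry.
        return "descriptor"
    if registry_has_schema:
        # Otherwise a matching schema in the store's registry is used.
        return "schema_registry"
    # No descriptor and no matching schema: fall back to JSON.
    return "json"
```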

Now you can associate your descriptor with any relevant entity that needs it for serialization/deserialization. To do this in the CLI, see UPDATE ENTITY. For the UI, follow these steps:

In the lefthand navigation, click Resources to display the Resources page.
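As a sketch of the CLI route, an UPDATE ENTITY statement along these lines assigns a descriptor to an entity. The statement below is illustrative only: the IN STORE clause and the value.descriptor property name are assumptions, so consult the UPDATE ENTITY reference for the exact parameters.

```sql
-- Illustrative sketch; the clause and property names are assumptions.
UPDATE ENTITY "pageviews_pb" IN STORE "KafkaStore"
  WITH ('value.descriptor' = '"pageviews_descriptor"."Pageviews"');
```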
