# Working with ProtoBuf Serialized Data and DeltaStream Descriptors

In streaming data stores such as [Apache Kafka](https://kafka.apache.org/) and [Amazon Kinesis](https://aws.amazon.com/kinesis/), producers send data events as bytes that consumers of the data must interpret. The most popular formats for data serialization include [JSON](https://www.json.org/json-en.html), [ProtoBuf](https://protobuf.dev/), and [Apache Avro](https://avro.apache.org/docs/), and DeltaStream supports all of these. This article focuses on [ProtoBuf](https://protobuf.dev/) and how to create and use a [Descriptor](/reference/sql-syntax/data-format-serialization.md#protocol-buffers-and-descriptors) for data serialization/deserialization.

Begin with a [Data Store](/overview/core-concepts/store.md) whose entities contain data records serialized with ProtoBuf. This means you also have the ProtoBuf messages and file descriptors used to serialize and deserialize these data events. In DeltaStream, you can create a DeltaStream [Descriptor](/reference/sql-syntax/data-format-serialization.md#protocol-buffers-and-descriptors) -- which is a wrapper around your ProtoBuf file descriptor -- and associate it with any [entity](/overview/core-concepts/store.md#entity) that requires the ProtoBuf file descriptor for serialization/deserialization.

## Create a descriptor

When working with ProtoBuf, you first define a ProtoBuf message and then generate a ProtoBuf file descriptor from that message. DeltaStream then uses this ProtoBuf file descriptor to generate any code necessary for serializing and deserializing data that conforms to the ProtoBuf message structure.

In this example, the ProtoBuf message, which lives in the file `p.proto`, resembles the following:

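```protobuf
// proto3 is assumed here; without a syntax line, protoc treats the file
// as proto2 and rejects fields that have no label.
syntax = "proto3";

message Pageviews {
  int64 viewtime = 1;
  string userid = 2;
  string pageid = 3;
}
```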
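
To see why a descriptor is needed at all, it can help to look at the bytes a producer would write. The following Python sketch hand-encodes one `Pageviews` record (the message above) using the ProtoBuf wire format. It is illustrative only: real producers use generated ProtoBuf classes, the helper names `encode_varint` and `encode_pageviews` and the sample values are made up for this example, and the sketch handles only non-negative `viewtime` values.

```python
def encode_varint(value):
    """Encode a non-negative integer as a ProtoBuf base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_pageviews(viewtime, userid, pageid):
    """Hand-encode a Pageviews message (fields 1, 2, 3) in wire format."""
    out = bytearray()
    # Field 1 (viewtime), wire type 0 (varint): tag = (1 << 3) | 0
    out += encode_varint((1 << 3) | 0) + encode_varint(viewtime)
    # Fields 2 and 3 (strings), wire type 2 (length-delimited):
    # tag, then byte length, then the UTF-8 bytes
    for field_number, text in ((2, userid), (3, pageid)):
        data = text.encode("utf-8")
        out += encode_varint((field_number << 3) | 2)
        out += encode_varint(len(data)) + data
    return bytes(out)


# Hypothetical sample record
record = encode_pageviews(viewtime=150, userid="User_1", pageid="Page_2")
print(record.hex(" "))
# 08 96 01 12 06 55 73 65 72 5f 31 1a 06 50 61 67 65 5f 32
```

The output contains field numbers, wire types, and values, but no field names such as `viewtime` or `userid`. That mapping lives only in the message definition, which is why a consumer such as DeltaStream needs the file descriptor to interpret the data.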

You can generate a ProtoBuf file descriptor, written to the file `pageviews_value.desc`, from the message defined in `p.proto` (see the [ProtoBuf documentation](https://protobuf.dev/programming-guides/techniques/#self-description) for more details):

```sh
$ protoc --descriptor_set_out pageviews_value.desc p.proto
```

Now create a DeltaStream [Descriptor](/reference/sql-syntax/data-format-serialization.md#protocol-buffers-and-descriptors) from this ProtoBuf file descriptor. In the CLI, you can do this with the [CREATE DESCRIPTOR\_SOURCE](/reference/sql-syntax/ddl/create-descriptor_source.md) DDL statement. In the UI, follow these steps to add a descriptor:

1. In the left-hand navigation, click **Resources** ( ![](/files/Zwq1BBdRyaRsv55N3KNm) ). When the **Resources** page displays, click **Descriptor Sources** and then click **+ Add Descriptor Source**.<br>

   <figure><img src="/files/5adLpbCQfIg1Z7JKGmKi" alt="" width="563"><figcaption></figcaption></figure>
2. Choose the file containing your ProtoBuf file descriptor (`pageviews-descriptor` in this example). When prompted, name your descriptor, and then click **UPLOAD**.<br>

   <figure><img src="/files/xFFQcdfEPs8pp9NGb4sB" alt="" width="563"><figcaption></figcaption></figure>
3. Now you can click on the descriptor to view the message names it contains (in this example, `Pageviews`).

<figure><img src="/files/46i17neqACEEc2WQZS5A" alt="" width="563"><figcaption></figcaption></figure>
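
If you want to check which messages a descriptor file contains before uploading it, the same information can be read outside the UI. The sketch below assumes the file is a standard `FileDescriptorSet` as written by `protoc --descriptor_set_out` and walks the wire format with the Python standard library only. The helper names (`read_varint`, `iter_fields`, `message_names`) are hypothetical, and the sketch lists top-level messages only; in practice the official `protobuf` package is the more robust choice.

```python
def read_varint(buf, pos):
    """Decode a base-128 varint starting at pos; return (value, next_pos)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def iter_fields(buf):
    """Yield (field_number, payload) for each length-delimited field in buf."""
    pos = 0
    while pos < len(buf):
        tag, pos = read_varint(buf, pos)
        field_number, wire_type = tag >> 3, tag & 0x07
        if wire_type == 0:    # varint: decode and skip
            _, pos = read_varint(buf, pos)
        elif wire_type == 1:  # 64-bit fixed: skip
            pos += 8
        elif wire_type == 5:  # 32-bit fixed: skip
            pos += 4
        elif wire_type == 2:  # length-delimited: strings and nested messages
            length, pos = read_varint(buf, pos)
            yield field_number, buf[pos:pos + length]
            pos += length
        else:
            raise ValueError(f"unsupported wire type {wire_type}")


def message_names(descriptor_set):
    """Return the top-level message names found in a FileDescriptorSet."""
    names = []
    for set_field, file_proto in iter_fields(descriptor_set):
        if set_field != 1:          # FileDescriptorSet.file
            continue
        for file_field, payload in iter_fields(file_proto):
            if file_field != 4:     # FileDescriptorProto.message_type
                continue
            for msg_field, value in iter_fields(payload):
                if msg_field == 1:  # DescriptorProto.name
                    names.append(value.decode("utf-8"))
    return names


# Tiny hand-built FileDescriptorSet standing in for a real file:
# one file named "p.proto" that declares one message, "Pageviews".
message = b"\x0a\x09Pageviews"
file_proto = b"\x0a\x07p.proto" + b"\x22" + bytes([len(message)]) + message
sample = b"\x0a" + bytes([len(file_proto)]) + file_proto
print(message_names(sample))  # ['Pageviews']
```

To run this against a real file, read it in binary mode and pass the bytes to the function, for example `message_names(open("pageviews_value.desc", "rb").read())`.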

## Update an Entity with the Descriptor

Now you can associate your descriptor with any relevant [entity](/overview/core-concepts/store.md#entity) that needs it for serialization/deserialization. To do this in the CLI, see [UPDATE ENTITY](/reference/sql-syntax/ddl/update-entity.md). For the UI, follow these steps:

1. In the left-hand navigation, click **Resources** ( ![](/files/Zwq1BBdRyaRsv55N3KNm) ) to display the **Resources** page.
2. Click the data store you want, and when the data store page displays, click the topic you want. In the example below, we selected the `KafkaStore` containing the ProtoBuf entity `pageviews_key_descriptor`:

   <figure><img src="/files/faJMx5HuzM5f6xuIPRQT" alt="" width="563"><figcaption></figcaption></figure>
3. Because this is a Kafka data store, which supports keys, you can assign a **Key Descriptor**, a **Value Descriptor**, or both. For data stores that don't support keys, including Kinesis data stores, you can add only a value descriptor.<br>

   <div align="center"><figure><img src="/files/oeQvilmQZsvaUyVoZ43f" alt="" width="563"><figcaption></figcaption></figure></div>
4. Click **+ Add Descriptors**, and from the menus that display click the relevant descriptors to assign to this entity. In this example the `Pageviews` value descriptor is assigned, and the `Key Descriptor` is empty.<br>

   <figure><img src="/files/T3LAfCFJfLoPfyBUrnnF" alt="" width="563"><figcaption></figcaption></figure>
5. That's it. You've assigned your descriptor to the relevant entity; now you can use this entity to run commands (such as [PRINT ENTITY](/reference/sql-syntax/command/print-entity.md)) and queries with DeltaStream objects.

For more information, see [Serializing with Protobuf](/reference/sql-syntax/data-format-serialization/serializing-with-protobuf.md).

## Queries with Descriptors and ProtoBuf

With descriptors added, you can now create a DeltaStream object that specifies a `key.format` or `value.format` of `PROTOBUF`, as shown in the DDL example below. See [CREATE STREAM](/reference/sql-syntax/ddl/create-stream.md) for more details.

```sql
CREATE STREAM "pageviewsPB" (viewtime BIGINT, userid VARCHAR, pageid VARCHAR)
    WITH ('topic'='pageviews_pb', 'value.format'='PROTOBUF');
```

You can also create new objects using [CREATE STREAM AS SELECT](/reference/sql-syntax/query/create-stream-as.md) or [CREATE CHANGELOG AS SELECT](/reference/sql-syntax/query/create-changelog-as.md), specifying `PROTOBUF` as the data format for the sink object. The example below converts the JSON stream `pageviews_json` to a stream called `pageviews_converted_to_proto` with a ProtoBuf key and value format.

```sql
CREATE STREAM pageviews_converted_to_proto WITH (
  'value.format' = 'PROTOBUF', 'key.format' = 'PROTOBUF'
) AS
SELECT * FROM pageviews_json;
```

When the sink object has a key or value format of `PROTOBUF`, DeltaStream automatically creates the descriptor for the sink object and assigns it to the entity. You can view your descriptors in the **Descriptors** tab or with the [LIST DESCRIPTORS](/reference/sql-syntax/command/list-descriptors.md) command in the CLI. To use the descriptor outside of DeltaStream, you can download the ProtoBuf descriptor via the [COPY DESCRIPTOR\_SOURCE](/reference/sql-syntax/command/copy-descriptor_source.md) command.

Finally, the `PRINT ENTITY` command chooses how to deserialize data as follows:

* If an entity in a data store has a descriptor, the descriptor is used for deserialization, even if the data store has a schema registry.
* If the entity does not have a descriptor, DeltaStream checks whether the data store's schema registry contains a schema for the entity and, if so, uses that schema for deserialization.
* If the entity doesn't have a descriptor and the data store either has no schema registry or has one with no corresponding schema, DeltaStream attempts to deserialize the data in the entity as JSON.
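
The precedence in the list above can be summarized as a short fallback chain. The following Python sketch is only a model of the documented behavior, not DeltaStream code; the function name `choose_deserializer`, its parameters, and the returned labels are hypothetical.

```python
from typing import Optional


def choose_deserializer(descriptor: Optional[str],
                        registry_schema: Optional[str]) -> str:
    """Model the documented PRINT ENTITY precedence.

    descriptor:      name of the descriptor assigned to the entity, or None.
    registry_schema: schema found for the entity in the data store's schema
                     registry, or None (also None if there is no registry).
    """
    if descriptor is not None:
        return "descriptor"       # wins even when a schema registry exists
    if registry_schema is not None:
        return "schema_registry"  # no descriptor, but a matching schema
    return "json"                 # last resort: try to read the data as JSON


# Hypothetical entity with both a descriptor and a registry schema
print(choose_deserializer("Pageviews", "pageviews-value"))  # descriptor
```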

