Working with ProtoBuf Serialized Data and DeltaStream Descriptors
In streaming stores such as Apache Kafka and Amazon Kinesis, producers send data events as bytes that consumers of the data must interpret. The most popular formats for data serialization include JSON, ProtoBuf, and Apache Avro, all of which DeltaStream supports. In this tutorial, we focus on ProtoBuf and on how to create and use a Descriptor for data serialization/deserialization.
Suppose you have a Store with Entities whose data records are serialized with ProtoBuf. This means you also have ProtoBuf messages and file descriptors to serialize and deserialize these data events. In DeltaStream, you can create a DeltaStream Descriptor, which is a wrapper around your ProtoBuf file descriptor, and associate it with any Entities that require the ProtoBuf file descriptor for serialization/deserialization.
Create a Descriptor
When working with ProtoBuf, you first define a ProtoBuf message and then generate a ProtoBuf file descriptor from that message. DeltaStream then uses this ProtoBuf file descriptor to generate any code needed to serialize and deserialize data that conforms to the ProtoBuf message’s structure. Let’s assume for this tutorial that our ProtoBuf message lives in the file p.proto and defines a message named Pageviews.
From the ProtoBuf message in p.proto, we can generate a ProtoBuf file descriptor in the file pageviews_value.desc (see the ProtoBuf documentation for more details).
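As a rough sketch of this step, the Python snippet below writes a hypothetical p.proto and builds the protoc invocation that emits a descriptor set. Only the message name Pageviews and the two file names come from this tutorial. The field names and types are assumptions made for illustration, and the helper names write_proto and protoc_command are hypothetical. The --descriptor_set_out flag is a standard protoc option.

```python
import shlex
from pathlib import Path

# Hypothetical message definition: only the message name `Pageviews` comes from
# this tutorial; the fields and their types are assumptions for illustration.
PROTO_SOURCE = '''syntax = "proto3";

message Pageviews {
  int64 viewtime = 1;
  string userid = 2;
  string pageid = 3;
}
'''


def write_proto(directory: Path, filename: str = "p.proto") -> Path:
    """Write the example message definition into `directory` and return its path."""
    path = directory / filename
    path.write_text(PROTO_SOURCE, encoding="utf-8")
    return path


def protoc_command(proto_file: str = "p.proto",
                   descriptor_file: str = "pageviews_value.desc") -> list[str]:
    """Build the protoc argument list that writes a file descriptor set."""
    return ["protoc", f"--descriptor_set_out={descriptor_file}", proto_file]


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch works without protoc.
    print(shlex.join(protoc_command()))
```

Running the printed command from the directory that contains p.proto should produce pageviews_value.desc, provided protoc is installed.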
Now we can create a DeltaStream Descriptor from this ProtoBuf file descriptor. In the CLI, this can be done using the CREATE DESCRIPTOR_SOURCE DDL. In the UI, we can add a descriptor with the following steps:
From the left menu, click Descriptors > UPLOAD.
After choosing the file containing your ProtoBuf file descriptor (pageviews_value.desc in our example), you have the option to name your descriptor. Then press UPLOAD.
Done! Now that we have successfully added a Descriptor, you can click on it to see the message names it contains (e.g. Pageviews in this example).
Update an Entity with the Descriptor
Now that we have created a Descriptor, we can associate it with any relevant Entities that need this Descriptor for serialization/deserialization. To do this in the CLI, see UPDATE ENTITY. In the UI, we can add Descriptors to Entities with the following steps:
From the left menu, click Stores > Select the Store > Select the ProtoBuf Topic. In our example, we select the KafkaStore containing the ProtoBuf Entity pageviews_pb.
Navigate to the DESCRIPTOR tab. Note that since this is a Kafka Store, which allows for keys, this page allows the user to assign a Key Descriptor and/or a Value Descriptor. For a Kinesis Store and other Stores that don’t allow for keys, only a Value Descriptor option is available.
Select the dropdown menu, and choose the relevant Descriptors to assign to this Entity. In our example, we assign the Pageviews Descriptor as the Value Descriptor and leave the Key Descriptor empty.
Done! We’ve assigned our Descriptor to the relevant Entity, and now we can successfully run commands, such as PRINT ENTITY, and run queries with Relations using this Entity.
Queries with Descriptors and ProtoBuf
With Descriptors added to our Entity, we can now create a Relation for our Entity by specifying a key.format or value.format of PROTOBUF in the DDL. See CREATE STREAM for more details.
We can also create new Relations using CREATE STREAM AS SELECT or CREATE CHANGELOG AS SELECT, specifying PROTOBUF as the data format for the sink Relation. For example, the JSON Stream pageviews_json can be converted this way to a Stream called pageviews_converted_to_proto with a ProtoBuf key and value format.
When the sink Relation has a key or value format of PROTOBUF, the Descriptor for the sink Relation will automatically be created and assigned to the Entity. You can view your Descriptors in the left menu’s Descriptors tab or use the LIST DESCRIPTORS command in the CLI. Then if you want to use the Descriptor outside of DeltaStream, you can download the ProtoBuf descriptor by using the COPY DESCRIPTOR_SOURCE command.
One last quick note. In the case of the PRINT ENTITY command, if an Entity in a Store has a Descriptor, the Descriptor will be used for deserialization even if the Store has a Schema Registry. If the Entity does not have a Descriptor, the Store will check if the Schema Registry contains a Schema for the Entity and use it for deserialization. Finally, if the Entity doesn’t have a Descriptor and the Store either has no Schema Registry or has one with no corresponding schema, DeltaStream will attempt to deserialize the data in the Entity as JSON.
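The precedence described above can be summarized in a few lines of Python. This is only an illustrative model of the stated order, not DeltaStream code: the function name pick_deserializer, its boolean parameters, and the returned labels are all hypothetical.

```python
def pick_deserializer(has_descriptor: bool,
                      has_schema_registry: bool,
                      registry_has_schema: bool) -> str:
    """Model the deserialization order described for PRINT ENTITY.

    All names and return labels here are hypothetical; only the precedence
    (Descriptor, then Schema Registry, then JSON) comes from the text.
    """
    if has_descriptor:
        # A Descriptor wins even when the Store has a Schema Registry.
        return "descriptor"
    if has_schema_registry and registry_has_schema:
        # No Descriptor, but the registry holds a schema for the Entity.
        return "schema_registry"
    # Neither source is available, so fall back to JSON.
    return "json"


if __name__ == "__main__":
    print(pick_deserializer(True, True, True))
    print(pick_deserializer(False, True, True))
    print(pick_deserializer(False, True, False))
```

The third call covers the case where a Schema Registry exists but holds no schema for the Entity, which falls through to JSON just as a Store without a registry does.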