Iceberg REST Catalog
Apache Iceberg is a high-performance table format that supports large analytic tables. An Apache Iceberg REST catalog is a service for managing and accessing Iceberg tables in a consistent way. It allows clients to interact with Iceberg table metadata without requiring direct access to the underlying storage. This enables multiple clients to safely use the same Iceberg tables.
This document walks through setting up an Iceberg catalog in DeltaStream.
You do not need a compute pool if you are only writing to Iceberg, for example if you are streaming filtered Kafka data into Iceberg tables.
For the purposes of this tutorial we will use a REST catalog provided by Snowflake, but any compliant implementation will work.
1. Work with your internal engineering team to set up a Snowflake environment. You can start by going through the overview and completing the Snowflake environment setup instructions. At that point you will have the following values:
`client_id`
`client_secret`
`principal_role_name`
`catalog_name`
`open_catalog_account_identifier`
The S3 region in which your storage bucket is located
2. For this setup guide you must also have a stream named pageviews defined in DeltaStream, backed by a topic in an Apache Kafka data store.
To set up Iceberg REST
2. Click + Add Data Store. When the Choose a Data Store window opens, click Iceberg Rest. The Add Data Store window opens for Iceberg REST.
3. Enter the required authentication and connection values. These include:
Name. We suggest a self-describing name, such as iceberg_rest.
S3 Region. The region where your AWS S3 bucket resides.
Catalog ID.
URIs.
Scope.
Client ID.
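The values gathered during the Snowflake setup map onto these fields fairly mechanically. The sketch below shows one plausible mapping. It assumes the endpoint pattern that Snowflake documents for Open Catalog (`https://<open_catalog_account_identifier>.snowflakecomputing.com/polaris/api/catalog`) and its `PRINCIPAL_ROLE:<principal_role_name>` scope convention. The function name and dictionary keys are hypothetical and only mirror the field labels above. The "Client Secret" entry and the use of `catalog_name` as the Catalog ID are assumptions to confirm against your own environment.

```python
def build_data_store_fields(client_id, client_secret, principal_role_name,
                            catalog_name, open_catalog_account_identifier,
                            s3_region, name="iceberg_rest"):
    """Map the Snowflake setup values onto the data store form fields (a sketch)."""
    values = {
        "client_id": client_id,
        "client_secret": client_secret,
        "principal_role_name": principal_role_name,
        "catalog_name": catalog_name,
        "open_catalog_account_identifier": open_catalog_account_identifier,
        "s3_region": s3_region,
    }
    # Fail early if any prerequisite value is empty.
    missing = [key for key, value in values.items() if not value]
    if missing:
        raise ValueError(f"missing values: {', '.join(missing)}")

    return {
        "Name": name,
        "S3 Region": s3_region,
        # Assumption: the Catalog ID is the Open Catalog catalog name.
        "Catalog ID": catalog_name,
        # Assumption: Snowflake Open Catalog's documented REST endpoint pattern.
        "URIs": (
            f"https://{open_catalog_account_identifier}"
            ".snowflakecomputing.com/polaris/api/catalog"
        ),
        # Assumption: Open Catalog's scope convention for a principal role.
        "Scope": f"PRINCIPAL_ROLE:{principal_role_name}",
        "Client ID": client_id,
        # Assumption: the form also asks for the secret (not listed above).
        "Client Secret": client_secret,
    }


if __name__ == "__main__":
    # Placeholder values for illustration only.
    fields = build_data_store_fields(
        client_id="example-client-id",
        client_secret="example-client-secret",
        principal_role_name="example_role",
        catalog_name="example_catalog",
        open_catalog_account_identifier="exampleorg-exampleaccount",
        s3_region="us-east-1",
    )
    for label, value in fields.items():
        # Avoid echoing the secret to the terminal.
        shown = "********" if label == "Client Secret" else value
        print(f"{label}: {shown}")
```

Building the values in one place like this makes it easier to spot a mistyped account identifier or role name before entering them in the Add Data Store window.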
4. Inspect the data store to see the namespaces available within your REST catalog. To do this, navigate to Workspace and then examine the newly-created data store.
Tip: When you view entities under a REST catalog data store, DeltaStream displays namespaces and tables.
Create a namespace in opencatalog for the new table to live in. To do this, return to the workspace, which also verifies that you can use your REST catalog, and run `CREATE ENTITY mynamespace;`
This command creates a namespace called mynamespace under your REST catalog.
In the SQL pane of your workspace, write the CREATE TABLE AS SELECT (CTAS) query to ingest from pageviews and output to a new table named pageviews_iceberg_rest.
Click Run.
The above statement performs several functions:
Creates a DeltaStream relation called pageviews_iceberg_rest. This relation can be used by other queries.
Creates a table in the underlying REST catalog in the namespace called mynamespace.
Creates a long-running query that reads data from Kafka and sinks to an Iceberg table.
To see more details about the status of the query, click the query row:
To view the new table created by the above CTAS, navigate to opencatalog → mynamespace → pageviews_iceberg_rest.
To view a sample of the data in your Iceberg table, click Print.
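Outside DeltaStream, any Iceberg REST client addresses the same table by its namespace and table name. As a rough sketch, the Iceberg REST OpenAPI specification defines table paths of the form `/v1/{prefix}/namespaces/{namespace}/tables/{table}`, and the helper below builds such a path. The helper name is hypothetical, it handles single-level namespaces only, and using the catalog name as the `{prefix}` is an assumption (a server reports its actual prefix through its `/v1/config` endpoint).

```python
from urllib.parse import quote


def table_path(namespace, table, prefix=None):
    """Build the REST path for one table in a single-level namespace (a sketch)."""
    if not namespace or not table:
        raise ValueError("namespace and table must be non-empty")
    parts = ["v1"]
    if prefix:
        # The prefix is optional in the spec; percent-encode it to be safe.
        parts.append(quote(prefix, safe=""))
    parts += [
        "namespaces", quote(namespace, safe=""),
        "tables", quote(table, safe=""),
    ]
    return "/" + "/".join(parts)


if __name__ == "__main__":
    # "example_catalog" is a placeholder prefix, not a value from this guide.
    print(table_path("mynamespace", "pageviews_iceberg_rest",
                     prefix="example_catalog"))
```

A path like this is appended to the catalog URI, and the request must carry a valid access token, so it is useful mainly for checking that the namespace and table names you see in DeltaStream match what other clients will request.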
Now it’s time to query the data stored in Iceberg. To do this:
Define a compute_pool to be able to query the Iceberg table from above. Navigate to Resources > Compute Pools, and then click + Add Compute Pool.
If this is the first compute_pool in the organization, DeltaStream sets it as your default pool.
Navigate to your DeltaStream workspace and run the following command:
Click opencatalog. The store page opens, displaying a list of namespaces and tables.
1. Log in to DeltaStream. In the left-hand navigation, click Resources to display a list of data stores in your organization.
In the left-hand navigation, click Workspace.
Now view the existing queries, including the query from the step immediately prior. To do this, in the left-hand navigation click Queries.
In the left-hand navigation, click Resources. This displays a list of the existing data stores.