# Iceberg REST Catalog

Apache Iceberg is a high-performance table format that supports large analytic tables. An Apache Iceberg REST catalog is a service for managing and accessing Iceberg tables in a consistent way. It allows clients to interact with Iceberg table metadata without requiring direct access to the underlying storage. This enables multiple clients to safely use the same Iceberg tables.

This document walks through setting up an Iceberg catalog in DeltaStream.

{% hint style="info" %}
**Note** Iceberg is unique among DeltaStream data stores: if you plan to read from or query Iceberg data, you must also define an object called a **compute pool**. A compute pool is a set of dedicated resources for running batch queries.
{% endhint %}

You do not need a compute pool if you are only writing to Iceberg – for example, if you are streaming filtered Kafka data into Iceberg tables. [More information on compute pools](https://docs.deltastream.io/overview/core-concepts/compute-pools).

For the purposes of this tutorial we will use a REST catalog provided by Snowflake, but any compliant implementation will work.

### Before you Begin <a href="#id-88kjoidvndrz" id="id-88kjoidvndrz"></a>

1. Work with your internal engineering team to set up a Snowflake environment. You can start with the [Snowflake Open Catalog tutorial](https://other-docs.snowflake.com/opencatalog/tutorials/open-catalog-gs). Go through the overview and complete the Snowflake environment setup instructions. At that point you will have the following values:
   1. `client_id`
   2. `client_secret`
   3. `principal_role_name`
   4. `catalog_name`
   5. `open_catalog_account_identifier`
   6. The S3 region in which your storage bucket is located

2\. For this setup guide you must also have created a stream in DeltaStream named `pageviews`, backed by a topic in an Apache Kafka data store. [More details on creating a stream in DeltaStream.](https://docs.deltastream.io/reference/sql-syntax/ddl/create-stream)
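If you have not yet created the `pageviews` stream, a minimal definition looks something like the following. The column names and serialization format here are illustrative, based on the `pageviews` example commonly used in DeltaStream tutorials; adjust them to match your actual Kafka topic:

```sql
-- Hypothetical schema for the pageviews topic; adapt to your data.
CREATE STREAM pageviews (
  viewtime BIGINT,
  userid VARCHAR,
  pageid VARCHAR)
WITH (
  'topic' = 'pageviews',
  'value.format' = 'json');
```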

### Adding an Iceberg REST data store <a href="#sgnhovkv8zzj" id="sgnhovkv8zzj"></a>

**To set up Iceberg REST**

1\. Log onto DeltaStream. In the lefthand navigation, click **Resources (** ![](https://1288764042-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fdbd9e6ZJodkgF1H6AVay%2Fuploads%2F9gG1xxfNSfFRjO6aS5Ou%2F0.png?alt=media) ) to display a list of data stores in your organization.

<figure><img src="https://1288764042-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fdbd9e6ZJodkgF1H6AVay%2Fuploads%2FvU1DotsaOaJ0sQZpLHM3%2FStoreList3Stores.png?alt=media&#x26;token=bbf809f6-6e31-48fb-adda-0404ad6f3c91" alt="" width="563"><figcaption></figcaption></figure>

2\. Click **+ Add Data Store**. When the **Choose a Data Store** window opens, click **Iceberg Rest**. The **Add Data Store** window opens for Iceberg REST.

![](https://1288764042-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fdbd9e6ZJodkgF1H6AVay%2Fuploads%2FQ4rtAyv4Ki53YhiA8lPN%2FAddIcebergRESTStore.png?alt=media\&token=a9a6cc6d-04da-41cf-95d5-fa929c302836)

3\. Enter the required authentication and connection values. These include:

* **Name**. We suggest a self-describing name, such as `iceberg_rest`.
* **S3 Region**. The region where your AWS S3 bucket resides.
* **Catalog ID**. The name of your catalog — the `catalog_name` value from the Snowflake setup above.
* **URIs**. The REST endpoint of your catalog. For Snowflake Open Catalog this takes the form `https://<open_catalog_account_identifier>.snowflakecomputing.com/polaris/api/catalog`.
* **Scope**. The OAuth scope for the connection, such as `PRINCIPAL_ROLE:<principal_role_name>`.
* **Client ID**. The `client_id` value from the Snowflake setup above.
* **Client Secret**. The `client_secret` value from the Snowflake setup above.

{% hint style="info" %}
**Note** You can also use the DeltaStream CLI to create an Iceberg REST data store. To do this, run the following statement:

```sql
CREATE STORE opencatalog WITH (
  'type' = iceberg_rest,
  'uris' = 'https://<open_catalog_account_identifier>.snowflakecomputing.com/polaris/api/catalog',
  'iceberg.catalog.id' = '<catalog_name>',
  'iceberg.rest.client_id' = '<client_id>',
  'iceberg.rest.client_secret' = '<client_secret>',
  'iceberg.rest.scope' = 'PRINCIPAL_ROLE:<principal_role_name>',
  'iceberg.rest.s3.region' = '<my_s3_region>');
```

{% endhint %}

4\. Inspect the data store to see the namespaces available within your REST catalog. To do this, navigate to **Workspace** and then examine the newly-created data store.

{% hint style="success" %}
**Tip** When you view entities under a REST catalog data store, DeltaStream displays namespaces and tables, as shown below:
{% endhint %}

<figure><img src="https://1288764042-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fdbd9e6ZJodkgF1H6AVay%2Fuploads%2FYcn63yEBAWtAJY7A3E5m%2Fimage.png?alt=media&#x26;token=7709f1bb-d5dc-4b19-9367-f6c62f6e2f7f" alt="" width="563"><figcaption></figcaption></figure>

5. Create a namespace in opencatalog for your Iceberg tables to live in. To do this, return to the workspace and run `CREATE ENTITY mynamespace;`. This command creates a namespace called `mynamespace` under your REST catalog, and confirms that you can use the catalog.

<figure><img src="https://1288764042-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fdbd9e6ZJodkgF1H6AVay%2Fuploads%2FGDchDUGF4pQuDG5SsIIb%2F4.png?alt=media" alt="" width="563"><figcaption></figcaption></figure>

### Write a CTAS (CREATE TABLE AS SELECT) Query to Sink Data into Iceberg <a href="#w9qq5xy5zem9" id="w9qq5xy5zem9"></a>

1. In the lefthand navigation, click **Workspace** ( ![](https://1288764042-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fdbd9e6ZJodkgF1H6AVay%2Fuploads%2F5EudXKVDSWZLByUuGbyy%2F5.png?alt=media) ).
2. In the SQL pane of your workspace, write the CREATE TABLE AS SELECT (CTAS) query to ingest from **pageviews** and output to a new table titled `pageviews_iceberg_rest`.

```sql
CREATE TABLE pageviews_iceberg_rest WITH (
  'store' = 'opencatalog',
  'iceberg.rest.catalog.namespace.name' = 'mynamespace',
  'iceberg.rest.catalog.table.name' = 'pageviews_iceberg')
AS SELECT * FROM pageviews;
```

3. Click **Run**.

The above statement performs several functions:

* Creates a DeltaStream relation called `pageviews_iceberg_rest`. This relation can be used by other queries.
* Creates a table named `pageviews_iceberg` in the underlying REST catalog, in the namespace called `mynamespace`.
* Creates a long-running query that reads data from Kafka and sinks it to an Iceberg table.
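A CTAS query is not limited to `SELECT *`. As a hypothetical variation, you could project and filter the Kafka data on the way into Iceberg — the filtered-ingest case mentioned earlier. The column names and table names below are illustrative, following the `pageviews` example:

```sql
-- Sketch: sink only a projection of pageviews for a single page into Iceberg.
CREATE TABLE pageviews_home_iceberg WITH (
  'store' = 'opencatalog',
  'iceberg.rest.catalog.namespace.name' = 'mynamespace',
  'iceberg.rest.catalog.table.name' = 'pageviews_home')
AS SELECT userid, viewtime
   FROM pageviews
   WHERE pageid = 'Page_1';
```

Because no compute pool is involved on the write path, this filtered ingest runs as a regular streaming query.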

4. Now view the existing queries, including the query from the step immediately prior. To do this, in the left-hand navigation click **Queries** ( ![](https://1288764042-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fdbd9e6ZJodkgF1H6AVay%2Fuploads%2FoeV9owwjCxALYll5NfSJ%2F6.png?alt=media) ).

{% hint style="info" %}
**Note** It may take a few moments for the query to transition into the **Running** state. Keep refreshing your screen until the query transitions.
{% endhint %}

To see more details about the status of the query, click the query row:

<figure><img src="https://1288764042-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fdbd9e6ZJodkgF1H6AVay%2Fuploads%2FXdpiDwXtvm2OFqp3MV10%2F7.png?alt=media" alt="" width="563"><figcaption></figcaption></figure>

#### View the results <a href="#id-6ji5hutsgyrl" id="id-6ji5hutsgyrl"></a>

1. In the left-hand navigation, click **Resources (** ![](https://1288764042-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fdbd9e6ZJodkgF1H6AVay%2Fuploads%2F9gG1xxfNSfFRjO6aS5Ou%2F0.png?alt=media) ). This displays a list of the existing data stores.
2. To view the new table created by the above CTAS, navigate to **opencatalog** → **mynamespace** → **pageviews\_iceberg\_rest**.

To view a sample of the data in your Iceberg table, click **Print**.

![](https://1288764042-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fdbd9e6ZJodkgF1H6AVay%2Fuploads%2FAiSK1y44qNgfSSuJmBWO%2F9.png?alt=media)

### Process Streaming Data From Your Iceberg Data Store <a href="#s717op33n5qa" id="s717op33n5qa"></a>

Now it’s time to query the data stored in Iceberg. To do this:

1. Define a compute pool so you can query the Iceberg table from above. Navigate to **Resources > Compute Pools**, and then click **+ Add Compute Pool**.

![](https://1288764042-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fdbd9e6ZJodkgF1H6AVay%2Fuploads%2FkvzpIwldTjXa4iKrd7RD%2FAddComputePoolRevised.png?alt=media\&token=e828f92a-5b93-46be-ad0b-460e42527285)

If this is the first `compute_pool` in the organization, DeltaStream sets it as your default pool.

2. Navigate to your DeltaStream workspace and run the following command:

```sql
SELECT * FROM pageviews_iceberg_rest LIMIT 10;
```
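Once the compute pool is running, batch queries against the Iceberg table can go beyond a simple `LIMIT`. For example, a hypothetical aggregation over the same table (assuming the `userid` column from the `pageviews` example) might look like:

```sql
-- Sketch: count page views per user across the Iceberg table.
SELECT userid, COUNT(*) AS view_count
FROM pageviews_iceberg_rest
GROUP BY userid;
```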

### Inspect the Iceberg Data Store <a href="#id-4iwsm2n8ry1x" id="id-4iwsm2n8ry1x"></a>

1. In the lefthand navigation, click **Resources (** ![](https://1288764042-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fdbd9e6ZJodkgF1H6AVay%2Fuploads%2F9gG1xxfNSfFRjO6aS5Ou%2F0.png?alt=media) ). This displays a list of the existing data stores.

<figure><img src="https://1288764042-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fdbd9e6ZJodkgF1H6AVay%2Fuploads%2FvU1DotsaOaJ0sQZpLHM3%2FStoreList3Stores.png?alt=media&#x26;token=bbf809f6-6e31-48fb-adda-0404ad6f3c91" alt="" width="563"><figcaption></figcaption></figure>

2. Click **opencatalog**. The store page opens, displaying a list of namespaces and tables.

### Clean up resources <a href="#m3xy0rksqxn3" id="m3xy0rksqxn3"></a>

When you finish this tutorial, stop the compute pool and terminate the CTAS query, substituting your pool name and query ID:

```sql
STOP COMPUTE_POOL <pool_name>;
TERMINATE QUERY <query_id>;
```
