# Iceberg AWS Glue Catalog

### **Iceberg AWS Glue** <a href="#t9s7itc6qdp9" id="t9s7itc6qdp9"></a>

Apache Iceberg is a high-performance table format that supports large analytic tables.

This document walks through setting up Iceberg in DeltaStream using the AWS Glue catalog.

{% hint style="info" %}
**Note** Iceberg differs from other DeltaStream data stores in one respect: if you plan to read from or query Iceberg data, you must also define a **compute pool**. A compute pool is a set of dedicated resources for running batch queries.

You do not need a compute pool if you only write to Iceberg (for example, if you stream filtered Kafka data into Iceberg tables). [More information on compute pools.](https://docs.deltastream.io/overview/core-concepts/compute-pools)
{% endhint %}

### Before You Begin <a href="#id-88kjoidvndrz" id="id-88kjoidvndrz"></a>

Work with your internal engineering team to set up AWS Glue in your AWS account. You can start with the [AWS Glue documentation](https://docs.aws.amazon.com/glue/latest/dg/setting-up.html).

For this setup guide you must also have created a stream in DeltaStream named `pageviews`, backed by a topic in an Apache Kafka data store. For more information, see [details on creating a stream in DeltaStream](https://docs.deltastream.io/reference/sql-syntax/ddl/create-stream).
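If you have not yet created the `pageviews` stream, the statement below is a minimal sketch of one. It assumes a Kafka topic named `pageviews` that carries JSON records with `viewtime`, `userid`, and `pageid` fields, and a Kafka data store with the hypothetical name `kafka_store`; adjust the columns, topic, format, and store name to match your environment.

```sql
-- Sketch only: the column list, topic name, and store name are assumptions.
CREATE STREAM pageviews (
  viewtime BIGINT,
  userid VARCHAR,
  pageid VARCHAR
) WITH (
  'store' = 'kafka_store',   -- hypothetical name of your Kafka data store
  'topic' = 'pageviews',     -- Kafka topic that backs the stream
  'value.format' = 'json'    -- records are assumed to be JSON
);
```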

### Adding an Iceberg AWS Glue data store <a href="#sgnhovkv8zzj" id="sgnhovkv8zzj"></a>

**To set up Iceberg AWS Glue**

1\. Log in to DeltaStream. In the left-hand navigation, click **Resources** ( ![](https://1288764042-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fdbd9e6ZJodkgF1H6AVay%2Fuploads%2F9gG1xxfNSfFRjO6aS5Ou%2F0.png?alt=media) ) and, when the list of data stores displays, click **+ Add Data Store**.

<figure><img src="https://1288764042-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fdbd9e6ZJodkgF1H6AVay%2Fuploads%2F0H5J0rQh2m0QFQa43vJv%2F1.png?alt=media" alt="" width="563"><figcaption></figcaption></figure>

2. When the **Choose a Data Store** window opens, click **Iceberg AWS Glue**.
3. Click **Next**. The **Add Data Store** window opens.

![](https://1288764042-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fdbd9e6ZJodkgF1H6AVay%2Fuploads%2FYAwxlkobAFWTO7vLHL5g%2FAddIcebergGlueDataStore.png?alt=media\&token=43244c95-2008-423e-b3ef-539b5b277704)

4. Enter the requested authentication and connection values.

With the data store set up and the Kafka stream created, you can perform a simple filter on the `pageviews` stream and then sink the results into Iceberg.

### Write a CTAS (CREATE TABLE AS SELECT) Query to Sink Data into Iceberg <a href="#w9qq5xy5zem9" id="w9qq5xy5zem9"></a>

In this section you read data from Kafka and write it to an Iceberg table cataloged in AWS Glue. The `store` property in the query's `WITH` clause ensures that DeltaStream writes to the correct data store.

1. In the left-hand navigation, click **Workspace** ( ![](https://1288764042-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fdbd9e6ZJodkgF1H6AVay%2Fuploads%2F5EudXKVDSWZLByUuGbyy%2F5.png?alt=media) ).
2. In the SQL pane of your workspace, write a CREATE TABLE AS SELECT (CTAS) query that reads from **pageviews** and writes to a new table named **pageviews\_iceberg\_glue**.

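```sql
CREATE TABLE pageviews_iceberg_glue WITH (
  'store' = 'iceberg_glue_store',                     -- the Iceberg AWS Glue data store added above
  'iceberg.aws.glue.db.name' = 'gradient',            -- Glue database that holds the sink table
  'iceberg.aws.glue.table.name' = 'pageviews_iceberg' -- Glue table name (optional)
) AS SELECT * FROM pageviews;
```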

{% hint style="info" %}
**Notes**

* `iceberg.aws.glue.db.name` is required. It names the AWS Glue database in which DeltaStream creates the sink table.
* `iceberg.aws.glue.table.name` is optional. If you omit it, DeltaStream uses the object name in the `CREATE TABLE` clause (in this case, `pageviews_iceberg_glue`) as the table name.
{% endhint %}

3. Click **Run**.
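The query above copies every record from `pageviews`. A filtered variant might look like the sketch below. It assumes `pageviews` has a `userid` column; the table name `pageviews_user2_glue` and the filter value `'User_2'` are hypothetical. Because it omits `iceberg.aws.glue.table.name`, DeltaStream should name the Glue table after the object in the `CREATE TABLE` clause, as described in the notes above.

```sql
-- Sketch: keep only page views from one user (column and value are assumptions).
-- No 'iceberg.aws.glue.table.name', so the Glue table takes the name pageviews_user2_glue.
CREATE TABLE pageviews_user2_glue WITH (
  'store' = 'iceberg_glue_store',
  'iceberg.aws.glue.db.name' = 'gradient'
) AS SELECT *
FROM pageviews
WHERE userid = 'User_2';
```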

To view existing queries, including the one you just ran, click **Queries** ( ![](https://1288764042-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fdbd9e6ZJodkgF1H6AVay%2Fuploads%2FoeV9owwjCxALYll5NfSJ%2F6.png?alt=media) ) in the left-hand navigation.

{% hint style="info" %}
**Note** It may take a few moments for the query to transition to the **Running** state. Refresh your screen until it does.
{% endhint %}

To see more details about the status of the query, click the query row:

<figure><img src="https://1288764042-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fdbd9e6ZJodkgF1H6AVay%2Fuploads%2FXdpiDwXtvm2OFqp3MV10%2F7.png?alt=media" alt="" width="563"><figcaption></figcaption></figure>
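If you prefer SQL to the UI, you can inspect queries from the workspace as well. The sketch below assumes the `LIST QUERIES` and `DESCRIBE QUERY` commands are available in your DeltaStream version; `<QUERY-ID>` is a placeholder for the ID of the CTAS query, which you also need later to terminate it.

```sql
-- List queries, including the CTAS query and its current state
LIST QUERIES;

-- Show details for a single query (replace the placeholder with a real ID)
DESCRIBE QUERY <QUERY-ID>;
```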

#### View the results <a href="#id-6ji5hutsgyrl" id="id-6ji5hutsgyrl"></a>

1. In the left-hand navigation, click **Resources** ( ![](https://1288764042-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fdbd9e6ZJodkgF1H6AVay%2Fuploads%2F9gG1xxfNSfFRjO6aS5Ou%2F0.png?alt=media) ). This displays a list of the existing data stores.
2. To view the new table created by the CTAS query above, open the `iceberg_glue_store` data store, click the `gradient` database, and select the `pageviews_iceberg` table.

To view a sample of the data in your Iceberg table, click **Print**.

![](https://1288764042-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fdbd9e6ZJodkgF1H6AVay%2Fuploads%2FAiSK1y44qNgfSSuJmBWO%2F9.png?alt=media)

### Process Streaming Data From Your Iceberg Data Store <a href="#s717op33n5qa" id="s717op33n5qa"></a>

Now it’s time to query the data stored in Iceberg.

1. Define a compute pool so you can query the Iceberg table you created above.

```sql
CREATE COMPUTE_POOL mypool
WITH ('compute_pool.size' = 'small', 'compute_pool.timeout_min' = 3600);
```

This statement creates and starts the compute pool. If it is the first compute pool in your organization, DeltaStream sets it as your default pool.

2. Run a batch query.

```sql
SELECT * FROM pageviews_iceberg_glue LIMIT 10;
```
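Batch queries against the Iceberg table are not limited to sampling rows. The sketch below counts page views per page; it assumes the table has a `pageid` column and that batch queries on your compute pool accept `GROUP BY` and `ORDER BY`.

```sql
-- Sketch: top 10 pages by view count (the pageid column is an assumption)
SELECT pageid, COUNT(*) AS views
FROM pageviews_iceberg_glue
GROUP BY pageid
ORDER BY views DESC
LIMIT 10;
```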

### Inspect the Iceberg Data Store <a href="#id-4iwsm2n8ry1x" id="id-4iwsm2n8ry1x"></a>

1. In the left-hand navigation, click **Resources** ( ![](https://1288764042-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fdbd9e6ZJodkgF1H6AVay%2Fuploads%2F9gG1xxfNSfFRjO6aS5Ou%2F0.png?alt=media) ). This displays a list of the existing data stores.

<figure><img src="https://1288764042-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fdbd9e6ZJodkgF1H6AVay%2Fuploads%2FvU1DotsaOaJ0sQZpLHM3%2FStoreList3Stores.png?alt=media&#x26;token=bbf809f6-6e31-48fb-adda-0404ad6f3c91" alt="" width="563"><figcaption></figcaption></figure>

2. Click the Iceberg Glue data store. The store page opens, displaying a list of any existing databases in your account.
3. (Optional) Create a new database. To do this, click **+ Add Database**. When prompted, enter a name for the new database and click **Add**. The new database displays in the list.
4. To view the tables that exist under a particular database, click the database name.
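The same browsing can be sketched in SQL. This assumes DeltaStream's `LIST ENTITIES` command applies to Iceberg AWS Glue stores, with Glue databases as top-level entities and tables nested under them; the store and database names come from the CTAS example above.

```sql
-- List the Glue databases (top-level entities) in the store
LIST ENTITIES IN STORE iceberg_glue_store;

-- List the tables inside the gradient database
LIST ENTITIES IN gradient IN STORE iceberg_glue_store;
```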

### Clean up resources <a href="#m3xy0rksqxn3" id="m3xy0rksqxn3"></a>

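When you are finished, stop the compute pool and terminate the CTAS query:

```sql
-- Stop the compute pool used for batch queries
STOP COMPUTE_POOL mypool;
-- Terminate the CTAS query; replace <QUERY-ID> with the ID of that query
TERMINATE QUERY <QUERY-ID>;
```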
