# Iceberg AWS Glue Catalog

### **Iceberg AWS Glue** <a href="#t9s7itc6qdp9" id="t9s7itc6qdp9"></a>

Apache Iceberg is a high-performance table format that supports large analytic tables.

This document walks through setting up Iceberg in DeltaStream using the AWS Glue catalog.

{% hint style="info" %}
**Note** Iceberg is unique in DeltaStream: if you plan to read from or query Iceberg data, you must also define an object called a **compute pool**. A compute pool is a set of dedicated resources for running batch queries.

You do not need a compute pool if you are only writing to Iceberg – if, for example, you’re streaming filtered Kafka data into Iceberg tables. [More information on compute pools.](/overview/core-concepts/compute-pools.md)
{% endhint %}

### Before You Begin <a href="#id-88kjoidvndrz" id="id-88kjoidvndrz"></a>

Work with your internal engineering team to set up an AWS Glue account. You can start with the [AWS Glue documentation](https://docs.aws.amazon.com/glue/latest/dg/setting-up.html).

For this setup guide, you must also have a stream defined in DeltaStream named `pageviews`, backed by a topic in an Apache Kafka data store. See [details on creating a stream in DeltaStream](/reference/sql-syntax/ddl/create-stream.md).

### Adding an Iceberg AWS Glue Data Store <a href="#sgnhovkv8zzj" id="sgnhovkv8zzj"></a>

**To set up Iceberg AWS Glue**

1\. Log on to DeltaStream. In the left-hand navigation, click **Resources** ( ![](/files/Zwq1BBdRyaRsv55N3KNm) ) and, when the list of data stores displays, click **+ Add Data Store**.

<figure><img src="/files/AcSaIz3ewENIxOgQnpFf" alt="" width="563"><figcaption></figcaption></figure>

2. When the **Choose a Data Store** window opens, click **Iceberg AWS Glue**.
3. Click **Next**. The **Add Data Store** window opens.

![](/files/vYhFRQbZdfmxb4NVMDhv)

4. Enter the requested authentication and connection values.

With the data store set up and the Kafka stream created, you can run a simple query on the `pageviews` stream and sink the results into Iceberg.

### Write a CTAS (CREATE TABLE AS SELECT) Query to Sink Data into Iceberg <a href="#w9qq5xy5zem9" id="w9qq5xy5zem9"></a>

This query reads data from Kafka and writes it to an Iceberg table cataloged in AWS Glue. The `'store'` property in the query ensures the table is created in the correct data store.

1. In the left-hand navigation, click **Workspace** ( ![](/files/ZXcAkgugP7AuG9QFRXKO) ).
2. In the SQL pane of your workspace, write the CREATE TABLE AS SELECT (CTAS) query to ingest from **pageviews** and output to a new table named **pageviews\_iceberg\_glue**.

```sql
CREATE TABLE pageviews_iceberg_glue WITH (
  'store' = 'iceberg_glue_store',
  'iceberg.aws.glue.db.name' = 'gradient',
  'iceberg.aws.glue.table.name' = 'pageviews_iceberg'
) AS SELECT * FROM pageviews;
```

{% hint style="info" %}
**Notes**

* `iceberg.aws.glue.db.name` is required. It names the AWS Glue database in which DeltaStream creates the sink table.
* `iceberg.aws.glue.table.name` is optional. If you do not specify a table name, DeltaStream uses the object name on the first line – in this case, pageviews\_iceberg\_glue.
  {% endhint %}

3. Click **Run**.
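
The CTAS above copies every record from `pageviews`. As a minimal sketch of a filtered variant, the query below adds a `WHERE` clause and omits `iceberg.aws.glue.table.name`, so the Glue table takes the object name, as described in the notes. The column `pageid`, the value `'Page_1'`, and the object name `pageviews_filtered_glue` are assumptions for illustration only; substitute names from your own stream.

```sql
-- Hypothetical example: `pageid` and 'Page_1' are assumed, not taken from this guide.
-- Without 'iceberg.aws.glue.table.name', the sink table is named after the
-- object on the first line (pageviews_filtered_glue).
CREATE TABLE pageviews_filtered_glue WITH (
  'store' = 'iceberg_glue_store',
  'iceberg.aws.glue.db.name' = 'gradient'
) AS SELECT * FROM pageviews WHERE pageid = 'Page_1';
```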

View the existing queries, including the query from the step immediately prior. To do this, in the left-hand navigation click **Queries** ( ![](/files/HOEvY09XthGMf2h6wEx6) ).

{% hint style="info" %}
**Note** It may take a few moments for the query to transition into the **Running** state. Keep refreshing your screen until the query transitions.
{% endhint %}

To see more details about the status of the query, click the query row:

<figure><img src="/files/QqGAd0MPcTA6c5mj3Eae" alt="" width="563"><figcaption></figcaption></figure>

#### View the results <a href="#id-6ji5hutsgyrl" id="id-6ji5hutsgyrl"></a>

1. In the left-hand navigation, click **Resources** ( ![](/files/Zwq1BBdRyaRsv55N3KNm) ). This displays a list of the existing data stores.
2. To view the new table created by the above CTAS, navigate to the `pageviews_iceberg` table.

To view a sample of the data in your Iceberg table, click **Print**.

![](/files/bMBqVB6VocuuNOuhclpp)

### Process Streaming Data From Your Iceberg Data Store <a href="#s717op33n5qa" id="s717op33n5qa"></a>

Now it’s time to query the data stored in Iceberg.

1. Define a compute pool so that you can query the Iceberg table created above.

```sql
CREATE COMPUTE_POOL mypool
WITH ('compute_pool.size' = 'small', 'compute_pool.timeout_min' = 3600);
```

The above statement creates and starts the `compute_pool`. If this is the first `compute_pool` in the organization, DeltaStream sets it as your default pool.

2. Run a batch query.

```sql
SELECT * FROM pageviews_iceberg_glue LIMIT 10;
```
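
Beyond sampling rows, a batch query can also summarize the table. The following is a sketch only: it assumes the compute pool is running and that standard SQL aggregates are supported for batch queries in your environment. The alias `total_views` is made up for illustration.

```sql
-- Sketch: count the rows written to the Iceberg table so far.
-- `total_views` is an illustrative alias, not a name from this guide.
SELECT COUNT(*) AS total_views FROM pageviews_iceberg_glue;
```

Because the CTAS query keeps streaming records into the table, running this query again later would typically return a larger count.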

### Inspect the Iceberg Data Store <a href="#id-4iwsm2n8ry1x" id="id-4iwsm2n8ry1x"></a>

1. In the left-hand navigation, click **Resources** ( ![](/files/Zwq1BBdRyaRsv55N3KNm) ). This displays a list of the existing data stores.

<figure><img src="/files/CMPH5Xad2RgJWBtFQeCt" alt="" width="563"><figcaption></figcaption></figure>

2. Click the Iceberg Glue data store. The store page opens, displaying a list of any existing databases in your account.
3. (Optional) Create a new database. To do this:

* Click **+ Add Database.** When prompted, enter a name for the new database and click **Add**. The new database displays in the list.

4. To view the tables that exist under a particular database, click the database name.

### Clean up resources <a href="#m3xy0rksqxn3" id="m3xy0rksqxn3"></a>

<pre class="language-sql"><code class="lang-sql">STOP COMPUTE_POOL mypool;
TERMINATE QUERY &#x3C;QUERY-ID>;
</code></pre>
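
`TERMINATE QUERY` needs the ID of the running CTAS query, which appears on the **Queries** page. As an alternative sketch, assuming the `LIST QUERIES` command is available in your DeltaStream version, you can look up the ID from the SQL pane before terminating the query:

```sql
-- Assumption: LIST QUERIES is supported in your DeltaStream environment.
-- Copy the ID of the CTAS query from the output, then pass it to TERMINATE QUERY.
LIST QUERIES;
```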

