Iceberg AWS Glue Catalog
Apache Iceberg is a high-performance table format that supports large analytic tables.
This document walks through setting up Iceberg in DeltaStream using the AWS Glue catalog.
Work with your internal engineering team to set up an AWS Glue account. You can start with the A.
For this setup guide you must also have a stream named pageviews defined in DeltaStream, backed by a topic in an Apache Kafka data store. More .
To set up Iceberg AWS Glue
1. Log in to DeltaStream. In the lefthand navigation, click Resources and, when the list of data stores displays, click + Add Data Store.
2. When the Choose a Data Store window opens, click Iceberg AWS Glue.
3. Click Next. The Add Data Store window opens.
4. Enter the requested authentication and connection values.
To process streaming data and sink it to Iceberg
With the data store set up and the Kafka stream created, you can run a simple filter on the pageviews stream and sink the results into Iceberg.
In the SQL pane of your workspace, write a CREATE TABLE AS SELECT (CTAS) query that reads from pageviews and outputs to a new table named pageviews_iceberg. The query reads data from Kafka and writes it to AWS Glue, so it must specify the Iceberg AWS Glue data store as its destination.
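Below is a minimal sketch of such a CTAS query. Several names are illustrative assumptions that do not come from this guide: the data store name iceberg_glue_store, the Glue database testdb, the userid column on pageviews, and the iceberg.aws.glue.* property names. Check the exact property names against the DeltaStream CREATE TABLE AS SELECT reference before you use it. The table name pageviews_iceberg matches the table this guide refers to later.

```sql
-- Minimal sketch: filter the Kafka-backed pageviews stream and sink the
-- matching rows into a new Iceberg table in the AWS Glue data store.
-- ASSUMPTIONS (hypothetical): store name iceberg_glue_store, Glue database
-- testdb, a userid column on pageviews, and the property names below.
CREATE TABLE pageviews_iceberg WITH (
  'store' = 'iceberg_glue_store',                      -- write to the Iceberg AWS Glue data store
  'iceberg.aws.glue.db.name' = 'testdb',               -- assumed: Glue database that holds the table
  'iceberg.aws.glue.table.name' = 'pageviews_iceberg'  -- assumed: table name inside Glue
) AS
SELECT *
FROM pageviews               -- source stream backed by a Kafka topic
WHERE userid = 'User_2';     -- the simple filter; the column name is an assumption
```

The WITH clause tells DeltaStream where to write the results. The SELECT defines which rows are written.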
Click Run.
To see more details about the status of the query, click the query row.
To view the new table created by the CTAS query, navigate to the pageviews_iceberg table in the Iceberg AWS Glue data store.
To view a sample of the data in your Iceberg table, click Print.
To query Iceberg data with a batch query
Now you can query the data stored in Iceberg.
Define a compute_pool so that you can query the Iceberg table created above.
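Below is a minimal sketch of a compute_pool definition. The pool name mypool is hypothetical. The compute_pool.size and compute_pool.timeout_min parameters, and the values given for them, are assumptions for illustration. Confirm the supported parameters in the DeltaStream CREATE COMPUTE_POOL reference for your version.

```sql
-- Minimal sketch of a compute pool for batch queries.
-- ASSUMPTIONS (hypothetical): the pool name mypool, plus the parameter
-- names and values below; check them against the DeltaStream reference.
CREATE COMPUTE_POOL mypool WITH (
  'compute_pool.size' = 'small',     -- assumed size option
  'compute_pool.timeout_min' = 60    -- assumed idle timeout, in minutes
);
```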
Creating the compute_pool also starts it. If this is the first compute_pool in your organization, DeltaStream sets it as your default pool.
Run a batch query.
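Below is a minimal sketch of a batch query that returns a small sample of rows from the Iceberg table. It makes two assumptions. First, the table is named pageviews_iceberg and lives in your current DeltaStream database and schema. Second, the compute_pool defined in the previous step is your default pool, so the query does not name a pool explicitly.

```sql
-- Minimal sketch of a batch query over the Iceberg table created by the CTAS.
-- ASSUMPTIONS: pageviews_iceberg is in the current database and schema,
-- and the default compute_pool runs the query.
SELECT *
FROM pageviews_iceberg
LIMIT 10;  -- return only a small sample of rows
```

Unlike the CTAS query, which keeps running and writes new rows as they arrive, a batch query reads the table's current contents and then finishes.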
To browse the Iceberg AWS Glue data store
1. In the lefthand navigation, click Resources. This displays a list of the existing data stores.
2. Click the Iceberg Glue data store. The store page opens, displaying a list of any existing databases in your account.
3. (Optional) Create a new database. To do this, click + Add Database. When prompted, enter a name for the new database and click Add. The new database displays in the list.
4. To view the tables that exist under a particular database, click the database name.

To open the SQL pane used in the steps above, click Workspace in the lefthand navigation.
To view existing queries, including the CTAS query from the steps above, click Queries in the lefthand navigation.