Iceberg AWS Glue Catalog

Iceberg AWS Glue

Apache Iceberg is a high-performance table format that supports large analytic tables.

This document walks through setting up Iceberg in DeltaStream using the AWS Glue catalog.

Note: Iceberg is unique in DeltaStream in that, if you plan to read from or query Iceberg data, you must also define an object called a compute pool. A compute pool is a set of dedicated resources for running batch queries.

You do not need a compute pool if you are only writing to Iceberg – if, for example, you’re streaming filtered Kafka data into Iceberg tables.

Before You Begin

Work with your internal engineering team to set up an AWS Glue account. You can start with the AWS Glue documentation.

For this setup guide you must also have created a stream in DeltaStream named pageviews, backed by a topic in an Apache Kafka data store. For more details, see the documentation on creating a stream in DeltaStream.
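
If you have not yet created the pageviews stream, the statement below is a minimal sketch. It assumes your default data store is the Kafka store backing the topic, that the topic is named pageviews with JSON-encoded values, and that the records carry the usual demo columns viewtime, userid, and pageid; adjust the columns and properties to match your topic.

-- Minimal sketch of a pageviews stream over a Kafka topic (assumed schema)
CREATE STREAM pageviews (
  viewtime BIGINT,
  userid VARCHAR,
  pageid VARCHAR
) WITH (
  'topic' = 'pageviews',
  'value.format' = 'json'
);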

Adding an Iceberg AWS Glue Data Store

To set up Iceberg AWS Glue:

  1. Log in to DeltaStream. In the lefthand navigation, click Resources and, when the list of data stores displays, click + Add Data Store.

  2. When the Choose a Data Store window opens, click Iceberg AWS Glue.

  3. Click Next. The Add Data Store window opens.

  4. Enter the requested authentication and connection values. Once the store is saved, you can verify it from SQL, as shown below.
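
After you save the data store, you can confirm it from a workspace or the CLI. The statements below are a quick sanity check; they assume you named the store iceberg_glue_store, which is the name the CTAS query later in this guide uses.

-- Confirm the new data store is registered
LIST STORES;

-- Inspect the details of the Iceberg AWS Glue store (assumed name)
DESCRIBE STORE iceberg_glue_store;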

With the data store set up and the Kafka stream created, you can perform a simple filter on the pageviews stream and then sink the results into Iceberg.

Write a CTAS (CREATE TABLE AS SELECT) Query to Sink Data into Iceberg

Here we read data from Kafka and write to the Iceberg AWS Glue data store; the store property in the query's WITH clause ensures we write to the correct data store.

  1. In the SQL pane of your workspace, write the CREATE TABLE AS SELECT (CTAS) query to ingest from pageviews and output to a new table titled pageviews_iceberg_glue.

CREATE TABLE pageviews_iceberg_glue WITH (
  'store' = 'iceberg_glue_store',
  'iceberg.aws.glue.db.name' = 'gradient',
  'iceberg.aws.glue.table.name' = 'pageviews_iceberg'
) AS SELECT * FROM pageviews;

Notes

  • iceberg.aws.glue.db.name is required. It specifies the Glue database in which the sink table is created.

  • iceberg.aws.glue.table.name is optional. If you do not specify a table name, DeltaStream uses the object name on the first line – in this case, pageviews_iceberg_glue.

  2. Click Run.

Note: It may take a few moments for the query to transition into the Running state. Keep refreshing your screen until it does.

To see more details about the status of the query, click the query row.
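
If you prefer SQL, you can check the same status from a workspace or the CLI. This is a sketch; the exact output columns may differ, and the query ID comes from the LIST QUERIES output.

-- Show all queries and their current state (for example, Running)
LIST QUERIES;

-- Show details for one query; replace the placeholder with an ID
-- reported by LIST QUERIES
DESCRIBE QUERY <QUERY-ID>;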

View the results

  1. To view the new table created by the above CTAS, navigate to the pageviews_iceberg table.

To view a sample of the data in your Iceberg table, click Print.
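
A rough SQL equivalent is to print a few rows of the Glue entity directly. The statement below is only a sketch: it assumes PRINT ENTITY accepts the Glue database and table path together with an IN STORE clause, so check the PRINT ENTITY reference for the exact syntax.

-- Print a sample of rows from the Iceberg table written by the CTAS query
PRINT ENTITY gradient.pageviews_iceberg IN STORE iceberg_glue_store;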

Process Streaming Data From Your Iceberg Data Store

Now it’s time to query the data stored in Iceberg.

  1. Define a compute pool so you can query the Iceberg table from above.

CREATE COMPUTE_POOL mypool
WITH ('compute_pool.size' = 'small', 'compute_pool.timeout_min' = '3600');

The above statement creates and starts the compute pool. If this is the first compute pool in your organization, DeltaStream sets it as your default pool. More information is available in the Compute Pools documentation.
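
To confirm the pool exists before you issue batch queries, you can list the compute pools in your organization (the exact columns shown may vary):

-- List the compute pools in the organization
LIST COMPUTE_POOLS;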

  2. Run a batch query.

SELECT * FROM pageviews_iceberg_glue LIMIT 10;

Inspect the Iceberg Data Store

  1. Click the Iceberg Glue data store. The store page opens, displaying a list of any existing databases in your account.

  2. (Optional) Create a new database. To do this:

  • Click + Add Database. When prompted, enter a name for the new database and click Add. The new database displays in the list.

  3. To view the tables that exist under a particular database, click the database name. You can also browse the store from SQL, as sketched below.
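
The statement below is a sketch of browsing the store from SQL; it assumes LIST ENTITIES accepts an IN STORE clause for walking the Glue databases and tables, so check the LIST ENTITIES reference for the exact form.

-- List the Glue databases in the Iceberg store; appending a database path
-- would list the tables underneath it
LIST ENTITIES IN STORE iceberg_glue_store;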

Clean Up Resources

When you finish, stop the compute pool and terminate the CTAS query:

STOP COMPUTE_POOL mypool;
TERMINATE QUERY <QUERY-ID>;
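
If you want to remove everything this walkthrough created, the statements below sketch the remaining cleanup. They assume the object names used above and that you no longer need the relation, the pool, or the store.

-- Remove the relation created by the CTAS query
DROP RELATION pageviews_iceberg_glue;

-- Remove the compute pool once it has stopped
DROP COMPUTE_POOL mypool;

-- Optionally, remove the Iceberg AWS Glue data store itself
DROP STORE iceberg_glue_store;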

In the lefthand navigation, click Workspace.

To view the existing queries, including the query from the step immediately prior, in the lefthand navigation click Queries.

In the lefthand navigation, click Resources. This displays a list of the existing data stores.
