Iceberg REST Catalog


Apache Iceberg is a high-performance table format that supports large analytic tables. An Apache Iceberg REST catalog is a service for managing and accessing Iceberg tables in a consistent way. It allows clients to interact with Iceberg table metadata without requiring direct access to the underlying storage. This enables multiple clients to safely use the same Iceberg tables.

This document walks through setting up an Iceberg catalog in DeltaStream.

Note: Iceberg is unique in DeltaStream in that, if you plan to read from or query Iceberg data, you must also define an object called a compute pool. A compute pool is a set of dedicated resources for running batch queries.

You do not need a compute pool if you are only writing to Iceberg (for example, if you're streaming filtered Kafka data into Iceberg tables).

For the purposes of this tutorial we will use a REST catalog provided by Snowflake, but any compliant implementation will work.

Before you Begin

  1. Work with your internal engineering team to set up a Snowflake environment. You can start with the Snowflake Open Catalog tutorial. Go through the overview and complete the Snowflake environment setup instructions. At that point you will have the following values:

    1. `client_id`

    2. `client_secret`

    3. `principal_role_name`

    4. `catalog_name`

    5. `open_catalog_account_identifier`

    6. The S3 region in which your storage bucket is located

2. For this setup guide, you must also have a stream named pageviews defined in DeltaStream, backed by a topic in an Apache Kafka data store. A sketch of such a stream definition appears below.
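
The following is a minimal sketch of such a stream, assuming a Kafka data store named my_kafka_store (hypothetical) already exists and contains a pageviews topic whose JSON values include viewtime, userid, and pageid fields; adjust the names, columns, and format to match your environment:

CREATE STREAM pageviews (
  viewtime BIGINT,
  userid VARCHAR,
  pageid VARCHAR
) WITH (
  'store' = 'my_kafka_store',   -- hypothetical Kafka data store name
  'topic' = 'pageviews',        -- existing Kafka topic backing the stream
  'value.format' = 'json'       -- serialization format of the topic values
);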

Adding an Iceberg REST data store

To set up Iceberg REST:

1. Log in to DeltaStream. In the lefthand navigation, click Resources to display a list of data stores in your organization.

2. Click + Add Data Store. When the Choose a Data Store window opens, click Iceberg Rest. The Add Data Store window opens for Iceberg REST.

3. Enter the required authentication and connection values. These include:

  • Name. We suggest a self-describing name, such as iceberg_rest.

  • S3 Region. The region where your AWS S3 bucket resides.

  • Catalog ID. The catalog_name from your Snowflake Open Catalog setup.

  • URIs. The REST catalog endpoint; for Snowflake Open Catalog this is https://<open_catalog_account_identifier>.snowflakecomputing.com/polaris/api/catalog.

  • Scope. The access scope, in the form PRINCIPAL_ROLE:<principal_role_name>.

  • Client ID. The client_id from your Snowflake Open Catalog setup.

Note: You can also use the DeltaStream CLI to create an Iceberg REST data store. To do this, run the following statement:

CREATE STORE opencatalog WITH (
  'type' = iceberg_rest,
  'uris' = 'https://<open_catalog_account_identifier>.snowflakecomputing.com/polaris/api/catalog',
  'iceberg.catalog.id' = '<catalog_name>',
  'iceberg.rest.client_id' = '<client_id>',
  'iceberg.rest.client_secret' = '<client_secret>',
  'iceberg.rest.scope' = 'PRINCIPAL_ROLE:<principal_role_name>',
  'iceberg.rest.s3.region' = '<my_s3_region>'
);
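
Once the store is created, you can confirm DeltaStream sees it. A quick check using the store name from the example above (DESCRIBE STORE returns the store's type and connection details):

LIST STORES;
DESCRIBE STORE opencatalog;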

4. Inspect the data store to see the namespaces available within your REST catalog. To do this, navigate to Workspace and then examine the newly-created data store.

Tip: When you view entities under a REST catalog data store, DeltaStream displays both namespaces and tables.

5. Create a namespace in opencatalog for your new tables to live in. To do this, return to the workspace and run `CREATE ENTITY mynamespace;`. This command creates a namespace called mynamespace under your REST catalog and verifies that you can use the catalog.
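
To confirm the namespace exists from SQL, you can also list the entities in the catalog. A minimal sketch, assuming opencatalog is your session's current data store; see the LIST ENTITIES reference for how to scope the command to a different store:

LIST ENTITIES;   -- lists namespaces (and tables) in the current data store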

Write a CTAS (CREATE TABLE AS SELECT) Query to Sink Data into Iceberg

  1. In the SQL pane of your workspace, write the CREATE TABLE AS SELECT (CTAS) query to ingest from pageviews and output to a new table titled pageviews_iceberg_rest.

CREATE TABLE pageviews_iceberg_rest WITH (
  'store' = 'opencatalog',
  'iceberg.rest.catalog.namespace.name' = 'mynamespace',
  'iceberg.rest.catalog.table.name' = 'pageviews_iceberg'
) AS SELECT * FROM pageviews;
  2. Click Run.

The above statement performs several functions:

  • Creates a DeltaStream relation called pageviews_iceberg_rest. This relation can be used by other queries.

  • Creates a table in the underlying REST catalog in the namespace called mynamespace.

  • Creates a long-running query that reads data from Kafka and sinks the results to an Iceberg table.

Note: It may take a few moments for the query to transition into the Running state. Keep refreshing your screen until the query transitions.

To view your queries, including the one you just created, click Queries in the left-hand navigation. For more details about the status of a query, click the query row.
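
If you prefer SQL, the same information is available from query commands. A sketch; replace <query_id> with the ID shown for your CTAS query:

LIST QUERIES;                -- shows queries and their current state
DESCRIBE QUERY <query_id>;   -- shows details for a single query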

View the results

  1. To view the new table created by the above CTAS, navigate to opencatalog → mynamespace → pageviews_iceberg_rest.

To view a sample of the data in your Iceberg table, click Print.

Process Streaming Data From Your Iceberg Data Store

Now it’s time to query the data stored in Iceberg. To do this:

  1. Define a compute pool so you can query the Iceberg table from above. To do this, navigate to Resources > Compute Pools, and then click + Add Compute Pool.

If this is the first compute pool in your organization, DeltaStream sets it as your default pool.

  2. Navigate to your DeltaStream workspace and run the following command:

`SELECT * FROM pageviews_iceberg_rest LIMIT 10;`
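
Because the compute pool executes batch queries, you can also run analytical SQL directly against the Iceberg table. A small, hypothetical example (it assumes the pageviews records include a pageid column) that counts views per page:

SELECT pageid, COUNT(*) AS views
FROM pageviews_iceberg_rest
GROUP BY pageid;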

Inspect the Iceberg Data Store

  1. In the left-hand navigation, click Resources. This displays a list of the existing data stores.

  2. Click opencatalog. The store page opens, displaying a list of namespaces and tables.

Clean up resources

When you finish this tutorial, stop your compute pool and terminate the long-running CTAS query:

STOP COMPUTE_POOL <compute_pool_name>;
TERMINATE QUERY <query_id>;
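
If you also want to remove the tutorial objects themselves, you can drop them once the query has terminated. A sketch using the names from this tutorial; adjust them (and the hypothetical pool name) to match your environment:

DROP RELATION pageviews_iceberg_rest;   -- removes the DeltaStream relation
DROP COMPUTE_POOL mypool;               -- hypothetical compute pool name
DROP STORE opencatalog;                 -- removes the Iceberg REST data store definition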
