Compute Pools

Get real-time analytics from data stored in Iceberg tables.

A compute pool is a set of dedicated resources for running batch queries. Each compute pool is an Apache Spark cluster that you use to perform real-time analytics on data read from Iceberg tables; compute pools are similar to Databricks' all-purpose compute.

You create a compute pool much as you'd create any other DeltaStream object: define it at the organization level, and the same access control rules apply. Specific to compute pools, you must also select a pool size (S, M, or L). DeltaStream auto-configures and instantiates the pool based on your selection; you never manage or interact with Spark directly.
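For example, creating a pool is a single DDL statement. The sketch below is illustrative only: the pool name and the property key in the WITH clause are assumptions, not confirmed syntax; see the CREATE COMPUTE_POOL reference listed below for the exact parameters.

```sql
-- Hypothetical example: create a small (S) compute pool.
-- The property name 'compute_pool.size' is an assumption;
-- consult the CREATE COMPUTE_POOL reference for exact options.
CREATE COMPUTE_POOL analytics_pool
  WITH ('compute_pool.size' = 'S');
```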

From there, you can execute queries, joins, and so on, just as you would against any other data store in DeltaStream. (Joins are currently supported only between Iceberg tables.) Depending on your use case, this lets you run your analysis without an external query engine such as AWS Athena or Trino. And, as with DeltaStream data stores, you can create multiple compute pools per organization.
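Once a pool is running, reads against Iceberg data look like ordinary SQL. The sketch below assumes two Iceberg tables, pageviews and users, already registered in DeltaStream; the table and column names are illustrative.

```sql
-- Hypothetical Iceberg-to-Iceberg join executed on a compute pool.
-- Both sides of the join must be Iceberg tables.
SELECT u.user_id, u.region, COUNT(*) AS views
FROM pageviews p
JOIN users u ON p.user_id = u.user_id
GROUP BY u.user_id, u.region;
```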

Note: You do not need a compute pool if you are only writing to Iceberg; for example, if you're streaming filtered Kafka data into Iceberg tables. Compute pools are necessary only if you want to read from or query Iceberg data.
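For instance, a continuous query that filters a Kafka-backed stream into an Iceberg table runs on DeltaStream's streaming engine and involves no compute pool, because nothing reads from Iceberg. The statement below is a rough sketch; the object names and the sink property in the WITH clause are assumptions, not confirmed syntax.

```sql
-- Hypothetical streaming write: filter a Kafka-backed stream into an
-- Iceberg table. No compute pool is needed since this only writes to
-- Iceberg. Names and the 'store' property are illustrative.
CREATE TABLE clicks_filtered
  WITH ('store' = 'iceberg_store') AS
SELECT user_id, url, event_ts
FROM clicks_stream
WHERE event_type = 'click';
```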

Using SQL DDL with Compute Pools in DeltaStream

Create Compute_Pool

Update Compute_Pool

Create Store

List Compute_Pool

Drop Compute_Pool

Start Compute_Pool

Stop Compute_Pool
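Taken together, these statements manage the full pool lifecycle. The sketch below chains a few of them; the statement spellings follow the command names above, but the resize option syntax is an assumption, so check the individual references for exact usage.

```sql
-- Hypothetical lifecycle: stop an idle pool, resize it, and restart it.
-- The 'compute_pool.size' property is an assumption; see the
-- UPDATE COMPUTE_POOL reference for the exact option names.
STOP COMPUTE_POOL analytics_pool;
UPDATE COMPUTE_POOL analytics_pool WITH ('compute_pool.size' = 'M');
START COMPUTE_POOL analytics_pool;
```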
