Compute Pools
Get real-time analytics from data stored in Iceberg tables.
A compute pool is a set of dedicated resources for running batch queries. Under the hood, compute pools are Apache Spark clusters that DeltaStream uses to perform real-time analytics on data read from Iceberg tables.
You create a compute pool much as you'd create any other DeltaStream object: define it at the organization level, subject to the same access control rules. Unique to compute pools, you must also select a pool size – S, M, or L. DeltaStream configures and instantiates the pool automatically based on your selection; you never manage or interact with Spark directly.
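As a minimal sketch, creating a pool could look like the following DeltaStream SQL. The statement name `CREATE COMPUTE_POOL`, the property name, and the pool name are assumptions for illustration, not confirmed syntax:

```sql
-- Hypothetical DDL: the keyword and property names here are assumptions.
CREATE COMPUTE_POOL analytics_pool
WITH (
  'compute_pool.size' = 'M'  -- one of S, M, or L; DeltaStream provisions Spark accordingly
);
```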
From there, you can execute queries, joins, and so on just as you would with any other data store in DeltaStream. (Joins currently apply only from Iceberg to Iceberg.) Depending on your use case, you can perform your analysis without using an external query engine such as AWS Athena or Trino. Also, as with DeltaStream data stores, you can create multiple compute pools per organization.
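For instance, an Iceberg-to-Iceberg join reads as ordinary SQL against two Iceberg-backed relations; the table and column names below are illustrative only:

```sql
-- Illustrative only: orders and customers stand in for hypothetical Iceberg tables.
SELECT o.order_id, c.name, o.total
FROM orders o
JOIN customers c             -- joins currently apply only from Iceberg to Iceberg
  ON o.customer_id = c.id
WHERE o.order_date >= DATE '2024-01-01';
```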