Namespacing with Database and Schema
Much like in relational databases, Databases and Schemas in DeltaStream are used to organize data logically, regardless of the Store in which the data resides.
This logical separation can be used to namespace Query statements based on the scope of the query and the Stores involved. It is especially useful when Stores are accessible by more than one team and the teams need to collaborate on data.
In DeltaStream, all Relations are hosted in a Database and Schema. In this tutorial, we look at how to use both to organize our data.
Create a Database
Besides Creating Stores for Streaming Data, we also need to create a Database to process any data.
From the left menu, click Catalog > New Database:
Give the Database a unique name:
Click SAVE.
See LIST DATABASES for available Databases.
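The same setup can also be sketched in SQL instead of the web UI. The snippet below assumes DeltaStream's `CREATE DATABASE` and `LIST DATABASES` statements and borrows the `my-db` name used later in this tutorial. The double quotes rest on the assumption that an identifier containing a hyphen must be quoted.

```sql
-- Create the Database used later in this tutorial.
-- Assumption: the name is double-quoted because it contains a hyphen.
CREATE DATABASE "my-db";

-- Confirm that the new Database appears in the list.
LIST DATABASES;
```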
Using Database Schemas
Schemas can be used to further organize Relations within a Database, making it possible to manage complex projects.
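As a minimal sketch, a Schema for the refinement work could be created as follows. It assumes the `my-db` Database already exists and that `USE DATABASE`, `CREATE SCHEMA`, and `LIST SCHEMAS` behave as in DeltaStream's SQL reference. The `analytics` Schema name is taken from the fully qualified Stream name used in this tutorial.

```sql
-- Make my-db the current Database so the Schema is created inside it.
USE DATABASE "my-db";

-- Create a Schema to hold intermediate, refined Relations.
CREATE SCHEMA analytics;

-- List the Schemas in the current Database to verify the result.
LIST SCHEMAS;
```

Relations created in this Schema can then be addressed by a fully qualified name of the form database.schema.relation, such as "my-db".analytics.user_visits.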
Let's say we need to refine our pageviews events before handing them off to the Business Analytics team for further analysis. First, we aggregate the page visits for each user and write the result into the my-db.analytics.user_visits Stream:
Then, we can use the new user_visits Stream and enrich it with each user's latest location to get a full picture of how our users visit pages on our website. Here, we use the users Changelog Relation to get the location information. Finally, we write the result into the Analytics team's public Schema in the analytics-db Database, as analytics-db.public.user_visit_location:
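A statement of this kind might look like the sketch below. It is an illustration under stated assumptions, not the tutorial's original query: the Changelog name `users_log` and the columns `userid`, `visits`, and `regionid` are hypothetical stand-ins, since the column layout of these Relations is not given here.

```sql
-- Hypothetical columns: userid and visits on the Stream,
-- userid and regionid (standing in for location) on the Changelog.
-- users_log is an assumed name for the users Changelog Relation.
CREATE STREAM "analytics-db".public.user_visit_location AS
SELECT
  v.userid AS userid,
  v.visits AS visits,
  u.regionid AS regionid
FROM "my-db".analytics.user_visits v
JOIN users_log u ON u.userid = v.userid;
```

Fully qualifying both the source and the target keeps the statement independent of whichever Database and Schema happen to be current in the session.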
Taming the Chaos
Using Databases and Schemas, we can reduce the clutter in analytics-db by first refining the necessary data in the my-db Database, then using the refinement to push an enriched set of records to the public Schema of analytics-db.
We recommend this namespacing approach for organizing where data is written and where it resides, while ensuring that the right consumers across the company have access to it.