# ClickHouse

[ClickHouse](https://clickhouse.com/) is a fast and resource-efficient real-time data warehouse and open-source database.

This document walks through setting up ClickHouse to be used as a data store in DeltaStream.

Step through the following demo to get an overview. Detailed instructions are below the demo.

{% @storylane/embed subdomain="deltastream" url="<https://deltastream.storylane.io/share/qhiujgcjlwmq>" linkValue="qhiujgcjlwmq" %}

## Setting up a ClickHouse Account

To set up your ClickHouse Cloud account, please review the [ClickHouse Cloud Quick Start](https://clickhouse.com/docs/cloud/get-started/cloud-quick-start) guide.

## Adding ClickHouse as a DeltaStream Data Store

1. Open DeltaStream. In the lefthand navigation, click **Resources (** ![](https://1288764042-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fdbd9e6ZJodkgF1H6AVay%2Fuploads%2F9gG1xxfNSfFRjO6aS5Ou%2F0.png?alt=media) ) and towards the top right click + **Add Data Store**.
2. When the **Choose a Data Store** window opens, click **ClickHouse** and then click **Next**.

<figure><img src="https://1288764042-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fdbd9e6ZJodkgF1H6AVay%2Fuploads%2F8KTcU32MNR8uRbjbF332%2FChooseADataStoreCHChosen.png?alt=media&#x26;token=382eeb88-300d-4c03-9fe5-17f21d2f85b3" alt="" width="307"><figcaption></figcaption></figure>

3\. The **Add Data Store** window displays.

<figure><img src="https://1288764042-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fdbd9e6ZJodkgF1H6AVay%2Fuploads%2FtJBXAc3JPqGUqDYu5kXY%2FAddClickHouseDataStore.png?alt=media&#x26;token=beed8adb-f864-461b-9557-ba2fbd16e3a6" alt="" width="461"><figcaption></figcaption></figure>

4. Enter the authentication and connection parameters. These include:

* **Name** – A name that identifies your ClickHouse store.
* **URL** – JDBC URL from the ClickHouse Cloud console. Be sure to preface this URL with a `clickhouse://` scheme:

<figure><img src="https://1288764042-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fdbd9e6ZJodkgF1H6AVay%2Fuploads%2FpSc5TrCFJPajRosaMukM%2Fimage.png?alt=media&#x26;token=c58245f9-f4b3-4392-a0f3-c602398a1a4b" alt="" width="375"><figcaption></figcaption></figure>

* **Username** – User login name for the ClickHouse account.
* **Password** – Password for the login name of the ClickHouse account.

4\. Click **Add** to save and create the store.

*For instructions on creating the store using DSQL, see* [create-store](https://docs.deltastream.io/reference/sql-syntax/ddl/create-store "mention").

## Process Streaming Data and Sink it to ClickHouse

{% hint style="warning" %}
**Important** Before you proceed, you must already have:

* A [stream](https://docs.deltastream.io/overview/core-concepts/databases#stream) defined in DeltaStream labeled `pageviews,` which is backed by a topic in an Apache Kafka Store.
* A ClickHouse store labeled `clickhouse_store` (see [#adding-clickhouse-as-a-deltastream-store](#adding-clickhouse-as-a-deltastream-store "mention")).

With these two items in place you can perform a simple filter on the `pageviews` stream and then sink the results into ClickHouse.
{% endhint %}

### Inspect the ClickHouse Store

1. In the lefthand navigation, click **Resources (** ![](https://1288764042-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fdbd9e6ZJodkgF1H6AVay%2Fuploads%2F9gG1xxfNSfFRjO6aS5Ou%2F0.png?alt=media) ). This displays a list of the existing data stores.<br>

   <figure><img src="https://1288764042-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fdbd9e6ZJodkgF1H6AVay%2Fuploads%2FvU1DotsaOaJ0sQZpLHM3%2FStoreList3Stores.png?alt=media&#x26;token=bbf809f6-6e31-48fb-adda-0404ad6f3c91" alt="" width="563"><figcaption></figcaption></figure>
2. Click `clickhouse_store.` The store page opens, displaying a list of the existing databases in your ClickHouse account:<br>

   <figure><img src="https://1288764042-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fdbd9e6ZJodkgF1H6AVay%2Fuploads%2F1KeTCm7J4JlHwsL6uH9r%2Fimage.png?alt=media&#x26;token=b3255e5b-93c6-469e-8462-3b0a8c2c9a40" alt=""><figcaption></figcaption></figure>
3. (Optional) Create a new database. To do this:
   * Click **+ Add Database**. When prompted, enter a name for the new database and click **Add**. The new database displays in the list.
4. To see the tables that exist under a particular database, click the database name.

### Write a CTAS (CREATE TABLE AS SELECT) Query to Sink Data into ClickHouse

1. In the lefthand navigation, click **Workspace** ( ![](https://1288764042-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fdbd9e6ZJodkgF1H6AVay%2Fuploads%2F5EudXKVDSWZLByUuGbyy%2F5.png?alt=media) ).
2. In the SQL pane of your workspace, write the [create-table-as](https://docs.deltastream.io/reference/sql-syntax/query/create-table-as "mention") (CTAS) query to ingest from **pageviews** and output to a new table titled **ch\_pv\_table**.<br>

   ```sql
   CREATE TABLE ch_pv_table WITH (
     'store' = 'clickhouse_store',
     'clickhouse.db.name' = 'abc'
   ) AS
   SELECT
     viewtime,
     pageid,
     userid
   FROM
     pageviews
   WHERE
     pageid != 'Page_3';
   ```
3. Click **Run**.
4. View the existing queries, including the query from the step immediately prior. To do this, in the lefthand navigation click **Queries** ( ![](https://1288764042-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fdbd9e6ZJodkgF1H6AVay%2Fuploads%2FoeV9owwjCxALYll5NfSJ%2F6.png?alt=media) ). It may take a few moments for the query to transition into the **Running** state.
5. To see more details about the status of the query, click the query row :<br>

   <figure><img src="https://1288764042-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fdbd9e6ZJodkgF1H6AVay%2Fuploads%2Fk9db9tRHRP15bHp2uDwr%2Fimage.png?alt=media&#x26;token=ca915a8f-e055-46a5-9de0-6aa02ea9f255" alt=""><figcaption></figcaption></figure>

### View the results

1. In the lefthand navigation, click **Resources (** ![](https://1288764042-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fdbd9e6ZJodkgF1H6AVay%2Fuploads%2F9gG1xxfNSfFRjO6aS5Ou%2F0.png?alt=media) ). This displays a list of the existing data stores.
2. To view the new table created by the above CTAS, navigate to `clickhouse_store` → `Databases` → `abc` → `ch_pv_table`.
3. To view a sample of the data in your ClickHouse table, click **Print**.

<figure><img src="https://1288764042-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fdbd9e6ZJodkgF1H6AVay%2Fuploads%2FNOid76iujs1vfieP1fv2%2Fimage.png?alt=media&#x26;token=8ee1704a-9c59-4b62-acb8-9f1b0e34f5d7" alt="" width="563"><figcaption></figcaption></figure>

### Data Type Handling

At DeltaStream, all data types are nullable by default. But in ClickHouse, data types are not-null by default. For a smooth conversion between SQL and ClickHouse data types, DeltaStream provides a reasonable default for nullable columns to prevent `Nullable()` ClickHouse data types.

Data type defaults are as followed:

<table><thead><tr><th width="331.209228515625" align="center">SQL Data Type</th><th width="147.1939697265625" align="center">ClickHouse Data Type</th><th align="center">Nullable Column Default</th></tr></thead><tbody><tr><td align="center">BOOLEAN</td><td align="center">Boolean</td><td align="center">false</td></tr><tr><td align="center">CHAR</td><td align="center">String</td><td align="center">''</td></tr><tr><td align="center">VARCHAR</td><td align="center">String</td><td align="center">''</td></tr><tr><td align="center">VARBINARY</td><td align="center">String</td><td align="center">''</td></tr><tr><td align="center">TINYINT</td><td align="center">Int8</td><td align="center">0</td></tr><tr><td align="center">SMALLINT</td><td align="center">Int16</td><td align="center">0</td></tr><tr><td align="center">INTEGER</td><td align="center">Int32</td><td align="center">0</td></tr><tr><td align="center">BIGINT</td><td align="center">Int64</td><td align="center">0</td></tr><tr><td align="center">DECIMAL</td><td align="center">Decimal(P, S)</td><td align="center">0.00x</td></tr><tr><td align="center">FLOAT</td><td align="center">Float32</td><td align="center">0.0</td></tr><tr><td align="center">DOUBLE</td><td align="center">Float64</td><td align="center">0.0</td></tr><tr><td align="center">DATE</td><td align="center">Date32</td><td align="center">toDate('1970-01-01')</td></tr><tr><td align="center">TIMESTAMP</td><td align="center">DateTime64(P)</td><td align="center">toDateTime64('1970-01-01 00:00:00', P)</td></tr><tr><td align="center">TIMESTAMP_WITH_LOCAL_TIME_ZONE</td><td align="center">DateTime64(P)</td><td align="center">toDateTime64('1970-01-01 00:00:00', P)</td></tr><tr><td align="center">ARRAY</td><td align="center">Array(T)</td><td align="center">[]</td></tr><tr><td align="center">MAP</td><td align="center">Map(K, V)</td><td align="center">map()</td></tr><tr><td align="center">STRUCT</td><td align="center">Tuple(T1, T2, ...)</td><td align="center">[]</td></tr></tbody></table>

If you must use a specific default that is different than what's provided in the table above, you can use a `CASE` function to alter the value at query execution:

```sql
SELECT
  viewtime,
  userid,
  CASE
    WHEN pageid IS NULL
    THEN 'Page_0'
    END AS pageid_my_default
FROM pageviews;
```

In the above query, whenever the original `pageviews` column `pageid` is `null`, the value `Page_0` is projected to the results. This effectively eliminates all `null` values for the column and also allows you to define the resulting data type for the column as `pageid VARCHAR NOT NULL`.
