# CREATE MATERIALIZED VIEW AS

## Syntax <a href="#synopsis" id="synopsis"></a>

```sql
CREATE MATERIALIZED VIEW
  materialized_view_name
  [WITH (materialized_view_parameter = value [, ... ])]
AS select_statement;
```

## Description <a href="#description" id="description"></a>

`CREATE MATERIALIZED VIEW AS` is a statement that:

* Generates a DDL statement to create a new [#materialized\_view](https://docs.deltastream.io/overview/core-concepts/databases#materialized_view "mention").
* Launches a new query to write the results of the [select](https://docs.deltastream.io/reference/sql-syntax/query/select "mention") statement into the newly created Materialized View.

The specified `select_statement` generates records with columns of a primitive data type. See [select](https://docs.deltastream.io/reference/sql-syntax/query/select "mention") for the statement’s parameters and properties.

If the result of the statement is a [#stream](https://docs.deltastream.io/overview/core-concepts/databases#stream "mention"), new records are added to the Materialized View in the append mode, where new records are appended and there is no update of the existing records. Otherwise, if the statement creates a [#changelog](https://docs.deltastream.io/overview/core-concepts/databases#changelog "mention"), new records are added to the Materialized View in the upsert mode, where a new record updates an existing record if there is one with matching value(s) on the `PRIMARY KEY` column(s). See [create-changelog](https://docs.deltastream.io/reference/sql-syntax/ddl/create-changelog "mention") for further information on `PRIMARY KEY` column(s).

#### How to control Materialized View's size?

When creating a Materialized View, you can define a retention period for the rows to determine how long the Materialized View retains a row before deleting it. By default, a Materialized View keeps all the rows and never deletes any of them. Therefore, size of the Materialized View keeps growing as new rows are added. Defining an appropriate retention period for the Materialized View allows you to control its size and let the older rows be purged automatically. Once a row is expired (according to the retention period) and purged, it is deleted forever and its purging can not be reverted. Currently, retention period has to be defined at the creation time for a Materialized View and user can not enable/disable purging, or change the retention period later. See [#example-1](#example-1 "mention") to see how retention period can be defined.

### Arguments <a href="#example" id="example"></a>

#### materialized\_view\_name

This specifies the name of the new Materialized View. Optionally, use `<database_name>.<schema_name>` as the prefix to the name to create the Relation in that scope.

#### WITH (\<materialized\_view\_parameter> = \<value> \[, …​ ])

Optionally, this clause specifies [#example-1](#example-1 "mention").

#### select\_statement

This statement specifies the [select](https://docs.deltastream.io/reference/sql-syntax/query/select "mention") statement to run.

### Materialized View Parameters <a href="#example" id="example"></a>

<table><thead><tr><th>Parameter Name</th><th>Description</th><th data-hidden></th></tr></thead><tbody><tr><td><code>retention.millis</code></td><td><p>The duration (in milliseconds) to determine how long Materialized View retains rows before they are purged. If not set, the rows will not be purged.</p><p><br><strong>Required:</strong> No<br><strong>Default value:</strong> Infinity<br><strong>Type:</strong> Long</p></td><td></td></tr><tr><td><code>timestamp</code></td><td><p>Name of the column in the Materialized View to use as the timestamp. If not set, the timestamp of the message, coming from (one of) the Materialized view's sources, is used as the timestamp.<br></p><p><strong>Required:</strong> No<br><strong>Default value:</strong> Record’s timestamp.<br><strong>Type:</strong> String<br><strong>Valid values:</strong> Chosen column must be of type <code>BIGINT</code> or <code>TIMESTAMP</code>. See <a data-mention href="../../data-types">data-types</a>.</p></td><td></td></tr><tr><td></td><td></td><td></td></tr></tbody></table>

## Examples <a href="#example" id="example"></a>

For the following examples, assume a Stream named `pageviews` has been created using the below DDL statement:

```sql
CREATE STREAM pageviews (
  viewtime BIGINT,
  userid VARCHAR,
  pageid VARCHAR
) WITH (
  'topic' = 'pageviews',
  'value.format' = 'json'
);
```

#### SELECT with a filter

The following statement creates a Materialized View, named `visits`, by selecting `userid` and `pageid` columns from all records in the `pageviews` Stream except those for `User_5`. Note that the result of the `SELECT` statement is a Stream. Therefore, new rows are simply added to the Materialized View with no impact on the existing ones.

```sql
CREATE MATERIALIZED VIEW
  visits
AS SELECT
  userid, pageid
FROM pageviews
WHERE userid <> 'User_5';
```

The `visits` Materialized View will have two columns, `userid` and `pageid`, both with the `VARCHAR` data type. For the details on how this view can be queried, check [select-from-materialized-view](https://docs.deltastream.io/reference/sql-syntax/query/materialized-view/select-from-materialized-view "mention").

#### Run grouping and aggregation <a href="#having_clause" id="having_clause"></a>

The following statement creates a Materialized View, named `pagevisits`, by grouping records from the `pageviews` Stream using the `pageid` column and reporting the total number of records in each group as `cnt`. The result of the `SELECT` statement is a Changelog with the `pageid` column as the Primary Key. Therefore, new rows are added to the Materialized View in the upsert mode. Each `pageid` value has only one row in `pagevists` and a new row with matching `pageid` value overwrites the existing one, if any.

```sql
CREATE MATERIALIZED VIEW
  pagevisits
AS SELECT
  pageid, count(userid) as cnt 
FROM pageviews
GROUP BY pageid;
```

The `pagevisits` Materialized View will have two columns, `pageid` with the `VARCHAR` data type and `cnt` with the `BIGINT` data type. For the details on how this view can be queried, check [select-from-materialized-view](https://docs.deltastream.io/reference/sql-syntax/query/materialized-view/select-from-materialized-view "mention").

#### Run grouping and aggregation with window function

The following statement creates a Materialized View, named `visit_stats`, by aggregating the total number of pages a given user visited every 30 seconds. Given that the `SELECT` statement generates results as a Changelog with the Primary Key of (`window_start, window_end, userid)`, new rows are added to the Materialized View in the upsert mode, and there is only one row for any given `userid` in each 30-second interval.

```sql
CREATE MATERIALIZED VIEW
  visit_stats
AS SELECT 
  window_start, 
  window_end,
  userid, 
  COUNT(pageid) AS pgcnt 
FROM TUMBLE(pageviews, size 30 SECONDS) 
GROUP BY window_start, window_end, userid;
```

The `visit_stats` Materialized View will have four columns, `window_start` and `window_end` with the `TIMESTAMP(3)` data type, as well as `userid` with the `VARCHAR` data type and `pgcnt` with the `BIGINT` data type. For the details on how this view can be queried, check [select-from-materialized-view](https://docs.deltastream.io/reference/sql-syntax/query/materialized-view/select-from-materialized-view "mention").

#### Create a Materialized View with retention period

The following statement creates a Materialized View named `visits_rate`, by aggregating the total number of pages a given user visits every 10 seconds. Moreover, the `retention.millis` parameter is set to `600000` milliseconds (i.e. 10 minutes). This means the retention period for any row is 10 minutes and the row is purged after that. This will help controlling the total size of `visits_rate` by deleting older rows which are no longer needed.

```sql
CREATE MATERIALIZED VIEW
  visits_rate
WITH ('retention.millis' = 600000)
AS SELECT 
  window_start AS start_time, 
  window_end AS end_time, 
  userid, 
  count(pageid) AS cnt
FROM Tumble(pageviews, size 10 seconds)
GROUP BY userid, window_start, window_end;
```

#### Create a Materialized View with custom timestamp field

The following statement creates a Materialized View named `visits_rate_ts`, by aggregating the total number of pages a given user visits every 10 minutes. Moreover, the `timestamp` parameter is set to `start_time` column (i.e. `window_start` column). This means the Materialized View relation can purge data based on when the window was created, plus any other operation of the View that relies on a pivoting timestamp field of `TIMESTAMP(3)` in this case. This will help controlling the total number of windows created for `visits_rate` by deleting older ones which are no longer needed.

```sql
CREATE MATERIALIZED VIEW
  visits_rate_ts
WITH ('timestamp' = 'start_time')
AS SELECT 
  window_start AS start_time, 
  window_end AS end_time, 
  userid, 
  count(pageid) AS cnt
FROM Tumble(pageviews, size 10 MINUTES)
GROUP BY userid, window_start, window_end;
```
