# MATCH\_RECOGNIZE

## Syntax

```sql
MATCH_RECOGNIZE (
    [PARTITION BY expression [, ...]]
    ORDER BY expression ASC [, expression [ASC | DESC], ...]
    MEASURES expression AS alias [, ...]
    [ONE ROW PER MATCH]
    [AFTER MATCH strategy]
    PATTERN (variable [, ...])
    [WITHIN size time_unit]
    DEFINE variable AS expression [, ...]
)
```

## Description

The `MATCH_RECOGNIZE` clause is used for searching patterns in a set of events from a [#\_relation](https://docs.deltastream.io/overview/core-concepts/databases#_relation "mention")as its input within the [from](https://docs.deltastream.io/reference/sql-syntax/query/select/from "mention") clause.

A `MATCH_RECOGNIZE` clause enables advanced event processing using the following tasks:

* Logical partitioning of events using `PARTITION BY`
* Logical ordering of events for each partition based on one or many ordering columns using `ORDER BY`
* Definition of variables in the `DEFINE` subclause that are accessible from `PATTERN`, similar to a regular expression syntax
* Definition of one or many `MEASURE` columns using the `DEFINE` variables, within the scope of matched events in `PATTERN`
* Zero or one event per pattern recognition expression defined in `PATTERN`

A [](https://docs.deltastream.io/reference/sql-syntax/query/select "mention") with `*` projection is not supported on a `MATCH_RECOGNIZE` operator.

### Arguments

#### PARTITION BY expression \[, ...]

Optionally, this subclause can be used to parition the data to be looked at by the `PATTERN`. This subclause groups the underlying data based on the paritioning expression(s), and optimizes DeltaStream's compute resources for parallel processing of the source data.

If `PARTITION BY` subclause is not provided, all data is scanned sequentially to ensure global ordering for `ORDER BY`, hence, it's recommended to use the `PARTITION BY` subclause with at least one column.

The columns mentioned in this subclause contribute to the final schema of the `MATCH_RECOGNIZE` operator. See [#measures-expression-as-alias-...](#measures-expression-as-alias-... "mention") for definition of the full schema.

#### ORDER BY expression ASC \[, expression \[ASC | DESC], ...]

This cause defines the order in which the data should be sorted for each partition before they are passed to the pattern recognition in the `MATCH_RECOGNIZE` operator.

The first column for the `ORDER BY` subclause must be an event-time column defined for the input Relation in ascending order. The event-time column can be the record time at which a record was added to an Entity, referred to as `rowtime()`. This can also be a user-defined timestamp column from a list of value columns. See [#\_source\_properties](https://docs.deltastream.io/reference/sql-syntax/query/from#_source_properties "mention") for a user-defined timestamp column.

#### MEASURES expression AS alias \[, ...]

This subclause is used for providing the output schema for the `MATCH_RECOGNIZE` operator, much like the columns provided as a result of a [](https://docs.deltastream.io/reference/sql-syntax/query/select "mention") statement. This means that `MEASURES` can project source columns from the `PATTERN`, or evaluated expressions that are based on projection columns.

The final schema of the `MATCH_RECOGNIZE` is defined by the optional `PARTITION BY` columns followed by the `MEASURES` columns. [#one-row-per-match](#one-row-per-match "mention") defines how many rows to output for this schema, per match.

Since the `MATCH_RECOGNIZE` operator alters the projection columns of its input [#\_relation](https://docs.deltastream.io/overview/core-concepts/databases#_relation "mention"), it is required to provide which variable from `DEFINE` would provide the row metadata columns for the projected `MATCH_RECOGNIZE` columns. See [row-functions](https://docs.deltastream.io/reference/sql-syntax/query/functions/row-functions "mention") for more information on how the row metadata columns are used.

Aggregation or offset functions may be used to do further processing on the variable definitions. See [built-in-functions](https://docs.deltastream.io/reference/sql-syntax/query/functions/built-in-functions "mention") for further function information.

#### ONE ROW PER MATCH

Optionally, this output mode can be provided for completeness of a `MATCH_RECOGNIZE` operator, but is the only option for how many rows to output per match.

#### AFTER MATCH strategy

Optionally, start from a different position that is different than the `AFTER MATCH SKIP PAST LAST ROW` `strategy`, after a successful match.

`strategy` can be set to:

* `SKIP TO NEXT ROW`: Continues matching starting at the first row after the current match.
* `SKIP PAST LAST ROW`: As the default, continues matching for new rows starting after the last row of the current match.
* `SKIP TO FIRST variable`: Continues matching starting at the first row that matched the provided `variable`. See `DEFINES` for variable definitions.
* `SKIP TO LAST variable`: Continues matching starting at the last row that matched the provided `variable`. See `DEFINES` for variable definitions.

#### PATTERN (variable \[, ...])

This subclause is the core processing power for the `MATCH_RECOGNIZE` operator, and it explains what pattern the rows should be matched against. `DEFINE` variables in addition to quantifiers or modifiers may build a complex pattern.

A **quantifier** is used to denote the number of `DEFINE` variable or operation to be considered in the matching process. Supported quantifiers include:

* `+`: 1 or more rows
* `*`: 0 or more rows
* `?`: 0 or 1 row
* `{n}`: Exactly `n` rows, where `n` > 0
* `{n,}`: `n` or more rows, where `n` is a non-negative number
* `{,m}`: 0 to `m` rows, inclusive, where `m` is a positive number
* `{n,m}`: `n` to `m` rows, where `n` is a non-negative number and `m` is greater than `n`

By default, quantifiers are greedy unless they are defined to be reluctant using the `?` operation.

#### WITHIN size time\_unit

Optionally, this subclause is used to provide a time constraint for potential matches. While the `WITHIN` clause is not part of the SQL standard for `MATCH_RECOGNIZE`, we provide this subclause as a way to manage the amount of state that your query needs to maintain. If the `WITHIN` subclause is not provided, then the default is that there is no time constraint for matches.

All of a match's records must fall within the time constraint to be considered a valid match. For example, if the time between the first and last record in a match is longer than the time constraint, then that match would be invalid and not produced as an output. The time of a record is determined by its event time (see `timestamp` under [#\_source\_properties](https://docs.deltastream.io/reference/sql-syntax/query/from#_source_properties "mention")).

Note that it is generally recommended to provide the `WITHIN` clause. Many `MATCH_RECOGNIZE` queries, especially those that use a greedy quantifier, will keep track of all the records it consumes in memory. Without providing a `WITHIN` time constraint, these queries will eventually run out of memory and fail. The `WITHIN` time constraint allows the query to safely clean up its memory of records that have fallen out of the time interval.

#### DEFINE variable AS expression \[, ...]

This subclause is used as the building block for the entire `MATCH_RECOGNIZE` operator for providing name for expressions that are used in building a `PATTERN`. By default, a variable that is not defined, but used in a `PATTERN`, evaluates to true for all rows that contibute to a recognized pattern.

Aggregation or offset functions may be used to do further processing on the variable definitions. See [built-in-functions](https://docs.deltastream.io/reference/sql-syntax/query/functions/built-in-functions "mention") for further function information.

## Examples

#### Aggregated price trends over time

One of the common use cases for pattern recognition is detecting market price changes over time for a specific symbol to allow downstream notification for an end user or application:

```sql
CREATE STREAM ticker_average_drop
AS SELECT symbol, start_tstamp, end_tstamp, avg_price
FROM market_ticker
  MATCH_RECOGNIZE (
    PARTITION BY symbol
    ORDER BY rowtime()
    MEASURES
      A.row_timestamp AS row_timestamp,
      A.row_key AS row_key,
      A.row_metadata AS row_metadata,
      FIRST(A.ts) AS start_tstamp,
      LAST(A.ts) AS end_tstamp,
      AVG(A.price) AS avg_price
    ONE ROW PER MATCH
    AFTER MATCH SKIP PAST LAST ROW
    PATTERN (A+ B)
    DEFINE
      A AS AVG(A.price) < 15
  ) MR;
```

This CSAS statement detects when a `symbol`'s average price drops under `15`. When such an event is detected, it projects the time at which this trend started and ended with, in addition to what average price the `symbol` ended with.

These events are orderd by the time at which each price change was reported to `market_ticker`.

#### Patterns within a time interval

When recognizing a pattern over a continuous stream, it's important to recognize that matched patterns are recognized with the help of an internal state over the life of the `PATTERN` definition. Hence, the more events are involved in a pattern, the larger the state of the query becomes, and the larger the state of the query, the more risk there is to the health of the operations involved.

For that matter, the state of a pattern can be controlled over time, by providing a `WITHIN` subclause, which dictates the amount of time a `PATTERN` should be tracking against:

```sql
CREATE STREAM ticker_price_fluctuation
AS SELECT symbol, start_price, last_price
FROM market_ticker
  MATCH_RECOGNIZE(
    PARTITION BY symbol
    ORDER BY rowtime()
    MEASURES
      C.row_timestamp AS row_timestamp,
      C.row_key AS row_key,
      C.row_metadata AS row_metadata,
      A.price AS start_price,
      C.price AS last_price
    ONE ROW PER MATCH
    AFTER MATCH SKIP TO LAST C
    PATTERN (A B* C)
    WITHIN 1 HOUR
    DEFINE
      A AS price > 10,
      B AS price < 15,
      C AS price > 15
  );
```

This example looks for a pattern starting with `price > 10`, with zero to many events where price is under `15`, and ends with an event where `price > 15` within the last hour. It's worth mentioning that this pattern recognition is true iff this pattern occurs within the last hour. If the events matching this pattern span more than an hour, the series won't be matched and no output will be recroded for it. See [#within-size-time\_unit](#within-size-time_unit "mention") for more information.

One more thing to point out here is that we used `AFTER MATCH SKIP TO LAST C` to ensure we're capturing continuous patterns, instead of `AFTER MATCH SKIP PAST LAST ROW` used for isolating matched pattern from one another.

#### Non-overlapping PATTERNs with multiple ORDER BY columns

When matching patterns over more complex [#\_relation](https://docs.deltastream.io/overview/core-concepts/databases#_relation "mention")s, we can use multiple ordering columns to narrow down and shape the pattern in a way that matters the most to the pattern we're looking for. This example uses `ORDER BY ts, price` to order the events by the defined timestamp column `ts` and the `price` column:

```sql
CREATE STREAM ticker_average_drop
AS SELECT start_time, row_timestamp AS end_time, symbol, last_price
FROM market_ticker
  MATCH_RECOGNIZE(
    PARTITION BY symbol
    ORDER BY ts, price
    MEASURES
      C.row_timestamp AS row_timestamp,
      C.row_key AS row_key,
      C.row_metadata AS row_metadata,
      A.row_timestamp AS start_time,
      C.price AS last_price
    ONE ROW PER MATCH
    AFTER MATCH SKIP PAST LAST ROW
    PATTERN (A B* C)
    DEFINE
      A AS price > 10,
      B AS price < 15,
      C AS price > 12
  ) AS MR WITH ('timestamp'='ts');
```

We're defining our timestamp column to be `ts` to ensure ordering on when a price change was reported by the upstream source rather than the time an event was recorded in `market_ticker`. See `timestamp` under [#\_source\_properties](https://docs.deltastream.io/reference/sql-syntax/query/from#_source_properties "mention") for more information on defining an event time for source Relation.


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.deltastream.io/reference/sql-syntax/query/select/match_recognize.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
