INSERT INTO

Syntax

INSERT INTO
    relation_name
    select_statement;

Description

For an existing Relation, INSERT INTO runs the given query (i.e., a SELECT statement) and adds its results to the sink Relation. The columns of the sink Relation and the SELECT column list of the query must have compatible data types. Moreover, the Relation type of the sink Relation must match the Relation type of the query results. For example, the results of a query that uses grouping aggregation cannot be inserted into a Stream, because the result type of a query with GROUP BY is a Changelog.

INSERT INTO does not support MATERIALIZED VIEW as the sink Relation.
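
As a minimal sketch of the Relation type requirement, the statements below pair a grouping query with a Changelog sink, in the same way the CREATE CHANGELOG ... AS SELECT ... GROUP BY examples later on this page do. The sink name views_per_user, its topic, and the userid and pageid columns of pageviews are hypothetical and used only for illustration.

-- Hypothetical Changelog sink, keyed on the grouping column.
CREATE CHANGELOG views_per_user (
   userid VARCHAR,
   view_cnt BIGINT,
   PRIMARY KEY(userid)
)
WITH (
   'topic'='views_per_user',
   'value.format'='json'
);

-- The query groups by userid, so its results go to a Changelog sink.
-- Column names on pageviews are assumed here.
INSERT INTO views_per_user
SELECT userid, count(pageid) AS view_cnt
FROM pageviews
GROUP BY userid;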

Arguments

relation_name

This specifies the name of the Relation to add results to. Relation names can be fully or partially qualified by specifying database_name and/or schema_name in the format [<database_name>.<schema_name>.]<relation_name> (such as db1.public.pageviews). Otherwise, the current Database and Schema are used to identify the Relation. Case-sensitive names must be wrapped in double quotes; otherwise, the lowercase name is used.
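
For illustration, the following sketch shows the two naming forms described above. It reuses db1.public.pageviews from the text; the sink names db1.public.pageviews2 and "Pageviews2" are assumptions, and each sink is assumed to already exist with columns compatible with its source.

-- Fully qualified names: the current Database and Schema are not needed to resolve them.
INSERT INTO db1.public.pageviews2 SELECT * FROM db1.public.pageviews;

-- Case-sensitive names are wrapped in double quotes; unquoted names are lowercased.
INSERT INTO "Pageviews2" SELECT * FROM "CaseSensitivePageviews";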

select_statement

This clause specifies the SELECT statement to run; see SELECT for more information.

Examples

Select all INSERT INTO

The following copies all data from the source Relation and inserts it into a preexisting Relation.

INSERT INTO pageviews2 SELECT * FROM pageviews;

INSERT INTO with grouping and aggregation

The following runs a query that computes the average ViewTime for each UserID and pageId over 5-second tumbling windows and inserts the results into the existing Relation "Aggr Pageviews2".

INSERT INTO
  "Aggr Pageviews2" 
SELECT 
  window_start, 
  window_end, 
  avg("ViewTime") AS "AvgTime", 
  "UserID", 
  "pageId" 
FROM TUMBLE("CaseSensitivePageviews", size 5 second) 
GROUP BY 
  window_start, 
  window_end, 
  "UserID", 
  "pageId";

Combine multiple queries’ results with INSERT INTO

INSERT INTO can be used to combine the results of multiple queries into a single sink Relation, as long as:

  • Every query has the same sink Relation type.

  • The SELECT column list in every query has the same number of columns, with compatible data types, in the same order.

For example, assume two Changelogs are created from the users Stream to collect stats on the total number of users in different cities in Europe and the U.S.

CREATE CHANGELOG users_eu
AS SELECT contactinfo->city AS city, count(userid) AS ucount
FROM users
WHERE regionid = 'EUROPE'
GROUP BY contactinfo->city;

CREATE CHANGELOG users_us
AS SELECT contactinfo->city AS city, count(userid) AS ucount
FROM users
WHERE regionid = 'US'
GROUP BY contactinfo->city;

Now assume we want to keep track, in a single Relation, of cities in Europe or the U.S. with more than a thousand users. We can create a third Changelog, named total_users, with the DDL below, and then use the two INSERT INTO statements that follow it to combine results from the above Changelogs and add them to the total_users Changelog:

CREATE CHANGELOG total_users (
   city VARCHAR,
   total_cnt BIGINT,
   PRIMARY KEY(city)
)
WITH (
   'topic'='total_users',
   'value.format'='json'
);

INSERT INTO total_users
SELECT * FROM users_eu
WHERE ucount > 1000;

INSERT INTO total_users
SELECT * FROM users_us
WHERE ucount > 1000;
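
As a sketch of the column-order requirement, the first INSERT INTO statement above could also name its columns explicitly instead of using SELECT *. This assumes, as the SELECT * form suggests, that query columns are matched to the sink's columns by position rather than by name, so ucount fills total_cnt:

INSERT INTO total_users
SELECT city, ucount   -- same order as the sink's columns: city, total_cnt
FROM users_eu
WHERE ucount > 1000;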
