INSERT INTO
Syntax
INSERT INTO
relation_name
select_statement;
Description
For an existing Relation, INSERT INTO
runs the given query (i.e. a SELECT
statement) and adds its results into the sink Relation. The list of columns of the sink Relation and the SELECT
columns list in the query should be data type compatible. Moreover, the Relation type of a sink Relation should match with the Relation type of query results. For example, the results of a query that uses grouping aggregation cannot be inserted into a Database as the result type for a query with GROUP BY
as a Database.
Arguments
relation_name
This specifies the name of the Relation to add results to. Relation names can be specified as fully/partially qualified names via the specifying database_name and/or schema_name in the format [<database_name>.<schema_name>.]<relation_name>
(such as db1.public.pageviews
). Otherwise, the current Database and Schema will be used to identify the Relation. For case-sensitive names, the name must be wrapped in double quotes; otherwise, the lowercase name will be used.
select_statement
This clause specifies the SELECT
statement to run; see SELECT for more information.
Examples
Select all INSERT INTO
INSERT INTO
The following copies all data from the source Relation and inserts it into a preexisting Relation.
INSERT INTO pageviews2 SELECT * FROM pageviews;
INSERT INTO
with grouping and aggregation
INSERT INTO
with grouping and aggregationThe following runs a query that finds the average ViewTime
in a 5 second window and inserts the results into the already existing Relation Aggr Pageviews2
.
INSERT INTO
"Aggr Pageviews2"
SELECT
window_start,
window_end,
avg("ViewTime") AS "AvgTime",
"UserID",
"pageId"
FROM TUMBLE("CaseSensitivePageviews", size 5 second)
GROUP BY
window_start,
window_end,
"UserID",
"pageId";
Combine multiple queries’ results with INSERT INTO
INSERT INTO
INSERT INTO
can be used to combine the results of multiple queries into a single sink Relation, as long as:
Every query has the same sink Relation type.
The
SELECT
columns list in every query has the same number of columns, with similar data types, in the same order.
For example, assume two Changelogs are created from the users
Stream to collect stats on the total number of users in different cities in Europe and the U.S.
CREATE CHANGELOG users_eu
AS SELECT contactinfo->city AS city, count(userid) AS ucount
FROM users
WHERE regionid = 'EUROPE'
GROUP BY contactinfo->city;
CREATE CHANGELOG users_us
AS SELECT contactinfo->city AS city, count(userid) AS ucount
FROM users
WHERE regionid = 'US'
GROUP BY contactinfo->city;
Moreover, assume we are interested in keeping track of cities in Europe or the U.S. with more than a thousand users, in a single Relation. We can create a third Changelog, named total_users
, with the below DDL and use the following two INSERT INTO
statements to combine results from the above Changelogs and add them to the total_users
Changelog:
CREATE CHANGELOG total_users (
city VARCHAR,
total_cnt BIGINT,
PRIMARY KEY(city)
)
WITH (
'topic'='total_users',
'value.format'='json'
);
INSERT INTO total_users
SELECT * FROM users_eu
WHERE ucount > 1000;
INSERT INTO total_users
SELECT * FROM users_us
WHERE ucount > 1000;
Last updated