Data Types

DeltaStream's data types are similar to SQL standard's data types. They are used to declare the value type for columns in relations or input/output parameters in functions.

Primitive Data Types

Boolean

BOOLEAN

Boolean value representing true or false.

Character string

VARCHAR VARCHAR(n)

Variable-length character string. Type parameter n represents maximum length of string (number of code points) and has to be an integer between 1and 2,147,483,647. If no length specified, maximum length is picked.

Numeric

TINYINT

1-byte signed integer. It can hold an integer value between -128 and 127.

SMALLINT

2-byte signed integer. It can hold an integer value between -32,768 and 32,767.

INTEGER

4-byte signed integer. It can hold an integer value between -2^31 and 2^31-1.

BIGINT

8-byte signed integer. It can hold an integer value between -2^63 and 2^63-1

FLOAT

4-byte variable precision floating point number.

DOUBLE

8-byte variable precision floating point number.

DECIMAL DECIMAL(p, s)

Decimal number with fixed precision and scale. Decimal type accepts two type parameters: p: for precision and s for scale. Precision is the maximum total number of digits to be stored in the number (an integer value between 1 and 38). Scale is the number of decimal digits to the right of the decimal point (an integer value between 0 and p). If no type parameters specified, default precision of 10 and default scale of 0 are picked.

Date and Time

DATE

Calendar date consisting of year-month-day with values between 0000-01-01 and 9999-12-31.

TIME TIME(p)

Time of day, without time zone, consisting of hour:minute:second[.fractional] (up to nanosecond precision) with values between 00:00:00.000000000 and 23:59:59.999999999. Type parameter p represents precision which is the number of digits of fractional seconds (an integer between 0 and 9). If no precision specified, default precision of 0 is picked.

TIMESTAMP TIMESTAMP(p)

Timestamp, without time zone, consisting of year-month-day hour:minute:second[.fractional] (up to nanosecond precision) with values between 0000-01-01 00:00:00.000000000 and 9999-12-31 23:59:59.999999999. Type parameter p represents precision which is the number of digits of fractional seconds (an integer between 0 and 9). If no precision specified, default precision of 6 is picked. There are two types of formats for timestamps that are supported – sql and iso8601.

TIMESTAMP WITH LOCAL TIME ZONE TIMESTAMP(p) WITH LOCAL TIME ZONE TIMESTAMP_LTZ TIMESTAMP_LTZ(p)

Timestamp with local time zone consisting of year-month-day hour:minute:second[.fractional] zone (up to nanosecond precision) with values between 0000-01-01 00:00:00.000000000Z to 9999-12-31 23:59:59.999999999Z. Values for this type assume Instant semantics in the UTC time zone. Type parameter p represents precision which is the number of digits of fractional seconds (an integer between 0 and 9). If no precision specified, default precision of 6 is picked. TIMESTAMP_LTZ is a synonym for TIMESTAMP WITH LOCAL TIME ZONE. There are two types of formats for timestamps that are supported – sql and iso8601.

Binary string

VARBINARY VARBINARY(n) BYTES

Variable-length sequence of bytes. Type parameter n represents maximum length of sequence (number of bytes) and has to be an integer between 1and 2,147,483,647. If no length specified, default length of 1 is picked. BYTES is a synonym for VARBINARY with maximum length (VARBINARY(2147483647)).

Constructed Data Types

ARRAY<t>

Array of elements with same data type. Type parameter t is required and represents data type of array elements. The maximum number of elements of an array can not be specified and is fixed at 2,147,483,647. Similar to SQL, arrays are one-indexed (first element is at index 1) and a given element can be accessed using the [] operator with the index.

Example. In a relation DDL statement, the expression employees ARRAY<VARCHAR> declares a column named employees with values of type array holding character strings. employees[6] is used to access the 6th element in a given value from this column.

MAP<k,v>

Associative array that maps keys to values. No duplicate keys can exist in a map and each key can map to at most one value. Type parameters k and v are required and they represent data type of the key elements and the value elements, respectively, throughout the map.

Order of keys within a MAP is not guaranteed, and may use one of ARRAY or STRUCT to guaranteed ordering within a set of queries. For a given key, the associated value can be accessed using the [] operator with the key. Example. In a relation DDL statement, the expression manager MAP<VARCHAR, VARCHAR> declares a column named manager with values of type map holding key value pairs of character string types. manager['HR'] is used to access the value associated with the key HR in a given value instance from this column.

STRUCT<fn ft, ...>

Strongly typed structured data type that represents an ordered collection of one or more fields. A field consists of a name and a data type and captures a specific value in a given instance of struct. Sequence of type parameter pair(s) fn ft represent name and data type for each and every field, in order. A given field of a struct instance is accessed using the -> operator with the field name. Example. In a relation DDL statement, the expression address STRUCT<city VARCHAR, zipcode VARCHAR> declares a column named address with values of type struct holding two fields of character string types, named city and zipcode. address->city is used to access the city field in a given value instance from this column.

PreviousSerializing with Protobuf NextIdentifiers and Keywords

Last updated 5 months ago