全部产品
云市场
云游戏

数据类型

更新时间:2020-06-18 10:28:59

TSDB 2.0中, 数据会根据数据类型, 组织为不同的格式进行存储. 数据类型在数据的插入,存储和查询过程中属于关键的特征. 合理的设置数据类型是正确使用TSDB数据库的关键。

TSDB 2.0 中, 数据类型的名称被视为关键词, 如果column的名称与数据类型重名, 需要对该名称进行转义。

类型分类

基础类型

基础类型描述具备原子性(atomic),不可再分割的基础元素.

Boolean

基础的布尔值, 接受truefalse两个枚举值.

  1. cr> create table my_bool_table (
  2. ... first_column boolean
  3. ... );
  4. CREATE OK, 1 row affected (... sec)

Text

包含多个unicode字符的字符串

  1. cr> create table my_table2 (
  2. ... first_column text
  3. ... );
  4. CREATE OK, 1 row affected (... sec)

注意: text文本也支持 创建全文索引, 支持模糊搜索。

字符串的长度被限制为32766字节(编码为UTF-8). 但当字符串被全文索引或者列存(Column store)模式被禁用时除外.

Numeric

名称 大小 描述 范围
smallint 2 bytes 小范围整数 -32,768 - 32,767
integer 4 bytes 整数 -2^31 - 2^31-1
bigint 8 bytes 大范围整数 -2^63 - 2^63-1
real 4 bytes 可变精度 6位精度浮点数
double precision 8 bytes 可变精度 15位精度浮点数

realdouble precision 类型是变长存储,非精确的数值类型, 他们内部是尽量保持浮点数精度. 因此针对这些浮点数的存储,计算和查询均不保证其提供精确的浮点值精度. 因此, 诸如比较浮点数, 对浮点数做聚合(sum/avg)可能会产生非精确的结果.

特殊浮点数引用

TSDB 2.0支持对realdouble precision类型的浮点值进行特殊值引用, 引用标准遵循 IEEE 754.支持的特殊浮点数应用: NaN, Infinity, -Infinity(negative infinity) 以及 -0(signed zero)

  1. cr> SELECT 0.0 / 0.0, 1.0 / 0.0, 1.0 / -0.0;
  2. +-------------+-------------+---------------+
  3. | (0.0 / 0.0) | (1.0 / 0.0) | (1.0 / - 0.0) |
  4. +-------------+-------------+---------------+
  5. | NaN | Infinity | -Infinity |
  6. +-------------+-------------+---------------+
  7. SELECT 1 row in set (... sec)

这些特殊浮点值的引用也可以在数据插入的使用。

  1. cr> create table my_table3 (
  2. ... first_column integer,
  3. ... second_column bigint,
  4. ... third_column smallint,
  5. ... fourth_column double precision,
  6. ... fifth_column real,
  7. ... sixth_column char
  8. ... );
  9. CREATE OK, 1 row affected (... sec)
  1. cr> INSERT INTO my_table3 (fourth_column, fifth_column)
  2. ... VALUES ('NaN', 'Infinity');
  3. INSERT OK, 1 row affected (... sec)

IP

IP类型支持将IPv4或IPv6地址进行高效存储和查询. 在内部,IP地址会转化为bigint进行存储. 支持高效排序,过滤和聚合。

  1. cr> create table my_table_ips (
  2. ... fqdn text,
  3. ... ip_addr ip
  4. ... );
  5. CREATE OK, 1 row affected (... sec)
  1. cr> insert into my_table_ips (fqdn, ip_addr)
  2. ... values ('localhost', '127.0.0.1'),
  3. ... ('router.local', '0:0:0:0:0:ffff:c0a8:64');
  4. INSERT OK, 2 rows affected (... sec)
  1. cr> insert into my_table_ips (fqdn, ip_addr)
  2. ... values ('localhost', 'not.a.real.ip');
  3. SQLActionException[ColumnValidationException: Validation failed for ip_addr: Cannot cast 'not.a.real.ip' to type ip]

Ip 地址类型支持二进制操作符 << 来进行 CIDR 子网标记. 语法为 [ip address/prefix_length]. e.g. ‘192.168.1.5’ << ‘192.168.1/24’.

日期/时间

名称 大小 描述 范围
timestamp with time zone 8 bytes 带时区的时间戳 292275054BC - 292278993AD
timestamp without time zone 8 bytes 不带时区的时间戳 292275054BC - 292278993AD
时间戳标记

时间戳由 日期和时间共同组成, 并且可选的带上一个时区信息. 在内部, 时间戳存储为从 1970-01-01T00:00:00Z 至今的UTC 毫秒计数器(bigint). 因此, 时间戳可以合法的与bigint进行转换。

时间戳的标记语法:

  1. date-element [time-separator [time-element [offset]]]
  2. time-separator: 'T' | ' '
  3. date-element: yyyy-MM-dd
  4. time-element: HH:mm:ss [fraction]
  5. fraction: '.' digit+
  6. offset: {+ | -} HH [:mm] | 'Z'

注意, 当写入的 timestamp 小于 -999999999999999 (-29719-04-05T22:13:20.001Z) 或者大于999999999999999 (33658-09-27T01:46:39.999Z) 时, 将会出现溢出问题.

由于内部的时间解析存在限制, 不是所有bigint的数值范围均受到支持, 目前支持的合理解析范围为 (292275054BC - 292278993AD).

带时区的时间戳标记

带时间信息的时间戳将会被转换为时间偏移后的UTC时间戳存储。

  1. cr> select '1970-01-02T00:00:00+0100'::timestamp with time zone as ts_z,
  2. ... '1970-01-02T00:00:00Z'::timestamp with time zone ts_z,
  3. ... '1970-01-02T00:00:00'::timestamp with time zone ts_z,
  4. ... '1970-01-02 00:00:00'::timestamp with time zone ts_z_sql_format;
  5. +----------+----------+----------+-----------------+
  6. | ts_z | ts_z | ts_z | ts_z_sql_format |
  7. +----------+----------+----------+-----------------+
  8. | 82800000 | 86400000 | 86400000 | 86400000 |
  9. +----------+----------+----------+-----------------+
  10. SELECT 1 row in set (... sec)
不带时区信息的时间戳标记
  1. cr> select '1970-01-02T00:00:00+0200'::timestamp without time zone as ts,
  2. ... '1970-01-02T00:00:00+0400'::timestamp without time zone as ts,
  3. ... '1970-01-02T00:00:00Z'::timestamp without time zone as ts,
  4. ... '1970-01-02 00:00:00Z'::timestamp without time zone as ts_sql_format;
  5. +----------+----------+----------+---------------+
  6. | ts | ts | ts | ts_sql_format |
  7. +----------+----------+----------+---------------+
  8. | 86400000 | 86400000 | 86400000 | 86400000 |
  9. +----------+----------+----------+---------------+
  10. SELECT 1 row in set (... sec)

Interval

时间间隔标识一段时间范围, 例如 年-月.

  1. <interval_literal> ::=
  2. INTERVAL [ <sign> ] <string_literal> <interval_qualifier>
  3. <interval_qualifier> ::=
  4. <start_field> [ TO <end_field>]
  5. <start_field> ::= <datetime_field>
  6. <end_field> ::= <datetime_field>
  7. <datetime_field> ::=
  8. YEAR
  9. | MONTH
  10. | DAY
  11. | HOUR
  12. | MINUTE
  13. | SECOND
year-month

year-month 包含年/月信息。

  1. <year_month_literal> ::=
  2. INTERVAL [ {+ | -} ]'yy' <interval_qualifier> |
  3. INTERVAL [ {+ | -} ]'[ yy- ] mm' <interval_qualifier>
  1. cr> select INTERVAL '01-02' YEAR TO MONTH;
  2. +--------------------------------+
  3. | INTERVAL '01-02' YEAR TO MONTH |
  4. +--------------------------------+
  5. | 1 year 2 mons 00:00:00 |
  6. +--------------------------------+
  7. SELECT 1 row in set (... sec)
day-time
  1. <day_time_literal> ::=
  2. INTERVAL [ {+ | -} ]'dd [ <space> hh [ :mm [ :ss ]]]' <interval_qualifier>
  3. INTERVAL [ {+ | -} ]'hh [ :mm [ :ss [ .nn ]]]' <interval_qualifier>
  4. INTERVAL [ {+ | -} ]'mm [ :ss [ .nn ]]' <interval_qualifier>
  5. INTERVAL [ {+ | -} ]'ss [ .nn ]' <interval_qualifier>
  1. cr> select INTERVAL '10 23:10' DAY TO MINUTE;
  2. +-----------------------------------+
  3. | INTERVAL '10 23:10' DAY TO MINUTE |
  4. +-----------------------------------+
  5. | 1 weeks 3 days 23:10:00 |
  6. +-----------------------------------+
  7. SELECT 1 row in set (... sec)
字符串标记
  1. cr> select INTERVAL '1-2 3 4:5:6';
  2. +---------------------------------+
  3. | CAST('1-2 3 4:5:6' AS interval) |
  4. +---------------------------------+
  5. | 1 year 2 mons 3 days 04:05:06 |
  6. +---------------------------------+
  7. SELECT 1 row in set (... sec)
ISO-8601 标记
  1. cr> select INTERVAL 'P1Y2M3DT4H5M6S';
  2. +------------------------------------+
  3. | CAST('P1Y2M3DT4H5M6S' AS interval) |
  4. +------------------------------------+
  5. | 1 year 2 mons 3 days 04:05:06 |
  6. +------------------------------------+
  7. SELECT 1 row in set (... sec)
PostgreSQL 标记
  1. cr> select INTERVAL '1 year 2 months 3 days 4 hours 5 minutes 6 seconds';
  2. +------------------------------------------------------------------------+
  3. | CAST('1 year 2 months 3 days 4 hours 5 minutes 6 seconds' AS interval) |
  4. +------------------------------------------------------------------------+
  5. | 1 year 2 mons 3 days 04:05:06 |
  6. +------------------------------------------------------------------------+
  7. SELECT 1 row in set (... sec)

时间计算

时间戳支持简单的加减计算。

Operand Operator Operand
timestamp - timestamp
interval + timestamp
timestamp +, - interval
interval +, - interval

空间类型

空间形状(geo_shape)

空间形状类型存储地理图形, 标识为GeoJSON对象。一个空间形状列(coumn)支持存储不同类型的GeoJSON地理图形对象.(例如, 同时存储LineString或MultiPolygon图形)。

类型定义
  1. <columnName> geo_shape

为了快速查询空间地理信息, 需要同时指定空间索引.

  1. <columnName> geo_shape INDEX USING geohash WITH (precision='50m', distance_error_pct=0.025)

常见的空间索引结构为 geohash (默认) 和 quadtree.

复杂类型

对象(object)

对象类型允许定义复杂的嵌套结构。每个对象可以包含一个或多个任意类型的字段(field). 对象自身的定义可以是动态类型(schemaless)或者强制地理对象结构(schema). 对象本身也可以定义为JSON二进制块。

定义语法
  1. <columnName> OBJECT [ ({DYNAMIC|STRICT|IGNORED}) ] [ AS ( <columnDefinition>* ) ]
  1. cr> create table my_table11 (
  2. ... title text,
  3. ... col1 object,
  4. ... col3 object(strict) as (
  5. ... age integer,
  6. ... name text,
  7. ... col31 object as (
  8. ... birthday timestamp with time zone
  9. ... )
  10. ... )
  11. ... );
  12. CREATE OK, 1 row affected (... sec)
Strict

每个field可以声明为强制模式(strict), 这意味着新写入的对象必须要包含对象声明式标记为strict的field。

  1. cr> create table my_table12 (
  2. ... title text,
  3. ... author object(strict) as (
  4. ... name text,
  5. ... birthday timestamp with time zone
  6. ... )
  7. ... );
  8. CREATE OK, 1 row affected (... sec)
Dynamic

另外一个field选项为动态标记(dynamic),这意味着该字段可以在生命周期内动态添加字段信息。

  1. cr> create table my_table14 (
  2. ... title text,
  3. ... author object(dynamic) as (
  4. ... name text,
  5. ... birthday timestamp with time zone
  6. ... )
  7. ... );
  8. CREATE OK, 1 row affected (... sec)

对象标记

  1. { [ ident = expr [ , ... ] ] }

空对象

  1. {}

包含Boolean的对象

  1. { my_bool_column = true }

包含文本的对象

  1. { my_str_col = 'this is a text value' }

包含数值的对象

  1. { my_int_col = 1234, my_float_col = 5.6 }

包含集合的对象

  1. { my_array_column = ['v', 'a', 'l', 'u', 'e'] }

驼峰标记必须引用

  1. { "CamelCaseColumn" = 'this is a text value' }

包含嵌套的对象

  1. { nested_obj_colmn = { int_col = 1234, str_col = 'text value' } }

包含占位符的对象

  1. { my_other_column = ? }

组合示例:

  1. { id = 1, name = 'foo', tags = ['apple', 'banana', 'pear'], size = 3.1415, valid = ? }

数组

  1. []
  2. [null]
  3. [1, 2, 3, 4, 5, 6, 7, 8]
  4. ['Zaphod', 'Ford', 'Arthur']
  5. [?]
  6. ARRAY[true, false]
  7. ARRAY[column_a, column_b]
  8. ARRAY[ARRAY[1, 2, 1 + 2], ARRAY[3, 4, 3 + 4]]

类型转换

可以使用 cast 关键词完成类型之间的转换。

  1. cast(expression as type)
  2. expression::type
  1. cr> select cast(port['http'] as boolean) from sys.nodes limit 1;
  2. +-------------------------------+
  3. | CAST(port['http'] AS boolean) |
  4. +-------------------------------+
  5. | TRUE |
  6. +-------------------------------+
  7. SELECT 1 row in set (... sec)
  1. cr> select (2+10)/2::text;
  2. +------------------------------+
  3. | ((2 + 10) / CAST(2 AS text)) |
  4. +------------------------------+
  5. | 6 |
  6. +------------------------------+
  7. SELECT 1 row in set (... sec)

同时, 也支持将array转换为其他类型:

  1. cr> select cast([0,1,5] as array(boolean)) as
  2. ... active_threads from sys.nodes limit 1;
  3. +---------------------+
  4. | active_threads |
  5. +---------------------+
  6. | [false, true, true] |
  7. +---------------------+
  8. SELECT 1 row in set (... sec)

此外, 用户可以用 try_cast 关键词来进行类型转换, try_cast在转换失败时返回 null

  1. try_cast(expression as type)
  1. cr> select try_cast('true' as boolean) from sys.nodes limit 1;
  2. +-----------------------------+
  3. | TRY_CAST('true' AS boolean) |
  4. +-----------------------------+
  5. | TRUE |
  6. +-----------------------------+
  7. SELECT 1 row in set (... sec)
  1. cr> select try_cast(name as integer) from sys.nodes limit 1;
  2. +---------------------------+
  3. | TRY_CAST(name AS integer) |
  4. +---------------------------+
  5. | NULL |
  6. +---------------------------+
  7. SELECT 1 row in set (... sec)

类型别称

为了提供与PostgreSQL的类型兼容性, TSDB 2.0的类型也支持别名。

别名 真实类型
int2 smallint
short smallint
int integer
int4 integer
int8 bigint
long bigint
string text
name text
byte char
float real
double double precision
timestamp timestamp with time zone
timestamptz timestamp with time zone