8.7 常用函数使用指南

更新时间:2018-10-09 07:43:27

ADS支持丰富的函数,本文档通过使用示例来介绍ADS数据库中常用的SQL函数使用及函数功能,更多的函数情况请参阅ADS在线文档。

ADS包含丰富的SQL函数,部分函数需要在MPP计算引起下运行。需要在SQL开头增加:/+engine=MPP/

为了更好的说明函数的含义和功能,首先创建如下一个事实表t_fact_customers。

  1. CREATE TABLE t_fact_customers (
  2. customer_id bigint COMMENT '',
  3. customer_name varchar COMMENT '',
  4. sex int COMMENT '',
  5. age float COMMENT '',
  6. birth_day date COMMENT '',
  7. phone_number varchar COMMENT '',
  8. home_addr varchar COMMENT '',
  9. first_login timestamp COMMENT '',
  10. PRIMARY KEY (customer_id)
  11. )
  12. PARTITION BY HASH KEY (customer_id) PARTITION NUM 8
  13. TABLEGROUP ads_demo
  14. OPTIONS (UPDATETYPE='realtime')
  15. COMMENT ''

插入如下10行测试数据:

  1. INSERT INTO t_fact_customers
  2. (customer_id, customer_name, sex, age, birth_day, phone_number, home_addr, first_login)
  3. VALUES
  4. (1, '王小二', 0, 20.1, '1997-10-01', '11126411777', '北京市朝阳区望京xxxx', '2017-01-10 10:00:11'),
  5. (2, '张大山', 0, 21.4, '1998-02-22', '15126411778', '北京市海淀区中关村xxxx', '2016-02-13 12:08:13'),
  6. (3, '李春梅', 1, 32.6, '1985-11-06', '16126411548', '北京市西城区复兴门xxxx', '2014-11-22 23:04:45'),
  7. (4, '孙大帅', 0, 43.2, '1974-08-04', '17126415478', '北京市东城区王府井xxxx', '2015-09-28 02:03:26'),
  8. (5, '韩美美', 1, 23.8, '1994-12-18', '18126414478', '北京市海淀区中关村xxxx', '2012-08-14 09:18:23'),
  9. (6, '胡衣衣', 1, 54.9, '1963-05-29', '16126412378', '北京市海淀区定慧寺xxxx', '2014-02-09 16:28:42'),
  10. (7, '知文君', 0, 25.9, '1992-10-17', '12126414578', '北京市东城区东郊胡同xxxx', '2017-06-05 12:38:533'),
  11. (8, '段宗宝', 0, 46.1, '1971-07-04', '13126423778', '北京市海淀区杏石口xxxx', '2014-04-28 15:45:19'),
  12. (9, '石小英', 1, 22.3, '1995-04-22', '14126423778', '北京市西城区白云路xxxx', '2015-06-23 22:09:25'),
  13. (10, '林瑜', 0, 28.6, '1989-02-09', '15126423778', '北京市东城区SOHUxxxx', '2016-12-18 15:23:24')
  14. ;

1.比较函数

ANY

  1. /*+engine=MPP*/
  2. select
  3. customer_id, customer_name, birth_day
  4. from t_fact_customers
  5. where customer_id < ANY( VALUES 3,5)
  6. order by customer_id ASC;

返回结果:

  1. customer_id customer_name birth_day
  2. 1 王小二 1997-10-01
  3. 2 张大山 1998-02-22
  4. 3 李春梅 1985-11-06
  5. 4 孙大帅 1974-08-04

ALL

  1. /*+engine=MPP*/
  2. select
  3. customer_id, customer_name, birth_day
  4. from t_fact_customers
  5. where customer_id < ALL( VALUES 3,5)
  6. order by customer_id ASC;

返回结果:

  1. customer_id customer_name birth_day
  2. 1 王小二 1997-10-01
  3. 2 张大山 1998-02-22

SOME

  1. /*+engine=MPP*/
  2. select
  3. customer_id, customer_name, birth_day
  4. from t_fact_customers
  5. where customer_id < SOME( VALUES 3,5)
  6. order by customer_id ASC;

返回结果:

  1. customer_id customer_name birth_day
  2. 1 王小二 1997-10-01
  3. 2 张大山 1998-02-22
  4. 3 李春梅 1985-11-06
  5. 4 孙大帅 1974-08-04

case when

  1. select
  2. customer_id, customer_name, birth_day,
  3. case sex when 0 then '男' when 1 then '女' ELSE '空' end as sex
  4. from t_fact_customers
  5. where customer_id <=3
  6. order by customer_id ASC;

返回结果:

  1. customer_id customer_name birth_day sex
  2. 1 王小二 1997-10-01
  3. 2 张大山 1998-02-22
  4. 3 李春梅 1985-11-06

另外,ADS支持比较函数有:IF,IFNULL,NULLIF。

2.类型转换函数

ADS 会尝试隐式转换数值类型和字符类型值到正确的类型,但是不会在数值类型和字符类型之间自动转换,比如查询select * from t_fact_customers where phone_number>15126411778; 则返回错误信息:varchar和bigint类型转换错误,则需要写成phone_number>’15126411778’。可以显示的通过类型转换函数CAST(),TRY_CAST()来进行类型转换。可以通过TYPEOF()函数获得列的数据类型。

CAST

  1. select
  2. customer_id, birth_day,typeof(birth_day) as col_type1,
  3. '生日:'||CAST(birth_day as VARCHAR) as birth_day,
  4. phone_number, typeof(phone_number) as col_type2,
  5. cast(phone_number as BIGINT) as phone_number
  6. from t_fact_customers
  7. where customer_id <=3
  8. order by customer_id ASC;

返回结果:

  1. 1997-10-01 date 生日:1997-10-01 11126411777 varchar 11126411777
  2. 1998-02-22 date 生日:1998-02-22 15126411778 varchar 15126411778
  3. 1985-11-06 date 生日:1985-11-06 16126411548 varchar 16126411548

TRY_CAST

如尝试将phone_number转换为date数据类型,将提示错误。但有些情况下,满足能转换的值,不能转换的返回null,则可以使用TRY_CAST()函数。

  1. /*+engine=MPP*/
  2. select
  3. customer_id, try_cast('15126411778' as BIGINT),try_cast('abc' as BIGINT)
  4. from t_fact_customers
  5. where customer_id = 1

返回结果:

  1. 1 15126411778 NULL

3.数学函数

ADS支持丰富的数学函数,这里只通过使用示例说明几个函数的使用。

  1. select mod(14,3),random(),random(10),round(10.2293,2)
  2. from t_fact_customers
  3. where customer_id=1;

返回结果:

  1. 2 0.667940130299521 3 10.23

4.字符串函数

ADS字符串使用单引号: ‘string’ 。

字符串连接

可以通过 || 将2个字符串连接,也可以通过concat函数。

  1. select
  2. customer_id, customer_name,'name='||customer_name,concat('name=',customer_name)
  3. from t_fact_customers
  4. where customer_id <=3
  5. order by customer_id ASC;

返回结果:

  1. 1 王小二 name=王小二 name=王小二
  2. 2 张大山 name=张大山 name=张大山
  3. 3 李春梅 name=李春梅 name=李春梅

字符串处理函数

  1. SELECT LENGTH('AliADS'),LOWER('AliADS'),UPPER('AliADS'),
  2. lpad('Ali',6,'ADServer'),rpad('Ali',6,'ADSever'),
  3. LTRIM(' AliADS'),RTRIM('AliADS '),TRIM(' AliADS ')
  4. from t_fact_customers
  5. where customer_id =1;

返回结果:

  1. 6 aliads ALIADS ADSAli AliADS AliADS AliADS AliADS
  1. SELECT
  2. REPLACE('AliADS','AnalyticDB'),REPLACE('AliADS','ADS','AnalyticDB'),
  3. REVERSE('AliADS')
  4. from t_fact_customers
  5. where customer_id =1;

返回结果:

  1. AliADS AliAnalyticDB SDAilA

5.日期时间函数

系统当前时间

  1. select sysdate() from t_fact_customerswhere customer_id =1;

返回结果:

  1. 2017-11-15 15:18:24
  1. /*+engine=MPP*/
  2. select current_date,localtimestamp
  3. from t_fact_customers
  4. where customer_id =1;

返回结果:

  1. 2017-11-15 2017-11-15 15:22:02

时间条件过滤

first_login为timestamp类型,查询条件中直接使用’’传参。

  1. SELECT
  2. customer_id,first_login
  3. from t_fact_customers
  4. where first_login >'2016-01-10 10:00:11'
  5. order by first_login ;

返回结果:

  1. 2 2016-02-13 12:08:13
  2. 10 2016-12-18 15:23:24
  3. 1 2017-01-10 10:00:11
  4. 7 2017-06-05 12:46:53

时间格式转换函数

  1. SELECT
  2. DATE_FORMAT('2017-11-11 11:11:11', '%Y-%m-%d') ,
  3. DATE_FORMAT('2017-11-11 11:11:11', '%Y-%m-%d %H:%i:%s') ,
  4. DATE_FORMAT('2017-11-11 11:11:11', '%Y-%m-%d %T') ,
  5. DATE_FORMAT('2017-11-11 11:11:11', '%W %M %Y')
  6. from t_fact_customers
  7. where customer_id =1;

返回结果:

  1. 2017-10-11 2017-10-11 23:45:11 2017-10-11 11:11:11 Saturday November 2017

时间和字符串转换

  1. SELECT
  2. to_char(ADDDATE(curdate(),-6),'%Y-%m-%d'),
  3. to_char(curdate(),'%Y%m%d')
  4. from t_fact_customers
  5. WHERE customer_id=1

返回结果:

  1. 2017-11-11 20171117

时间计算函数

  1. SELECT
  2. first_login,
  3. ADDDATE(first_login,INTERVAL 2 HOUR),
  4. ADDDATE(first_login,INTERVAL 2 SECOND),
  5. ADDDATE('2017-01-12 10:00:11',INTERVAL 11222 SECOND)
  6. from t_fact_customers
  7. where customer_id <=3;

返回结果:

  1. 2017-01-10 10:00:11 2017-01-10 12:00:11 2017-01-10 10:00:13 2017-01-12 13:07:13
  2. 2016-02-13 12:08:13 2016-02-13 14:08:13 2016-02-13 12:08:15 2017-01-12 13:07:13
  3. 2014-11-22 23:04:45 2014-11-23 01:04:45 2014-11-22 23:04:47 2017-01-12 13:07:13

6.聚集函数

  1. select count(1),count(*),count(customer_id),count(case when customer_id>2 then 1 end),
  2. avg(customer_id),sum(customer_id),
  3. max(customer_id),min(customer_id)
  4. from t_fact_customers
  5. WHERE customer_id>1;

返回结果:

  1. 9 9 9 8 6 54 10 2

如果查询count(distinct X)情况,由于数据分布在不同节点,特别是参与计算的记录数查过一定数据量情况,需要MPP查询,除非按分区键做distinct,否则计算结果不准确。

  1. /*+engine=mpp*/
  2. select count(DISTINCT age)
  3. from t_fact_customers
  4. WHERE customer_id>=1;

返回结果:

  1. 10

如果按分区列进行group by 和 order by情况,可以使用非MPP查询。

  1. select customer_id,count(DISTINCT age)
  2. from t_fact_customers
  3. WHERE customer_id<=5
  4. group by customer_id;

返回结果:

  1. 1 1
  2. 2 1
  3. 3 1
  4. 4 1
  5. 5 1

7.窗口函数

窗口函数可以按不同的分组进行排序,再计算。

使用格式:function() over (partition by ... order by .. )

rank() over

返回数据集中每个值的排名。排名值是根据当前行之前的行数加1,不包含当前行。因此,排序的关联值可能产生顺序上的空隙。 这个排名会对每个窗口分区进行计算。

如下SQL查询满足条例的客户信息,按性别分组、按年龄从大到小排序,按年龄大小排名。

  1. /*+engine=mpp*/
  2. SELECT customer_id,customer_name,
  3. decode(sex,0,'男','女') sex,round(age) age,
  4. rank() OVER (PARTITION BY sex ORDER BY age DESC) AS rnk
  5. FROM t_fact_customers
  6. WHERE customer_id>=1 and customer_id<=10;

返回结果:

  1. 8 段宗宝 46 1
  2. 4 孙大帅 43 2
  3. 10 林瑜 29 3
  4. 7 知文君 26 4
  5. 2 张大山 21 5
  6. 1 王小二 20 6
  7. 6 胡衣衣 55 1
  8. 3 李春梅 33 2
  9. 5 韩美美 24 3
  10. 9 石小英 22 4

若输出结果,需要输出男、女中年龄排名前三的客户信息。

  1. /*+engine=mpp*/
  2. SELECT
  3. customer_id, customer_name,
  4. decode(sex,0,'男','女') sex,ROUND(age) age,
  5. rnk
  6. FROM
  7. (
  8. SELECT
  9. customer_id,customer_name,sex,age,
  10. rank() OVER( PARTITION BY sex ORDER BY age DESC) AS rnk
  11. FROM t_fact_customers
  12. WHERE customer_id >= 1 and customer_id <= 10
  13. ) as tmp_result
  14. WHERE rnk <= 3;

返回结果:

  1. 8 段宗宝 46 1
  2. 4 孙大帅 43 2
  3. 10 林瑜 29 3
  4. 6 胡衣衣 55 1
  5. 3 李春梅 33 2
  6. 5 韩美美 24 3

row_number() over

为每行数据返回一个唯一的顺序的行号,从1开始,根据行在窗口分区内的顺序。

  1. /*+engine=mpp*/
  2. SELECT customer_id,customer_name,
  3. decode(sex,0,'男','女') sex,ROUND(age) age,
  4. row_number() OVER (PARTITION BY sex ORDER BY age DESC) AS rnk
  5. FROM t_fact_customers
  6. WHERE customer_id>=1 and customer_id<=10;

返回结果:

  1. 8 段宗宝 46 1
  2. 4 孙大帅 43 2
  3. 10 林瑜 29 3
  4. 7 知文君 26 4
  5. 2 张大山 21 5
  6. 1 王小二 20 6
  7. 6 胡衣衣 55 1
  8. 3 李春梅 33 2
  9. 5 韩美美 24 3
  10. 9 石小英 22 4

若输出结果,需要输出男、女中年龄排名前三的客户信息。

  1. /*+engine=mpp*/
  2. SELECT
  3. customer_id, customer_name,
  4. decode(sex,0,'男','女') sex,ROUND(age) age,
  5. rnk
  6. FROM
  7. (
  8. SELECT
  9. customer_id,customer_name,sex,age,
  10. row_number() OVER( PARTITION BY sex ORDER BY age DESC) AS rnk
  11. FROM t_fact_customers
  12. WHERE customer_id >= 1 and customer_id <= 10
  13. ) as tmp_result
  14. WHERE rnk <= 3;

返回结果:

  1. 8 段宗宝 46 1
  2. 4 孙大帅 43 2
  3. 10 林瑜 29 3
  4. 6 胡衣衣 55 1
  5. 3 李春梅 33 2
  6. 5 韩美美 24 3