HybridDB for MySQL的分析引擎支持在SQL的全文检索语法。需要说明的是,HybridDB for MySQL的全文检索能力和SQL中的like是有区别的,这里的全文检索是指在SQL中需要分词的搜索需求,而SQL中的like是模糊匹配的需求,只需要字符串列建索引就可以。目前HybridDB for MySQL的全文检索性能可以达到亿级别数据毫秒级响应。

全文检索列,底层实现是多值列,只是分词器为nlp。

建表

CREATE TABLE test3 (
    user_id BIGINT,
    city VARCHAR,
    text MULTIVALUE nlp_tokenizer 'ik' value_type 'varchar'
)
DISTRIBUTE BY HASH (user_id)
ENGINE = 'CSTORE'
INDEX_ALL='Y';

说明
  • 全文检索列,底层实现是多值列,只是分词器类型为nlp_tokenizer
  • 默认的分词器为ElasticSearch的ik分词器

写入数据

insert into test3 values(1, 'HZ', '中华人民共和国');
insert into test3 values(2, 'BJ', 'HybridDB for MySQL是新型HTAP数据库');
insert into test3 values(3, 'SH', 'hello, world');


mysql> select * from test3 order by user_id;
+---------+------+------------------------------------------+
| user_id | city | text                                     |
+---------+------+------------------------------------------+
|       1 | HZ   | 中华人民共和国                           |
|       2 | BJ   | HybridDB for MySQL是新型HTAP数据库       |
|       3 | SH   | hello, world                             |
+---------+------+------------------------------------------+
3 rows in set (0.02 sec)

检索查询

mysql> select * from test3 where text in ('中华', '数据库');
+---------+------+------------------------------------------+
| user_id | city | text                                     |
+---------+------+------------------------------------------+
|       1 | HZ   | 中华人民共和国                           |
|       2 | BJ   | HybridDB for MySQL是新型HTAP数据库       |
+---------+------+------------------------------------------+
2 rows in set (0.30 sec)

mysql> select * from test3 where text in ('hello') and city != 'HZ';
+---------+------+--------------+
| user_id | city | text         |
+---------+------+--------------+
|       3 | SH   | hello, world |
+---------+------+--------------+
1 row in set (0.93 sec)


mysql> select * from test3 where ref(text,0) in ('hybriddb');
+---------+------+---------------------------------------------+
| user_id | city | text                                        |
+---------+------+---------------------------------------------+
|       2 | BJ   | HybridDB for MySQL是新型HTAP数据库          |
+---------+------+---------------------------------------------+
4 rows in set (0.02 sec)

# 普通字符串列也可以做like查找
mysql> select * from test3 where city like '%J%';
+---------+------+---------------------------------------------+
| user_id | city | text                                        |
+---------+------+---------------------------------------------+
|       2 | BJ   | HybridDB for MySQL是新型HTAP数据库          |
+---------+------+---------------------------------------------+
4 rows in set (0.02 sec)

注意

全文检索的英文单词,默认都是小写归一化处理。