Hologres从 V1.3版本开始提供表统计信息日志系统表(hologres.hg_table_info)按日收集实例内表的统计信息,帮助您对实例中的表信息进行查看、分析,以便您可以根据这些信息采取优化措施。本文将会介绍在Hologres中如何查看表统计信息并分析。
使用限制
仅Hologres V1.3及以上版本支持查看表统计信息,如果您的实例是V1.3以下版本,请您使用自助升级或加入Hologres钉钉交流群反馈,详情请参见如何获取更多的在线支持?。
hologres.hg_table_info表的产出时效是T+1,当天的数据大概会在第二天凌晨5点前更新完成。Hologres实例从 V1.1版本升级到 V1.3版本的当天不会产出表统计信息,查询时提示:
meta warehouse store currently not available
,需要在升级后的第二天查询表统计信息。
注意事项
表统计信息日志默认保留30天的数据。
对于Hologres非分区内部表(type='TABLE'),可以查到详细统计信息,如存储空间、文件数、访问累积次数,行记录数。
对于其他对象(视图,物化视图,外表,分区父表),只能查到基本信息,如分区数量,外表对应外部表名,物化视图与视图定义等。
hologres.hg_table_info表属于Hologres的元仓系统,hologres.hg_table_info查询失败不会影响实例中的业务Query运行,故hologres.hg_table_info表的稳定性不在产品的SLA保护范围内。
hg_table_info表
表统计信息日志存储在hologres.hg_table_info系统表里,实例升级到 V1.3版本后,将会默认按天采集表信息, hg_table_info
表主要包含的字段信息如下。
字段 | 类型 | 说明
--------------------+--------------------------+-----------------------------------------------------------------------
db_name | text | 表所在数据库名称。
schema_name | text | 表所在Schema名称。
table_name | text | 表名称。
table_id | text | 表的惟一标识,外表id使用db.schema.table。
type | text | 表类型(TABLE、FOREIGN TABLE、PARTITION、VIEW、MATERIALIZED VIEW)。
partition_spec | text | 分区条件(分区子表有效)。
is_partition | boolean | 是否是分区子表。
owner_name | text | 表Owner的用户名,可与hologres.hg_query_log的usename列做join。
create_time | timestamp with time zone | 表的创建时间。
last_ddl_time | timestamp with time zone | 最后一次更新表信息的时间。
last_modify_time | timestamp with time zone | 最后一次用户修改表中数据的时间。
last_access_time | timestamp with time zone | 表的最后访问时间。
view_def | text | 视图的定义(只对视图有效)。
comment | text | 表或视图的描述信息。
hot_storage_size | bigint | 表占用的热存空间,单位:Byte。
cold_storage_size | bigint | 表占用的冷存空间,单位:Byte。
hot_file_count | bigint | 表的热存文件数。
cold_file_count | bigint | 表的冷存文件数。
table_meta | jsonb | 原始的Meta信息,格式为JSON。
row_count | bigint | 表或者分区的行记录数。
collect_time | timestamp with time zone | 本次上报的数据采集时间。
partition_count | bigint | 分区子表数量(当前表为分区父表时有效)。
parent_schema_name | text | 分区子表的父表schema名(当前表为分区子表时有效)。
parent_table_name | text | 分区子表的父表表名(当前表为分区子表时有效)。
total_read_count | bigint | 累计读表次数(非精确,SELECT,INSERT,UPDATE,DELETE 均会导致次数增加)。
total_write_count | bigint | 累计写表次数(非精确,INSERT,UPDATE,DELETE 均会导致次数增加)。
存在部分字段值为空的情况,属于历史创建的表未能统计到创建信息导致,实例升级到V1.3版本之后创建的表可以统计到。
当
type
为VIEW
时,create_time
、last_ddl_time
字段为空。当
type
为VIEW、FOREIGN TABLE、PARTITION三者其中之一时,last_modify_time
、last_access_time
、hot_file_count
、cold_file_count
、total_read_count
、total_write_count
字段值为空(无记录)。分别使用
hg_table_info
和pg_relation_size
查询存储量大小存在差异属于正常情况。因为hg_table_info
信息按天上报,并且pg_relation_size
不包含Binlog存储大小。
授予查看权限
表统计信息日志需要有一定的权限才能查看,其权限规则和授权方式说明如下。
查看Hologres实例所有数据库的表统计信息日志。
授予用户Superuser权限。
Superuser账号可以查看Hologres实例所有数据库的表统计信息日志,给用户授予Superuser用户的权限,使用户有权限查看实例所有数据库的表统计信息日志。
--将“云账号ID”替换为实际用户名。如果是RAM用户,账号ID前需要添加“p4_”。 ALTER USER "云账号ID" SUPERUSER;
将用户添加到
pg_stat_scan_table
用户组。除Superuser外,Hologres还支持通过设置用户组
pg_stat_scan_tables
(V1.3.44以前版本)或pg_read_all_stats
(V1.3.44及以上版本)查看所有DB表统计信息日志,普通用户如果需要查看所有日志,可以联系Superuser授权加入该用户组。授权命令如下。-- V1.3.44以前版本 GRANT pg_stat_scan_tables TO "云账号ID";--专家权限模型授权 CALL spm_grant('pg_stat_scan_tables', '云账号ID'); -- SPM权限模型 CALL slpm_grant('pg_stat_scan_tables', '云账号ID'); -- SLPM权限模型 -- V1.3.44及以上版本 GRANT pg_read_all_stats TO "云账号ID";--专家权限模型授权 CALL spm_grant('pg_read_all_stats', '云账号ID'); -- SPM权限模型 CALL slpm_grant('pg_read_all_stats', '云账号ID'); -- SLPM权限模型
查看本数据库的表统计信息日志。
开启简单权限模型(SPM)或基于Schema级别的简单权限模型(SLPM),将用户加入
db_admin
用户组,db_admin
角色可以查看本数据库的表统计信息日志。说明普通用户只能查询当前账号对应数据库下自己为Owner的表统计信息日志。
CALL spm_grant('<db_name>_admin', '云账号ID'); -- SPM权限模型 CALL slpm_grant('<db_name>.admin', '云账号ID'); -- SLPM权限模型
查询表统计信息趋势的SQL命令
场景1:查询Hologres实例各表统计的日变化趋势
-- 实例全局所有表的近一周的趋势变化:占用存储空间、文件数、访问累积次数,行记录数
SELECT
db_name,
schema_name,
table_name,
collect_time :: date as collect_date,
hot_storage_size,
cold_storage_size,
hot_file_count,
cold_file_count,
total_read_count,
total_write_count,
row_count
FROM
hologres.hg_table_info
WHERE
collect_time > (current_date - interval '1 week')::timestamptz -- 近一周
AND type = 'TABLE'
;
-- 对于分区父表,可以通过聚合函数做统计
SELECT
db_name,
parent_schema_name AS schema_name,
parent_table_name AS table_name,
collect_time :: date AS collect_date,
sum(hot_file_count) AS hot_file_count,
sum(cold_file_count) AS cold_file_count,
sum(hot_storage_size) AS hot_storage_size,
sum(cold_storage_size) AS cold_storage_size,
sum(total_read_count) AS total_read_count,
sum(total_write_count) AS total_write_count,
sum(row_count) AS row_count,
count(*) AS partition_count
FROM
hologres.hg_table_info
WHERE
is_partition
GROUP BY
collect_date,
db_name,
parent_schema_name,
parent_table_name
;
场景2:查询占用磁盘空间较大表的总访问情况
-- 查看占用磁盘空间最大的 (10) 个表的访问情况
SELECT
db_name,
schema_name,
table_name,
hot_storage_size + cold_storage_size as total_storage_size,
row_count,
total_read_count,
total_write_count
FROM
hologres.hg_table_info
WHERE
collect_time > (current_date - interval '1 week')::timestamptz -- 近一周
AND type = 'TABLE'
AND (
cold_storage_size is NOT NULL
OR hot_storage_size IS NOT NULL
)
ORDER BY
total_storage_size DESC
LIMIT
10;
场景3:查询占用磁盘空间较大表的近期总访问趋势
-- 查看昨天占用磁盘空间最大的 (10) 个表的近一周总访问趋势
with tmp_table_info AS (
SELECT
db_name,
schema_name,
table_name,
collect_time,
hot_storage_size + cold_storage_size as total_storage_size,
row_count,
total_read_count,
total_write_count
FROM
hologres.hg_table_info
WHERE
collect_time > (current_date - interval '1 week')::timestamptz -- 近一周
AND type = 'TABLE'
AND (
hot_storage_size IS NOT NULL
OR cold_storage_size IS NOT NULL
)
)
SELECT
d.*
FROM
(
SELECT
db_name,
schema_name,
table_name,
total_storage_size
FROM
tmp_table_info
WHERE
collect_time > (current_date - interval '1 day')::timestamptz -- 查询昨天
and collect_time < current_date::timestamptz
ORDER BY
total_storage_size DESC
LIMIT
10
) t -- 昨天占用大的 10 个表
left join (
SELECT
db_name,
schema_name,
table_name,
collect_time :: date as collect_date,
total_read_count,
total_write_count,
total_storage_size,
row_count,
total_storage_size,
total_read_count,
total_write_count,
row_count
FROM
tmp_table_info
) d -- 7 天数据日变化量
ON t.db_name = d.db_name
AND t.schema_name = d.schema_name
AND t.table_name = d.table_name
ORDER BY
t.total_storage_size desc,
d.collect_date desc,
d.db_name,
d.schema_name,
d.table_name;
场景4:查询占用磁盘空间较大表的近期日访问情况与数据量变化
-- 查看占用磁盘空间最大的 (10) 个表的近一周的日访问情况与数据量变化
with tmp_table_info AS (
SELECT
db_name,
schema_name,
table_name,
collect_time,
hot_storage_size + cold_storage_size as total_storage_size,
row_count,
total_read_count,
total_write_count
FROM
hologres.hg_table_info
WHERE
collect_time > (current_date - interval '1 week')::timestamptz -- 近一周
AND type = 'TABLE'
AND (
hot_storage_size IS NOT NULL
OR cold_storage_size IS NOT NULL
)
)
SELECT
d.*
FROM
(
SELECT
db_name,
schema_name,
table_name,
total_storage_size
FROM
tmp_table_info
WHERE
collect_time > (current_date - interval '1 day')::timestamptz -- 查询昨天
and collect_time < current_date::timestamptz
ORDER BY
total_storage_size DESC
LIMIT
10
) t -- 昨天占用大的 10 个表
left join (
SELECT
db_name,
schema_name,
table_name,
collect_time :: date as collect_date,
first_value(total_read_count) OVER w - last_value(total_read_count) OVER w AS day_read_count,
first_value(total_write_count) OVER w - last_value(total_write_count) OVER w AS day_write_count,
first_value(row_count) OVER w - last_value(row_count) OVER w AS day_modify_count
FROM
tmp_table_info
WINDOW w AS (
PARTITION BY
db_name,
schema_name,
table_name
ORDER BY
collect_time DESC
ROWS BETWEEN
CURRENT ROW AND 1 FOLLOWING -- 取两天的差值
)
) d -- 7 天数据日变化量
ON t.db_name = d.db_name
AND t.schema_name = d.schema_name
AND t.table_name = d.table_name
ORDER BY
t.total_storage_size desc,
d.collect_date desc,
d.db_name,
d.schema_name,
d.table_name;
场景5:查询占用磁盘空间较大表近一周总的访问情况与数据量变化
-- 查看占用磁盘空间最大的 (10) 个表的近一周的访问情况与数据量变化
with tmp_table_info AS (
SELECT
db_name,
schema_name,
table_name,
collect_time,
hot_storage_size + cold_storage_size as total_storage_size,
row_count,
total_read_count,
total_write_count
FROM
hologres.hg_table_info
WHERE
collect_time > (current_date - interval '1 week')::timestamptz -- 近一周
AND type = 'TABLE'
AND (
hot_storage_size IS NOT NULL
OR cold_storage_size IS NOT NULL
)
)
SELECT
d.*
FROM
(
SELECT
db_name,
schema_name,
table_name,
total_storage_size
FROM
tmp_table_info
WHERE
collect_time > (current_date - interval '1 day')::timestamptz -- 查询昨天
and collect_time < current_date::timestamptz
ORDER BY
total_storage_size DESC
LIMIT
10
) t -- 昨天占用大的 10 个表
left join (
SELECT
db_name,
schema_name,
table_name,
collect_time,
first_value(total_read_count) OVER w - last_value(total_read_count) OVER w AS week_read_count,
first_value(total_write_count) OVER w - last_value(total_write_count) OVER w AS week_write_count,
first_value(row_count) OVER w - last_value(row_count) OVER w AS week_modify_count
FROM
tmp_table_info
WINDOW w AS (
PARTITION BY
db_name,
schema_name,
table_name
ORDER BY
collect_time DESC
ROWS BETWEEN
CURRENT ROW AND 6 FOLLOWING -- 取一周的差值
)
) d -- 7 天数据日变化量
ON t.db_name = d.db_name
AND t.schema_name = d.schema_name
AND t.table_name = d.table_name
WHERE
d.collect_time > (current_date - interval '1 day')::timestamptz -- 查询昨天
and d.collect_time < current_date::timestamptz
ORDER BY
t.total_storage_size desc,
d.db_name,
d.schema_name,
d.table_name;
场景6:查询所有表中近一周较少访问表的磁盘占用
-- 查最近(7)天最少访问的(10)个表大小
SELECT
*
FROM
(
SELECT
collect_time,
schema_name,
table_name,
hot_storage_size + cold_storage_size as total_storage_size,
first_value(total_read_count) OVER w - last_value(total_read_count) OVER w AS recent_read_count
FROM
hologres.hg_table_info
WHERE
collect_time > (current_date - interval '1 week')::timestamptz -- 近一周
AND type = 'TABLE'
AND (
hot_storage_size IS NOT NULL
OR cold_storage_size IS NOT NULL
)
WINDOW w AS (
PARTITION BY
db_name,
schema_name,
table_name
ORDER BY
collect_time DESC
ROWS BETWEEN
CURRENT ROW AND 6 FOLLOWING -- 近七天
)
) tmp
WHERE
collect_time > (current_date - interval '1 day')::timestamptz
and collect_time < current_date::timestamptz -- 只保留截止昨天的七天总访问量
ORDER BY
recent_read_count
LIMIT
10;
场景7:查询表最近一次修改表数据那天的行数变化
-- 查看表最近一次修改时间,相对前一次修改时间 的总共修改数据量
-- 当实例表数量很大时,建议对 CTE tmp_table_info 做过滤,以免因拉数据太大而导致查询时间太久
with tmp_table_info AS (
SELECT
db_name,
schema_name,
table_name,
row_count,
collect_time,
last_modify_time
FROM
hologres.hg_table_info
WHERE
last_modify_time IS NOT NULL
AND type = 'TABLE'
-- 在这里对 tmp_table_info 做一些过滤
-- 如 collect_time > (current_date - interval '14 day'):: timestamptz
-- 如 table_name like ''
-- 如 type = 'PARTITION'
)
SELECT
end_data.db_name AS db_name,
end_data.schema_name AS schema_name,
end_data.table_name AS table_name,
(end_data.row_count - start_data.row_count) AS modify_row_count,
end_data.row_count AS current_rows,
end_data.last_modify_time AS last_modify_time
FROM
(
SELECT
db_name,
schema_name,
table_name,
row_count,
last_modify_time
FROM
tmp_table_info
WHERE
collect_time > (current_date - interval '1 day')::timestamptz -- 查询昨天记录的表的最后修改时间
) end_data
left join (
SELECT
db_name,
schema_name,
table_name,
row_count,
collect_time
FROM
tmp_table_info
) start_data ON (
end_data.db_name = start_data.db_name
AND end_data.schema_name = start_data.schema_name
AND end_data.table_name = start_data.table_name
AND end_data.last_modify_time::date = (start_data.collect_time + interval '1 day')::date
)
;
场景8:查询小文件过多而导致占用磁盘空间大的表
-- 查看每个表的文件数和占用磁盘大小,并按平均文件大小排序
-- table group 只能显示当前 db 的 shard count,其它 db 的显示为空
SELECT
db_name,
schema_name,
table_name,
cold_storage_size + hot_storage_size as total_storage_size,
cold_file_count + hot_file_count as total_file_count,
(cold_storage_size + hot_storage_size) / (cold_file_count + hot_file_count) AS avg_file_size,
tmp_table_info.table_meta ->> 'table_group' as table_group,
tg_info.shard_count
FROM
hologres.hg_table_info tmp_table_info
left join (
SELECT
tablegroup_name,
property_value as shard_count
FROM
hologres.hg_table_group_properties
WHERE
property_key = 'shard_count'
) tg_info on tmp_table_info.table_meta ->> 'table_group' = tg_info.tablegroup_name
WHERE
collect_time > (current_date - interval '1 day')::timestamptz
AND type = 'TABLE'
AND (
cold_storage_size IS NOT NULL
OR hot_storage_size IS NOT NULL
)
AND (
cold_file_count IS NOT NULL
OR hot_file_count IS NOT NULL
)
AND cold_file_count + hot_file_count <> 0
ORDER BY
avg_file_size
;
- 本页导读 (1)