本节为您介绍什么是代价以及代价的相关概念。

概述

物理优化是基于代价的查询优化,执行代价由IO代价和CPU代价组成。

  • IO代价的评估方式请参见
  • CPU代价的评估方式请参见

统计信息

  • 高频值

    表示常见值,例如在表t1中,a字段大小是1~100,其中1~10的值占据了95%,1-10的值就称为高频值。高频值用于等值查询,进行评估选择性。

    高频值
  • 直方图

    表示数据值的分布情况,例如在表t1中,a字段大小是1~100,可以分为4个桶,1~25的值有30个,26~50的值有20个,51~75的值有25个,76~100的值有25个。

    直方图
  • 相关系数

    表示某一列的物理顺序和逻辑顺序的相关性,相关性越高,走索引扫描离散块扫描代价越低。

    相关系数
  • 其他统计信息
    • 唯一值个数
    • Null值比率
    • 表的行数
    • 表的页面数

选择率

  • 无条件查询
    EXPLAIN SELECT * FROM tenk1;
    
                             QUERY PLAN
    -------------------------------------------------------------
     Seq Scan on tenk1  (cost=0.00..458.00 rows=10000 width=244)
    
    
     SELECT relpages, reltuples FROM pg_class WHERE relname = 'tenk1';
    
     relpages | reltuples
       ----------+-----------
             358 |     10000
  • 范围查询
    EXPLAIN SELECT * FROM tenk1 WHERE unique1 < 1000;
    
                                       QUERY PLAN
    --------------------------------------------------------------------------------
     Bitmap Heap Scan on tenk1  (cost=24.06..394.64 rows=1007 width=244)
       Recheck Cond: (unique1 < 1000)
       ->  Bitmap Index Scan on tenk1_unique1  (cost=0.00..23.80 rows=1007 width=0)
             Index Cond: (unique1 < 1000)
  • 范围查询计算公式
    SELECT histogram_bounds FROM pg_stats
    WHERE tablename='tenk1' AND attname='unique1';
    
                       histogram_bounds
    ------------------------------------------------------
     {0,993,1997,3050,4040,5036,5957,7057,8029,9016,9995}
    
    selectivity = (1 + (1000 - bucket[2].min)/(bucket[2].max - bucket[2].min))/num_buckets
                = (1 + (1000 - 993)/(1997 - 993))/10
                = 0.100697
    
    rows = rel_cardinality * selectivity
         = 10000 * 0.100697
         = 1007  (rounding off)
  • 等值查询
    EXPLAIN SELECT * FROM tenk1 WHERE stringu1 = 'CRAAAA';
    
                            QUERY PLAN
    ----------------------------------------------------------
     Seq Scan on tenk1  (cost=0.00..483.00 rows=30 width=244)
       Filter: (stringu1 = 'CRAAAA'::name)
  • 等值查询计算公式
    SELECT null_frac, n_distinct, most_common_vals, most_common_freqs FROM pg_stats
    WHERE tablename='tenk1' AND attname='stringu1';
    
    null_frac         | 0
    n_distinct        | 676
    most_common_vals|{EJAAAA,BBAAAA,CRAAAA,FCAAAA,FEAAAA,GSAAAA,JOAAAA,MCAAAA,NAAAAA,WGAAAA}
    most_common_freqs | {0.00333333,0.003,0.003,0.003,0.003,0.003,0.003,0.003,0.003,0.003}
    
    selectivity = mcf[3]  = 0.003
    
    rows = 10000 * 0.003 = 30
    
    ## 备注:如果值不在most_common_vals里面,计算公式为selectivity = (1 - sum(mvf))/(num_distinct - num_mcv)