基于PostgreSQL数据库的订单列表查询优化探索

订单列表查询优化方案(探索)

为了探索一个可以支撑订单列表查询快速响应的可行方案,我们分析了业务模型,发现业务将存在以下特点:

  1. 一张有效订单需要在72小时之内完成。
  2. 有效的订单会持续不断的进行处理。
  3. 业务人员更加关心未完成订单。
  4. 双11会在10月份之后就开始进行预售。

我们创建1000万条订单记录和1000多万条(1 : 1.3)订单明细记录来模拟数据库的压力。
示例项目地址:https://github.com/dengbin19910916/pg-test.git

欢迎大家提供更加优秀的解决方案。

名词解释:

trade_order:订单表

order_item:订单明细表

code_dict:代码值字典

1.1. 订单总量较大的问题

我们可以通过分区表来降低数据索引大小来加快数据的查询速度,先根据订单渠道分区,再根据订单的下单时间进行分区,
这样每一个分区表的数据将被控制在合理的范围内,以下分区仅供测试使用,分区请根据实际情况进行合理的调整,

-- 订单表
create table trade_order
(
    id                     bigserial     not null,
    channel                varchar(4)    not null,
    out_id                 varchar(100)  not null,
    shop_code              varchar(50)   not null,
    shop_name              varchar(200)  not null,
    paid_time              timestamp     null,
    buyer_name             varchar(200)  not null,
    buyer_memo             varchar(2000) null,
    receiver_name          varchar(200)  not null,
    receiver_mobile        varchar(20)   not null,
    receiver_country_code  varchar(50)   null,
    receiver_city_code     varchar(50)   null,
    receiver_province_code varchar(50)   null,
    receiver_area_code     varchar(50)   null,
    receiver_address       varchar(50)   null,
    receiver_zip           varchar(50)   null,
    status                 varchar(4)    not null,
    created_time           timestamp     not null,
    updated_time           timestamp     not null,
    constraint trade_order_pk primary key (id, channel, created_time) -- 分区表的分区键必须包含在主键中
) partition by list (channel);

create table trade_order_tm partition of trade_order for values in (1) partition by range (created_time);
create table trade_order_tm_201901 partition of trade_order_tm for values from ('2019-01-01') to ('2019-02-01');
create table trade_order_tm_201902 partition of trade_order_tm for values from ('2019-02-01') to ('2019-03-01');
create table trade_order_tm_201903 partition of trade_order_tm for values from ('2019-03-01') to ('2019-04-01');
create table trade_order_tm_201904 partition of trade_order_tm for values from ('2019-04-01') to ('2019-05-01');
create table trade_order_tm_201905 partition of trade_order_tm for values from ('2019-05-01') to ('2019-06-01');
create table trade_order_tm_201906 partition of trade_order_tm for values from ('2019-06-01') to ('2019-07-01');
create table trade_order_tm_201907 partition of trade_order_tm for values from ('2019-07-01') to ('2019-08-01');
create table trade_order_tm_201908 partition of trade_order_tm for values from ('2019-08-01') to ('2019-09-01');
create table trade_order_tm_201909 partition of trade_order_tm for values from ('2019-09-01') to ('2019-10-01');
create table trade_order_tm_201910 partition of trade_order_tm for values from ('2019-10-01') to ('2019-11-01');
create table trade_order_tm_201911 partition of trade_order_tm for values from ('2019-11-01') to ('2019-12-01');
create table trade_order_tm_201912 partition of trade_order_tm for values from ('2019-12-01') to ('2020-01-01');

create table trade_order_jd_vip partition of trade_order for values in (2, 3) partition by range (created_time);
create table trade_order_jd_vip_201901 partition of trade_order_jd_vip for values from ('2019-01-01') to ('2019-02-01');
create table trade_order_jd_vip_201902 partition of trade_order_jd_vip for values from ('2019-02-01') to ('2019-03-01');
create table trade_order_jd_vip_201903 partition of trade_order_jd_vip for values from ('2019-03-01') to ('2019-04-01');
create table trade_order_jd_vip_201904 partition of trade_order_jd_vip for values from ('2019-04-01') to ('2019-05-01');
create table trade_order_jd_vip_201905 partition of trade_order_jd_vip for values from ('2019-05-01') to ('2019-06-01');
create table trade_order_jd_vip_201906 partition of trade_order_jd_vip for values from ('2019-06-01') to ('2019-07-01');
create table trade_order_jd_vip_201907 partition of trade_order_jd_vip for values from ('2019-07-01') to ('2019-08-01');
create table trade_order_jd_vip_201908 partition of trade_order_jd_vip for values from ('2019-08-01') to ('2019-09-01');
create table trade_order_jd_vip_201909 partition of trade_order_jd_vip for values from ('2019-09-01') to ('2019-10-01');
create table trade_order_jd_vip_201910 partition of trade_order_jd_vip for values from ('2019-10-01') to ('2019-11-01');
create table trade_order_jd_vip_201911 partition of trade_order_jd_vip for values from ('2019-11-01') to ('2019-12-01');
create table trade_order_jd_vip_201912 partition of trade_order_jd_vip for values from ('2019-12-01') to ('2020-01-01');

create table trade_order_gw partition of trade_order for values in (4) partition by range (created_time);
create table trade_order_gw_201901 partition of trade_order_gw for values from ('2019-01-01') to ('2019-02-01');
create table trade_order_gw_201902 partition of trade_order_gw for values from ('2019-02-01') to ('2019-03-01');
create table trade_order_gw_201903 partition of trade_order_gw for values from ('2019-03-01') to ('2019-04-01');
create table trade_order_gw_201904 partition of trade_order_gw for values from ('2019-04-01') to ('2019-05-01');
create table trade_order_gw_201905 partition of trade_order_gw for values from ('2019-05-01') to ('2019-06-01');
create table trade_order_gw_201906 partition of trade_order_gw for values from ('2019-06-01') to ('2019-07-01');
create table trade_order_gw_201907 partition of trade_order_gw for values from ('2019-07-01') to ('2019-08-01');
create table trade_order_gw_201908 partition of trade_order_gw for values from ('2019-08-01') to ('2019-09-01');
create table trade_order_gw_201909 partition of trade_order_gw for values from ('2019-09-01') to ('2019-10-01');
create table trade_order_gw_201910 partition of trade_order_gw for values from ('2019-10-01') to ('2019-11-01');
create table trade_order_gw_201911 partition of trade_order_gw for values from ('2019-11-01') to ('2019-12-01');
create table trade_order_gw_201912 partition of trade_order_gw for values from ('2019-12-01') to ('2020-01-01');



comment on table trade_order is '订单表';

comment on column trade_order.id is '订单ID';
comment on column trade_order.channel is '渠道';
comment on column trade_order.out_id is '外部订单ID';
comment on column trade_order.shop_code is '店铺编码';
comment on column trade_order.shop_name is '店铺名称';
comment on column trade_order.paid_time is '支付时间';
comment on column trade_order.buyer_name is '买家昵称';
comment on column trade_order.buyer_memo is '买家备注';
comment on column trade_order.receiver_name is '收货人昵称';
comment on column trade_order.receiver_mobile is '收货人手机号';
comment on column trade_order.receiver_country_code is '收货人国家编码';
comment on column trade_order.receiver_province_code is '收货人省份编码';
comment on column trade_order.receiver_city_code is '收货人城市编码';
comment on column trade_order.receiver_area_code is '收货人地区编码';
comment on column trade_order.receiver_address is '收货人详细地址';
comment on column trade_order.receiver_zip is '收货人邮政编码';
comment on column trade_order.status is '订单状态';
comment on column trade_order.created_time is '创建时间';
comment on column trade_order.updated_time is '最后时间';


-- 订单明细表
create table order_item
(
    id                 bigserial      not null,
    order_id           bigint         not null,
    order_channel      varchar(4)     not null,
    serial             int            not null,
    sku_id             varchar(50)    not null,
    name               varchar(200)   not null,
    quantity           int            not null,
    tag_price          decimal(16, 2) not null,
    purchase_price     decimal(16, 2) not null,
    given              boolean        not null default false,
    order_created_time timestamp      not null,
    created_time       timestamp      not null,
    updated_time       timestamp      not null,
    constraint order_item_pk
        primary key (id, order_channel, order_created_time)
) partition by list (order_channel);

create table order_item_tm partition of order_item for values in (1) partition by range (order_created_time);
create table order_item_tm_201901 partition of order_item_tm for values from ('2019-01-01') to ('2019-02-01');
create table order_item_tm_201902 partition of order_item_tm for values from ('2019-02-01') to ('2019-03-01');
create table order_item_tm_201903 partition of order_item_tm for values from ('2019-03-01') to ('2019-04-01');
create table order_item_tm_201904 partition of order_item_tm for values from ('2019-04-01') to ('2019-05-01');
create table order_item_tm_201905 partition of order_item_tm for values from ('2019-05-01') to ('2019-06-01');
create table order_item_tm_201906 partition of order_item_tm for values from ('2019-06-01') to ('2019-07-01');
create table order_item_tm_201907 partition of order_item_tm for values from ('2019-07-01') to ('2019-08-01');
create table order_item_tm_201908 partition of order_item_tm for values from ('2019-08-01') to ('2019-09-01');
create table order_item_tm_201909 partition of order_item_tm for values from ('2019-09-01') to ('2019-10-01');
create table order_item_tm_201910 partition of order_item_tm for values from ('2019-10-01') to ('2019-11-01');
create table order_item_tm_201911 partition of order_item_tm for values from ('2019-11-01') to ('2019-12-01');
create table order_item_tm_201912 partition of order_item_tm for values from ('2019-12-01') to ('2020-01-01');

create table order_item_jd_vip partition of order_item for values in (2, 3) partition by range (order_created_time);
create table order_item_jd_vip_201901 partition of order_item_jd_vip for values from ('2019-01-01') to ('2019-02-01');
create table order_item_jd_vip_201902 partition of order_item_jd_vip for values from ('2019-02-01') to ('2019-03-01');
create table order_item_jd_vip_201903 partition of order_item_jd_vip for values from ('2019-03-01') to ('2019-04-01');
create table order_item_jd_vip_201904 partition of order_item_jd_vip for values from ('2019-04-01') to ('2019-05-01');
create table order_item_jd_vip_201905 partition of order_item_jd_vip for values from ('2019-05-01') to ('2019-06-01');
create table order_item_jd_vip_201906 partition of order_item_jd_vip for values from ('2019-06-01') to ('2019-07-01');
create table order_item_jd_vip_201907 partition of order_item_jd_vip for values from ('2019-07-01') to ('2019-08-01');
create table order_item_jd_vip_201908 partition of order_item_jd_vip for values from ('2019-08-01') to ('2019-09-01');
create table order_item_jd_vip_201909 partition of order_item_jd_vip for values from ('2019-09-01') to ('2019-10-01');
create table order_item_jd_vip_201910 partition of order_item_jd_vip for values from ('2019-10-01') to ('2019-11-01');
create table order_item_jd_vip_201911 partition of order_item_jd_vip for values from ('2019-11-01') to ('2019-12-01');
create table order_item_jd_vip_201912 partition of order_item_jd_vip for values from ('2019-12-01') to ('2020-01-01');

create table order_item_gw partition of order_item for values in (4) partition by range (order_created_time);
create table order_item_gw_201901 partition of order_item_gw for values from ('2019-01-01') to ('2019-02-01');
create table order_item_gw_201902 partition of order_item_gw for values from ('2019-02-01') to ('2019-03-01');
create table order_item_gw_201903 partition of order_item_gw for values from ('2019-03-01') to ('2019-04-01');
create table order_item_gw_201904 partition of order_item_gw for values from ('2019-04-01') to ('2019-05-01');
create table order_item_gw_201905 partition of order_item_gw for values from ('2019-05-01') to ('2019-06-01');
create table order_item_gw_201906 partition of order_item_gw for values from ('2019-06-01') to ('2019-07-01');
create table order_item_gw_201907 partition of order_item_gw for values from ('2019-07-01') to ('2019-08-01');
create table order_item_gw_201908 partition of order_item_gw for values from ('2019-08-01') to ('2019-09-01');
create table order_item_gw_201909 partition of order_item_gw for values from ('2019-09-01') to ('2019-10-01');
create table order_item_gw_201910 partition of order_item_gw for values from ('2019-10-01') to ('2019-11-01');
create table order_item_gw_201911 partition of order_item_gw for values from ('2019-11-01') to ('2019-12-01');
create table order_item_gw_201912 partition of order_item_gw for values from ('2019-12-01') to ('2020-01-01');

comment on table order_item is '订单明细表';

comment on column order_item.id is '订单明细ID';
comment on column order_item.order_id is '订单ID';
comment on column order_item.order_channel is '订单渠道';
comment on column order_item.sku_id is 'SKU ID';
comment on column order_item.name is '商品名称';
comment on column order_item.quantity is 'SKU数量';
comment on column order_item.tag_price is '商品吊牌价';
comment on column order_item.purchase_price is '商品实际购买价';
comment on column order_item.given is '是否赠送';
comment on column order_item.order_created_time is '订单创建时间';
comment on column order_item.created_time is '创建时间';
comment on column order_item.updated_time is '最后更新时间';

通过以上方式,我们创建了一个分区订单表和分区订单明细表,先按照订单渠道进行第一次分区,在根据下单时间按照每月的方式再次进行分区,
但是对于双11的特殊跨月访问情况,这种分区会导致通过查询条件 created_time between '2019-08-15' and '2019-09-11' 进行查询
的时候

1.2. 有效的未完成的订单数据量问题

应为处理订单的处理期限为72小时,并且订单会源源不断的被处理,故系统的非已完成的订单量不会过于庞大,但是订单中心的订单总量会较大。
以下是一个查询所有未完成的订单。

-- 记录数 258421
explain ( analyze true , costs , buffers , timing , verbose false )
select count(1)
from trade_order a
where a.channel = '1'   -- 天猫渠道
  and a.created_time between '2019-08-15' and '2019-09-11'
  and exists(select 1
             from order_item b
             where b.order_id = a.id
               and b.order_channel = '1'
               and b.order_created_time between '2019-08-15' and '2019-09-11'
               and b.given = false
    )
  and a.status <> '4';  -- 非已完成
Aggregate  (cost=55928.48..55928.49 rows=1 width=8) (actual time=4229.916..4229.916 rows=1 loops=1)
  Buffers: shared hit=3733186
  ->  Nested Loop  (cost=55370.28..55569.78 rows=143480 width=0) (actual time=918.037..4201.476 rows=258421 loops=1)
        Buffers: shared hit=3733186
        ->  HashAggregate  (cost=55369.85..55371.85 rows=200 width=8) (actual time=917.986..1064.494 rows=458923 loops=1)
              Group Key: b.order_id
              Buffers: shared hit=513961
              ->  Append  (cost=0.42..53531.74 rows=735244 width=8) (actual time=0.029..676.242 rows=737540 loops=1)
                    Buffers: shared hit=513961
                    ->  Index Scan using order_item_tm_201908_order_created_time_idx on order_item_tm_201908 b  (cost=0.42..27599.23 rows=459737 width=8) (actual time=0.028..393.246 rows=459178 loops=1)
                          Index Cond: ((order_created_time >= '2019-08-15 00:00:00'::timestamp without time zone) AND (order_created_time <= '2019-09-11 00:00:00'::timestamp without time zone))
                          Filter: ((NOT given) AND ((order_channel)::text = '1'::text))
                          Rows Removed by Filter: 68553
                          Buffers: shared hit=319756
                    ->  Index Scan using order_item_tm_201909_order_created_time_idx on order_item_tm_201909 b_1  (cost=0.42..22256.30 rows=275507 width=8) (actual time=0.031..223.477 rows=278362 loops=1)
                          Index Cond: ((order_created_time >= '2019-08-15 00:00:00'::timestamp without time zone) AND (order_created_time <= '2019-09-11 00:00:00'::timestamp without time zone))
                          Filter: ((NOT given) AND ((order_channel)::text = '1'::text))
                          Rows Removed by Filter: 42161
                          Buffers: shared hit=194205
        ->  Append  (cost=0.42..0.97 rows=2 width=8) (actual time=0.006..0.007 rows=1 loops=458923)
              Buffers: shared hit=3219225
              ->  Index Scan using trade_order_tm_201908_pkey on trade_order_tm_201908 a  (cost=0.42..0.48 rows=1 width=8) (actual time=0.003..0.003 rows=0 loops=458923)
                    Index Cond: ((id = b.order_id) AND ((channel)::text = '1'::text) AND (created_time >= '2019-08-15 00:00:00'::timestamp without time zone) AND (created_time <= '2019-09-11 00:00:00'::timestamp without time zone))
                    Filter: ((status)::text <> '4'::text)
                    Rows Removed by Filter: 0
                    Buffers: shared hit=1666012
              ->  Index Scan using trade_order_tm_201909_pkey on trade_order_tm_201909 a_1  (cost=0.42..0.48 rows=1 width=8) (actual time=0.003..0.003 rows=0 loops=458923)
                    Index Cond: ((id = b.order_id) AND ((channel)::text = '1'::text) AND (created_time >= '2019-08-15 00:00:00'::timestamp without time zone) AND (created_time <= '2019-09-11 00:00:00'::timestamp without time zone))
                    Filter: ((status)::text <> '4'::text)
                    Rows Removed by Filter: 0
                    Buffers: shared hit=1553213
Planning Time: 4.085 ms
Execution Time: 4233.171 ms

因为我们使用的下单时间为 ['2019-08-15', '2019-09-11'],它跨越了2个时间区间,我们发现查询已经正确的使用了分区表 trade_order_tm_201908,trade_order_tm_201909 和 order_item_tm_201908,order_item_tm_201909 ,但是对于25万的数据量进行嵌套循环扫描,数据库还是消耗了4秒钟的时间,虽然我们可以通过修改数据库的参数(只在会话中生效)强制使用
并行查询进行加速,但是效果也不是很明显。

-- 表最小大小启用并行
set min_parallel_table_scan_size = 0;
-- 索引最小大小启用并行
set min_parallel_index_scan_size = 0;
-- 并行扫描记录的花费
set parallel_tuple_cost = 0;
-- 启动一个并行进程的花费
set parallel_setup_cost = 0;
-- 指定表的并行度(并行度不要超过物理核心数)
alter table trade_order_tm_201908 set (parallel_workers = 4);
alter table trade_order_tm_201909 set (parallel_workers = 4);
alter table order_item_tm_201908 set (parallel_workers = 4);
alter table order_item_tm_201909 set (parallel_workers = 4);

对于跨越时间区间的问题,我们做一些分区时间上的小调整,如下:

create table trade_order_tm_201901 partition of trade_order_tm for values from ('2019-01-01') to ('2019-04-01');
create table trade_order_tm_201904 partition of trade_order_tm for values from ('2019-04-01') to ('2019-07-01');
create table trade_order_tm_201907 partition of trade_order_tm for values from ('2019-07-01') to ('2019-10-01');
create table trade_order_tm_201910 partition of trade_order_tm for values from ('2019-10-01') to ('2020-01-01');
-- 。。。

实际可以根据数据情况进行更为合理的时间分区,对于以上情况并没有进行实际测试,大家可以进行补充。对于跨越时间区间的问题,我们发现有一个有意思的现象,
我们可以通过以下SQL来干扰查询优化器的成本计算,让优化器启动并行加速。

explain ( analyze true , costs , buffers , timing , verbose false )
select count(1)
from trade_order a
where a.channel = '1'   -- 天猫渠道
  and a.created_time between '2019-09-01' and '2019-09-11'
  and exists(select 1
             from order_item b
             where b.order_id = a.id
               and b.order_channel = '1'
               and b.order_created_time between '2019-08-15' and '2019-09-11'
               and b.given = false
    )
  and a.status <> '4';  -- 非已完成
Finalize Aggregate  (cost=50944.40..50944.41 rows=1 width=8) (actual time=209.440..209.441 rows=1 loops=1)
  Buffers: shared hit=771773
  ->  Gather  (cost=50944.09..50944.40 rows=3 width=8) (actual time=209.310..236.256 rows=4 loops=1)
        Workers Planned: 3
        Workers Launched: 3
        Buffers: shared hit=771773
        ->  Partial Aggregate  (cost=49944.09..49944.10 rows=1 width=8) (actual time=204.051..204.051 rows=1 loops=4)
              Buffers: shared hit=771773
              ->  Nested Loop Semi Join  (cost=0.42..49900.92 rows=17266 width=0) (actual time=0.125..201.552 rows=24395 loops=4)
                    Buffers: shared hit=771773
                    ->  Parallel Append  (cost=0.00..19269.75 rows=34533 width=8) (actual time=0.022..56.565 rows=26988 loops=4)
                          Buffers: shared hit=15394
                          ->  Parallel Seq Scan on trade_order_tm_201909 a  (cost=0.00..19097.08 rows=34533 width=8) (actual time=0.021..54.215 rows=26988 loops=4)
                                Filter: ((created_time >= '2019-09-01 00:00:00'::timestamp without time zone) AND (created_time <= '2019-09-11 00:00:00'::timestamp without time zone) AND ((status)::text <> '4'::text) AND ((channel)::text = '1'::text))
                                Rows Removed by Filter: 116507
                                Buffers: shared hit=15394
                    ->  Append  (cost=0.42..1.33 rows=2 width=8) (actual time=0.005..0.005 rows=1 loops=107950)
                          Buffers: shared hit=756379
                          ->  Index Scan using order_item_tm_201908_order_id_idx on order_item_tm_201908 b  (cost=0.42..0.66 rows=1 width=8) (actual time=0.002..0.002 rows=0 loops=107950)
                                Index Cond: (order_id = a.id)
                                Filter: ((NOT given) AND (order_created_time >= '2019-08-15 00:00:00'::timestamp without time zone) AND (order_created_time <= '2019-09-11 00:00:00'::timestamp without time zone) AND ((order_channel)::text = '1'::text))
                                Buffers: shared hit=324171
                          ->  Index Scan using order_item_tm_201909_order_id_idx on order_item_tm_201909 b_1  (cost=0.42..0.66 rows=1 width=8) (actual time=0.003..0.003 rows=1 loops=107950)
                                Index Cond: (order_id = a.id)
                                Filter: ((NOT given) AND (order_created_time >= '2019-08-15 00:00:00'::timestamp without time zone) AND (order_created_time <= '2019-09-11 00:00:00'::timestamp without time zone) AND ((order_channel)::text = '1'::text))
                                Rows Removed by Filter: 0
                                Buffers: shared hit=432208
Planning Time: 3.050 ms
Execution Time: 236.355 ms

trade_order的下单时间区间为 ['2019-09-01', '2019-09-11'] 但明细表的订单下单时间区间为 ['2019-08-15', '2019-09-11'],这时候我们发现
查询的代价为56920.58,它选择了使用3个并行来加速嵌套循环半连接(Nested Loop Semi Join),这时候的查询好水降为250毫秒左右,对于时间区间完全匹配
的情况并不会发生这种情况,所以我们应该慎用这种方式,它可能存在一定的不确定性。

现在我们还只是将订单按平台和下单时间进行分拆来降低每一个分区表的数据量大小,通过这种方式来加强各个分区的单独的处理能。

但这种优化真的合理吗?我们真的需要这种复杂的分区方案吗?

我们通过更深层次的思考后,发现数据库中的大量订单是属于已完成的订单,这种数据不会再进行任何的修改操作,也就是说这种数据从未完成的热数据变成了已完成的冷数据,
对于冷数据我们在业务操作中并不关心,还有我们的订单会源源不断的进入订单中心并处理完成,也就是说处于未完成的订单不会太多,所以我们自然而然的想到了是否可以在
建立索引的时候指定条件来过滤掉我们不需要的数据,故我们进行了如下修改:

-- 订单表
create table trade_order
(
    id                     bigserial     not null,
    channel                varchar(4)    not null,
    out_id                 varchar(100)  not null,
    shop_code              varchar(50)   not null,
    shop_name              varchar(200)  not null,
    paid_time              timestamp     null,
    buyer_name             varchar(200)  not null,
    buyer_memo             varchar(2000) null,
    receiver_name          varchar(200)  not null,
    receiver_mobile        varchar(20)   not null,
    receiver_country_code  varchar(50)   null,
    receiver_city_code     varchar(50)   null,
    receiver_province_code varchar(50)   null,
    receiver_area_code     varchar(50)   null,
    receiver_address       varchar(50)   null,
    receiver_zip           varchar(50)   null,
    status                 varchar(4)    not null,
    created_time           timestamp     not null,
    updated_time           timestamp     not null,
    constraint trade_order_pk primary key (id, channel) -- 分区表的分区键必须包含在主键中
) partition by list (channel);

create table trade_order_tm partition of trade_order for values in ('1');
create table trade_order_jd_vip partition of trade_order for values in ('2', '3', '4');

-- 订单明细表
create table order_item
(
    id                 bigserial      not null,
    order_id           bigint         not null,
    order_channel      varchar(4)     not null,
    serial             int            not null,
    sku_id             varchar(50)    not null,
    name               varchar(200)   not null,
    quantity           int            not null,
    tag_price          decimal(16, 2) not null,
    purchase_price     decimal(16, 2) not null,
    given              boolean        not null default false,
    order_created_time timestamp      not null,
    order_status       varchar(4)     not null,
    created_time       timestamp      not null,
    updated_time       timestamp      not null,
    constraint order_item_pk
        primary key (id, order_channel)
) partition by list (order_channel);

create table order_item_tm partition of order_item for values in ('1');
create table order_item_jd_vip partition of order_item for values in ('2', '3', '4');


-- 创建条件索引
create index trade_order_created_time_index on trade_order (created_time) where status <> '4';
create index trade_order_status_index on trade_order using btree (status) where status <> '4';
create index trade_order_receiver_name_index on trade_order using btree (receiver_name) where status <> '4';
create index trade_order_buyer_memo_index on trade_order using gin (buyer_memo gin_trgm_ops) where status <> '4';

create index order_item_order_id on order_item using btree (order_id) where order_status <> '4';
create index order_item_order_created_time_index on order_item (order_created_time) where order_status <> '4';
create index order_item_order_status_index on order_item (order_status) where order_status <> '4';
create index order_item_given_index on order_item (given) where order_status <> '4';

这一次我们建立了时间更加集中的测试数据来模拟双11将有大量订单被同步至订单中心的情景。

-- 记录数 1200175
explain ( analyze true , costs , buffers , timing , verbose false )
select count(1)
from trade_order a
where a.channel = '1'
  and a.status = '1'
  and a.created_time >= '2019-09-21 00:00:00'
Finalize Aggregate  (cost=56711.26..56711.27 rows=1 width=8) (actual time=278.878..278.879 rows=1 loops=1)
  Buffers: shared hit=1133803
  ->  Gather  (cost=56710.83..56711.25 rows=4 width=8) (actual time=278.586..314.005 rows=5 loops=1)
        Workers Planned: 4
        Workers Launched: 4
        Buffers: shared hit=1133803
        ->  Parallel Append  (cost=55710.83..55710.85 rows=1 width=8) (actual time=273.406..273.406 rows=1 loops=5)
              Buffers: shared hit=1133803
              ->  Partial Aggregate  (cost=55710.83..55710.84 rows=1 width=8) (actual time=273.405..273.405 rows=1 loops=5)
                    Buffers: shared hit=1133803
                    ->  Parallel Index Scan using trade_order_tm_created_time_idx on trade_order_tm a  (cost=0.43..55335.03 rows=150322 width=0) (actual time=0.048..261.075 rows=119865 loops=5)
                          Index Cond: (created_time >= '2019-09-21 00:00:00'::timestamp without time zone)
                          Filter: (((channel)::text = '1'::text) AND ((status)::text = '1'::text))
                          Rows Removed by Filter: 150056
                          Buffers: shared hit=1133803
Planning Time: 0.333 ms
Execution Time: 314.079 ms

通过以上SQL我们查询了所有未确认的订单总记录数,共599325条记录,耗时314毫秒,已经可以很好的满足业务的需求了。
接下来我们对有赠品的订单进行查询,SQL如下:

explain ( analyze true , costs , buffers , timing , verbose false )
select count(1)
from trade_order a
where a.channel = '1'
  and a.status = '1'
  and a.created_time >= '2019-09-21 00:00:00'
  and exists(select 1
             from order_item b
             where b.order_id = a.id
               and b.order_channel = '1'
               and b.order_status = '1'
               and b.order_created_time >= '2019-09-21 00:00:00'
               and b.given = true
    );
Aggregate  (cost=43894.67..43894.68 rows=1 width=8) (actual time=678.725..678.725 rows=1 loops=1)
  Buffers: shared hit=445323
  ->  Nested Loop  (cost=42853.74..43143.06 rows=300644 width=0) (actual time=192.187..670.102 rows=92873 loops=1)
        Buffers: shared hit=445323
        ->  HashAggregate  (cost=42853.31..42855.31 rows=200 width=8) (actual time=192.154..219.396 rows=92873 loops=1)
              Group Key: b.order_id
              Buffers: shared hit=73137
              ->  Append  (cost=0.43..42610.78 rows=97013 width=8) (actual time=0.032..161.060 rows=97692 loops=1)
                    Buffers: shared hit=73137
                    ->  Index Scan using order_item_tm_given_idx on order_item_tm b  (cost=0.43..42125.71 rows=97013 width=8) (actual time=0.031..150.090 rows=97692 loops=1)
                          Index Cond: (given = true)
                          Filter: (given AND (order_created_time >= '2019-09-21 00:00:00'::timestamp without time zone) AND ((order_channel)::text = '1'::text) AND ((order_status)::text = '1'::text))
                          Rows Removed by Filter: 122399
                          Buffers: shared hit=73137
        ->  Append  (cost=0.43..1.43 rows=1 width=8) (actual time=0.004..0.005 rows=1 loops=92873)
              Buffers: shared hit=372186
              ->  Index Scan using trade_order_tm_pkey on trade_order_tm a  (cost=0.43..1.42 rows=1 width=8) (actual time=0.004..0.004 rows=1 loops=92873)
                    Index Cond: ((id = b.order_id) AND ((channel)::text = '1'::text))
                    Filter: ((created_time >= '2019-09-21 00:00:00'::timestamp without time zone) AND ((status)::text = '1'::text))
                    Buffers: shared hit=372186
Planning Time: 0.644 ms
Execution Time: 678.803 ms

发现此次查询并没有触发数据库的并行查询,并且查询中包含了一个嵌套循环,消耗了400+毫秒的执行时间,对于这种情况,我们通过在订单表中加入includes_gift(订单中是否包含赠品)字段来避免嵌套循环扫描。

-- 记录数 92695
explain ( analyze true , costs , buffers , timing , verbose false )
select count(1)
from trade_order a
where a.channel = '1'
  and a.status = '1'
  and a.created_time >= '2019-09-21 00:00:00'
  and a.includes_gift = true
;
Finalize Aggregate  (cost=56113.92..56113.93 rows=1 width=8) (actual time=267.326..267.327 rows=1 loops=1)
  Buffers: shared hit=1127219
  ->  Gather  (cost=56113.49..56113.91 rows=4 width=8) (actual time=267.154..303.035 rows=5 loops=1)
        Workers Planned: 4
        Workers Launched: 4
        Buffers: shared hit=1127219
        ->  Parallel Append  (cost=55113.49..55113.51 rows=1 width=8) (actual time=262.160..262.161 rows=1 loops=5)
              Buffers: shared hit=1127219
              ->  Partial Aggregate  (cost=55113.49..55113.50 rows=1 width=8) (actual time=262.156..262.157 rows=1 loops=5)
                    Buffers: shared hit=1127219
                    ->  Parallel Index Scan using trade_order_tm_created_time_idx on trade_order_tm a  (cost=0.43..55057.47 rows=22408 width=0) (actual time=0.089..259.940 rows=18539 loops=5)
                          Index Cond: (created_time >= '2019-09-21 00:00:00'::timestamp without time zone)
                          Filter: (includes_gift AND ((channel)::text = '1'::text) AND ((status)::text = '1'::text))
                          Rows Removed by Filter: 251265
                          Buffers: shared hit=1127219
Planning Time: 0.339 ms
Execution Time: 303.110 ms

同时我们有些有些订单是有买家留言的,对于买家留言我们希望进行中缀模糊匹配(前后都有%),这时候我们需要使用一种特殊的索引-Gin索引。

create index trade_order_buyer_memo_index on trade_order using gin (buyer_memo gin_trgm_ops) where status <> '4';

explain ( analyze true , costs , buffers , timing , verbose false )
select count(1)
from trade_order a
where a.channel = '1'
  and a.status = '1'
  and a.created_time >= '2019-09-21 00:00:00'
  and a.includes_gift = true
  and buyer_memo like '%广东省%';
-- 无索引
Finalize Aggregate  (cost=56895.98..56895.99 rows=1 width=8) (actual time=248.714..248.715 rows=1 loops=1)
  Buffers: shared hit=1127219
  ->  Gather  (cost=56895.55..56895.97 rows=4 width=8) (actual time=248.700..282.707 rows=5 loops=1)
        Workers Planned: 4
        Workers Launched: 4
        Buffers: shared hit=1127219
        ->  Parallel Append  (cost=55895.55..55895.57 rows=1 width=8) (actual time=243.180..243.181 rows=1 loops=5)
              Buffers: shared hit=1127219
              ->  Partial Aggregate  (cost=55895.55..55895.56 rows=1 width=8) (actual time=243.178..243.179 rows=1 loops=5)
                    Buffers: shared hit=1127219
                    ->  Parallel Index Scan using trade_order_tm_created_time_idx on trade_order_tm a  (cost=0.43..55895.25 rows=121 width=0) (actual time=1.101..243.130 rows=87 loops=5)
                          Index Cond: (created_time >= '2019-09-21 00:00:00'::timestamp without time zone)
                          Filter: (includes_gift AND ((buyer_memo)::text ~~ '%广东省%'::text) AND ((channel)::text = '1'::text) AND ((status)::text = '1'::text))
                          Rows Removed by Filter: 269717
                          Buffers: shared hit=1127219
Planning Time: 0.534 ms
Execution Time: 282.775 ms

-- 有索引
Finalize Aggregate  (cost=8217.88..8217.89 rows=1 width=8) (actual time=14.852..14.852 rows=1 loops=1)
  Buffers: shared hit=6979
  ->  Append  (cost=8217.87..8217.88 rows=1 width=8) (actual time=14.846..14.847 rows=1 loops=1)
        Buffers: shared hit=6979
        ->  Partial Aggregate  (cost=8217.87..8217.88 rows=1 width=8) (actual time=14.846..14.846 rows=1 loops=1)
              Buffers: shared hit=6979
              ->  Bitmap Heap Scan on trade_order_tm a  (cost=96.30..8216.59 rows=509 width=0) (actual time=2.880..14.792 rows=493 loops=1)
                    Recheck Cond: (((buyer_memo)::text ~~ '%广东省%'::text) AND ((status)::text <> '4'::text))
                    Filter: (includes_gift AND (created_time >= '2019-09-21 00:00:00'::timestamp without time zone) AND ((channel)::text = '1'::text) AND ((status)::text = '1'::text))
                    Rows Removed by Filter: 6632
                    Heap Blocks: exact=6918
                    Buffers: shared hit=6979
                    ->  Bitmap Index Scan on trade_order_tm_buyer_memo_idx  (cost=0.00..96.17 rows=7620 width=0) (actual time=2.048..2.048 rows=7125 loops=1)
                          Index Cond: ((buyer_memo)::text ~~ '%广东省%'::text)
                          Buffers: shared hit=61
Planning Time: 0.738 ms
Execution Time: 14.917 ms

查询速度有了明显提升,但是Gin索引是基于相似度匹配的一种索引,它要求待搜索的内容长度必须 >= 3才能进行索引搜索。

Finalize Aggregate  (cost=9669.81..9669.82 rows=1 width=8) (actual time=201.025..201.026 rows=1 loops=1)
  Buffers: shared hit=64248
  ->  Append  (cost=9669.79..9669.80 rows=1 width=8) (actual time=201.017..201.018 rows=1 loops=1)
        Buffers: shared hit=64248
        ->  Partial Aggregate  (cost=9669.79..9669.80 rows=1 width=8) (actual time=201.016..201.017 rows=1 loops=1)
              Buffers: shared hit=64248
              ->  Bitmap Heap Scan on trade_order_tm a  (cost=1271.94..9668.42 rows=548 width=0) (actual time=75.892..200.800 rows=576 loops=1)
                    Recheck Cond: (((buyer_memo)::text ~~ '%广东%'::text) AND ((status)::text <> '4'::text))
                    Rows Removed by Index Recheck: 74130
                    Filter: (includes_gift AND (created_time >= '2019-09-21 00:00:00'::timestamp without time zone) AND ((channel)::text = '1'::text) AND ((status)::text = '1'::text))
                    Rows Removed by Filter: 7587
                    Heap Blocks: exact=63777
                    Buffers: shared hit=64248
                    ->  Bitmap Index Scan on trade_order_tm_buyer_memo_idx  (cost=0.00..1271.80 rows=7849 width=0) (actual time=63.515..63.516 rows=82293 loops=1)
                          Index Cond: ((buyer_memo)::text ~~ '%广东%'::text)
                          Buffers: shared hit=471
Planning Time: 0.323 ms
Execution Time: 201.084 ms

对于留言这种搜索型的数据,我们更加建议使用全文检索索引。

总结:

我们经过分析与试验得到了一个较优的解决方案:使用分区表 + 条件索引。
通过按渠道分区将天猫的订单放入独立的分区表,然后根据订单的状态流转特性建立条件索引来获取较优的性能。

最后编辑于
©著作权归作者所有,转载或内容合作请联系作者
  • 序言:七十年代末,一起剥皮案震惊了整个滨河市,随后出现的几起案子,更是在滨河造成了极大的恐慌,老刑警刘岩,带你破解...
    沈念sama阅读 159,569评论 4 363
  • 序言:滨河连续发生了三起死亡事件,死亡现场离奇诡异,居然都是意外死亡,警方通过查阅死者的电脑和手机,发现死者居然都...
    沈念sama阅读 67,499评论 1 294
  • 文/潘晓璐 我一进店门,熙熙楼的掌柜王于贵愁眉苦脸地迎上来,“玉大人,你说我怎么就摊上这事。” “怎么了?”我有些...
    开封第一讲书人阅读 109,271评论 0 244
  • 文/不坏的土叔 我叫张陵,是天一观的道长。 经常有香客问我,道长,这世上最难降的妖魔是什么? 我笑而不...
    开封第一讲书人阅读 44,087评论 0 209
  • 正文 为了忘掉前任,我火速办了婚礼,结果婚礼上,老公的妹妹穿的比我还像新娘。我一直安慰自己,他们只是感情好,可当我...
    茶点故事阅读 52,474评论 3 287
  • 文/花漫 我一把揭开白布。 她就那样静静地躺着,像睡着了一般。 火红的嫁衣衬着肌肤如雪。 梳的纹丝不乱的头发上,一...
    开封第一讲书人阅读 40,670评论 1 222
  • 那天,我揣着相机与录音,去河边找鬼。 笑死,一个胖子当着我的面吹牛,可吹牛的内容都是我干的。 我是一名探鬼主播,决...
    沈念sama阅读 31,911评论 2 313
  • 文/苍兰香墨 我猛地睁开眼,长吁一口气:“原来是场噩梦啊……” “哼!你这毒妇竟也来了?” 一声冷哼从身侧响起,我...
    开封第一讲书人阅读 30,636评论 0 202
  • 序言:老挝万荣一对情侣失踪,失踪者是张志新(化名)和其女友刘颖,没想到半个月后,有当地人在树林里发现了一具尸体,经...
    沈念sama阅读 34,397评论 1 246
  • 正文 独居荒郊野岭守林人离奇死亡,尸身上长有42处带血的脓包…… 初始之章·张勋 以下内容为张勋视角 年9月15日...
    茶点故事阅读 30,607评论 2 246
  • 正文 我和宋清朗相恋三年,在试婚纱的时候发现自己被绿了。 大学时的朋友给我发了我未婚夫和他白月光在一起吃饭的照片。...
    茶点故事阅读 32,093评论 1 261
  • 序言:一个原本活蹦乱跳的男人离奇死亡,死状恐怖,灵堂内的尸体忽然破棺而出,到底是诈尸还是另有隐情,我是刑警宁泽,带...
    沈念sama阅读 28,418评论 2 254
  • 正文 年R本政府宣布,位于F岛的核电站,受9级特大地震影响,放射性物质发生泄漏。R本人自食恶果不足惜,却给世界环境...
    茶点故事阅读 33,074评论 3 237
  • 文/蒙蒙 一、第九天 我趴在偏房一处隐蔽的房顶上张望。 院中可真热闹,春花似锦、人声如沸。这庄子的主人今日做“春日...
    开封第一讲书人阅读 26,092评论 0 8
  • 文/苍兰香墨 我抬头看了看天上的太阳。三九已至,却和暖如春,着一层夹袄步出监牢的瞬间,已是汗流浃背。 一阵脚步声响...
    开封第一讲书人阅读 26,865评论 0 196
  • 我被黑心中介骗来泰国打工, 没想到刚下飞机就差点儿被人妖公主榨干…… 1. 我叫王不留,地道东北人。 一个月前我还...
    沈念sama阅读 35,726评论 2 276
  • 正文 我出身青楼,却偏偏与公主长得像,于是被迫代替她去往敌国和亲。 传闻我的和亲对象是个残疾皇子,可洞房花烛夜当晚...
    茶点故事阅读 35,627评论 2 270

推荐阅读更多精彩内容