466.【数据库】Star Schema Benchmark 标准测试集优化(三)

这是Star Schema Benchmark 标准测试集优化的第三篇,前一篇我们分析了下表数据,这一篇是最后一篇了。

一、分析算法路径

更新到前一篇的时候,其实专利技术已经集成到数据库中了,这个算法路径,主要是验证一下:在测试环境中的算法路径,是否和开发环境中一致。实际结果如下,13 条SQL语句的算法路径和开发环境中的算法路径,经过验证是完全一致的。

2022-10-20 01:39:53.344 - SQL2: select sum(lo_revenue) as revenue from lineorder,dates where lo_orderdate = d_datekey and d_year = 1993 and lo_discount between 1 and 3 and lo_quantity < 25; ;
  Status: PASS, Elapsed: 3.923, Affected: 1
  Info: Job[67:RESTRICT, tasks:193, time:3,591, size: 3,576,178, [0|193|3,591] ]
Job[69:DIMENSION_JOIN, tasks:193, time:16, size: 3,576,178, [0|193|15] ]
Job[71:OUTPUT_HASH_GROUP_BY, tasks:6, time:166, size: 6, [0|6|167] ]
Job[73:MERGE_HASH_GROUPBY_PARTITION, tasks:1, time:3, size: 1, [0|1|3] ]
Job[75:PROJECT, tasks:1, time:2, size: 1, [0|1|1] ]
total time: 3,788, record count: 1
restrict的时间/整体时间: 3591/3788

result:
REVENUE       |
- - - - - - -
6568512155417 |


2022-10-20 01:39:59.707 - SQL4: select sum(lo_revenue) as revenue from lineorder,dates where lo_orderdate = d_datekey and d_yearmonthnum = 199401 and lo_discount between 4 and 6 and lo_quantity between 26 and 35; ;
  Status: PASS, Elapsed: 1.767, Affected: 1
  Info: Job[99:RESTRICT, tasks:193, time:1,727, size: 126,610, [0|193|1,728] ]
Job[101:DIMENSION_JOIN, tasks:193, time:8, size: 126,610, [0|193|8] ]
Job[103:OUTPUT_HASH_GROUP_BY, tasks:2, time:22, size: 2, [0|2|22] ]
Job[105:MERGE_HASH_GROUPBY_PARTITION, tasks:1, time:1, size: 1, [0|1|1] ]
Job[107:PROJECT, tasks:1, time:1, size: 1, [0|1|1] ]
total time: 1,764, record count: 1
restrict的时间/整体时间: 1727/1764

result:
REVENUE      |
- - - - - - -
550150245374 |


2022-10-20 01:40:04.552 - SQL6: select sum(lo_revenue) as revenue from lineorder,dates where lo_orderdate = d_datekey and d_weeknuminyear = 6 and d_year = 1994 and lo_discount between 5 and 7 and lo_quantity between 26 and 35; ;
  Status: PASS, Elapsed: 2.138, Affected: 1
  Info: Job[131:RESTRICT, tasks:193, time:2,108, size: 28,441, [0|193|2,109] ]
Job[133:DIMENSION_JOIN, tasks:193, time:7, size: 28,441, [0|193|7] ]
Job[135:OUTPUT_HASH_GROUP_BY, tasks:1, time:15, size: 1, [0|1|15] ]
Job[137:MERGE_HASH_GROUPBY_PARTITION, tasks:1, time:1, size: 1, [0|1|1] ]
Job[139:PROJECT, tasks:1, time:1, size: 1, [0|1|1] ]
total time: 2,136, record count: 1
restrict的时间/整体时间: 2108/2136

result:
REVENUE      |
- - - - - - -
122223605792 |


2022-10-20 01:40:31.923 - SQL8: select sum(lo_revenue) as lo_revenue, d_year, p_brand from lineorder ,dates,part,supplier where lo_orderdate = d_datekey and lo_partkey = p_partkey and lo_suppkey = s_suppkey and p_category = 'MFGR#12' and s_region = 'AMERICA' group by d_year, p_brand order by d_year, p_brand; ;
  Status: PASS, Elapsed: 4.714, Affected: 280
  Info: Job[193:RESTRICT, tasks:193, time:4,342, size: 1,442,564, [0|193|4,342] ]
Job[195:DIMENSION_JOIN, tasks:193, time:9, size: 1,442,564, [0|193|10] ]
Job[197:DIMENSION_JOIN, tasks:193, time:5, size: 1,442,564, [0|193|5] ]
Job[199:DIMENSION_JOIN, tasks:193, time:5, size: 1,442,564, [0|193|5] ]
Job[202:OUTPUT_HASH_GROUP_BY, tasks:6, time:332, size: 1,680, [0|6|332] ]
Job[204:MERGE_HASH_GROUPBY_PARTITION, tasks:7, time:4, size: 280, [0|7|4] ]
Job[206:PROJECT, tasks:7, time:3, size: 280, [0|7|3] ]
Job[208:RANGE_SORT, tasks:7, time:2, size: 280, [0|7|2] ]
Job[212:PARTITION_TABLET, tasks:7, time:0, size: 280, [0|7|0] ]
Job[214:MERGE_ORDERBY_RANGES, tasks:1, time:1, size: 70, [0|1|1] ]
total time: 4,711, record count: 280
restrict的时间/整体时间: 4342/4711

result:
LO_REVENUE  |D_YEAR     |P_BRAND    |
- - - - - - - - - - - - - - - - - -
18712903257 |1992       |MFGR#121   |
20576919851 |1992       |MFGR#1210  |
20452654696 |1992       |MFGR#1211  |

2022-10-20 01:40:37.664 - SQL10: select sum(lo_revenue) as lo_revenue, d_year, p_brand from lineorder,dates,part,supplier where lo_orderdate = d_datekey and lo_partkey = p_partkey and lo_suppkey = s_suppkey and p_brand between 'MFGR#2221' and 'MFGR#2228' and s_region = 'ASIA' group by d_year, p_brand order by d_year, p_brand; ;
  Status: PASS, Elapsed: 1.24, Affected: 56
  Info: Job[292:RESTRICT, tasks:193, time:1,017, size: 287,885, [0|193|1,017] ]
Job[294:DIMENSION_JOIN, tasks:193, time:5, size: 287,885, [0|193|5] ]
Job[296:DIMENSION_JOIN, tasks:193, time:5, size: 287,885, [0|193|5] ]
Job[298:DIMENSION_JOIN, tasks:193, time:4, size: 287,885, [0|193|4] ]
Job[301:OUTPUT_HASH_GROUP_BY, tasks:3, time:193, size: 168, [0|3|193] ]
Job[303:MERGE_HASH_GROUPBY_PARTITION, tasks:7, time:2, size: 56, [0|7|2] ]
Job[305:PROJECT, tasks:7, time:1, size: 56, [0|7|1] ]
Job[307:RANGE_SORT, tasks:7, time:1, size: 56, [0|7|1] ]
Job[311:PARTITION_TABLET, tasks:7, time:0, size: 56, [0|7|0] ]
Job[313:MERGE_ORDERBY_RANGES, tasks:1, time:0, size: 12, [0|1|0] ]
total time: 1,235, record count: 56
restrict的时间/整体时间: 1017/1235

result:
LO_REVENUE  |D_YEAR     |P_BRAND    |
- - - - - - - - - - - - - - - - - -
19803695538 |1992       |MFGR#2221  |
19639734537 |1992       |MFGR#2222  |
19945070508 |1992       |MFGR#2223  |

2022-10-20 01:40:39.888 - SQL12: select sum(lo_revenue) as lo_revenue, d_year, p_brand from lineorder,dates,part,supplier where lo_orderdate = d_datekey and lo_partkey = p_partkey and lo_suppkey = s_suppkey and p_brand = 'MFGR#2239' and s_region = 'EUROPE' group by d_year, p_brand order by d_year, p_brand; ;
  Status: PASS, Elapsed: 0.862, Affected: 7
  Info: Job[397:RESTRICT, tasks:193, time:785, size: 35,599, [0|193|785] ]
Job[399:DIMENSION_JOIN, tasks:193, time:4, size: 35,599, [0|193|4] ]
Job[401:DIMENSION_JOIN, tasks:193, time:3, size: 35,599, [0|193|3] ]
Job[403:DIMENSION_JOIN, tasks:193, time:3, size: 35,599, [0|193|3] ]
Job[406:OUTPUT_HASH_GROUP_BY, tasks:1, time:49, size: 7, [0|1|49] ]
Job[408:MERGE_HASH_GROUPBY_PARTITION, tasks:7, time:8, size: 7, [0|7|8] ]
Job[410:PROJECT, tasks:7, time:0, size: 7, [0|7|0] ]
Job[412:RANGE_SORT, tasks:5, time:1, size: 7, [0|5|1] ]
Job[416:PARTITION_TABLET, tasks:5, time:0, size: 7, [0|5|0] ]
Job[418:MERGE_ORDERBY_RANGES, tasks:1, time:0, size: 2, [0|1|0] ]
total time: 859, record count: 7
restrict的时间/整体时间: 785/859

result:
LO_REVENUE  |D_YEAR     |P_BRAND    |
- - - - - - - - - - - - - - - - - -
19700225276 |1992       |MFGR#2239  |
19306484466 |1993       |MFGR#2239  |
19398411013 |1994       |MFGR#2239  |

2022-10-20 01:41:02.935 - SQL14: select c_nation, s_nation, d_year, sum(lo_revenue) as lo_revenue from lineorder,dates,customer,supplier where lo_orderdate = d_datekey and lo_custkey = c_custkey and lo_suppkey = s_suppkey and c_region = 'ASIA' and s_region = 'ASIA'and d_year >= 1992 and d_year <= 1997 group by c_nation, s_nation, d_year order by d_year asc, lo_revenue desc; ;
  Status: PASS, Elapsed: 5.015, Affected: 150
  Info: Job[502:RESTRICT, tasks:193, time:3,856, size: 6,570,093, [0|193|3,857] ]
Job[504:DIMENSION_JOIN, tasks:193, time:5, size: 6,570,093, [0|193|4] ]
Job[506:DIMENSION_JOIN, tasks:193, time:3, size: 6,570,093, [0|193|3] ]
Job[508:DIMENSION_JOIN, tasks:193, time:3, size: 6,570,093, [0|193|3] ]
Job[511:OUTPUT_HASH_GROUP_BY, tasks:6, time:1,132, size: 900, [0|6|1,132] ]
Job[513:MERGE_HASH_GROUPBY_PARTITION, tasks:7, time:3, size: 150, [0|7|3] ]
Job[515:PROJECT, tasks:7, time:1, size: 150, [0|7|1] ]
Job[517:RANGE_SORT, tasks:7, time:1, size: 150, [0|7|1] ]
Job[521:PARTITION_TABLET, tasks:7, time:0, size: 150, [0|7|0] ]
Job[523:MERGE_ORDERBY_RANGES, tasks:1, time:1, size: 22, [0|1|1] ]
total time: 5,012, record count: 150
restrict的时间/整体时间: 3856/5012

result:
C_NATION   |S_NATION   |D_YEAR     |LO_REVENUE   |
- - - - - - - - - - - - - - - - - - - - - - - - -
JAPAN      |INDIA      |1992       |163691240866 |
CHINA      |INDONESIA  |1992       |163434081261 |
CHINA      |INDIA      |1992       |163430796231 |

2022-10-20 01:41:11.469 - SQL16: select c_city, s_city, d_year, sum(lo_revenue) as lo_revenue from lineorder,dates,customer,supplier where lo_orderdate = d_datekey and lo_custkey = c_custkey and lo_suppkey = s_suppkey and  c_nation = 'UNITED STATES' and s_nation = 'UNITED STATES' and d_year >= 1992 and d_year <= 1997 group by c_city, s_city, d_year order by d_year asc, lo_revenue desc; ;
  Status: PASS, Elapsed: 3.478, Affected: 600
  Info: Job[589:RESTRICT, tasks:193, time:3,262, size: 264,531, [0|193|3,262] ]
Job[591:DIMENSION_JOIN, tasks:193, time:4, size: 264,531, [0|193|4] ]
Job[593:DIMENSION_JOIN, tasks:193, time:4, size: 264,531, [0|193|4] ]
Job[595:DIMENSION_JOIN, tasks:193, time:3, size: 264,531, [0|193|3] ]
Job[598:OUTPUT_HASH_GROUP_BY, tasks:3, time:189, size: 1,800, [0|3|189] ]
Job[600:MERGE_HASH_GROUPBY_PARTITION, tasks:7, time:5, size: 600, [0|7|5] ]
Job[602:PROJECT, tasks:7, time:1, size: 600, [0|7|1] ]
Job[604:RANGE_SORT, tasks:7, time:1, size: 600, [0|7|1] ]
Job[608:PARTITION_TABLET, tasks:7, time:0, size: 600, [0|7|0] ]
Job[610:MERGE_ORDERBY_RANGES, tasks:1, time:0, size: 100, [0|1|0] ]
total time: 3,475, record count: 600
restrict的时间/整体时间: 3262/3475

result:
C_CITY     |S_CITY     |D_YEAR     |LO_REVENUE |
- - - - - - - - - - - - - - - - - - - - - - - -
UNITED ST3 |UNITED ST4 |1992       |1915435842 |
UNITED ST8 |UNITED ST0 |1992       |1910327375 |
UNITED ST5 |UNITED ST0 |1992       |1893024189 |

2022-10-20 01:41:16.505 - SQL18: select c_city, s_city, d_year, sum(lo_revenue) as lo_revenue from lineorder,dates,customer,supplier where lo_orderdate = d_datekey and lo_custkey = c_custkey and lo_suppkey = s_suppkey and (c_city='UNITED KI1' or c_city='UNITED KI5') and (s_city='UNITED KI1' or s_city='UNITED KI5') and d_year >= 1992 and d_year <= 1997 group by c_city, s_city, d_year order by d_year asc, lo_revenue desc; ;
  Status: PASS, Elapsed: 1.588, Affected: 24
  Info: Job[679:RESTRICT, tasks:193, time:1,525, size: 10,616, [0|193|1,525] ]
Job[681:DIMENSION_JOIN, tasks:193, time:6, size: 10,616, [0|193|6] ]
Job[683:DIMENSION_JOIN, tasks:193, time:4, size: 10,616, [0|193|4] ]
Job[685:DIMENSION_JOIN, tasks:193, time:3, size: 10,616, [0|193|3] ]
Job[688:OUTPUT_HASH_GROUP_BY, tasks:1, time:35, size: 24, [0|1|35] ]
Job[690:MERGE_HASH_GROUPBY_PARTITION, tasks:7, time:2, size: 24, [0|7|2] ]
Job[692:PROJECT, tasks:7, time:0, size: 24, [0|7|0] ]
Job[694:RANGE_SORT, tasks:7, time:1, size: 24, [0|7|1] ]
Job[698:PARTITION_TABLET, tasks:7, time:0, size: 24, [0|7|0] ]
Job[700:MERGE_ORDERBY_RANGES, tasks:1, time:1, size: 3, [0|1|1] ]
total time: 1,585, record count: 24
restrict的时间/整体时间: 1525/1585

result:
C_CITY     |S_CITY     |D_YEAR     |LO_REVENUE |
- - - - - - - - - - - - - - - - - - - - - - - -
UNITED KI1 |UNITED KI1 |1992       |1786080690 |
UNITED KI5 |UNITED KI1 |1992       |1705128984 |
UNITED KI1 |UNITED KI5 |1992       |1620054330 |

2022-10-20 01:41:19.166 - SQL20: select c_city, s_city, d_year, sum(lo_revenue) as lo_revenue from lineorder,dates,customer,supplier where lo_orderdate = d_datekey and lo_custkey = c_custkey and lo_suppkey = s_suppkey and (c_city='UNITED KI1' or c_city='UNITED KI5') and (s_city='UNITED KI1' or s_city='UNITED KI5') and d_yearmonth  = 'Dec1997' group by c_city, s_city, d_year order by d_year asc, lo_revenue desc; ;
  Status: PASS, Elapsed: 1.046, Affected: 4
  Info: Job[775:RESTRICT, tasks:193, time:1,012, size: 151, [0|193|1,012] ]
Job[777:DIMENSION_JOIN, tasks:109, time:4, size: 151, [0|109|4] ]
Job[779:DIMENSION_JOIN, tasks:109, time:4, size: 151, [0|109|4] ]
Job[781:DIMENSION_JOIN, tasks:109, time:3, size: 151, [0|109|3] ]
Job[784:OUTPUT_HASH_GROUP_BY, tasks:1, time:13, size: 4, [0|1|13] ]
Job[786:MERGE_HASH_GROUPBY_PARTITION, tasks:7, time:1, size: 4, [0|7|1] ]
Job[788:PROJECT, tasks:7, time:1, size: 4, [0|7|1] ]
Job[790:RANGE_SORT, tasks:2, time:0, size: 4, [0|2|0] ]
Job[794:PARTITION_TABLET, tasks:2, time:0, size: 4, [0|2|0] ]
Job[796:MERGE_ORDERBY_RANGES, tasks:1, time:1, size: 2, [0|1|1] ]
total time: 1,044, record count: 4
restrict的时间/整体时间: 1012/1044

result:
C_CITY     |S_CITY     |D_YEAR     |LO_REVENUE |
- - - - - - - - - - - - - - - - - - - - - - - -
UNITED KI5 |UNITED KI1 |1997       |168840628  |
UNITED KI1 |UNITED KI5 |1997       |140264663  |
UNITED KI1 |UNITED KI1 |1997       |135684305  |

2022-10-20 01:41:29.956 - SQL22: select d_year, c_nation, sum(lo_revenue) - sum(lo_supplycost) as profit from lineorder,dates,customer,supplier,part where lo_orderdate = d_datekey and lo_custkey = c_custkey and lo_suppkey = s_suppkey and lo_partkey = p_partkey and c_region = 'AMERICA' and s_region = 'AMERICA' and (p_mfgr = 'MFGR#1' or p_mfgr = 'MFGR#2') group by d_year, c_nation order by d_year, c_nation; ;
  Status: PASS, Elapsed: 2.833, Affected: 35
  Info: Job[878:RESTRICT, tasks:193, time:2,137, size: 2,882,137, [0|193|2,138] ]
Job[880:DIMENSION_JOIN, tasks:193, time:8, size: 2,882,137, [0|193|8] ]
Job[882:DIMENSION_JOIN, tasks:193, time:3, size: 2,882,137, [0|193|3] ]
Job[884:DIMENSION_JOIN, tasks:193, time:2, size: 2,882,137, [0|193|2] ]
Job[886:DIMENSION_JOIN, tasks:193, time:3, size: 2,882,137, [0|193|2] ]
Job[889:OUTPUT_HASH_GROUP_BY, tasks:6, time:663, size: 210, [0|6|662] ]
Job[891:MERGE_HASH_GROUPBY_PARTITION, tasks:7, time:1, size: 35, [0|7|1] ]
Job[893:PROJECT, tasks:7, time:1, size: 35, [0|7|1] ]
Job[895:PROJECT, tasks:7, time:0, size: 35, [0|7|0] ]
Job[897:RANGE_SORT, tasks:7, time:1, size: 35, [0|7|1] ]
Job[901:PARTITION_TABLET, tasks:7, time:0, size: 35, [0|7|0] ]
Job[903:MERGE_ORDERBY_RANGES, tasks:1, time:0, size: 1, [0|1|0] ]
total time: 2,830, record count: 35
restrict的时间/整体时间: 2137/2830

result:
D_YEAR     |C_NATION      |PROFIT       |
- - - - - - - - - - - - - - - - - - - -
1992       |ARGENTINA     |312585625436 |
1992       |BRAZIL        |312719709853 |
1992       |CANADA        |307040911677 |

2022-10-20 01:41:37.244 - SQL24: select d_year, s_nation, p_category, sum(lo_revenue) - sum(lo_supplycost) as profit from lineorder,dates,customer,supplier,part where lo_orderdate = d_datekey and lo_custkey = c_custkey and lo_suppkey = s_suppkey and lo_partkey = p_partkey and c_region = 'AMERICA'and s_region = 'AMERICA' and (d_year = 1997 or d_year = 1998) and (p_mfgr = 'MFGR#1' or p_mfgr = 'MFGR#2') group by d_year, s_nation, p_category order by d_year, s_nation, p_category; ;
  Status: PASS, Elapsed: 4.462, Affected: 100
  Info: Job[1005:RESTRICT, tasks:193, time:4,155, size: 694,402, [0|193|4,155] ]
Job[1007:DIMENSION_JOIN, tasks:193, time:4, size: 694,402, [0|193|4] ]
Job[1009:DIMENSION_JOIN, tasks:193, time:3, size: 694,402, [0|193|3] ]
Job[1011:DIMENSION_JOIN, tasks:193, time:3, size: 694,402, [0|193|3] ]
Job[1013:DIMENSION_JOIN, tasks:193, time:3, size: 694,402, [0|193|2] ]
Job[1016:OUTPUT_HASH_GROUP_BY, tasks:6, time:272, size: 600, [0|6|271] ]
Job[1018:MERGE_HASH_GROUPBY_PARTITION, tasks:7, time:2, size: 100, [0|7|2] ]
Job[1020:PROJECT, tasks:7, time:0, size: 100, [0|7|0] ]
Job[1022:PROJECT, tasks:7, time:1, size: 100, [0|7|0] ]
Job[1024:RANGE_SORT, tasks:7, time:2, size: 100, [0|7|1] ]
Job[1028:PARTITION_TABLET, tasks:7, time:0, size: 100, [0|7|0] ]
Job[1030:MERGE_ORDERBY_RANGES, tasks:1, time:0, size: 10, [0|1|0] ]
total time: 4,460, record count: 100
restrict的时间/整体时间: 4155/4460

result:
D_YEAR     |S_NATION      |P_CATEGORY |PROFIT      |
- - - - - - - - - - - - - - - - - - - - - - - - - -
1997       |ARGENTINA     |MFGR#11    |31668757023 |
1997       |ARGENTINA     |MFGR#12    |31315629143 |
1997       |ARGENTINA     |MFGR#13    |31899989093 |


我这边主要做了一下内容:

  1. 将数据库执行结果通过Linux脚本下载下来
cat ../conf/ssb_test.sql | ./cplus.sh > ssb30_record_result.txt
  1. 将数据库日志中的 job 信息通过Linux脚本下载下来
./cplus.sh <<EOF
desc history;
EOF
  1. 编写python脚本提取 RESTRICT 时间,并组合前两步骤的结果
a = """
 {具体的 job 信息}
"""
b = """
{具体的 查询结果 信息}
"""
if __name__ == '__main__':
    result_a = []
    # print(a.split("\n\n"))
    for item in a.split("\n\n"):
        result_item = '\nrestrict的时间/整体时间: '
        flag = False
        for item_item in item.split("\n"):
            if "RESTRICT" in item_item:
                flag = True
                result_item += item_item.split("time:")[1].split(", ")[0] + "/"
            elif "total time" in item_item:
                result_item += item_item.split("total time: ")[1].split(", ")[0] + "\n"
        if not flag:
            continue
        result_item = result_item.replace(",", "")
        result_a.append(item + result_item)
    result_a.reverse()
    # for item in result_a:
    #     print(item)
    b_1 = ["\n".join(item.split("\n")[1:6]) for item in b.split("\nSQL")]

    result_b = []
    for item in b_1:
        if "|" in item:
            if "\nSelects" in item:
                result_b.append("result: \n" + "\n".join(item.split("\n")[0:-2]) + "\n\n")
            else:
                result_b.append("result: \n" + item + "\n")
    if len(result_a) != 26 or len(result_b) != 26:
        raise Exception("result'len should =26 and result_a'len should =26")


    result = ["\n".join(item) for item in list(zip(result_a, result_b))]

    result.insert(0, "d")

    result = result[0:-1:2][1:]
    print("\n".join(result))

二、解决线程 bug

我们知道,在 windows 中,通过任务管理器可以看CPU信息,比如下面是我在windows上的CPU信息截图:


现在问题来了,我的数据库系统到底是启用 8 线程好呢?还是 16线程好呢?
答案是 16 线程。以逻辑处理器为准!

数据库原先启用的是8线程,这就是问题所在。这块优化完毕之后,“Star Schema Benchmark 标准测试集优化”基本已终结。

三、测试结果


左边是咱们数据库,右边是 Starrocks 数据库

相关阅读:
443.【数据库】Star Schema Benchmark 标准测试集优化(一)
463.【数据库】Star Schema Benchmark 标准测试集优化(二)

推荐阅读更多精彩内容