From Calcite to Tampering with Flink SQL

Today I gave a talk titled "From Calcite to Tampering with Flink SQL" to my teammates. The Markdown version of the handout is posted below.

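The talk is very dense: it covers Calcite basics, how the Blink planner executes a query, the optimizers and their optimization rules, and more. I will pick out the key points later and explain them again in dedicated articles.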

From Calcite to Tampering with Flink SQL

August 26th, 2021

For NiceTuan Real-Time Team

Prerequisites

Basic understanding of
- Flink DataStream runtime (3-layered DAGs, stream partition, etc.)
- Database system concepts
- SQL queries
- Scala language, just in case

(Review) Some Relational Algebra

Textbook - Database System Concepts 6th Edition [Abraham Silberschatz et al. 2011]
But Wikipedia is good enough
- Relational algebra is a theory that uses algebraic structures with a well-founded semantics for modeling data, and defining queries on it
- The theory was introduced by Edgar F. Codd
Projection (Π)

Selection (σ)

Rename (ρ)

Natural join (⋈) & Equi-join

Left outer join (⟕)

Right outer join (⟖)
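As a quick refresher, the operators above can be mimicked on in-memory rows. The following is only a minimal sketch, assuming rows are modeled as `Map<String, Object>` and using bag semantics like SQL rather than the set semantics of the textbook; the class and method names are made up for illustration and have nothing to do with Calcite.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.function.Predicate;

/** Toy relational operators over in-memory rows; names are illustrative, not a Calcite API. */
final class RelAlgebraSketch {

  /** Builds a row from alternating attribute names and values. */
  static Map<String, Object> row(Object... kv) {
    Map<String, Object> r = new LinkedHashMap<>();
    for (int i = 0; i < kv.length; i += 2) {
      r.put((String) kv[i], kv[i + 1]);
    }
    return r;
  }

  /** Projection: keep only the listed attributes (duplicates are kept). */
  static List<Map<String, Object>> project(List<Map<String, Object>> rel, String... attrs) {
    List<Map<String, Object>> out = new ArrayList<>();
    for (Map<String, Object> r : rel) {
      Map<String, Object> p = new LinkedHashMap<>();
      for (String a : attrs) {
        p.put(a, r.get(a));
      }
      out.add(p);
    }
    return out;
  }

  /** Selection: keep only the rows satisfying the predicate. */
  static List<Map<String, Object>> select(
      List<Map<String, Object>> rel, Predicate<Map<String, Object>> cond) {
    List<Map<String, Object>> out = new ArrayList<>();
    for (Map<String, Object> r : rel) {
      if (cond.test(r)) {
        out.add(r);
      }
    }
    return out;
  }

  /**
   * Equi-join on left.leftAttr = right.rightAttr. With leftOuter = true, unmatched left rows
   * are kept and their right-side attributes are simply absent (they read as null).
   */
  static List<Map<String, Object>> join(
      List<Map<String, Object>> left, String leftAttr,
      List<Map<String, Object>> right, String rightAttr,
      boolean leftOuter) {
    List<Map<String, Object>> out = new ArrayList<>();
    for (Map<String, Object> l : left) {
      boolean matched = false;
      for (Map<String, Object> r : right) {
        if (Objects.equals(l.get(leftAttr), r.get(rightAttr))) {
          Map<String, Object> joined = new LinkedHashMap<>(l);
          joined.putAll(r);
          out.add(joined);
          matched = true;
        }
      }
      if (leftOuter && !matched) {
        out.add(new LinkedHashMap<>(l));
      }
    }
    return out;
  }

  public static void main(String[] args) {
    List<Map<String, Object>> student = List.of(
        row("id", 1L, "name", "Ann", "age", 19),
        row("id", 2L, "name", "Bob", "age", 21));
    List<Map<String, Object>> examResult = List.of(
        row("student_id", 1L, "score1", 80.0));

    System.out.println(project(student, "id", "name"));
    System.out.println(select(student, r -> (Integer) r.get("age") < 20));
    System.out.println(join(student, "id", examResult, "student_id", false));
    System.out.println(join(student, "id", examResult, "student_id", true));
  }
}
```

A right outer join is the mirror image: swap the two inputs and keep the unmatched rows of the right side instead.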

Calcite In A Nutshell

What is it

As you already know, "Flink does not reinvent the wheel, but leverages Apache Calcite to deal with most SQL-related work"
Apache Calcite is a foundational software framework that provides query processing, optimization, and query language support to many popular open-source data processing systems such as Apache Hive, Apache Storm, Apache Flink, Druid, and MapD

Architecture

From Apache Calcite: A Foundational Framework for Optimized Query Processing Over Heterogeneous Data Sources [Edmon Begoli et al. SIGMOD 2018]

Fundamental Concepts

Catalog - A metadata store & handler for schema, tables, etc.
SqlNode - A parsed SQL tree (i.e. AST)
- SqlLiteral - Constant value (1, FALSE, ...)
- SqlIdentifier - Identifier
- SqlCall - Call to functions, operators, etc.
- SqlSelect / SqlJoin / SqlOrderBy / ...
RelNode - A relational (algebraic) expression
- LogicalTableScan
- LogicalProject
- LogicalFilter
- LogicalCalc
- ...
RexNode - A (typed) row-level expression
- RexLiteral
- RexVariable
- RexCall
- ...
RelTrait & RelTraitDef - A set of physical properties & their definitions carried by a relational expression
- Convention - Working scope, mainly a single data source
- RelCollation - Ordering method of data (and sort keys)
- RelDistribution - Distribution method of data
RelOptPlanner - A query optimizer, which transforms a relational expression into a semantically equivalent relational expression, according to a given set of rules and a cost model
- HepPlanner - RBO, greedy, heuristic
- VolcanoPlanner - CBO, dynamic programming, Volcano-flavored
RelOptRule - A (usually empirical) rule which defines the transformation routine for RBO
- RelOptRuleOperand - Used by the rule to determine the section of RelNodes to be optimized
- RuleSet - Self-explanatory
RelOptCost - An interface for optimizer cost in terms of number of rows processed, CPU cost, and I/O cost
RelMetadataProvider - An interface for obtaining metadata about relational expressions to support optimization process
- Min / max row count
- Data size
- Expression lineage
- Distinctness / uniqueness
- ...
RelOptCluster - The environment during the optimization of a query

Process Flow

A Quick Calcite Show

Prepare Schema and SQL

SchemaPlus rootSchema = Frameworks.createRootSchema(true);

rootSchema.add("student", new AbstractTable() {
  @Override public RelDataType getRowType(RelDataTypeFactory typeFactory) {
    RelDataTypeFactory.Builder builder = new Builder(DEFAULT_TYPE_FACTORY);

    builder.add("id", new BasicSqlType(DEFAULT_TYPE_SYSTEM, SqlTypeName.BIGINT));
    builder.add("name", new BasicSqlType(DEFAULT_TYPE_SYSTEM, SqlTypeName.VARCHAR));
    builder.add("class", new BasicSqlType(DEFAULT_TYPE_SYSTEM, SqlTypeName.VARCHAR));
    builder.add("age", new BasicSqlType(DEFAULT_TYPE_SYSTEM, SqlTypeName.INTEGER));

    return builder.build();
  }
});

rootSchema.add("exam_result", new AbstractTable() {
  @Override public RelDataType getRowType(RelDataTypeFactory typeFactory) {
    RelDataTypeFactory.Builder builder = new Builder(DEFAULT_TYPE_FACTORY);

    builder.add("student_id", new BasicSqlType(DEFAULT_TYPE_SYSTEM, SqlTypeName.BIGINT));
    builder.add("score1", new BasicSqlType(DEFAULT_TYPE_SYSTEM, SqlTypeName.FLOAT));
    builder.add("score2", new BasicSqlType(DEFAULT_TYPE_SYSTEM, SqlTypeName.FLOAT));

    return builder.build();
  }
});

String sql = /* language=SQL */
  "SELECT a.id, a.name, SUM(b.score1 * 0.7 + b.score2 * 0.3) AS total_score " +
  "FROM student a " +
  "INNER JOIN exam_result b ON a.id = b.student_id " +
  "WHERE a.age < 20 AND b.score1 > 60.0 " +
  "GROUP BY a.id, a.name";

Parsing

FrameworkConfig frameworkConfig = Frameworks.newConfigBuilder()
  .parserConfig(SqlParser.config().withCaseSensitive(false).withLex(Lex.MYSQL_ANSI))
  .defaultSchema(rootSchema)
  .build();

SqlParser parser = SqlParser.create(sql);
SqlNode originalSqlNode = parser.parseStmt();

System.out.println(originalSqlNode.toString());

--- Original SqlNode ---
SELECT `A`.`ID`, `A`.`NAME`, SUM(`B`.`SCORE1` * 0.7 + `B`.`SCORE2` * 0.3) AS `TOTAL_SCORE`
FROM `STUDENT` AS `A`
INNER JOIN `EXAM_RESULT` AS `B` ON `A`.`ID` = `B`.`STUDENT_ID`
WHERE `A`.`AGE` < 20 AND `B`.`SCORE1` > 60.0
GROUP BY `A`.`ID`, `A`.`NAME`

Validation

Properties cxnConfig = new Properties();
cxnConfig.setProperty(
  CalciteConnectionProperty.CASE_SENSITIVE.camelName(),
  String.valueOf(frameworkConfig.getParserConfig().caseSensitive()));

CalciteCatalogReader catalogReader = new CalciteCatalogReader(
  CalciteSchema.from(rootSchema),
  CalciteSchema.from(frameworkConfig.getDefaultSchema()).path(null),
  DEFAULT_TYPE_FACTORY,
  new CalciteConnectionConfigImpl(cxnConfig)
);

SqlValidator validator = new SqlValidatorImpl1(
  frameworkConfig.getOperatorTable(),
  catalogReader,
  DEFAULT_TYPE_FACTORY
);

SqlNode validatedSqlNode = validator.validate(originalSqlNode);

System.out.println(validatedSqlNode.toString());

--- Validated SqlNode ---
SELECT `A`.`ID`, `A`.`NAME`, SUM(`B`.`SCORE1` * 0.7 + `B`.`SCORE2` * 0.3) AS `TOTAL_SCORE`
FROM `STUDENT` AS `A`
INNER JOIN `EXAM_RESULT` AS `B` ON `A`.`id` = `B`.`student_id`
WHERE `A`.`age` < 20 AND `B`.`score1` > 60.0
GROUP BY `A`.`id`, `A`.`name`

Planning

RelOptCluster relOptCluster = RelOptCluster.create(new VolcanoPlanner(), new RexBuilder(DEFAULT_TYPE_FACTORY));

SqlToRelConverter relConverter = new SqlToRelConverter(
  null,
  validator,
  catalogReader,
  relOptCluster,
  frameworkConfig.getConvertletTable()
);

RelRoot relRoot = relConverter.convertQuery(validatedSqlNode, false, true);
RelNode originalRelNode = relRoot.rel;

System.out.println(RelOptUtil.toString(originalRelNode));

--- Original RelNode ---
LogicalProject(ID=[$0], NAME=[$1], TOTAL_SCORE=[$2])
  LogicalAggregate(group=[{0, 1}], TOTAL_SCORE=[SUM($2)])
    LogicalProject(id=[$0], name=[$1], $f2=[+(*($5, 0.7:DECIMAL(2, 1)), *($6, 0.3:DECIMAL(2, 1)))])
      LogicalFilter(condition=[AND(<($3, 20), >($5, 60.0:DECIMAL(3, 1)))])
        LogicalJoin(condition=[=($0, $4)], joinType=[inner])
          LogicalTableScan(table=[[student]])
          LogicalTableScan(table=[[exam_result]])

Optimization

Predicate (filter) pushdown past join into table scan using HepPlanner and FILTER_INTO_JOIN rule

σ_{R.a θ a' ∧ S.b θ b'}(R ⋈ S) = (σ_{R.a θ a'} R) ⋈ (σ_{S.b θ b'} S)
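To see why this rewrite is safe, a small sketch can evaluate both sides of the identity on sample data and check that they agree. The rows and helper names below are invented for illustration; only the column names follow the `student` / `exam_result` schema used above, and the predicates are the ones from the example query.

```java
import java.util.ArrayList;
import java.util.List;

/** Checks filter-after-join against join-after-filter on toy data (not Calcite code). */
final class FilterPushdownSketch {

  static final class Student {
    final long id;
    final int age;
    Student(long id, int age) { this.id = id; this.age = age; }
  }

  static final class Exam {
    final long studentId;
    final double score1;
    Exam(long studentId, double score1) { this.studentId = studentId; this.score1 = score1; }
  }

  /** Inner equi-join on student.id = exam_result.student_id. */
  static List<Object[]> join(List<Student> r, List<Exam> s) {
    List<Object[]> out = new ArrayList<>();
    for (Student a : r) {
      for (Exam b : s) {
        if (a.id == b.studentId) {
          out.add(new Object[] {a, b});
        }
      }
    }
    return out;
  }

  /** Left-hand side of the identity: join first, then filter the joined rows. */
  static List<String> filterAfterJoin(List<Student> r, List<Exam> s) {
    List<String> out = new ArrayList<>();
    for (Object[] pair : join(r, s)) {
      Student a = (Student) pair[0];
      Exam b = (Exam) pair[1];
      if (a.age < 20 && b.score1 > 60.0) {
        out.add(a.id + ":" + b.score1);
      }
    }
    return out;
  }

  /** Right-hand side of the identity: filter each input first, then join. */
  static List<String> joinAfterFilter(List<Student> r, List<Exam> s) {
    List<Student> filteredR = new ArrayList<>();
    for (Student a : r) {
      if (a.age < 20) filteredR.add(a);
    }
    List<Exam> filteredS = new ArrayList<>();
    for (Exam b : s) {
      if (b.score1 > 60.0) filteredS.add(b);
    }
    List<String> out = new ArrayList<>();
    for (Object[] pair : join(filteredR, filteredS)) {
      out.add(((Student) pair[0]).id + ":" + ((Exam) pair[1]).score1);
    }
    return out;
  }

  static List<Student> sampleStudents() {
    return List.of(new Student(1, 19), new Student(2, 22), new Student(3, 18));
  }

  static List<Exam> sampleExams() {
    return List.of(new Exam(1, 75.0), new Exam(2, 90.0), new Exam(3, 55.0), new Exam(1, 40.0));
  }

  public static void main(String[] args) {
    System.out.println(filterAfterJoin(sampleStudents(), sampleExams())); // [1:75.0]
    System.out.println(joinAfterFilter(sampleStudents(), sampleExams())); // [1:75.0]
  }
}
```

Both sides return the same rows, but the second form joins 2 students with 2 exam rows instead of 3 with 4, which is the whole point of pushing the filters down.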

HepProgram defines the order of rules to be attempted

HepProgram hepProgram = new HepProgramBuilder()
  .addRuleInstance(CoreRules.FILTER_INTO_JOIN)
  .addMatchOrder(HepMatchOrder.BOTTOM_UP)
  .build();

HepPlanner hepPlanner = new HepPlanner(hepProgram);
hepPlanner.setRoot(originalRelNode);
RelNode optimizedRelNode = hepPlanner.findBestExp();

System.out.println(RelOptUtil.toString(optimizedRelNode));

--- Optimized RelNode ---
LogicalProject(ID=[$0], NAME=[$1], TOTAL_SCORE=[$2])
  LogicalAggregate(group=[{0, 1}], TOTAL_SCORE=[SUM($2)])
    LogicalProject(id=[$0], name=[$1], $f2=[+(*($5, 0.7:DECIMAL(2, 1)), *($6, 0.3:DECIMAL(2, 1)))])
      LogicalJoin(condition=[=($0, $4)], joinType=[inner])
        LogicalFilter(condition=[<($3, 20)])
          LogicalTableScan(table=[[student]])
        LogicalFilter(condition=[>($1, 60.0:DECIMAL(3, 1))])
          LogicalTableScan(table=[[exam_result]])

Rules can do a lot more...

Dive Into Blink Stream Planner

Overview

Parsing & validation
Logical planning
All-over optimization w/ physical planning
Execution planning & codegen (only briefly today)

SQL for Example

Will not cover sophisticated things (e.g. sub-queries, aggregate functions, window TVFs) for now
Just an ordinary streaming ETL process, which will be optimized later

INSERT INTO expdb.print_joined_result
SELECT 
  FROM_UNIXTIME(a.ts / 1000, 'yyyy-MM-dd HH:mm:ss') AS tss, 
  a.userId, a.eventType, 
  a.siteId, b.site_name AS siteName
FROM expdb.kafka_analytics_access_log_app 
/*+ OPTIONS('scan.startup.mode'='latest-offset','properties.group.id'='DiveIntoBlinkExp') */ a
LEFT JOIN rtdw_dim.mysql_site_war_zone_mapping_relation 
FOR SYSTEM_TIME AS OF a.procTime AS b ON CAST(a.siteId AS INT) = b.site_id
WHERE a.userId > 3 + 4;

Parsing & Validation

Build the flink-sql-parser module, and you'll get the exact parser for Flink SQL dialect

Call stack

// parse
parse:54, CalciteParser (org.apache.flink.table.planner.parse)
parse:96, ParserImpl (org.apache.flink.table.planner.delegation)
executeSql:722, TableEnvironmentImpl (org.apache.flink.table.api.internal)

// validation
-- goes to org.apache.flink.table.planner.calcite.FlinkCalciteSqlValidator#validate()
org$apache$flink$table$planner$calcite$FlinkPlannerImpl$$validate:150, FlinkPlannerImpl (org.apache.flink.table.planner.calcite)
validate:108, FlinkPlannerImpl (org.apache.flink.table.planner.calcite)
convert:201, SqlToOperationConverter (org.apache.flink.table.planner.operations)
parse:99, ParserImpl (org.apache.flink.table.planner.delegation)
executeSql:722, TableEnvironmentImpl (org.apache.flink.table.api.internal)

SqlNode tree
- Note that FOR SYSTEM_TIME AS OF syntax is translated to a SqlSnapshot node

Logical Planning

Call stack
- Obviously, these are a bunch of recursive processes

-- goes to Calcite SqlToRelConverter
org$apache$flink$table$planner$calcite$FlinkPlannerImpl$$rel:168, FlinkPlannerImpl (org.apache.flink.table.planner.calcite)
rel:160, FlinkPlannerImpl (org.apache.flink.table.planner.calcite)
toQueryOperation:967, SqlToOperationConverter (org.apache.flink.table.planner.operations)
convertSqlQuery:936, SqlToOperationConverter (org.apache.flink.table.planner.operations)
convert:275, SqlToOperationConverter (org.apache.flink.table.planner.operations)
convertSqlInsert:595, SqlToOperationConverter (org.apache.flink.table.planner.operations)
convert:268, SqlToOperationConverter (org.apache.flink.table.planner.operations)
parse:99, ParserImpl (org.apache.flink.table.planner.delegation)
executeSql:722, TableEnvironmentImpl (org.apache.flink.table.api.internal)

Logical planning in Flink SQL yields a tree of Operations (e.g. ModifyOperation, QueryOperation)
- Just wrappers of RelNodes
RelNode tree
- SqlJoin → LogicalCorrelate (in Calcite this means nested-loop join)
- SqlSnapshot → LogicalSnapshot
- etc.

Output of EXPLAIN statement

-- In fact this is the original logical plan
== Abstract Syntax Tree ==
LogicalSink(table=[hive.expdb.print_joined_result], fields=[tss, userId, eventType, siteId, siteName])
+- LogicalProject(tss=[FROM_UNIXTIME(/($0, 1000), _UTF-16LE'yyyy-MM-dd HH:mm:ss')], userId=[$1], eventType=[$2], siteId=[$6], siteName=[$10])
   +- LogicalFilter(condition=[>($1, +(3, 4))])
      +- LogicalCorrelate(correlation=[$cor0], joinType=[left], requiredColumns=[{6, 8}])
         :- LogicalProject(ts=[$0], userId=[$1], eventType=[$2], columnType=[$3], fromType=[$4], grouponId=[$5], siteId=[$6], merchandiseId=[$7], procTime=[PROCTIME()])
         :  +- LogicalTableScan(table=[[hive, expdb, kafka_analytics_access_log_app]], hints=[[[OPTIONS inheritPath:[] options:{properties.group.id=DiveIntoBlinkExp, scan.startup.mode=latest-offset}]]])
         +- LogicalFilter(condition=[=(CAST($cor0.siteId):INTEGER, $0)])
            +- LogicalSnapshot(period=[$cor0.procTime])
               +- LogicalTableScan(table=[[hive, rtdw_dim, mysql_site_war_zone_mapping_relation]])

All-Over Optimization w/ Physical Planning

Call stack
- CommonSubGraphBasedOptimizer is a Flink-implemented optimizer that divides logical plan into sub-graphs by SinkBlocks, and reuses common sub-graphs whenever available
- For most scenarios, the logical plan is merely a single tree (optimizeTree)

-- goes to org.apache.flink.table.planner.plan.optimize.program.FlinkChainedProgram#optimize()
optimizeTree:163, StreamCommonSubGraphBasedOptimizer (org.apache.flink.table.planner.plan.optimize)
doOptimize:79, StreamCommonSubGraphBasedOptimizer (org.apache.flink.table.planner.plan.optimize)
optimize:77, CommonSubGraphBasedOptimizer (org.apache.flink.table.planner.plan.optimize)
optimize:284, PlannerBase (org.apache.flink.table.planner.delegation)
translate:168, PlannerBase (org.apache.flink.table.planner.delegation)
translate:1516, TableEnvironmentImpl (org.apache.flink.table.api.internal)
executeInternal:738, TableEnvironmentImpl (org.apache.flink.table.api.internal)
executeInternal:854, TableEnvironmentImpl (org.apache.flink.table.api.internal)
executeSql:728, TableEnvironmentImpl (org.apache.flink.table.api.internal)

FlinkChainedProgram breaks down into several FlinkHepPrograms (resembling HepProgram), which define the order of rules to be attempted with HepPlanner
- This time a lot more rules of course
- Flink SQL handles entire physical planning process with RelOptRules, along with logical/physical optimization
All RuleSets are defined in FlinkStreamRuleSets; some of the rules are shipped natively with Calcite

FlinkStreamProgram actually builds up the program sequence
- The names are quite straightforward though
- At the end of LOGICAL, specialized ConverterRules will convert Calcite RelNode into FlinkLogicalRel
  - e.g. LogicalCalc → FlinkLogicalCalcConverter → FlinkLogicalCalc
  - i.e. the convention is converted to FLINK_LOGICAL
  - Logical optimization phase is somewhat hard to observe

The optimized StreamPhysicalRel tree
- Physical planning rules are almost all ConverterRules
  - FlinkLogicalRel → StreamPhysicalRel, convention FLINK_LOGICAL → STREAM_PHYSICAL
  - e.g. FlinkLogicalCalc → StreamPhysicalCalcRule → StreamPhysicalCalc
- HepRelVertex is the wrapper of RelNode in HepPlanner

Output of EXPLAIN statement

== Optimized Physical Plan ==
Sink(table=[hive.expdb.print_joined_result], fields=[tss, userId, eventType, siteId, siteName])
+- Calc(select=[FROM_UNIXTIME(/(ts, 1000), _UTF-16LE'yyyy-MM-dd HH:mm:ss') AS tss, userId, eventType, siteId, site_name AS siteName])
   +- LookupJoin(table=[hive.rtdw_dim.mysql_site_war_zone_mapping_relation], joinType=[LeftOuterJoin], async=[false], lookup=[site_id=siteId0], select=[ts, userId, eventType, siteId, siteId0, site_id, site_name])
      +- Calc(select=[ts, userId, eventType, siteId, CAST(siteId) AS siteId0], where=[>(userId, 7)])
         +- TableSourceScan(table=[[hive, expdb, kafka_analytics_access_log_app]], fields=[ts, userId, eventType, columnType, fromType, grouponId, siteId, merchandiseId], hints=[[[OPTIONS options:{properties.group.id=DiveIntoBlinkExp, scan.startup.mode=latest-offset}]]])
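One detail worth noticing when comparing this plan with the original logical plan: the predicate `>($1, +(3, 4))` now reads `>(userId, 7)`, i.e. the constant sub-expression was reduced at planning time. The following is a toy sketch of that idea on a hand-rolled expression tree. The `Expr` classes are hypothetical stand-ins, not Calcite's `RexNode` API, and only integer addition is folded.

```java
/** Toy constant folding over a tiny expression tree (hypothetical classes, not RexNode). */
final class ConstantFoldingSketch {

  interface Expr {}

  static final class Literal implements Expr {
    final int value;
    Literal(int value) { this.value = value; }
    @Override public String toString() { return String.valueOf(value); }
  }

  static final class InputRef implements Expr {
    final String name;
    InputRef(String name) { this.name = name; }
    @Override public String toString() { return name; }
  }

  static final class Call implements Expr {
    final String op;
    final Expr left;
    final Expr right;
    Call(String op, Expr left, Expr right) {
      this.op = op;
      this.left = left;
      this.right = right;
    }
    @Override public String toString() { return op + "(" + left + ", " + right + ")"; }
  }

  /** Bottom-up rewrite: replaces +(literal, literal) with a single literal. */
  static Expr fold(Expr e) {
    if (!(e instanceof Call)) {
      return e;
    }
    Call c = (Call) e;
    Expr l = fold(c.left);
    Expr r = fold(c.right);
    if (c.op.equals("+") && l instanceof Literal && r instanceof Literal) {
      return new Literal(((Literal) l).value + ((Literal) r).value);
    }
    return new Call(c.op, l, r);
  }

  /** The WHERE condition of the example query: userId > 3 + 4. */
  static Expr sampleCondition() {
    return new Call(">", new InputRef("userId"), new Call("+", new Literal(3), new Literal(4)));
  }

  public static void main(String[] args) {
    System.out.println(sampleCondition());       // >(userId, +(3, 4))
    System.out.println(fold(sampleCondition())); // >(userId, 7)
  }
}
```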

Pick two rules for some explanation
TEMPORAL_JOIN_REWRITE - LogicalCorrelateToJoinFromLookupTableRuleWithFilter

This rule matches

+- LogicalCorrelate
   :- [RelNode related to stream table]
   +- LogicalFilter(condition)
      +- LogicalSnapshot(time_attr)
         +- [RelNode related to temporal table]

and transforms into

+- LogicalJoin(condition)
   :- [RelNode related to stream table]
   +- LogicalSnapshot(time_attr)
      +- [RelNode related to temporal table]
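The match-and-transform step can be pictured with a toy tree rewriter. This is only a sketch on a hypothetical `Node` class (not Flink's or Calcite's `RelNode` types), and it deliberately ignores what a real rule has to take care of, such as how references to the correlation variable (`$cor0`) inside the condition are handled.

```java
import java.util.Arrays;
import java.util.List;

/** Toy version of "Correlate over Filter over Snapshot becomes Join over Snapshot". */
final class CorrelateRewriteSketch {

  /** A bare-bones plan node: a kind, an optional attribute, and its inputs. */
  static final class Node {
    final String kind;
    final String attr;
    final List<Node> inputs;

    Node(String kind, String attr, Node... inputs) {
      this.kind = kind;
      this.attr = attr;
      this.inputs = Arrays.asList(inputs);
    }
  }

  /** Returns the rewritten node if the pattern matches, otherwise the node itself. */
  static Node rewrite(Node node) {
    if (!node.kind.equals("Correlate") || node.inputs.size() != 2) {
      return node;
    }
    Node left = node.inputs.get(0);
    Node filter = node.inputs.get(1);
    if (!filter.kind.equals("Filter") || filter.inputs.size() != 1) {
      return node;
    }
    Node snapshot = filter.inputs.get(0);
    if (!snapshot.kind.equals("Snapshot")) {
      return node;
    }
    // The filter condition becomes the join condition; the snapshot moves directly under the join
    return new Node("Join", filter.attr, left, snapshot);
  }

  /** Renders the tree with two-space indentation per level. */
  static String render(Node node, String indent) {
    StringBuilder sb = new StringBuilder(indent).append(node.kind);
    if (node.attr != null) {
      sb.append('(').append(node.attr).append(')');
    }
    sb.append('\n');
    for (Node in : node.inputs) {
      sb.append(render(in, indent + "  "));
    }
    return sb.toString();
  }

  static Node samplePlan() {
    return new Node("Correlate", null,
        new Node("Scan", "stream table"),
        new Node("Filter", "condition",
            new Node("Snapshot", "time_attr",
                new Node("Scan", "temporal table"))));
  }

  public static void main(String[] args) {
    System.out.print(render(samplePlan(), ""));
    System.out.print(render(rewrite(samplePlan()), ""));
  }
}
```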

PHYSICAL - StreamPhysicalLookupJoinRule - SnapshotOnTableScanRule

This rule matches

+- FlinkLogicalJoin(condition)
   :- [RelNode related to stream table]
   +- FlinkLogicalSnapshot(time_attr)
      +- FlinkLogicalTableSourceScan [w/ LookupTableSource]

and transforms into StreamPhysicalLookupJoin

Execution Planning & Codegen

Call stack

-- goes to separate FlinkPhysicalRel#translateToExecNode()
generate:74, ExecNodeGraphGenerator (org.apache.flink.table.planner.plan.nodes.exec)
generate:54, ExecNodeGraphGenerator (org.apache.flink.table.planner.plan.nodes.exec)
translateToExecNodeGraph:312, PlannerBase (org.apache.flink.table.planner.delegation)
translate:164, PlannerBase (org.apache.flink.table.planner.delegation)
translate:1518, TableEnvironmentImpl (org.apache.flink.table.api.internal)
executeInternal:740, TableEnvironmentImpl (org.apache.flink.table.api.internal)
executeInternal:856, TableEnvironmentImpl (org.apache.flink.table.api.internal)
executeSql:730, TableEnvironmentImpl (org.apache.flink.table.api.internal)

-- goes to separate ExecNodeBase#translateToPlan() & StreamExecNode#translateToPlanInternal()
translateToPlan:70, StreamPlanner (org.apache.flink.table.planner.delegation)
translate:165, PlannerBase (org.apache.flink.table.planner.delegation)
translate:1518, TableEnvironmentImpl (org.apache.flink.table.api.internal)
executeInternal:740, TableEnvironmentImpl (org.apache.flink.table.api.internal)
executeInternal:856, TableEnvironmentImpl (org.apache.flink.table.api.internal)
executeSql:730, TableEnvironmentImpl (org.apache.flink.table.api.internal)

The ExecNodeGraph DAG
- JSON representation of this DAG can be acquired or executed by tableEnv.asInstanceOf[TableEnvironmentInternal].getJsonPlan(sql) / executeJsonPlan(plan)

Output of EXPLAIN statement

== Optimized Execution Plan ==
Sink(table=[hive.expdb.print_joined_result], fields=[tss, userId, eventType, siteId, siteName])
+- Calc(select=[FROM_UNIXTIME((ts / 1000), _UTF-16LE'yyyy-MM-dd HH:mm:ss') AS tss, userId, eventType, siteId, site_name AS siteName])
   +- LookupJoin(table=[hive.rtdw_dim.mysql_site_war_zone_mapping_relation], joinType=[LeftOuterJoin], async=[false], lookup=[site_id=siteId0], select=[ts, userId, eventType, siteId, siteId0, site_id, site_name])
      +- Calc(select=[ts, userId, eventType, siteId, CAST(siteId) AS siteId0], where=[(userId > 7)])
         +- TableSourceScan(table=[[hive, expdb, kafka_analytics_access_log_app]], fields=[ts, userId, eventType, columnType, fromType, grouponId, siteId, merchandiseId], hints=[[[OPTIONS options:{properties.group.id=DiveIntoBlinkExp, scan.startup.mode=latest-offset}]]])

StreamExecNode → Transformation → Generated DataStream Operator / Function code
- e.g. StreamExecCalc → OneInputStreamTransformation → OneInputStreamOperator / FlatMapFunction
Generated code will be dynamically compiled into Java class files through Janino
- You can view all generated code by setting debug output of CompileUtils
- Too long, refer to https://pastebin.com/NCMSxh5h
We'll leave detailed explanation of this part for the next lecture

Get Our Hands Dirty

Question

Is there any hidden trouble in the simple example program shown above?

Try focusing on the LookupJoin and consider its cache locality
- In extreme conditions, the same looked-up KV can be cached N times, once in each of N sub-tasks

Define An Option

Distributing lookup keys (according to hash) to sub-tasks seems better
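A back-of-the-envelope simulation makes the difference visible. It is a sketch under simplifying assumptions: every sub-task keeps its own unbounded cache, records are routed either round-robin (standing in for any key-agnostic partitioning) or by `hash(key) mod parallelism`, and each cache miss counts as one call to the dimension table. All names are made up for illustration; nothing here is Flink code.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Counts external lookups for per-sub-task caches under two routing strategies. */
final class LookupCacheSketch {

  /** Returns the total number of cache misses (= external lookups) across all sub-tasks. */
  static int countLookups(int[] keys, int parallelism, boolean hashByKey) {
    List<Set<Integer>> caches = new ArrayList<>();
    for (int i = 0; i < parallelism; i++) {
      caches.add(new HashSet<>());
    }
    int lookups = 0;
    for (int i = 0; i < keys.length; i++) {
      int subTask = hashByKey
          ? Math.floorMod(Integer.hashCode(keys[i]), parallelism) // same key, same sub-task
          : i % parallelism;                                      // round-robin, ignores the key
      // Set#add returns true only if the key was not cached in this sub-task yet
      if (caches.get(subTask).add(keys[i])) {
        lookups++;
      }
    }
    return lookups;
  }

  /** 1000 records cycling over 5 distinct lookup keys. */
  static int[] sampleKeys() {
    int[] keys = new int[1000];
    for (int i = 0; i < keys.length; i++) {
      keys[i] = i % 5;
    }
    return keys;
  }

  public static void main(String[] args) {
    System.out.println(countLookups(sampleKeys(), 4, false)); // 20: every key cached in all 4 sub-tasks
    System.out.println(countLookups(sampleKeys(), 4, true));  // 5: every key cached exactly once
  }
}
```

With 5 keys and 4 sub-tasks, key-agnostic routing ends up loading each key into every cache, while hash routing loads it once. The price is an extra shuffle in front of the join, and skewed keys will concentrate on a few sub-tasks, which is a reason to keep the behavior behind an option.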
In ExecutionConfigOptions...

@Documentation.TableOption(execMode = Documentation.ExecMode.STREAMING)
public static final ConfigOption<Boolean> TABLE_EXEC_LOOKUP_DISTRIBUTE_BY_KEY =
    key("table.exec.lookup.distribute-by-key")
    .booleanType()
    .defaultValue(false)
    .withDescription("Specifies whether to distribute lookups to sub-tasks by hash value of lookup key.");

Customize A Rule

When to apply this rule? --- After physical planning
What should we do? --- Insert a hash-by-key operation before StreamPhysicalLookupJoin
- FlinkRelDistribution will do the work
- Physical redistribution means StreamPhysicalExchange node
Note that there are 5 kinds of RelTrait in Flink SQL

class HashDistributedLookupJoinRule extends RelOptRule(
  operand(classOf[StreamPhysicalLookupJoin], any()),
  "HashDistributedLookupJoinRule") {

  override def matches(call: RelOptRuleCall): Boolean = {
    val tableConfig = call.getPlanner.getContext.unwrap(classOf[FlinkContext]).getTableConfig
    tableConfig.getConfiguration.getBoolean(ExecutionConfigOptions.TABLE_EXEC_LOOKUP_DISTRIBUTE_BY_KEY)
  }

  override def onMatch(call: RelOptRuleCall): Unit = {
    val originalLookupJoin: StreamPhysicalLookupJoin = call.rel(0)
    val joinInfo = originalLookupJoin.joinInfo
    val traitSet = originalLookupJoin.getTraitSet

    val requiredDistribution = FlinkRelDistribution.hash(joinInfo.leftKeys)

    val hashDistributedTraitSet = traitSet
      .replace(requiredDistribution)
      .replace(FlinkConventions.STREAM_PHYSICAL)
      .replace(RelCollations.EMPTY)
      .replace(traitSet.getTrait(ModifyKindSetTraitDef.INSTANCE))
      .replace(traitSet.getTrait(UpdateKindTraitDef.INSTANCE))

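    // The exchange must sit between the lookup join and its original input,
    // so it wraps the join's input rather than the join itself
    val hashDistributedInput = new StreamPhysicalExchange(
      originalLookupJoin.getCluster,
      hashDistributedTraitSet,
      originalLookupJoin.getInput,
      requiredDistribution
    )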

    call.transformTo(
      originalLookupJoin.copy(originalLookupJoin.getTraitSet, util.Arrays.asList(hashDistributedInput))
    )
  }
}

object HashDistributedLookupJoinRule {
  val INSTANCE: RelOptRule = new HashDistributedLookupJoinRule
}

There's a helper method FlinkExpandConversionRule#satisfyDistribution() (also used in two-stage aggregation), how lucky

val hashDistributedInput = FlinkExpandConversionRule.satisfyDistribution(
  FlinkConventions.STREAM_PHYSICAL,
  originalLookupJoin.getInput,
  requiredDistribution
)

Put Into Rule Set

At the tail of FlinkStreamRuleSets

val PHYSICAL_REWRITE: RuleSet = RuleSets.ofList(
    // hash distributed lookup join rule
    HashDistributedLookupJoinRule.INSTANCE,
    // optimize agg rule
    TwoStageOptimizedAggregateRule.INSTANCE,
    // incremental agg rule
    IncrementalAggregateRule.INSTANCE,
    // optimize window agg rule
    TwoStageOptimizedWindowAggregateRule.INSTANCE
)

Have A Try

Rebuild the flink-table-api-java & flink-table-planner-blink modules
SET table.exec.lookup.distribute-by-key=true

== Optimized Physical Plan ==
Sink(table=[hive.expdb.print_joined_result], fields=[tss, userId, eventType, siteId, siteName])
+- Calc(select=[FROM_UNIXTIME(/(ts, 1000), _UTF-16LE'yyyy-MM-dd HH:mm:ss') AS tss, userId, eventType, siteId, site_name AS siteName])
   +- LookupJoin(table=[hive.rtdw_dim.mysql_site_war_zone_mapping_relation], joinType=[LeftOuterJoin], async=[false], lookup=[site_id=siteId0], select=[ts, userId, eventType, siteId, siteId0, site_id, site_name])
      +- Exchange(distribution=[hash[siteId0]])
         +- Calc(select=[ts, userId, eventType, siteId, CAST(siteId) AS siteId0], where=[>(userId, 7)])
            +- TableSourceScan(table=[[hive, expdb, kafka_analytics_access_log_app]], fields=[ts, userId, eventType, columnType, fromType, grouponId, siteId, merchandiseId], hints=[[[OPTIONS options:{properties.group.id=DiveIntoBlinkExp, scan.startup.mode=latest-offset}]]])

== Optimized Execution Plan ==
Sink(table=[hive.expdb.print_joined_result], fields=[tss, userId, eventType, siteId, siteName])
+- Calc(select=[FROM_UNIXTIME((ts / 1000), _UTF-16LE'yyyy-MM-dd HH:mm:ss') AS tss, userId, eventType, siteId, site_name AS siteName])
   +- LookupJoin(table=[hive.rtdw_dim.mysql_site_war_zone_mapping_relation], joinType=[LeftOuterJoin], async=[false], lookup=[site_id=siteId0], select=[ts, userId, eventType, siteId, siteId0, site_id, site_name])
      +- Exchange(distribution=[hash[siteId0]])
         +- Calc(select=[ts, userId, eventType, siteId, CAST(siteId) AS siteId0], where=[(userId > 7)])
            +- TableSourceScan(table=[[hive, expdb, kafka_analytics_access_log_app]], fields=[ts, userId, eventType, columnType, fromType, grouponId, siteId, merchandiseId], hints=[[[OPTIONS options:{properties.group.id=DiveIntoBlinkExp, scan.startup.mode=latest-offset}]]])

The End
