Around the Flink Source Code: Maven Plugins

Flink Source Code Analysis Series Index

See: Flink Source Code Analysis Series Index

Preface

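To automate its build, the Flink project relies on a number of Maven plugins that turn each step of the build into a standardized, configuration-driven process. We can apply the same plugins to our own projects as needed to improve day-to-day productivity.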

Maven assembly plugin

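The Maven assembly plugin assembles a project's distribution. Flink consists of many submodules, so their build outputs are naturally scattered across the source tree. A Flink release package contains not only the compiled Java artifacts but also shell scripts, example projects, configuration files, and more. It is the Maven assembly plugin that turns these scattered build outputs into the Flink release package.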

Configuration

The Maven assembly plugin is configured as shown in the following XML:

<project>
  [...]
  <build>
    [...]
    <plugins>
      <plugin>
        <!-- Enable the assembly plugin -->
        <artifactId>maven-assembly-plugin</artifactId>
        <version>3.3.0</version>
        <configuration>
          <!-- Reference a built-in assembly descriptor by its id -->
          <descriptorRefs>
            <descriptorRef>jar-with-dependencies</descriptorRef>
          </descriptorRefs>
        </configuration>
        <executions>
          <execution>
            <!-- Bind the single goal to the package phase -->
            <id>make-assembly</id> <!-- this is used for inheritance merges -->
            <phase>package</phase> <!-- bind to the packaging phase -->
            <goals>
              <goal>single</goal>
            </goals>
          </execution>
        </executions>
      </plugin>
      [...]
    </plugins>
  </build>
  [...]
</project>

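As shown above, what the assembly plugin actually does during assembly is determined by an assembly descriptor. The descriptorRef element selects one of the descriptors built into the plugin by its id (jar-with-dependencies here); a custom descriptor file is referenced through descriptors/descriptor instead. How to write a descriptor is explained below.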
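Besides the built-in descriptors selected with descriptorRef, the plugin can read a descriptor file of your own through its descriptors parameter. A minimal sketch, assuming the descriptor is stored at src/main/assemblies/bin.xml (a hypothetical path chosen for illustration):

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-assembly-plugin</artifactId>
  <version>3.3.0</version>
  <configuration>
    <!-- Custom assembly descriptor (hypothetical path), relative to the module's basedir -->
    <descriptors>
      <descriptor>src/main/assemblies/bin.xml</descriptor>
    </descriptors>
  </configuration>
  <executions>
    <execution>
      <id>make-assembly</id>
      <!-- Build the assembly as part of mvn package -->
      <phase>package</phase>
      <goals>
        <goal>single</goal>
      </goals>
    </execution>
  </executions>
</plugin>
```

Running mvn package then builds whatever that descriptor file describes.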

Assembly descriptor

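An assembly descriptor tells the assembly plugin exactly what to do when assembling the project. The following sections explain how to write one and what each part does; the examples are excerpts from the Flink source code.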

The file format is as follows:

<assembly xmlns="http://maven.apache.org/ASSEMBLY/2.1.0"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://maven.apache.org/ASSEMBLY/2.1.0 http://maven.apache.org/xsd/assembly-2.1.0.xsd">
  [...]
  <!-- Your own configuration goes here -->
  <dependencySets>
    <dependencySet>
      <includes>
        <include>*:war</include>
      </includes>
    </dependencySet>
  </dependencySets>
  [...]
</assembly>

The dependencySets element

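This element contains one or more dependencySet elements and copies the project's POM dependencies into the output directory. It is configured as follows: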

<dependencySet>
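    <!-- Output directory -->
    <outputDirectory>lib</outputDirectory>
    <!-- Whether to unpack the dependencies -->
    <unpack>false</unpack>
    <!-- Whether to include the project's own artifact in this dependencySet -->
    <useProjectArtifact>false</useProjectArtifact>
    <!-- Whether to include the project's attached artifacts in this dependencySet -->
    <useProjectAttachments>false</useProjectAttachments>
    <!-- Whether to include transitive dependencies in this dependencySet -->
    <useTransitiveDependencies>true</useTransitiveDependencies>
    <!-- Whether the include/exclude patterns of this dependencySet also apply to an artifact's transitive path -->
    <useTransitiveFiltering>true</useTransitiveFiltering>

    <!-- Which dependencies to include -->
    <!-- Coordinates: groupId:artifactId, or fully qualified as groupId:artifactId:type[:classifier]:version -->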
    <includes>
        <include>org.apache.logging.log4j:log4j-api</include>
        <include>org.apache.logging.log4j:log4j-core</include>
        <include>org.apache.logging.log4j:log4j-slf4j-impl</include>
        <include>org.apache.logging.log4j:log4j-1.2-api</include>
    </includes>
</dependencySet>
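A dependencySet can also filter by scope, exclude specific artifacts, and control the file names of the copied jars. The following sketch is illustrative only: the excluded coordinate and the naming pattern are assumptions, not taken from Flink's descriptor.

```xml
<dependencySet>
    <outputDirectory>lib</outputDirectory>
    <!-- Only dependencies needed at runtime (runtime scope also covers compile scope) -->
    <scope>runtime</scope>
    <!-- Excludes use the same coordinate syntax as includes; this artifact is only an example -->
    <excludes>
        <exclude>org.slf4j:slf4j-log4j12</exclude>
    </excludes>
    <!-- Name the copied files without a version, e.g. log4j-api.jar -->
    <outputFileNameMapping>${artifact.artifactId}.${artifact.extension}</outputFileNameMapping>
</dependencySet>
```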

The fileSets element

Contains one or more fileSet elements and copies a group of files into the output directory.

<fileSet>
    <!-- Copy all files in directory into outputDirectory -->
    <directory>src/main/flink-bin/bin</directory>
    <outputDirectory>bin</outputDirectory>
    <!-- Permissions of the copied files -->
    <fileMode>0755</fileMode>
</fileSet>
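Beyond fileMode, a fileSet can also set directory permissions and normalize line endings, which helps when shell scripts may have been edited on Windows. A sketch under that assumption; the extra values are illustrative, not copied from Flink:

```xml
<fileSet>
    <directory>src/main/flink-bin/bin</directory>
    <outputDirectory>bin</outputDirectory>
    <fileMode>0755</fileMode>
    <!-- Permissions of the directories created in the assembly -->
    <directoryMode>0755</directoryMode>
    <!-- Convert line endings of the copied files to LF -->
    <lineEnding>unix</lineEnding>
</fileSet>
```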

Inside a fileSet, the includes and excludes elements accept wildcard patterns to filter files by name, for example:

<fileSet>
    <directory>../flink-examples/flink-examples-streaming/target</directory>
    <outputDirectory>examples/streaming</outputDirectory>
    <fileMode>0644</fileMode>
    <!-- Include all jar files -->
    <includes>
        <include>*.jar</include>
    </includes>
    <!-- Then exclude the following jar files -->
    <excludes>
        <exclude>flink-examples-streaming*.jar</exclude>
        <exclude>original-*.jar</exclude>
        <exclude>MatrixVectorMul.jar</exclude>
    </excludes>
</fileSet>

The files element

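Similar to fileSets, but copies individual files into the output directory, optionally renaming them with destName. Contains one or more file elements.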

<file>
    <source>../flink-table/flink-table-uber/target/flink-table-uber_${scala.binary.version}-${project.version}.jar</source>
    <outputDirectory>lib/</outputDirectory>
    <destName>flink-table_${scala.binary.version}-${project.version}.jar</destName>
    <fileMode>0644</fileMode>
</file>

Built-in descriptors of the assembly plugin

Writing an assembly descriptor is not trivial. For common tasks, the Maven assembly plugin ships with built-in assembly descriptors; see the plugin documentation page Pre-defined Descriptor Files for how to use them.
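Built-in descriptors are selected by id with descriptorRef, and several can be listed at once. A minimal sketch of the plugin's configuration element that builds both a binary and a source distribution in one run:

```xml
<configuration>
  <descriptorRefs>
    <!-- Each entry produces its own set of archives -->
    <descriptorRef>bin</descriptorRef>
    <descriptorRef>src</descriptorRef>
  </descriptorRefs>
</configuration>
```

With default settings the archive names combine the build's final name and the descriptor id, so a hypothetical myproject 1.0 would yield files such as target/myproject-1.0-bin.zip and target/myproject-1.0-src.zip.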

bin

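Produces the project's default binary distribution (README/LICENSE/NOTICE files, the built jars, and the generated site docs). The complete descriptor is: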

<assembly xmlns="http://maven.apache.org/ASSEMBLY/2.1.0"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/ASSEMBLY/2.1.0 http://maven.apache.org/xsd/assembly-2.1.0.xsd">
  <id>bin</id>
  <formats>
    <format>tar.gz</format>
    <format>tar.bz2</format>
    <format>zip</format>
  </formats>
  <fileSets>
    <fileSet>
      <directory>${project.basedir}</directory>
      <outputDirectory></outputDirectory>
      <includes>
        <include>README*</include>
        <include>LICENSE*</include>
        <include>NOTICE*</include>
      </includes>
    </fileSet>
    <fileSet>
      <directory>${project.build.directory}</directory>
      <outputDirectory></outputDirectory>
      <includes>
        <include>*.jar</include>
      </includes>
    </fileSet>
    <fileSet>
      <directory>${project.build.directory}/site</directory>
      <outputDirectory>docs</outputDirectory>
    </fileSet>
  </fileSets>
</assembly>

jar-with-dependencies

Packages the compiled project together with its unpacked runtime dependencies into a single jar. The complete descriptor is:

<assembly xmlns="http://maven.apache.org/ASSEMBLY/2.1.0"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/ASSEMBLY/2.1.0 http://maven.apache.org/xsd/assembly-2.1.0.xsd">
  <!-- TODO: a jarjar format would be better -->
  <id>jar-with-dependencies</id>
  <formats>
    <format>jar</format>
  </formats>
  <includeBaseDirectory>false</includeBaseDirectory>
  <dependencySets>
    <dependencySet>
      <outputDirectory>/</outputDirectory>
      <useProjectArtifact>true</useProjectArtifact>
      <unpack>true</unpack>
      <scope>runtime</scope>
    </dependencySet>
  </dependencySets>
</assembly>
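jar-with-dependencies is often combined with a manifest entry so that the resulting jar can be started with java -jar. A sketch of the plugin configuration, where com.example.Main is a hypothetical entry class:

```xml
<configuration>
  <descriptorRefs>
    <descriptorRef>jar-with-dependencies</descriptorRef>
  </descriptorRefs>
  <archive>
    <manifest>
      <!-- Written as Main-Class into META-INF/MANIFEST.MF -->
      <mainClass>com.example.Main</mainClass>
    </manifest>
  </archive>
</configuration>
```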

src

Packages the project's source code (pom.xml, README/LICENSE/NOTICE files, and the src directory). The complete descriptor is:

<assembly xmlns="http://maven.apache.org/ASSEMBLY/2.1.0"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/ASSEMBLY/2.1.0 http://maven.apache.org/xsd/assembly-2.1.0.xsd">
  <id>src</id>
  <formats>
    <format>tar.gz</format>
    <format>tar.bz2</format>
    <format>zip</format>
  </formats>
  <fileSets>
    <fileSet>
      <directory>${project.basedir}</directory>
      <includes>
        <include>README*</include>
        <include>LICENSE*</include>
        <include>NOTICE*</include>
        <include>pom.xml</include>
      </includes>
      <useDefaultExcludes>true</useDefaultExcludes>
    </fileSet>
    <fileSet>
      <directory>${project.basedir}/src</directory>
      <useDefaultExcludes>true</useDefaultExcludes>
    </fileSet>
  </fileSets>
</assembly>

project

Packages the entire project directory, excluding the build output directory and log files. The complete descriptor is:

<assembly xmlns="http://maven.apache.org/ASSEMBLY/2.1.0"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/ASSEMBLY/2.1.0 http://maven.apache.org/xsd/assembly-2.1.0.xsd">
  <id>project</id>
  <formats>
    <format>tar.gz</format>
    <format>tar.bz2</format>
    <format>zip</format>
  </formats>
  <fileSets>
    <fileSet>
      <directory>${project.basedir}</directory>
      <outputDirectory></outputDirectory>
      <useDefaultExcludes>true</useDefaultExcludes>
      <excludes>
        <exclude>**/*.log</exclude>
        <exclude>**/${project.build.directory}/**</exclude>
      </excludes>
    </fileSet>
  </fileSets>
</assembly>

Maven checkstyle plugin

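A Maven plugin for checking code style. Checkstyle is a tool for enforcing coding conventions; the Maven checkstyle plugin ties these checks into the build so that code checking is automated.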

<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-checkstyle-plugin</artifactId>
    <version>2.17</version>
    <dependencies>
        <dependency>
            <groupId>com.puppycrawl.tools</groupId>
            <artifactId>checkstyle</artifactId>
            <!-- Note: match version with docs/flinkDev/ide_setup.md -->
            <version>8.14</version>
        </dependency>
    </dependencies>
    <executions>
        <execution>
            <!-- Run during the validate phase -->
            <id>validate</id>
            <phase>validate</phase>
            <goals>
                <goal>check</goal>
            </goals>
        </execution>
    </executions>
    <configuration>
        <!-- Path of the suppressions file; files matched by it are exempt from checkstyle checks -->
        <suppressionsLocation>/tools/maven/suppressions.xml</suppressionsLocation>
        <!-- Whether to also check the test source directory -->
        <includeTestSourceDirectory>true</includeTestSourceDirectory>
        <!-- Path of the custom checkstyle rules file -->
        <configLocation>/tools/maven/checkstyle.xml</configLocation>
        <!-- Whether to print violations to the console -->
        <logViolationsToConsole>true</logViolationsToConsole>
        <!-- Whether to fail the build when violations are found -->
        <failOnViolation>true</failOnViolation>
    </configuration>
</plugin>
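The file referenced by suppressionsLocation lists which files (and optionally which checks) are exempt. A minimal sketch of such a file; the generated-sources pattern is an assumption for illustration, not Flink's actual suppression list:

```xml
<?xml version="1.0"?>
<!DOCTYPE suppressions PUBLIC
    "-//Puppy Crawl//DTD Suppressions 1.1//EN"
    "http://www.puppycrawl.com/dtds/suppressions_1_1.dtd">
<suppressions>
    <!-- files is a regular expression on the file path; checks=".*" disables every check -->
    <suppress files="[\\/]generated[\\/]" checks=".*"/>
</suppressions>
```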

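Writing a rules file is complicated; for reference, see tools/maven/checkstyle.xml in the Flink source code. For your own projects, it is advisable to adopt an existing convention from the community or a vendor. If you need to write or adapt one yourself, refer to the official Checkstyle website.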
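For orientation, a rules file is a tree of modules rooted at Checker, with checks that inspect Java syntax nested under TreeWalker. The small selection below is a hypothetical starting point, not Flink's rule set:

```xml
<?xml version="1.0"?>
<!DOCTYPE module PUBLIC
    "-//Puppy Crawl//DTD Check Configuration 1.3//EN"
    "http://www.puppycrawl.com/dtds/configuration_1_3.dtd">
<module name="Checker">
    <!-- File-level check: forbid tab characters -->
    <module name="FileTabCharacter"/>
    <!-- Checks that work on the Java syntax tree run inside TreeWalker -->
    <module name="TreeWalker">
        <module name="AvoidStarImport"/>
        <module name="UnusedImports"/>
        <module name="NeedBraces"/>
    </module>
</module>
```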

Maven enforcer plugin

Checks that the build environment meets requirements, such as the Maven version, the JDK version, and the operating system family.

The example configuration from the official site:

<project>
  [...]
  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-enforcer-plugin</artifactId>
        <version>3.0.0-M3</version>
        <executions>
          <execution>
            <id>enforce-versions</id>
            <goals>
              <goal>enforce</goal>
            </goals>
            <configuration>
              <rules>
                <!-- Maven plugins that must not be used -->
                <!-- Here maven-verifier-plugin is banned -->
                <bannedPlugins>
                  <!-- will only display a warning but does not fail the build. -->
                  <level>WARN</level>
                  <excludes>
                    <exclude>org.apache.maven.plugins:maven-verifier-plugin</exclude>
                  </excludes>
                  <message>Please consider using the maven-invoker-plugin (http://maven.apache.org/plugins/maven-invoker-plugin/)!</message>
                </bannedPlugins>
                <!-- Required Maven version -->
                <requireMavenVersion>
                  <version>2.0.6</version>
                </requireMavenVersion>
                <!-- Required JDK version -->
                <requireJavaVersion>
                  <version>1.5</version>
                </requireJavaVersion>
                <!-- Required operating system family -->
                <requireOS>
                  <family>unix</family>
                </requireOS>
              </rules>
            </configuration>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
  [...]
</project>

For other rules, see Apache Maven Enforcer Built-In Rules.
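As an illustration, here are a few built-in rules commonly used to keep dependency versions under control. The banned coordinate and the Java version range are assumptions for the sketch; the rules element goes inside the execution's configuration, as in the example above:

```xml
<rules>
    <!-- Every path to a dependency must resolve to the same version -->
    <dependencyConvergence/>
    <!-- Fail if any of these artifacts appears in the dependency graph (coordinate is an example) -->
    <bannedDependencies>
        <excludes>
            <exclude>log4j:log4j</exclude>
        </excludes>
    </bannedDependencies>
    <!-- Version ranges are accepted too: Java 8 or newer -->
    <requireJavaVersion>
        <version>[1.8,)</version>
    </requireJavaVersion>
</rules>
```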

Maven spotless plugin

Used to check and fix code formatting consistently. For the official introduction, see the README of the diffplug/spotless repository on GitHub.

We can use:

  • mvn spotless:check to check the code formatting
  • mvn spotless:apply to format the code (modifies the source files in place)

Below is how the spotless plugin is configured in the Flink project:

<plugin>
    <groupId>com.diffplug.spotless</groupId>
    <artifactId>spotless-maven-plugin</artifactId>
    <version>${spotless.version}</version>
    <configuration>
        <java>
            <!-- Format with google-java-format, using its AOSP style (4-space indentation) -->
            <googleJavaFormat>
                <version>1.7</version>
                <style>AOSP</style>
            </googleJavaFormat>

            <!-- Order of the import groups; the empty entry stands for all other imports, \# refers to the static imports -->
            <importOrder>
                <order>org.apache.flink,org.apache.flink.shaded,,javax,java,scala,\#</order>
            </importOrder>

            <!-- Remove unused imports -->
            <removeUnusedImports />
        </java>
    </configuration>
    <executions>
        <!-- Run the spotless check during the validate phase -->
        <execution>
            <id>spotless-check</id>
            <phase>validate</phase>
            <goals>
                <goal>check</goal>
            </goals>
        </execution>
    </executions>
</plugin>
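On large code bases it can help to enforce formatting only on files changed relative to a Git reference. Newer 2.x releases of spotless-maven-plugin offer a ratchetFrom option for this; the sketch assumes such a release and a branch named origin/master, so check the plugin documentation for the version you use:

```xml
<configuration>
    <!-- Only check/format files that differ from this Git ref -->
    <ratchetFrom>origin/master</ratchetFrom>
    <java>
        <removeUnusedImports />
    </java>
</configuration>
```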

Maven shade plugin

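Gives the project the ability to build an uber-jar, i.e. a single jar containing the project's own compiled classes together with its dependencies.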

The shade plugin is configured as follows:

<project>
  ...
  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-shade-plugin</artifactId>
        <version>3.2.4</version>
        <configuration>
          <!-- Custom configuration goes here -->
        </configuration>
        <executions>
          <execution>
            <phase>package</phase>
            <goals>
              <goal>shade</goal>
            </goals>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
  ...
</project>
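A typical entry for the custom configuration placeholder is a filter that strips signature files from signed dependencies: once their classes are merged into the uber-jar, those signatures no longer match and the JVM can reject the jar. A sketch of this common pattern:

```xml
<configuration>
  <filters>
    <filter>
      <!-- Apply to every artifact (groupId:artifactId pattern) -->
      <artifact>*:*</artifact>
      <excludes>
        <!-- Signature files of signed jars -->
        <exclude>META-INF/*.SF</exclude>
        <exclude>META-INF/*.DSA</exclude>
        <exclude>META-INF/*.RSA</exclude>
      </excludes>
    </filter>
  </filters>
</configuration>
```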

Again taking the pom.xml in the root directory of the Flink source code as an example, let's look at what this plugin does and how it is used.

<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-shade-plugin</artifactId>
    <executions>
        <execution>
            <id>shade-flink</id>
            <phase>package</phase>
            <goals>
                <goal>shade</goal>
            </goals>
            <configuration>
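                <!-- Also shade the module's test jar, producing a shaded test-jar -->
                <shadeTestJar>true</shadeTestJar>
                <!-- Whether to attach the shaded jar as an additional artifact with a classifier -->
                <!-- false: the shaded jar replaces the module's main artifact -->
                <shadedArtifactAttached>false</shadedArtifactAttached>
                <!-- Generate a dependency-reduced POM that omits the dependencies bundled into the shaded jar -->
                <createDependencyReducedPom>true</createDependencyReducedPom>
                <!-- Where to write the dependency-reduced POM -->
                <dependencyReducedPomLocation>${project.basedir}/target/dependency-reduced-pom.xml</dependencyReducedPomLocation>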
                <!-- Filters MUST be appended; merging filters does not work properly, see MSHADE-305 -->
                <!-- Filter configuration -->
                <filters combine.children="append">
                    <!-- Globally exclude log4j.properties from our JAR files. -->
                    <filter>
                        <!-- Match every artifact; only the log4j configuration files below are excluded, not log4j itself -->
                        <artifact>*</artifact>
                        <excludes>
                            <exclude>log4j.properties</exclude>
                            <exclude>log4j2.properties</exclude>
                            <exclude>log4j-test.properties</exclude>
                            <exclude>log4j2-test.properties</exclude>
                        </excludes>
                    </filter>
                    <!-- drop entries into META-INF and NOTICE files for the dummy artifact -->
                    <!-- Exclude all contents of the force-shading module -->
                    <filter>
                        <artifact>org.apache.flink:force-shading</artifact>
                        <excludes>
                            <exclude>**</exclude>
                        </excludes>
                    </filter>
                    <!-- io.netty:netty brings its own LICENSE.txt which we don't need, so exclude it -->
                    <filter>
                        <artifact>io.netty:netty</artifact>
                        <excludes>
                            <exclude>META-INF/LICENSE.txt</exclude>
                        </excludes>
                    </filter>
                </filters>
                <artifactSet>
                    <includes>
                        <!-- Unfortunately, the next line is necessary for now to force the execution
         of the Shade plugin upon all sub modules. This will generate effective poms,
         i.e. poms which do not contain properties which are derived from this root pom.
         In particular, the Scala version properties are defined in the root pom and without
         shading, the root pom would have to be Scala suffixed and thereby all other modules.
         -->
                        <!-- Include the force-shading module -->
                        <include>org.apache.flink:force-shading</include>
                    </includes>
                </artifactSet>
                <transformers combine.children="append">
                    <!-- The service transformer is needed to merge META-INF/services files -->
                    <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
                    <!-- The ApacheNoticeResourceTransformer collects and aggregates NOTICE files -->
                    <transformer implementation="org.apache.maven.plugins.shade.resource.ApacheNoticeResourceTransformer">
                        <projectName>Apache Flink</projectName>
                        <encoding>UTF-8</encoding>
                    </transformer>
                </transformers>
            </configuration>
        </execution>
    </executions>
</plugin>

Beyond that, the Maven shade plugin can also build an executable jar. An example configuration:

<project>
  ...
  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-shade-plugin</artifactId>
        <version>3.2.4</version>
        <executions>
          <execution>
            <phase>package</phase>
            <goals>
              <goal>shade</goal>
            </goals>
            <configuration>
              <transformers>
                <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                  <!-- Fully qualified name of the entry-point class -->
                  <mainClass>org.sonatype.haven.HavenCli</mainClass>
                </transformer>
              </transformers>
            </configuration>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
  ...
</project>

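These two examples show that the Maven shade plugin offers a variety of transformers, which can be thought of as ready-made, commonly needed resource transformations provided by the plugin. The built-in transformers are: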

  • ApacheLicenseResourceTransformer: prevents duplicate license files
  • ApacheNoticeResourceTransformer: merges NOTICE files
  • AppendingTransformer: appends content to a resource
  • ComponentsXmlResourceTransformer: aggregates Plexus components.xml files
  • DontIncludeResourceTransformer: prevents matching resources from being included
  • IncludeResourceTransformer: adds extra files to the project
  • ManifestResourceTransformer: adds entries to the MANIFEST
  • ServicesResourceTransformer: merges all META-INF/services files
  • XmlAppendingTransformer: appends content to an XML resource
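For example, AppendingTransformer is useful when several dependencies ship a resource under the same path that must be concatenated rather than overwritten, such as Akka's reference.conf. A sketch of the transformers element inside the shade execution's configuration, assuming such dependencies are present:

```xml
<transformers>
  <!-- Concatenate all reference.conf files instead of keeping only one of them -->
  <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
    <resource>reference.conf</resource>
  </transformer>
  <!-- Merge META-INF/services entries as well -->
  <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
</transformers>
```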

For the plugin's other features, see Apache Maven Shade Plugin – Introduction.

Maven dependency plugin

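A plugin for working with dependencies: it can analyze dependencies, print the dependency tree, copy dependencies into a directory, and more. A few commonly used goals are introduced below.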

dependency:tree

mvn dependency:tree

Prints the project's complete dependency tree.

dependency:copy-dependencies

mvn dependency:copy-dependencies -DoutputDirectory=src/main/webapp/WEB-INF/lib -DincludeScope=runtime

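This example copies the project's dependencies into src/main/webapp/WEB-INF/lib; with includeScope=runtime, only dependencies in compile and runtime scope are copied.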

The same can of course be configured in XML. The example from the official site:

<project>
  [...]
  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-dependency-plugin</artifactId>
        <version>3.2.0</version>
        <executions>
          <execution>
            <!-- Run copy-dependencies during the package phase -->
            <id>copy-dependencies</id>
            <phase>package</phase>
            <goals>
              <goal>copy-dependencies</goal>
            </goals>
            <configuration>
              <!-- Output directory -->
              <outputDirectory>${project.build.directory}/alternateLocation</outputDirectory>
              <!-- Overwrite policy: whether to overwrite release artifacts -->
              <overWriteReleases>false</overWriteReleases>
              <!-- Overwrite policy: whether to overwrite snapshot artifacts -->
              <overWriteSnapshots>true</overWriteSnapshots>
              <!-- Whether to exclude transitive dependencies -->
              <excludeTransitive>true</excludeTransitive>
            </configuration>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
  [...]
</project>
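Another goal worth binding to the build is analyze-only, which reports dependencies that are used but not declared, or declared but not used. A sketch of an extra execution that makes such findings fail the build; whether that is too strict for a given project is a judgment call:

```xml
<execution>
  <id>analyze-dependencies</id>
  <goals>
    <!-- Runs in the verify phase by default and reuses the compiled classes -->
    <goal>analyze-only</goal>
  </goals>
  <configuration>
    <!-- Turn dependency analysis warnings into build failures -->
    <failOnWarning>true</failOnWarning>
  </configuration>
</execution>
```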

Maven frontend plugin

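This plugin downloads and installs Node.js through Maven and then runs the configured node/npm commands, so that Maven can build the front-end project as part of the same build.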

Using flink-runtime-web as an example, here is how the frontend plugin is used.

<plugin>
    <groupId>com.github.eirslett</groupId>
    <artifactId>frontend-maven-plugin</artifactId>
    <version>1.6</version>
    <executions>
        <!-- Install node and npm -->
        <execution>
            <id>install node and npm</id>
            <goals>
                <goal>install-node-and-npm</goal>
            </goals>
            <configuration>
                <!-- Node version -->
                <!-- npmVersion (npm version) and downloadRoot (where to download Node.js from) can also be set -->
                <nodeVersion>v10.9.0</nodeVersion>
            </configuration>
        </execution>
        <!-- Run an npm command -->
        <!-- Here: npm ci with the options cache-max=0 and no-save (see arguments below) -->
        <execution>
            <id>npm install</id>
            <goals>
                <goal>npm</goal>
            </goals>
            <configuration>
                <arguments>ci --cache-max=0 --no-save</arguments>
                <environmentVariables>
                    <HUSKY_SKIP_INSTALL>true</HUSKY_SKIP_INSTALL>
                </environmentVariables>
            </configuration>
        </execution>
        <!-- Run npm run build -->
        <execution>
            <id>npm run build</id>
            <goals>
                <goal>npm</goal>
            </goals>
            <configuration>
                <arguments>run build</arguments>
            </configuration>
        </execution>
    </executions>
    <configuration>
        <!-- Working directory -->
        <workingDirectory>web-dashboard</workingDirectory>
    </configuration>
</plugin>
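The first execution can also be pointed at a different download location, e.g. an internal mirror when nodejs.org is slow or unreachable. A sketch of that execution using the downloadRoot option mentioned in the comment; the mirror URL is a hypothetical placeholder, and since Node.js 10 already ships with npm, no npmVersion is set:

```xml
<execution>
    <id>install node and npm</id>
    <goals>
        <goal>install-node-and-npm</goal>
    </goals>
    <configuration>
        <nodeVersion>v10.9.0</nodeVersion>
        <!-- Where to download Node.js from (hypothetical mirror URL) -->
        <downloadRoot>https://mirror.example.com/nodejs/</downloadRoot>
    </configuration>
</execution>
```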

For more detailed usage, see https://github.com/eirslett/frontend-maven-plugin
