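Kafka Log Cleaning: LogCleaner

  • "Log" here means the files in which Kafka stores the messages written to it;
  • Kafka has two log cleanup policies:
    1. Deletion based on time and size;
    2. Compaction (the compact policy);
  • This article focuses on log cleaning under the compact policy;

Compact Policy Overview
  • Official Kafka documentation: Log compaction;
  • Compaction can only be applied to specific topics, namely those in which every written message carries a key. Messages with the same key are merged, and only the latest message for each key is kept;
  • During compaction, messages whose payload is null are removed as well;
  • Here is a figure taken from the official site to give a first impression: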
110.png
States During Log Cleaning
  • Three states are involved: LogCleaningInProgress, LogCleaningAborted, and LogCleaningPaused. The names are largely self-explanatory; the comment in the source code reads:
  • If a partition is to be cleaned, it enters the LogCleaningInProgress state.
  • While a partition is being cleaned, it can be requested to be aborted and paused. Then the partition first enters the LogCleaningAborted state. Once the cleaning task is aborted, the partition enters the LogCleaningPaused state.
  • While a partition is in the LogCleaningPaused state, it won't be scheduled for cleaning again, until cleaning is requested to be resumed.
  • The LogCleanerManager class manages the state of every log being cleaned and the transitions between states:
def abortCleaning(topicAndPartition: TopicAndPartition)
def abortAndPauseCleaning(topicAndPartition: TopicAndPartition)
def resumeCleaning(topicAndPartition: TopicAndPartition)
def checkCleaningAborted(topicAndPartition: TopicAndPartition) 
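The transitions described in the source comment can be sketched as a small state table. This is only an illustration under stated assumptions: StateTable, startCleaning and doneCleaning are hypothetical names, partitions are plain strings, locking is omitted, and the rule that an idle partition goes straight to LogCleaningPaused is our assumption rather than something the comment above states.

```scala
import scala.collection.mutable

object CleaningStateSketch {
  sealed trait LogCleaningState
  case object LogCleaningInProgress extends LogCleaningState
  case object LogCleaningAborted extends LogCleaningState
  case object LogCleaningPaused extends LogCleaningState

  // Hypothetical, single-threaded stand-in for the manager's state table.
  // A partition with no entry is idle and eligible for cleaning.
  final class StateTable {
    private val table = mutable.Map.empty[String, LogCleaningState]

    def state(partition: String): Option[LogCleaningState] = table.get(partition)

    // A partition picked for cleaning enters LogCleaningInProgress.
    def startCleaning(partition: String): Boolean =
      if (table.contains(partition)) false
      else { table(partition) = LogCleaningInProgress; true }

    // Abort-and-pause request: InProgress -> Aborted.
    // Assumption: an idle partition is paused directly.
    def abortAndPause(partition: String): Unit = table.get(partition) match {
      case None                        => table(partition) = LogCleaningPaused
      case Some(LogCleaningInProgress) => table(partition) = LogCleaningAborted
      case Some(_)                     => ()
    }

    // The cleaner finishes or notices the abort: InProgress -> idle, Aborted -> Paused.
    def doneCleaning(partition: String): Unit = table.get(partition) match {
      case Some(LogCleaningInProgress) => table.remove(partition); ()
      case Some(LogCleaningAborted)    => table(partition) = LogCleaningPaused
      case _                           => ()
    }

    // Resume request: Paused -> idle, so the partition can be scheduled again.
    def resume(partition: String): Unit =
      if (table.get(partition).contains(LogCleaningPaused)) { table.remove(partition); () }
  }
}
```

A paused partition is refused by startCleaning until resume is called, which mirrors the last sentence of the source comment.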
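Selecting the Logs to Clean
  • Compaction rewrites log and index files, which is I/O intensive, so Kafka throttles it: before each compaction run it first decides, by a fixed rule, which TopicAndPartition logs to clean;
  • The LogToClean class represents a log to be cleaned: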
private case class LogToClean(topicPartition: TopicAndPartition, log: Log, firstDirtyOffset: Long) extends Ordered[LogToClean] {
  val cleanBytes = log.logSegments(-1, firstDirtyOffset).map(_.size).sum
  val dirtyBytes = log.logSegments(firstDirtyOffset, math.max(firstDirtyOffset, log.activeSegment.baseOffset)).map(_.size).sum
  val cleanableRatio = dirtyBytes / totalBytes.toDouble
  def totalBytes = cleanBytes + dirtyBytes
  override def compare(that: LogToClean): Int = math.signum(this.cleanableRatio - that.cleanableRatio).toInt
}
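  1. firstDirtyOffset: the starting point of this cleaning pass. Messages at offsets before it are cleaned by merging them against the keys of the messages after it;
  2. val cleanableRatio = dirtyBytes / totalBytes.toDouble: the fraction of the log that needs cleaning. The larger this value, the more likely the log is to be selected for cleaning;
  3. After each cleaning pass, the position cleaned up to is recorded in the cleaner-offset-checkpoint file, and it serves as the reference for computing firstDirtyOffset on the next pass;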
def updateCheckpoints(dataDir: File, update: Option[(TopicAndPartition,Long)]) {
    inLock(lock) {
      val checkpoint = checkpoints(dataDir)
      val existing = checkpoint.read().filterKeys(logs.keys) ++ update
      checkpoint.write(existing)
    }
  }
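To make the cleanableRatio in LogToClean concrete, here is a minimal sketch that computes it from segment sizes. Segment and its fields are hypothetical stand-ins for LogSegment, and the sketch assumes firstDirtyOffset falls exactly on a segment base offset. With 100 clean bytes and 400 dirty bytes the ratio is 0.8; the active segment is not counted.

```scala
object CleanableRatioSketch {
  // Hypothetical segment: base offset and size in bytes.
  final case class Segment(baseOffset: Long, sizeBytes: Long)

  // Segments below firstDirtyOffset count as clean; segments from firstDirtyOffset
  // up to (but excluding) the active segment count as dirty.
  def cleanableRatio(segments: Seq[Segment], firstDirtyOffset: Long, activeBaseOffset: Long): Double = {
    val cleanBytes = segments.filter(_.baseOffset < firstDirtyOffset).map(_.sizeBytes).sum
    val dirtyBytes = segments
      .filter(s => s.baseOffset >= firstDirtyOffset && s.baseOffset < activeBaseOffset)
      .map(_.sizeBytes)
      .sum
    val totalBytes = cleanBytes + dirtyBytes
    // Guard against an empty log instead of dividing by zero.
    if (totalBytes == 0) 0.0 else dirtyBytes / totalBytes.toDouble
  }

  // Example log: one clean segment, two dirty segments, one active segment.
  val example: Seq[Segment] =
    Seq(Segment(0L, 100L), Segment(100L, 100L), Segment(200L, 300L), Segment(300L, 50L))
}
```

Calling cleanableRatio(example, 100L, 300L) gives 400 dirty bytes out of 500 counted bytes.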
  • Selecting the log most in need of cleaning:
def grabFilthiestLog(): Option[LogToClean] = {
    inLock(lock) {
      val lastClean = allCleanerCheckpoints()
      val dirtyLogs = logs.filter {
        case (topicAndPartition, log) => log.config.compact  // skip any logs marked for delete rather than dedupe
      }.filterNot {
        case (topicAndPartition, log) => inProgress.contains(topicAndPartition) // skip any logs already in-progress
      }.map {
        case (topicAndPartition, log) => // create a LogToClean instance for each
          // if the log segments are abnormally truncated and hence the checkpointed offset
          // is no longer valid, reset to the log starting offset and log the error event
          val logStartOffset = log.logSegments.head.baseOffset
          val firstDirtyOffset = {
            val offset = lastClean.getOrElse(topicAndPartition, logStartOffset)
            if (offset < logStartOffset) {
              error("Resetting first dirty offset to log start offset %d since the checkpointed offset %d is invalid."
                    .format(logStartOffset, offset))
              logStartOffset
            } else {
              offset
            }
          }
          LogToClean(topicAndPartition, log, firstDirtyOffset)
      }.filter(ltc => ltc.totalBytes > 0) // skip any empty logs

      this.dirtiestLogCleanableRatio = if (!dirtyLogs.isEmpty) dirtyLogs.max.cleanableRatio else 0
      // and must meet the minimum threshold for dirty byte ratio
      val cleanableLogs = dirtyLogs.filter(ltc => ltc.cleanableRatio > ltc.log.config.minCleanableRatio)
      if(cleanableLogs.isEmpty) {
        None
      } else {
        val filthiest = cleanableLogs.max
        inProgress.put(filthiest.topicPartition, LogCleaningInProgress)
        Some(filthiest)
      }
    }
  }

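The code looks long, but it is actually quite simple:

  1. Build a list of LogToClean objects from all the logs;
  2. From the list obtained in step 1, keep only the LogToClean objects whose cleanableRatio is greater than the minimum cleanable ratio set in the log's config (minCleanableRatio);
  3. From the list obtained in step 2, take the one with the largest cleanableRatio; that is the log currently most in need of cleaning.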
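The three steps above can be sketched on plain data. Candidate and pickFilthiest are hypothetical names, and a single minCleanableRatio parameter stands in for the per-log config value used in the real code; state tracking and checkpoints are left out.

```scala
object FilthiestLogSketch {
  // Hypothetical stand-in for LogToClean: only the fields the selection needs.
  final case class Candidate(name: String, cleanableRatio: Double, totalBytes: Long)

  def pickFilthiest(candidates: Seq[Candidate], minCleanableRatio: Double): Option[Candidate] =
    candidates
      .filter(_.totalBytes > 0)                      // skip empty logs
      .filter(_.cleanableRatio > minCleanableRatio)  // must exceed the threshold
      .reduceLeftOption((a, b) => if (b.cleanableRatio > a.cleanableRatio) b else a) // dirtiest wins

  val example: Seq[Candidate] = Seq(
    Candidate("orders-0", 0.3, 1000L),
    Candidate("orders-1", 0.7, 2000L),
    Candidate("orders-2", 0.9, 0L) // empty log, ignored despite its ratio
  )
}
```

With a threshold of 0.5 the sketch picks orders-1; with a threshold above every ratio it returns None, which corresponds to the cleaner having nothing to do in that round.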
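First, two figures taken from the web:
111.png
  1. The CleanerPoint in the figure is the firstDirtyOffset described above;
  2. Keys in the Log Tail are merged against the Log Head. In fact, because the OffsetMap is built over the Log Head, the range in which keys are merged also extends up to the last offset reached while building the OffsetMap;

The figure below shows the whole compaction and merging process; Kafka's code is essentially this process translated into code:

112.png

Building the OffsetMap
  • Build an OffsetMap over all the messages in the Log Head section of figure 111.png above. The map's key is the hash of message.key, and its value is the offset of that message;
  • Implementation: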
private[log] def buildOffsetMap(log: Log, start: Long, end: Long, map: OffsetMap): Long = {
    map.clear()
    val dirty = log.logSegments(start, end).toSeq
    info("Building offset map for log %s for %d segments in offset range [%d, %d).".format(log.name, dirty.size, start, end))
    
    // Add all the dirty segments. We must take at least map.slots * load_factor,
    // but we may be able to fit more (if there is lots of duplication in the dirty section of the log)
    var offset = dirty.head.baseOffset
    require(offset == start, "Last clean offset is %d but segment base offset is %d for log %s.".format(start, offset, log.name))
    val maxDesiredMapSize = (map.slots * this.dupBufferLoadFactor).toInt
    var full = false
    for (segment <- dirty if !full) {
      checkDone(log.topicAndPartition)
      val segmentSize = segment.nextOffset() - segment.baseOffset

      require(segmentSize <= maxDesiredMapSize, "%d messages in segment %s/%s but offset map can fit only %d. You can increase log.cleaner.dedupe.buffer.size or decrease log.cleaner.threads".format(segmentSize,  log.name, segment.log.file.getName, maxDesiredMapSize))
      if (map.size + segmentSize <= maxDesiredMapSize)
        offset = buildOffsetMapForSegment(log.topicAndPartition, segment, map)
      else
        full = true
    }
    info("Offset map for log %s complete.".format(log.name))
    offset
  }
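  1. Each LogSegment is read sequentially and its entries are put into the OffsetMap, whose key is the hash of message.key. There is a pitfall here: what happens if a hash collision occurs?
  2. The OffsetMap being built has a size limit: it cannot exceed val maxDesiredMapSize = (map.slots * this.dupBufferLoadFactor).toInt.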
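The two points above can be sketched with an in-memory map. Record, build and lookup are hypothetical names, and the sketch assumes keys are hashed with MD5 (to our understanding this is what Kafka's default OffsetMap implementation uses, but treat it as an assumption). Because only the digest is stored, two different keys with the same digest would be treated as one key and the older message would be dropped; with a 128-bit digest this is extremely unlikely. The capacity check counts records, whereas the real code counts the segment's offset range.

```scala
import java.security.MessageDigest
import scala.collection.mutable

object OffsetMapSketch {
  // Hypothetical record: only offset and key matter for the offset map.
  final case class Record(offset: Long, key: String)

  // Assumption: the map is keyed by the MD5 digest of the key, not the key itself.
  private def digest(key: String): Vector[Byte] =
    MessageDigest.getInstance("MD5").digest(key.getBytes("UTF-8")).toVector

  // Adds whole segments until the next one might not fit; returns the map and
  // the offset one past the last record indexed.
  def build(segments: Seq[Seq[Record]], maxEntries: Int): (mutable.Map[Vector[Byte], Long], Long) = {
    val map = mutable.Map.empty[Vector[Byte], Long]
    var endOffset = segments.flatten.headOption.map(_.offset).getOrElse(0L)
    var full = false
    for (segment <- segments if !full) {
      if (segment.nonEmpty && map.size + segment.size <= maxEntries) {
        // A later record with the same key overwrites the earlier offset.
        for (r <- segment) map(digest(r.key)) = r.offset
        endOffset = segment.last.offset + 1
      } else if (segment.nonEmpty) {
        full = true
      }
    }
    (map, endOffset)
  }

  def lookup(map: mutable.Map[Vector[Byte], Long], key: String): Option[Long] =
    map.get(digest(key))
}
```

The returned end offset marks how far the map reaches; segments beyond it are left for a later cleaning pass.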
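Regrouping the LogSegments to Clean
  • After compaction each original LogSegment inevitably shrinks, so the segments are regrouped in preparation for rewriting the log and index files;
  • The grouping rule is simple: group by segment size and index size. Within each group the total segment size must not exceed the configured segment size, the total index file size must not exceed the configured maximum index size, and the offset range covered must not exceed Int.MaxValue.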
private[log] def groupSegmentsBySize(segments: Iterable[LogSegment], maxSize: Int, maxIndexSize: Int): List[Seq[LogSegment]] = {
    var grouped = List[List[LogSegment]]()
    var segs = segments.toList
    while(!segs.isEmpty) {
      var group = List(segs.head)
      var logSize = segs.head.size
      var indexSize = segs.head.index.sizeInBytes
      segs = segs.tail
      while(!segs.isEmpty &&
            logSize + segs.head.size <= maxSize &&
            indexSize + segs.head.index.sizeInBytes <= maxIndexSize &&
            segs.head.index.lastOffset - group.last.index.baseOffset <= Int.MaxValue) {
        group = segs.head :: group
        logSize += segs.head.size
        indexSize += segs.head.index.sizeInBytes
        segs = segs.tail
      }
      grouped ::= group.reverse
    }
    grouped.reverse
  }
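The grouping rule can be tried out on plain numbers. This sketch uses a hypothetical Seg case class in place of LogSegment, expresses the same greedy grouping as a fold, and leaves out the Int.MaxValue offset-range check for brevity.

```scala
object GroupingSketch {
  // Hypothetical segment: base offset plus log and index sizes in bytes.
  final case class Seg(baseOffset: Long, logBytes: Int, indexBytes: Int)

  // Greedily appends each segment to the current group while both size limits hold;
  // otherwise starts a new group. Sums are done in Long to avoid Int overflow.
  def group(segs: List[Seg], maxLogBytes: Int, maxIndexBytes: Int): List[List[Seg]] =
    segs.foldLeft(List.empty[List[Seg]]) {
      case (current :: done, s)
          if current.map(_.logBytes.toLong).sum + s.logBytes <= maxLogBytes &&
            current.map(_.indexBytes.toLong).sum + s.indexBytes <= maxIndexBytes =>
        (s :: current) :: done
      case (acc, s) =>
        List(s) :: acc
    }.map(_.reverse).reverse

  val example: List[Seg] =
    List(Seg(0L, 40, 10), Seg(100L, 50, 10), Seg(200L, 30, 10), Seg(300L, 80, 10))
}
```

With maxLogBytes = 100 and maxIndexBytes = 25, the first two segments (90 log bytes, 20 index bytes) share a group, and each of the remaining two starts its own group because adding it would exceed 100 log bytes.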
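Performing the Actual Cleaning on the Regrouped Segments
  • The cleaning process iterates over all the LogSegments to be cleaned, filters out the messages to keep according to the rules below, and rewrites them into a new log file;
  • A message is retained only if it has a key and both of the following hold:
    1. Either its key is not in the OffsetMap, or its offset is not less than the offset stored for that key in the OffsetMap;
    2. It is not an obsolete delete marker: either its value is non-null, or the segment's last-modified time is later than the delete horizon (deleteHorizonMs), in which case null-valued messages are still kept;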
private def shouldRetainMessage(source: kafka.log.LogSegment,
                                  map: kafka.log.OffsetMap,
                                  retainDeletes: Boolean,
                                  entry: kafka.message.MessageAndOffset): Boolean = {
    val key = entry.message.key
    if (key != null) {
      val foundOffset = map.get(key)
      /* two cases in which we can get rid of a message:
       *   1) if there exists a message with the same key but higher offset
       *   2) if the message is a delete "tombstone" marker and enough time has passed
       */
      val redundant = foundOffset >= 0 && entry.offset < foundOffset
      val obsoleteDelete = !retainDeletes && entry.message.isNull
      !redundant && !obsoleteDelete
    } else {
      stats.invalidMessage()
      false
    }
  }
private[log] def cleanSegments(log: Log,
                                 segments: Seq[LogSegment], 
                                 map: OffsetMap, 
                                 deleteHorizonMs: Long) {
    // create a new segment with the suffix .cleaned appended to both the log and index name
    val logFile = new File(segments.head.log.file.getPath + Log.CleanedFileSuffix)
    logFile.delete()
    val indexFile = new File(segments.head.index.file.getPath + Log.CleanedFileSuffix)
    indexFile.delete()
    val messages = new FileMessageSet(logFile, fileAlreadyExists = false, initFileSize = log.initFileSize(), preallocate = log.config.preallocate)
    val index = new OffsetIndex(indexFile, segments.head.baseOffset, segments.head.index.maxIndexSize)
    val cleaned = new LogSegment(messages, index, segments.head.baseOffset, segments.head.indexIntervalBytes, log.config.randomSegmentJitter, time)

    try {
      // clean segments into the new destination segment
      for (old <- segments) {
        val retainDeletes = old.lastModified > deleteHorizonMs
        info("Cleaning segment %s in log %s (last modified %s) into %s, %s deletes."
            .format(old.baseOffset, log.name, new Date(old.lastModified), cleaned.baseOffset, if(retainDeletes) "retaining" else "discarding"))
        cleanInto(log.topicAndPartition, old, cleaned, map, retainDeletes)
      }

      // trim excess index
      index.trimToValidSize()

      // flush new segment to disk before swap
      cleaned.flush()

      // update the modification date to retain the last modified date of the original files
      val modified = segments.last.lastModified
      cleaned.lastModified = modified

      // swap in new segment
      info("Swapping in cleaned segment %d for segment(s) %s in log %s.".format(cleaned.baseOffset, segments.map(_.baseOffset).mkString(","), log.name))
      log.replaceSegments(cleaned, segments)
    } catch {
      case e: LogCleaningAbortedException =>
        cleaned.delete()
        throw e
    }
  }
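The retention rule in shouldRetainMessage can be exercised on an in-memory log. This is a sketch under simplifying assumptions: Msg and compact are hypothetical names, keys and values are strings, a value of None models a null payload, and the offset map is an ordinary Map from key to latest offset instead of a hash-based OffsetMap.

```scala
object RetainSketch {
  // key = None models a message without a key; value = None models a delete tombstone.
  final case class Msg(offset: Long, key: Option[String], value: Option[String])

  def shouldRetain(latest: Map[String, Long], retainDeletes: Boolean, m: Msg): Boolean =
    m.key match {
      case None => false // messages without a key are invalid in a compacted log
      case Some(k) =>
        // redundant: a message with the same key and a higher offset exists
        val redundant = latest.get(k).exists(m.offset < _)
        // obsolete delete: a tombstone whose retention window has passed
        val obsoleteDelete = !retainDeletes && m.value.isEmpty
        !redundant && !obsoleteDelete
    }

  // Assumes the log is ordered by offset, so later entries overwrite earlier ones in toMap.
  def compact(log: Seq[Msg], retainDeletes: Boolean): Seq[Msg] = {
    val latest = log.collect { case Msg(o, Some(k), _) => k -> o }.toMap
    log.filter(shouldRetain(latest, retainDeletes, _))
  }

  val example: Seq[Msg] = Seq(
    Msg(0L, Some("k1"), Some("v1")),
    Msg(1L, Some("k2"), Some("v2")),
    Msg(2L, Some("k1"), Some("v3")),
    Msg(3L, Some("k2"), None),
    Msg(4L, None, Some("x"))
  )
}
```

With retainDeletes = true the example keeps offsets 2 and 3 (the tombstone for k2 survives); with retainDeletes = false only offset 2 remains.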
