2. LevelDB Source Code Analysis: Basic Components - AtomicPointer, Arena, Slice

Before analyzing LevelDB's implementation details, let's first get to know its basic components.

2.1 AtomicPointer

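LevelDB has a port directory. Everything under port is platform-specific, while all code outside port is platform-independent. This is how LevelDB achieves portability, and AtomicPointer lives in port as well.
Portability, however, is only a side property of AtomicPointer; its real purpose is to implement an atomic pointer. The code is as follows: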

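class AtomicPointer
{
private:
  void *rep_;

public:
  AtomicPointer() {}  // note: rep_ is left uninitialized
  explicit AtomicPointer(void *p) : rep_(p) {}
  // Plain load/store: no ordering guarantees
  inline void *NoBarrier_Load() const { return rep_; }
  inline void NoBarrier_Store(void *v) { rep_ = v; }
  // Load, then barrier: later memory accesses cannot be moved before the load
  inline void *Acquire_Load() const
  {
    void *result = rep_;
    MemoryBarrier();
    return result;
  }
  // Barrier, then store: earlier memory accesses cannot be moved after the store
  inline void Release_Store(void *v)
  {
    MemoryBarrier();
    rep_ = v;
  }
};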

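Saying that AtomicPointer "implements an atomic pointer" is not quite accurate. Consider the following two questions:

NoBarrier_Store/NoBarrier_Load are nothing more than plain pointer reads and writes. Are these operations atomic?
Acquire_Load/Release_Store add a MemoryBarrier call. What is it for, and how is atomicity guaranteed?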

2.1.1 Atomicity of Pointer Operations

Section 8.1.1 of the Intel® 64 and IA-32 Architectures Software Developer's Manual states:

8.1.1 Guaranteed Atomic Operations
The Intel486 processor (and newer processors since) guarantees that the following basic memory operations will always be carried out atomically:
• Reading or writing a byte
• Reading or writing a word aligned on a 16-bit boundary
• Reading or writing a doubleword aligned on a 32-bit boundary
The Pentium processor (and newer processors since) guarantees that the following additional memory operations will always be carried out atomically:
• Reading or writing a quadword aligned on a 64-bit boundary
• 16-bit accesses to uncached memory locations that fit within a 32-bit data bus
The P6 family processors (and newer processors since) guarantee that the following additional memory operation will always be carried out atomically:
• Unaligned 16-, 32-, and 64-bit accesses to cached memory that fit within a cache line
Accesses to cacheable memory that are split across cache lines and page boundaries are not guaranteed to be atomic by the Intel Core 2 Duo, Intel® Atom™, Intel Core Duo, Pentium M, Pentium 4, Intel Xeon, P6 family, Pentium, and Intel486 processors. The Intel Core 2 Duo, Intel Atom, Intel Core Duo, Pentium M, Pentium 4, Intel Xeon, and P6 family processors provide bus control signals that permit external memory subsystems to make split accesses atomic; however, nonaligned data accesses will seriously impact the performance of the processor and should be avoided.

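In short, Intel processors guarantee that a pointer-sized read or write that does not cross a cache line is atomic; for accesses that do cross a cache line, only some processors provide an atomicity guarantee. Pointers returned by C++ new, and pointer members inside objects, are normally naturally aligned (8-byte aligned on a 64-bit platform), and a naturally aligned 8-byte value can never straddle a cache line. However, packing a structure to 1-byte alignment (for example with #pragma pack(1)), or constructing objects at arbitrary addresses with placement new, can produce pointers that cross a cache line.
In LevelDB, pointers are always naturally aligned, so the answer to the first question is yes: the NoBarrier_* pointer operations are atomic in themselves. Why, then, are Acquire_Load and Release_Store needed? See the next section.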
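To make the cache-line argument concrete, here is a small sketch. It assumes a 64-byte cache line (typical of current x86 CPUs, but an assumption, not something the code queries), and CrossesCacheLine and IsNaturallyAligned are hypothetical helpers, not part of LevelDB.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: 64-byte cache lines (typical for x86; not queried from the CPU).
constexpr std::uintptr_t kCacheLineSize = 64;

// Hypothetical helper: true if a `size`-byte access starting at `addr`
// touches two different cache lines.
bool CrossesCacheLine(std::uintptr_t addr, std::size_t size) {
  assert(size > 0);
  return addr / kCacheLineSize != (addr + size - 1) / kCacheLineSize;
}

// Hypothetical helper: natural alignment means the address is a multiple
// of the access size.
bool IsNaturallyAligned(std::uintptr_t addr, std::size_t size) {
  return addr % size == 0;
}
```

With 8-byte pointers, an aligned address such as 56 keeps the access inside one line (bytes 56 to 63), while address 60 spills bytes 64 to 67 into the next line. Because 64 is a multiple of 8, every naturally aligned 8-byte access stays within a single line.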

2.1.2 Memory Barrier

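The CPU guarantees that pointer operations are atomic, but instruction reordering by the compiler and by the CPU can change the order in which memory operations take effect, so a multithreaded program may not behave as expected. Reordering affects programs as follows:

Single core, single thread: reordering never changes the observable result of the program.
Single core, multiple threads: compiler reordering can produce unexpected results. See "memory-ordering-at-compile-time".
Multiple cores, multiple threads: both compiler reordering and CPU reordering can produce unexpected results. See "memory-reordering-caught-in-the-act".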

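The usual way to prevent compiler reordering is a compiler barrier; the usual way to prevent CPU reordering is a hardware memory barrier (a fence instruction; a full memory barrier orders both loads and stores). On x86, the MemoryBarrier used by LevelDB is only a compiler barrier, so there it addresses compiler reordering alone.
Processors differ in the barrier instructions they provide. On platforms for which LevelDB's port layer has no hand-written MemoryBarrier, LevelDB falls back to the C++11 standard library's std::atomic<T>. Take the x86 MemoryBarrier as an example: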

inline void MemoryBarrier()
{
  // See http://gcc.gnu.org/ml/gcc/2003-04/msg01180.html for a discussion on
  // this idiom. Also see http://en.wikipedia.org/wiki/Memory_ordering.
  asm volatile(""
               :
               :
               : "memory");
}

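asm volatile tells the compiler that the asm statement has side effects, so it must not be deleted or moved as an optimization (it does not refer to any variable and has nothing to do with the CPU cache). The "memory" clobber tells the compiler that the statement may read or write arbitrary memory, so the compiler must not keep memory values cached in registers across it and must not move memory reads or writes from one side of it to the other. Because the asm string is empty, no instruction is emitted: the barrier costs nothing at run time.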
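For comparison, a minimal sketch of the same compiler barrier next to its standard C++11 counterparts. The asm form requires GCC or Clang. Describing std::atomic_signal_fence as a compiler-only barrier and std::atomic_thread_fence(seq_cst) as a hardware fence reflects how mainstream compilers implement them, not anything taken from LevelDB. Publish, g_data and g_flag are hypothetical names used only for illustration.

```cpp
#include <atomic>
#include <cassert>

// Compiler-only barrier in GCC/Clang inline-asm form, as in LevelDB's x86 port:
// no instruction is emitted, but memory accesses cannot be moved across it.
inline void CompilerBarrier() { asm volatile("" : : : "memory"); }

// Standard C++11 approximation: restricts compiler reordering only
// (GCC and Clang emit no fence instruction for it).
inline void StdCompilerBarrier() { std::atomic_signal_fence(std::memory_order_seq_cst); }

// Hardware fence: also orders the CPU (on x86 this typically becomes
// an mfence or a locked instruction).
inline void FullFence() { std::atomic_thread_fence(std::memory_order_seq_cst); }

int g_data = 0;  // hypothetical payload
int g_flag = 0;  // hypothetical "payload is ready" flag

// Writer side of a publish: the barrier stops the compiler from sinking the
// store to g_data below the store to g_flag. It only pins the emitted order;
// concurrent plain-int access would still be a data race in ISO C++.
void Publish(int value) {
  g_data = value;
  CompilerBarrier();
  g_flag = 1;
}
```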
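On x86, AtomicPointer prevents only compiler reordering and does nothing about CPU reordering. Does that mean LevelDB misbehaves on multi-core machines? It does not. The x86 memory model (TSO) never reorders a load with later loads, a store with earlier stores, or a load with a later store; the only reordering it permits is a store followed by a load from a different address. That is exactly enough for acquire/release semantics, so on x86 a compiler barrier suffices. On weakly ordered architectures such as ARM, the port layer's MemoryBarrier issues a real hardware fence instead. LevelDB also relies on an implicit guarantee: every AtomicPointer is written by a single thread (writers are serialized externally) and read by many, so atomic read-modify-write operations such as compare-and-swap are never needed.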
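Because the x86 version leans on the hardware's ordering guarantees, a portable version has to request acquire/release ordering explicitly. The sketch below is a hypothetical AtomicPointerStd built on std::atomic<void*>, similar in spirit to LevelDB's std::atomic fallback but not copied from it; unlike the original, its default constructor initializes the pointer to nullptr. ReadAfterPublish demonstrates the single-writer publish pattern with one reader thread.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical portable AtomicPointer built on std::atomic (not LevelDB's code).
class AtomicPointerStd {
 public:
  AtomicPointerStd() : rep_(nullptr) {}  // unlike the original, starts as nullptr
  explicit AtomicPointerStd(void* p) : rep_(p) {}

  // Relaxed: atomic, but imposes no ordering on surrounding accesses.
  void* NoBarrier_Load() const { return rep_.load(std::memory_order_relaxed); }
  void NoBarrier_Store(void* v) { rep_.store(v, std::memory_order_relaxed); }

  // Acquire: later memory accesses cannot be reordered before this load.
  void* Acquire_Load() const { return rep_.load(std::memory_order_acquire); }
  // Release: earlier memory accesses cannot be reordered after this store.
  void Release_Store(void* v) { rep_.store(v, std::memory_order_release); }

 private:
  std::atomic<void*> rep_;
};

struct Payload {
  int a;
  int b;
};

// One writer initializes a Payload and publishes its address; one reader
// spins until it sees the address, then reads the fields.
int ReadAfterPublish() {
  AtomicPointerStd slot;
  Payload payload{0, 0};
  int seen = 0;
  std::thread reader([&] {
    void* p;
    while ((p = slot.Acquire_Load()) == nullptr) {
      std::this_thread::yield();
    }
    const Payload* pl = static_cast<const Payload*>(p);
    seen = pl->a + pl->b;  // guaranteed to observe a == 40 and b == 2
  });
  payload.a = 40;                // initialize first...
  payload.b = 2;
  slot.Release_Store(&payload);  // ...then publish
  reader.join();
  return seen;
}
```

If Release_Store were replaced with NoBarrier_Store, the reader could see the pointer before the payload writes on a weakly ordered CPU, and ISO C++ would treat the payload reads as a data race.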

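Finally, if the AtomicPointer methods were not inline (explicitly marked non-inline so the compiler cannot inline them), the explicit barrier would be unnecessary, because a call to such a function already acts as a compiler barrier. From "memory-ordering-at-compile-time":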

In fact, the majority of function calls act as compiler barriers, whether they contain their own compiler barrier or not. This excludes inline functions, functions declared with the pure attribute, and cases where link-time code generation is used. Other than those cases, a call to an external function is even stronger than a compiler barrier, since the compiler has no idea what the function's side effects will be. It must forget any assumptions it made about memory that is potentially visible to that function.

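That does not make the author's design redundant: inline functions plus an explicit barrier perform better and do not depend on how the compiler treats function calls.
We have now answered both questions raised at the beginning. To summarize: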

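Q: NoBarrier_Store/NoBarrier_Load are plain pointer reads and writes. Are they atomic?
A: Intel processors guarantee atomicity for pointer accesses that do not cross a cache line; for accesses that do, only some processors provide an atomicity guarantee. In LevelDB no pointer crosses a cache line, so these operations are atomic.
Q: Acquire_Load/Release_Store add a MemoryBarrier call. What is it for, and how is atomicity guaranteed?
A: The barrier prevents the compiler from reordering memory accesses across it: in Release_Store, every write before the barrier is emitted before the pointer is stored, and in Acquire_Load, no later access is moved ahead of the pointer load. Atomicity itself comes from the aligned pointer access, exactly as with the NoBarrier_* methods.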

Here are a few more questions. Once you can answer them, you should have a complete picture of AtomicPointer:

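Q: Why provide two sets of operations?
A: Performance. NoBarrier_Store/NoBarrier_Load are cheaper than Acquire_Load/Release_Store because they leave the compiler free to optimize. Acquire_Load/Release_Store rule out those optimizations, which guarantees that a reader who obtains a pointer through Acquire_Load also sees all the data the writer initialized before publishing it with Release_Store.
Q: How does LevelDB decide which operation to use where?
A: By being careful at all times. Wherever a pointer is used, choose the load/store method based on the surrounding code and its concurrency. A conservative approach is to use the barrier versions everywhere, and to switch to the NoBarrier versions only where you are certain it is safe.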
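The conservative rule above can be seen in a single-writer list. The following is a hypothetical sketch (PublishList is not a LevelDB class) written with std::atomic so that it is portable: relaxed operations play the role of NoBarrier_*, and acquire/release play the role of Acquire_Load/Release_Store. The pattern resembles how LevelDB's SkipList links in a new node, but the code is illustrative only. It assumes a single writer thread and that nodes are never freed while readers may be running.

```cpp
#include <atomic>
#include <cassert>
#include <vector>

// Hypothetical single-writer, multi-reader list (not a LevelDB class).
// Assumes one writer thread and that nodes are not freed while readers run.
class PublishList {
 public:
  PublishList() = default;
  PublishList(const PublishList&) = delete;
  PublishList& operator=(const PublishList&) = delete;

  ~PublishList() {
    Node* n = head_.load(std::memory_order_relaxed);
    while (n != nullptr) {
      Node* next = n->next.load(std::memory_order_relaxed);
      delete n;
      n = next;
    }
  }

  // Writer only.
  void PushFront(int value) {
    Node* n = new Node(value);
    // Only the writer modifies head_, so it can read head_ without a barrier
    // (NoBarrier_Load). n is not reachable by readers yet, so n->next can
    // also be set without a barrier (NoBarrier_Store).
    n->next.store(head_.load(std::memory_order_relaxed), std::memory_order_relaxed);
    // Publishing n is the one step that needs a barrier (Release_Store).
    head_.store(n, std::memory_order_release);
  }

  // Safe from any number of reader threads.
  std::vector<int> Snapshot() const {
    std::vector<int> out;
    // Acquire loads (Acquire_Load): the conservative choice described above.
    for (Node* n = head_.load(std::memory_order_acquire); n != nullptr;
         n = n->next.load(std::memory_order_acquire)) {
      out.push_back(n->value);
    }
    return out;
  }

 private:
  struct Node {
    explicit Node(int v) : value(v), next(nullptr) {}
    int value;
    std::atomic<Node*> next;
  };

  std::atomic<Node*> head_{nullptr};
};
```

After PushFront(1), PushFront(2) and PushFront(3), a Snapshot() returns {3, 2, 1}: the most recently published node comes first.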

2.2 Arena

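Arena is used for memory management. Its value lies in:

Better performance. It reduces the number of heap calls: Arena obtains memory in large blocks and hands out pieces of them to the caller.
Simpler code. Callers never free what they allocate; when the Arena object is destroyed, all memory it handed out is released at once.
Easier memory accounting, e.g. the total amount of memory the Arena has allocated.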

class Arena
{
public:
  Arena();
  ~Arena();

  // Return a pointer to a newly allocated memory block of "bytes" bytes.
  char *Allocate(size_t bytes);

  // Allocate memory with the normal alignment guarantees provided by malloc
  char *AllocateAligned(size_t bytes);

  // Returns an estimate of the total memory usage of data allocated
  // by the arena.
  size_t MemoryUsage() const
  {
    return reinterpret_cast<uintptr_t>(memory_usage_.NoBarrier_Load());
  }

private:
  char *AllocateFallback(size_t bytes);
  char *AllocateNewBlock(size_t block_bytes);

  // Allocation state
  char *alloc_ptr_;              // next free byte in the current block
  size_t alloc_bytes_remaining_; // bytes still free in the current block

  // Array of new[] allocated memory blocks
  std::vector<char *> blocks_; // every block allocated so far

  // Total memory usage of the arena.
  port::AtomicPointer memory_usage_; // total memory allocated so far

  // No copying allowed
  Arena(const Arena &);
  void operator=(const Arena &);
};

Arena is a memory manager tailored to LevelDB. It is not thread-safe, and its users are MemTable and SkipList. A few of its techniques are worth learning.

2.2.1 Unaligned Allocation

Definition:

inline char *Arena::Allocate(size_t bytes)
{
  // The semantics of what to return are a bit messy if we allow
  // 0-byte allocations, so we disallow them here (we don't need
  // them for our internal use).
  assert(bytes > 0);
  // Serve the request from the current block if it has room
  if (bytes <= alloc_bytes_remaining_)
  {
    char *result = alloc_ptr_;
    alloc_ptr_ += bytes;
    alloc_bytes_remaining_ -= bytes;
    return result;
  }

  // Only when the current block lacks room, fall back to allocating fresh memory
  return AllocateFallback(bytes);
}

Sole caller:

void MemTable::Add(SequenceNumber s, ValueType type,
                   const Slice& key,
                   const Slice& value) {
  // Format of an entry is concatenation of:
  //  key_size     : varint32 of internal_key.size()
  //  key bytes    : char[internal_key.size()]
  //  value_size   : varint32 of value.size()
  //  value bytes  : char[value.size()]
  size_t key_size = key.size();
  size_t val_size = value.size();
  size_t internal_key_size = key_size + 8;
  const size_t encoded_len =
      VarintLength(internal_key_size) + internal_key_size +
      VarintLength(val_size) + val_size;
  
  // Allocate the buffer for the encoded entry
  char* buf = arena_.Allocate(encoded_len);
  char* p = EncodeVarint32(buf, internal_key_size);
  memcpy(p, key.data(), key_size);
  p += key_size;
  EncodeFixed64(p, (s << 8) | type);
  p += 8;
  p = EncodeVarint32(p, val_size);
  memcpy(p, value.data(), val_size);
  assert((p + val_size) - buf == encoded_len);
  table_.Insert(buf);
}
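The entry layout written by MemTable::Add can be reproduced in a self-contained sketch. EncodeEntry, AppendVarint32, AppendFixed64 and VarintLength here are hypothetical stand-ins that append to a std::string instead of writing into an Arena buffer. They assume LevelDB's usual encodings (varints with 7 payload bits per byte; fixed-width integers in little-endian order), and type 1 corresponds to LevelDB's kTypeValue.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: number of bytes a varint encoding of v occupies.
int VarintLength(std::uint64_t v) {
  int len = 1;
  while (v >= 128) {
    v >>= 7;
    ++len;
  }
  return len;
}

// Varint32: 7 payload bits per byte, high bit set on every byte but the last.
void AppendVarint32(std::string* dst, std::uint32_t v) {
  while (v >= 128) {
    dst->push_back(static_cast<char>((v & 0x7f) | 0x80));
    v >>= 7;
  }
  dst->push_back(static_cast<char>(v));
}

// Fixed64: 8 bytes, least significant byte first.
void AppendFixed64(std::string* dst, std::uint64_t v) {
  for (int i = 0; i < 8; ++i) {
    dst->push_back(static_cast<char>((v >> (8 * i)) & 0xff));
  }
}

// Same field order as MemTable::Add above.
std::string EncodeEntry(std::uint64_t seq, std::uint8_t type,
                        const std::string& key, const std::string& value) {
  const std::uint32_t internal_key_size =
      static_cast<std::uint32_t>(key.size() + 8);  // user key + 8-byte tag
  std::string buf;
  AppendVarint32(&buf, internal_key_size);
  buf.append(key);
  AppendFixed64(&buf, (seq << 8) | type);  // sequence in high 56 bits, type in low 8
  AppendVarint32(&buf, static_cast<std::uint32_t>(value.size()));
  buf.append(value);
  return buf;
}
```

For key "k" and value "v", internal_key_size is 1 + 8 = 9, so encoded_len = 1 + 9 + 1 + 1 = 12 bytes, which is the size Allocate would be asked for. The value 300 needs a two-byte varint, 0xAC 0x02.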

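Allocate exists for several reasons:

Allocating data areas. Its sole caller, MemTable, stores encoded data records rather than structured objects, so no alignment is needed.
Performance. Allocate is inline, and memory is carved out of blocks pre-allocated 4 KB at a time.
Compared with AllocateAligned, it wastes no padding bytes, making fuller use of memory and reducing the number of actual requests for new memory.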
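The fallback path that Allocate calls is not shown in this article. The sketch below fills it in under stated assumptions: a 4 KB block size, and a rule that a request larger than a quarter of a block gets a dedicated block, while a smaller one starts a fresh 4 KB block and abandons the old block's tail. This matches how LevelDB's arena.cc is commonly described, but MiniArena is a hypothetical simplification: it is single-threaded and tracks memory usage in a plain size_t instead of an AtomicPointer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical simplified arena mirroring the Allocate fast path above.
class MiniArena {
 public:
  static constexpr std::size_t kBlockSize = 4096;  // assumed block size

  MiniArena() = default;
  MiniArena(const MiniArena&) = delete;
  MiniArena& operator=(const MiniArena&) = delete;
  ~MiniArena() {
    for (char* block : blocks_) delete[] block;  // everything freed at once
  }

  char* Allocate(std::size_t bytes) {
    assert(bytes > 0);
    if (bytes <= alloc_bytes_remaining_) {  // fast path: bump the pointer
      char* result = alloc_ptr_;
      alloc_ptr_ += bytes;
      alloc_bytes_remaining_ -= bytes;
      return result;
    }
    return AllocateFallback(bytes);
  }

  std::size_t MemoryUsage() const { return memory_usage_; }
  std::size_t BlockCount() const { return blocks_.size(); }

 private:
  char* AllocateFallback(std::size_t bytes) {
    if (bytes > kBlockSize / 4) {
      // Large request: dedicated block; the current block stays in use.
      return AllocateNewBlock(bytes);
    }
    // Small request: abandon the current block's tail and start a new block.
    alloc_ptr_ = AllocateNewBlock(kBlockSize);
    alloc_bytes_remaining_ = kBlockSize;
    char* result = alloc_ptr_;
    alloc_ptr_ += bytes;
    alloc_bytes_remaining_ -= bytes;
    return result;
  }

  char* AllocateNewBlock(std::size_t block_bytes) {
    char* result = new char[block_bytes];
    blocks_.push_back(result);
    memory_usage_ += block_bytes + sizeof(char*);  // block plus its slot in blocks_
    return result;
  }

  char* alloc_ptr_ = nullptr;
  std::size_t alloc_bytes_remaining_ = 0;
  std::vector<char*> blocks_;
  std::size_t memory_usage_ = 0;
};
```

Two small allocations come out of the same block back to back; a 5000-byte request gets its own block, and the next small allocation continues in the first block where it left off.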

2.2.2 Aligned Allocation

char *Arena::AllocateAligned(size_t bytes)
{
  // Align to at least 8 bytes (or to the pointer size, if larger)
  const int align = (sizeof(void *) > 8) ? sizeof(void *) : 8;
  assert((align & (align - 1)) == 0); // Pointer size should be a power of 2
  size_t current_mod = reinterpret_cast<uintptr_t>(alloc_ptr_) & (align - 1);
  size_t slop = (current_mod == 0 ? 0 : align - current_mod);
  size_t needed = bytes + slop;
  char *result;
  if (needed <= alloc_bytes_remaining_)
  {
    result = alloc_ptr_ + slop;
    alloc_ptr_ += needed;
    alloc_bytes_remaining_ -= needed;
  }
  else
  {
    // AllocateFallback always returned aligned memory
    result = AllocateFallback(bytes);
  }
  assert((reinterpret_cast<uintptr_t>(result) & (align - 1)) == 0);
  return result;
}

Sole caller:

template<typename Key, class Comparator>
typename SkipList<Key,Comparator>::Node*
SkipList<Key,Comparator>::NewNode(const Key& key, int height) {
  char* mem = arena_->AllocateAligned(
      sizeof(Node) + sizeof(port::AtomicPointer) * (height - 1));
  return new (mem) Node(key);
}

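Summary:

AllocateAligned allocates memory for structured objects. Allocate is not used here because a misaligned node could leave its pointer fields straddling a cache line, making pointer operations non-atomic (each SkipList Node holds port::AtomicPointer fields, as the size computation above shows).
Performance. Like Allocate, it carves memory out of pre-allocated 4 KB blocks, so most calls never reach the heap.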
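The padding (slop) arithmetic in AllocateAligned relies on align being a power of two, so that addr & (align - 1) equals addr % align. Here is the same arithmetic as a hypothetical standalone helper, AlignmentSlop, together with the alignment choice from the code above:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Same alignment choice as AllocateAligned: pointer size, but at least 8 bytes.
constexpr std::size_t kAlign = sizeof(void*) > 8 ? sizeof(void*) : 8;

// Hypothetical helper: bytes to skip so that addr + slop is a multiple of align.
// align must be a power of two, so (addr & (align - 1)) == addr % align.
std::size_t AlignmentSlop(std::uintptr_t addr, std::size_t align) {
  assert(align != 0 && (align & (align - 1)) == 0);
  const std::size_t current_mod = addr & (align - 1);
  return current_mod == 0 ? 0 : align - current_mod;
}
```

If alloc_ptr_ currently sits at an address ending in 0x...3, AllocateAligned with 8-byte alignment skips 5 bytes and charges bytes + 5 against the current block; those 5 bytes are the cost of alignment that Allocate avoids.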

2.3 Slice

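As its name suggests, a Slice refers to a block of data: data_ is its address and size_ its length. In LevelDB it is generally used to pass keys, values, or encoded/decoded blocks of data.
Slice is often used together with Arena. It only records where the data is and does not own it; data stored in an Arena stays valid for the whole lifetime of the Arena object, and in general the caller must keep the underlying bytes alive while the Slice is in use.
Compared with std::string, the main advantage of Slice is that it avoids unnecessary copies. Like std::string, and unlike a NUL-terminated C string, it can hold arbitrary bytes, including '\0'.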

class Slice {
 public:
  ......
 private:
  const char* data_;
  size_t size_;
};
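Since the body of Slice is elided above, here is a hypothetical MiniSlice that shows the idea: a non-owning (pointer, length) pair with a few of the operations such a class typically offers. It is a sketch, not LevelDB's Slice; in C++17, std::string_view provides the same kind of non-owning view.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical minimal Slice: a non-owning view; the caller keeps the bytes alive.
class MiniSlice {
 public:
  MiniSlice() : data_(""), size_(0) {}
  MiniSlice(const char* d, std::size_t n) : data_(d), size_(n) {}
  MiniSlice(const std::string& s) : data_(s.data()), size_(s.size()) {}  // no copy

  const char* data() const { return data_; }
  std::size_t size() const { return size_; }
  char operator[](std::size_t i) const {
    assert(i < size_);
    return data_[i];
  }

  // Drop the first n bytes by moving the pointer; the bytes themselves are untouched.
  void remove_prefix(std::size_t n) {
    assert(n <= size_);
    data_ += n;
    size_ -= n;
  }

  // Byte-wise comparison; on a common prefix the shorter slice sorts first.
  int compare(const MiniSlice& b) const {
    const std::size_t min_len = size_ < b.size_ ? size_ : b.size_;
    int r = std::memcmp(data_, b.data_, min_len);
    if (r == 0) {
      if (size_ < b.size_) r = -1;
      else if (size_ > b.size_) r = +1;
    }
    return r;
  }

  // Explicit copy, for when the caller needs to own the bytes.
  std::string ToString() const { return std::string(data_, size_); }

 private:
  const char* data_;
  std::size_t size_;
};
```

Constructing a MiniSlice from a std::string copies nothing: data() points straight into the string, and remove_prefix only moves that pointer. The flip side is the lifetime rule from the text: if the string were destroyed or reallocated, the slice would dangle.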

2.4 Summary

AtomicPointer is a general-purpose utility class that trades some readability (unavoidably) for high performance.

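Arena and Slice are data structures tailored to LevelDB. Arena greatly reduces how often memory is actually allocated, at the cost of lower memory utilization (the unused tail of each block is wasted). Slice is used to pass data between the various stages and avoids unnecessary copying.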

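One more aside: DPDK (Data Plane Development Kit) is another open-source framework with extreme performance requirements, although its purpose is entirely different from LevelDB's. DPDK mainly handles network packet forwarding and is used in NFV scenarios. For memory it relies on huge pages, physically contiguous memory and similar techniques to boost performance, but this requires it to have a (virtual) machine to itself. If future versions of LevelDB want to push performance further, this could be a direction worth exploring.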

References:
https://software.intel.com/sites/default/files/managed/39/c5/325462-sdm-vol-1-2abcd-3abcd.pdf
http://www.voidcn.com/blog/chj90220/article/p-6069844.html
http://www.pandademo.com/2016/03/atomicpointer-leveldb-source-dissect-2/
Other related material:
an-introduction-to-lock-free-programming
memory-ordering-at-compile-time
acquire-and-release-fences
memory-barriers-are-like-source-control-operations
memory-reordering-caught-in-the-act
acquire-and-release-semantics

When reposting, please credit: 【随安居士】 http://www.jianshu.com/p/3161784e7573

Last edited: 2017.12.08 03:03:36
