Analysis of ART Object Memory Allocation: The Preparation Phase (Android 8.1)

Note: this article is based on Android 8.1.


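This chapter analyzes how the ART virtual machine in Android 8.1 allocates memory when an object is created. This section covers the groundwork for allocation: how the allocation entry points are set up and how calls are dispatched through them.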

We start with the Thread class.

The Thread class

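Thread::Init() performs all of the thread's own initialization, for example initializing CPU information (InitCpu()) and, through the member function InitTlsEntryPoints(), setting up the thread's entry point tables: tables of function pointers through which compiled code calls into runtime routines, including the allocation routines analyzed here. Older versions of ART split these into four tables: interpreter_entrypoints_ for the interpreter, jni_entrypoints_ for JNI calls, portable_entrypoints_ for native code generated by the Portable backend, and quick_entrypoints_ for native code generated by the Quick backend. The Portable backend has since been removed, and in the Android 8.1 code below the thread-local structure tlsPtr_ holds only two tables, tlsPtr_.jni_entrypoints and tlsPtr_.quick_entrypoints. Compiled code reaches each entry point through its fixed offset within the Thread object.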
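To make the "fixed offset within the Thread object" idea concrete, here is a minimal sketch under simplifying assumptions. MiniThread, MiniQuickEntryPoints, DemoAlloc and CallAllocViaOffset are hypothetical names and the layout is not ART's; the point is only that a slot's offset is a compile-time constant, so generated code can load the function pointer from [thread + offset] and call it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical, heavily simplified model; NOT ART's real Thread layout.
using AllocFn = void* (*)(std::size_t);

struct MiniQuickEntryPoints {
  AllocFn pAllocObjectResolved;  // one slot of the "quick" table
};

struct MiniThread {
  std::uint32_t state_and_flags;          // other per-thread data precedes the table
  MiniQuickEntryPoints quick_entrypoints;
};

// The slot offset is known at compile time, so a code generator can embed it.
constexpr std::size_t kAllocObjectResolvedOffset =
    offsetof(MiniThread, quick_entrypoints) +
    offsetof(MiniQuickEntryPoints, pAllocObjectResolved);

// Stand-in allocator used only for demonstration.
inline void* DemoAlloc(std::size_t size) {
  static unsigned char arena[64];
  return size <= sizeof(arena) ? arena : nullptr;
}

// Emulates compiled code: load the slot at [thread + offset], then call through it.
inline void* CallAllocViaOffset(MiniThread* self, std::size_t size) {
  AllocFn fn;
  std::memcpy(&fn,
              reinterpret_cast<const unsigned char*>(self) + kAllocObjectResolvedOffset,
              sizeof(fn));
  assert(fn != nullptr);
  return fn(size);
}
```

In ART, compiled code keeps the current Thread* readily available (on ARM64 in a reserved register), so the load is a single instruction; the sketch emulates that load with std::memcpy.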

Thread::Init():

bool Thread::Init(ThreadList* thread_list, JavaVMExt* java_vm, JNIEnvExt* jni_env_ext) {
  // This function does all the initialization that must be run by the native thread it applies to.
  // (When we create a new thread from managed code, we allocate the Thread* in Thread::Create so
  // we can handshake with the corresponding native thread when it's ready.) Check this native
  // thread hasn't been through here already...
  CHECK(Thread::Current() == nullptr);

  // Set pthread_self_ ahead of pthread_setspecific, that makes Thread::Current function, this
  // avoids pthread_self_ ever being invalid when discovered from Thread::Current().
  tlsPtr_.pthread_self = pthread_self();
  CHECK(is_started_);

  SetUpAlternateSignalStack();
  if (!InitStackHwm()) {
    return false;
  }
  InitCpu();
  InitTlsEntryPoints();
  RemoveSuspendTrigger();
  InitCardTable();
  InitTid();
  interpreter::InitInterpreterTls(this);
  ……
  thread_list->Register(this);
  return true;
}

Thread::InitTlsEntryPoints():

void Thread::InitTlsEntryPoints() {
  // Insert a placeholder so we can easily tell if we call an unimplemented entry point.
  uintptr_t* begin = reinterpret_cast<uintptr_t*>(&tlsPtr_.jni_entrypoints);
  uintptr_t* end = reinterpret_cast<uintptr_t*>(
      reinterpret_cast<uint8_t*>(&tlsPtr_.quick_entrypoints) + sizeof(tlsPtr_.quick_entrypoints));
  for (uintptr_t* it = begin; it != end; ++it) {
    *it = reinterpret_cast<uintptr_t>(UnimplementedEntryPoint);
  }
  InitEntryPoints(&tlsPtr_.jni_entrypoints, &tlsPtr_.quick_entrypoints);
}
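The two-step initialization in InitTlsEntryPoints() (fill every slot with a placeholder, then let InitEntryPoints() overwrite the slots it knows about) can be sketched as follows. MiniTls, the stub functions and the table sizes are hypothetical. The real code walks the raw memory of the two structs as uintptr_t slots and its placeholder aborts; this sketch uses plain arrays of function pointers and a placeholder that returns a sentinel, so the effect is observable.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical simplified model of InitTlsEntryPoints(); not ART's real types.
using EntryFn = int (*)();

constexpr int kUnimplemented = -1;
int UnimplementedEntryPoint() { return kUnimplemented; }  // placeholder slot value
int DlsymLookupStub() { return 1; }                         // stand-in for a JNI stub
int AllocObjectStub() { return 2; }                         // stand-in for an alloc stub

struct MiniTls {
  EntryFn jni_entrypoints[2];    // e.g. pDlsymLookup plus one unused slot
  EntryFn quick_entrypoints[3];  // e.g. pAllocObjectResolved plus two unused slots
};

inline void MiniInitTlsEntryPoints(MiniTls* tls) {
  // Step 1: every slot gets the placeholder, so a never-set entry is detectable.
  for (EntryFn& slot : tls->jni_entrypoints) slot = UnimplementedEntryPoint;
  for (EntryFn& slot : tls->quick_entrypoints) slot = UnimplementedEntryPoint;
  // Step 2: the "architecture-specific" initializer overwrites the slots it knows.
  tls->jni_entrypoints[0] = DlsymLookupStub;    // like jpoints->pDlsymLookup = ...
  tls->quick_entrypoints[0] = AllocObjectStub;  // like ResetQuickAllocEntryPoints(...)
}
```

Any slot that InitEntryPoints() forgets to set still points at the placeholder, which is what the comment in the ART source means by "easily tell if we call an unimplemented entry point".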

The entrypoints directory

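InitTlsEntryPoints() first fills every slot from the start of tlsPtr_.jni_entrypoints to the end of tlsPtr_.quick_entrypoints with the UnimplementedEntryPoint placeholder, so that a call to an entry point that was never set is easy to spot, and then calls InitEntryPoints() with the addresses of the two tables. InitEntryPoints() is implemented separately for each CPU architecture; here is the ARM64 version (/art/runtime/arch/arm64/entrypoints_init_arm64.cc):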


void InitEntryPoints(JniEntryPoints* jpoints, QuickEntryPoints* qpoints) {
  DefaultInitEntryPoints(jpoints, qpoints);
  ……
}

It calls DefaultInitEntryPoints() (/art/runtime/entrypoints/quick/quick_default_init_entrypoints.h):

static void DefaultInitEntryPoints(JniEntryPoints* jpoints, QuickEntryPoints* qpoints) {
  // JNI
  jpoints->pDlsymLookup = art_jni_dlsym_lookup_stub;

  // Alloc
  ResetQuickAllocEntryPoints(qpoints, /* is_marking */ true);
  ……
}

We focus only on the Alloc part, which calls ResetQuickAllocEntryPoints().

Location: /art/runtime/entrypoints/quick/quick_alloc_entrypoints.cc

static bool entry_points_instrumented = false;
static gc::AllocatorType entry_points_allocator = gc::kAllocatorTypeDlMalloc;

void SetQuickAllocEntryPointsAllocator(gc::AllocatorType allocator) {
  entry_points_allocator = allocator;
}

void ResetQuickAllocEntryPoints(QuickEntryPoints* qpoints, bool is_marking) {
#if !defined(__APPLE__) || !defined(__LP64__)
  switch (entry_points_allocator) {
    case gc::kAllocatorTypeDlMalloc: {
      SetQuickAllocEntryPoints_dlmalloc(qpoints, entry_points_instrumented);
      return;
    }
    case gc::kAllocatorTypeRosAlloc: {
      SetQuickAllocEntryPoints_rosalloc(qpoints, entry_points_instrumented);
      return;
    }
    case gc::kAllocatorTypeBumpPointer: {
      CHECK(kMovingCollector);
      SetQuickAllocEntryPoints_bump_pointer(qpoints, entry_points_instrumented);
      return;
    }
    case gc::kAllocatorTypeTLAB: {
      CHECK(kMovingCollector);
      SetQuickAllocEntryPoints_tlab(qpoints, entry_points_instrumented);
      return;
    }
    case gc::kAllocatorTypeRegion: {
      CHECK(kMovingCollector);
      SetQuickAllocEntryPoints_region(qpoints, entry_points_instrumented);
      return;
    }
    case gc::kAllocatorTypeRegionTLAB: {
      CHECK(kMovingCollector);
      if (is_marking) {
        SetQuickAllocEntryPoints_region_tlab(qpoints, entry_points_instrumented);
      } else {
        // Not marking means we need no read barriers and can just use the normal TLAB case.
        SetQuickAllocEntryPoints_tlab(qpoints, entry_points_instrumented);
      }
      return;
    }
    default:
      break;
  }
#else
  UNUSED(qpoints);
  UNUSED(is_marking);
#endif
  UNIMPLEMENTED(FATAL);
  UNREACHABLE();
}
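  • entry_points_allocator records the allocator type. Its initial value, kAllocatorTypeDlMalloc, selects the DlMalloc entry points, and it can be changed by calling SetQuickAllocEntryPointsAllocator(). With the CMS collector it is typically kAllocatorTypeRosAlloc; Android 8.x defaults to the concurrent copying (CC) collector, which normally switches it to kAllocatorTypeRegionTLAB.

  • SetQuickAllocEntryPointsAllocator() is called from Heap::ChangeAllocator() when the allocator changes, and ChangeAllocator() is in turn called from Heap::ChangeCollector() when the garbage collector type changes. Once the new value is recorded, the allocation entry points are reset so that they use the new allocator.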
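The allocator-to-entry-point selection described above can be modeled roughly as follows. This is a sketch, not ART code: AllocatorType, MiniEntryPoints, MiniResetQuickAllocEntryPoints and MiniChangeAllocator are hypothetical, and a string suffix stands in for the real stub function pointers.

```cpp
#include <cassert>
#include <string>

// Hypothetical simplified model of ResetQuickAllocEntryPoints(); illustrative only.
enum class AllocatorType { kDlMalloc, kRosAlloc, kBumpPointer, kTLAB, kRegion, kRegionTLAB };

static AllocatorType entry_points_allocator = AllocatorType::kDlMalloc;  // same initial value
static bool entry_points_instrumented = false;

struct MiniEntryPoints {
  std::string alloc_suffix;  // which SetQuickAllocEntryPoints<suffix> would be used
  bool instrumented = false;
};

void MiniResetQuickAllocEntryPoints(MiniEntryPoints* q, bool is_marking) {
  switch (entry_points_allocator) {
    case AllocatorType::kDlMalloc:    q->alloc_suffix = "_dlmalloc"; break;
    case AllocatorType::kRosAlloc:    q->alloc_suffix = "_rosalloc"; break;
    case AllocatorType::kBumpPointer: q->alloc_suffix = "_bump_pointer"; break;
    case AllocatorType::kTLAB:        q->alloc_suffix = "_tlab"; break;
    case AllocatorType::kRegion:      q->alloc_suffix = "_region"; break;
    case AllocatorType::kRegionTLAB:
      // Not marking: no read barriers needed, so the plain TLAB stubs suffice.
      q->alloc_suffix = is_marking ? "_region_tlab" : "_tlab";
      break;
  }
  q->instrumented = entry_points_instrumented;
}

// Rough analogue of Heap::ChangeAllocator(): record the allocator, then rebuild
// the entry points (ART does this for every thread, not just one table).
void MiniChangeAllocator(AllocatorType allocator, MiniEntryPoints* q, bool is_marking) {
  entry_points_allocator = allocator;  // like SetQuickAllocEntryPointsAllocator()
  MiniResetQuickAllocEntryPoints(q, is_marking);
}
```

Making the choice once, when the table is filled in, means the per-allocation path never has to branch on the allocator type.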

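The switch above calls functions named SetQuickAllocEntryPoints plus an allocator suffix, such as SetQuickAllocEntryPoints_rosalloc. Where are they defined? They are generated by the GENERATE_ENTRYPOINTS macro, one expansion per suffix: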

/art/runtime/entrypoints/quick/quick_alloc_entrypoints.cc:

#define GENERATE_ENTRYPOINTS(suffix) \
extern "C" void* art_quick_alloc_array_resolved##suffix(mirror::Class* klass, int32_t); \
extern "C" void* art_quick_alloc_array_resolved8##suffix(mirror::Class* klass, int32_t); \
extern "C" void* art_quick_alloc_array_resolved16##suffix(mirror::Class* klass, int32_t); \
extern "C" void* art_quick_alloc_array_resolved32##suffix(mirror::Class* klass, int32_t); \
extern "C" void* art_quick_alloc_array_resolved64##suffix(mirror::Class* klass, int32_t); \
extern "C" void* art_quick_alloc_object_resolved##suffix(mirror::Class* klass); \
extern "C" void* art_quick_alloc_object_initialized##suffix(mirror::Class* klass); \
extern "C" void* art_quick_alloc_object_with_checks##suffix(mirror::Class* klass); \
extern "C" void* art_quick_alloc_string_from_bytes##suffix(void*, int32_t, int32_t, int32_t); \
extern "C" void* art_quick_alloc_string_from_chars##suffix(int32_t, int32_t, void*); \
extern "C" void* art_quick_alloc_string_from_string##suffix(void*); \
extern "C" void* art_quick_alloc_array_resolved##suffix##_instrumented(mirror::Class* klass, int32_t); \
extern "C" void* art_quick_alloc_array_resolved8##suffix##_instrumented(mirror::Class* klass, int32_t); \
extern "C" void* art_quick_alloc_array_resolved16##suffix##_instrumented(mirror::Class* klass, int32_t); \
extern "C" void* art_quick_alloc_array_resolved32##suffix##_instrumented(mirror::Class* klass, int32_t); \
extern "C" void* art_quick_alloc_array_resolved64##suffix##_instrumented(mirror::Class* klass, int32_t); \
extern "C" void* art_quick_alloc_object_resolved##suffix##_instrumented(mirror::Class* klass); \
extern "C" void* art_quick_alloc_object_initialized##suffix##_instrumented(mirror::Class* klass); \
extern "C" void* art_quick_alloc_object_with_checks##suffix##_instrumented(mirror::Class* klass); \
extern "C" void* art_quick_alloc_string_from_bytes##suffix##_instrumented(void*, int32_t, int32_t, int32_t); \
extern "C" void* art_quick_alloc_string_from_chars##suffix##_instrumented(int32_t, int32_t, void*); \
extern "C" void* art_quick_alloc_string_from_string##suffix##_instrumented(void*); \
void SetQuickAllocEntryPoints##suffix(QuickEntryPoints* qpoints, bool instrumented) { \
  if (instrumented) { \
    qpoints->pAllocArrayResolved = art_quick_alloc_array_resolved##suffix##_instrumented; \
    qpoints->pAllocArrayResolved8 = art_quick_alloc_array_resolved8##suffix##_instrumented; \
    qpoints->pAllocArrayResolved16 = art_quick_alloc_array_resolved16##suffix##_instrumented; \
    qpoints->pAllocArrayResolved32 = art_quick_alloc_array_resolved32##suffix##_instrumented; \
    qpoints->pAllocArrayResolved64 = art_quick_alloc_array_resolved64##suffix##_instrumented; \
    qpoints->pAllocObjectResolved = art_quick_alloc_object_resolved##suffix##_instrumented; \
    qpoints->pAllocObjectInitialized = art_quick_alloc_object_initialized##suffix##_instrumented; \
    qpoints->pAllocObjectWithChecks = art_quick_alloc_object_with_checks##suffix##_instrumented; \
    qpoints->pAllocStringFromBytes = art_quick_alloc_string_from_bytes##suffix##_instrumented; \
    qpoints->pAllocStringFromChars = art_quick_alloc_string_from_chars##suffix##_instrumented; \
    qpoints->pAllocStringFromString = art_quick_alloc_string_from_string##suffix##_instrumented; \
  } else { \
    qpoints->pAllocArrayResolved = art_quick_alloc_array_resolved##suffix; \
    qpoints->pAllocArrayResolved8 = art_quick_alloc_array_resolved8##suffix; \
    qpoints->pAllocArrayResolved16 = art_quick_alloc_array_resolved16##suffix; \
    qpoints->pAllocArrayResolved32 = art_quick_alloc_array_resolved32##suffix; \
    qpoints->pAllocArrayResolved64 = art_quick_alloc_array_resolved64##suffix; \
    qpoints->pAllocObjectResolved = art_quick_alloc_object_resolved##suffix; \
    qpoints->pAllocObjectInitialized = art_quick_alloc_object_initialized##suffix; \
    qpoints->pAllocObjectWithChecks = art_quick_alloc_object_with_checks##suffix; \
    qpoints->pAllocStringFromBytes = art_quick_alloc_string_from_bytes##suffix; \
    qpoints->pAllocStringFromChars = art_quick_alloc_string_from_chars##suffix; \
    qpoints->pAllocStringFromString = art_quick_alloc_string_from_string##suffix; \
  } \
}
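The token-pasting pattern of GENERATE_ENTRYPOINTS can be seen in miniature below. MINI_GENERATE_ENTRYPOINTS, MiniAllocTable and the generated mini_* functions are hypothetical; each expansion defines a plain stub, an _instrumented stub, and a setter that picks one of them, just as the real macro does for all eleven allocation entry points.

```cpp
#include <cassert>
#include <string>

// Hypothetical miniature of GENERATE_ENTRYPOINTS(suffix); names are illustrative.
using StubFn = std::string (*)();

struct MiniAllocTable {
  StubFn pAllocObjectResolved = nullptr;
};

// ## pastes the suffix onto identifiers; #suffix turns it into a string literal.
#define MINI_GENERATE_ENTRYPOINTS(suffix)                                             \
  std::string mini_alloc_object_resolved##suffix() {                                  \
    return "mini_alloc_object_resolved" #suffix;                                      \
  }                                                                                   \
  std::string mini_alloc_object_resolved##suffix##_instrumented() {                   \
    return "mini_alloc_object_resolved" #suffix "_instrumented";                      \
  }                                                                                   \
  void MiniSetQuickAllocEntryPoints##suffix(MiniAllocTable* q, bool instrumented) {  \
    q->pAllocObjectResolved = instrumented                                            \
        ? mini_alloc_object_resolved##suffix##_instrumented                           \
        : mini_alloc_object_resolved##suffix;                                         \
  }

// One expansion per allocator suffix, mirroring the real file's pattern.
MINI_GENERATE_ENTRYPOINTS(_rosalloc)
MINI_GENERATE_ENTRYPOINTS(_tlab)
```

For example, MiniSetQuickAllocEntryPoints_rosalloc(&table, true) makes table.pAllocObjectResolved point at mini_alloc_object_resolved_rosalloc_instrumented.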

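Take pAllocObjectResolved as an example. With the RosAlloc allocator it points to the assembly stub art_quick_alloc_object_resolved_rosalloc, which, when it cannot complete the allocation on its own, uses a bl instruction to call the C++ function artAllocObjectFromCodeResolvedRosAlloc. In Android 8.1 these stubs no longer take a type_idx and the calling method (passed in r0 and r1 on 32-bit ARM in older versions): as the declarations above show, they receive the already-resolved mirror::Class* klass, and the stub supplies the current Thread* as the extra self argument when calling into C++.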

C++ functions such as artAllocObjectFromCodeResolvedRosAlloc are defined by the following macros (/art/runtime/entrypoints/quick/quick_alloc_entrypoints.cc):

#define GENERATE_ENTRYPOINTS_FOR_ALLOCATOR_INST(suffix, suffix2, instrumented_bool, allocator_type) \
extern "C" mirror::Object* artAllocObjectFromCodeWithChecks##suffix##suffix2( \
    mirror::Class* klass, Thread* self) \
    REQUIRES_SHARED(Locks::mutator_lock_) { \
  return artAllocObjectFromCode<false, true, instrumented_bool, allocator_type>(klass, self); \
} \
extern "C" mirror::Object* artAllocObjectFromCodeResolved##suffix##suffix2( \
    mirror::Class* klass, Thread* self) \
    REQUIRES_SHARED(Locks::mutator_lock_) { \
  return artAllocObjectFromCode<false, false, instrumented_bool, allocator_type>(klass, self); \
} \
extern "C" mirror::Object* artAllocObjectFromCodeInitialized##suffix##suffix2( \
    mirror::Class* klass, Thread* self) \
    REQUIRES_SHARED(Locks::mutator_lock_) { \
  return artAllocObjectFromCode<true, false, instrumented_bool, allocator_type>(klass, self); \
} \
extern "C" mirror::Array* artAllocArrayFromCodeResolved##suffix##suffix2( \
    mirror::Class* klass, int32_t component_count, Thread* self) \
    REQUIRES_SHARED(Locks::mutator_lock_) { \
  ScopedQuickEntrypointChecks sqec(self); \
  return AllocArrayFromCodeResolved<instrumented_bool>(klass, component_count, self, \
                                                       allocator_type); \
} \
extern "C" mirror::String* artAllocStringFromBytesFromCode##suffix##suffix2( \
    mirror::ByteArray* byte_array, int32_t high, int32_t offset, int32_t byte_count, \
    Thread* self) \
    REQUIRES_SHARED(Locks::mutator_lock_) { \
  ScopedQuickEntrypointChecks sqec(self); \
  StackHandleScope<1> hs(self); \
  Handle<mirror::ByteArray> handle_array(hs.NewHandle(byte_array)); \
  return mirror::String::AllocFromByteArray<instrumented_bool>(self, byte_count, handle_array, \
                                                               offset, high, allocator_type); \
} \
extern "C" mirror::String* artAllocStringFromCharsFromCode##suffix##suffix2( \
    int32_t offset, int32_t char_count, mirror::CharArray* char_array, Thread* self) \
    REQUIRES_SHARED(Locks::mutator_lock_) { \
  StackHandleScope<1> hs(self); \
  Handle<mirror::CharArray> handle_array(hs.NewHandle(char_array)); \
  return mirror::String::AllocFromCharArray<instrumented_bool>(self, char_count, handle_array, \
                                                               offset, allocator_type); \
} \
extern "C" mirror::String* artAllocStringFromStringFromCode##suffix##suffix2( /* NOLINT */ \
    mirror::String* string, Thread* self) \
    REQUIRES_SHARED(Locks::mutator_lock_) { \
  StackHandleScope<1> hs(self); \
  Handle<mirror::String> handle_string(hs.NewHandle(string)); \
  return mirror::String::AllocFromString<instrumented_bool>(self, handle_string->GetLength(), \
                                                            handle_string, 0, allocator_type); \
}

#define GENERATE_ENTRYPOINTS_FOR_ALLOCATOR(suffix, allocator_type) \
    GENERATE_ENTRYPOINTS_FOR_ALLOCATOR_INST(suffix, Instrumented, true, allocator_type) \
    GENERATE_ENTRYPOINTS_FOR_ALLOCATOR_INST(suffix, , false, allocator_type)
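The three object entry points differ only in the template arguments they pass to artAllocObjectFromCode. The sketch below, with hypothetical names, reproduces that mapping together with the non-fast-path branch at the end of artAllocObjectFromCode() (shown next), returning the name of the helper that would be called.

```cpp
#include <cassert>
#include <string>

// Hypothetical model of the branch at the end of artAllocObjectFromCode().
template <bool kInitialized, bool kFinalize>
std::string MiniSlowPathHelper() {
  if (kInitialized) {
    return "AllocObjectFromCodeInitialized";  // class known to be initialized
  } else if (!kFinalize) {
    return "AllocObjectFromCodeResolved";     // no finalizer check needed
  } else {
    return "AllocObjectFromCode";             // must check initialization and finalizers
  }
}

// Template arguments as passed by the macro:
// WithChecks = <false, true>, Resolved = <false, false>, Initialized = <true, false>.
std::string MiniWithChecks()  { return MiniSlowPathHelper<false, true>(); }
std::string MiniResolved()    { return MiniSlowPathHelper<false, false>(); }
std::string MiniInitialized() { return MiniSlowPathHelper<true, false>(); }
```

Because the flags are template parameters, each generated entry point is compiled with its branch already decided.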

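The three object-allocation functions (WithChecks, Resolved and Initialized) all end up in artAllocObjectFromCode() (/art/runtime/entrypoints/quick/quick_alloc_entrypoints.cc); the array and string variants go through their own helpers: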

static constexpr bool kUseTlabFastPath = true;

template <bool kInitialized,
          bool kFinalize,
          bool kInstrumented,
          gc::AllocatorType allocator_type>
static ALWAYS_INLINE inline mirror::Object* artAllocObjectFromCode(
    mirror::Class* klass,
    Thread* self) REQUIRES_SHARED(Locks::mutator_lock_) {
  ScopedQuickEntrypointChecks sqec(self);
  DCHECK(klass != nullptr);
  if (kUseTlabFastPath && !kInstrumented && allocator_type == gc::kAllocatorTypeTLAB) {
    if (kInitialized || klass->IsInitialized()) {
      if (!kFinalize || !klass->IsFinalizable()) {
        size_t byte_count = klass->GetObjectSize();
        byte_count = RoundUp(byte_count, gc::space::BumpPointerSpace::kAlignment);
        mirror::Object* obj;
        if (LIKELY(byte_count < self->TlabSize())) {
          obj = self->AllocTlab(byte_count);
          DCHECK(obj != nullptr) << "AllocTlab can't fail";
          obj->SetClass(klass);
          if (kUseBakerReadBarrier) {
            obj->AssertReadBarrierState();
          }
          QuasiAtomic::ThreadFenceForConstructor();
          return obj;
        }
      }
    }
  }
  if (kInitialized) {
    return AllocObjectFromCodeInitialized<kInstrumented>(klass, self, allocator_type);
  } else if (!kFinalize) {
    return AllocObjectFromCodeResolved<kInstrumented>(klass, self, allocator_type);
  } else {
    return AllocObjectFromCode<kInstrumented>(klass, self, allocator_type);
  }
}

This function does the following:

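  • It first checks whether the TLAB fast path can be used. A TLAB (thread-local allocation buffer) is a chunk of heap space handed to a single thread, so that the thread can allocate by simply bumping a pointer without synchronizing with other threads. The fast path is taken only if the entry point is not instrumented and the allocator is exactly kAllocatorTypeTLAB, the class is initialized, finalization needs no special handling (kFinalize is false or the class is not finalizable), and the object size, rounded up to the required alignment, is smaller than the space left in the TLAB. The object is then allocated with Thread::AllocTlab().

  • Otherwise it branches on the template parameters kInitialized and kFinalize: if the class is known to be initialized (kInitialized), it calls AllocObjectFromCodeInitialized(); otherwise, if no finalizer check is needed (!kFinalize), it calls AllocObjectFromCodeResolved(); in the remaining case it calls AllocObjectFromCode(). Exactly one of the three is called.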
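The TLAB fast path boils down to rounding the size up and bumping a pointer. The sketch below is a simplified model, not ART's Thread API: MiniTlab and TryTlabFastPath are hypothetical, and the 8-byte alignment is an assumption standing in for BumpPointerSpace::kAlignment. Like the code above, it uses the TLAB only when the rounded size is strictly smaller than the remaining space.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical TLAB model: the thread owns [pos, end), so allocation needs no lock.
constexpr std::size_t kAlignment = 8;  // assumed object alignment

// Local helper; n must be a power of two.
constexpr std::size_t RoundUp(std::size_t x, std::size_t n) {
  return (x + n - 1) & ~(n - 1);
}

struct MiniTlab {
  std::uint8_t* pos = nullptr;
  std::uint8_t* end = nullptr;

  std::size_t TlabSize() const { return static_cast<std::size_t>(end - pos); }

  // The caller checks byte_count against TlabSize() first, so this cannot fail.
  void* AllocTlab(std::size_t byte_count) {
    assert(byte_count <= TlabSize());
    void* obj = pos;
    pos += byte_count;  // the whole "allocation" is one pointer bump
    return obj;
  }
};

// Returns nullptr to mean "take the slow path".
inline void* TryTlabFastPath(MiniTlab* tlab, std::size_t object_size) {
  std::size_t byte_count = RoundUp(object_size, kAlignment);
  if (byte_count < tlab->TlabSize()) {
    return tlab->AllocTlab(byte_count);
  }
  return nullptr;
}
```

With a 64-byte buffer, a 12-byte request is rounded to 16 bytes and leaves 48 bytes in the TLAB.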

Let's look at AllocObjectFromCodeResolved() (/art/runtime/entrypoints/entrypoint_utils-inl.h):

// Given the context of a calling Method and a resolved class, create an instance.
template <bool kInstrumented>
ALWAYS_INLINE
inline mirror::Object* AllocObjectFromCodeResolved(mirror::Class* klass,
                                                   Thread* self,
                                                   gc::AllocatorType allocator_type) {
  DCHECK(klass != nullptr);
  bool slow_path = false;
  klass = CheckClassInitializedForObjectAlloc(klass, self, &slow_path);
  if (UNLIKELY(slow_path)) {
    if (klass == nullptr) {
      return nullptr;
    }
    gc::Heap* heap = Runtime::Current()->GetHeap();
    // Pass in false since the object cannot be finalizable.
    // CheckClassInitializedForObjectAlloc can cause thread suspension which means we may now be
    // instrumented.
    return klass->Alloc</*kInstrumented*/true, false>(self, heap->GetCurrentAllocator()).Ptr();
  }
  // Pass in false since the object cannot be finalizable.
  return klass->Alloc<kInstrumented, false>(self, allocator_type).Ptr();
}

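CheckClassInitializedForObjectAlloc() checks whether the class has been initialized. In the common case it has, slow_path stays false, and klass is returned unchanged. Otherwise the function initializes the class (running its static initializer) and sets slow_path to true; if initialization fails, it returns nullptr with an exception pending, and AllocObjectFromCodeResolved() returns nullptr as well. Because initialization can suspend the thread, and the allocator or instrumentation state may change in the meantime, the slow path allocates with kInstrumented = true and heap->GetCurrentAllocator() instead of the allocator baked into the entry point. Either way, the object is then allocated by calling klass->Alloc().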
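The slow-path contract of CheckClassInitializedForObjectAlloc() can be modeled as below. MiniClass, MiniCheckClassInitialized, MiniAllocator and MiniPickAllocator are hypothetical; a std::function stands in for running the static initializer, and an enum stands in for the allocator type.

```cpp
#include <cassert>
#include <functional>

// Hypothetical model; none of these types are ART's.
struct MiniClass {
  bool initialized = false;
  std::function<bool()> clinit;  // runs the static initializer; false means it threw
};

// Returns the class, or nullptr if initialization failed. Sets *slow_path when
// initialization had to run, because that may have suspended the thread.
inline MiniClass* MiniCheckClassInitialized(MiniClass* klass, bool* slow_path) {
  if (!klass->initialized) {
    *slow_path = true;
    if (klass->clinit && !klass->clinit()) {
      return nullptr;  // in ART an exception would now be pending
    }
    klass->initialized = true;
  }
  return klass;
}

enum class MiniAllocator { kRosAlloc, kRegionTLAB };

// Caller-side rule: after the slow path, ignore the allocator baked into the
// entry point and use whatever the heap currently reports.
inline MiniAllocator MiniPickAllocator(MiniAllocator baked_in, MiniAllocator heap_current,
                                       bool slow_path) {
  return slow_path ? heap_current : baked_in;
}
```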

Class::Alloc() (/art/runtime/mirror/class-inl.h):

template<bool kIsInstrumented, bool kCheckAddFinalizer>
inline ObjPtr<Object> Class::Alloc(Thread* self, gc::AllocatorType allocator_type) {
  CheckObjectAlloc();
  gc::Heap* heap = Runtime::Current()->GetHeap();
  const bool add_finalizer = kCheckAddFinalizer && IsFinalizable();
  if (!kCheckAddFinalizer) {
    DCHECK(!IsFinalizable());
  }
  // Note that the this pointer may be invalidated after the allocation.
  ObjPtr<Object> obj =
      heap->AllocObjectWithAllocator<kIsInstrumented, false>(self,
                                                             this,
                                                             this->object_size_,
                                                             allocator_type,
                                                             VoidFunctor());
  if (add_finalizer && LIKELY(obj != nullptr)) {
    heap->AddFinalizerReference(self, &obj);
    if (UNLIKELY(self->IsExceptionPending())) {
      // Failed to allocate finalizer reference, it means that the whole allocation failed.
      obj = nullptr;
    }
  }
  return obj.Ptr();
}
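  1. CheckObjectAlloc() sanity-checks, through debug assertions, that instances of this class may be allocated this way.

  2. heap->AllocObjectWithAllocator() allocates the memory for the object.

  3. Finalizer handling: if kCheckAddFinalizer is set and the class overrides finalize(), heap->AddFinalizerReference(self, &obj) is called. Through FinalizerReference.add() on the Java side, this creates a FinalizerReference for the object and adds it to a linked list; once the object becomes unreachable, the GC enqueues that reference and the FinalizerDaemon thread calls the object's finalize() method. If creating the FinalizerReference fails (an exception is pending), the whole allocation is treated as failed and nullptr is returned.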
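The ordering in Class::Alloc() (allocate first, then register the finalizer, and treat a registration failure as an allocation failure) is sketched below. MiniHeap and MiniClassAlloc are hypothetical: objects are integer ids with 0 meaning null, and a std::vector stands in for the Java-side FinalizerReference list.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model of Class::Alloc()'s ordering; not ART's API.
struct MiniHeap {
  int next_id = 1;
  bool fail_finalizer_registration = false;  // simulates OOM while adding the reference
  std::vector<int> finalizer_list;           // stands in for the FinalizerReference list

  int AllocObject() { return next_id++; }    // stands in for AllocObjectWithAllocator()
  bool AddFinalizerReference(int obj) {
    if (fail_finalizer_registration) return false;
    finalizer_list.push_back(obj);
    return true;
  }
};

// Returns 0 ("null") if the allocation as a whole failed.
template <bool kCheckAddFinalizer>
int MiniClassAlloc(MiniHeap* heap, bool is_finalizable) {
  const bool add_finalizer = kCheckAddFinalizer && is_finalizable;
  if (!kCheckAddFinalizer) {
    assert(!is_finalizable);  // like DCHECK(!IsFinalizable())
  }
  int obj = heap->AllocObject();    // 1. allocate the object first
  if (add_finalizer && obj != 0) {  // 2. then register it for finalization
    if (!heap->AddFinalizerReference(obj)) {
      obj = 0;                      // registration failure fails the whole allocation
    }
  }
  return obj;
}
```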

At this point allocation moves into the heap; the next section covers how the Heap allocates memory.

Summary

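  1. Each thread initializes its entry point tables; compiled code reaches these entry points through fixed offsets within the Thread object.

  2. Thread::InitTlsEntryPoints() calls InitEntryPoints(), passing in the addresses of the tables. InitEntryPoints() differs by CPU architecture; the ARM64 version is in /art/runtime/arch/arm64/entrypoints_init_arm64.cc.

  3. entry_points_allocator records the allocator type. Its initial value, kAllocatorTypeDlMalloc, selects the DlMalloc entry points, and SetQuickAllocEntryPointsAllocator() changes it. With the CMS collector it is typically kAllocatorTypeRosAlloc; under the concurrent copying collector that Android 8.x uses by default, it is normally kAllocatorTypeRegionTLAB.

  4. artAllocObjectFromCode() (/art/runtime/entrypoints/quick/quick_alloc_entrypoints.cc) takes the TLAB fast path when it can, and otherwise dispatches to one of three helpers depending on whether the class is known to be initialized and whether finalizers must be checked.

  5. Apart from the TLAB fast path, all of these paths end in heap->AllocObjectWithAllocator(), which allocates the object's memory.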
