Android造成ANR的常见原因及示例分析

文章有错误之处,欢迎批评指正!

什么是ANR

在Android中,如果主线程被长时间阻塞,导致无法响应用户的操作,即造成ANR(Application Not Responding)。通常的表现是弹出一个应用无响应的对话框,让用户选择强制退出或者等待。

ANR_Dialog.png

注意:主线程做耗时操作本身是不会产生ANR的,导致ANR的根本还是应用程序无法在一定时间内响应用户的操作。因为主线程被耗时操作阻塞了,主线程无法对下一个操作进行响应才会ANR,没有需要响应的操作自然就不会产生ANR,或者应该这样说:主线程做耗时操作,非常容易引发ANR。

ANR的类型

KeyDispatch Timeout :按键或触摸事件在特定时间内无响应。超时时间5秒。超时时间是在ActivityManagerService类中定义的。

// How long we wait until we timeout on key dispatching.
static final int KEY_DISPATCHING_TIMEOUT = 5*1000;

Broadcast Timeout :BroadcastReceiver在特定时间内无法处理完成。前台广播10秒,后台广播60秒。超时时间是在ActivityManagerService类中定义的。

// How long we allow a receiver to run before giving up on it.
static final int BROADCAST_FG_TIMEOUT = 10*1000;
static final int BROADCAST_BG_TIMEOUT = 60*1000;

Service Timeout :Service在特定的时间内生命周期函数无法处理完成。前台服务20秒,后台服务200秒。超时时间是在ActiveServices类中定义的。

// How long we wait for a service to finish executing.
static final int SERVICE_TIMEOUT = 20*1000;
// How long we wait for a service to finish executing.
static final int SERVICE_BACKGROUND_TIMEOUT = SERVICE_TIMEOUT * 10;

ContentProvider Timeout :ContentProvider在特定的时间内没有完成发布。超时时间10秒。超时时间是在ActivityManagerService类中定义的。

// How long we wait for an attached process to publish its content providers
// before we decide it must be hung.
static final int CONTENT_PROVIDER_PUBLISH_TIMEOUT = 10*1000;

关于ANR类型的详细信息不在本篇文章的叙述范围之内,请自行查阅资料。

造成ANR的常见原因

  1. 应用在主线程上进行长时间的计算。
  2. 应用在主线程上执行耗时的I/O的操作。
  3. 主线程处于阻塞状态,等待获取锁。
  4. 主线程与其他线程之间发生死锁。
  5. 主线程在对另一个进程进行同步Binder调用,而后者需要很长时间才能返回。(如果我们知道调用远程方法需要很长时间,我们应该避免在主线程调用)

上述原因都会造成主线程被长时间阻塞,导致无法响应用户的操作,从而造成ANR。

ANR原因排查

ANR发生以后,在Logcat中有相应的日志输出,并且会在/data/anr/目录下输出一个traces.tx文件,该文件记录了ANR的更加详细的信息,我们可以导出分析。接下来我们就依次模拟上述5种方式来制造ANR,然后分析产生的Logcat和traces.txt文件。

测试环境:Android Studio 3.6.1
测试手机: HUAWEI MLA-AL10,Android版本: 7.0

1.应用在主线程上进行长时间的计算

//使用冒泡排序对一个大数组排序
private fun sortBigArray() {
    val currTime = System.currentTimeMillis()
    val random = IntArray(1000000)
    for (i in random.indices) {
        random[i] = (Math.random() * 10000000).toInt()
    }
    BubbleSort.sort(random)
    println("耗时" + (System.currentTimeMillis() - currTime) + "ms")
    for (i in random.indices) {
        println(random[i].toString())
    }
}

我们点击一个按钮调用sortBigArray()方法,内部调用BubbleSort类的sort()方法对一个大数组(100万)进行排序,然后点击几次返回键,然后就出现ANR了。

我们先看一下Logcat日志输出

//debug级别日志
2020-06-03 21:20:24.209 com.example.android.jetpackdemo I/art: Wrote stack traces to '/data/anr/traces.txt'

//error级别日志
2020-06-03 21:20:28.048 ? E/ActivityManager: ANR in com.example.android.jetpackdemo (com.example.android.jetpackdemo/.StartActivity)
    PID: 15564
    Reason: Input dispatching timed out (Waiting to send key event because the focused window has not finished processing all of the input events that were previously delivered to it.  Outbound queue length: 0.  Wait queue length: 2.)
    Load: 7.7 / 7.48 / 7.35
    CPU usage from 294322ms to 0ms ago (2020-06-03 21:15:29.817 to 2020-06-03 21:20:24.139):
      4.1% 2001/system_server: 3.1% user + 0.9% kernel / faults: 64102 minor 6 major
      3.3% 29428/adbd: 0.8% user + 2.4% kernel / faults: 131259 minor
      1.1% 508/logd: 0.5% user + 0.6% kernel / faults: 18 minor
      0.7% 2661/com.android.systemui: 0.6% user + 0.1% kernel / faults: 1648 minor 1 major
      0.7% 607/surfaceflinger: 0.4% user + 0.3% kernel / faults: 21 minor
      0.7% 24463/com.huawei.hwid.persistent: 0.6% user + 0% kernel / faults: 4650 minor 1 major
      0.5% 4018/com.huawei.android.launcher: 0.4% user + 0% kernel / faults: 16025 minor 3 major
      0.5% 24301/fingerprint_log: 0% user + 0.5% kernel
      0.4% 28932/com.huawei.appmarket: 0.3% user + 0% kernel / faults: 2526 minor
     //...
   2020-06-03 21:20:28.048 ? E/ActivityManager: CPU usage from 1721ms to 2250ms later (2020-06-03 21:20:25.860 to 2020-06-03 21:20:26.389):
      99% 15564/com.example.android.jetpackdemo: 97% user + 1.8% kernel / faults: 37 minor
        99% 15564/oid.jetpackdemo: 99% user + 0% kernel
      7.5% 2001/system_server: 3.7% user + 3.7% kernel / faults: 5 minor
        5.6% 2014/ActivityManager: 1.8% user + 3.7% kernel
        1.8% 2813/Binder:2001_5: 1.8% user + 0% kernel
        1.8% 2862/Binder:2001_6: 0% user + 1.8% kernel
        1.8% 3089/Binder:2001_7: 1.8% user + 0% kernel
      5.3% 29428/adbd: 0% user + 5.3% kernel / faults: 480 minor
        3.5% 29430/->transport: 0% user + 3.5% kernel
        1.7% 29428/adbd: 0% user + 1.7% kernel
      1.3% 53/rcuop/6: 0% user + 1.3% kernel
    16% TOTAL: 14% user + 2.1% kernel + 0.2% irq + 0.2% softirq

在上面的日志中输出了堆栈信息的保存在 /data/anr/traces.txt文件中。

com.example.android.jetpackdemo I/dalvikvm: Wrote stack traces to '/data/anr/traces.txt'

发生ANR进程的包名信息,所在的类,进程id和ANR的类型

2020-06-03 21:20:28.048 ? E/ActivityManager: ANR in com.example.android.jetpackdemo (com.example.android.jetpackdemo/.StartActivity)
    PID: 15564
    Reason: Input dispatching timed out (Waiting to send key event because the focused window has not finished processing all of the input events that were previously delivered to it.  Outbound queue length: 0.  Wait queue length: 2.)  

包名com.example.android.jetpackdemo,具体的类com.example.android.jetpackdemo.StartActivity,进程号是PID: 15564,ANR的类型是Input dispatching timed out

 CPU usage from 294322ms to 0ms ago (2020-06-03 21:15:29.817 to 2020-06-03 21:20:24.139):
      4.1% 2001/system_server: 3.1% user + 0.9% kernel / faults: 64102 minor 6 major
      3.3% 29428/adbd: 0.8% user + 2.4% kernel / faults: 131259 minor
      1.1% 508/logd: 0.5% user + 0.6% kernel / faults: 18 minor

//...

2020-06-03 21:20:28.048 ? E/ActivityManager: CPU usage from 1721ms to 2250ms later (2020-06-03 21:20:25.860 to 2020-06-03 21:20:26.389):
      99% 15564/com.example.android.jetpackdemo: 97% user + 1.8% kernel / faults: 37 minor
        99% 15564/oid.jetpackdemo: 99% user + 0% kernel
      7.5% 2001/system_server: 3.7% user + 3.7% kernel / faults: 5 minor
        5.6% 2014/ActivityManager: 1.8% user + 3.7% kernel
        1.8% 2813/Binder:2001_5: 1.8% user + 0% kernel

//...

注意:
在ANR发生之前,2020-06-03 21:15:29.817 to 2020-06-03 21:20:24.139,这段时间CPU的使用并不高。

在ANR发生的时候,2020-06-03 21:20:25.860 to 2020-06-03 21:20:26.389,这段时间CPU的使用相当高,已经达到99%了。

99% 15564/com.example.android.jetpackdemo: 97% user + 1.8% kernel
  • 99%:内存占用率
  • 15564/com.example.android.jetpackdemo:进程id和进程名。

这两段CPU 信息分别代表ANR发生前和ANR时的CPU占用率,在输出的CPU使用信息中我们也可以看出一些端倪,我们注意到我们的进程CPU的占用率比较高,说明我们的进程比较忙碌,这里需要说明一下,进程忙碌并不一定代表主线程忙碌,也可能是进程中的后台线程忙碌。

但是现在我们虽然知道了ANR发生的所在的类,但是如何精确定位到具体的哪一行代码呢?这就需要分析发生ANR的时候保存的traces.txt文件了。

导出traces文件

使用adb命令导出traces.txt文件

adb pull /data/anr/traces.txt traces_1.txt 
/data/anr/traces.txt: 1 file pulled, 0 skipped. 28.5 MB/s (701726 bytes in 0.023s)

如果入到permission相关问题,请使用bugreport命令导出,参考 Capture and read bug reports

traces.txt部分信息

----- pid 15564 at 2020-06-03 21:20:24 -----
Cmd line: com.example.android.jetpackdemo
Build fingerprint: 'HUAWEI/MLA-AL10/HWMLA:7.0/HUAWEIMLA-AL10/C00B364:user/release-keys'
ABI: 'arm64'
//...

在traces.txt文件的最顶部,首先输出的是发生ANR的进程号和包名信息,然后我们可以在traces.txt中搜索我们的进程号或者包名。

"main" prio=5 tid=1 Runnable
  | group="main" sCount=0 dsCount=0 obj=0x77d21af8 self=0x7fa2ea2a00
  | sysTid=15564 nice=-10 cgrp=default sched=0/0 handle=0x7fa6f4ba98
  | state=R schedstat=( 22116939220 18299419 428 ) utm=2209 stm=2 core=5 HZ=100
  | stack=0x7fd42e0000-0x7fd42e2000 stackSize=8MB
  | held mutexes= "mutator lock"(shared held)
  at com.example.android.jetpackdemo.BubbleSort.sort(BubbleSort.java:45)
  at com.example.android.jetpackdemo.StartActivity.sortBigArray(StartActivity.kt:76)
  at com.example.android.jetpackdemo.StartActivity.onClick(StartActivity.kt:47)
  at java.lang.reflect.Method.invoke!(Native method)
  at androidx.appcompat.app.AppCompatViewInflater$DeclaredOnClickListener.onClick(AppCompatViewInflater.java:397)
  at android.view.View.performClick(View.java:5646)
  at android.view.View$PerformClick.run(View.java:22473)
  at android.os.Handler.handleCallback(Handler.java:761)
  at android.os.Handler.dispatchMessage(Handler.java:98)
  at android.os.Looper.loop(Looper.java:156)
  at android.app.ActivityThread.main(ActivityThread.java:6517)
  at java.lang.reflect.Method.invoke!(Native method)
  at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:942)
  at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:832)
//...

我们首先看一下和线程相关的部分信息。

"main" prio=5 tid=1 Runnable
  | group="main" sCount=0 dsCount=0 obj=0x77d21af8 self=0x7fa2ea2a00
  | sysTid=15564 nice=-10 cgrp=default sched=0/0 handle=0x7fa6f4ba98
  | state=R schedstat=( 22116939220 18299419 428 ) utm=2209 stm=2 core=5 HZ=100
  | stack=0x7fd42e0000-0x7fd42e2000 stackSize=8MB
  | held mutexes= "mutator lock"(shared held)

线程基本信息:

  • 线程名:main

  • 线程优先级:prio=5,优先级取值范围[1,10],详见Thread类:

//最小取值
public final static int MIN_PRIORITY = 1;
//默认优先级
public final static int NORM_PRIORITY = 5;
//最大取值
public final static int MAX_PRIORITY = 10;
  • 线程id: tid=1,1代表主线程
  • 线程状态:Runnable,状态取值如下,详见Thread.State枚举类:
NEW, //线程还没启动

RUNNABLE, //正在执行

BLOCKED, //等待获取锁

WAITING, //等待其他线程执行一个特定的动作,比如说调用Object.notify或Object.notifyAll()

TIMED_WAITING, //等待一定时间

TERMINATED //执行完毕
  • 线程组名称:group="main"
  • 线程被挂起的次数:sCount=0
  • 线程被调试器挂起的次数:dsCount=0
  • 线程的java的对象地址:obj= 0x77d21af8
  • 线程本身的Native对象地址:self= 0x7fa2ea2a00

线程调度信息:

  • Linux系统中内核线程id: sysTid= 15564 与进程号相同
  • 线程调度优先级:nice=-10,详细信息可参考 浅析Linux线程调度
  • 线程调度组:cgrp=default
  • 线程调度策略和优先级:sched=0/0
  • 线程处理函数地址:handle= 0x7fa6f4ba98

线程的堆栈信息:

  • 堆栈地址和大小:stack=0x7fd42e0000-0x7fd42e2000 stackSize=8MB

held mutexes:

  • held mutexes 到底是什么意思我没有找到官方的文档解释,网上大多数关于held mutexes的解释也都是一笔带过没有实际参考意义,我们这里先忽略这个东西,并不会影响我们排查问题。

从上面traces.txt文件中这段信息可以看出,导致ANR的最终原因是在BubbleSort.java的第45行。

 at com.example.android.jetpackdemo.BubbleSort.sort(BubbleSort.java:45)
 at com.example.android.jetpackdemo.StartActivity.sortBigArray(StartActivity.kt:76)
 at com.example.android.jetpackdemo.StartActivity.onClick(StartActivity.kt:47)
 at java.lang.reflect.Method.invoke!(Native method)
anr_1.png

2.应用在主线程上执行耗时的I/O的操作

/**
 * 拷贝文件,注意要有读写权限
 */
private fun doIo() {
    val prePath = Environment.getExternalStorageDirectory().path
    val file = File("${prePath}/test/View.java")
    if (file.exists()) {
        Log.d(TAG, "doIo: ${file.length()}")

        val reader = FileReader(file)
        val fileWriter = FileWriter("${prePath}/test/ViewCopy.java", true)

        for (index in 0 until 5) {
            var count: Int
            while (reader.read().also { count = it } != -1) {
                fileWriter.write(count)
            }
            try {
                reader.reset()
            } catch (e: IOException) {
                Log.d(TAG, "doIo: error ${e.message}")
            }
        }
    }
}

调用doIo()方法以后,多次点击返回键,制造ANR。

Logcat日志输出

2020-06-04 21:05:24.462 ? E/ActivityManager: ANR in com.example.android.jetpackdemo (com.example.android.jetpackdemo/.StartActivity)
    PID: 16295
    Reason: Input dispatching timed out (Waiting to send key event because the focused window has not finished processing all of the input events that were previously delivered to it.  Outbound queue length: 0.  Wait queue length: 2.)
    Load: 7.49 / 7.45 / 7.24
    CPU usage from 87491ms to 0ms ago (2020-06-04 21:03:53.035 to 2020-06-04 21:05:20.526):
      7.8% 2001/system_server: 6.1% user + 1.7% kernel / faults: 34095 minor 3 major
      4.2% 28932/com.huawei.appmarket: 3.7% user + 0.5% kernel / faults: 12314 minor 5 major
      2.8% 2661/com.android.systemui: 2.2% user + 0.5% kernel / faults: 4222 minor 1 major
      2% 412/msm-core:sampli: 0% user + 2% kernel
      1.7% 24463/com.huawei.hwid.persistent: 1.5% user + 0.1% kernel / faults: 3317 minor 1 major
      1.5% 607/surfaceflinger: 1% user + 0.5% kernel / faults: 24 minor
     //...    
2020-06-04 21:05:24.462 ? E/ActivityManager: CPU usage from 1696ms to 2226ms later (2020-06-04 21:05:22.222 to 2020-06-04 21:05:22.752):
      84% 16295/com.example.android.jetpackdemo: 84% user + 0% kernel / faults: 562 minor 1 major
        68% 16295/oid.jetpackdemo: 68% user + 0% kernel
        12% 16317/RenderThread: 12% user + 0% kernel
        1.8% 16307/HeapTaskDaemon: 1.8% user + 0% kernel
       +0% 16461/DeferredSaveThr: 0% user + 0% kernel
      9.1% 2001/system_server: 1.8% user + 7.3% kernel / faults: 7 minor
        7.3% 2014/ActivityManager: 0% user + 7.3% kernel
        3.6% 2536/Binder:2001_3: 3.6% user + 0% kernel
      5.5% 607/surfaceflinger: 2.7% user + 2.7% kernel
        1.3% 607/surfaceflinger: 1.3% user + 0% kernel
        1.3% 658/Binder:607_1: 0% user + 1.3% kernel
      4.3% 2661/com.android.systemui: 2.9% user + 1.4% kernel / faults: 26 minor
        4.3% 3614/RenderThread: 2.9% user + 1.4% kernel
        1.4% 2661/ndroid.systemui: 1.4% user + 0% kernel
      1.3% 25/rcuop/2: 0% user + 1.3% kernel
      1.3% 339/irq/171-tsens_i: 0% user + 1.3% kernel
      1.5% 11851/mdss_fb0: 0% user + 1.5% kernel
      1.6% 14246/kworker/u16:5: 0% user + 1.6% kernel
      1.6% 16318/kworker/u16:4: 0% user + 1.6% kernel
    15% TOTAL: 13% user + 1.8% kernel

从上面的日志信息中我们也看出来发生ANR的时候,我们的进程com.example.android.jetpackdemoCPU占用率是比较高的,说明我们进程内存在比较忙碌的线程。然后我们继续看一下对应的traces.txt文件。

traces.txt部分信息

----- pid 16295 at 2020-06-04 21:05:20 -----
Cmd line: com.example.android.jetpackdemo
Build fingerprint: 'HUAWEI/MLA-AL10/HWMLA:7.0/HUAWEIMLA-AL10/C00B364:user/release-keys'

通过进程号pid 16295搜索

"main" prio=5 tid=1 Runnable
  | group="main" sCount=0 dsCount=0 obj=0x77d21af8 self=0x7fa2ea2a00
  | sysTid=16295 nice=-10 cgrp=default sched=0/0 handle=0x7fa6f4ba98
  | state=R schedstat=( 16406184130 12254163 407 ) utm=1630 stm=10 core=6 HZ=100
  | stack=0x7fd42e0000-0x7fd42e2000 stackSize=8MB
  | held mutexes= "mutator lock"(shared held)
  native: #00 pc 0000000000478088  /system/lib64/libart.so (_ZN3art15DumpNativeStackERNSt3__113basic_ostreamIcNS0_11char_traitsIcEEEEiP12BacktraceMapPKcPNS_9ArtMethodEPv+220)
  native: #01 pc 0000000000478084  /system/lib64/libart.so (_ZN3art15DumpNativeStackERNSt3__113basic_ostreamIcNS0_11char_traitsIcEEEEiP12BacktraceMapPKcPNS_9ArtMethodEPv+216)
  native: #02 pc 000000000044c604  /system/lib64/libart.so (_ZNK3art6Thread9DumpStackERNSt3__113basic_ostreamIcNS1_11char_traitsIcEEEEbP12BacktraceMap+524)
  native: #03 pc 0000000000463f60  /system/lib64/libart.so (_ZN3art14DumpCheckpoint3RunEPNS_6ThreadE+820)
  native: #04 pc 000000000044d510  /system/lib64/libart.so (_ZN3art6Thread21RunCheckpointFunctionEv+192)
  native: #05 pc 00000000000ff870  /system/lib64/libart.so (_ZN3art27ScopedObjectAccessUncheckedD2Ev+576)
  native: #06 pc 000000000010a764  /system/lib64/libart.so (_ZN3art8CheckJNI23GetPrimitiveArrayRegionEPKcNS_9Primitive4TypeEP7_JNIEnvP7_jarrayiiPv+1164)
  native: #07 pc 0000000000022ee4  /system/lib64/libjavacore.so (???)
  native: #08 pc 00000000004747a8  /data/dalvik-cache/arm64/system@framework@boot-core-libart.oat (Java_libcore_icu_NativeConverter_encode__J_3CI_3BI_3IZ+244)
  at libcore.icu.NativeConverter.encode(Native method)
  at java.nio.charset.CharsetEncoderICU.encodeLoop(CharsetEncoderICU.java:169)
  at java.nio.charset.CharsetEncoder.encode(CharsetEncoder.java:579)
  at sun.nio.cs.StreamEncoder.implWrite(StreamEncoder.java:271)
  at sun.nio.cs.StreamEncoder.write(StreamEncoder.java:125)
  - locked <0x05b5279d> (a java.io.FileWriter)
  at sun.nio.cs.StreamEncoder.write(StreamEncoder.java:113)
  at java.io.OutputStreamWriter.write(OutputStreamWriter.java:194)
  at com.example.android.jetpackdemo.StartActivity.doIo(StartActivity.kt:116)
  at com.example.android.jetpackdemo.StartActivity.onClick(StartActivity.kt:65)
  at java.lang.reflect.Method.invoke!(Native method)
  at androidx.appcompat.app.AppCompatViewInflater$DeclaredOnClickListener.onClick(AppCompatViewInflater.java:397)
  at android.view.View.performClick(View.java:5646)
  at android.view.View$PerformClick.run(View.java:22473)
  at android.os.Handler.handleCallback(Handler.java:761)
  at android.os.Handler.dispatchMessage(Handler.java:98)
  at android.os.Looper.loop(Looper.java:156)
  at android.app.ActivityThread.main(ActivityThread.java:6517)
  at java.lang.reflect.Method.invoke!(Native method)
  at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:942)
  at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:832)

我们重点看一下这段信息

at java.io.OutputStreamWriter.write(OutputStreamWriter.java:194)
at com.example.android.jetpackdemo.StartActivity.doIo(StartActivity.kt:116)
at com.example.android.jetpackdemo.StartActivity.onClick(StartActivity.kt:65)

从上面这段信息可以看出,导致ANR的最终原因是在OutputStreamWriter.java的第194行。而我们的代码出问题的地方是StartActivity.kt的116行。

OutputStreamWriter_194.png

StartActivity_116.png

3.主线程处于阻塞状态,等待获取锁

//锁资源
val lockedResource: Any = Any()

fun onClick(v: View) {
    when (v.id) {
        R.id.btnWaitLockedResource -> {
            LockTask().execute(arrayListOf<Int>())
            Log.d(TAG, "onClick: 主线程先睡眠一会,避免先获取到锁")
            Thread.sleep(200)
            Log.d(TAG, "onClick: 主线程先睡眠结束,尝试获取锁")
            synchronized(lockedResource) {
                for (index in 0 until 10) {
                    Log.d(TAG, "onClick: 主线程获取到锁了$index")
                }
            }
        }
    }
}


//LockTask后台线程
inner class LockTask : AsyncTask<MutableList<Int>, Int, Unit>() {
    override fun doInBackground(vararg params: MutableList<Int>) =
        synchronized(lockedResource) {
            val list = params[0]
            for (i in 0 until 1000000) {
                list.add((Math.random() * 10000000).toInt())
            }
            list.forEach {
                Log.d(TAG, "doInBackground: for each element is $it")
            }
        }
}

调用onClick()方法以后,先让后台线程获取锁,然后主线程再尝试获取锁。然后多次点击返回键,制造ANR。

Logcat日志输出

2020-06-04 09:55:04.396 ? E/ActivityManager: ANR in com.example.android.jetpackdemo (com.example.android.jetpackdemo/.StartActivity)
    PID: 20008
    Reason: Input dispatching timed out (Waiting to send key event because the focused window has not finished processing all of the input events that were previously delivered to it.  Outbound queue length: 0.  Wait queue length: 2.)
    Load: 8.27 / 7.73 / 7.37
    CPU usage from 83152ms to 0ms ago (2020-06-04 09:53:36.842 to 2020-06-04 09:54:59.995) with 99% awake:
      19% 508/logd: 15% user + 3.5% kernel / faults: 533 minor 1 major
      5.5% 2001/system_server: 3.9% user + 1.5% kernel / faults: 10843 minor 7 major
      4.9% 28932/com.huawei.appmarket: 4.3% user + 0.6% kernel / faults: 13003 minor 79 major
      2.6% 2661/com.android.systemui: 2.2% user + 0.3% kernel / faults: 7158 minor 2 major
      1.5% 607/surfaceflinger: 0.9% user + 0.6% kernel / faults: 190 minor 1 major
      1.2% 24307/logcat: 0.7% user + 0.4% kernel
      0.8% 11161/com.android.settings: 0.6% user + 0.1% kernel / faults: 9084 minor 20 major
      0.6% 24305/logcat: 0.2% user + 0.3% kernel
      0.4% 24301/fingerprint_log: 0% user + 0.4% kernel
      0.3% 15363/kworker/u16:10: 0% user + 0.3% kernel
      0.2% 6831/kworker/u16:5: 0% user + 0.2% kernel
      0.2% 837/imonitor: 0% user + 0.1% kernel
      //...
     2020-06-04 09:55:04.396 ? E/ActivityManager: CPU usage from 2211ms to 2742ms later (2020-06-04 09:55:02.206 to 2020-06-04 09:55:02.737):
      105% 20008/com.example.android.jetpackdemo: 92% user + 13% kernel / faults: 220 minor
        99% 20096/AsyncTask #1: 86% user + 13% kernel
        5.6% 20019/HeapTaskDaemon: 5.6% user + 0% kernel
      103% 508/logd: 99% user + 3.7% kernel / faults: 8 minor
        92% 24315/logd.reader.per: 92% user + 0% kernel
        7.5% 511/logd.writer: 5.6% user + 1.8% kernel
        3.7% 24314/logd.reader.per: 0% user + 3.7% kernel
        1.8% 24313/logd.reader.per: 0% user + 1.8% kernel
      11% 2661/com.android.systemui: 11% user + 0% kernel / faults: 52 minor
        9.3% 3614/RenderThread: 7.5% user + 1.8% kernel
        1.8% 2661/ndroid.systemui: 1.8% user + 0% kernel
      9.3% 607/surfaceflinger: 9.3% user + 0% kernel
        3.7% 607/surfaceflinger: 3.7% user + 0% kernel
        1.8% 2614/Binder:607_4: 0% user + 1.8% kernel
      5.6% 2001/system_server: 1.8% user + 3.7% kernel / faults: 2 minor
        5.6% 2014/ActivityManager: 0% user + 5.6% kernel
      3.3% 19794/adbd: 1.6% user + 1.6% kernel / faults: 147 minor
        1.6% 19794/adbd: 0% user + 1.6% kernel
        1.6% 19796/->transport: 0% user + 1.6% kernel
        1.6% 19797/<-transport: 0% user + 1.6% kernel
      3.4% 24307/logcat: 0% user + 3.4% kernel
      1.3% 624/mm-pp-dpps: 0% user + 1.3% kernel
        1.3% 717/ABA_THREAD: 1.3% user + 0% kernel
      1.6% 18971/kworker/0:2: 0% user + 1.6% kernel
      1.6% 18974/kworker/u16:0: 0% user + 1.6% kernel
      1.6% 19095/mdss_fb0: 0% user + 1.6% kernel
      1.7% 24301/fingerprint_log: 1.7% user + 0% kernel
      1.7% 24305/logcat: 1.7% user + 0% kernel
    31% TOTAL: 26% user + 4% kernel + 0.2% irq + 0.2% softirq

从上面的日志信息中我们也看出来我们的进程CPU占用率是比较高的,说明我们进程内存在比较忙碌的线程。然后我们继续看一下对应的traces.txt文件。

traces.txt部分信息

----- pid 20008 at 2020-06-04 09:55:00 -----
Cmd line: com.example.android.jetpackdemo
Build fingerprint: 'HUAWEI/MLA-AL10/HWMLA:7.0/HUAWEIMLA-AL10/C00B364:user/release-keys'

通过进程号pid 20008搜索

"main" prio=5 tid=1 Blocked
  | group="main" sCount=1 dsCount=0 obj=0x77d21af8 self=0x7fa2ea2a00
  | sysTid=20008 nice=-10 cgrp=default sched=0/0 handle=0x7fa6f4ba98
  | state=S schedstat=( 278831875 7233747 156 ) utm=22 stm=5 core=0 HZ=100
  | stack=0x7fd42e0000-0x7fd42e2000 stackSize=8MB
  | held mutexes=
  at com.example.android.jetpackdemo.StartActivity.onClick(StartActivity.kt:61)
  - waiting to lock <0x0f8c80b0> (a java.lang.Object) held by thread 16
  at java.lang.reflect.Method.invoke!(Native method)
  at androidx.appcompat.app.AppCompatViewInflater$DeclaredOnClickListener.onClick(AppCompatViewInflater.java:397)
  at android.view.View.performClick(View.java:5646)
  at android.view.View$PerformClick.run(View.java:22473)
  at android.os.Handler.handleCallback(Handler.java:761)
  at android.os.Handler.dispatchMessage(Handler.java:98)
  at android.os.Looper.loop(Looper.java:156)
  at android.app.ActivityThread.main(ActivityThread.java:6517)
  at java.lang.reflect.Method.invoke!(Native method)
  at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:942)
  at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:832)

关键信息

at com.example.android.jetpackdemo.StartActivity.onClick(StartActivity.kt:61)
- waiting to lock <0x0f8c80b0> (a java.lang.Object) held by thread 16

在StartActivity的61行,在等待一个锁对象<0x0f8c80b0>,该对象是一个Object对象(a java.lang.Object),这个锁对象正在被线程id为16的线程持有。那么我们下面在traces.txt文件中搜索一下这个锁对象<0x0f8c80b0>。如下所示:

DALVIK THREADS (16):
"AsyncTask #1" prio=5 tid=16 Runnable
  | group="main" sCount=0 dsCount=0 obj=0x12cd61f0 self=0x7f93187200
  | sysTid=20096 nice=10 cgrp=bg_non_interactive sched=0/0 handle=0x7f84346450
  | state=R schedstat=( 13814173056 6030204 1355 ) utm=1193 stm=188 core=3 HZ=100
  | stack=0x7f84244000-0x7f84246000 stackSize=1037KB
  | held mutexes= "mutator lock"(shared held)
  at java.lang.Integer.stringSize(Integer.java:414)
  at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:630)
  at java.lang.StringBuilder.append(StringBuilder.java:220)
  at com.example.android.jetpackdemo.StartActivity$LockTask.doInBackground(StartActivity.kt:107)
  - locked <0x0f8c80b0> (a java.lang.Object)
  at com.example.android.jetpackdemo.StartActivity$LockTask.doInBackground(StartActivity.kt:99)
  at android.os.AsyncTask$2.call(AsyncTask.java:316)
  at java.util.concurrent.FutureTask.run(FutureTask.java:237)
  at android.os.AsyncTask$SerialExecutor$1.run(AsyncTask.java:255)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1133)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:607)
  at java.lang.Thread.run(Thread.java:776)

关键信息

 at com.example.android.jetpackdemo.StartActivity$LockTask.doInBackground(StartActivity.kt:107)
 - locked <0x0f8c80b0> (a java.lang.Object)

我们看到正是这个AsyncTask在107行持有锁对象0x0f8c80b0,导致主线程无法获取锁而阻塞,最终导致ANR。

AsyncTaskHeldLock.png

4.主线程与其他线程之间发生死锁

val resourceFirst = "resourceFirst"
val resourceSecond = "resourceSecond"

private fun mockDeadLock() {
    //启动一个后台线程
    thread(start = false) {
        synchronized(resourceSecond) {
            Log.d(TAG, "工作线程获取了锁 resourceSecond")
            Thread.sleep(100)
            Log.d(TAG, "工作线程尝试获取锁 resourceFirst")
            synchronized(resourceFirst) {
                while (true) {
                    Log.d(TAG, "工作线程 mockDeadLock")
                }
            }
        }
    }.start()

    //主线程睡眠30ms后开始获取锁
    Thread.sleep(30)

    synchronized(resourceFirst) {
        Log.d(TAG, "主线程获取了锁 resourceFirst")

        Log.d(TAG, "主线程尝试获取锁 resourceSecond")
        synchronized(resourceSecond) {
            Log.d(TAG, "主线程获取了锁 resourceFirst")
            while (true) {
                Log.d(TAG, "主线程 mockDeadLock")
            }
        }
    }
}

上面这段代码逻辑:

  1. 工作线程先获取锁resourceSecond,然后睡眠100ms保证主线程能获取到锁resourceFirst
  2. 主线程睡眠30ms后先获取锁resourceFirst,然后再尝试获取锁resourceSecond,这时候是获取不到的,因为工作线程已经持有锁resourceSecond并且不释放。
  3. 工作线程睡眠结束以后尝试获取锁resourceFirst,这时候是获取不到的,因为主线程持有了锁resourceFirst并且不释放。
  4. 最终,造成死锁。

调用mockDeadLock()方法以后,多次点击返回键,制造ANR。

Logcat输出

2020-06-04 15:07:41.246 ? E/ActivityManager: ANR in com.example.android.jetpackdemo (com.example.android.jetpackdemo/.StartActivity)
    PID: 13626
    Reason: Input dispatching timed out (Waiting to send key event because the focused window has not finished processing all of the input events that were previously delivered to it.  Outbound queue length: 0.  Wait queue length: 2.)
    Load: 7.53 / 6.81 / 6.4
    CPU usage from 177565ms to 0ms ago (2020-06-04 15:04:39.715 to 2020-06-04 15:07:37.281):
      11% 2001/system_server: 7.1% user + 4.4% kernel / faults: 68219 minor 37 major
      3.4% 2661/com.android.systemui: 2.8% user + 0.6% kernel / faults: 20555 minor 29 major
      2% 508/logd: 0.9% user + 1.1% kernel / faults: 76 minor
      1.8% 607/surfaceflinger: 1.1% user + 0.7% kernel / faults: 82 minor 1 major
      0% 24463/com.huawei.hwid.persistent: 0% user + 0% kernel / faults: 7819 minor 24 major
      0.9% 2823/com.huawei.systemmanager:service: 0.6% user + 0.2% kernel / faults: 13277 minor 12 major
    //...      
2020-06-04 15:07:41.246 ? E/ActivityManager: CPU usage from 1714ms to 2243ms later (2020-06-04 15:07:38.994 to 2020-06-04 15:07:39.523):
      12% 2001/system_server: 9% user + 3.6% kernel / faults: 8 minor
        10% 2014/ActivityManager: 5.4% user + 5.4% kernel
        1.8% 2399/UEventObserver: 1.8% user + 0% kernel
      1.5% 13652/kworker/u16:7: 0% user + 1.5% kernel
    2.3% TOTAL: 1.1% user + 1.1% kernel

上面的Logcat输出并没有关于我们进程的CUP信息,说明我们的进程CPU占用率很低。那么我们继续看一下traces.txt文件。

traces.txt部分信息

----- pid 13626 at 2020-06-04 15:07:37 -----
Cmd line: com.example.android.jetpackdemo
Build fingerprint: 'HUAWEI/MLA-AL10/HWMLA:7.0/HUAWEIMLA-AL10/C00B364:user/release-keys'

通过进程号pid 13626搜索


"main" prio=5 tid=1 Blocked
  | group="main" sCount=1 dsCount=0 obj=0x77d21af8 self=0x7fa2ea2a00
  | sysTid=13626 nice=-10 cgrp=default sched=0/0 handle=0x7fa6f4ba98
  | state=S schedstat=( 288564792 6939269 224 ) utm=23 stm=5 core=0 HZ=100
  | stack=0x7fd42e0000-0x7fd42e2000 stackSize=8MB
  | held mutexes=
  at com.example.android.jetpackdemo.StartActivity.mockDeadLock(StartActivity.kt:142)
  - waiting to lock <0x0a43b5c8> (a java.lang.String) held by thread 17
  at com.example.android.jetpackdemo.StartActivity.onClick(StartActivity.kt:70)
  at java.lang.reflect.Method.invoke!(Native method)
  at androidx.appcompat.app.AppCompatViewInflater$DeclaredOnClickListener.onClick(AppCompatViewInflater.java:397)
  at android.view.View.performClick(View.java:5646)
  at android.view.View$PerformClick.run(View.java:22473)
  at android.os.Handler.handleCallback(Handler.java:761)
  at android.os.Handler.dispatchMessage(Handler.java:98)
  at android.os.Looper.loop(Looper.java:156)
  at android.app.ActivityThread.main(ActivityThread.java:6517)
  at java.lang.reflect.Method.invoke!(Native method)
  at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:942)
  at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:832)

主线程状态是线程状态是Blocked,说明正在等待获取锁对象,等待获取的锁对象<0x0a43b5c8>是一个String对象(a java.lang.String),该对象被线程id为17的线程持有。然后我们搜索这个锁对象。

"Thread-2" prio=5 tid=17 Blocked
  | group="main" sCount=1 dsCount=0 obj=0x12c89dc0 self=0x7f931cd000
  | sysTid=13763 nice=0 cgrp=default sched=0/0 handle=0x7f84344450
  | state=S schedstat=( 886406 280365 2 ) utm=0 stm=0 core=0 HZ=100
  | stack=0x7f84242000-0x7f84244000 stackSize=1037KB
  | held mutexes=
  at com.example.android.jetpackdemo.StartActivity$mockDeadLock$1.invoke(StartActivity.kt:127)
  - waiting to lock <0x0ec26674> (a java.lang.String) held by thread 1
  - locked <0x0a43b5c8> (a java.lang.String)
  at com.example.android.jetpackdemo.StartActivity$mockDeadLock$1.invoke(StartActivity.kt:21)
  at kotlin.concurrent.ThreadsKt$thread$thread$1.run(Thread.kt:30)

Thread-2,线程状态是Blocked,说明正在等待获取锁对象,等待获取的锁对象<0x0ec26674>是一个String对象(a java.lang.String),这个对象被线程id为1的线程(也就是主线程)持有。并且当前线程持有锁对象<0x0a43b5c8>

最终,主线程和工作线程Thread-2造成死锁,导致应用无响应。

5.主线程在对另一个进程进行同步Binder调用,而后者需要很长时间才能返回

我们的代码是实现从客户端的两个EditText中获取两个数字,然后通过Binder调用服务端的方法计算两个数的和返回给客户端,然后客户端讲计算结果展示在界面上。完整代码请参考 AIDLDemo

客户端部分代码

private IMyAidlInterface iMyAidlInterface;
    
private ServiceConnection conn = new ServiceConnection() {
    @Override
    public void onServiceConnected(ComponentName name, IBinder service) {
         //获取Binder对象
         iMyAidlInterface = IMyAidlInterface.Stub.asInterface(service);
    }
    //...
};
    
public void onClick(View view) {
    switch (view.getId()) {
        case R.id.btn_count:
            mNum1 = Integer.parseInt(etNum1.getText().toString());
            mNum2 = Integer.parseInt(etNum2.getText().toString());
            try {
                //在主线程进行同步binder调用
                mTotal = iMyAidlInterface.add(mNum1, mNum2);
            } catch (RemoteException e) {
                e.printStackTrace();
                Log.e(TAG, "onClick: " + e.getMessage());
            }
            editShowResult.setText("mTotal=" + mTotal);
            break;
       }
}

服务端部分代码

public class IRemoteService extends Service {

    private static final String TAG = "IRemoteService";

    private IBinder iBinder = new IMyAidlInterface.Stub() {
        @Override
        public int add(int num1, int num2) throws RemoteException {
            Log.d(TAG, "remote method add: start sleep thread id =" + Thread.currentThread().getId()+"," +
                    "thread name = "+Thread.currentThread().getName());
            try {
                //睡眠一段时间,然后才进行计算
                Thread.sleep(120000);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
            Log.d(TAG, "remote method add: finish sleep return calculate result");
            return num1 + num2;
        }
    };

    public IRemoteService() {
    }

    @Override
    public IBinder onBind(Intent intent) {
        return iBinder;
    }
}

注意:我们需要先把Binder服务端运行起来,然后再运行Binder客户端执行相应的方法。

Logcat输出

2020-06-04 15:49:47.006 2001-2014/? E/ActivityManager: ANR in com.hm.aidlclient (com.hm.aidlclient/.BaseKnowledgeActivity)
    PID: 18096
    Reason: Input dispatching timed out (Waiting to send key event because the focused window has not finished processing all of the input events that were previously delivered to it.  Outbound queue length: 0.  Wait queue length: 2.)
    Load: 7.55 / 7.26 / 6.87
    CPU usage from 755516ms to 0ms ago (2020-06-04 15:37:07.545 to 2020-06-04 15:49:43.062) with 99% awake:
      5.1% 2001/system_server: 3.5% user + 1.5% kernel / faults: 139606 minor 17 major
      1.2% 508/logd: 0.5% user + 0.6% kernel / faults: 35 minor
      0.9% 2661/com.android.systemui: 0.7% user + 0.1% kernel / faults: 13039 minor 4 major
      0.8% 12442/adbd: 0.2% user + 0.6% kernel / faults: 23957 minor
      0.7% 607/surfaceflinger: 0.4% user + 0.3% kernel / faults: 183 minor 2 major
      0.6% 28932/com.huawei.appmarket: 0.5% user + 0.1% kernel / faults: 9311 minor 64 major
      0.4% 24463/com.huawei.hwid.persistent: 0.3% user + 0% kernel / faults: 11607 minor 6 major
      0.5% 24301/fingerprint_log: 0% user + 0.5% kernel
      0.3% 4128/com.google.android.gms: 0.2% user + 0% kernel / faults: 26970 minor 16 major
      //...
2020-06-04 15:49:47.006 2001-2014/? E/ActivityManager: CPU usage from 1701ms to 2232ms later (2020-06-04 15:49:44.762 to 2020-06-04 15:49:45.293):
      28% 2001/system_server: 21% user + 7.2% kernel / faults: 38 minor
        16% 2010/HeapTaskDaemon: 16% user + 0% kernel
        9% 2014/ActivityManager: 1.8% user + 7.2% kernel
        1.8% 2001/system_server: 0% user + 1.8% kernel
        1.8% 2540/NetdConnector: 1.8% user + 0% kernel
      9% 607/surfaceflinger: 9% user + 0% kernel
        5.4% 607/surfaceflinger: 5.4% user + 0% kernel
        1.8% 658/Binder:607_1: 0% user + 1.8% kernel
        1.8% 677/EventThread: 0% user + 1.8% kernel
      7.1% 2661/com.android.systemui: 5.3% user + 1.7% kernel / faults: 38 minor
        8.9% 3614/RenderThread: 7.1% user + 1.7% kernel
      1.3% 508/logd: 1.3% user + 0% kernel
      1.3% 624/mm-pp-dpps: 1.3% user + 0% kernel
        2.7% 717/ABA_THREAD: 1.3% user + 1.3% kernel
      1.5% 15978/mdss_fb0: 0% user + 1.5% kernel
      1.6% 18228/logcat: 0% user + 1.6% kernel
    8.3% TOTAL: 5.8% user + 2.5% kernel

Logcat输出的信息中并没有什么有价值的信息。那么我们继续看一下traces.txt文件。

traces.txt中客户端相关信息

----- pid 18096 at 2020-06-04 15:49:43 -----
Cmd line: com.hm.aidlclient
Build fingerprint: 'HUAWEI/MLA-AL10/HWMLA:7.0/HUAWEIMLA-AL10/C00B364:user/release-keys'

通过进程号pid 18096搜索

"main" prio=5 tid=1 Native
  | group="main" sCount=1 dsCount=0 obj=0x77d21af8 self=0x7fa2ea2a00
  | sysTid=18096 nice=-10 cgrp=default sched=0/0 handle=0x7fa6f4ba98
  | state=S schedstat=( 464662186 22498334 359 ) utm=38 stm=8 core=0 HZ=100
  | stack=0x7fd42e0000-0x7fd42e2000 stackSize=8MB
  | held mutexes=
  kernel: __switch_to+0x70/0x7c
  kernel: binder_thread_read+0x4cc/0x13f0
  kernel: binder_ioctl+0x53c/0xbcc
  kernel: do_vfs_ioctl+0x570/0x5a8
  kernel: SyS_ioctl+0x60/0x88
  kernel: el0_svc_naked+0x24/0x28
  native: #00 pc 000000000006ad6c  /system/lib64/libc.so (__ioctl+4)
  native: #01 pc 000000000001fa48  /system/lib64/libc.so (ioctl+144)
  native: #02 pc 00000000000555a4  /system/lib64/libbinder.so (_ZN7android14IPCThreadState14talkWithDriverEb+260)
  native: #03 pc 0000000000056388  /system/lib64/libbinder.so (_ZN7android14IPCThreadState15waitForResponseEPNS_6ParcelEPi+352)
  native: #04 pc 000000000004b250  /system/lib64/libbinder.so (_ZN7android8BpBinder8transactEjRKNS_6ParcelEPS1_j+72)
  native: #05 pc 0000000000103354  /system/lib64/libandroid_runtime.so (???)
  native: #06 pc 0000000000b36238  /data/dalvik-cache/arm64/system@framework@boot-framework.oat (Java_android_os_BinderProxy_transactNative__ILandroid_os_Parcel_2Landroid_os_Parcel_2I+196)
  at android.os.BinderProxy.transactNative(Native method)
  at android.os.BinderProxy.transact(Binder.java:617)
  at com.hm.aidlserver.IMyAidlInterface$Stub$Proxy.add(IMyAidlInterface.java:90)
  at com.hm.aidlclient.BaseKnowledgeActivity.onClick(BaseKnowledgeActivity.java:109)
  at com.hm.aidlclient.BaseKnowledgeActivity_ViewBinding$1.doClick(BaseKnowledgeActivity_ViewBinding.java:41)
  at butterknife.internal.DebouncingOnClickListener.onClick(DebouncingOnClickListener.java:22)
  at android.view.View.performClick(View.java:5646)
  at android.view.View$PerformClick.run(View.java:22473)
  at android.os.Handler.handleCallback(Handler.java:761)
  at android.os.Handler.dispatchMessage(Handler.java:98)
  at android.os.Looper.loop(Looper.java:156)
  at android.app.ActivityThread.main(ActivityThread.java:6517)
  at java.lang.reflect.Method.invoke!(Native method)
  at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:942)
  at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:832)

这里我们看到Binder客户端主线程的状态是Native,这个状态是native线程的一个状态,对应java线程的RUNNABLE状态。更详细的对应关系可以参考VMThread.java。然后从上面的信息中我们只看到BinderProxy调用了transactNative()方法,这是一个本地方法,最终会调用服务端Binder对象的transact()方法,实现真正的跨进程通信。除了这些我们没有看到其他有用的信息了。那么我们接下来看一看服务端的一些信息,看看能不能找到一些线索。

traces.txt中服务端相关信息

----- pid 17773 at 2020-06-04 15:49:43 -----
Cmd line: com.hm.aidlserver
Build fingerprint: 'HUAWEI/MLA-AL10/HWMLA:7.0/HUAWEIMLA-AL10/C00B364:user/release-keys'

通过进程号pid 17773搜索

"main" prio=5 tid=1 Native
  | group="main" sCount=1 dsCount=0 obj=0x77d21af8 self=0x7fa2ea2a00
  | sysTid=17773 nice=0 cgrp=default sched=0/0 handle=0x7fa6f4ba98
  | state=S schedstat=( 213791882 16481247 206 ) utm=18 stm=3 core=1 HZ=100
  | stack=0x7fd42e0000-0x7fd42e2000 stackSize=8MB
  | held mutexes=
  kernel: __switch_to+0x70/0x7c
  kernel: SyS_epoll_wait+0x2d4/0x394
  kernel: SyS_epoll_pwait+0xc4/0x150
  kernel: el0_svc_naked+0x24/0x28
  native: #00 pc 000000000006ac80  /system/lib64/libc.so (__epoll_pwait+8)
  native: #01 pc 000000000001e21c  /system/lib64/libc.so (epoll_pwait+64)
  native: #02 pc 00000000000181d8  /vendor/lib64/libutils.so (_ZN7android6Looper9pollInnerEi+156)
  native: #03 pc 000000000001808c  /vendor/lib64/libutils.so (_ZN7android6Looper8pollOnceEiPiS1_PPv+60)
  native: #04 pc 00000000000f66dc  /system/lib64/libandroid_runtime.so (_ZN7android18NativeMessageQueue8pollOnceEP7_JNIEnvP8_jobjecti+48)
  native: #05 pc 0000000000b91ec0  /data/dalvik-cache/arm64/system@framework@boot-framework.oat (Java_android_os_MessageQueue_nativePollOnce__JI+140)
  at android.os.MessageQueue.nativePollOnce(Native method)
  at android.os.MessageQueue.next(MessageQueue.java:356)
  at android.os.Looper.loop(Looper.java:138)
  at android.app.ActivityThread.main(ActivityThread.java:6517)
  at java.lang.reflect.Method.invoke!(Native method)
  at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:942)
  at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:832)

服务端的进程号是pid 17773,我们看到服务端的主线程中也没有什么线索,不要慌,这里我们似乎忘了一点什么,那就是服务端的Binder对象是运行在服务端的Binder线程池中的。那我们怎么找到具体是Binder线程池中的哪个线程呢?其实在traces.txt文件中也是输出了的。

----- binder transactions -----
18096:18096(m.hm.aidlclient:m.hm.aidlclient) -> 17773:17788(m.hm.aidlserver:Binder:17773_2) code: 1

----- end binder transactions -----

上面这段信息的意思就是,我们是在进程id为18096,内核线程为18096的线程(就是主线程)向进程id为17773,内核线id为17788的线程发起跨进程通信。内核线程id为17788的线程的线程名称是Binder:17773_2。那么我们就搜索一下Binder:17773_2。搜索结果如下所示:

"Binder:17773_2" prio=5 tid=10 Sleeping
  | group="main" sCount=1 dsCount=0 obj=0x32c064c0 self=0x7f9a624800
  | sysTid=17788 nice=-10 cgrp=default sched=0/0 handle=0x7fa0fc3450
  | state=S schedstat=( 3077762 6086666 14 ) utm=0 stm=0 core=6 HZ=100
  | stack=0x7fa0ec9000-0x7fa0ecb000 stackSize=1005KB
  | held mutexes=
  at java.lang.Thread.sleep!(Native method)
  - sleeping on <0x05eea4a7> (a java.lang.Object)
  at java.lang.Thread.sleep(Thread.java:379)
  - locked <0x05eea4a7> (a java.lang.Object)
  at java.lang.Thread.sleep(Thread.java:321)
  at com.hm.aidlserver.IRemoteService$1.add(IRemoteService.java:18)
  at com.hm.aidlserver.IMyAidlInterface$Stub.onTransact(IMyAidlInterface.java:55)
  at android.os.Binder.execTransact(Binder.java:565)

这里我们终于发现了原因,我们看到Binder:17773_2状态是Sleeping,就是服务端的Binder对象的add()方法内部第18行调用了Thread.sleep方法造成长时间无法返回,从而使客户端方法执行无法结束,最终导致ANR。

binder_anr.png

总结:本篇文章列举了几种常见原因造成ANR的示例,并分析了相关日志和traces.txt文件。然而在真实的场景中可能还会有各种稀奇古怪的原因造成ANR,排查起来也会复杂的多,所以最重要的还是防患于未然,在实际的开发过程中尽量避免主线程被长时间阻塞。

参考链接:

  1. Capture and read bug reports
  2. Android进阶知识:ANR的定位与解决
  3. ANR
  4. Android性能优化(七)之你真的理解ANR吗?
  5. Android应用ANR分析
  6. VMThread.java
  7. Android Binder学习笔记(一)
  8. AIDLDemo
最后编辑于
©著作权归作者所有,转载或内容合作请联系作者
  • 序言:七十年代末,一起剥皮案震惊了整个滨河市,随后出现的几起案子,更是在滨河造成了极大的恐慌,老刑警刘岩,带你破解...
    沈念sama阅读 159,716评论 4 364
  • 序言:滨河连续发生了三起死亡事件,死亡现场离奇诡异,居然都是意外死亡,警方通过查阅死者的电脑和手机,发现死者居然都...
    沈念sama阅读 67,558评论 1 294
  • 文/潘晓璐 我一进店门,熙熙楼的掌柜王于贵愁眉苦脸地迎上来,“玉大人,你说我怎么就摊上这事。” “怎么了?”我有些...
    开封第一讲书人阅读 109,431评论 0 244
  • 文/不坏的土叔 我叫张陵,是天一观的道长。 经常有香客问我,道长,这世上最难降的妖魔是什么? 我笑而不...
    开封第一讲书人阅读 44,127评论 0 209
  • 正文 为了忘掉前任,我火速办了婚礼,结果婚礼上,老公的妹妹穿的比我还像新娘。我一直安慰自己,他们只是感情好,可当我...
    茶点故事阅读 52,511评论 3 287
  • 文/花漫 我一把揭开白布。 她就那样静静地躺着,像睡着了一般。 火红的嫁衣衬着肌肤如雪。 梳的纹丝不乱的头发上,一...
    开封第一讲书人阅读 40,692评论 1 222
  • 那天,我揣着相机与录音,去河边找鬼。 笑死,一个胖子当着我的面吹牛,可吹牛的内容都是我干的。 我是一名探鬼主播,决...
    沈念sama阅读 31,915评论 2 313
  • 文/苍兰香墨 我猛地睁开眼,长吁一口气:“原来是场噩梦啊……” “哼!你这毒妇竟也来了?” 一声冷哼从身侧响起,我...
    开封第一讲书人阅读 30,664评论 0 202
  • 序言:老挝万荣一对情侣失踪,失踪者是张志新(化名)和其女友刘颖,没想到半个月后,有当地人在树林里发现了一具尸体,经...
    沈念sama阅读 34,412评论 1 246
  • 正文 独居荒郊野岭守林人离奇死亡,尸身上长有42处带血的脓包…… 初始之章·张勋 以下内容为张勋视角 年9月15日...
    茶点故事阅读 30,616评论 2 245
  • 正文 我和宋清朗相恋三年,在试婚纱的时候发现自己被绿了。 大学时的朋友给我发了我未婚夫和他白月光在一起吃饭的照片。...
    茶点故事阅读 32,105评论 1 260
  • 序言:一个原本活蹦乱跳的男人离奇死亡,死状恐怖,灵堂内的尸体忽然破棺而出,到底是诈尸还是另有隐情,我是刑警宁泽,带...
    沈念sama阅读 28,424评论 2 254
  • 正文 年R本政府宣布,位于F岛的核电站,受9级特大地震影响,放射性物质发生泄漏。R本人自食恶果不足惜,却给世界环境...
    茶点故事阅读 33,098评论 3 238
  • 文/蒙蒙 一、第九天 我趴在偏房一处隐蔽的房顶上张望。 院中可真热闹,春花似锦、人声如沸。这庄子的主人今日做“春日...
    开封第一讲书人阅读 26,096评论 0 8
  • 文/苍兰香墨 我抬头看了看天上的太阳。三九已至,却和暖如春,着一层夹袄步出监牢的瞬间,已是汗流浃背。 一阵脚步声响...
    开封第一讲书人阅读 26,869评论 0 197
  • 我被黑心中介骗来泰国打工, 没想到刚下飞机就差点儿被人妖公主榨干…… 1. 我叫王不留,地道东北人。 一个月前我还...
    沈念sama阅读 35,748评论 2 276
  • 正文 我出身青楼,却偏偏与公主长得像,于是被迫代替她去往敌国和亲。 传闻我的和亲对象是个残疾皇子,可洞房花烛夜当晚...
    茶点故事阅读 35,641评论 2 271