Introduction to Android DropBoxManager

Viking_Den
2016.09.20 12:04

At work recently I needed to handle applications that crash. It turns out Android provides a manager that records all kinds of abnormal events: DropBoxManager.

What is DropBoxManager?

Enqueues chunks of data (from various sources – application crashes, kernel log records, etc.). The queue is size bounded and will drop old data if the enqueued data exceeds the maximum size. You can think of this as a persistent, system-wide, blob-oriented “logcat”. DropBoxManager entries are not sent anywhere directly, but other system services and debugging tools may scan and upload entries for processing.

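DropBoxManager is the mechanism Android introduced in Froyo (API level 8) for persistently storing system data. It is mainly used to record logs when the kernel, system processes, or user processes hit serious problems while Android is running. You can think of it as a persistent, system-level logcat.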
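The "size bounded, drops old data" behaviour described above can be pictured with a small plain-Java model. This is only a sketch under stated assumptions: the class name `BoundedBlobQueue` and the byte quota are hypothetical, and the real service applies its own quota and trimming rules that are not reproduced here.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy model of a size-bounded queue that drops its oldest entries first. */
final class BoundedBlobQueue {
    private final Deque<byte[]> entries = new ArrayDeque<>();
    private final int maxBytes;   // hypothetical quota, not an Android constant
    private int totalBytes = 0;

    BoundedBlobQueue(int maxBytes) {
        this.maxBytes = maxBytes;
    }

    /** Appends a blob, then evicts from the head until the quota is met again. */
    void add(byte[] blob) {
        entries.addLast(blob);
        totalBytes += blob.length;
        while (totalBytes > maxBytes && !entries.isEmpty()) {
            totalBytes -= entries.removeFirst().length;
        }
    }

    int size() {
        return entries.size();
    }

    int totalBytes() {
        return totalBytes;
    }
}
```

In this model a single blob larger than the whole quota evicts everything, including itself. That is a simplification chosen for the sketch, not a statement about how Android treats oversized entries.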

We can obtain this service by calling getSystemService(String) with the DROPBOX_SERVICE argument, and then query all the system error records stored in DropBoxManager.

Which system errors does Android record by default?

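The official documentation does not say exactly which system errors are recorded. We can find the relevant information by following the tag (EXTRA_TAG) defined in the DropBoxManager.java source file.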

  • crash (application force close, Force Close)
    When the Java layer hits an uncaught exception, ActivityManagerService records a crash entry in DropBoxManager and shows the Force Close dialog to the user.
 /**
 * Used by {@link com.android.internal.os.RuntimeInit} to report when an application crashes.
 * The application process will exit immediately after this call returns.
 * @param app object of the crashing app, null for the system server
 * @param crashInfo describing the exception
 */
 public void handleApplicationCrash(IBinder app, ApplicationErrorReport.CrashInfo crashInfo) {
       ProcessRecord r = findAppProcess(app, "Crash");
       final String processName = app == null ? "system_server": (r == null ? "unknown" : r.processName);
       handleApplicationCrashInner("crash", r, processName, crashInfo);
 }

 /* Native crash reporting uses this inner version because it needs to be somewhat
 * decoupled from the AM-managed cleanup lifecycle
 */
 void handleApplicationCrashInner(String eventType, ProcessRecord r, String processName,
         ApplicationErrorReport.CrashInfo crashInfo) {
       EventLog.writeEvent(EventLogTags.AM_CRASH, Binder.getCallingPid(),
               UserHandle.getUserId(Binder.getCallingUid()), processName,
               r == null ? -1 : r.info.flags,
               crashInfo.exceptionClassName,
               crashInfo.exceptionMessage,
               crashInfo.throwFileName,
               crashInfo.throwLineNumber);

      addErrorToDropBox(eventType, r, processName, null, null, null, null, null, crashInfo);

      crashApplication(r, crashInfo);
 }
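
The string "crash" passed to addErrorToDropBox above is only an event type. As far as I can tell from the AOSP sources of that period, addErrorToDropBox prefixes it with a process class (system_server, system_app, or data_app), which would explain file names such as system_app_crash in the directory listing later in this article. The sketch below models that naming in plain Java. The `ProcessKind` enum is a hypothetical stand-in for the ProcessRecord checks and is not part of Android.

```java
/** Plain-Java model of how the final dropbox tag appears to be assembled. */
final class DropBoxTags {
    private DropBoxTags() {}

    /** Hypothetical stand-in for the ProcessRecord checks done inside AMS. */
    enum ProcessKind { SYSTEM_SERVER, SYSTEM_APP, DATA_APP }

    static String processClass(ProcessKind kind) {
        switch (kind) {
            case SYSTEM_SERVER: return "system_server";
            case SYSTEM_APP:    return "system_app";
            default:            return "data_app";
        }
    }

    /** eventType is the first argument of addErrorToDropBox, e.g. "crash" or "anr". */
    static String dropboxTag(ProcessKind kind, String eventType) {
        return processClass(kind) + "_" + eventType;
    }
}
```

Under this assumption, a crash in a third-party app would be filed as data_app_crash and a watchdog report as system_server_watchdog.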
  • anr (Application Not Responding, ANR)
    When an application's main thread (UI thread) fails to respond for a long time, ActivityManagerService records an anr entry in DropBoxManager and shows the Application Not Responding dialog to the user.
final void appNotResponding(ProcessRecord app, ActivityRecord activity, ActivityRecord parent, boolean aboveSystem, final String annotation) { 
      //...... 
      addErrorToDropBox("anr", app, app.processName, activity, parent, annotation, cpuInfo, tracesFile, null);
      //......
}
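  • wtf (What a Terrible Failure)
    The android.util.Log class provides static wtf methods that an application can call to actively report a condition that should never happen. Depending on system settings, the call adds a wtf entry to DropBoxManager through ActivityManagerService and/or terminates the current application process.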
    /**
     * Used by {@link Log} via {@link com.android.internal.os.RuntimeInit} to report serious errors.
     * @param app object of the crashing app, null for the system server
     * @param tag reported by the caller
     * @param system whether this wtf is coming from the system
     * @param crashInfo describing the context of the error
     * @return true if the process should exit immediately (WTF is fatal)
     */
    public boolean handleApplicationWtf(final IBinder app, final String tag, boolean system,
            final ApplicationErrorReport.CrashInfo crashInfo) {
        final int callingUid = Binder.getCallingUid();
        final int callingPid = Binder.getCallingPid();
        if (system) {
            // If this is coming from the system, we could very well have low-level
            // system locks held, so we want to do this all asynchronously.  And we
            // never want this to become fatal, so there is that too.
            mHandler.post(new Runnable() {
                @Override public void run() {
                    handleApplicationWtfInner(callingUid, callingPid, app, tag, crashInfo);
                }
            });
            return false;
        }
        final ProcessRecord r = handleApplicationWtfInner(callingUid, callingPid, app, tag,
                crashInfo);
        if (r != null && r.pid != Process.myPid() &&
                Settings.Global.getInt(mContext.getContentResolver(),
                        Settings.Global.WTF_IS_FATAL, 0) != 0) {
            crashApplication(r, crashInfo);
            return true;
        } else {
            return false;
        }
    }
    ProcessRecord handleApplicationWtfInner(int callingUid, int callingPid, IBinder app, String tag,
            final ApplicationErrorReport.CrashInfo crashInfo) {
        final ProcessRecord r = findAppProcess(app, "WTF");
        final String processName = app == null ? "system_server"
                : (r == null ? "unknown" : r.processName);
        EventLog.writeEvent(EventLogTags.AM_WTF, UserHandle.getUserId(callingUid), callingPid,
                processName, r == null ? -1 : r.info.flags, tag, crashInfo.exceptionMessage);

        addErrorToDropBox("wtf", r, processName, null, null, tag, null, null, crashInfo);

        return r;
    }
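
The return value of handleApplicationWtf above boils down to a small decision rule, restated here as a plain-Java sketch. The class `WtfPolicy` and its boolean parameters are hypothetical names introduced for illustration. They stand in for the `system` flag, the `r != null` check, the `r.pid != Process.myPid()` check, and the WTF_IS_FATAL setting.

```java
/** Restates the "is this wtf fatal?" rule from handleApplicationWtf. */
final class WtfPolicy {
    private WtfPolicy() {}

    /**
     * @param fromSystem   the "system" flag of handleApplicationWtf
     * @param processFound whether a ProcessRecord was found for the caller
     * @param isOwnProcess whether that record is the system process itself
     * @param wtfIsFatal   value of the WTF_IS_FATAL setting (0 means off)
     * @return true if the reporting process should be crashed
     */
    static boolean shouldCrash(boolean fromSystem, boolean processFound,
                               boolean isOwnProcess, int wtfIsFatal) {
        if (fromSystem) {
            return false; // system wtfs are only logged (asynchronously), never fatal
        }
        return processFound && !isOwnProcess && wtfIsFatal != 0;
    }
}
```

Note that the dropbox entry is written in every case. Only the decision to crash the caller depends on these conditions.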
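  • strict_mode (StrictMode Violation)
    StrictMode, as the name suggests, checks more strictly than normal mode. It is typically used to detect network, file, and similar operations that should not run on the main thread. As the code below shows, violations reported with the PENALTY_DROPBOX bit set are recorded by ActivityManagerService in DropBoxManager under the tag system_app_strictmode or data_app_strictmode.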

    public void handleApplicationStrictModeViolation(
            IBinder app,
            int violationMask,
            StrictMode.ViolationInfo info) {
        ProcessRecord r = findAppProcess(app, "StrictMode");
        if (r == null) {
            return;
        }
        if ((violationMask & StrictMode.PENALTY_DROPBOX) != 0) {
            Integer stackFingerprint = info.hashCode();
            boolean logIt = true;
            synchronized (mAlreadyLoggedViolatedStacks) {
                if (mAlreadyLoggedViolatedStacks.contains(stackFingerprint)) {
                    logIt = false;
                    // TODO: sub-sample into EventLog for these, with
                    // the info.durationMillis?  Then we'd get
                    // the relative pain numbers, without logging all
                    // the stack traces repeatedly.  We'd want to do
                    // likewise in the client code, which also does
                    // dup suppression, before the Binder call.
                } else {
                    if (mAlreadyLoggedViolatedStacks.size() >= MAX_DUP_SUPPRESSED_STACKS) {
                        mAlreadyLoggedViolatedStacks.clear();
                    }
                    mAlreadyLoggedViolatedStacks.add(stackFingerprint);
                }
            }
            if (logIt) {
                logStrictModeViolationToDropBox(r, info);
            }
        }
        //......
    }

    // Depending on the policy in effect, there could be a bunch of
    // these in quick succession so we try to batch these together to
    // minimize disk writes, number of dropbox entries, and maximize
    // compression, by having more fewer, larger records.
    private void logStrictModeViolationToDropBox(
            ProcessRecord process,
            StrictMode.ViolationInfo info) {
        if (info == null) {
            return;
        }
        final boolean isSystemApp = process == null ||
                (process.info.flags & (ApplicationInfo.FLAG_SYSTEM |
                                       ApplicationInfo.FLAG_UPDATED_SYSTEM_APP)) != 0;
        final String processName = process == null ? "unknown" : process.processName;
        final String dropboxTag = isSystemApp ? "system_app_strictmode" : "data_app_strictmode";
        final DropBoxManager dbox = (DropBoxManager)
                mContext.getSystemService(Context.DROPBOX_SERVICE);
        // Exit early if the dropbox isn't configured to accept this report type.
        if (dbox == null || !dbox.isTagEnabled(dropboxTag)) return;
        boolean bufferWasEmpty;
        boolean needsFlush;
        final StringBuilder sb = isSystemApp ? mStrictModeBuffer : new StringBuilder(1024);
        synchronized (sb) {
            bufferWasEmpty = sb.length() == 0;
            appendDropBoxProcessHeaders(process, processName, sb);
            sb.append("Build: ").append(Build.FINGERPRINT).append("\n");
            sb.append("System-App: ").append(isSystemApp).append("\n");
            sb.append("Uptime-Millis: ").append(info.violationUptimeMillis).append("\n");
            if (info.violationNumThisLoop != 0) {
                sb.append("Loop-Violation-Number: ").append(info.violationNumThisLoop).append("\n");
            }
            if (info.numAnimationsRunning != 0) {
                sb.append("Animations-Running: ").append(info.numAnimationsRunning).append("\n");
            }
            if (info.broadcastIntentAction != null) {
                sb.append("Broadcast-Intent-Action: ").append(info.broadcastIntentAction).append("\n");
            }
            if (info.durationMillis != -1) {
                sb.append("Duration-Millis: ").append(info.durationMillis).append("\n");
            }
            if (info.numInstances != -1) {
                sb.append("Instance-Count: ").append(info.numInstances).append("\n");
            }
            if (info.tags != null) {
                for (String tag : info.tags) {
                    sb.append("Span-Tag: ").append(tag).append("\n");
                }
            }
            sb.append("\n");
            if (info.crashInfo != null && info.crashInfo.stackTrace != null) {
                sb.append(info.crashInfo.stackTrace);
            }
            sb.append("\n");
            // Only buffer up to ~64k.  Various logging bits truncate
            // things at 128k.
            needsFlush = (sb.length() > 64 * 1024);
        }
        // Flush immediately if the buffer's grown too large, or this
        // is a non-system app.  Non-system apps are isolated with a
        // different tag & policy and not batched.
        //
        // Batching is useful during internal testing with
        // StrictMode settings turned up high.  Without batching,
        // thousands of separate files could be created on boot.
        if (!isSystemApp || needsFlush) {
            new Thread("Error dump: " + dropboxTag) {
                @Override
                public void run() {
                    String report;
                    synchronized (sb) {
                        report = sb.toString();
                        sb.delete(0, sb.length());
                        sb.trimToSize();
                    }
                    if (report.length() != 0) {
                        dbox.addText(dropboxTag, report);
                    }
                }
            }.start();
            return;
        }
        // System app batching:
        if (!bufferWasEmpty) {
            // An existing dropbox-writing thread is outstanding, so
            // we don't need to start it up.  The existing thread will
            // catch the buffer appends we just did.
            return;
        }
        // Worker thread to both batch writes and to avoid blocking the caller on I/O.
        // (After this point, we shouldn't access AMS internal data structures.)
        new Thread("Error dump: " + dropboxTag) {
            @Override
            public void run() {
                // 5 second sleep to let stacks arrive and be batched together
                try {
                    Thread.sleep(5000);  // 5 seconds
                } catch (InterruptedException e) {}
                String errorReport;
                synchronized (mStrictModeBuffer) {
                    errorReport = mStrictModeBuffer.toString();
                    if (errorReport.length() == 0) {
                        return;
                    }
                    mStrictModeBuffer.delete(0, mStrictModeBuffer.length());
                    mStrictModeBuffer.trimToSize();
                }
                dbox.addText(dropboxTag, errorReport);
            }
        }.start();
    }
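
The duplicate suppression in handleApplicationStrictModeViolation can be isolated into a few lines. The following plain-Java sketch assumes the same idea: remember a fingerprint per stack, skip fingerprints already seen, and forget everything once the set is full. The class name `ViolationDeduper` and the `maxTracked` parameter are hypothetical. `maxTracked` plays the role of MAX_DUP_SUPPRESSED_STACKS.

```java
import java.util.HashSet;
import java.util.Set;

/** Suppresses repeated reports of the same violation fingerprint. */
final class ViolationDeduper {
    private final Set<Integer> seen = new HashSet<>();
    private final int maxTracked;

    ViolationDeduper(int maxTracked) {
        this.maxTracked = maxTracked;
    }

    /** Returns true if this fingerprint has not been logged since the last reset. */
    synchronized boolean shouldLog(int fingerprint) {
        if (seen.contains(fingerprint)) {
            return false;
        }
        if (seen.size() >= maxTracked) {
            seen.clear(); // crude reset: previously seen stacks may be logged again
        }
        seen.add(fingerprint);
        return true;
    }
}
```

Clearing the whole set keeps memory bounded at the cost of occasionally logging a stack twice, which seems an acceptable trade for a diagnostic feature.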
  • lowmem (low memory)
    When memory runs low, Android terminates background applications to free memory. If no background application is left to terminate, ActivityManagerService records a lowmem entry in DropBoxManager.
    public void handleMessage(Message msg) {
        switch (msg.what) {
        //...
        case REPORT_MEM_USAGE: {
            //......
            Thread thread = new Thread() {
                @Override public void run() {
                    StringBuilder dropBuilder = new StringBuilder(1024);
                    StringBuilder logBuilder = new StringBuilder(1024);
                    //......
                    addErrorToDropBox("lowmem", null, "system_server", null,
                            null, tag.toString(), dropBuilder.toString(), null, null);
                    //......
                }
            };
            thread.start();
            break;
        }
        //......
        }
    }
  • watchdog
    If the Watchdog detects a problem in the system process (system_server), it adds a watchdog entry to DropBoxManager and terminates the system process.
/** This class calls its monitor every minute. Killing this process if they don't return **/
public class Watchdog extends Thread {
    //......
    @Override
    public void run() {
        boolean waitedHalf = false;
        while (true) {
            //......

            // If we got here, that means that the system is most likely hung.
            // First collect stack traces from all threads of the system process.
            // Then kill this process so that the system will restart.

            //......

            // Try to add the error to the dropbox, but assuming that the ActivityManager
            // itself may be deadlocked.  (which has happened, causing this statement to
            // deadlock and the watchdog as a whole to be ineffective)
            Thread dropboxThread = new Thread("watchdogWriteToDropbox") {
                    public void run() {
                        mActivity.addErrorToDropBox(
                                "watchdog", null, "system_server", null, null,
                                name, null, stack, null);
                    }
                };
            dropboxThread.start();
            try {
                dropboxThread.join(2000);  // wait up to 2 seconds for it to return.
            } catch (InterruptedException ignored) {}

            //......
        }
    }

    //......
}
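
The Watchdog excerpt writes to the dropbox on a helper thread and waits at most two seconds, so that a deadlocked ActivityManager cannot also hang the watchdog. The same pattern in isolation, as a plain-Java sketch (the class and method names are hypothetical, and the timeout is assumed to be greater than zero because `join(0)` waits forever):

```java
/** Runs a task that may block, but waits for it only a bounded time. */
final class BoundedWait {
    private BoundedWait() {}

    /** Returns true if the task finished within timeoutMillis (must be > 0). */
    static boolean runWithTimeout(Runnable task, long timeoutMillis) {
        Thread worker = new Thread(task, "boundedWaitWorker");
        worker.setDaemon(true); // a stuck worker must not keep the JVM alive
        worker.start();
        try {
            worker.join(timeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
        }
        return !worker.isAlive();
    }
}
```

If the task is still running when the timeout expires, the caller simply moves on. The worker is abandoned rather than cancelled, which matches what the Watchdog does before it kills the process anyway.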
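  • netstats_error
    NetworkStatsService collects and persists network statistics. When it encounters an obvious error in those statistics, it adds a netstats_error entry to DropBoxManager.
  • BATTERY_DISCHARGE_INFO
    BatteryService monitors the charging state and updates the phone's battery information. When it sees a significant discharge event, it adds a BATTERY_DISCHARGE_INFO entry to DropBoxManager.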
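  • Checks after the System Server finishes starting
    After the System Server finishes starting, it runs a series of self-checks, including:
    -- Boot
    Every boot adds a SYSTEM_BOOT entry.
    -- System Server restart
    If this is not the System Server's first start since boot, a SYSTEM_RESTART entry is added. Normally the System Server starts only once per boot, so a second start indicates a bug.
    -- Kernel Panic
    When a kernel panic happens, the kernel writes some log information to the file system. Because the kernel is already dead, there is no other opportunity to record the error at that point. The only way to detect a kernel panic is to check after the phone boots whether these log files exist. If they do, the previous shutdown was caused by a kernel panic, and the logs are recorded into DropBoxManager. The tag names and the corresponding files are:
    SYSTEM_LAST_KMSG, if /proc/last_kmsg exists.
    APANIC_CONSOLE, if /data/dontpanic/apanic_console exists.
    APANIC_THREADS, if /data/dontpanic/apanic_threads exists.
    -- System Recovery
    The existence of /cache/recovery/log is used to detect whether the device rebooted because of a system recovery, and a SYSTEM_RECOVERY_LOG entry is added to DropBoxManager.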
private void logBootEvents(Context ctx) throws IOException {
    final DropBoxManager db = (DropBoxManager) ctx.getSystemService(Context.DROPBOX_SERVICE);
    final SharedPreferences prefs = ctx.getSharedPreferences("log_files", Context.MODE_PRIVATE);
    final String headers = new StringBuilder(512)
        .append("Build: ").append(Build.FINGERPRINT).append("\n")
        .append("Hardware: ").append(Build.BOARD).append("\n")
        .append("Revision: ")
        .append(SystemProperties.get("ro.revision", "")).append("\n")
        .append("Bootloader: ").append(Build.BOOTLOADER).append("\n")
        .append("Radio: ").append(Build.RADIO).append("\n")
        .append("Kernel: ")
        .append(FileUtils.readTextFile(new File("/proc/version"), 1024, "...\n"))
        .append("\n").toString();

    String recovery = RecoverySystem.handleAftermath();
    if (recovery != null && db != null) {
        db.addText("SYSTEM_RECOVERY_LOG", headers + recovery);
    }

    if (SystemProperties.getLong("ro.runtime.firstboot", 0) == 0) {
        String now = Long.toString(System.currentTimeMillis());
        SystemProperties.set("ro.runtime.firstboot", now);
        if (db != null) db.addText("SYSTEM_BOOT", headers);

        // Negative sizes mean to take the *tail* of the file (see FileUtils.readTextFile())
        addFileToDropBox(db, prefs, headers, "/proc/last_kmsg",
                -LOG_SIZE, "SYSTEM_LAST_KMSG");
        addFileToDropBox(db, prefs, headers, "/cache/recovery/log",
                -LOG_SIZE, "SYSTEM_RECOVERY_LOG");
        addFileToDropBox(db, prefs, headers, "/data/dontpanic/apanic_console",
                -LOG_SIZE, "APANIC_CONSOLE");
        addFileToDropBox(db, prefs, headers, "/data/dontpanic/apanic_threads",
                -LOG_SIZE, "APANIC_THREADS");
    } else {
        if (db != null) db.addText("SYSTEM_RESTART", headers);
    }

    // Scan existing tombstones (in case any new ones appeared)
    File[] tombstoneFiles = TOMBSTONE_DIR.listFiles();
    for (int i = 0; tombstoneFiles != null && i < tombstoneFiles.length; i++) {
        addFileToDropBox(db, prefs, headers, tombstoneFiles[i].getPath(),
                LOG_SIZE, "SYSTEM_TOMBSTONE");
    }

    // Start watching for new tombstone files; will record them as they occur.
    // This gets registered with the singleton file observer thread.
    sTombstoneObserver = new FileObserver(TOMBSTONE_DIR.getPath(), FileObserver.CLOSE_WRITE) {
        @Override
        public void onEvent(int event, String path) {
            try {
                String filename = new File(TOMBSTONE_DIR, path).getPath();
                addFileToDropBox(db, prefs, headers, filename, LOG_SIZE, "SYSTEM_TOMBSTONE");
            } catch (IOException e) {
                Slog.e(TAG, "Can't log tombstone", e);
            }
        }
    };

    sTombstoneObserver.startWatching();
}
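
The SYSTEM_BOOT versus SYSTEM_RESTART branch in logBootEvents relies on a property that is empty on the first start after boot and set afterwards. A minimal plain-Java model of that check follows. Here a `Map` stands in for SystemProperties, and `BootEventClassifier` is a hypothetical name used only for this sketch.

```java
import java.util.Map;

/** Models the first-boot check: the first call since boot marks the property. */
final class BootEventClassifier {
    static final String FIRSTBOOT_KEY = "ro.runtime.firstboot";

    private final Map<String, String> props; // stand-in for SystemProperties

    BootEventClassifier(Map<String, String> props) {
        this.props = props;
    }

    /** Returns the tag that would be logged for this System Server start. */
    String classify(long nowMillis) {
        long firstBoot = Long.parseLong(props.getOrDefault(FIRSTBOOT_KEY, "0"));
        if (firstBoot == 0) {
            props.put(FIRSTBOOT_KEY, Long.toString(nowMillis));
            return "SYSTEM_BOOT";
        }
        return "SYSTEM_RESTART";
    }
}
```

The model works because the property survives a System Server restart but not a reboot. In the sketch that corresponds to reusing the same map across calls.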
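  • SYSTEM_TOMBSTONE (native process crashes)
    A tombstone is the crash dump log Android uses to record native process crashes. After the System Server finishes starting, it adds an observer that watches for changes to the tombstone log files. Whenever a new tombstone file is generated, a SYSTEM_TOMBSTONE entry is added to DropBoxManager.

How does DropBoxManager store its records?

DropBoxManager uses file storage. All records are stored in the /data/system/dropbox directory, one file per record. When a text file grows larger than the file system's minimum block size, DropBoxManager automatically compresses it. File names usually start with the TAG argument passed to DropBoxManager.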
$ adb shell ls -l /data/system/dropbox
-rw------- system   system        258 2012-11-21 11:36 SYSTEM_RESTART@1353469017940.txt
-rw------- system   system         39 2012-11-21 11:40 event_data@1353469222884.txt
-rw------- system   system         39 2012-11-21 12:10 event_data@1353471022975.txt
-rw------- system   system         34 2012-11-21 18:10 event_log@1353492624170.txt
-rw------- system   system         34 2012-11-21 18:40 event_log@1353494424296.txt
-rw------- system   system         34 2012-11-22 10:10 event_log@1353550227432.txt
-rw------- system   system       1528 2012-11-21 22:54 system_app_crash@1353509648395.txt
-rw------- system   system       1877 2012-11-21 11:36 system_app_strictmode@1353469014395.txt
-rw------- system   system       3724 2012-11-21 11:36 system_app_strictmode@1353469014924.txt.gz
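
Judging from the listing above, each file name has the shape TAG@timestamp.txt with an optional .gz suffix, where the timestamp looks like milliseconds since the epoch. The following plain-Java sketch parses exactly that shape. It is an assumption drawn from the listing, not the service's own parser, and `DropBoxFileName` is a hypothetical class name. Other suffixes that the service may use are deliberately not handled.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Parses names such as "system_app_crash@1353509648395.txt" or "...txt.gz". */
final class DropBoxFileName {
    private static final Pattern NAME =
            Pattern.compile("^(.+)@(\\d{1,18})\\.txt(\\.gz)?$");

    final String tag;
    final long timestampMillis;
    final boolean compressed;

    private DropBoxFileName(String tag, long timestampMillis, boolean compressed) {
        this.tag = tag;
        this.timestampMillis = timestampMillis;
        this.compressed = compressed;
    }

    /** Returns the parsed name, or null if the name does not match the pattern. */
    static DropBoxFileName parse(String fileName) {
        Matcher m = NAME.matcher(fileName);
        if (!m.matches()) {
            return null;
        }
        return new DropBoxFileName(
                m.group(1), Long.parseLong(m.group(2)), m.group(3) != null);
    }
}
```

For example, SYSTEM_RESTART@1353469017940.txt parses to the tag SYSTEM_RESTART and the timestamp 1353469017940, which is 2012-11-21 03:36:57 UTC. That agrees with the 11:36 shown by ls if the device clock was set to UTC+8.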

How can we use DropBoxManager?

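  • Use DropBoxManager to record error logs that need persistent storage
    DropBoxManager offers an error logging mechanism besides logcat. A program can automatically record relevant information into DropBoxManager when something goes wrong. Compared with logcat, DropBoxManager is better suited to automatic error capture and avoids errors being missed through human oversight. And because DropBoxManager is a public Android system service, it is much less likely to run into compatibility problems than the many private implementations.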

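  • Automatic error reporting
    DropBoxManager can be combined with the device's BugReport to report errors to a server automatically. Whenever a new entry is created, DropBoxManager broadcasts a DropBoxManager.ACTION_DROPBOX_ENTRY_ADDED Intent. The device's BugReport service needs to listen for this Intent and then trigger the automatic upload.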
