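A Deep Dive into Jank Optimization

Preface

We run into jank all the time, and it is often hard to fix and hard to reproduce, because diagnosing it depends heavily on the state of the device at the moment it happens. This article takes a closer look at how to analyze and optimize it.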

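Jank Analysis Methods and Tools

Checking CPU Performance

/proc/stat reports CPU usage for the whole system, and /proc/[pid]/stat reports the CPU time consumed by one specific process.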
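
To make the /proc/stat numbers concrete, here is a minimal sketch that turns two samples of the aggregate "cpu" line into a usage percentage. The class and method names (CpuUsage, parseCpuLine, usagePercent) are made up for this illustration, and counting iowait as idle time is an assumption of the sketch rather than a rule.

```java
/** Illustrative helper (hypothetical names): CPU usage from two /proc/stat samples. */
class CpuUsage {

    /** Parses the aggregate "cpu" line into {busyJiffies, totalJiffies}. */
    static long[] parseCpuLine(String line) {
        String[] f = line.trim().split("\\s+");
        // f[0] is "cpu", followed by: user nice system idle iowait irq softirq [steal ...]
        if (f.length < 6) {
            throw new IllegalArgumentException("unexpected cpu line: " + line);
        }
        long total = 0;
        for (int i = 1; i < f.length && i <= 8; i++) {
            total += Long.parseLong(f[i]);
        }
        // Assumption of this sketch: iowait is counted as idle time.
        long idle = Long.parseLong(f[4]) + Long.parseLong(f[5]);
        return new long[] {total - idle, total};
    }

    /** Usage in percent between an earlier and a later sample of the "cpu" line. */
    static double usagePercent(String before, String after) {
        long[] a = parseCpuLine(before);
        long[] b = parseCpuLine(after);
        long totalDelta = b[1] - a[1];
        if (totalDelta <= 0) {
            return 0.0; // no time elapsed between the samples
        }
        return 100.0 * (b[0] - a[0]) / totalDelta;
    }
}
```

On a device, the two strings would be the first line of /proc/stat read at two different moments. The counters are cumulative, so a single sample on its own says nothing about current load.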

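Jank Troubleshooting Tools

TraceView

TraceView shows the time spent in each method, which makes it easy to spot calls that do not behave as expected. Its own overhead is fairly high, though, and that can distort what we measure.

Systrace

We already covered Systrace in the layout optimization part. It is lightweight, works at the system level, and is widely used. When we add our own instrumentation, however, we need to filter out most of the short functions.

CPU Profiler

Android Studio provides the CPU Profiler for inspecting CPU usage visually:
- Sample Java Methods works like the sample mode of Traceview.
- Trace Java Methods works like the instrument mode of Traceview.
- Trace System Calls works like systrace.
- SampleNative (API Level 26+) works like Simpleperf.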

StrictMode

if (BuildConfig.DEBUG) {
        StrictMode.setThreadPolicy(new StrictMode.ThreadPolicy.Builder()
                .detectCustomSlowCalls()
                .detectDiskReads()
                .detectDiskWrites()
                .detectNetwork()// or .detectAll() for all detectable problems
                .penaltyLog()
                .build());
        StrictMode.setVmPolicy(new StrictMode.VmPolicy.Builder()
                .detectLeakedSqlLiteObjects()
                .setClassInstanceLimit(NewsItem.class, 1)
                .detectLeakedClosableObjects() // available since API level 11
                .penaltyLog()
                .build());
    }

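In debug builds we can turn on StrictMode, and the system will then automatically flag abnormal or unexpected behavior. StrictMode has two kinds of detection policy:

Thread policy: detects custom slow calls, disk reads and writes, network I/O and similar work on the current thread.
VM policy: detects leaked SQLite objects, Closeable objects that were never closed, and classes that exceed an instance-count limit.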

Profilo

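Profilo is an open-source library from Facebook for collecting jank-related traces.
It has the following advantages:
1. It integrates atrace.
2. It captures Java stacks quickly (its capture approach is worth borrowing).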

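Automated Jank Analysis in Production

This section explains in detail how to build automated jank analysis for production.

Why analyze jank in production?

Users sometimes report that the app feels sluggish, or that it froze for several seconds during a flash sale, and yet we cannot reproduce the problem. Because the state of the user's device matters so much for jank, we need automated jank analysis in production.
The tools covered above make offline analysis convenient. The approaches below help us analyze jank that happens on users' devices.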

AndroidPerformanceMonitor

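The AndroidPerformanceMonitor library (BlockCanary) makes it easy to detect jank, and it can post a Notification that leads to the stack of the blocked thread.

Here is the configuration: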

package com.dsg.androidperformance.block;

import android.content.Context;
import android.util.Log;

import com.github.moduth.blockcanary.BlockCanaryContext;
import com.github.moduth.blockcanary.internal.BlockInfo;

import java.io.File;
import java.util.LinkedList;
import java.util.List;

/**
 * @author DSG
 * @Project AndroidPerformance
 * @date 2020/7/18
 * @describe
 */
public class AppBlockCanaryContext extends BlockCanaryContext {

    /**
     * Implement in your project.
     *
     * @return Qualifier which can specify this installation, like version + flavor.
     */
    public String provideQualifier() {
        return "unknown";
    }

    /**
     * Implement in your project.
     *
     * @return user id
     */
    public String provideUid() {
        return "uid";
    }

    /**
     * Network type
     *
     * @return {@link String} like 2G, 3G, 4G, wifi, etc.
     */
    public String provideNetworkType() {
        return "unknown";
    }

    /**
     * Config monitor duration, after this time BlockCanary will stop, use
     * with {@code BlockCanary}'s isMonitorDurationEnd
     *
     * @return monitor last duration (in hour)
     */
    public int provideMonitorDuration() {
        return -1;
    }

    /**
     * Config block threshold (in millis), dispatch over this duration is regarded as a BLOCK. You may set it
     * from performance of device.
     *
     * @return threshold in mills
     */
    public int provideBlockThreshold() {
        return 500;
    }

    /**
     * Thread stack dump interval, use when block happens, BlockCanary will dump on main thread
     * stack according to current sample cycle.
     * <p>
     * Because the implementation mechanism of Looper, real dump interval would be longer than
     * the period specified here (especially when cpu is busier).
     * </p>
     *
     * @return dump interval (in millis)
     */
    public int provideDumpInterval() {
        return provideBlockThreshold();
    }

    /**
     * Path to save log, like "/blockcanary/", will save to sdcard if can.
     *
     * @return path of log files
     */
    public String providePath() {
        return "/blockcanary/";
    }

    /**
     * If need notification to notice block.
     *
     * @return true if need, else if not need.
     */
    public boolean displayNotification() {
        return true;
    }

    /**
     * Implement in your project, bundle files into a zip file.
     *
     * @param src  files before compress
     * @param dest files compressed
     * @return true if compression is successful
     */
    public boolean zip(File[] src, File dest) {
        return false;
    }

    /**
     * Implement in your project, bundled log files.
     *
     * @param zippedFile zipped file
     */
    public void upload(File zippedFile) {
        throw new UnsupportedOperationException();
    }


    /**
     * Packages that developer concern, by default it uses process name,
     * put high priority one in pre-order.
     *
     * @return null if simply concern only package with process name.
     */
    public List<String> concernPackages() {
        return null;
    }

    /**
     * Filter stack without any in concern package, used with {@code concernPackages}.
     *
     * @return true if filter, false if not.
     */
    public boolean filterNonConcernStack() {
        return false;
    }

    /**
     * Provide white list, entry in white list will not be shown in ui list.
     *
     * @return return null if you don't need white-list filter.
     */
    public List<String> provideWhiteList() {
        LinkedList<String> whiteList = new LinkedList<>();
        whiteList.add("org.chromium");
        return whiteList;
    }

    /**
     * Whether to delete files whose stack is in white list, used with white-list.
     *
     * @return true if delete, false if not.
     */
    public boolean deleteFilesInWhiteList() {
        return true;
    }

    /**
     * Block interceptor, developer may provide their own actions.
     */
    public void onBlock(Context context, BlockInfo blockInfo) {
        Log.i("main1","blockInfo "+blockInfo.toString());
    }
}

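As the class shows, there are many configuration hooks: we can set a whitelist that is excluded from detection, the threshold above which a dispatch counts as a block, and so on.

Then call BlockCanary.install(this, new AppBlockCanaryContext()).start(); in the Application class, and the integration is done.

How It Works

The principle behind AndroidPerformanceMonitor is simple. It installs a custom Printer on the Looper, and the Looper calls that Printer right before and right after msg.target.dispatchMessage(msg);. A delayed task is scheduled at the first call. If dispatchMessage finishes before the delay expires, we treat the dispatch as smooth; otherwise the task runs on a worker thread and captures the current stack.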
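
The idea can be sketched in plain Java. This is an illustration under stated assumptions, not the library's code: DispatchWatcher and its methods are hypothetical names, onDispatchStart and onDispatchEnd stand in for the two Printer callbacks around dispatchMessage, and a ScheduledExecutorService plays the role of the worker thread.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/** Illustrative sketch (hypothetical names) of "dump the stack if dispatch outlives a delay". */
class DispatchWatcher {
    private final Thread watched;      // the thread whose dispatches we time
    private final long thresholdMs;    // a dispatch longer than this counts as a block
    private final List<StackTraceElement[]> dumps = new CopyOnWriteArrayList<>();
    private final ScheduledExecutorService timer =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "dispatch-watcher");
                t.setDaemon(true); // do not keep the process alive
                return t;
            });
    private ScheduledFuture<?> pending; // only touched by the watched thread

    DispatchWatcher(Thread watched, long thresholdMs) {
        this.watched = watched;
        this.thresholdMs = thresholdMs;
    }

    /** Call right before a message is dispatched. */
    void onDispatchStart() {
        pending = timer.schedule(
                () -> { dumps.add(watched.getStackTrace()); },
                thresholdMs, TimeUnit.MILLISECONDS);
    }

    /** Call right after the message was dispatched; cancels the dump if it has not run yet. */
    void onDispatchEnd() {
        if (pending != null) {
            pending.cancel(false);
            pending = null;
        }
    }

    /** Number of dispatches that outlived the threshold so far. */
    int dumpCount() {
        return dumps.size();
    }

    /** Helper used to simulate slow work without leaking checked exceptions. */
    static void sleepQuietly(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

A fast dispatch cancels the pending task before it fires, so nothing is recorded. A slow one lets the task run on the timer thread while the watched thread is still inside the slow call, which is why the captured stack points at the code that is blocking.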

AndroidPerformanceMonitor Source Code Analysis

The entry point is BlockCanary.install(this, new AppBlockCanaryContext()).start();.
Let's look at the start method:

 public void start() {
        if (!mMonitorStarted) {
            mMonitorStarted = true;
            Looper.getMainLooper().setMessageLogging(mBlockCanaryCore.monitor);
        }
    }

As described above, a custom Printer does the work. Here is the println method of the monitor object:

@Override
    public void println(String x) {
        if (mStopWhenDebugging && Debug.isDebuggerConnected()) {
            return;
        }
        if (!mPrintingStarted) {
            mStartTimestamp = System.currentTimeMillis();
            mStartThreadTimestamp = SystemClock.currentThreadTimeMillis();
            mPrintingStarted = true;
            // Schedule the delayed sampling task
            startDump();
        } else {
            final long endTime = System.currentTimeMillis();
            mPrintingStarted = false;
            // Did the dispatch exceed the block threshold? By default a stack sample is taken every 3000 ms
            if (isBlock(endTime)) {
                notifyBlockEvent(endTime);
            }
            // Stop sampling
            stopDump();
        }
    }

startDump starts the stack sampler and the CPU sampler. Taking the CPU sampler as an example, the code below shows that start posts a delayed task that does the sampling:

 public void start() {
        if (mShouldSample.get()) {
            return;
        }
        mShouldSample.set(true);

        HandlerThreadFactory.getTimerThreadHandler().removeCallbacks(mRunnable);
        HandlerThreadFactory.getTimerThreadHandler().postDelayed(mRunnable,
                BlockCanaryInternals.getInstance().getSampleDelay());
    }
    
long getSampleDelay() {
        return (long) (BlockCanaryInternals.getContext().provideBlockThreshold() * 0.8f);
    }

Here is how the CPU information is collected:

@Override
    protected void doSample() {
        BufferedReader cpuReader = null;
        BufferedReader pidReader = null;

        try {
            cpuReader = new BufferedReader(new InputStreamReader(
                    new FileInputStream("/proc/stat")), BUFFER_SIZE);
            String cpuRate = cpuReader.readLine();
            if (cpuRate == null) {
                cpuRate = "";
            }
              
            if (mPid == 0) {
                mPid = android.os.Process.myPid();
            }
            // Collect this process's CPU stats (the /proc files mentioned at the start of the article)
            pidReader = new BufferedReader(new InputStreamReader(
                    new FileInputStream("/proc/" + mPid + "/stat")), BUFFER_SIZE);
            String pidCpuRate = pidReader.readLine();
            if (pidCpuRate == null) {
                pidCpuRate = "";
            }
            // Parse the CPU stats
            parse(cpuRate, pidCpuRate);
        } catch (Throwable throwable) {
            Log.e(TAG, "doSample: ", throwable);
        } finally {
            try {
                if (cpuReader != null) {
                    cpuReader.close();
                }
                if (pidReader != null) {
                    pidReader.close();
                }
            } catch (IOException exception) {
                Log.e(TAG, "doSample: ", exception);
            }
        }
    }

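The sampler reads "/proc/" + mPid + "/stat" (as well as /proc/stat), but on newer Android versions the app may not have permission to read these files. In that case the read fails and the sampler only logs the error in its catch block.

If a block did occur, the block log is assembled and analyzed: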

setMonitor(new LooperMonitor(new LooperMonitor.BlockListener() {

            @Override
            public void onBlockEvent(long realTimeStart, long realTimeEnd,
                                     long threadTimeStart, long threadTimeEnd) {
                // Get recent thread-stack entries and cpu usage
                ArrayList<String> threadStackEntries = stackSampler
                        .getThreadStackEntries(realTimeStart, realTimeEnd);
                if (!threadStackEntries.isEmpty()) {
                    BlockInfo blockInfo = BlockInfo.newInstance()
                            .setMainThreadTimeCost(realTimeStart, realTimeEnd, threadTimeStart, threadTimeEnd)
                            .setCpuBusyFlag(cpuSampler.isCpuBusy(realTimeStart, realTimeEnd))
                            .setRecentCpuRate(cpuSampler.getCpuRateInfo())
                            .setThreadStackEntries(threadStackEntries)
                            .flushString();
                    LogWriter.save(blockInfo.toString());

                    if (mInterceptorChain.size() != 0) {
                        // Call onBlock on every interceptor: this writes the log, shows the Notification, and runs our custom collection logic
                        for (BlockInterceptor interceptor : mInterceptorChain) {
                            interceptor.onBlock(getContext().provideContext(), blockInfo);
                        }
                    }
                }
            }
        }, getContext().provideBlockThreshold(), getContext().stopWhenDebugging()));

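Notes on Using AndroidPerformanceMonitor

Relying only on mLogging leaves a monitoring blind spot, so AndroidPerformanceMonitor also samples the stack at high frequency (one stack sample every 1 s; the interval is set by provideDumpInterval()).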
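
High-frequency sampling only helps if the samples can later be matched to the slow dispatch, which is what getThreadStackEntries(realTimeStart, realTimeEnd) does in the listing above. Below is a minimal sketch of such a bounded, time-indexed buffer. StackSampleBuffer and its methods are hypothetical names chosen for this example, and the eviction policy (drop the oldest sample) is an assumption.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

/** Illustrative bounded buffer of timestamped stack samples (hypothetical names). */
class StackSampleBuffer {

    private static final class Sample {
        final long timeMs;
        final String stack;

        Sample(long timeMs, String stack) {
            this.timeMs = timeMs;
            this.stack = stack;
        }
    }

    private final int maxEntries;
    private final ArrayDeque<Sample> samples = new ArrayDeque<>();

    StackSampleBuffer(int maxEntries) {
        if (maxEntries <= 0) {
            throw new IllegalArgumentException("maxEntries must be positive");
        }
        this.maxEntries = maxEntries;
    }

    /** Called by the sampling thread at every interval. */
    synchronized void add(long timeMs, String stack) {
        if (samples.size() == maxEntries) {
            samples.removeFirst(); // drop the oldest sample to stay bounded
        }
        samples.addLast(new Sample(timeMs, stack));
    }

    /** Returns the stacks sampled between startMs and endMs, both inclusive. */
    synchronized List<String> between(long startMs, long endMs) {
        List<String> result = new ArrayList<>();
        for (Sample s : samples) {
            if (s.timeMs >= startMs && s.timeMs <= endMs) {
                result.add(s.stack);
            }
        }
        return result;
    }
}
```

When a block is reported, calling between with the dispatch start and end times yields every stack seen during the slow message rather than only the last one, which is what narrows the blind spot.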

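We ran into a few problems while using the library and had to fix them ourselves:

On Android 8.0 and above, a Notification must specify a channel ID.
On newer Android versions, the app no longer has permission to read /proc/[pid]/stat.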

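ANR Analysis

An ANR can be triggered in several ways:

A key event is not handled within 5 s (KEY_DISPATCHING_TIMEOUT_MS).
A broadcast does not finish within 10 s in the foreground or 60 s in the background.
A service does not finish within 20 s in the foreground or 200 s in the background.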

//AMS
static final int BROADCAST_FG_TIMEOUT = 10*1000;
static final int BROADCAST_BG_TIMEOUT = 60*1000;

//ATMS
KEY_DISPATCHING_TIMEOUT_MS

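ANR-WatchDog Source Code Analysis

When an ANR occurs, the system receives the abnormal-termination signal and writes the process's ANR information, including its stacks and the CPU and I/O state at that moment, under /data/anr. We could watch that location with a FileObserver to find out whether an ANR happened, but on newer Android versions reading it requires root.

So instead we can use the ANR-WatchDog library to help us detect ANRs on the device.

Its principle is also fairly simple:

Get a Handler for the main thread and post a Runnable to it; all the Runnable does is increment a counter.
Wait 5 s and check whether the counter was incremented. If it was not, treat that as an ANR.
When an ANR is detected, collect the current stacks and log them, or run a user-defined action.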
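
The three steps above can be sketched in plain Java. This is a minimal sketch rather than ANR-WatchDog's code: TickWatchdog, isBlocked and newWorker are hypothetical names, and a single-thread executor stands in for the main thread's Handler so that the example can run off-device.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

/** Illustrative tick-based watchdog (hypothetical names). */
class TickWatchdog {
    private final ExecutorService target; // stand-in for the main thread's Handler
    private final long timeoutMs;
    private final AtomicInteger tick = new AtomicInteger();

    TickWatchdog(ExecutorService target, long timeoutMs) {
        this.target = target;
        this.timeoutMs = timeoutMs;
    }

    /** One watchdog round: true if the target did not run the ticker within the timeout. */
    boolean isBlocked() {
        int last = tick.get();
        target.execute(tick::incrementAndGet); // step 1: post the ticker
        sleepQuietly(timeoutMs);               // step 2: wait for the timeout
        return tick.get() == last;             // unchanged counter: the target is stuck
    }

    /** Creates a single daemon worker thread that plays the role of the main thread. */
    static ExecutorService newWorker() {
        return Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "fake-main");
            t.setDaemon(true);
            return t;
        });
    }

    static void sleepQuietly(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Step 3 (collecting stacks and notifying a listener) is left out. A real watchdog would run isBlocked in a loop on its own thread, as the run method of the library does.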

Now for the source code.
ANRWatchDog extends Thread, so we start with its run method:

@Override
    public void run() {
        // Rename the thread
        setName("|ANR-WatchDog|");

        int lastTick;
        int lastIgnored = -1;
        while (!isInterrupted()) {
            lastTick = _tick;
            // Post the ticker to the main thread
            _uiHandler.post(_ticker);
            try {
                // Sleep for the timeout interval (5 s by default)
                Thread.sleep(_timeoutInterval);
            }
            catch (InterruptedException e) {
                // Handle the interruption
                _interruptionListener.onInterrupted(e);
                return ;
            }

            // If the main thread has not handled _ticker, it is blocked. ANR.
            // The tick did not change, so we treat this as an ANR
            if (_tick == lastTick) {
                if (!_ignoreDebugger && Debug.isDebuggerConnected()) {
                    if (_tick != lastIgnored)
                        Log.w("ANRWatchdog", "An ANR was detected but ignored because the debugger is connected (you can prevent this with setIgnoreDebugger(true))");
                    lastIgnored = _tick;
                    continue ;
                }

                ANRError error;
                if (_namePrefix != null)
                    error = ANRError.New(_namePrefix, _logThreadsWithoutStackTrace);
                else
                    error = ANRError.NewMainOnly(); // capture only the main thread's stack
                // Hand the error to the listener (the default listener throws it)
                _anrListener.onAppNotResponding(error);
                return;
            }
        }
    }
    
  // Default ANR listener: it throws the error, so by default an ANR crashes the app
  private static final ANRListener DEFAULT_ANR_LISTENER = new ANRListener() {
        @Override public void onAppNotResponding(ANRError error) {
            throw error;
        }
    };

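The Monitoring Blind Spot

First, what is a monitoring blind spot? Here is an example.
Suppose the jank threshold is 2 s, and method A calls methods B and C, where B takes 1.5 s and C takes 0.5 s. When the jank is detected and we capture the stack, the stack shows C rather than B, the method that actually caused the delay. That is the monitoring blind spot.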
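
The example can be replayed with a small deterministic simulation. It is only an illustration: BlindSpotDemo and its methods are hypothetical, and the timeline hard-codes the 1.5 s and 0.5 s figures from the example instead of running real methods.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative simulation of the monitoring blind spot (hypothetical names). */
class BlindSpotDemo {

    /** The timeline from the example: B runs for 1500 ms, then C runs for 500 ms. */
    static Map<String, Long> example() {
        Map<String, Long> timeline = new LinkedHashMap<>();
        timeline.put("B", 1500L);
        timeline.put("C", 500L);
        return timeline;
    }

    /** Returns the method that is running tMs after the start of the timeline. */
    static String runningAt(Map<String, Long> timeline, long tMs) {
        long end = 0;
        String last = null;
        for (Map.Entry<String, Long> e : timeline.entrySet()) {
            end += e.getValue();
            last = e.getKey();
            if (tMs < end) {
                return last;
            }
        }
        return last; // past the end: report the last method
    }

    /** Samples the timeline every intervalMs and returns the method seen most often. */
    static String mostSampled(Map<String, Long> timeline, long intervalMs) {
        long total = 0;
        for (long duration : timeline.values()) {
            total += duration;
        }
        Map<String, Integer> hits = new LinkedHashMap<>();
        for (long t = 0; t < total; t += intervalMs) {
            hits.merge(runningAt(timeline, t), 1, Integer::sum);
        }
        String best = null;
        int bestHits = -1;
        for (Map.Entry<String, Integer> e : hits.entrySet()) {
            if (e.getValue() > bestHits) {
                best = e.getKey();
                bestHits = e.getValue();
            }
        }
        return best;
    }
}
```

A single capture just before the 2 s mark, runningAt(example(), 1999), lands in C. Sampling every 100 ms, mostSampled(example(), 100), sees B in 15 of 20 samples and so points at the real culprit.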

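Handling the Blind Spot Offline

Offline we can simply use TraceView. It shows the time spent in each method directly, so the slow one can be located quickly.

Handling the Blind Spot in Production

As discussed above, AndroidPerformanceMonitor monitors through mLogging, but that only tells us the stack at the current moment. It does not tell us who sent the Message.

So we can route everything through one unified Handler, which lets us hook both sendMessageAtTime and dispatchMessage.

Here is the code: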

package com.optimize.performance.handler;

import android.os.Handler;
import android.os.Looper;
import android.os.Message;
import android.util.Log;

import com.optimize.performance.utils.LogUtils;

import org.json.JSONObject;

public class SuperHandler extends Handler {

    private long mStartTime = System.currentTimeMillis();

    public SuperHandler() {
        super(Looper.myLooper(), null);
    }

    public SuperHandler(Callback callback) {
        super(Looper.myLooper(), callback);
    }

    public SuperHandler(Looper looper, Callback callback) {
        super(looper, callback);
    }

    public SuperHandler(Looper looper) {
        super(looper);
    }

    @Override
    public boolean sendMessageAtTime(Message msg, long uptimeMillis) {
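        // Record the sender's stack before enqueueing, so a fast dispatch on another thread can still find it
        GetDetailHandlerHelper.getMsgDetail().put(msg, Log.getStackTraceString(new Throwable()).replace("java.lang.Throwable", ""));
        boolean send = super.sendMessageAtTime(msg, uptimeMillis);
        if (!send) {
            // The message was never enqueued, so drop its entry
            GetDetailHandlerHelper.getMsgDetail().remove(msg);
        }
        return send;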
    }

    @Override
    public void dispatchMessage(Message msg) {
        mStartTime = System.currentTimeMillis();
        super.dispatchMessage(msg);

        // Always remove the entry, so the map does not grow for messages handled off the main looper
        String trace = GetDetailHandlerHelper.getMsgDetail().remove(msg);
        if (trace != null && Looper.myLooper() == Looper.getMainLooper()) {
            JSONObject jsonObject = new JSONObject();
            try {
                // Record how long the message took to handle
                jsonObject.put("Msg_Cost", System.currentTimeMillis() - mStartTime);
                // Record the stack that sent the message
                jsonObject.put("MsgTrace", msg.getTarget() + " " + trace);
                // Custom reporting can go here
                LogUtils.i("MsgDetail " + jsonObject.toString());
            } catch (Exception e) {
                Log.w("SuperHandler", "failed to report message detail", e);
            }
        }
    }

}

A helper class stores the stack captured for each msg:

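package com.optimize.performance.handler;

import android.os.Message;

import java.util.concurrent.ConcurrentHashMap;

public class GetDetailHandlerHelper {

    private static final ConcurrentHashMap<Message, String> sMsgDetail = new ConcurrentHashMap<>();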

    public static ConcurrentHashMap<Message, String> getMsgDetail() {
        return sMsgDetail;
    }

}

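This gives us both the handling time of each msg and the stack that sent it.

To replace Handler globally we can use AOP, for example with DroidAssist, an open-source library from Didi (滴滴出行).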


With a replacement rule, every Handler in the app can be swapped for our SuperHandler.

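Summary

Jank analysis touches many areas, and learning it can feel like hard going, but sticking with it pays off.
When analyzing jank, we need to pay attention to both the offline and the production side. Offline, use ARTHook, third-party libraries, and TraceView to expose jank in the lab as much as possible. In production, use SuperHandler and ANRWatchDog to collect jank and ANR information.

We can also apply what the earlier parts on startup optimization and layout optimization covered: defer expensive work or run it asynchronously, and use asynchronous inflation, X2C, data preloading, and reduced I/O waits.

Whatever the technique, optimize in a way that keeps the code clean.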