×

从Daemons到finalize timed out after 10 seconds

96
Jamin_正宗红罐辣酱
2017.12.13 16:52* 字数 1282

〇.序

起因,在Fabric遇到了很多BUG..出于想对于BUG的来龙去脉做进一步的了解.做了一番调查..本文也是Android Crash的防护与追踪的后续研究.

一.浅读Daemons

1.Daemons的由来:

代码位置:java.lang.Daemons.

Daemons开始于Zygote进程创建子进程.

三件事情:

  1. VM_HOOKS.preFork(),做准备工作。
  2. nativeForkAndSpecialize,创建子进程
  3. VM_HOOKS.postForkCommon();启动Zygote的4个Daemon线程,java堆整理,引用队列,以及析构线程。
    //第一步:
    //停止四个线程:Daemon线程,java堆整理,引用队列,析构线程 
    //也就是创建子进程的时候,不能有这几个线程搅和。 
    public void preFork() {
        Daemons.stop();
        waitUntilAllThreadsStopped();
        token = nativePreFork();
    }
    
    ....
    
    /**
     * Called by the zygote in both the parent and child processes after
     * every fork. In the child process, this method is called after
     * {@code postForkChild}.
     */
     //第三步,启动Daemons
    public void postForkCommon() {
        Daemons.start();
    }

2.代码开篇:


/**
 * Calls Object.finalize() on objects in the finalizer reference queue. The VM
 * will abort if any finalize() call takes more than the maximum finalize time
 * to complete.
 *
 * @hide
 */
public final class Daemons {

    private static final int NANOS_PER_MILLI = 1000 * 1000;
    private static final int NANOS_PER_SECOND = NANOS_PER_MILLI * 1000;
    private static final long MAX_FINALIZE_NANOS = 10L * NANOS_PER_SECOND;

    public static void start() {
        ReferenceQueueDaemon.INSTANCE.start();
        FinalizerDaemon.INSTANCE.start();
        FinalizerWatchdogDaemon.INSTANCE.start();
        HeapTaskDaemon.INSTANCE.start();
    }

    public static void stop() {
        HeapTaskDaemon.INSTANCE.stop();
        ReferenceQueueDaemon.INSTANCE.stop();
        FinalizerDaemon.INSTANCE.stop();
        FinalizerWatchdogDaemon.INSTANCE.stop();
    }
    
    ...
}

Daemons类的开头写下了关键的注释,当调用finalizer reference queue(终结引用队列)中的对象的Object.finalize()方法的时候.假如有任何调用finalize() 方法时.超出了最大终结时间(一般为10秒).VM就会中止.这个最大终结时间就是MAX_FINALIZE_NANOS

2. 1四个守护线程

  1. ReferenceQueueDaemon:引用队列守护线程。我们知道,在创建引用对象的时候,可以关联一个队列。当被引用对象引用的对象被GC回收的时候,被引用对象就会被加入到其创建时关联的队列去。这个加入队列的操作就是由ReferenceQueueDaemon守护线程来完成的。这样应用程序就可以知道哪些被引用的对象已经被回收了。
  2. FinalizerDaemon:析构守护线程。对于重写了成员函数finalize的对象,它们被GC决定回收时,并没有马上被回收,而是被放入到一个队列中,等待FinalizerDaemon守护线程去调用它们的成员函数finalize,然后再被回收。
  3. FinalizerWatchdogDaemon:析构监护守护线程。用来监控FinalizerDaemon线程的执行。一旦检测那些重写了finalize的对象在执行成员函数finalize时超出一定时间,那么就会退出VM。
  4. HeapTaskDaemon : 堆裁剪守护线程。用来执行裁剪堆的操作,也就是用来将那些空闲的堆内存归还给系统。

2.2 重要属性解析

  • MAX_FINALIZE_NANOS:此类开始就定义了一个MAX_FINALIZE_NANOS.可能有类似经历的同学,看了之后就能明白.这个是finalize方法执行的超时时间.(经常会遇到XXX.finalize() timed out after 10 seconds)

  • Daemons提供了start(),stop()方法,分别开启和停止四个守护进程.

3.四大守护线程的抽象类

  /**
     * A background task that provides runtime support to the application.
     * Daemons can be stopped and started, but only so that the zygote can be a
     * single-threaded process when it forks.
     */
    private static abstract class Daemon implements Runnable {
    
        private Thread thread;
        private String name;

        protected Daemon(String name) {
            this.name = name;
        }

        public synchronized void start() {
            if (thread != null) {
                throw new IllegalStateException("already running");
            }
            thread = new Thread(ThreadGroup.systemThreadGroup, this, name);
            thread.setDaemon(true);
            thread.start();
        }

        public abstract void run();
        
        ...
        
}

Daemon线程是四大守护线程的抽象类.核心是在start()方法的时候,将自己放到ThreadGroup.systemThreadGroup中.并设置为守护线程.这部分不过分解读.

二. 本文重点:FinalizerWatchdogDaemon

本文重点是因为遇到finalize() timed out after 10 seconds.这样的异常,想要对此调研一番.

1.初识FinalizerWatchdogDaemon

1.1 类注释解读

FinalizerWatchdogDaemon的类注释写的很清楚.当一个实例的finalizer函数,花费了超过MAX_FINALIZATION_MILLIS时间(默认为10秒).将被认定finalizer卡死,此时,FinalizerWatchdogDaemon将会让VM退出.

    /**
     * The watchdog exits the VM if the finalizer ever gets stuck. We consider
     * the finalizer to be stuck if it spends more than MAX_FINALIZATION_MILLIS
     * on one instance.
     */
    private static class FinalizerWatchdogDaemon extends Daemon {
        private static final FinalizerWatchdogDaemon INSTANCE = new FinalizerWatchdogDaemon();

        FinalizerWatchdogDaemon() {
            super("FinalizerWatchdogDaemon");
        }

        @Override public void run() {
        ...
        }
}

1.2 错误日志元凶

  • 找你一万年.当读到这的时候,大家一定发现了.这就是我们看到的那个TimeoutException.拼装.finalize() timed out after 10 seconds.的整个作案现场.
  • h.uncaughtException(Thread.currentThread(), syntheticException);上面的注释也特别关键,虽然这个TimeoutException是我抛出的.但是并不是我这个线程超时,我只是静静的看着它.

当WatchDog线程检测finalizing!=null的时候.会提示TimeOutException.

  @Override public void run() {
            while (isRunning()) {
                if (!sleepUntilNeeded()) {
                    // We have been interrupted, need to see if this daemon has been stopped.
                    continue;
                }
                final Object finalizing = waitForFinalization();
                if (finalizing != null && !VMRuntime.getRuntime().isDebuggerActive()) {
                    finalizerTimedOut(finalizing);
                    break;
                }
            }
        }

TimeOutException的生成.

   private static void finalizerTimedOut(Object object) {
            // The current object has exceeded the finalization deadline; abort!
            String message = object.getClass().getName() + ".finalize() timed out after "
                    + (MAX_FINALIZE_NANOS / NANOS_PER_SECOND) + " seconds";
            Exception syntheticException = new TimeoutException(message);
            // We use the stack from where finalize() was running to show where it was stuck.
            syntheticException.setStackTrace(FinalizerDaemon.INSTANCE.getStackTrace());
            Thread.UncaughtExceptionHandler h = Thread.getDefaultUncaughtExceptionHandler();
            // Send SIGQUIT to get native stack traces.
            try {
                Os.kill(Os.getpid(), OsConstants.SIGQUIT);
                // Sleep a few seconds to let the stack traces print.
                Thread.sleep(5000);
            } catch (Exception e) {
                System.logE("failed to send SIGQUIT", e);
            } catch (OutOfMemoryError ignored) {
                // May occur while trying to allocate the exception.
            }
            if (h == null) {
                // If we have no handler, log and exit.
                System.logE(message, syntheticException);
                System.exit(2);
            }
            // Otherwise call the handler to do crash reporting.
            // We don't just throw because we're not the thread that
            // timed out; we're the thread that detected it.
            h.uncaughtException(Thread.currentThread(), syntheticException);
        }
        

观察FinalizerDaemons的回收.



        /**
         * Return an object that took too long to finalize or return null.
         * Wait MAX_FINALIZE_NANOS.  If the FinalizerDaemon took essentially the whole time
         * processing a single reference, return that reference.  Otherwise return null.
         */
        private Object waitForFinalization() {
            long startCount = FinalizerDaemon.INSTANCE.progressCounter.get();
            // Avoid remembering object being finalized, so as not to keep it alive.
            if (!sleepFor(MAX_FINALIZE_NANOS)) {
                // Don't report possibly spurious timeout if we are interrupted.
                return null;
            }
            if (getNeedToWork() && FinalizerDaemon.INSTANCE.progressCounter.get() == startCount) {
                // We assume that only remove() and doFinalize() may take time comparable to
                // MAX_FINALIZE_NANOS.
                // We observed neither the effect of the gotoSleep() nor the increment preceding a
                // later wakeUp. Any remove() call by the FinalizerDaemon during our sleep
                // interval must have been followed by a wakeUp call before we checked needToWork.
                // But then we would have seen the counter increment.  Thus there cannot have
                // been such a remove() call.
                // The FinalizerDaemon must not have progressed (from either the beginning or the
                // last progressCounter increment) to either the next increment or gotoSleep()
                // call.  Thus we must have taken essentially the whole MAX_FINALIZE_NANOS in a
                // single doFinalize() call.  Thus it's OK to time out.  finalizingObject was set
                // just before the counter increment, which preceded the doFinalize call.  Thus we
                // are guaranteed to get the correct finalizing value below, unless doFinalize()
                // just finished as we were timing out, in which case we may get null or a later
                // one.  In this last case, we are very likely to discard it below.
                Object finalizing = FinalizerDaemon.INSTANCE.finalizingObject;
                sleepFor(NANOS_PER_SECOND / 2);
                // Recheck to make it even less likely we report the wrong finalizing object in
                // the case which a very slow finalization just finished as we were timing out.
                if (getNeedToWork()
                        && FinalizerDaemon.INSTANCE.progressCounter.get() == startCount) {
                    return finalizing;
                }
            }
            return null;
        }

顺便学习下 System.exit(2);的数字的意思.

 System.exit(2);
 //0表示正常退出,0以上表示非正常退出.
 private static final int RUNNING = 0;
 private static final int HOOKS = 1;
 private static final int FINALIZERS = 2;

传送门 : StackOverflow传送门,System.exit()扩展阅读

1.3 不得不提的FinalizerDaemon

WatchDog顾名思义,观察的就是这个FinalizerDaemon.一旦FinalizerDaemon在处理一个对象的时候超过10s中,那么就会结束进程,导致崩溃。附上FinalizerDaemon的源码.源码中的注释很详尽.可以自行阅读.

   private static class FinalizerDaemon extends Daemon {
        private static final FinalizerDaemon INSTANCE = new FinalizerDaemon();
        private final ReferenceQueue<Object> queue = FinalizerReference.queue;
        private final AtomicInteger progressCounter = new AtomicInteger(0);
        // Object (not reference!) being finalized. Accesses may race!
        private Object finalizingObject = null;

        FinalizerDaemon() {
            super("FinalizerDaemon");
        }

        @Override public void run() {
            // This loop may be performance critical, since we need to keep up with mutator
            // generation of finalizable objects.
            // We minimize the amount of work we do per finalizable object. For example, we avoid
            // reading the current time here, since that involves a kernel call per object.  We
            // limit fast path communication with FinalizerWatchDogDaemon to what's unavoidable: A
            // non-volatile store to communicate the current finalizable object, e.g. for
            // reporting, and a release store (lazySet) to a counter.
            // We do stop the  FinalizerWatchDogDaemon if we have nothing to do for a
            // potentially extended period.  This prevents the device from waking up regularly
            // during idle times.

            // Local copy of progressCounter; saves a fence per increment on ARM and MIPS.
            int localProgressCounter = progressCounter.get();

            while (isRunning()) {
                try {
                    // Use non-blocking poll to avoid FinalizerWatchdogDaemon communication
                    // when busy.
                    FinalizerReference<?> finalizingReference = (FinalizerReference<?>)queue.poll();
                    if (finalizingReference != null) {
                        finalizingObject = finalizingReference.get();
                        progressCounter.lazySet(++localProgressCounter);
                    } else {
                        finalizingObject = null;
                        progressCounter.lazySet(++localProgressCounter);
                        // Slow path; block.
                        FinalizerWatchdogDaemon.INSTANCE.goToSleep();
                        finalizingReference = (FinalizerReference<?>)queue.remove();
                        finalizingObject = finalizingReference.get();
                        progressCounter.set(++localProgressCounter);
                        FinalizerWatchdogDaemon.INSTANCE.wakeUp();
                    }
                    doFinalize(finalizingReference);
                } catch (InterruptedException ignored) {
                } catch (OutOfMemoryError ignored) {
                }
            }
        }

        @FindBugsSuppressWarnings("FI_EXPLICIT_INVOCATION")
        private void doFinalize(FinalizerReference<?> reference) {
            FinalizerReference.remove(reference);
            Object object = reference.get();
            reference.clear();
            try {
                object.finalize();
            } catch (Throwable ex) {
                // The RI silently swallows these, but Android has always logged.
                System.logE("Uncaught exception thrown by finalizer", ex);
            } finally {
                // Done finalizing, stop holding the object as live.
                finalizingObject = null;
            }
        }
    }

三.问题分析:

从GC来看.JAVA在内存回收的时候.并不是马上将对象销毁,而是放入到一个队列中,由FinalizerDaemon(析构守护线程)调用他们的finalize方法,再被回收,所以当FinalizerDaemon回收过程中.超过10秒,FinalizerWatchdogDaemon线程就会抛出TimeoutException XXX.finalize() timed out after 10 seconds.结尾附赠了两篇文章,有提到析构相关的性能开销和可以去追踪问题的办法.供大家扩展阅读.

四.聊聊FinalizerReference

内存占用大户(FinalizerReference)
[待续]换个认知角度再来看看这个问题.

结尾:

两篇不错的文章.
传送门 :
提升Android下内存的使用意识和排查能力
再谈Finalizer对象--大型App中内存与性能的隐性杀手

Android
Web note ad 1