[Android] JNI ERROR (app bug): attempt to use stale Local 0xHHHHHHHH

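While using JNI on Android I occasionally hit an "attempt to use stale Local" crash. The error log even claims that this is an app bug, so I could not sit still: I had to check whether that was a false accusation.
First, what the error message means: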

JNI ERROR (app bug): attempt to use stale Local 0xHHHHHHHH
In other words: the fault is in the app, not the runtime. The app tried to use a local reference (0xHHHHHHHH) that is no longer valid.

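A JNI reference is a handle through which native code refers to a Java object. JNI distinguishes local references (Local Ref) from global references (Global Ref), and since Android 4.0 the runtime hands out indirect handles rather than raw object pointers, so a misused reference can be detected and reported, as it is here. A local reference is valid only in the thread that created it, and only until the native method returns or the reference is deleted explicitly; other threads must not use it. A global reference stays valid in every thread of the process until it is released with DeleteGlobalRef. To share an object across threads, create a global reference from the local one with NewGlobalRef.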

Problem code

01 | #include <jni.h>
02 |
03 | static jclass gInputStream_Clazz;
04 | static jmethodID gInputStream_Method_reset;
05 | static jmethodID gInputStream_Method_mark;
06 | 
07 | void initJavaStream(JNIEnv* env) {
08 |     static bool gIsInitedJavaStream;
09 |     if (!gIsInitedJavaStream) {
10 |         gInputStream_Clazz = env->FindClass("java/io/InputStream");
11 |         gInputStream_Clazz = (jclass) env->NewGlobalRef(gInputStream_Clazz);
12 |         gInputStream_Method_reset = env->GetMethodID(gInputStream_Clazz, "reset", "()V");
13 |         gInputStream_Method_mark = env->GetMethodID(gInputStream_Clazz, "mark", "(I)V");
14 |         gIsInitedJavaStream = true;
15 |     }
16 | }
17 | 
18 | void doSomethingWithJavaStream(JNIEnv* env) {
19 |     gInputStream_Clazz .....
20 | }

Crash stack trace

Build fingerprint: '...................................'
Revision: '0'
ABI: 'arm64'
Timestamp: 2021-03-11 11:40:17+0800
pid: 23537, tid: 29233, name: XXXThead  >>> com.example.demo <<<
uid: 10076
signal 6 (SIGABRT), code -6 (SI_TKILL), fault addr --------
Abort message: 'JNI ERROR (app bug): attempt to use stale Local 0x29 (should be 0x25)'
    x0  0000000000000000  x1  0000000000007231  x2  0000000000000006  x3  00000075946fe4d0
    x4  00000076b4cda000  x5  00000076b4cda000  x6  00000076b4cda000  x7  000000000121b87e
    x8  00000000000000f0  x9  0008c5ca4be56366  x10 0000000000000000  x11 ffffffc0fffffbdf
    x12 0000000000000001  x13 ffffffff9fb73d4c  x14 000000000bd92a3f  x15 ffffffffffffffff
    x16 00000076b13e87e0  x17 00000076b13c7da0  x18 0000007541580000  x19 0000000000005bf1
    x20 0000000000007231  x21 00000000ffffffff  x22 b400007593f54400  x23 0000000000000000
    x24 000000762b6b8000  x25 000000762b0be360  x26 000000762af3286c  x27 000000762b6ba000
    x28 000000762b6bb000  x29 00000075946fe550
    lr  00000076b137b460  sp  00000075946fe4b0  pc  00000076b137b48c  pst 0000000000000000

backtrace:
      #00 pc 000000000007848c  /apex/com.android.runtime/lib64/bionic/libc.so (abort+164)
      #01 pc 000000000055db6c  /apex/com.android.art/lib64/libart.so (art::Runtime::Abort(char const*)+2308)
      #02 pc 0000000000013988  /system/lib64/libbase.so (android::base::SetAborter(std::__1::function<void (char const*)>&&)::$_3::__invoke(char const*)+76)
      #03 pc 0000000000012fb4  /system/lib64/libbase.so (android::base::LogMessage::~LogMessage()+320) (BuildId: a09a41f1c2370328c811e735cdaa2860)
      #04 pc 00000000002f96ac  /apex/com.android.art/lib64/libart.so (art::IndirectReferenceTable::AbortIfNoCheckJNI(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&)+224)
      #05 pc 000000000038c54c  /apex/com.android.art/lib64/libart.so (art::IndirectReferenceTable::GetChecked(void*) const+444)
      #06 pc 00000000005b2fb8  /apex/com.android.art/lib64/libart.so (art::Thread::DecodeJObject(_jobject*) const+96)
      #07 pc 0000000000391a08  /apex/com.android.art/lib64/libart.so (art::FindMethodJNI(art::ScopedObjectAccess const&, _jclass*, char const*, char const*, bool)+68)
      #08 pc 000000000039e87c  /apex/com.android.art/lib64/libart.so (art::JNI<false>::GetMethodID(_JNIEnv*, _jclass*, char const*, char const*)+660)
      #09 pc 0000000000009ac0  /my_stock/app/Demo/Demo.apk!libnative.so (offset 0x6671000) (initJavaStream(_JNIEnv*)+316)
      .................

Cause

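FindClass returns a local reference (LocalRef), but gInputStream_Clazz is later used from arbitrary threads in doSomethingWithJavaStream, so it has to be converted into a global reference (GlobalRef). The problem code does this on line 11.

The trouble starts when two threads call initJavaStream one right after the other: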

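Thread1 (line09) -> sees gIsInitedJavaStream == false and enters the if
Thread1 (line10) -> FindClass stores a LocalRef (owned by Thread1) in gInputStream_Clazz
Thread2 (line09) -> sees gIsInitedJavaStream == false and enters the if
Thread1 (line11) -> NewGlobalRef replaces the LocalRef in gInputStream_Clazz with a GlobalRef
Thread1 (line12) -> uses the valid gInputStream_Clazz to look up the method ID of InputStream.reset
Thread2 (line10) -> FindClass overwrites gInputStream_Clazz with a LocalRef owned by Thread2

......a GC may or may not run at this point; it makes no difference, because gInputStream_Clazz now holds a handle that only has meaning in Thread2's local reference table......

Thread1 (line13) -> uses gInputStream_Clazz to look up the method ID of InputStream.mark [Crash! gInputStream_Clazz is not valid in Thread1!]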
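
The interleaving above can be replayed deterministically with a toy model. This is only a sketch and not how ART is actually implemented: ToyRef, ToyThread, findClass, newGlobalRef, isUsableBy, and replayRace are all hypothetical names, and the model captures a single idea, namely that a local reference is an entry in the table of the thread that created it, while a global reference lives in a table shared by all threads.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy model only, NOT ART's real data structures.
enum class RefKind { Local, Global };

struct ToyRef {
    RefKind kind;
    int ownerThread;    // creating thread; meaningful only for RefKind::Local
    std::size_t index;  // slot in the owning table
};

struct ToyThread {
    int id;
    std::vector<std::string> locals;  // this thread's private local-reference table
};

// Global-reference table shared by every thread in the process.
static std::vector<std::string> gGlobals;

// Models FindClass: the new reference lives in the calling thread's table.
ToyRef findClass(ToyThread& caller, const std::string& name) {
    caller.locals.push_back(name);
    return ToyRef{RefKind::Local, caller.id, caller.locals.size() - 1};
}

// Models the check made when a reference is decoded: can `caller` resolve `ref`?
bool isUsableBy(const ToyThread& caller, const ToyRef& ref) {
    if (ref.kind == RefKind::Global) {
        return ref.index < gGlobals.size();
    }
    return ref.ownerThread == caller.id && ref.index < caller.locals.size();
}

// Models NewGlobalRef: copies the referent into the shared table.
ToyRef newGlobalRef(const ToyThread& caller, const ToyRef& ref) {
    assert(isUsableBy(caller, ref));
    std::string target = (ref.kind == RefKind::Global) ? gGlobals[ref.index]
                                                       : caller.locals[ref.index];
    gGlobals.push_back(target);
    return ToyRef{RefKind::Global, 0, gGlobals.size() - 1};
}

struct RaceResult {
    bool line12Ok;  // Thread1's first GetMethodID
    bool line13Ok;  // Thread1's second GetMethodID
};

// Replays the timeline; `shared` plays the role of gInputStream_Clazz.
RaceResult replayRace() {
    ToyThread t1{1, {}};
    ToyThread t2{2, {}};
    ToyRef shared = findClass(t1, "java/io/InputStream");  // Thread1, line 10
    shared = newGlobalRef(t1, shared);                       // Thread1, line 11
    const bool line12Ok = isUsableBy(t1, shared);            // Thread1, line 12
    shared = findClass(t2, "java/io/InputStream");           // Thread2, line 10
    const bool line13Ok = isUsableBy(t1, shared);            // Thread1, line 13
    return RaceResult{line12Ok, line13Ok};
}
```

In this model the lookup on line 12 succeeds because the shared variable holds a global reference, and the lookup on line 13 fails because the variable has been overwritten with a reference that belongs to Thread2's table.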

Solution

01 | #include <jni.h>
02 | #include <pthread.h>
03 | 
04 | static jclass gInputStream_Clazz;
05 | static jmethodID gInputStream_Method_reset;
06 | static jmethodID gInputStream_Method_mark;
07 | 
08 | static pthread_mutex_t mutex;
09 | 
10 | void init() {
11 |     pthread_mutex_init(&mutex, NULL);
12 | }
13 | 
14 | void initJavaStream(JNIEnv* env) {
15 |     static bool gIsInitedJavaStream;
16 | 
17 |     pthread_mutex_lock(&mutex);
18 |     if (!gIsInitedJavaStream) {
19 |         jclass localClazz = env->FindClass("java/io/InputStream");
20 |         gInputStream_Clazz = (jclass) env->NewGlobalRef(localClazz);
21 |         gInputStream_Method_reset = env->GetMethodID(gInputStream_Clazz, "reset", "()V");
22 |         gInputStream_Method_mark = env->GetMethodID(gInputStream_Clazz, "mark", "(I)V");
23 |         gIsInitedJavaStream = true;
24 |     }
25 |     pthread_mutex_unlock(&mutex);
26 | }
27 | 
28 | void doSomethingWithJavaStream(JNIEnv* env) {
29 |     gInputStream_Clazz .....
30 | }
31 | 
32 | void deinit() {
33 |     pthread_mutex_destroy(&mutex);
34 | }

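The guarantee can be provided either in the Java layer or in native code; as long as initJavaStream is thread-safe, the problem goes away.
Coding habits can also help here. The result of FindClass on line 19 should be held in a local variable rather than lazily assigned straight to a globally visible static. On its own, that habit avoids this particular crash to some extent, because the shared variable then never holds a local reference. Without the lock it would still leak a reference (each racing thread creates its own global reference and all but the last are lost), but it remains a good defensive habit.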
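
As one possible way to get that thread safety on the native side without a hand-managed mutex, a C++11 (or later) toolchain allows the one-time initialization to be expressed as a function-local static. The following is a minimal sketch under that assumption. StreamIds, loadStreamIds, streamIds, and gLoadCount are hypothetical names, and plain ints stand in for the jclass and jmethodID values so that the block compiles without the NDK.

```cpp
#include <cassert>

// Stand-ins so the sketch compiles without the NDK. In real code these would
// be a jclass holding a global reference and two jmethodID values.
struct StreamIds {
    int clazz;
    int resetId;
    int markId;
};

static int gLoadCount = 0;  // how many times the lookup actually ran

// Hypothetical loader. The FindClass / NewGlobalRef / GetMethodID calls from
// initJavaStream would go here.
StreamIds loadStreamIds() {
    ++gLoadCount;
    return StreamIds{1, 2, 3};
}

// C++11 guarantees that a function-local static is initialized exactly once,
// even if several threads reach this line at the same time; later callers
// wait until the first initialization has finished.
const StreamIds& streamIds() {
    static const StreamIds ids = loadStreamIds();
    return ids;
}
```

Every caller goes through streamIds() and receives the same fully initialized object, so no thread can observe a half-initialized state or a reference written by another thread midway through the setup.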
