SparseArray源码解析

SparseArray是android sdk提供的一个集合类,主要是用来替换是用于替代keyint类型,valueObject类型的HashMap
从原理上说, SparseArray内部实现是基于两个数组。 一个int[]数组mKeys,用于保存每个元素的key,key本身就是int类型,所以可以理解hashCode值就是key的值。一个Object[]数组mValues,保存value。两个数组的容量是保持同步的。

HashMap相比,SparseArray总的来说有以下特点:

  1. 由于存在了一个int[]数组mKeys,因此避免了自动装箱(auto-boxing)
  2. SparseArray扩容时只需要数组拷贝工作(System.arraycopy),不需要重建哈希表(HashMap的rehash过程十分消耗性能)。
  3. SparseArray以时间换空间,由于SparseArray使用了二分查找,因此时间会比HashMap略逊,但是空间十分节省(大家都是搞android开发的,肯定都知道移动平台对内存的高要求)。为啥SparseArray比较节省空间呢?因为mKeys数组的元素总是连续的(即使中间会存在DELETED元素,但是这些DELETED元素会在后续的自定义的gc()方法中被清除掉),不像HashMap数组里的元素由于内在的hash冲突无法彻底解决而导致数组(在HashMap中称之为哈希桶)空间的利用很浪费;
  4. 不适合大容量的数据存储。存储大量数据时,它的性能将退化至少50%。如果数据量在千以内,可以考虑使用SparseArray
  5. 按照key进行自然排序
  6. SparseArray的核心思想体现在GrowingArrayUtils这个类中,只要搞懂了GrowingArrayUtils这个类,就能明白SparseArray的原理。

简单的demo使用:

        SparseArray<String> stringSparseArray = new SparseArray<>();
        stringSparseArray.put(1, "a");
        stringSparseArray.put(5, "e");
        stringSparseArray.put(4, "d");
        stringSparseArray.put(10, "h");
        stringSparseArray.put(2, null);
        Log.d("zzh", "onCreate() called with: stringSparseArray = [" + stringSparseArray + "]");
       //输出结果
        //zzh: onCreate() called with: stringSparseArray = [{1=a, 2=null, 4=d, 5=e, 10=h}]

1 SparseArray类图

JDK中的Map集合都是实现了Map这个接口,但是SparseArray仅仅实现了Cloneable接口。

图1-UML类图

2 SparseArray成员变量和构造器

相比HashMapSparseArray的成员变量很少,只有5个:

//元素删除的标记
private static final Object DELETED = new Object();
//是否需要GC 
private boolean mGarbage = false;
//key数组
private int[] mKeys;
//value数组
private Object[] mValues;
//已经存储的元素的个数
private int mSize;

SparseArray有两个构造器:

    public SparseArray() {
        this(10);
    }

    /**
     * Creates a new SparseArray containing no mappings that will not
     * require any additional memory allocation to store the specified
     * number of mappings.  If you supply an initial capacity of 0, the
     * sparse array will be initialized with a light-weight representation
     * not requiring any additional array allocations.
     */
    public SparseArray(int initialCapacity) {
        if (initialCapacity == 0) {//初始容量为0的话,就赋值两个轻量级的引用
            mKeys = EmptyArray.INT;
            mValues = EmptyArray.OBJECT;
        } else {
            mValues = ArrayUtils.newUnpaddedObjectArray(initialCapacity);
            mKeys = new int[mValues.length];
        }
        mSize = 0;
    }

从构造器中可以看出,默认容量是10,但是生成mValues的时候调用了ArrayUtils.newUnpaddedObjectArray(initialCapacity),我测试了一行代码SparseArray<String> fs = new SparseArray<>();,但后打断点调试的结果如下图所示:

其容量并不是10,而是13,这个13应该跟Value的类型有关(我测试的是String类型),网上查了下,stackoverflow上有以下说法:

You can find more information in class VMRuntime, which is used by class ArrayUtils.
The JavaDoc of VMRuntime.newUnpaddedArray(...) says:

Returns an array of at least minLength, but potentially larger. The increased size comes from avoiding any padding after the array. The amount of padding varies depending on the componentType and the memory allocator implementation.

Java data types use different size in memory. If only the needed capacity would be allocated, some space at then end would be left unused before the next memory allocation. This space is called padding. So, in order to not waste this space, the array made a little bit larger.

3 SparseArray核心操作

集合的核心操作无外乎增删改查。

3.1 put(int key, E value)

    public void put(int key, E value) {
        // 利用二分查找,找到待插入key 的 下标index
        int i = ContainerHelpers.binarySearch(mKeys, mSize, key);
        //如果返回的index是正数,说明之前这个key存在,直接覆盖value即可
        if (i >= 0) {
            mValues[i] = value;
        } else {
            //若返回的index是负数,说明 key不存在.
            //先对返回的i取反,得到应该插入的位置i
            i = ~i;
            //如果i没有越界,且对应位置是已删除的标记,则复用这个空间
            if (i < mSize && mValues[i] == DELETED) {
                mKeys[i] = key;
                mValues[i] = value;
                return;
            }
            //如果需要GC,且需要扩容
            if (mGarbage && mSize >= mKeys.length) {
                //先触发GC
                gc();
                //gc后,下标i可能发生变化,所以再次用二分查找找到应该插入的位置i
                i = ~ContainerHelpers.binarySearch(mKeys, mSize, key);
            }
            //插入key(可能需要扩容)
            mKeys = GrowingArrayUtils.insert(mKeys, mSize, i, key);
            //插入value(可能需要扩容)
            mValues = GrowingArrayUtils.insert(mValues, mSize, i, value);
            //集合大小递增
            mSize++;
        }
    }

put方法用到了数组的二分查找法,贴出代码:

    // This is Arrays.binarySearch(), but doesn't do any argument validation.
    static int binarySearch(int[] array, int size, int value) {
        int lo = 0;
        int hi = size - 1;
        while (lo <= hi) {
            final int mid = (lo + hi) >>> 1;
            final int midVal = array[mid];
            if (midVal < value) {
                lo = mid + 1;
            } else if (midVal > value) {
                hi = mid - 1;
            } else {
                return mid;  // value found
            }
        }
        return ~lo;  // value not present
    }

垃圾回收函数gc()的代码:

    //垃圾回收函数,压缩数组
    private void gc() {
        //保存GC前的集合大小
        int n = mSize;
        //既是下标index,又是GC后的集合大小
        int o = 0;
        int[] keys = mKeys;
        Object[] values = mValues;
        //遍历values集合,以下算法 意义为 从values数组中,删除所有值为DELETED的元素
        for (int i = 0; i < n; i++) {
            Object val = values[i];
            //如果当前value 没有被标记为已删除
            if (val != DELETED) {
                //压缩keys、values数组
                if (i != o) {
                    keys[o] = keys[i];
                    values[o] = val;
                    //并将当前元素置空,防止内存泄漏
                    values[i] = null;
                }
                o++;
            }
        }
        //修改标识位,不需要GC
        mGarbage = false;
        //gc后的元素个数变为压缩后的个数o
        mSize = o;
    }

gc()方法十分关键,该方法把DELETED元素全部置为null并压缩两个数组。
GrowingArrayUtils#insert源码:

    public static int[] insert(int[] array, int currentSize, int index, int element) {
        //断言 确认 当前集合长度 小于等于 array数组长度
        assert currentSize <= array.length;
        //这个if表示不需要扩容
        if (currentSize + 1 <= array.length) {
            //将array数组内元素,从index开始 后移一位
            System.arraycopy(array, index, array, index + 1, currentSize - index);
            //在index处赋值
            array[index] = element;
            return array;
        }
        //需要扩容
        //构建新的数组
        int[] newArray = new int[growSize(currentSize)];
        //将原数组中index之前的数据复制到新数组中
        System.arraycopy(array, 0, newArray, 0, index);
        //在index处赋值
        newArray[index] = element;
        //将原数组中index及其之后的数据赋值到新数组中
        System.arraycopy(array, index, newArray, index + 1, array.length - index);
        return newArray;
    }
    public static int growSize(int currentSize) {
        return currentSize <= 4 ? 8 : currentSize * 2;
    }

3.2 remove(int key)按照key删除,removeAt(int index)按照index删除

    /**
     * Alias for {@link #delete(int)}.
     */
    public void remove(int key) {
        delete(key);
    }
    /**
     * Removes the mapping from the specified key, if there was any.
     */
    public void delete(int key) {
        int i = ContainerHelpers.binarySearch(mKeys, mSize, key);

        if (i >= 0) {
            if (mValues[i] != DELETED) {
                mValues[i] = DELETED;
                mGarbage = true;
            }
        }
    }
    public void removeAt(int index) {
        if (mValues[index] != DELETED) {
            mValues[index] = DELETED;
            mGarbage = true;
        }
    }

从删除方法的源码来看,SparseArray做了一些优化: 并不是从value数组中删除它,而是将其在value数组中标记为已删除DELETED,这样当存储相同的key的value时,可以直接命中这个下标而避免了GrowingArrayUtils.insertGrowingArrayUtils.growSize这两个耗时的操作。 而被置为DELETED的下标会在随后的合适的时机中执行gc方法时删除DELETED标记的元素并压缩数组,以节省空间。

3.3 removeAtRange(int index, int size)批量删除

    public void removeAtRange(int index, int size) {
        final int end = Math.min(mSize, index + size);
        for (int i = index; i < end; i++) {
            removeAt(i);
        }
    }

3.4 get(int key)按照key查询

    /**
     * Gets the Object mapped from the specified key, or <code>null</code>
     * if no such mapping has been made.
     */
    public E get(int key) {
        return get(key, null);
    }

    /**
     * Gets the Object mapped from the specified key, or the specified Object
     * if no such mapping has been made.
     */
    @SuppressWarnings("unchecked")
    public E get(int key, E valueIfKeyNotFound) {
        int i = ContainerHelpers.binarySearch(mKeys, mSize, key);

        if (i < 0 || mValues[i] == DELETED) {
            return valueIfKeyNotFound;
        } else {
            return (E) mValues[i];
        }
    }

3.5 keyAt(int index)valueAt(int index)按照下标查询key和value

    /**
     * Given an index in the range <code>0...size()-1</code>, returns
     * the key from the <code>index</code>th key-value mapping that this
     * SparseArray stores.
     *
     * <p>The keys corresponding to indices in ascending order are guaranteed to
     * be in ascending order, e.g., <code>keyAt(0)</code> will return the
     * smallest key and <code>keyAt(size()-1)</code> will return the largest
     * key.</p>
     *
     * <p>For indices outside of the range <code>0...size()-1</code>,
     * the behavior is undefined.</p>
     */
    public int keyAt(int index) {
        if (mGarbage) {
            gc();
        }
        return mKeys[index];
    }

    /**
     * Given an index in the range <code>0...size()-1</code>, returns
     * the value from the <code>index</code>th key-value mapping that this
     * SparseArray stores.
     *
     * <p>The values corresponding to indices in ascending order are guaranteed
     * to be associated with keys in ascending order, e.g.,
     * <code>valueAt(0)</code> will return the value associated with the
     * smallest key and <code>valueAt(size()-1)</code> will return the value
     * associated with the largest key.</p>
     *
     * <p>For indices outside of the range <code>0...size()-1</code>,
     * the behavior is undefined.</p>
     */
    @SuppressWarnings("unchecked")
    public E valueAt(int index) {
        if (mGarbage) {
            gc();
        }
        return (E) mValues[index];
    }

3.6 indexOfKey(int key)indexOfValue(E value)indexOfValueByValue(E value),三个查询下标的方法

    /**
     * Returns the index for which {@link #keyAt} would return the
     * specified key, or a negative number if the specified
     * key is not mapped.
     */
    public int indexOfKey(int key) {
        if (mGarbage) {
            gc();
        }

        return ContainerHelpers.binarySearch(mKeys, mSize, key);
    }

    /**
     * Returns an index for which {@link #valueAt} would return the
     * specified key, or a negative number if no keys map to the
     * specified value.
     * <p>Beware that this is a linear search, unlike lookups by key,
     * and that multiple keys can map to the same value and this will
     * find only one of them.
     * <p>Note also that unlike most collections' {@code indexOf} methods,
     * this method compares values using {@code ==} rather than {@code equals}.
     */
    public int indexOfValue(E value) {
        if (mGarbage) {
            gc();
        }

        for (int i = 0; i < mSize; i++) {
            if (mValues[i] == value) {
                return i;
            }
        }

        return -1;
    }

    /**
     * Returns an index for which {@link #valueAt} would return the
     * specified key, or a negative number if no keys map to the
     * specified value.
     * <p>Beware that this is a linear search, unlike lookups by key,
     * and that multiple keys can map to the same value and this will
     * find only one of them.
     * <p>Note also that this method uses {@code equals} unlike {@code indexOfValue}.
     * @hide
     */
    public int indexOfValueByValue(E value) {
        if (mGarbage) {
            gc();
        }

        for (int i = 0; i < mSize; i++) {
            if (value == null) {
                if (mValues[i] == null) {
                    return i;
                }
            } else {
                if (value.equals(mValues[i])) {
                    return i;
                }
            }
        }
        return -1;
    }

4 类似集合:SparseBooleanArraySparseIntArraySparseLongArrayLongSparseArray

除了SparseArray,Android sdk中,还提供了三个类似思想的集合:

  • SparseBooleanArray, value为boolean
  • SparseIntArray, value为int
  • SparseLongArray, value为long

他们和SparseArray区别除了value的类型不一样之外,阅读源码,还发现有以下不同:

  • 这三个集合没有了自定义的数组压缩方法gc()和删除标记位DELETED,他们在移除元素的时候直接进行了数组的拷贝操作(System.arraycopy),而不像SparseArray那样先把当前下标的value设置为DELETED

我不太明白这三个集合类为哈要舍弃掉自定义的数组压缩方法gc()和删除标记位DELETED,这在SparseArray不是为了避免繁琐的数组拷贝操作而进行的所谓的优化吗?😊真不懂呢,请大神解释下[😊]。

第四个集合LongSparseArraySparseArray的区别只有一点,就是LongSparseArray的key类型为long而SparseArray的key的类型为int,其余都一样的。


参考文献

  1. 面试必备:SparseArray源码解析- zxt0601的博客- CSDN博客
  2. What does the method ArrayUtils.newUnpaddedObjectArray() do ...