4. Median of Two Sorted Arrays

Description:

There are two sorted arrays nums1 and nums2 of size m and n respectively.
Find the median of the two sorted arrays. The overall run time complexity should be O(log (m+n)).

Example 1:

nums1 = [1, 3]
nums2 = [2]
The median is 2.0

Example 2:

nums1 = [1, 2]
nums2 = [3, 4]
The median is (2 + 3)/2 = 2.5

Solution:

What is a median?

"The middle number in a given sequence of numbers, taken as the average of the two middle numbers when the sequence has an even number of numbers:
4 is the median of 1, 3, 4, 8, 9." (found in dictionary.com)
If a sequence is 1, 3, 4, 5, 8, 9, it's median is (4+5)/2=4.5.

So we must sort the sequence if it is unordered to find the median.

First let's see how we can find the median of one array and then we deal with two arrays.

Find the median of one sorted array

Let's say that "even case" is that the length of the array is even and "odd case" is that the sum length of the array is odd.
In even case, the median is the average value of the middle 2 elements and in odd case, it is just the value of the middle element.

  • even case
    Since the array is sorted, we just need to find the middle 2 elements. Preparing for the 2 arrays problem, we can split the array into 2 equal length parts. This way, all the elements in the left part are no greater than right. For example, if array x has 6 elements, it can be split as:
                 left       right
x    x[0] x[1] x[2]       x[3] x[4] x[5]

Then the median is (x[2] + x[3])/2. So in even case, if an array's length is n, the median is (x[n/2]+x[n/2+1])/2.
Meanwhile, x[n/2] is the first element of the right subset, and x[n/2-1] is the last element of the left subset.

  • odd case
    In odd case, the array cannot be split into 2 length-equal subsets. Say that array x has 7 elements, it can be split as:
                 left       right
x    x[0] x[1] x[2]       x[3] x[4] x[5] x[6]

But all the elements in the left part is still no greater than right. In this case, the median is x[3]. And clearly we know x[3] is the first element of the right subset.
However, we can just use x[3] = x[7/2] to get the median. So in odd case, if an array's length is n, the median is x[n/2].
Meanwhile, x[n/2] is the first element of the right subset and x[n/2-1] is the last element of the left subset.

Find the median of 2 sorted array

Let's say the "even case" is that the sum length of the arrays is even and the same meaning with the "odd case".
From the above we know that if we can split an array into 2 subsets that is equal or "nearly equal", the median can be easily gotten. In the same way, we can split each array into 2 subsets.
We assume the arrays are x and y. After split, x and y are as follows:

               left   right
x    x[0]... x[i-1]   x[i]... x[n-1]
y    y[0]... y[j-1]   y[j]... y[m-1]

To find the median, there are 2 conditions to be met.

  1. the sum length of the left is equal or "nearly equal" to the right.
  2. all the elements in the left subset are no greater than the right

This way we can get the median by comparing or calculating with x[i-1], x[i], y[j-1], y[j]. We'll talk about it later.

To meet the first condition, we calculate the sum length of the subsets is i+j, the right n-i+m-j. If left length is equal to right, then i+j = n-i+m-j.
So i can be expressed as

i = (m+n)/2 - j

But how do we split the 2 arrays. At first we just let

lo = 0;
hi = m;
j = (lo + hi)/2 = m/2;

and then we can get i through that equation. One more thing to note is that n >= m is a must. This is because:

i = (m+n)/2 - j = (m+n)/2 - m/2 = (n-m)/2

If n < m, i will be negative. This will cause ArrayIndexOutOfBoundaryException. To deal with this issue, we can just change x and y and call the function again if n < m. That is :

if (n < m) return f(y, x);

To meet the second condition, we let xl = x[i-1], xh = [xi], yl = y[j-1], yh = y[j]. Since xl < xh and yl < yh are definitely correct. We just need
xl < yh and yl < xh. If these two cannot be met, we need to use binary search to adjust the search range. We woulddiscuss these cases in 3 parts:

  1. if xl < yh and yl < xh
    This means the median is found. To find the median, however, the method of the even case is a little different from the odd case.
  • even case
    We have to find the 2 middle elements and calculate the average. That is, we have to find the maximum element in the left subset and the minimum elements in the right subset.
    As the arrays are sorted and split, this can be down by following steps:
leftMax = max(xl, yl);
minRight = min(xh, yh);
median = (leftMax + minRight)/2;
  • odd case
    As discussed in the one sorted array case, the middle element is the median, which is the same in the two sorted arrays. Now, the key becomes how can we know which element is the middle element? Can we make sure that the middle element is the one in [xl, yl, xh, yr]? More discussion is needed.
    Calling that at first j = m/2 and i = (m+n)/2 - j, we can use two example to make it simple.
    • x's length is odd and y's length is even
      That is n is odd and m is even. So m%2 = 0 and y can be perfectly split into 2 length-equal subset. However, i = (m+n)/2 - j = (m+n)/2 - m/2 = n/2. Since n is odd, so n/2 actually means (n-1)/2, which would lead the right subset of y is longer than the left (for example, n = 5, then n/2 = (n-1)/2 = 2/2 = 2, so the left subset is [x[0], x[1]] and the right is [x[2], x[3], x[4]]). The split result can be shown as following:
                                    left    right
x   x[0] ... x[(n-1)/2-1](x[i-1])    (x[i])x[(n-1)/2] ... x[n-1]
y   y[0] ... y[m/2-1](y[j-1])        (y[j])y[m/2] ... y[m-1]
           \_________________________/      \________________________/
             left_length = (m+n-1)/2         right_length = (m+n+1)/2

Then right_length - left_length = 1. As discussed above, the middle element is the minimum element of the right subset. So,

median = min(xh, yh)
- `x`'s length is odd and `y`'s length is even

I'll skip because the result is also

median = min(xh, yh)

You can do it yourself.

  1. if xl > yh
    It means that y[j] is too small, we need to split y in a way which make s the right subset shorter and left subset longer so that y[j] can be big enough, in the same way i will decrease and x[i-1] will be smaller. How can we do that?
    We can use binary search method. Since j = (lo + hi)/2, let lo = j+1, then j would be bigger and y[j] is too.

  2. if yl > xh
    It means that y[j-1] is too big, in contrast we set hi = j-1 and calculate j again to make y[j-1] smaller.

By updating lo and hi and then checking if xl < yh and yl < xh are met. We can finally find the median.

More discussion: boundary condition

From the above we know j and i can be change if some condition can not be met. Meanwhile, xl = x[i-1], xh = x[i], yl = y[j-1], yh = y[j]. Here comes the problem: what if i = 0 or n, j = 0 or m ? This would make x[-1], x[n], y[-1], y[m] senseless.
A little trick could handle this. We just need to extend both x and y, which means x[-1] = y[-1] = Integer.MIN_VALUE and x[n] = y[m] = Integer.MAX_VALUE. The sample code is:

xl = (i == 0 ? Integer.MIN_VALUE : x[i-1]);
xh = (i == n ? Integer.MAX_VALUE : x[i]);
yl = (j == 0 ? Integer.MIN_VALUE : y[j-1]);
yh = (j == m ? Integer.MAX_VALUE : y[j]);

At last, an accepted code is posted:

public class Solution {
    public double findMedianSortedArrays(int[] x, int[] y) {
        int n = x.length, m = y.length;

        // must note that 'return' is needed or the programme 
        // would continue if n < m and this will cause the 
        // programme to fail
        if (n < m) return findMedianSortedArrays(y, x);
        
        // set the initial lo and hi
        int lo = 0, hi = m;
        while (lo <= hi) {
            int j = (lo+hi)/2, i = (m+n)/2-j; // i+j == n-i+m-j
            
            // note that how to deal with the boundary condition
            int xl = (i == 0 ? Integer.MIN_VALUE : x[i-1]);
            int xh = (i == n ? Integer.MAX_VALUE : x[i]);
            int yl = (j == 0 ? Integer.MIN_VALUE : y[j-1]);
            int yh = (j == m ? Integer.MAX_VALUE : y[j]);
            
            if (xl > yh) lo = j+1; // y[j] too small, j need to be greater 
            else if (yl > xh) hi = j-1; // y[j-1] too big, j need to be smaller
            else {
                if ((m+n)%2 == 1) return xh < yh ? xh : yh; // odd case
                int leftMax = xl > yl ? xl : yl;
                int rightMin = xh < yh ? xh : yh;
                return (double)(leftMax + rightMin)/2; // even case
            }
        }
        return -1; // actually never execute
    }
}

推荐阅读更多精彩内容

  • 8月的第一天。 当它的内脏坏了之后 我只好一点点掏出来,给它换新血 接着缝合,可是它却睁不开眼睛了 我想大概是我第...
    空持罗带阅读 847评论 7 3
  • 1 拒绝真的是一门学问,特别是有时候遇到脑筋比较直的人,非要把话说到白了,他们才明白。 这样一来,让大家都尴尬了。...
    喵姬阅读 686评论 1 5
  • 文/韩乾昌 无论你年纪多大、走到哪里,只要还有个妈,就算全世界背叛了你、与你作对,总有个人向着你,那就是妈。 小时...
    温柔剑客路人甲阅读 439评论 11 13
  • 梧桐妹和李先生大老远过来看我,提了好大一堆水果,有些吓到我了,天,这是要让我去卖水果麽?我得吃多久啊!突...
    七七行记阅读 157评论 0 0