Kth Quantiles. Θ(nlgk)

The kth quantiles of an n-element set are the k-1 order statistics that divide the sorted set into k equal-sized sets (to within 1). Give an O(nlgk)-time algorithm to list the kth quantiles of a set.

I did a lot of jobs to figure out what the hell the question is actually talking about. In fact, there would have been k groups fragments in an n-size set/list which are divided by k-1 order statistics named quantiles after evaluating one's algorithm. At first, the set/list is unsorted and it can be hardly unsorted after execution.

So hereafter I make up 2 functions - k_quantiles_sorted_n(which costs Θ(n)) & k_quantiles_sorted_lgk(which takes Θ(lgk)) - to solve the already-sorted-array problems. In these cases, I just work out an array of indices which are on behave of the positions of dividers.

In the end, I take out the original problem with function k_quantiles_unsorted_nlgk and make it by recursion which consumes Θ(nlgk). As you'll see below, the gross problem is separated into 2 subroutines, each of n(1-1/k)/2 to n/2 scale.

This works homogeneous with the same size of group when the number of elements (aka n) is ak+k−1 for a positive integer a. Once it were a different number, the interval among groups would be heterogeneous. some additive care with rounding needs to be taken in order to avoid creating two segments that differ by more than 1.

The inherent consumption takes place in the select function which takes up to Θ(n) time. The base case is that the subproblem is divided into a scale of T(n/k) with a item of Θ(1) so that you could immediately return. The recursion here is

T(n) <= 2T(n/2) + Θ(n) <= 4T(n/4) + 2Θ(n) <= 8T(n/8) + 3Θ(n) ...
<= kT(n/k) + lgkΘ(n) = Θ(nlgk)

It would be nice to illustrate by a binary recursion tree with a depth of lgk and leaves of T(n/k). As we could see, the last subproblem scale determines the log part in final Θ(nlgk), which reflected by the depth of the recursion tree. Furthermore, we could find that if the last problem scale is Θ(1), that is n/n, then the consumption goes up to Θ(nlgn), which means the groups count will be the last divisor. The less groups we divide at last, the less log part we spend.

i.e. Note that the slice of array in python is returning a new array(aka slice for copy) which doesn't correspond to Swift. I've spent a lot of time getting rid of the startIndex & endIndex parameters using slice of array. But it all went in vain as a result since we cannot regard the slice as a reference of original array partially. Probably it'd take effect with memoryview or something which is quite annoying.

As we used to say, unit tests run first.
Test Driven Code:

import unittest

class TestCase(unittest.TestCase):

    def setUp(self):
        self.maxDiff = None

    def test_k_quantiles_sorted(self):
        for j in range(0, 100):
            n = randint(10, 20)
            k = randint(0, n+1)
            # l = k_quantiles_sorted_n(n, k, 0)
            l = k_quantiles_sorted_lgk(n, k)
            count = len(l)
            self.assertGreaterEqual(count, 0)
            if count == 0:
                continue
            sentinel_1 = l[0]
            sentinel_2 = -1
            self.assertLess(sentinel_1, n) 
            for i in range(1, count+1):
                scope = 0
                if i == count:
                    scope = n - l[i-1] - 1
                else:
                    self.assertLess(l[i], n) 
                    scope = l[i] - l[i-1] - 1
                difference = abs(sentinel_1 - scope)
                self.assertLessEqual(difference, 1)
                if difference == 0:
                    continue
                else:
                    if sentinel_2 < 0:
                        sentinel_2 = difference
                    else:
                        self.assertEqual(sentinel_2, difference)

    def test_k_quantiles_unsorted_nlgk(self):
        for j in range(0, 100):
            n = randint(50, 100)
            k = randint(0, n+1)
            l = list(range(0, n))
            quantiles = k_quantiles_unsorted_nlgk(l, 0, len(l)-1, k)
            count = len(quantiles)
            self.assertGreaterEqual(count, 0)
            if count == 0:
                continue
            dividor = k - 1
            remain = n - dividor 
            base = remain // k
            additive = base + 1
            additiveGroup = remain % k
            baseGroup = k - additiveGroup
            current = -1
            quantiles.append(n)
            for i in range(0, len(quantiles)):
                scope = quantiles[i]-current-1
                current = quantiles[i]
                self.assertTrue(scope==base or scope==additive)

                if scope == base:
                    baseGroup -= 1
                elif scope == additive:
                    additiveGroup -= 1
            self.assertTrue(baseGroup==0 and additiveGroup==0)


if __name__ == '__main__':
    unittest.main()

Then it goes with source code:

def k_quantiles_sorted_n(n:int, k:int, base:int) -> []:
    if k <= 1 or k > n+1:
        return[]
    remainder = k % 2
    if remainder:
        midst = n // k
        remains = n-midst-2
        n_left = remains // 2
        n_right = remains - n_left
        k_child = (k-1) // 2
        left = base + n_left
        right = base + n - n_right - 1
        return k_quantiles_sorted_n(n_left, k_child, base) + \
        [left, right] + k_quantiles_sorted_n(n_right, k_child, right+1)
    else:
        k_child = k // 2
        n_left = n // 2
        n_right = n - n_left - 1
        halver = base + n_left
        return k_quantiles_sorted_n(n_left, k_child, base) + \
        [halver] + k_quantiles_sorted_n(n_right, k_child, halver+1)

def k_quantiles_sorted_lgk(n:int, k:int) -> []:
    if k < 2 or k > n+1:
        return[]
    dividor = k - 1
    remain = n - dividor 
    base = remain // k
    additiveGroup = remain % k
    additiveDividor = additiveGroup - 1
    baseGroup = k - additiveGroup
    baseDividor = baseGroup - 1
    if additiveGroup > 0:
        baseDividor += 1
    quantiles = []
    current = -1
    for i in range(0, baseDividor):
        current += base+1
        quantiles.append(current)
    for j in range(0, additiveDividor):
        current += base+2
        quantiles.append(current)
    return quantiles

def k_quantiles_unsorted_nlgk(array:[], startIndex:int, endIndex:int, k:int) -> []:
    n = endIndex - startIndex + 1
    if k < 2 or k > n+1:
        return[]
    k_child = k // 2
    remainder = k % 2
    if remainder:
        midstCount = n // k
        remainCount = n - midstCount - 2
        leftCount = remainCount // 2
        rightCount = remainCount - leftCount
        leftIndex = startIndex + leftCount
        rightIndex = leftIndex + midstCount + 1
        localLeftIndex = leftCount
        localRightIndex = midstCount
        left = select(array, startIndex, endIndex, localLeftIndex)
        right = select(array, leftIndex+1, endIndex, localRightIndex)
        return k_quantiles_unsorted_nlgk(array, startIndex, leftIndex-1, k_child) + \
        [left, right] + k_quantiles_unsorted_nlgk(array, rightIndex+1, endIndex, k_child)
    else:
        leftCount = n // 2
        halverIndex = startIndex + leftCount
        localHalverIndex = leftCount
        halver = select(array, startIndex, endIndex, localHalverIndex)
        return k_quantiles_unsorted_nlgk(array, startIndex, halverIndex-1, k_child) + \
        [halver] + k_quantiles_unsorted_nlgk(array, halverIndex+1, endIndex, k_child)

推荐阅读更多精彩内容

  • rljs by sennchi Timeline of History Part One The Cognitiv...
    sennchi阅读 5,459评论 0 10
  • 看完了《钢之炼金术师》,还是会被里面人物的品质、人与人之间的感情、信任而感动。 最想记录的是第二代贪婪说的“我想要...
    樱苔阅读 215评论 0 1
  • 工作外的半小时,决定你的人生高度,最近通过简书认识了糖果,她利用工作外8小时,坚持写简书,从她的简书中了解,她离异...
    studay阅读 494评论 3 10
  • 儿时的端午是那样的温馨有趣又让人难忘,而今的端午却让人感到荡然犹存,是岁月的原故还是成长经历所致呢?总之是中国传统...
    耕耘者小牛阅读 145评论 0 2
  • 1,缺啥补啥,怕啥练啥; 2,一切为我所用,所用为团队家; 3,我想变,我要变,我不得不变,我会变得越来越好。 今...
    房傲东阅读 41评论 0 0