use redis (continued)

bitmap

e.g. representing "state" (for "PV", i.e. page views: bitfield incrby)

127.0.0.1:6379> set w hello
OK
127.0.0.1:6379> 
127.0.0.1:6379> 
127.0.0.1:6379> bitfield w get u4 0
1) (integer) 6
127.0.0.1:6379> bitfield w get u3 2
1) (integer) 5
127.0.0.1:6379> 
127.0.0.1:6379> 
127.0.0.1:6379> bitfield w get i3 2
1) (integer) -3
127.0.0.1:6379> 
127.0.0.1:6379> bitfield w incrby u4 2 1
1) (integer) 11
127.0.0.1:6379> 
127.0.0.1:6379> 
127.0.0.1:6379> bitfield w incrby u4 2 
(error) ERR syntax error
127.0.0.1:6379> bitfield w incrby u4 2 1
1) (integer) 12
127.0.0.1:6379> bitfield w incrby u4 2 1
1) (integer) 13
127.0.0.1:6379> bitfield w incrby u4 2 1
1) (integer) 14
127.0.0.1:6379> bitfield w incrby u4 2 1
1) (integer) 15
127.0.0.1:6379> bitfield w incrby u4 2 1
1) (integer) 0
127.0.0.1:6379> bitfield w incrby u4 2 1
1) (integer) 1
127.0.0.1:6379> bitfield w incrby u4 2 1
1) (integer) 2
127.0.0.1:6379> bitfield w incrby u4 2 1
1) (integer) 3
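
The reads above can be checked by hand: the first byte of "hello" is 'h' = 0x68 = 01101000, and bitfield counts bit offsets from the most significant bit. A minimal Python sketch of that arithmetic follows. `get_bits` and `incr_wrap` are hypothetical helpers written for illustration, not part of any Redis client, and the wrap-around is assumed from the transcript, where 15 + 1 comes back as 0.

```python
def get_bits(data: bytes, offset: int, width: int, signed: bool = False) -> int:
    """Read `width` bits starting at bit `offset`, counted from the most significant bit."""
    as_int = int.from_bytes(data, "big")
    total_bits = len(data) * 8
    shift = total_bits - offset - width
    value = (as_int >> shift) & ((1 << width) - 1)
    if signed and value >= 1 << (width - 1):
        value -= 1 << width  # interpret the top bit as a two's complement sign
    return value


def incr_wrap(value: int, width: int, delta: int) -> int:
    """Unsigned increment that wraps around, as seen in the transcript (15 + 1 -> 0)."""
    return (value + delta) % (1 << width)


word = b"hello"
print(get_bits(word, 0, 4))                   # bits 0110 -> 6
print(get_bits(word, 2, 3))                   # bits 101  -> 5
print(get_bits(word, 2, 3, signed=True))      # bits 101  -> -3
print(incr_wrap(get_bits(word, 2, 4), 4, 1))  # bits 1010 + 1 -> 11
```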

hyperloglog

e.g. "UV"
Std. dev: 0.81%
memory: 2^14 = 16384 buckets, 6 bits each: 16384 * 6 / 8 bytes = 12 KB

127.0.0.1:6379> 
127.0.0.1:6379> pfadd userview user1
(integer) 1
127.0.0.1:6379> pfcount userview
(integer) 1
127.0.0.1:6379> 
127.0.0.1:6379> pfadd userview user2 user3 user4
(integer) 1
127.0.0.1:6379> 
127.0.0.1:6379> 
127.0.0.1:6379> pfcount userview
(integer) 4
127.0.0.1:6379> 
127.0.0.1:6379> pfmerge uv userview
OK
127.0.0.1:6379> 
127.0.0.1:6379> 
127.0.0.1:6379> pfcount uv
(integer) 6
127.0.0.1:6379> pfcount userview
(integer) 4
127.0.0.1:6379> 
127.0.0.1:6379>

Principle (py/java demo): take N random integers and let k be the largest number of consecutive low-order zero bits seen in any of them; then N ≈ 2^k.
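
The estimation idea can be tried with a few lines of Python. This is only a sketch of the principle with a single register, so the result is always a power of two and is noisy; the real structure spreads values over the 2^14 buckets mentioned above and averages them to reach the quoted error. The function names and the 32-bit width are assumptions made for illustration.

```python
import random


def trailing_zeros(value: int, max_bits: int = 32) -> int:
    """Count the consecutive zero bits at the low end of value."""
    if value == 0:
        return max_bits
    count = 0
    while value & 1 == 0:
        value >>= 1
        count += 1
    return count


def estimate_count(n: int, seed: int = 0) -> int:
    """Draw n random 32-bit integers, track the largest k, and estimate n as 2^k."""
    rng = random.Random(seed)
    k = 0
    for _ in range(n):
        k = max(k, trailing_zeros(rng.getrandbits(32)))
    return 2 ** k


for n in (1_000, 10_000, 100_000):
    print(n, estimate_count(n))  # actual count next to the rough estimate
```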

bloom filter

Requires Redis >= 4.0 (loaded as a module)

e.g.

  • Recommendation de-dup
  • Crawler de-dup
  • NoSQL stores, e.g. HBase, Cassandra, LevelDB, RocksDB (skip lookups for rows that do not exist)
  • Spam mail filtering
> docker pull redislabs/rebloom # image
> docker run -p6379:6379 redislabs/rebloom # run container
> redis-cli # connect to the container
127.0.0.1:6379> bf.add codehole user1 
(integer) 1
127.0.0.1:6379> bf.add codehole user2 
(integer) 1
127.0.0.1:6379> bf.add codehole user3 
(integer) 1
127.0.0.1:6379> bf.exists codehole user1 
(integer) 1
127.0.0.1:6379> bf.exists codehole user2 
(integer) 1
127.0.0.1:6379> bf.exists codehole user3 
(integer) 1
127.0.0.1:6379> bf.madd codehole user4 user5 user6
1) (integer) 1
2) (integer) 1
3) (integer) 1
127.0.0.1:6379> bf.mexists codehole user4 user5 user6 user7
1) (integer) 1
2) (integer) 1
3) (integer) 1
4) (integer) 0

parameters:

  • expected number of elements: N
  • false positive rate: f

returns:

  • bit array size: m
  • optimal number of hash functions: k

then:

  • k = 0.7 * (m/N)
  • f = 0.6185^(m/N)
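
Solving the two relations above for m and k gives a small sizing helper. The sketch below uses the exact constants behind the rounded ones (0.7 ≈ ln 2 and 0.6185 ≈ exp(-(ln 2)^2)); `bloom_parameters` is a hypothetical name, not a RedisBloom call.

```python
import math


def bloom_parameters(n: int, f: float) -> tuple[int, int]:
    """Return (m, k): bit array size and hash count for n elements at false positive rate f."""
    # f = exp(-(ln 2)^2 * m / n)  =>  m = -n * ln(f) / (ln 2)^2
    m = math.ceil(-n * math.log(f) / (math.log(2) ** 2))
    # k = ln(2) * (m / n), rounded to a whole number of hash functions
    k = max(1, round(math.log(2) * m / n))
    return m, k


m, k = bloom_parameters(1000, 0.01)
print(m, k)                 # roughly 9.6 bits per element and 7 hash functions
print(0.6185 ** (m / 1000))  # plugging m back in lands close to 0.01
```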

calculator: https://krisives.github.io/bloom-calculator/

If the actual number of elements exceeds the estimate by a factor of t, the false positive rate approaches:

f = (1 - 0.5^t) ^ k

With k chosen for a designed f of 10% / 1% / 0.1% (curve of f against t not reproduced here):

  • designed f = 10%, t = 2 => actual f ≈ 40%
  • designed f = 1%, t = 2 => actual f ≈ 15%
  • designed f = 0.1%, t = 2 => actual f ≈ 5%
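
These three figures follow from f = (1 - 0.5^t)^k once k is fixed by the designed rate. A short sketch, assuming the optimal k so that the designed rate equals 0.5^k (which is what k = 0.7 * (m/N) and f = 0.6185^(m/N) imply together); `degraded_rate` is a name chosen for illustration.

```python
import math


def degraded_rate(designed_f: float, t: float) -> float:
    """False positive rate once the element count reaches t times the estimate."""
    k = math.log2(1 / designed_f)  # optimal k satisfies designed_f = 0.5 ** k
    return (1 - 0.5 ** t) ** k


for designed_f in (0.10, 0.01, 0.001):
    print(designed_f, round(degraded_rate(designed_f, 2), 3))
```

At t = 1 the expression reduces to 0.5^k, i.e. the designed rate itself, which is a quick sanity check on the formula.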

rate limiter(sliding window) ✅

e.g. throttling UGC actions (posts, replies); traffic limiting

  • Goal: allow a user at most N occurrences of an action within a given period
  • parameters desired: user_id, action_key, period, max_count
  • interface:
def is_action_allowed(user_id, action_key, period, max_count):
    return True
# usage: allow at most 5 replies within 60 seconds
can_reply = is_action_allowed("Lei", "reply", 60, 5) 
if can_reply:
    do_reply() 
else:
    raise ActionThresholdOverflow()

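Three things to track: the key (user + action), the value (one record per action), and a sliding time window. A zset fits (the other data structures only offer key/value): ⚠️ the score holds the timestamp, which is what delimits the window.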

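import time

import redis

client = redis.StrictRedis()  # assumes a local Redis on the default port


def is_action_allowed(user_id, action_key, period, max_count):
    key = 'hist:%s:%s' % (user_id, action_key)
    now_ts = int(time.time() * 1000)  # current time in milliseconds
    with client.pipeline() as pipe:
        # record this action; member and score are both the timestamp
        # (redis-py >= 3.0 takes a {member: score} mapping)
        pipe.zadd(key, {now_ts: now_ts})
        # remove everything older than the window; what is left lies inside it
        pipe.zremrangebyscore(key, 0, now_ts - period * 1000)
        # count the actions inside the window
        pipe.zcard(key)
        # expire the key one second after the window so idle users free memory
        pipe.expire(key, period + 1)
        # execute the four commands as one batch
        _, _, current_count, _ = pipe.execute()
    # allowed while the count stays within the limit
    return current_count <= max_count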

Con: every action within the period is recorded, which costs too much memory when the limit is large, say 1,000,000 requests per 60 s. Use a funnel limiter instead.

rate limiter(funnel)

TODO

geohash

Essence: map 2D coordinates to a 1D value and store it in a zset. Member: the element's key; score: its GeoHash value. Nearest neighbors: rank by score and take the top K.
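
The 2D-to-1D mapping can be sketched with the standard GeoHash bisection: halve the longitude and latitude ranges in turn, emit one bit per halving, and print every 5 bits as a base32 character. The function below is an illustration of that textbook scheme, not Redis's internal code, and `geohash_encode` is a name assumed here.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lng: float, lat: float, length: int = 11) -> str:
    """Interleave longitude/latitude bisection bits and emit base32 characters."""
    lng_range = [-180.0, 180.0]
    lat_range = [-90.0, 90.0]
    chars = []
    bits = 0
    bit_count = 0
    use_lng = True  # bits alternate, starting with longitude
    while len(chars) < length:
        rng = lng_range if use_lng else lat_range
        value = lng if use_lng else lat
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid  # keep the upper half
        else:
            bits <<= 1
            rng[1] = mid  # keep the lower half
        use_lng = not use_lng
        bit_count += 1
        if bit_count == 5:
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


print(geohash_encode(116.562108, 39.787602, 4))  # jd      -> wx4f
print(geohash_encode(116.489033, 40.007669, 4))  # meituan -> wx4g
```

Points that are close together share a long prefix, which is why nearby members end up with neighboring scores in the zset.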

zset behaves like a SortedSet combined with a HashMap (member -> score).
Underneath is a skiplist: binary-search-style lookups implemented on a linked list by stacking layers L0, L1, L2, ... on top of it.

(compare with list, which is a quicklist: a linked list of ziplists)
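
To make the layer idea concrete, here is a minimal skiplist sketch with insert and a range-by-score scan. It is a teaching version under stated assumptions: `MAX_LEVEL`, the 0.25 promotion probability, and all class and method names are choices made for this example, and it omits the hash map, deletion, and rank bookkeeping a real zset needs.

```python
import random

MAX_LEVEL = 8  # assumed cap on the number of layers


class Node:
    def __init__(self, score, member, level):
        self.score = score
        self.member = member
        self.forward = [None] * level  # forward[i] is the next node on layer Li


class SkipList:
    def __init__(self, seed=None):
        self.head = Node(float("-inf"), None, MAX_LEVEL)
        self.level = 1
        self.rng = random.Random(seed)

    def _random_level(self):
        level = 1
        while level < MAX_LEVEL and self.rng.random() < 0.25:
            level += 1
        return level

    def insert(self, score, member):
        update = [self.head] * MAX_LEVEL
        node = self.head
        # walk from the top layer down, remembering the last node before `score`
        for i in range(self.level - 1, -1, -1):
            while node.forward[i] is not None and node.forward[i].score < score:
                node = node.forward[i]
            update[i] = node
        level = self._random_level()
        self.level = max(self.level, level)
        new = Node(score, member, level)
        for i in range(level):
            new.forward[i] = update[i].forward[i]
            update[i].forward[i] = new

    def range_by_score(self, lo, hi):
        node = self.head
        # upper layers skip ahead quickly; L0 holds every node in score order
        for i in range(self.level - 1, -1, -1):
            while node.forward[i] is not None and node.forward[i].score < lo:
                node = node.forward[i]
        node = node.forward[0]
        members = []
        while node is not None and node.score <= hi:
            members.append(node.member)
            node = node.forward[0]
        return members


sl = SkipList(seed=1)
for score, member in [(3, "c"), (1, "a"), (2, "b"), (5, "e"), (4, "d")]:
    sl.insert(score, member)
print(sl.range_by_score(2, 4))  # ['b', 'c', 'd']
```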

127.0.0.1:6379> 
127.0.0.1:6379> 
127.0.0.1:6379> geoadd company 116.489033 40.007669 meituan
(integer) 1
127.0.0.1:6379> geoadd company 116.562108 39.787602 jd 116.334255 40.027400 xiaomi
(integer) 2
127.0.0.1:6379> geoadd company 116.48105 39.996794 juejin
(integer) 1
127.0.0.1:6379> geodist company meituan juejin km
"1.3878"
127.0.0.1:6379> geodist company jd meituan km
"25.2590"
127.0.0.1:6379> geodist company jd xiaomi km
"33.0047"
127.0.0.1:6379> geodist company meituan xiaomi km
"13.3659"
127.0.0.1:6379> 
127.0.0.1:6379> 
127.0.0.1:6379> geopos company meituan
1) 1) "116.48903220891952515"
   2) "40.00766997707732031"
127.0.0.1:6379> 
127.0.0.1:6379> geohash company jd
1) "wx4fk7jgtf0"
127.0.0.1:6379> geohash company meituan
1) "wx4gdg0tx40"
127.0.0.1:6379> 
127.0.0.1:6379> georadiusbymember company meituan 20 km count 3 asc
1) "meituan"
2) "juejin"
3) "xiaomi"
127.0.0.1:6379> georadiusbymember company meituan 20 km withcoord withdist withhash count 3 asc
1) 1) "meituan"
   2) "0.0000"
   3) (integer) 4069887179083478
   4) 1) "116.48903220891952515"
      2) "40.00766997707732031"
2) 1) "juejin"
   2) "1.3878"
   3) (integer) 4069887154388167
   4) 1) "116.48104995489120483"
      2) "39.99679348858259686"
3) 1) "xiaomi"
   2) "13.3659"
   3) (integer) 4069880904286516
   4) 1) "116.33425265550613403"
      2) "40.02740024658161389"
127.0.0.1:6379> 
127.0.0.1:6379> 
127.0.0.1:6379> georadius company 116.514202 39.905409 20 km withdist count 3 asc
1) 1) "juejin"
   2) "10.5501"
2) 1) "meituan"
   2) "11.5748"
3) 1) "jd"
   2) "13.7269"
127.0.0.1:6379> 
127.0.0.1:6379> 

Commands: geoadd, geodist, geopos, geohash

georadiusbymember: search around an existing member
georadius: search around a given (lng, lat) coordinate