A pitfall when sharing data with Manager in Python multiprocessing

Manager makes it easy to share data between processes, but there is a pitfall to watch for when it holds mutable types such as list and dict. Consider the following code:

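from multiprocessing import Process, Manager

def test(m):
    # m[0] returns a copy of the dict, so only that copy is modified
    m[0]['id'] = 2

# The guard keeps child processes from re-running this setup on import
if __name__ == '__main__':
    manager = Manager()
    m = manager.list()
    m.append({'id': 1})

    p = Process(target=test, args=(m,))
    p.start()
    p.join()
    print(m[0])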

The output is:

{'id': 1}

rather than the expected {'id': 2}.
To get the expected result, change the code to:

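from multiprocessing import Process, Manager

def test(m):
    hack = m[0]       # fetch a copy of the stored dict
    hack['id'] = 2    # modify the copy
    m[0] = hack       # reassign it, which calls __setitem__ on the proxy

# The guard keeps child processes from re-running this setup on import
if __name__ == '__main__':
    manager = Manager()
    m = manager.list()
    m.append({'id': 1})

    p = Process(target=test, args=(m,))
    p.start()
    p.join()
    print(m[0])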

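The puzzling steps in the code above work around a subtle limitation of Manager: a proxy cannot detect changes made to a mutable object stored inside its referent. Reading m[0] hands the calling process a copy of the dict, so m[0]['id'] = 2 changes only that copy. The change reaches the manager only when something calls the proxy's __setitem__ method.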
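To see why the in-place write is lost, it may help to model what happens on a read. The sketch below is a simplified, single-process stand-in and not Manager's real implementation: ToyListProxy is a hypothetical class invented for this illustration, and it pickles every value it stores or returns, roughly the way a value is serialized when it travels between the manager process and the caller.

```python
import pickle


class ToyListProxy:
    """Hypothetical stand-in for a list proxy (not part of multiprocessing).

    Every value crossing the simulated process boundary is pickled and
    unpickled, so callers always receive a copy of what is stored.
    """

    def __init__(self):
        # Plays the role of the list that lives inside the manager process
        self._referent = []

    def append(self, value):
        self._referent.append(pickle.loads(pickle.dumps(value)))

    def __getitem__(self, index):
        # Reads hand back a copy, never the stored object itself
        return pickle.loads(pickle.dumps(self._referent[index]))

    def __setitem__(self, index, value):
        # Writes replace the stored object with a copy of the new value
        self._referent[index] = pickle.loads(pickle.dumps(value))


m = ToyListProxy()
m.append({'id': 1})

m[0]['id'] = 2           # mutates a temporary copy, which is then discarded
after_in_place = m[0]    # still {'id': 1}

item = m[0]
item['id'] = 2
m[0] = item              # __setitem__ stores the modified copy
after_reassign = m[0]    # now {'id': 2}

print(after_in_place, after_reassign)
```

The toy class makes the copy explicit; with a real Manager the same copying happens implicitly whenever a plain dict is read through the list proxy.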

The line m[0] = hack exists solely to trigger the proxy object's __setitem__ method. The official Python documentation explains the issue as follows:

If standard (non-proxy) list or dict objects are contained in a referent, modifications to those mutable values will not be propagated through the manager because the proxy has no way of knowing when the values contained within are modified. However, storing a value in a container proxy (which triggers a __setitem__ on the proxy object) does propagate through the manager and so to effectively modify such an item, one could re-assign the modified value to the container proxy.
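The quoted passage only concerns standard (non-proxy) objects, which suggests an alternative to re-assignment: store a proxy inside the list instead of a plain dict. The sketch below assumes Python 3.6 or later, where shared objects can be nested; the names set_id and run_demo are chosen for this example and do not come from the original code.

```python
from multiprocessing import Manager, Process


def set_id(shared_list):
    # shared_list[0] is itself a dict proxy, so this assignment calls
    # __setitem__ on a proxy and the change reaches the manager process
    shared_list[0]['id'] = 2


def run_demo():
    with Manager() as manager:
        shared_list = manager.list()
        # Store a nested proxy rather than a plain dict
        shared_list.append(manager.dict({'id': 1}))

        p = Process(target=set_id, args=(shared_list,))
        p.start()
        p.join()

        # Copy into a plain dict before the manager shuts down
        return shared_list[0].copy()


if __name__ == '__main__':
    print(run_demo())
```

With a nested proxy the in-place write is visible in the parent, at the cost of an extra round trip to the manager for every access to the inner dict. The re-assignment approach may still be preferable when the inner objects are read far more often than they are written.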

For details, see the official Python documentation for the multiprocessing module.
