Basic Concepts of Processes
A process is a single execution of a program. Each process has its own address space, memory, data stack, and other auxiliary data recording its execution state. Multiprocessing means running multiple tasks within one program, which improves a script's ability to execute work in parallel. It is typically used for CPU-bound workloads such as scientific computing.
Creating Processes with fork
fork() is called once but returns twice: the operating system automatically copies the current process (the parent) to create a child process, and then returns in both of them. The child process always receives 0, while the parent receives the child's PID.
import os

# This works only on Unix/Linux platforms
print('Proccess {} is start'.format(os.getpid()))
pid = os.fork()
source_num = 9
if pid == 0:
    print('I am in child process, my pid is {0}, and my father pid is {1}'.format(os.getpid(), os.getppid()))
    source_num = source_num * 2
    print('The source_num in ***child*** process is {}'.format(source_num))
else:
    print('I am in father proccess, my child process is {}'.format(pid))
    source_num = source_num ** 2
    print('The source_num in ---father--- process is {}'.format(source_num))
print('The source_num is {}'.format(source_num))
Proccess 16600 is start
I am in father proccess, my child process is 19193
The source_num in ---father--- process is 81
The source_num is 81
Proccess 16600 is start
I am in child process, my pid is 19193, and my father pid is 16600
The source_num in ***child*** process is 18
The source_num is 18
Clearly, the data in the two processes does not affect each other: each process modifies its own copy of source_num.
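One detail the example above skips: the parent should reap the child, otherwise the child lingers as a zombie until the parent exits. A minimal Unix-only sketch using os.waitpid() (the exit code 7 is an arbitrary choice of mine):

```python
import os

# Unix-only: fork a child, let it exit with a known status code,
# and reap it in the parent with os.waitpid().
pid = os.fork()
if pid == 0:
    # Child process: exit immediately with status code 7.
    os._exit(7)

# Parent process: block until the child terminates, then decode its status.
_, status = os.waitpid(pid, 0)
print('Child exited with code', os.WEXITSTATUS(status))
```

os.WEXITSTATUS() extracts the exit code from the raw status word returned by os.waitpid().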
The multiprocessing Module
multiprocessing is a Python module that spawns processes using an API similar to that of the threading module. By using processes instead of threads, it effectively sidesteps the GIL for both local and remote concurrency, so the multiprocessing module lets the programmer fully leverage the multiple processors of a given machine.
Process creation and management modules:
- Process (creating processes): spawn a process by creating a Process object and calling its start() method. Process follows the API of threading.Thread.
- Pool (process-pool management): creates a pool of worker processes that execute the tasks submitted to the Pool; useful when many child processes need to be managed.
- Queue (inter-process communication, resource sharing): a process-safe queue for passing data between processes.
- Value, Array (inter-process communication, resource sharing): shared memory holding a single value or an array.
- Pipe (pipe communication): a two-ended communication channel.
- Manager (resource sharing): creates data shared between processes, including network-based sharing between processes running on different machines.
Child-process synchronization modules:
- Condition
- Event: used for synchronized signaling between processes.
- Lock: avoids conflicting access when multiple processes need to use a shared resource.
- RLock
- Semaphore: limits the number of concurrent accesses to a shared resource, such as the maximum number of connections in a pool.
1.Process
The constructor: Process(group=None, target=None, name=None, args=(), kwargs={}, *, daemon=None):
group should always be None; it exists only for compatibility with threading.Thread
target is the callable object invoked by the run() method
name gives the process an alias
args is the tuple of positional arguments passed to target
kwargs is the dictionary of keyword arguments passed to target
daemon sets the daemon flag
Methods and attributes:
run(): the method representing the process's activity
start(): starts the process
join(): blocks the calling process until the process on which join() was called finishes
name: the process's name
is_alive(): returns whether the process is still alive
daemon: the daemon flag
pid: the process ID
terminate(): forcibly terminates the process
Creating a single process
import os
from multiprocessing import Process

def hello_pro(name):
    print('I am in process {0}, It\'s PID is {1}'.format(name, os.getpid()))

if __name__ == '__main__':
    print('Parent Process PID is {}'.format(os.getpid()))
    p = Process(target=hello_pro, args=('test',), name='test_proc')
    # Start the process
    p.start()
    print('Process\'s ID is {}'.format(p.pid))
    print('The Process is alive? {}'.format(p.is_alive()))
    print('Process\' name is {}'.format(p.name))
    # join() blocks the current process until p finishes
    p.join()
Parent Process PID is 16600
I am in process test, It's PID is 19925
Process's ID is 19925
The Process is alive? True
Process' name is test_proc
Creating multiple processes
import os
from multiprocessing import Process, current_process

def doubler(number):
    """
    A doubling function that can be used by a process
    """
    result = number * 2
    proc_name = current_process().name
    print('{0} doubled to {1} by: {2}'.format(
        number, result, proc_name))

if __name__ == '__main__':
    numbers = [5, 10, 15, 20, 25]
    procs = []
    for index, number in enumerate(numbers):
        proc = Process(target=doubler, args=(number,))
        procs.append(proc)
        proc.start()
    # A process can also be given an explicit name
    proc = Process(target=doubler, name='Test', args=(2,))
    proc.start()
    procs.append(proc)
    for proc in procs:
        proc.join()
5 doubled to 10 by: Process-8
20 doubled to 40 by: Process-11
10 doubled to 20 by: Process-9
15 doubled to 30 by: Process-10
25 doubled to 50 by: Process-12
2 doubled to 4 by: Test
Creating a process as a class
from multiprocessing import Process, current_process

class DoublerProcess(Process):
    def __init__(self, numbers):
        Process.__init__(self)
        self.numbers = numbers

    # Override run()
    def run(self):
        for number in self.numbers:
            result = number * 2
            proc_name = current_process().name
            print('{0} doubled to {1} by: {2}'.format(number, result, proc_name))

if __name__ == '__main__':
    dp = DoublerProcess([5, 20, 10, 15, 25])
    dp.start()
    dp.join()
5 doubled to 10 by: DoublerProcess-16
20 doubled to 40 by: DoublerProcess-16
10 doubled to 20 by: DoublerProcess-16
15 doubled to 30 by: DoublerProcess-16
25 doubled to 50 by: DoublerProcess-16
2.Lock
The code below is adapted from "Python多进程编程".
import multiprocessing

def worker_with(lock, f):
    # Lock supports the context-manager protocol, so it can be used in a with statement
    with lock:
        fs = open(f, 'a+')
        n = 10
        while n > 1:
            print('Lockd acquired via with')
            fs.write("Lockd acquired via with\n")
            n -= 1
        fs.close()

def worker_no_with(lock, f):
    # Acquire the lock
    lock.acquire()
    try:
        fs = open(f, 'a+')
        n = 10
        while n > 1:
            print('Lock acquired directly')
            fs.write("Lock acquired directly\n")
            n -= 1
        fs.close()
    finally:
        # Release the lock
        lock.release()

if __name__ == "__main__":
    lock = multiprocessing.Lock()
    f = "file.txt"
    w = multiprocessing.Process(target=worker_with, args=(lock, f))
    nw = multiprocessing.Process(target=worker_no_with, args=(lock, f))
    w.start()
    nw.start()
    w.join()
    nw.join()
    print('END!')
Lockd acquired via with
Lockd acquired via with
Lockd acquired via with
Lockd acquired via with
Lockd acquired via with
Lockd acquired via with
Lockd acquired via with
Lockd acquired via with
Lockd acquired via with
Lock acquired directly
Lock acquired directly
Lock acquired directly
Lock acquired directly
Lock acquired directly
Lock acquired directly
Lock acquired directly
Lock acquired directly
Lock acquired directly
END!
3.Pool
Pool maintains a specified number of worker processes for the caller. When a new request is submitted to the pool, a new process is created to execute it if the pool is not yet full; if the number of processes in the pool has already reached the configured maximum, the request waits until a process in the pool becomes free to execute it.
import time
import os
from multiprocessing import Pool, cpu_count

def f(msg):
    print('Starting: {}, PID: {}, Time: {}'.format(msg, os.getpid(), time.ctime()))
    time.sleep(3)
    print('Ending: {}, PID: {}, Time: {}'.format(msg, os.getpid(), time.ctime()))

if __name__ == '__main__':
    print('Starting Main Function')
    print('This Computer has {} CPU'.format(cpu_count()))
    # Create a pool of 4 worker processes
    p = Pool(4)
    for i in range(5):
        msg = 'Process {}'.format(i)
        # Submit the function and its arguments to the pool
        p.apply_async(f, (msg, ))
    # No more tasks may be submitted to the pool
    p.close()
    # Block the current process until all workers finish
    p.join()
    print('All Done!!!')
Starting Main Function
This Computer has 4 CPU
Starting: Process 2, PID: 8332, Time: Fri Sep 1 08:53:12 2017
Starting: Process 1, PID: 8331, Time: Fri Sep 1 08:53:12 2017
Starting: Process 0, PID: 8330, Time: Fri Sep 1 08:53:12 2017
Starting: Process 3, PID: 8333, Time: Fri Sep 1 08:53:12 2017
Ending: Process 2, PID: 8332, Time: Fri Sep 1 08:53:15 2017
Ending: Process 3, PID: 8333, Time: Fri Sep 1 08:53:15 2017
Starting: Process 4, PID: 8332, Time: Fri Sep 1 08:53:15 2017
Ending: Process 1, PID: 8331, Time: Fri Sep 1 08:53:15 2017
Ending: Process 0, PID: 8330, Time: Fri Sep 1 08:53:15 2017
Ending: Process 4, PID: 8332, Time: Fri Sep 1 08:53:18 2017
All Done!!!
This machine has 4 CPUs, so processes 0-3 start at the same time while process 4 waits; once one of processes 0-3 finishes, process 4 begins to run. Only after all of them complete does the main process continue and print "All Done!!!". apply_async() is non-blocking, whereas apply() is blocking.
Replacing apply_async() with apply()
import time
import os
from multiprocessing import Pool, cpu_count

def f(msg):
    print('Starting: {}, PID: {}, Time: {}'.format(msg, os.getpid(), time.ctime()))
    time.sleep(3)
    print('Ending: {}, PID: {}, Time: {}'.format(msg, os.getpid(), time.ctime()))

if __name__ == '__main__':
    print('Starting Main Function')
    print('This Computer has {} CPU'.format(cpu_count()))
    # Create a pool of 4 worker processes
    p = Pool(4)
    for i in range(5):
        msg = 'Process {}'.format(i)
        # apply() instead of apply_async(): blocks until the task finishes
        p.apply(f, (msg, ))
    # No more tasks may be submitted to the pool
    p.close()
    # Block the current process until all workers finish
    p.join()
    print('All Done!!!')
Starting Main Function
This Computer has 4 CPU
Starting: Process 0, PID: 8281, Time: Fri Sep 1 08:51:18 2017
Ending: Process 0, PID: 8281, Time: Fri Sep 1 08:51:21 2017
Starting: Process 1, PID: 8282, Time: Fri Sep 1 08:51:21 2017
Ending: Process 1, PID: 8282, Time: Fri Sep 1 08:51:24 2017
Starting: Process 2, PID: 8283, Time: Fri Sep 1 08:51:24 2017
Ending: Process 2, PID: 8283, Time: Fri Sep 1 08:51:27 2017
Starting: Process 3, PID: 8284, Time: Fri Sep 1 08:51:27 2017
Ending: Process 3, PID: 8284, Time: Fri Sep 1 08:51:30 2017
Starting: Process 4, PID: 8281, Time: Fri Sep 1 08:51:30 2017
Ending: Process 4, PID: 8281, Time: Fri Sep 1 08:51:33 2017
All Done!!!
As expected, the blocking version runs the tasks one after another: each task starts only after the previous one has finished.
Using get() to retrieve results
import time
import os
from multiprocessing import Pool, cpu_count

def f(msg):
    print('Starting: {}, PID: {}, Time: {}'.format(msg, os.getpid(), time.ctime()))
    time.sleep(3)
    print('Ending: {}, PID: {}, Time: {}'.format(msg, os.getpid(), time.ctime()))
    return 'Done {}'.format(msg)

if __name__ == '__main__':
    print('Starting Main Function')
    print('This Computer has {} CPU'.format(cpu_count()))
    # Create a pool of 4 worker processes
    p = Pool(4)
    results = []
    for i in range(5):
        msg = 'Process {}'.format(i)
        # Keep the AsyncResult objects so the return values can be fetched later
        results.append(p.apply_async(f, (msg, )))
    # No more tasks may be submitted to the pool
    p.close()
    # Block the current process until all workers finish
    p.join()
    for result in results:
        print(result.get())
    print('All Done!!!')
Starting Main Function
This Computer has 4 CPU
Starting: Process 0, PID: 8526, Time: Fri Sep 1 09:00:04 2017
Starting: Process 1, PID: 8527, Time: Fri Sep 1 09:00:04 2017
Starting: Process 2, PID: 8528, Time: Fri Sep 1 09:00:04 2017
Starting: Process 3, PID: 8529, Time: Fri Sep 1 09:00:04 2017
Ending: Process 1, PID: 8527, Time: Fri Sep 1 09:00:07 2017
Starting: Process 4, PID: 8527, Time: Fri Sep 1 09:00:07 2017
Ending: Process 3, PID: 8529, Time: Fri Sep 1 09:00:07 2017
Ending: Process 0, PID: 8526, Time: Fri Sep 1 09:00:07 2017
Ending: Process 2, PID: 8528, Time: Fri Sep 1 09:00:07 2017
Ending: Process 4, PID: 8527, Time: Fri Sep 1 09:00:10 2017
Done Process 0
Done Process 1
Done Process 2
Done Process 3
Done Process 4
All Done!!!
4.Queue
Queue is a multiprocess-safe queue that can be used to pass data between processes.
put() inserts an item into the queue. It takes two optional arguments: block and timeout. If block is True (the default) and timeout is a positive number, the method blocks for at most timeout seconds waiting for free space in the queue, and raises Queue.Full on timeout. If block is False and the queue is full, Queue.Full is raised immediately.
get() reads and removes one item from the queue. It also takes the optional block and timeout arguments. If block is True (the default) and timeout is a positive number, Queue.Empty is raised when no item can be retrieved within the timeout. If block is False, there are two cases: if an item is available it is returned immediately; otherwise, if the queue is empty, Queue.Empty is raised immediately.
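The Full and Empty exceptions mentioned above live in the standard-library queue module; a small sketch of the non-blocking and timeout behavior (no child process is needed to trigger them):

```python
import queue                     # Full and Empty are defined here
from multiprocessing import Queue

q = Queue(maxsize=1)
q.put('a')                       # fills the one-slot queue

full_raised = False
try:
    q.put('b', block=False)      # queue is full -> raises immediately
except queue.Full:
    full_raised = True

item = q.get()                   # 'a'

empty_raised = False
try:
    q.get(timeout=0.1)           # nothing left -> Empty after ~0.1 s
except queue.Empty:
    empty_raised = True

print(full_raised, item, empty_raised)
```

put_nowait() and get_nowait() are shorthands for the block=False forms.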
import time
from multiprocessing import Queue, Process

def write_queue(q):
    for i in ['first', 'two', 'three', 'four', 'five']:
        print('Write "{}" to Queue'.format(i))
        q.put(i)
        time.sleep(3)
    print('Write Done!')

def read_queue(q):
    print('Start to read!')
    while True:
        data = q.get()
        print('Read "{}" from Queue!'.format(data))

if __name__ == '__main__':
    q = Queue()
    wq = Process(target=write_queue, args=(q,))
    rq = Process(target=read_queue, args=(q,))
    wq.start()
    rq.start()
    # For the reader to consume items as they arrive, both processes must be
    # started non-blockingly first; only call wq.join() after rq has started
    wq.join()
    # read_queue loops forever, so the reader has to be stopped forcibly
    rq.terminate()
Write "first" to Queue
Start to read!
Read "first" from Queue!
Write "two" to Queue
Read "two" from Queue!
Write "three" to Queue
Read "three" from Queue!
Write "four" to Queue
Read "four" from Queue!
Write "five" to Queue
Read "five" from Queue!
Write Done!
5.Pipe
Pipe() returns a pair (conn1, conn2) representing the two ends of a pipe.
Pipe() takes a duplex argument. If duplex is True (the default), the pipe is bidirectional: both conn1 and conn2 can send and receive. If duplex is False, conn1 can only receive messages and conn2 can only send them.
send() and recv() send and receive messages respectively. For example, in duplex mode, conn1.send() sends a message and conn1.recv() receives one. If no message is available, recv() blocks until one arrives; if the other end of the pipe has been closed, recv() raises EOFError.
See also: 使用pipe管道使python fork多进程之间通信.
import time
from multiprocessing import Pipe, Process

def send_pipe(p):
    for i in ['first', 'two', 'three', 'four', 'five']:
        print('Send "{}" to Pipe'.format(i))
        p.send(i)
        time.sleep(3)
    print('Send Done!')

def receive_pipe(p):
    print('Start to receive!')
    while True:
        data = p.recv()
        print('Read "{}" from Pipe!'.format(data))

if __name__ == '__main__':
    sp_pipe, rp_pipe = Pipe()
    sp = Process(target=send_pipe, args=(sp_pipe,))
    rp = Process(target=receive_pipe, args=(rp_pipe,))
    sp.start()
    rp.start()
    sp.join()
    # receive_pipe loops forever, so it must be stopped forcibly
    rp.terminate()
Start to receive!
Send "first" to Pipe
Read "first" from Pipe!
Send "two" to Pipe
Read "two" from Pipe!
Send "three" to Pipe
Read "three" from Pipe!
Send "four" to Pipe
Read "four" from Pipe!
Send "five" to Pipe
Read "five" from Pipe!
Send Done!
6.Semaphore
Semaphore limits the number of concurrent accesses to a shared resource, such as the maximum number of connections in a pool.
import multiprocessing
import time

def worker(s, i):
    s.acquire()
    print(multiprocessing.current_process().name + "acquire")
    time.sleep(i)
    print(multiprocessing.current_process().name + "release\n")
    s.release()

if __name__ == "__main__":
    # At most 3 processes may hold the semaphore at the same time
    s = multiprocessing.Semaphore(3)
    for i in range(5):
        p = multiprocessing.Process(target=worker, args=(s, i*2))
        p.start()
Process-170acquire
Process-168acquire
Process-168release
Process-169acquire
Process-171acquire
Process-169release
Process-172acquire
Process-170release
Process-171release
Process-172release