I've been working through the MIT 6.824 labs recently, and in Part III of Lab 1 I ran into a channel-related bug that I want to record here.
Code first:
package mapreduce

import (
	"fmt"
	"sync"
)

//
// schedule() starts and waits for all tasks in the given phase (mapPhase
// or reducePhase). the mapFiles argument holds the names of the files that
// are the inputs to the map phase, one per map task. nReduce is the
// number of reduce tasks. the registerChan argument yields a stream
// of registered workers; each item is the worker's RPC address,
// suitable for passing to call(). registerChan will yield all
// existing registered workers (if any) and new ones as they register.
//
func schedule(jobName string, mapFiles []string, nReduce int, phase jobPhase, registerChan chan string) {
	var ntasks int
	var n_other int // number of inputs (for reduce) or outputs (for map)
	switch phase {
	case mapPhase:
		ntasks = len(mapFiles)
		n_other = nReduce
	case reducePhase:
		ntasks = nReduce
		n_other = len(mapFiles)
	}

	fmt.Printf("Schedule: %v %v tasks (%d I/Os)\n", ntasks, phase, n_other)

	// All ntasks tasks have to be scheduled on workers. Once all tasks
	// have completed successfully, schedule() should return.
	//
	// Your code here (Part III, Part IV).
	//
	var wg sync.WaitGroup
	//workIndex := make(chan int)
	//
	//workIndex <- 0
	//flag := make(chan bool)
	//flag <- false
	for i := 0; i < ntasks; {
		select {
		case workAddress := <-registerChan:
			//fmt.Printf("WorkName : %s the WorkNum : %d \n", workAddress, i)
			wg.Add(1)
			go func(workAddress string, jobName string, taskIndex int, numOtherPhase int) {
				//taskIndex := <- taskNumber
				oneTask := DoTaskArgs{
					JobName:       jobName,
					File:          mapFiles[taskIndex],
					Phase:         phase,
					TaskNumber:    taskIndex,
					NumOtherPhase: numOtherPhase,
				}
				ok := call(workAddress, "Worker.DoTask", oneTask, nil)
				//fmt.Printf("the %d task have return value \n", taskIndex)
				if ok {
					i++
					// Call wg.Done() before handing the worker back: the send
					// below can block once the loop has stopped receiving.
					wg.Done()
					registerChan <- workAddress
				} else {
					fmt.Printf("Phase: %s work error , work index is %d \n", phase, i)
				}
			}(workAddress, jobName, i, n_other)
		default:
		}
	}
	//fmt.Println("jump jump !!")
	wg.Wait()
	fmt.Printf("Schedule: %v done\n", phase)
}
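This is the final version, which no longer has the bug. Its content is identical to my original buggy version; the only difference is the order of two statements. Here is the buggy version: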
if ok {
	i++
	registerChan <- workAddress
	wg.Done()
} else {
	fmt.Printf("Phase: %s work error , work index is %d \n", phase, i)
}
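All I did was swap the order of wg.Done() and registerChan <- workAddress, and the bug went away. You have probably spotted the reason already: a send on the channel blocks until someone receives the value. If registerChan <- workAddress comes before wg.Done(), then once the for loop has exited there is no <- registerChan left to take the value out of the channel. The send blocks forever, wg.Done() is never executed, and so wg.Wait() blocks forever too.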
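To make the effect easy to reproduce outside the lab, here is a minimal sketch. It assumes registerChan is unbuffered, and the helper name waitReturns, the worker name and the 200ms timeout are all made up for this illustration. A single goroutine hands its worker back on a channel that nobody reads from, in either order, and the helper reports whether wg.Wait() manages to return.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// waitReturns reports whether wg.Wait() returns within 200ms when one
// goroutine sends on a channel that has no receiver.
// (Hypothetical helper, not part of the lab code.)
func waitReturns(sendBeforeDone bool) bool {
	registerChan := make(chan string) // assumed unbuffered
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		if sendBeforeDone {
			registerChan <- "worker-0" // blocks forever: nobody receives
			wg.Done()                  // never reached
		} else {
			wg.Done()                  // the task is counted as finished first
			registerChan <- "worker-0" // still blocks, but only this goroutine
		}
	}()

	// Run wg.Wait() in its own goroutine so we can give up after a timeout.
	done := make(chan struct{})
	go func() {
		wg.Wait()
		close(done)
	}()
	select {
	case <-done:
		return true
	case <-time.After(200 * time.Millisecond):
		return false
	}
}

func main() {
	fmt.Println("send before Done, Wait returns:", waitReturns(true))
	fmt.Println("Done before send, Wait returns:", waitReturns(false))
}
```

The first call should report false and the second true. Note that in the second case the goroutine is still stuck on its send; it just no longer holds up wg.Wait().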
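In the code at the top, the commented-out channels serve no purpose. But if you uncomment them, the code hangs forever as well, for the same reason: each channel is sent to right after it is created, and nothing ever receives from it.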
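A small sketch of why a statement like workIndex <- 0 can never complete on its own. The trySend helper is hypothetical; it uses a select with a default branch to attempt the send without blocking, so the program can report what a plain send would have done.

```go
package main

import "fmt"

// trySend attempts a non-blocking send and reports whether it went through.
// (Hypothetical helper for illustration.)
func trySend(ch chan int, v int) bool {
	select {
	case ch <- v:
		return true
	default:
		return false // a plain `ch <- v` would block at this point
	}
}

func main() {
	unbuffered := make(chan int)  // like the commented-out workIndex
	buffered := make(chan int, 1) // has room for one value

	fmt.Println(trySend(unbuffered, 0)) // false: no receiver is waiting
	fmt.Println(trySend(buffered, 0))   // true: the value fits in the buffer
	fmt.Println(trySend(buffered, 1))   // false: the buffer is already full
}
```

Giving the channel a buffer only postpones the problem: once the buffer is full, the next send waits for a receiver again.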
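To sum up: when using channels, every send needs a matching receive. The two must come in pairs, otherwise strange bugs show up very easily (usually the program waiting forever).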
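One possible way to keep a send from being stranded when its receiver may have gone away is to pair it with a "done" signal in a select. This is only a sketch of that idea, not what the lab requires; the release function and the done channel are names I made up here.

```go
package main

import "fmt"

// release tries to hand addr back on workers, but gives up as soon as
// done is closed, so the caller never blocks forever.
// (Hypothetical helper, not part of the lab code.)
func release(workers chan<- string, addr string, done <-chan struct{}) bool {
	select {
	case workers <- addr:
		return true
	case <-done:
		return false
	}
}

func main() {
	workers := make(chan string)
	done := make(chan struct{})
	result := make(chan bool)

	// Case 1: someone receives, so the worker is handed back.
	go func() { result <- release(workers, "worker-0", done) }()
	fmt.Println(<-workers) // worker-0
	fmt.Println(<-result)  // true

	// Case 2: nobody receives any more; closing done lets the sender exit.
	go func() { result <- release(workers, "worker-1", done) }()
	close(done)
	fmt.Println(<-result) // false
}
```

In a scheduler like the one above, the done channel would be closed after wg.Wait() returns, so goroutines that are still trying to return a worker can exit instead of leaking.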