Etcd client v3 study notes

These are notes I put together after reading the official Etcd client v3 source code, combined with my own understanding. Corrections are welcome wherever my understanding is inaccurate or incomplete.

Etcd version information

etcd v3.3.18 and grpc-go v1.26.0 (the versions referenced by all code paths below).

Etcd client implementation

Overview

  • Etcd client v3 is built on gRPC, and gRPC in turn is built on HTTP/2. The client therefore reuses the gRPC framework for address management, connection management, load balancing, and so on, while at the bottom it only needs to maintain a single HTTP/2 connection to each etcd server.
  • Etcd client v3 wraps a gRPC ClientConn, then implements a Resolver and a Balancer to manage it and interact with it.
  • Etcd client v3 implements gRPC's Resolver interface for etcd server address management. When the client is initialized, or when the server cluster addresses change (periodic address refresh can be configured), the Resolver resolves the new addresses and notifies the gRPC ClientConn to react to the change.
  • Etcd client v3 implements gRPC's Balancer interface for connection lifecycle management and load balancing. When the ClientConn, or one of the SubConns it owns, changes state, the Balancer is notified through this interface. The Balancer produces a Picker snapshot for the ClientConn, from which the ClientConn selects an available SubConn.
  • Underneath, each SubConn wraps an HTTP/2 connection, the ClientTransport. The ClientTransport keeps a heartbeat with the server; when the connection drops it reports upward, and the ClientConn tells the Balancer to rebuild the connection.
  • In short, the Balancer and Resolver are like the ClientConn's hands, but their concrete implementations are supplied by the user, while the SubConns are the ClientConn's internal organs, the foundation on which it actually works. All of this plumbing is hidden behind the client constructor; see the minimal usage sketch below.
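To ground the overview, here is a minimal, illustrative usage sketch (not taken from the etcd source) that creates a client and issues one request. The import path go.etcd.io/etcd/clientv3 is an assumption and varies with the etcd version; all the Resolver/Balancer plumbing described above is wired up inside clientv3.New.

package main

import (
    "context"
    "fmt"
    "time"

    "go.etcd.io/etcd/clientv3"
)

func main() {
    // The endpoint is a placeholder; point it at your own cluster.
    cli, err := clientv3.New(clientv3.Config{
        Endpoints:   []string{"127.0.0.1:2379"},
        DialTimeout: 5 * time.Second,
    })
    if err != nil {
        panic(err)
    }
    defer cli.Close()

    ctx, cancel := context.WithTimeout(context.Background(), 3*time.Second)
    defer cancel()
    if _, err := cli.Put(ctx, "/demo/key", "value"); err != nil {
        fmt.Println("put failed:", err)
        return
    }
    fmt.Println("put ok")
}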
Etcd client v3 structure diagram

Source code

Resolver

Code path: https://github.com/grpc/grpc-go/blob/v1.26.0/resolver/resolver.go

  1. The Resolver interface defines the methods the ClientConn calls on the Resolver
// Resolver watches for the updates on the specified target.
// Updates include address updates and service config updates.
type Resolver interface {
    // ResolveNow will be called by gRPC to try to resolve the target name
    // again. It's just a hint, resolver can ignore this if it's not necessary.
    //
    // It could be called multiple times concurrently.
    ResolveNow(ResolveNowOptions)
    // Close closes the resolver.
    Close()
}
  • ResolveNow: a hint from the ClientConn asking the Resolver to resolve the target again, e.g. before new connections are created
  • Close: shuts down the Resolver and releases related resources such as goroutines
  2. The ClientConn interface defines the methods the Resolver calls on the ClientConn
// ClientConn contains the callbacks for resolver to notify any updates
// to the gRPC ClientConn.
//
// This interface is to be implemented by gRPC. Users should not need a
// brand new implementation of this interface. For the situations like
// testing, the new implementation should embed this interface. This allows
// gRPC to add new methods to this interface.
type ClientConn interface {
    // UpdateState updates the state of the ClientConn appropriately.
    UpdateState(State)
    // ReportError notifies the ClientConn that the Resolver encountered an
    // error.  The ClientConn will notify the load balancer and begin calling
    // ResolveNow on the Resolver with exponential backoff.
    ReportError(error)
    // NewAddress is called by resolver to notify ClientConn a new list
    // of resolved addresses.
    // The address list should be the complete list of resolved addresses.
    //
    // Deprecated: Use UpdateState instead.
    NewAddress(addresses []Address)
    // NewServiceConfig is called by resolver to notify ClientConn a new
    // service config. The service config should be provided as a json string.
    //
    // Deprecated: Use UpdateState instead.
    NewServiceConfig(serviceConfig string)
    // ParseServiceConfig parses the provided service config and returns an
    // object that provides the parsed config.
    ParseServiceConfig(serviceConfigJSON string) *serviceconfig.ParseResult
}
  • UpdateState: when the Resolver's state changes (i.e. re-resolution produced new Addresses), it notifies the ClientConn to handle the change
  • ReportError: when the Resolver hits an error, it notifies the ClientConn to handle it
  • NewAddress: (deprecated, use UpdateState instead)
  • NewServiceConfig: (deprecated, use UpdateState instead)
  • ParseServiceConfig: parses the service config JSON string

Summary: the Resolver provides a user-defined hook for resolving and updating addresses, so users can implement their own address resolution, service discovery, address updates, and so on. In Etcd client v3 it mainly does two things: 1. resolve an address such as 127.0.0.1:2379 into a format the underlying connection can use; 2. update the configured addresses when the cluster membership changes. A minimal custom resolver sketch follows.
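As an illustration only (this is not etcd's actual endpoint resolver), here is a minimal static resolver written against the grpc-go v1.26 API quoted above; the package and type names are hypothetical, and in other grpc-go versions the option type names differ slightly.

package staticresolver

import (
    "google.golang.org/grpc/resolver"
)

// staticBuilder builds resolvers that always report a fixed address list.
// Register it once with resolver.Register(&staticBuilder{addrs: ...}) and dial
// a target such as "static:///whatever" to use it.
type staticBuilder struct {
    addrs []string
}

func (b *staticBuilder) Build(target resolver.Target, cc resolver.ClientConn, opts resolver.BuildOptions) (resolver.Resolver, error) {
    r := &staticResolver{cc: cc, addrs: b.addrs}
    r.push() // resolve immediately so the ClientConn can start creating SubConns
    return r, nil
}

func (b *staticBuilder) Scheme() string { return "static" }

type staticResolver struct {
    cc    resolver.ClientConn
    addrs []string
}

// push converts the raw endpoint strings into resolver.Address values and hands
// them to the ClientConn via UpdateState, which is exactly the notification
// path described above.
func (r *staticResolver) push() {
    addrs := make([]resolver.Address, 0, len(r.addrs))
    for _, a := range r.addrs {
        addrs = append(addrs, resolver.Address{Addr: a})
    }
    r.cc.UpdateState(resolver.State{Addresses: addrs})
}

// ResolveNow is only a hint from gRPC; for a static list we simply re-push the
// same addresses.
func (r *staticResolver) ResolveNow(resolver.ResolveNowOptions) { r.push() }

func (r *staticResolver) Close() {}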

Balancer

Code path: https://github.com/grpc/grpc-go/blob/v1.26.0/balancer/balancer.go

  1. The Balancer interface defines the methods the ClientConn (gRPC) calls on the Balancer
// Balancer takes input from gRPC, manages SubConns, and collects and aggregates
// the connectivity states.
//
// It also generates and updates the Picker used by gRPC to pick SubConns for RPCs.
//
// HandleSubConnectionStateChange, HandleResolvedAddrs and Close are guaranteed
// to be called synchronously from the same goroutine.
// There's no guarantee on picker.Pick, it may be called anytime.
type Balancer interface {
    // HandleSubConnStateChange is called by gRPC when the connectivity state
    // of sc has changed.
    // Balancer is expected to aggregate all the state of SubConn and report
    // that back to gRPC.
    // Balancer should also generate and update Pickers when its internal state has
    // been changed by the new state.
    //
    // Deprecated: if V2Balancer is implemented by the Balancer,
    // UpdateSubConnState will be called instead.
    HandleSubConnStateChange(sc SubConn, state connectivity.State)
    // HandleResolvedAddrs is called by gRPC to send updated resolved addresses to
    // balancers.
    // Balancer can create new SubConn or remove SubConn with the addresses.
    // An empty address slice and a non-nil error will be passed if the resolver returns
    // non-nil error to gRPC.
    //
    // Deprecated: if V2Balancer is implemented by the Balancer,
    // UpdateClientConnState will be called instead.
    HandleResolvedAddrs([]resolver.Address, error)
    // Close closes the balancer. The balancer is not required to call
    // ClientConn.RemoveSubConn for its existing SubConns.
    Close()
}
  • HandleSubConnStateChange: when the connectivity state of an underlying SubConn changes, the ClientConn calls this to notify the Balancer so it can maintain its internal state
  • HandleResolvedAddrs: when the Resolver resolves new addresses and calls UpdateState, the ClientConn calls this method; this is where new SubConns get created, among other work
  • Close: shuts down the Balancer and releases related resources such as goroutines
  2. The ClientConn interface defines the methods the Balancer calls on the ClientConn (note that this is the ClientConn interface in balancer.go, which is different from the one in resolver.go, although both are ultimately implemented by the same struct)
// ClientConn represents a gRPC ClientConn.
//
// This interface is to be implemented by gRPC. Users should not need a
// brand new implementation of this interface. For the situations like
// testing, the new implementation should embed this interface. This allows
// gRPC to add new methods to this interface.
type ClientConn interface {
    // NewSubConn is called by balancer to create a new SubConn.
    // It doesn't block and wait for the connections to be established.
    // Behaviors of the SubConn can be controlled by options.
    NewSubConn([]resolver.Address, NewSubConnOptions) (SubConn, error)
    // RemoveSubConn removes the SubConn from ClientConn.
    // The SubConn will be shutdown.
    RemoveSubConn(SubConn)

    // UpdateBalancerState is called by balancer to notify gRPC that some internal
    // state in balancer has changed.
    //
    // gRPC will update the connectivity state of the ClientConn, and will call pick
    // on the new picker to pick new SubConn.
    //
    // Deprecated: use UpdateState instead
    UpdateBalancerState(s connectivity.State, p Picker)

    // UpdateState notifies gRPC that the balancer's internal state has
    // changed.
    //
    // gRPC will update the connectivity state of the ClientConn, and will call pick
    // on the new picker to pick new SubConns.
    UpdateState(State)

    // ResolveNow is called by balancer to notify gRPC to do a name resolving.
    ResolveNow(resolver.ResolveNowOptions)

    // Target returns the dial target for this ClientConn.
    //
    // Deprecated: Use the Target field in the BuildOptions instead.
    Target() string
}
  • NewSubConn: called by the Balancer to create an underlying SubConn; the connection attempt itself is non-blocking
  • RemoveSubConn: called by the Balancer to remove an existing SubConn
  • UpdateBalancerState: (deprecated, use UpdateState instead)
  • UpdateState: called by the Balancer to tell the ClientConn that its internal state has changed and a new Picker has been produced
  • ResolveNow: called by the Balancer to ask the ClientConn to trigger the Resolver's ResolveNow and re-resolve
  • Target: (deprecated; use the Target field in BuildOptions instead)
  3. The Picker interface defines the methods the ClientConn calls on the Picker (a Picker is really a snapshot produced by the Balancer of the current state of the underlying SubConns; it picks an available SubConn for the ClientConn on every RPC, providing availability and load balancing)
// Picker is used by gRPC to pick a SubConn to send an RPC.
// Balancer is expected to generate a new picker from its snapshot every time its
// internal state has changed.
//
// The pickers used by gRPC can be updated by ClientConn.UpdateBalancerState().
//
// Deprecated: use V2Picker instead
type Picker interface {
    // Pick returns the SubConn to be used to send the RPC.
    // The returned SubConn must be one returned by NewSubConn().
    //
    // This functions is expected to return:
    // - a SubConn that is known to be READY;
    // - ErrNoSubConnAvailable if no SubConn is available, but progress is being
    //   made (for example, some SubConn is in CONNECTING mode);
    // - other errors if no active connecting is happening (for example, all SubConn
    //   are in TRANSIENT_FAILURE mode).
    //
    // If a SubConn is returned:
    // - If it is READY, gRPC will send the RPC on it;
    // - If it is not ready, or becomes not ready after it's returned, gRPC will
    //   block until UpdateBalancerState() is called and will call pick on the
    //   new picker. The done function returned from Pick(), if not nil, will be
    //   called with nil error, no bytes sent and no bytes received.
    //
    // If the returned error is not nil:
    // - If the error is ErrNoSubConnAvailable, gRPC will block until UpdateBalancerState()
    // - If the error is ErrTransientFailure or implements IsTransientFailure()
    //   bool, returning true:
    //   - If the RPC is wait-for-ready, gRPC will block until UpdateBalancerState()
    //     is called to pick again;
    //   - Otherwise, RPC will fail with unavailable error.
    // - Else (error is other non-nil error):
    //   - The RPC will fail with the error's status code, or Unknown if it is
    //     not a status error.
    //
    // The returned done() function will be called once the rpc has finished,
    // with the final status of that RPC.  If the SubConn returned is not a
    // valid SubConn type, done may not be called.  done may be nil if balancer
    // doesn't care about the RPC status.
    Pick(ctx context.Context, info PickInfo) (conn SubConn, done func(DoneInfo), err error)
}
  • Pick: picks an available SubConn for the ClientConn
  4. The SubConn interface defines what a sub-connection does. In gRPC one SubConn maps to a list of Addresses; gRPC tries them in order and stops at the first successful connection. In Etcd client v3's usage, one endpoint address maps to one SubConn.
    A SubConn starts in IDLE and does not connect until the Balancer explicitly calls Connect. If an error occurs on an established connection, it reconnects immediately; but once it falls back to IDLE it will not reconnect on its own unless Connect is called again.
// SubConn represents a gRPC sub connection.
// Each sub connection contains a list of addresses. gRPC will
// try to connect to them (in sequence), and stop trying the
// remainder once one connection is successful.
//
// The reconnect backoff will be applied on the list, not a single address.
// For example, try_on_all_addresses -> backoff -> try_on_all_addresses.
//
// All SubConns start in IDLE, and will not try to connect. To trigger
// the connecting, Balancers must call Connect.
// When the connection encounters an error, it will reconnect immediately.
// When the connection becomes IDLE, it will not reconnect unless Connect is
// called.
//
// This interface is to be implemented by gRPC. Users should not need a
// brand new implementation of this interface. For the situations like
// testing, the new implementation should embed this interface. This allows
// gRPC to add new methods to this interface.
type SubConn interface {
    // UpdateAddresses updates the addresses used in this SubConn.
    // gRPC checks if currently-connected address is still in the new list.
    // If it's in the list, the connection will be kept.
    // If it's not in the list, the connection will gracefully closed, and
    // a new connection will be created.
    //
    // This will trigger a state transition for the SubConn.
    UpdateAddresses([]resolver.Address)
    // Connect starts the connecting for this SubConn.
    Connect()
}
  • UpdateAddresses: updates the SubConn's address list; if the currently connected address is still in the new list, the connection is kept; if not, it is gracefully closed and a new connection is created
  • Connect: starts connecting

Summary: the Balancer provides a user-defined structure for load balancing and internal state management. Etcd client v3 implements a round-robin load balancer on top of these interfaces; a simplified sketch of the idea follows.
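As a rough illustration only (etcd's real balancer lives under clientv3/balancer and is considerably more involved), here is a toy round-robin balancer written against the v1 Balancer and Picker interfaces quoted above. All names are hypothetical, the balancer.Builder registration is omitted, and state aggregation and SubConn removal are deliberately simplified.

package rrbalancer

import (
    "context"
    "sync"

    "google.golang.org/grpc/balancer"
    "google.golang.org/grpc/connectivity"
    "google.golang.org/grpc/resolver"
)

// rrBalancer keeps one SubConn per resolved address and publishes a
// round-robin Picker over the SubConns that are currently READY.
type rrBalancer struct {
    cc       balancer.ClientConn
    subConns map[string]balancer.SubConn             // address -> SubConn
    states   map[balancer.SubConn]connectivity.State // SubConn -> last reported state
}

// newRRBalancer is what an (omitted) balancer.Builder would call in Build.
func newRRBalancer(cc balancer.ClientConn) *rrBalancer {
    return &rrBalancer{
        cc:       cc,
        subConns: make(map[string]balancer.SubConn),
        states:   make(map[balancer.SubConn]connectivity.State),
    }
}

// HandleResolvedAddrs creates a SubConn per new address and starts connecting;
// SubConns begin in IDLE, so Connect must be called explicitly.
func (b *rrBalancer) HandleResolvedAddrs(addrs []resolver.Address, err error) {
    if err != nil {
        return
    }
    for _, a := range addrs {
        if _, ok := b.subConns[a.Addr]; ok {
            continue
        }
        sc, cerr := b.cc.NewSubConn([]resolver.Address{a}, balancer.NewSubConnOptions{})
        if cerr != nil {
            continue
        }
        b.subConns[a.Addr] = sc
        b.states[sc] = connectivity.Idle
        sc.Connect()
    }
}

// HandleSubConnStateChange records the new state and publishes a fresh Picker
// snapshot built from the READY SubConns.
func (b *rrBalancer) HandleSubConnStateChange(sc balancer.SubConn, s connectivity.State) {
    b.states[sc] = s
    var ready []balancer.SubConn
    for conn, st := range b.states {
        if st == connectivity.Ready {
            ready = append(ready, conn)
        }
    }
    agg := connectivity.Connecting
    if len(ready) > 0 {
        agg = connectivity.Ready
    }
    b.cc.UpdateBalancerState(agg, &rrPicker{subs: ready})
}

func (b *rrBalancer) Close() {}

// rrPicker is an immutable snapshot; Pick rotates through the READY SubConns.
type rrPicker struct {
    mu   sync.Mutex
    subs []balancer.SubConn
    next int
}

func (p *rrPicker) Pick(ctx context.Context, info balancer.PickInfo) (balancer.SubConn, func(balancer.DoneInfo), error) {
    p.mu.Lock()
    defer p.mu.Unlock()
    if len(p.subs) == 0 {
        // No READY SubConn in this snapshot; gRPC will wait for the next Picker.
        return nil, nil, balancer.ErrNoSubConnAvailable
    }
    sc := p.subs[p.next]
    p.next = (p.next + 1) % len(p.subs)
    return sc, nil, nil
}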

Etcd client creation flow

New

Creating a new Etcd client v3 object mainly involves:

  1. Create the Client struct -> 2. Apply configuration -> 3. Set up the Resolver -> 4. Set up the Balancer -> 5. grpc dial -> 6. Create the sub-module structs -> 7. Start the address auto-sync goroutine -> 8. Return the Client struct

Core code

Code path: https://github.com/etcd-io/etcd/blob/v3.3.18/clientv3/client.go

func newClient(cfg *Config) (*Client, error) {
    // some preprocessing
    // ……

    // 1. Create the Client struct
    ctx, cancel := context.WithCancel(baseCtx)
    client := &Client{
        conn:     nil,
        cfg:      *cfg,
        creds:    creds,
        ctx:      ctx,
        cancel:   cancel,
        mu:       new(sync.RWMutex),
        callOpts: defaultCallOpts,
    }
    
    // 2. Apply configuration
    // ……

    // 3. Set up the Resolver
    // etcd actually wraps the Resolver in another layer, ResolverGroup, probably to support multiple clusters; normally there is only one cluster and therefore only one Resolver
    // Prepare a 'endpoint://<unique-client-id>/' resolver for the client and create a endpoint target to pass
    // to dial so the client knows to use this resolver.
    client.resolverGroup, err = endpoint.NewResolverGroup(fmt.Sprintf("client-%s", uuid.New().String()))
    if err != nil {
        client.cancel()
        return nil, err
    }
    client.resolverGroup.SetEndpoints(cfg.Endpoints)

    if len(cfg.Endpoints) < 1 {
        return nil, fmt.Errorf("at least one Endpoint must is required in client config")
    }
    dialEndpoint := cfg.Endpoints[0]

    // 4. Set up the Balancer / 5. grpc dial
    // Etcd client v3 registers a Balancer under the name roundRobinBalancerName, so it is enough to call grpc.WithBalancerName here; gRPC will invoke the corresponding Builder to create the already-configured Balancer.
    // dialWithBalancer ends up calling grpc.DialContext, which creates the ClientConn and the underlying SubConns
    // Use a provided endpoint target so that for https:// without any tls config given, then
    // grpc will assume the certificate server name is the endpoint host.
    conn, err := client.dialWithBalancer(dialEndpoint, grpc.WithBalancerName(roundRobinBalancerName))
    if err != nil {
        client.cancel()
        client.resolverGroup.Close()
        return nil, err
    }
    // TODO: With the old grpc balancer interface, we waited until the dial timeout
    // for the balancer to be ready. Is there an equivalent wait we should do with the new grpc balancer interface?
    client.conn = conn

    // 6. Create the sub-module structs
    // wrap the sub-modules and create their structs
    client.Cluster = NewCluster(client)
    client.KV = NewKV(client)
    client.Lease = NewLease(client)
    client.Watcher = NewWatcher(client)
    client.Auth = NewAuth(client)
    client.Maintenance = NewMaintenance(client)

    if cfg.RejectOldCluster {
        if err := client.checkVersion(); err != nil {
            client.Close()
            return nil, err
        }
    }

    // 7. Start the address auto-sync goroutine
    // if AutoSyncInterval is configured, this goroutine periodically pulls the current member list from the etcd servers and updates the addresses
    // if it is not configured, the goroutine exits immediately
    go client.autoSync()
    
    // 8. Return the Client struct
    return client, nil
}

Etcd client call flow

As the Client struct definition shows, Client is composed of several functional modules.

Code path: https://github.com/etcd-io/etcd/blob/v3.3.18/clientv3/client.go

// Client provides and manages an etcd v3 client session.
type Client struct {
    Cluster
    KV
    Lease
    Watcher
    Auth
    Maintenance
    
    // other fields
    // ……
}
  • Cluster: cluster module, mainly concerned with the cluster servers; cluster members can be added, removed, updated, and listed via RPC. Not covered in depth here.
  • KV: key-value module, the core create/read/update/delete of keys; the most commonly used module.
  • Lease: lease module, which manages auto-renewed sessions. Key-value records written under such a session are kept by the server until the session ends; when the session ends normally or renewal fails, the server automatically removes the records. Many features such as service discovery and distributed locks can be built on this module.
  • Watcher: watch module, used to watch for insert, update, and delete events on records.
  • Auth: authentication module. Not covered in depth here.
  • Maintenance: maintenance module. Not covered in depth here.

Get/Put/Delete

This functionality is implemented by the KV module; at the bottom it rests on the generated KVClient gRPC stub.

Code path: https://github.com/etcd-io/etcd/blob/v3.3.18/etcdserver/etcdserverpb/rpc.pb.go


// KVClient is the low-level support of the KV module
// KVClient is the client API for KV service.
//
// For semantics around ctx use and closing/ending streaming RPCs, please refer to https://godoc.org/google.golang.org/grpc#ClientConn.NewStream.
type KVClient interface {
    // Range gets the keys in the range from the key-value store.
    Range(ctx context.Context, in *RangeRequest, opts ...grpc.CallOption) (*RangeResponse, error)
    // Put puts the given key into the key-value store.
    // A put request increments the revision of the key-value store
    // and generates one event in the event history.
    Put(ctx context.Context, in *PutRequest, opts ...grpc.CallOption) (*PutResponse, error)
    // DeleteRange deletes the given range from the key-value store.
    // A delete request increments the revision of the key-value store
    // and generates a delete event in the event history for every deleted key.
    DeleteRange(ctx context.Context, in *DeleteRangeRequest, opts ...grpc.CallOption) (*DeleteRangeResponse, error)
    // Txn processes multiple requests in a single transaction.
    // A txn request increments the revision of the key-value store
    // and generates events with the same revision for every completed request.
    // It is not allowed to modify the same key several times within one txn.
    Txn(ctx context.Context, in *TxnRequest, opts ...grpc.CallOption) (*TxnResponse, error)
    // Compact compacts the event history in the etcd key-value store. The key-value
    // store should be periodically compacted or the event history will continue to grow
    // indefinitely.
    Compact(ctx context.Context, in *CompactionRequest, opts ...grpc.CallOption) (*CompactionResponse, error)
}

// the kVClient struct implements the KVClient interface
type kVClient struct {
    // the associated gRPC ClientConn
    cc *grpc.ClientConn
}

// Taking Put as an example: it ultimately calls the ClientConn's Invoke method
func (c *kVClient) Put(ctx context.Context, in *PutRequest, opts ...grpc.CallOption) (*PutResponse, error) {
    out := new(PutResponse)
    err := c.cc.Invoke(ctx, "/etcdserverpb.KV/Put", in, out, opts...)
    if err != nil {
        return nil, err
    }
    return out, nil
}

// ……

As you can see, the call into gRPC goes through the ClientConn's Invoke method, which eventually reaches the following function:

Code path: https://github.com/grpc/grpc-go/blob/v1.26.0/call.go

func invoke(ctx context.Context, method string, req, reply interface{}, cc *ClientConn, opts ...CallOption) error {
    // create a ClientStream: this acquires an underlying Transport and creates a Stream on top of it as the communication stream for this request
    cs, err := newClientStream(ctx, unaryStreamDesc, cc, method, opts...)
    if err != nil {
        return err
    }
    // send the request over the stream
    if err := cs.SendMsg(req); err != nil {
        return err
    }
    // receive the reply over the stream
    return cs.RecvMsg(reply)
}

Note: methods such as Get/Put/Delete are synchronous and blocking, and if a network error occurs during the request they return the error directly rather than retrying automatically; a caller-side sketch follows.
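The following is a minimal caller-side sketch (assuming an existing *clientv3.Client named cli and the import path go.etcd.io/etcd/clientv3): because KV calls block and do not retry, the caller owns both the deadline and any retry policy.

package main

import (
    "context"
    "fmt"
    "log"
    "time"

    "go.etcd.io/etcd/clientv3"
)

// getWithTimeout bounds a blocking Get with a context deadline and handles the
// error explicitly instead of relying on any automatic retry.
func getWithTimeout(cli *clientv3.Client, prefix string) {
    ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
    defer cancel()

    resp, err := cli.Get(ctx, prefix, clientv3.WithPrefix())
    if err != nil {
        // e.g. context.DeadlineExceeded; decide here whether to retry
        log.Println("get failed:", err)
        return
    }
    for _, kv := range resp.Kvs {
        fmt.Printf("%s -> %s\n", kv.Key, kv.Value)
    }
}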

Watch

This functionality is implemented by the Watcher module; the main implementing struct is watcher.

Code path: https://github.com/etcd-io/etcd/blob/v3.3.18/clientv3/watch.go

type Watcher interface {
    // Watch watches on a key or prefix. The watched events will be returned
    // through the returned channel. If revisions waiting to be sent over the
    // watch are compacted, then the watch will be canceled by the server, the
    // client will post a compacted error watch response, and the channel will close.
    // If the context "ctx" is canceled or timed out, returned "WatchChan" is closed,
    // and "WatchResponse" from this closed channel has zero events and nil "Err()".
    // The context "ctx" MUST be canceled, as soon as watcher is no longer being used,
    // to release the associated resources.
    //
    // If the context is "context.Background/TODO", returned "WatchChan" will
    // not be closed and block until event is triggered, except when server
    // returns a non-recoverable error (e.g. ErrCompacted).
    // For example, when context passed with "WithRequireLeader" and the
    // connected server has no leader (e.g. due to network partition),
    // error "etcdserver: no leader" (ErrNoLeader) will be returned,
    // and then "WatchChan" is closed with non-nil "Err()".
    // In order to prevent a watch stream being stuck in a partitioned node,
    // make sure to wrap context with "WithRequireLeader".
    //
    // Otherwise, as long as the context has not been canceled or timed out,
    // watch will retry on other recoverable errors forever until reconnected.
    //
    // TODO: explicitly set context error in the last "WatchResponse" message and close channel?
    // Currently, client contexts are overwritten with "valCtx" that never closes.
    // TODO(v3.4): configure watch retry policy, limit maximum retry number
    // (see https://github.com/etcd-io/etcd/issues/8980)
    Watch(ctx context.Context, key string, opts ...OpOption) WatchChan

    // RequestProgress requests a progress notify response be sent in all watch channels.
    RequestProgress(ctx context.Context) error

    // Close closes the watcher and cancels all watch requests.
    Close() error
}

// watcher implements the Watcher interface
type watcher struct {
    remote   pb.WatchClient
    callOpts []grpc.CallOption

    // mu protects the grpc streams map
    mu sync.RWMutex

    // streams holds all the active grpc streams keyed by ctx value.
    streams map[string]*watchGrpcStream
}

As you can see, the watcher struct holds a map, streams map[string]*watchGrpcStream, of all active watchGrpcStreams. A watchGrpcStream represents one communication stream built on gRPC, and each watchGrpcStream in turn carries a number of watcherStreams in its own map. Every Watch request creates a watcherStream, which is kept alive until it is canceled explicitly or an unrecoverable internal error occurs; for recoverable errors such as a dropped network connection, the client tries to rebuild the watcherStream. A caller-side usage sketch follows.
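From the caller's point of view all of this is hidden behind the returned channel. A minimal usage sketch (assuming an existing *clientv3.Client named cli and the import path go.etcd.io/etcd/clientv3):

package main

import (
    "context"
    "fmt"

    "go.etcd.io/etcd/clientv3"
)

// watchPrefix consumes events for a key prefix until ctx is canceled or the
// server closes the watch with an unrecoverable error (e.g. compaction).
func watchPrefix(ctx context.Context, cli *clientv3.Client) {
    // Cancel ctx when done, otherwise the underlying watcherStream stays alive.
    rch := cli.Watch(ctx, "/demo/", clientv3.WithPrefix())
    for wresp := range rch {
        if err := wresp.Err(); err != nil {
            fmt.Println("watch closed:", err)
            return
        }
        for _, ev := range wresp.Events {
            fmt.Printf("%s %s -> %s\n", ev.Type, ev.Kv.Key, ev.Kv.Value)
        }
    }
}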

Now look at the Watch method itself, which has three main phases:

  • 1 Preparation: apply the options and build the watchRequest struct
  • 2 Obtain a watchGrpcStream: reuse an existing one or create a new one
  • 3 Submission: send the watchRequest to the watchGrpcStream's reqc channel, then wait on the request's return channel retc for the WatchChan

Code path: https://github.com/etcd-io/etcd/blob/v3.3.18/clientv3/watch.go

// Watch posts a watch request to run() and waits for a new watcher channel
func (w *watcher) Watch(ctx context.Context, key string, opts ...OpOption) WatchChan {
    // 1. preparation: apply the options and build the watchRequest struct
    ow := opWatch(key, opts...)

    var filters []pb.WatchCreateRequest_FilterType
    if ow.filterPut {
        filters = append(filters, pb.WatchCreateRequest_NOPUT)
    }
    if ow.filterDelete {
        filters = append(filters, pb.WatchCreateRequest_NODELETE)
    }

    wr := &watchRequest{
        ctx:            ctx,
        createdNotify:  ow.createdNotify,
        key:            string(ow.key),
        end:            string(ow.end),
        rev:            ow.rev,
        progressNotify: ow.progressNotify,
        fragment:       ow.fragment,
        filters:        filters,
        prevKV:         ow.prevKV,
        retc:           make(chan chan WatchResponse, 1),
    }

    ok := false
    ctxKey := streamKeyFromCtx(ctx)

    // 2. reuse or create a watchGrpcStream
    // find or allocate appropriate grpc watch stream
    w.mu.Lock()
    if w.streams == nil {
        // closed
        w.mu.Unlock()
        ch := make(chan WatchResponse)
        close(ch)
        return ch
    }
    wgs := w.streams[ctxKey]
    if wgs == nil {
        wgs = w.newWatcherGrpcStream(ctx)
        w.streams[ctxKey] = wgs
    }
    // grab the watchGrpcStream's reqc and donec channels, used to submit requests to it and to receive its failure signal respectively
    donec := wgs.donec
    reqc := wgs.reqc
    w.mu.Unlock()

    // couldn't create channel; return closed channel
    closeCh := make(chan WatchResponse, 1)

    // 3. submit the request to the watchGrpcStream's reqc, then wait on the watchRequest's return channel retc for the WatchChan
    // submit request
    select {
    case reqc <- wr:
        ok = true
    case <-wr.ctx.Done():
    case <-donec:
        if wgs.closeErr != nil {
            closeCh <- WatchResponse{Canceled: true, closeErr: wgs.closeErr}
            break
        }
        // retry; may have dropped stream from no ctxs
        return w.Watch(ctx, key, opts...)
    }

    // receive channel
    if ok {
        select {
        case ret := <-wr.retc:
            return ret
        case <-ctx.Done():
        case <-donec:
            if wgs.closeErr != nil {
                closeCh <- WatchResponse{Canceled: true, closeErr: wgs.closeErr}
                break
            }
            // retry; may have dropped stream from no ctxs
            return w.Watch(ctx, key, opts...)
        }
    }

    close(closeCh)
    return closeCh
}

The line worth noting is wgs = w.newWatcherGrpcStream(ctx), which creates a watchGrpcStream.
This method creates a watchGrpcStream and then starts a goroutine to run its run method, where most of the logic lives.
run does the following:

  • 1 Some preparation
  • 2 Create a gRPC stream, the watch client, to carry the substreams produced by each Watch
  • 3 Enter a big wait loop
    • 3.1 Wait for new watch requests. When a new watch arrives, create a substream, start a goroutine to actually receive that substream's WatchResponses, and finally send the request out over the watch client
    • 3.2 When a response event arrives from the watch client, dispatch it to the corresponding substream
    • 3.3 If the watch client's communication fails (the network may be unreachable, or the server may be down), try to obtain a new watch client and resend the requests; this keeps watches from being interrupted by network jitter and the like
    • 3.4 Handle errors and shutdown

There are many more details inside, which are not covered here.

Code path: https://github.com/etcd-io/etcd/blob/v3.3.18/clientv3/watch.go

func (w *watcher) newWatcherGrpcStream(inctx context.Context) *watchGrpcStream {
    ctx, cancel := context.WithCancel(&valCtx{inctx})
    wgs := &watchGrpcStream{
        owner:      w,
        remote:     w.remote,
        callOpts:   w.callOpts,
        ctx:        ctx,
        ctxKey:     streamKeyFromCtx(inctx),
        cancel:     cancel,
        substreams: make(map[int64]*watcherStream),
        respc:      make(chan *pb.WatchResponse),
        reqc:       make(chan watchStreamRequest),
        donec:      make(chan struct{}),
        errc:       make(chan error, 1),
        closingc:   make(chan *watcherStream),
        resumec:    make(chan struct{}),
    }
    go wgs.run()
    return wgs
}

// run is the root of the goroutines for managing a watcher client
func (w *watchGrpcStream) run() {
    
    // 1. some preparation
    // ……

    // 2. create a gRPC stream, the watch client, to carry the substreams produced by each watch
    // start a stream with the etcd grpc server
    if wc, closeErr = w.newWatchClient(); closeErr != nil {
        return
    }

    cancelSet := make(map[int64]struct{})

    // 3. enter the big wait loop
    var cur *pb.WatchResponse
    for {
        select {
        // 3.1 wait for new watch requests
        // when a new watch arrives, create a substream, start a goroutine to actually receive that substream's WatchResponses, and finally send the request out over the watch client
        // Watch() requested
        case req := <-w.reqc:
            switch wreq := req.(type) {
            case *watchRequest:
                outc := make(chan WatchResponse, 1)
                // TODO: pass custom watch ID?
                ws := &watcherStream{
                    initReq: *wreq,
                    id:      -1,
                    outc:    outc,
                    // unbuffered so resumes won't cause repeat events
                    recvc: make(chan *WatchResponse),
                }

                ws.donec = make(chan struct{})
                w.wg.Add(1)
                // start a goroutine to actually handle the WatchResponses returned for this substream
                go w.serveSubstream(ws, w.resumec)

                // queue up for watcher creation/resume
                w.resuming = append(w.resuming, ws)
                if len(w.resuming) == 1 {
                    // head of resume queue, can register a new watcher
                    wc.Send(ws.initReq.toPB())
                }
            case *progressRequest:
                wc.Send(wreq.toPB())
            }

        // 3.2 a response event arrives from the watch client
        // dispatch it to the corresponding substream
        // new events from the watch client
        case pbresp := <-w.respc:
            if cur == nil || pbresp.Created || pbresp.Canceled {
                cur = pbresp
            } else if cur != nil && cur.WatchId == pbresp.WatchId {
                // merge new events
                cur.Events = append(cur.Events, pbresp.Events...)
                // update "Fragment" field; last response with "Fragment" == false
                cur.Fragment = pbresp.Fragment
            }

            switch {
            case pbresp.Created:
                // response to head of queue creation
                if ws := w.resuming[0]; ws != nil {
                    w.addSubstream(pbresp, ws)
                    w.dispatchEvent(pbresp)
                    w.resuming[0] = nil
                }

                if ws := w.nextResume(); ws != nil {
                    wc.Send(ws.initReq.toPB())
                }

                // reset for next iteration
                cur = nil

            case pbresp.Canceled && pbresp.CompactRevision == 0:
                delete(cancelSet, pbresp.WatchId)
                if ws, ok := w.substreams[pbresp.WatchId]; ok {
                    // signal to stream goroutine to update closingc
                    close(ws.recvc)
                    closing[ws] = struct{}{}
                }

                // reset for next iteration
                cur = nil

            case cur.Fragment:
                // watch response events are still fragmented
                // continue to fetch next fragmented event arrival
                continue

            default:
                // dispatch to appropriate watch stream
                ok := w.dispatchEvent(cur)

                // reset for next iteration
                cur = nil

                if ok {
                    break
                }

                // watch response on unexpected watch id; cancel id
                if _, ok := cancelSet[pbresp.WatchId]; ok {
                    break
                }

                cancelSet[pbresp.WatchId] = struct{}{}
                cr := &pb.WatchRequest_CancelRequest{
                    CancelRequest: &pb.WatchCancelRequest{
                        WatchId: pbresp.WatchId,
                    },
                }
                req := &pb.WatchRequest{RequestUnion: cr}
                wc.Send(req)
            }

        // 3.3 if the watch client fails (the network may be unreachable, or the server may be down), try to obtain a new watch client and resend the requests, so watches survive network jitter and the like
        // watch client failed on Recv; spawn another if possible
        case err := <-w.errc:
            if isHaltErr(w.ctx, err) || toErr(w.ctx, err) == v3rpc.ErrNoLeader {
                closeErr = err
                return
            }
            if wc, closeErr = w.newWatchClient(); closeErr != nil {
                return
            }
            if ws := w.nextResume(); ws != nil {
                wc.Send(ws.initReq.toPB())
            }
            cancelSet = make(map[int64]struct{})

        // 3.4 handling for errors and shutdown
        case <-w.ctx.Done():
            return

        case ws := <-w.closingc:
            w.closeSubstream(ws)
            delete(closing, ws)
            // no more watchers on this stream, shutdown
            if len(w.substreams)+len(w.resuming) == 0 {
                return
            }
        }
    }
}

Lease

This functionality is implemented by the Lease module; the main implementing struct is lessor.

Code path: https://github.com/etcd-io/etcd/blob/v3.3.18/clientv3/lease.go

type Lease interface {
    // Grant creates a new lease.
    Grant(ctx context.Context, ttl int64) (*LeaseGrantResponse, error)

    // Revoke revokes the given lease.
    Revoke(ctx context.Context, id LeaseID) (*LeaseRevokeResponse, error)

    // TimeToLive retrieves the lease information of the given lease ID.
    TimeToLive(ctx context.Context, id LeaseID, opts ...LeaseOption) (*LeaseTimeToLiveResponse, error)

    // Leases retrieves all leases.
    Leases(ctx context.Context) (*LeaseLeasesResponse, error)

    // KeepAlive keeps the given lease alive forever. If the keepalive response
    // posted to the channel is not consumed immediately, the lease client will
    // continue sending keep alive requests to the etcd server at least every
    // second until latest response is consumed.
    //
    // The returned "LeaseKeepAliveResponse" channel closes if underlying keep
    // alive stream is interrupted in some way the client cannot handle itself;
    // given context "ctx" is canceled or timed out. "LeaseKeepAliveResponse"
    // from this closed channel is nil.
    //
    // If client keep alive loop halts with an unexpected error (e.g. "etcdserver:
    // no leader") or canceled by the caller (e.g. context.Canceled), the error
    // is returned. Otherwise, it retries.
    //
    // TODO(v4.0): post errors to last keep alive message before closing
    // (see https://github.com/coreos/etcd/pull/7866)
    KeepAlive(ctx context.Context, id LeaseID) (<-chan *LeaseKeepAliveResponse, error)

    // KeepAliveOnce renews the lease once. The response corresponds to the
    // first message from calling KeepAlive. If the response has a recoverable
    // error, KeepAliveOnce will retry the RPC with a new keep alive message.
    //
    // In most of the cases, Keepalive should be used instead of KeepAliveOnce.
    KeepAliveOnce(ctx context.Context, id LeaseID) (*LeaseKeepAliveResponse, error)

    // Close releases all resources Lease keeps for efficient communication
    // with the etcd server.
    Close() error
}

type lessor struct {
    mu sync.Mutex // guards all fields

    // donec is closed and loopErr is set when recvKeepAliveLoop stops
    donec   chan struct{}
    loopErr error

    remote pb.LeaseClient

    stream       pb.Lease_LeaseKeepAliveClient
    streamCancel context.CancelFunc

    stopCtx    context.Context
    stopCancel context.CancelFunc

    keepAlives map[LeaseID]*keepAlive

    // firstKeepAliveTimeout is the timeout for the first keepalive request
    // before the actual TTL is known to the lease client
    firstKeepAliveTimeout time.Duration

    // firstKeepAliveOnce ensures stream starts after first KeepAlive call.
    firstKeepAliveOnce sync.Once

    callOpts []grpc.CallOption
}

Three methods are of particular interest (see the usage sketch after this list):

  • Grant: a single gRPC call asking the server to create a lease with the given TTL; it returns a lease ID (the lessor only starts tracking that ID in its keepAlives map once KeepAlive is called)
  • Revoke: a single gRPC call asking the server to destroy the lease with the given lease ID
  • KeepAlive: creates a stream plus two goroutines, one sending keepalive requests and one receiving responses, to keep the lease alive
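Before diving into the source, here is a caller-side sketch combining the three (assuming an existing *clientv3.Client named cli and the import path go.etcd.io/etcd/clientv3): grant a lease, bind a key to it, and keep it alive until the context is canceled.

package main

import (
    "context"
    "log"

    "go.etcd.io/etcd/clientv3"
)

// attachLease grants a 10-second lease, attaches a key to it, and keeps the
// lease alive; when ctx is canceled the lease stops being renewed and the key
// eventually expires on the server.
func attachLease(ctx context.Context, cli *clientv3.Client) error {
    grant, err := cli.Grant(ctx, 10) // TTL in seconds
    if err != nil {
        return err
    }
    if _, err := cli.Put(ctx, "/demo/lease-key", "v", clientv3.WithLease(grant.ID)); err != nil {
        return err
    }
    kaCh, err := cli.KeepAlive(ctx, grant.ID)
    if err != nil {
        return err
    }
    go func() {
        // Drain the channel; it closes when ctx is canceled or the keepalive stream is lost.
        for range kaCh {
        }
        log.Println("keepalive channel closed; the lease may expire")
    }()
    return nil
}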

Let's look at Grant and Revoke first.

Code path: https://github.com/etcd-io/etcd/blob/v3.3.18/clientv3/lease.go

func (l *lessor) Grant(ctx context.Context, ttl int64) (*LeaseGrantResponse, error) {
    r := &pb.LeaseGrantRequest{TTL: ttl}
    resp, err := l.remote.LeaseGrant(ctx, r, l.callOpts...)
    if err == nil {
        gresp := &LeaseGrantResponse{
            ResponseHeader: resp.GetHeader(),
            ID:             LeaseID(resp.ID),
            TTL:            resp.TTL,
            Error:          resp.Error,
        }
        return gresp, nil
    }
    return nil, toErr(ctx, err)
}

func (l *lessor) Revoke(ctx context.Context, id LeaseID) (*LeaseRevokeResponse, error) {
    r := &pb.LeaseRevokeRequest{ID: int64(id)}
    resp, err := l.remote.LeaseRevoke(ctx, r, l.callOpts...)
    if err == nil {
        return (*LeaseRevokeResponse)(resp), nil
    }
    return nil, toErr(ctx, err)
}

The underlying calls are similar to Get/Put/Delete:

Code path: https://github.com/etcd-io/etcd/blob/v3.3.18/etcdserver/etcdserverpb/rpc.pb.go

func (c *leaseClient) LeaseGrant(ctx context.Context, in *LeaseGrantRequest, opts ...grpc.CallOption) (*LeaseGrantResponse, error) {
    out := new(LeaseGrantResponse)
    err := c.cc.Invoke(ctx, "/etcdserverpb.Lease/LeaseGrant", in, out, opts...)
    if err != nil {
        return nil, err
    }
    return out, nil
}

func (c *leaseClient) LeaseRevoke(ctx context.Context, in *LeaseRevokeRequest, opts ...grpc.CallOption) (*LeaseRevokeResponse, error) {
    out := new(LeaseRevokeResponse)
    err := c.cc.Invoke(ctx, "/etcdserverpb.Lease/LeaseRevoke", in, out, opts...)
    if err != nil {
        return nil, err
    }
    return out, nil
}

Now for the key part, the KeepAlive implementation:

  1. Preparation
  2. Create or reuse the keepAlive struct for the given LeaseID, and associate the current ctx and ch with it
  3. Start the related goroutines. The most important line is go l.recvKeepAliveLoop(), which implements the periodic renewal with the server

Code path: https://github.com/etcd-io/etcd/blob/v3.3.18/clientv3/lease.go

func (l *lessor) KeepAlive(ctx context.Context, id LeaseID) (<-chan *LeaseKeepAliveResponse, error) {
    
    // 1. preparation
    ch := make(chan *LeaseKeepAliveResponse, LeaseResponseChSize)

    l.mu.Lock()
    // ensure that recvKeepAliveLoop is still running
    select {
    case <-l.donec:
        err := l.loopErr
        l.mu.Unlock()
        close(ch)
        return ch, ErrKeepAliveHalted{Reason: err}
    default:
    }
    
    // 2. create or reuse the keepAlive struct for the given LeaseID, and associate the current ctx and ch with it
    ka, ok := l.keepAlives[id]
    if !ok {
        // create fresh keep alive
        ka = &keepAlive{
            chs:           []chan<- *LeaseKeepAliveResponse{ch},
            ctxs:          []context.Context{ctx},
            deadline:      time.Now().Add(l.firstKeepAliveTimeout),
            nextKeepAlive: time.Now(),
            donec:         make(chan struct{}),
        }
        l.keepAlives[id] = ka
    } else {
        // add channel and context to existing keep alive
        ka.ctxs = append(ka.ctxs, ctx)
        ka.chs = append(ka.chs, ch)
    }
    l.mu.Unlock()

    // 3. start the related goroutines
    // the most important line is go l.recvKeepAliveLoop(),
    // which implements the periodic renewal with the server
    go l.keepAliveCtxCloser(id, ctx, ka.donec)
    l.firstKeepAliveOnce.Do(func() {
        go l.recvKeepAliveLoop()
        go l.deadlineLoop()
    })

    return ch, nil
}

Now look at the recvKeepAliveLoop method:

  • 1 Preparation
  • 2 Enter a big loop implementing the renewal logic; the outer loop exists so that a failed stream can be retried
    • 2.1 resetRecv creates a stream and internally starts a goroutine that keeps sending renewal requests
    • 2.2 Enter a small inner loop that keeps reading the server's responses from the stream created above
      • 2.2.1 stream.Recv() blocks until the server returns a message or an error occurs
      • 2.2.2 forward the server's response into the channels associated with the keepAlive
    • 2.3 If the inner loop exits, the stream has failed; wait 500ms and start the outer loop again

Code path: https://github.com/etcd-io/etcd/blob/v3.3.18/clientv3/lease.go

func (l *lessor) recvKeepAliveLoop() (gerr error) {
    // 1. preparation
    defer func() {
        l.mu.Lock()
        close(l.donec)
        l.loopErr = gerr
        for _, ka := range l.keepAlives {
            ka.close()
        }
        l.keepAlives = make(map[LeaseID]*keepAlive)
        l.mu.Unlock()
    }()

    // 2. the big loop implementing the renewal logic; it exists so that a failed stream can be retried
    for {
        // 2.1 resetRecv creates a stream and internally starts a goroutine that keeps sending renewal requests
        stream, err := l.resetRecv()
        if err != nil {
            if canceledByCaller(l.stopCtx, err) {
                return err
            }
        } else {
        // 2.2 the small inner loop keeps reading the server's responses from the stream created above
            for {
                // 2.2.1 stream.Recv() blocks until the server returns a message or an error occurs
                resp, err := stream.Recv()
                // error handling
                if err != nil {
                    if canceledByCaller(l.stopCtx, err) {
                        return err
                    }

                    if toErr(l.stopCtx, err) == rpctypes.ErrNoLeader {
                        l.closeRequireLeader()
                    }
                    break
                }

                // 2.2.2 forward the server's response into the channels associated with the keepAlive
                l.recvKeepAlive(resp)
            }
        }

        // 2.3 if the inner loop exits the stream has failed; wait retryConnWait (500ms) and start the outer loop again
        select {
        case <-time.After(retryConnWait):
            continue
        case <-l.stopCtx.Done():
            return l.stopCtx.Err()
        }
    }
}

Now look at the resetRecv method:

  • 1 Call remote's LeaseKeepAlive method to create a gRPC stream
  • 2 Start a goroutine running sendKeepAliveLoop, which keeps sending renewal requests
  • 3 Return the stream

Inside sendKeepAliveLoop, the loop wakes up every 500ms and sends a renewal request for every lease whose nextKeepAlive time has passed.

Code path: https://github.com/etcd-io/etcd/blob/v3.3.18/clientv3/lease.go

// resetRecv opens a new lease stream and starts sending keep alive requests.
func (l *lessor) resetRecv() (pb.Lease_LeaseKeepAliveClient, error) {
    // 1 call remote's LeaseKeepAlive method to create a gRPC stream
    sctx, cancel := context.WithCancel(l.stopCtx)
    stream, err := l.remote.LeaseKeepAlive(sctx, l.callOpts...)
    if err != nil {
        cancel()
        return nil, err
    }

    l.mu.Lock()
    defer l.mu.Unlock()
    if l.stream != nil && l.streamCancel != nil {
        l.streamCancel()
    }

    l.streamCancel = cancel
    l.stream = stream

    // 2 start a goroutine that keeps sending renewal requests
    go l.sendKeepAliveLoop(stream)
    
    // 3 return the stream
    return stream, nil
}

// sendKeepAliveLoop sends keep alive requests for the lifetime of the given stream.
func (l *lessor) sendKeepAliveLoop(stream pb.Lease_LeaseKeepAliveClient) {
    for {
        var tosend []LeaseID

        now := time.Now()
        l.mu.Lock()
        for id, ka := range l.keepAlives {
            if ka.nextKeepAlive.Before(now) {
                tosend = append(tosend, id)
            }
        }
        l.mu.Unlock()

        for _, id := range tosend {
            r := &pb.LeaseKeepAliveRequest{ID: int64(id)}
            if err := stream.Send(r); err != nil {
                // TODO do something with this error?
                return
            }
        }

        select {
        case <-time.After(500 * time.Millisecond):
        case <-stream.Context().Done():
            return
        case <-l.donec:
            return
        case <-l.stopCtx.Done():
            return
        }
    }
}

Note: from the code above we can see that:

  • Regardless of the TTL set at Grant time, sendKeepAliveLoop wakes roughly every 500ms and sends a renewal request for every lease whose nextKeepAlive time has passed; together with the 500ms retry wait in recvKeepAliveLoop's outer loop, the client re-checks whether to renew at sub-second granularity under normal conditions.
  • Unless a lease has been canceled or has already expired, renewal is retried automatically through network jitter.

Session

The Session feature is really just a wrapper around Lease, as NewSession shows:
it simply calls Grant and KeepAlive and packs everything into a Session struct for convenient use.

Code path: https://github.com/etcd-io/etcd/blob/v3.3.18/clientv3/concurrency/session.go

// NewSession gets the leased session for a client.
func NewSession(client *v3.Client, opts ...SessionOption) (*Session, error) {
    ops := &sessionOptions{ttl: defaultSessionTTL, ctx: client.Ctx()}
    for _, opt := range opts {
        opt(ops)
    }

    id := ops.leaseID
    if id == v3.NoLease {
        resp, err := client.Grant(ops.ctx, int64(ops.ttl))
        if err != nil {
            return nil, err
        }
        id = v3.LeaseID(resp.ID)
    }

    ctx, cancel := context.WithCancel(ops.ctx)
    keepAlive, err := client.KeepAlive(ctx, id)
    if err != nil || keepAlive == nil {
        cancel()
        return nil, err
    }

    donec := make(chan struct{})
    s := &Session{client: client, opts: ops, id: id, cancel: cancel, donec: donec}

    // keep the lease alive until client error or cancelled context
    go func() {
        defer close(donec)
        for range keepAlive {
            // eat messages until keep alive channel closes
        }
    }()

    return s, nil
}

Mutex

This module implements a distributed lock. It is built on the client side on top of Session (i.e. Lease); apart from supporting leases, the server does not need to provide anything extra. The rough idea is:

  • Create a key under the lock's key prefix and keep renewing it with the Session
  • All keys under the prefix get revisions in arrival order; the key with the smallest revision holds the lock
  • Holders of keys whose revision is not the smallest block until every key with a smaller revision has been deleted, at which point they acquire the lock

First, the Mutex struct:

  • s is the associated Session
  • pfx is the lock's key prefix
  • myKey is the key currently held
  • myRev is the revision of the key currently held
  • hdr is the header of the most recent response

Next, the Lock method:

  • 1 Register myKey under pfx. This uses a transaction, i.e. a single gRPC request that:
    • 1.1 checks whether myKey already exists
    • 1.2 puts it if it does not exist, or gets it if it does
    • 1.3 runs getOwner to read the pfx information
  • 2 Take the CreateRevision from the put or get response as myRev
  • 3 If there is no key under pfx, or the smallest CreateRevision equals myRev, the lock has been acquired
  • 4 Otherwise, wait until every key under pfx with a revision smaller than myRev has been deleted before the lock is acquired

Note: this implementation has a couple of problems, arguably a design flaw in Etcd client v3:

  • If a server-side network problem causes all keys under pfx to expire, or those keys are deleted externally, every node competing for the lock will believe it holds it
  • If a client holds the lock but its key expires, the client gets no notification and still believes it holds the lock, while at the same time another client acquires it

To work around this, the caller has to do two things manually (see the sketch after this list):

  • 1 Watch whether the Session associated with the Mutex has expired, and give up the lock if it has
  • 2 Watch whether the current myKey has been deleted, and give up the lock if it has
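The following is a sketch of mitigation 1 (assuming import paths go.etcd.io/etcd/clientv3 and go.etcd.io/etcd/clientv3/concurrency, which vary by etcd version): it acquires the lock through a Session and watches the Session's Done channel so the caller notices when the lease behind the lock is lost. Mitigation 2, watching myKey for deletion, would be an additional Watch and is not shown.

package main

import (
    "context"
    "log"

    "go.etcd.io/etcd/clientv3"
    "go.etcd.io/etcd/clientv3/concurrency"
)

// lockWithSessionGuard runs critical() while holding the lock, and cancels it
// if the session (and therefore the lock key's lease) is lost in the meantime.
func lockWithSessionGuard(ctx context.Context, cli *clientv3.Client, critical func(context.Context)) error {
    sess, err := concurrency.NewSession(cli, concurrency.WithTTL(10))
    if err != nil {
        return err
    }
    defer sess.Close()

    mu := concurrency.NewMutex(sess, "/demo/lock")
    if err := mu.Lock(ctx); err != nil {
        return err
    }
    defer mu.Unlock(context.Background())

    // Derive a context that is canceled as soon as the session is lost.
    guarded, cancel := context.WithCancel(ctx)
    defer cancel()
    go func() {
        <-sess.Done()
        log.Println("session lost; abandoning the lock")
        cancel()
    }()

    critical(guarded)
    return nil
}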

Finally, Unlock is simple: it just deletes myKey.

Code path: https://github.com/etcd-io/etcd/blob/v3.3.18/clientv3/concurrency/mutex.go


// Mutex implements the sync Locker interface with etcd
type Mutex struct {
    s *Session

    pfx   string
    myKey string
    myRev int64
    hdr   *pb.ResponseHeader
}

func NewMutex(s *Session, pfx string) *Mutex {
    return &Mutex{s, pfx + "/", "", -1, nil}
}

// Lock locks the mutex with a cancelable context. If the context is canceled
// while trying to acquire the lock, the mutex tries to clean its stale lock entry.
func (m *Mutex) Lock(ctx context.Context) error {
    s := m.s
    client := m.s.Client()

    // 1 register myKey under pfx
    // this uses a transaction, i.e. a single gRPC request that:
    // 1.1 checks whether myKey already exists
    // 1.2 puts it if it does not exist, or gets it if it does
    // 1.3 runs getOwner to read the pfx information
    m.myKey = fmt.Sprintf("%s%x", m.pfx, s.Lease())
    cmp := v3.Compare(v3.CreateRevision(m.myKey), "=", 0)
    // put self in lock waiters via myKey; oldest waiter holds lock
    put := v3.OpPut(m.myKey, "", v3.WithLease(s.Lease()))
    // reuse key in case this session already holds the lock
    get := v3.OpGet(m.myKey)
    // fetch current holder to complete uncontended path with only one RPC
    getOwner := v3.OpGet(m.pfx, v3.WithFirstCreate()...)
    resp, err := client.Txn(ctx).If(cmp).Then(put, getOwner).Else(get, getOwner).Commit()
    if err != nil {
        return err
    }
    // 2 take the CreateRevision from the put or get response as myRev
    m.myRev = resp.Header.Revision
    if !resp.Succeeded {
        m.myRev = resp.Responses[0].GetResponseRange().Kvs[0].CreateRevision
    }
    // 3 if there is no key under pfx, or the smallest CreateRevision equals myRev, the lock has been acquired
    // if no key on prefix / the minimum rev is key, already hold the lock
    ownerKey := resp.Responses[1].GetResponseRange().Kvs
    if len(ownerKey) == 0 || ownerKey[0].CreateRevision == m.myRev {
        m.hdr = resp.Header
        return nil
    }
    
    // 4 otherwise wait until every key under pfx with a revision smaller than myRev has been deleted before the lock is acquired
    // wait for deletion revisions prior to myKey
    hdr, werr := waitDeletes(ctx, client, m.pfx, m.myRev-1)
    // release lock key if wait failed
    if werr != nil {
        m.Unlock(client.Ctx())
    } else {
        m.hdr = hdr
    }
    return werr
}

func (m *Mutex) Unlock(ctx context.Context) error {
    client := m.s.Client()
    if _, err := client.Delete(ctx, m.myKey); err != nil {
        return err
    }
    m.myKey = "\x00"
    m.myRev = -1
    return nil
}