Membership change is a core component of the Raft protocol (specified in section 6 of the extended paper and discussed at length in Diego’s dissertation). But you bring up some good questions. In practice, there are certainly some requirements for safe configuration and a few different approaches common in real-world Raft implementations.
Generally, there are a couple of ways to bootstrap a Raft cluster: initialize the nodes with a configuration identifying each member of the cluster, or start the cluster with a single node and add nodes to the configuration (using the membership change protocol) to scale the cluster up to its intended size. Both give you the same end result; it’s just a matter of preference.
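To make the two bootstrap options concrete, here is a minimal sketch. It is a toy model, not a real Raft implementation: `Cluster`, `bootstrap_static`, and `bootstrap_incremental` are hypothetical names, and `add_member` only stands in for a membership change that has already been committed through the protocol.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Toy model of a cluster configuration (hypothetical, for illustration)."""
    members: set = field(default_factory=set)

    def add_member(self, node_id: str) -> None:
        # Stand-in for one committed membership change that adds a server.
        self.members = self.members | {node_id}


def bootstrap_static(node_ids):
    # Option 1: every node starts with the full member list.
    return Cluster(members=set(node_ids))


def bootstrap_incremental(first, others):
    # Option 2: start with one node, then grow one member at a time.
    cluster = Cluster(members={first})
    for node_id in others:
        cluster.add_member(node_id)
    return cluster


static = bootstrap_static(["a", "b", "c"])
grown = bootstrap_incremental("a", ["b", "c"])
print(static.members == grown.members)  # True
```

Either path ends with the same configuration; the difference is only in how you get there.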
One requirement for the cluster configuration is that each member has a fixed identity. If a follower acknowledges that it has persisted entries up to some index i and the leader marks that index committed, the leader should be able to assume entries 1 through i will exist on that follower in perpetuity, even if the follower restarts. So, the replica with that identity must always have that log.
But this requirement brings us to another use case for membership changes: replacing failed members. If that follower’s log gets corrupted or the host crashes and never returns, it should only be replaced by executing the membership change protocol: adding a new replica and removing the old one. Again, it’s important that one of the membership change protocols discussed in the Raft literature be used.
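As a rough sketch of what "adding a new replica and removing the old one" looks like as a sequence of configurations, assuming one server is changed per step: `replace_member` is a hypothetical helper, and it only computes the configurations. It does not replicate or commit anything.

```python
def replace_member(config: frozenset, failed: str, replacement: str) -> list:
    """Return the configurations a replacement passes through, one change per step.

    Hypothetical helper for illustration; a real implementation would commit
    each configuration through the membership change protocol before moving on.
    """
    if failed not in config:
        raise ValueError(f"{failed!r} is not a member")
    if replacement in config:
        # The new replica needs a fresh identity: reusing the old one would
        # break the assumption that this identity still holds the old log.
        raise ValueError("replacement must have a fresh identity")
    with_new = config | {replacement}   # step 1: add the new replica
    without_old = with_new - {failed}   # step 2: remove the failed one
    return [config, with_new, without_old]


steps = replace_member(frozenset({"a", "b", "c"}), failed="c", replacement="d")
for step in steps:
    print(sorted(step))
```

The order here simply mirrors the description above; the point is that the replacement comes in under a new identity rather than silently taking over the old one.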
Keep in mind that changing the number of nodes in the cluster can change the quorum size as well, and this is what makes membership changes difficult to handle. While the quorum size is changing, the protocol needs to ensure that committed entries are still stored on a majority of nodes. To resize the quorum safely and avoid disruptions, the membership change protocol must be implemented precisely.
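A small sketch of the quorum arithmetic shows why this is delicate. The function names are hypothetical, and `committed_joint` is only a simplified model of the joint consensus rule from the Raft paper, where an entry needs a majority of both the old and the new configuration.

```python
def quorum_size(n: int) -> int:
    # Smallest number of nodes that forms a strict majority of n.
    return n // 2 + 1


def has_majority(acks: set, config: set) -> bool:
    # Only acknowledgements from members of this configuration count.
    return len(acks & config) >= quorum_size(len(config))


def committed_joint(acks: set, old: set, new: set) -> bool:
    # Simplified joint consensus rule: majorities of BOTH configurations.
    return has_majority(acks, old) and has_majority(acks, new)


old = {"a", "b", "c"}            # quorum of 2
new = {"a", "b", "c", "d", "e"}  # quorum of 3

# If some nodes still use `old` while others already use `new`,
# two disjoint groups can each believe they hold a majority.
group_old = {"a", "b"}
group_new = {"c", "d", "e"}
print(has_majority(group_old, old), has_majority(group_new, new))  # True True
print(group_old & group_new)  # set()

# Under the joint rule, neither group can commit on its own.
print(committed_joint(group_old, old, new))  # False
print(committed_joint(group_new, old, new))  # False
```

Growing from three to five nodes moves the quorum from two to three, and a naive switch leaves a window in which `{a, b}` and `{c, d, e}` could each act as a majority. Requiring both majorities, or changing only one server at a time, closes that window.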