# Migrate Etcd
This procedure replaces the current nodes of your Etcd cluster with new ones. Migrating Etcd is a delicate operation, so we recommend a careful assessment of the required steps before starting.

The procedure is scoped to Etcd clusters that are provisioned externally to Kubernetes, with members that run on their own dedicated nodes.

Please note that this procedure involves full downtime of StorageOS and access to its data.
## Migration
To perform Etcd migrations safely, we recommend adding the new Etcd nodes to the existing Etcd cluster first, then editing the StorageOS custom resource, and finally deleting the old Etcd nodes one by one. The full procedure is as follows:
1. Prepare all the nodes that will host Etcd. The Etcd provisioning steps can be found in the Etcd section of the StorageOS prerequisites page.

    On each new node, edit the `/etc/etcd.conf` file to contain the IP addresses of all old and new nodes. In the `/etc/systemd/system/etcd3.service` file, ensure that the `--initial-cluster` argument includes all old and new nodes, and that the `--initial-cluster-state` argument is set to `existing`.

    For example, adding 3 further nodes to an initially 3 node cluster would involve populating `/etc/etcd.conf` on each of the fourth, fifth and sixth nodes with the IP addresses of all 6 nodes intended for the cluster, as well as environment variables for the client port and peers port:

    ```bash
    # NODE_IP is the IP of the node where this file resides.
    NODE_IP=192.168.152.142
    # Node 1 IP
    NODE1_IP=192.168.195.168
    # Node 2 IP
    NODE2_IP=192.168.202.40
    # Node 3 IP
    NODE3_IP=192.168.174.117
    # Node 4 IP
    NODE4_IP=192.168.152.142
    # Node 5 IP
    NODE5_IP=192.168.198.252
    # Node 6 IP
    NODE6_IP=192.168.150.122

    CLIENT_PORT=2379
    PEERS_PORT=2380
    ```
    The executable section of `/etc/systemd/system/etcd3.service` on each of the new nodes should then be edited to reflect these environment variables. For example, the service file for the second new node to be added to an initially 3 node cluster would refer to the three old nodes and two new nodes. Note that in the `--initial-cluster` argument we do not include members that have not yet been added to the cluster, apart from the member we are currently adding.

    ```bash
    ...
    ExecStart=/usr/local/sbin/etcd3 --name etcd-${NODE_IP} \
    ...
        --initial-cluster-state existing \
    ...
        --initial-cluster etcd-${NODE1_IP}=http://${NODE1_IP}:${PEERS_PORT},etcd-${NODE2_IP}=http://${NODE2_IP}:${PEERS_PORT},etcd-${NODE3_IP}=http://${NODE3_IP}:${PEERS_PORT},etcd-${NODE4_IP}=http://${NODE4_IP}:${PEERS_PORT},etcd-${NODE5_IP}=http://${NODE5_IP}:${PEERS_PORT}
    ...
    ```

    At runtime, the value of `--initial-cluster` will resolve to the following:

    ```
    "etcd-192.168.195.168=http://192.168.195.168:2380,etcd-192.168.202.40=http://192.168.202.40:2380,etcd-192.168.174.117=http://192.168.174.117:2380,etcd-192.168.152.142=http://192.168.152.142:2380,etcd-192.168.198.252=http://192.168.198.252:2380"
    ```
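    For reference, a complete unit file for one of the new nodes might look like the following sketch. Only the `--initial-cluster` and `--initial-cluster-state` arguments are prescribed by this procedure; the remaining flags and paths (such as the data directory) are assumptions and should match the unit files already running on your existing members.

    ```bash
    # /etc/systemd/system/etcd3.service -- illustrative sketch only.
    # Flags beyond --initial-cluster and --initial-cluster-state are
    # assumptions; copy the real values from an existing member's unit file.
    [Unit]
    Description=etcd3
    After=network-online.target

    [Service]
    EnvironmentFile=/etc/etcd.conf
    ExecStart=/usr/local/sbin/etcd3 --name etcd-${NODE_IP} \
        --data-dir /var/lib/etcd \
        --listen-client-urls http://${NODE_IP}:${CLIENT_PORT} \
        --advertise-client-urls http://${NODE_IP}:${CLIENT_PORT} \
        --listen-peer-urls http://${NODE_IP}:${PEERS_PORT} \
        --initial-advertise-peer-urls http://${NODE_IP}:${PEERS_PORT} \
        --initial-cluster-state existing \
        --initial-cluster etcd-${NODE1_IP}=http://${NODE1_IP}:${PEERS_PORT},etcd-${NODE2_IP}=http://${NODE2_IP}:${PEERS_PORT},etcd-${NODE3_IP}=http://${NODE3_IP}:${PEERS_PORT},etcd-${NODE4_IP}=http://${NODE4_IP}:${PEERS_PORT},etcd-${NODE5_IP}=http://${NODE5_IP}:${PEERS_PORT}
    Restart=always
    RestartSec=5

    [Install]
    WantedBy=multi-user.target
    ```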
2. Connect to an Etcd node that already belongs to the cluster, and add one of the new nodes to the cluster using the `etcdctl member add --learner` command, per the Etcd documentation. The output includes environment variables with the name of the new member, and the state and constituents of the cluster now that the new node has been added:

    ```bash
    # Name of the new member of Etcd
    ETCD_NEW_MEMBER="etcd-192.168.152.142"
    # Peer URL for the new member of Etcd, including the port
    ETCD_NEW_MEMBER_PEER="http://192.168.152.142:2380"
    # Client URL of an existing member, including the port; etcdctl must
    # connect to a member that is already part of the cluster
    ETCD_EXISTING_MEMBER_CLIENT="http://192.168.195.168:2379"

    # Add the new member to the cluster as a learner
    ETCDCTL_API=3 etcdctl member add $ETCD_NEW_MEMBER \
        --learner \
        --peer-urls="$ETCD_NEW_MEMBER_PEER" \
        --endpoints="$ETCD_EXISTING_MEMBER_CLIENT"
    ```

    ```
    Member 4a5820690b65300c added to cluster 97f788d9ca9b3357

    ETCD_NAME="etcd-192.168.152.142"
    ETCD_INITIAL_CLUSTER="etcd-192.168.202.40=http://192.168.202.40:2380,etcd-192.168.174.117=http://192.168.174.117:2380,etcd-192.168.152.142=http://192.168.152.142:2380,etcd-192.168.195.168=http://192.168.195.168:2380"
    ETCD_INITIAL_ADVERTISE_PEER_URLS="http://192.168.152.142:2380"
    ETCD_INITIAL_CLUSTER_STATE="existing"
    ```
3. Start the Etcd service on the new node:

    ```bash
    systemctl daemon-reload
    systemctl enable etcd3.service
    systemctl start etcd3.service
    ```

    List the members in the cluster and confirm that the new node is present and has started, and has therefore been added successfully. The final field indicates whether a member is a learner; note that it is `true` for the new node:

    ```bash
    $ ETCDCTL_API=3 etcdctl --endpoints=http://127.0.0.1:2379 member list
    aa9dec86cc9c2d1, started, etcd-192.168.202.40, http://192.168.202.40:2380, http://192.168.202.40:2379, false
    48823f29326a24f7, started, etcd-192.168.174.117, http://192.168.174.117:2380, http://192.168.174.117:2379, false
    4a5820690b65300c, started, etcd-192.168.152.142, http://192.168.152.142:2380, http://192.168.152.142:2379, true
    8211f1d0f64f3269, started, etcd-192.168.195.168, http://192.168.195.168:2380, http://192.168.195.168:2379, false
    ```
4. Promote the learner to a full voting member of the cluster with the `etcdctl member promote <member id>` command, for example as shown below. As per the Etcd documentation, the learner will fail to be promoted if it is not yet ready, that is, if its log is not yet in sync with the leader's.
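    For example, promoting the learner added above uses the member ID reported by `member add`; the endpoint here is an assumption and can be any existing voting member:

    ```bash
    # Promote the learner using the member ID from the earlier output
    $ ETCDCTL_API=3 etcdctl --endpoints=http://192.168.195.168:2379 \
        member promote 4a5820690b65300c
    Member 4a5820690b65300c promoted in cluster 97f788d9ca9b3357
    ```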
5. Repeat the previous two steps for each of the remaining nodes onto which you wish to migrate Etcd, for example as sketched below. Per the Etcd documentation, when adding more than one member to a cluster, it is best practice to configure each member one at a time, verifying that it starts successfully before moving on to the next.
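    The following is an illustrative sketch of that loop for the two remaining new nodes, not a drop-in script. It assumes SSH access as root to the new nodes, that each already has its `/etc/etcd.conf` and `etcd3.service` files in place, and that an existing member is reachable at `$EXISTING`:

    ```bash
    # Illustrative only: add, start, verify and promote each remaining node in turn.
    EXISTING="http://192.168.195.168:2379"

    for NODE in 192.168.198.252 192.168.150.122; do
        # Add the node to the cluster as a learner
        ETCDCTL_API=3 etcdctl --endpoints="$EXISTING" member add "etcd-${NODE}" \
            --learner --peer-urls="http://${NODE}:2380"

        # Start the service on the new node (assumes root SSH access)
        ssh "root@${NODE}" 'systemctl daemon-reload && systemctl enable --now etcd3.service'

        # Wait until the new member answers health checks
        until etcdctl --endpoints="http://${NODE}:2379" endpoint health; do sleep 5; done

        # Look up the member ID by name, then promote; promotion is retried
        # because it fails until the learner is in sync with the leader
        ID=$(ETCDCTL_API=3 etcdctl --endpoints="$EXISTING" member list |
            awk -F', ' -v name="etcd-${NODE}" '$3 == name {print $1}')
        until ETCDCTL_API=3 etcdctl --endpoints="$EXISTING" member promote "$ID"; do
            sleep 5
        done
    done
    ```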
6. Delete the StorageOS Custom Resource, per the instructions on the uninstall operations page.

    Although deleting the StorageOSCluster Custom Resource will stop StorageOS, the data will be safe. When StorageOS starts again, the Volumes will be available.
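    For illustration, deletion might look like the following; the resource name and namespace are assumptions, so list the resources first to find yours:

    ```bash
    # Find the StorageOSCluster resource (name and namespace vary per install)
    kubectl get storageoscluster --all-namespaces

    # Delete it; the volume data on the nodes remains intact
    kubectl delete storageoscluster storageos -n storageos
    ```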
7. Connect to one of the newly-added nodes, and use `etcdctl` to remove the old Etcd nodes one by one:

    ```bash
    $ ETCDCTL_API=3 etcdctl --endpoints=http://127.0.0.1:2379 member remove 48823f29326a24f7
    Member 48823f29326a24f7 removed from cluster 97f788d9ca9b3357
    ```

    After removing a node, check that Etcd is healthy and quorum has been maintained with the following commands:

    ```bash
    export ETCD_ENDPOINTS="http://192.168.202.40:2379,http://192.168.174.117:2379,..."
    etcdctl --endpoints $ETCD_ENDPOINTS endpoint health -w table
    etcdctl --endpoints $ETCD_ENDPOINTS endpoint status -w table
    ```
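    When the cluster is healthy, `endpoint health` prints a table along the following lines (the timings shown are placeholders):

    ```
    +-----------------------------+--------+-------------+-------+
    |          ENDPOINT           | HEALTH |    TOOK     | ERROR |
    +-----------------------------+--------+-------------+-------+
    | http://192.168.202.40:2379  | true   | 9.872195ms  |       |
    | http://192.168.174.117:2379 | true   | 10.124510ms |       |
    +-----------------------------+--------+-------------+-------+
    ```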
8. Edit the `kvBackend.address` parameter of the StorageOS Custom Resource to reflect the locations of the new Etcd nodes, removing the references to the old nodes, as sketched below.
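    The following is a minimal sketch of a `StorageOSCluster` resource with the new Etcd addresses; the metadata and the rest of the `spec` are assumptions and should be taken from your existing resource:

    ```yaml
    apiVersion: storageos.com/v1
    kind: StorageOSCluster
    metadata:
      name: storageos          # assumption: use your existing resource's name
      namespace: storageos     # assumption: use your existing namespace
    spec:
      # Comma-separated client URLs of the new Etcd members only
      kvBackend:
        address: "192.168.152.142:2379,192.168.198.252:2379,192.168.150.122:2379"
      # ...the rest of your original spec remains unchanged
    ```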
9. Recreate the StorageOS Custom Resource with the newly-edited `kvBackend.address` parameter.
10. Perform a `member list` again, to confirm that only the new machines are still present in the cluster:

    ```bash
    $ ETCDCTL_API=3 etcdctl --endpoints=http://127.0.0.1:2379 member list
    4a5820690b65300c, started, etcd-192.168.152.142, http://192.168.152.142:2380, http://192.168.152.142:2379, false
    b1780933f495adb3, started, etcd-192.168.150.122, http://192.168.150.122:2380, http://192.168.150.122:2379, false
    ee47ff85984afcc0, started, etcd-192.168.198.252, http://192.168.198.252:2380, http://192.168.198.252:2379, false
    ```
At this stage Etcd has been successfully migrated.
## Etcd Learners
As of v3.4, Etcd includes the functionality to add a new member as a “learner” (a non-voting member of the cluster). This makes the process of adding new members safer, as it allows a new member to synchronise fully with the cluster before it participates in quorum, reducing the risk to cluster availability while it catches up.
If you are using an earlier version of Etcd, make the following substitutions in the procedure above:

- In step 2, replace the `etcdctl member add --learner` command with `etcdctl member add` (see the sketch after this list)
- In step 4, skip the `etcdctl member promote <member id>` command
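Under that substitution, the add command from step 2 reduces to the following sketch, reusing the variables defined earlier; without `--learner`, the new member joins with voting rights immediately:

```bash
# Pre-v3.4 variant: no learner phase, the member votes as soon as it joins
ETCDCTL_API=3 etcdctl member add $ETCD_NEW_MEMBER \
    --peer-urls="$ETCD_NEW_MEMBER_PEER" \
    --endpoints="$ETCD_EXISTING_MEMBER_CLIENT"
```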
Using Etcd learners provides added resilience, so consider upgrading to a version of Etcd that supports them as soon as possible.