Etcd

StorageOS requires an etcd cluster in order to function. For more information on why etcd is required please see our etcd concepts page.

We do not support using the Kubernetes etcd for StorageOS installations.

We provide two methods for installing etcd. For those looking for a quick route to evaluating StorageOS, our first method installs etcd into your Kubernetes cluster using the CoreOS operator. Due to limitations in the CoreOS operator, this installation is not backed by any persistent storage, and therefore is unsuitable for production installations.

For production installations, there is currently no satisfactory way of installing a production-grade etcd cluster inside Kubernetes (although the landscape is changing rapidly, watch this space), and our production guidance remains to install etcd on separate machines outside of your Kubernetes cluster. This method is the best way to ensure a stable StorageOS cluster. Please see our etcd operations page for additional information on deployment best practices and concerns.

Ephemeral pods within Kubernetes (Testing)
External Virtual Machines (Production)


Testing - Installing Etcd Into Your Kubernetes Cluster

This fast and convenient method is useful for quickly creating an etcd cluster in order to evaluate StorageOS. Do not use it for production installations.

This method uses the CoreOS etcd-operator to install a 3 node etcd cluster within your Kubernetes cluster, in the storageos-etcd namespace. We then install a Kubernetes service in that same namespace.

The official etcd-operator repository also has a backup deployment operator that can help back up etcd data. Restoring the etcd keyspace from a backup might cause issues, because the metadata in the backup reflects a different point in time from the current cluster state. If you need to restore from a backup after an etcd failure, contact the StorageOS support team.

Quick Install

For a one-command install, the following script uses kubectl to create an etcd cluster in the storageos-etcd namespace. It requires kubectl on the system path and the current context set to the appropriate cluster.

```
curl -s https://docs.storageos.com/v2.3/sh/deploy-etcd.sh | bash
```

Installation Step by Step

To run the steps yourself, proceed as follows:

Configure Namespace
```
export NAMESPACE=storageos-etcd
```
Create Namespace
```
kubectl create namespace $NAMESPACE
```

If running in OpenShift, an SCC is needed to start Pods

```
oc adm policy add-scc-to-user anyuid system:serviceaccount:$NAMESPACE:default
```

Create ClusterRole and ClusterRoleBinding

$ kubectl -n $NAMESPACE create -f - <<END
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: etcd-operator
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: etcd-operator
subjects:
  - kind: ServiceAccount
    name: default
    namespace: $NAMESPACE
END

$ kubectl -n $NAMESPACE create -f-<<END
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: etcd-operator
rules:
- apiGroups:
  - etcd.database.coreos.com
  resources:
  - etcdclusters
  - etcdbackups
  - etcdrestores
  verbs:
  - "*"
- apiGroups:
  - apiextensions.k8s.io
  resources:
  - customresourcedefinitions
  verbs:
  - "*"
- apiGroups:
  - ""
  resources:
  - pods
  - services
  - endpoints
  - persistentvolumeclaims
  - events
  verbs:
  - "*"
- apiGroups:
  - apps
  resources:
  - deployments
  verbs:
  - "*"
# The following permissions can be removed if not using S3 backup and TLS
- apiGroups:
  - ""
  resources:
  - secrets
  verbs:
  - get
END

Deploy Etcd Operator

$ kubectl -n $NAMESPACE create -f - <<END
apiVersion: apps/v1
kind: Deployment
metadata:
  name: etcd-operator
spec:
  selector:
    matchLabels:
      app: etcd-operator
  replicas: 1
  template:
    metadata:
      labels:
        app: etcd-operator
    spec:
      containers:
      - name: etcd-operator
        image: quay.io/coreos/etcd-operator:v0.9.4
        command:
        - etcd-operator
        env:
        - name: MY_POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        - name: MY_POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
END

Wait for the Etcd Operator Pod to start

```
kubectl -n $NAMESPACE get pod -lapp=etcd-operator
```
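Instead of polling the pod list by hand, the wait can be scripted with `kubectl rollout status`. The following is a minimal sketch, assuming the Deployment is named `etcd-operator` as in the manifest above; the function name `wait_for_operator` and the `KUBECTL` override variable are illustrative and not part of StorageOS or the operator.

```shell
# Block until the etcd-operator Deployment reports a successful rollout.
# $1: namespace (defaults to storageos-etcd)
# $2: timeout   (defaults to 120s)
# KUBECTL may be set to use a different kubectl binary.
wait_for_operator() {
  "${KUBECTL:-kubectl}" -n "${1:-storageos-etcd}" \
    rollout status deployment/etcd-operator --timeout="${2:-120s}"
}
```

For example, `wait_for_operator "$NAMESPACE" 180s` returns non-zero if the rollout has not completed within three minutes.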

Create the EtcdCluster resource

$ kubectl -n $NAMESPACE create -f - <<END
apiVersion: "etcd.database.coreos.com/v1beta2"
kind: "EtcdCluster"
metadata:
  name: "storageos-etcd"
spec:
  size: 3
  version: "3.4.7"
  pod:
    etcdEnv:
    - name: ETCD_QUOTA_BACKEND_BYTES
      value: "2147483648"  # 2 GB
    - name: ETCD_AUTO_COMPACTION_RETENTION
      value: "1000" # Keep 1000 revisions (default)
    - name: ETCD_AUTO_COMPACTION_MODE
      value: "revision" # Set the revision mode
    resources:
      requests:
        cpu: 200m
        memory: 300Mi
    securityContext:
      runAsNonRoot: true
      runAsUser: 9000
      fsGroup: 9000
    tolerations:
    - operator: "Exists"
    affinity:
      podAntiAffinity:
        preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            labelSelector:
              matchExpressions:
              - key: etcd_cluster
                operator: In
                values:
                - storageos-etcd
            topologyKey: kubernetes.io/hostname
END
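The `ETCD_QUOTA_BACKEND_BYTES` value is a plain byte count. As a small sketch, shell arithmetic can derive it from a size expressed in GiB; `gib_to_bytes` is a hypothetical helper, not an etcd or operator command.

```shell
# Convert a size in GiB to bytes, for use as an etcd backend quota.
gib_to_bytes() {
  echo $(( $1 * 1024 * 1024 * 1024 ))
}

gib_to_bytes 2   # prints 2147483648, the quota used in the EtcdCluster resource
```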

Installation Verification

$ kubectl -n storageos-etcd get pod,svc
NAME                                 READY   STATUS    RESTARTS   AGE
pod/etcd-operator-55978c4587-8kx7b   1/1     Running   0          2h
pod/storageos-etcd-qm9tmrpnlm        1/1     Running   0          2h
pod/storageos-etcd-rzhjdz74hp        1/1     Running   0          2h
pod/storageos-etcd-wvvv2d9g98        1/1     Running   0          2h

NAME                            TYPE        CLUSTER-IP       EXTERNAL-IP PORT(S)             AGE
service/storageos-etcd          ClusterIP   None             <none>      2379/TCP,2380/TCP   22h
service/storageos-etcd-client   ClusterIP   172.30.132.255   <none>      2379/TCP            22h

The Service address storageos-etcd-client.storageos-etcd.svc:2379 is used later as the kvBackend.address value in the StorageOS Cluster CustomResource.

Known etcd-operator issues

Etcd is a distributed key-value store database focused on strong consistency. That means that etcd nodes perform operations across the cluster to ensure quorum. If quorum is lost, etcd nodes stop and etcd marks its contents as read-only. This is because it cannot guarantee that new data will be valid. Quorum is fundamental for etcd operations. When running etcd using the CoreOS Operator, it is important to consider that a loss of quorum could arise from etcd pods being evicted from nodes.

Operations such as Kubernetes Upgrades with rolling node pools could cause a total failure of the etcd cluster as nodes are discarded in favor of new ones.

A 3 node etcd cluster can survive losing one node and recover, and a 5 node cluster can survive the loss of two nodes. Losing any further nodes results in loss of quorum.
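These figures follow from etcd's majority rule: a cluster of N members needs N/2 + 1 of them (integer division) to keep quorum. A minimal sketch of that arithmetic follows; the function names `quorum` and `tolerated_failures` are illustrative only.

```shell
# Majority needed for an etcd cluster of N members to keep quorum.
quorum() {
  echo $(( $1 / 2 + 1 ))
}

# Number of members that can fail while the cluster still has quorum.
tolerated_failures() {
  echo $(( $1 - ($1 / 2 + 1) ))
}

for size in 3 5; do
  echo "members=$size quorum=$(quorum "$size") tolerated_failures=$(tolerated_failures "$size")"
done
```

Note that an even-sized cluster gains nothing: 4 members tolerate one failure, the same as 3.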

The etcd-operator doesn’t support a full stop of the cluster. Stopping the etcd cluster causes the loss of the entire etcd keystore and makes StorageOS unable to perform metadata changes.

Production - Etcd on External Virtual Machines

For production installations, StorageOS strongly recommends running etcd outside of Kubernetes on a minimum of 3 dedicated virtual machines. This topology offers strong guarantees of resilience and uptime. We recommend this architecture in all environments, including those where Kubernetes is being deployed as a managed service.

StorageOS doesn’t require a high performance etcd cluster, as the throughput of metadata to the cluster is low. However, we recommend carefully assessing the IOPS capacity of the etcd machines against etcd best practices to ensure that etcd operates normally.

Depending on the level of redundancy you feel comfortable with, you can install etcd on the Kubernetes Master nodes. Take extreme care to avoid collisions between the StorageOS etcd installation and the Kubernetes etcd when using the Kubernetes Master nodes. Precautions include changing the default client and peer ports and using a separate etcd data directory. The Ansible playbook below defaults the etcd installation directory to /var/lib/storageos-etcd.

You can choose between two installation options.

Manual Installation
Ansible Installation

Installation - Manual

This section documents the steps required for manual installation of etcd using standard package management commands and systemd manifests.

Repeat the following steps on all the nodes that will run etcd as a systemd service.

Configure Etcd version and ports
```
export ETCD_VERSION="3.4.9"
export CLIENT_PORT="2379"
export PEERS_PORT="2380"
```
If targeting Kubernetes Master nodes, you must change CLIENT_PORT and PEERS_PORT.
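On Kubernetes Master nodes, ports 2379 and 2380 are typically already used by the Kubernetes etcd. A minimal guard for the chosen values could look like the sketch below; `check_ports` is a hypothetical helper, and the alternative ports 12379 and 12380 are example values only.

```shell
# Return non-zero if either port is an etcd default (2379 client, 2380 peer)
# or if the client and peer ports are identical.
check_ports() {
  client_port="$1"
  peers_port="$2"
  for port in "$client_port" "$peers_port"; do
    case "$port" in
      2379|2380)
        echo "port $port collides with the default etcd ports" >&2
        return 1
        ;;
    esac
  done
  if [ "$client_port" = "$peers_port" ]; then
    echo "client and peer ports must differ" >&2
    return 1
  fi
  echo "ports $client_port/$peers_port look safe"
}

check_ports 12379 12380
```

This only checks against the defaults; it does not detect other services that may already listen on the chosen ports.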

Download Etcd from CoreOS official site

curl -L https://github.com/coreos/etcd/releases/download/v${ETCD_VERSION}/etcd-v${ETCD_VERSION}-linux-amd64.tar.gz -o /tmp/etcd-${ETCD_VERSION}-linux-amd64.tar.gz
mkdir -p /tmp/etcd-v${ETCD_VERSION}-linux-amd64
tar -xzvf /tmp/etcd-${ETCD_VERSION}-linux-amd64.tar.gz -C /tmp/etcd-v${ETCD_VERSION}-linux-amd64 --strip-components=1
rm /tmp/etcd-${ETCD_VERSION}-linux-amd64.tar.gz

Install Etcd binaries

cd /tmp/etcd-v${ETCD_VERSION}-linux-amd64
mv etcd /usr/local/sbin/etcd3
mv etcdctl /usr/local/sbin/etcdctl
chmod 0755 /usr/local/sbin/etcd3 /usr/local/sbin/etcdctl

Set up persistent Etcd data directory
```
mkdir /var/lib/storageos-etcd
```

Create the systemd environment file

On all nodes that will run etcd, create a systemd environment file /etc/etcd.conf that contains the IPs of all the nodes. NODE_IP must be changed to the IP of the node where the environment file resides. NODE1_IP, NODE2_IP and NODE3_IP remain the same across all three files.

$ cat <<END > /etc/etcd.conf
# NODE_IP is the IP of the node where this file resides.
NODE_IP=10.64.10.228
# Node 1 IP
NODE1_IP=10.64.10.228
# Node 2 IP
NODE2_IP=10.64.14.233
# Node 3 IP  
NODE3_IP=10.64.12.111
CLIENT_PORT=${CLIENT_PORT}
PEERS_PORT=${PEERS_PORT}
END

# Verify that variables are expanded in the file
$ cat /etc/etcd.conf
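The visual check can also be automated. Because the heredoc expands `${CLIENT_PORT}` and `${PEERS_PORT}` when the file is written, an unset variable shows up as an empty value rather than as an error. The sketch below, with the hypothetical helper name `verify_etcd_conf`, checks for both empty values and leftover `${...}` placeholders.

```shell
# Check that an etcd environment file defines every required key with a
# non-empty value and contains no unexpanded ${...} placeholders.
verify_etcd_conf() {
  conf="$1"
  for key in NODE_IP NODE1_IP NODE2_IP NODE3_IP CLIENT_PORT PEERS_PORT; do
    if ! grep -Eq "^${key}=.+" "$conf"; then
      echo "missing or empty: $key" >&2
      return 1
    fi
  done
  # A literal "${" means the shell did not expand a variable.
  if grep -Fq '${' "$conf"; then
    echo "unexpanded variable found in $conf" >&2
    return 1
  fi
  echo "$conf looks complete"
}
```

Run it as `verify_etcd_conf /etc/etcd.conf` on each node after writing the file.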

Create the systemd unit file for etcd3 service

Create a systemd unit file /etc/systemd/system/etcd3.service with the following information:

[Unit]
Description=etcd3
Documentation=https://github.com/coreos/etcd
Conflicts=etcd2.service

[Service]
Type=notify
Restart=always
RestartSec=5s
LimitNOFILE=40000
TimeoutStartSec=0
EnvironmentFile=/etc/etcd.conf

ExecStart=/usr/local/sbin/etcd3 --name etcd-${NODE_IP} \
    --data-dir /var/lib/storageos-etcd \
    --quota-backend-bytes 8589934592 \
    --auto-compaction-retention 1000 \
    --auto-compaction-mode revision \
    --initial-cluster-state new \
    --initial-cluster-token etcd-token \
    --listen-client-urls http://${NODE_IP}:${CLIENT_PORT},http://127.0.0.1:${CLIENT_PORT} \
    --advertise-client-urls http://${NODE_IP}:${CLIENT_PORT} \
    --listen-peer-urls http://${NODE_IP}:${PEERS_PORT} \
    --initial-advertise-peer-urls http://${NODE_IP}:${PEERS_PORT} \
    --initial-cluster etcd-${NODE1_IP}=http://${NODE1_IP}:${PEERS_PORT},etcd-${NODE2_IP}=http://${NODE2_IP}:${PEERS_PORT},etcd-${NODE3_IP}=http://${NODE3_IP}:${PEERS_PORT}


[Install]
WantedBy=multi-user.target

$NODE_IP is the IP address of the machine you are installing etcd on.

Note that setting advertise-client-urls incorrectly will cause all client connections to fail, and StorageOS will be unable to communicate with etcd.
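etcd expects each member's `--name` to match one of the names listed in `--initial-cluster`, so the long `--initial-cluster` value in the unit file is easy to get wrong by hand. As an illustration, the sketch below builds that value from a peer port and a list of node IPs, following the `etcd-<IP>` naming used in the unit file; `initial_cluster` is a hypothetical helper and it assumes plain HTTP peer URLs.

```shell
# Print the --initial-cluster value for the given peer port and node IPs.
# $1: peer port, remaining arguments: node IPs.
initial_cluster() {
  peers_port="$1"
  shift
  members=""
  for ip in "$@"; do
    # Append "etcd-<IP>=http://<IP>:<port>", comma-separated after the first.
    members="${members:+$members,}etcd-${ip}=http://${ip}:${peers_port}"
  done
  echo "$members"
}

initial_cluster 2380 10.64.10.228 10.64.14.233 10.64.12.111
```

The output can be compared against the expanded `--initial-cluster` flag on each node to confirm that all three nodes agree on the member list.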

If enabling TLS, it is recommended to generate your own CA certificate and key. You will need to distribute the keys and certificates for client authentication to all etcd nodes. The ExecStart value should then look as follows:

    ExecStart=/usr/local/sbin/etcd3 --name etcd-${NODE_IP} \
    --data-dir /var/lib/storageos-etcd \
    --quota-backend-bytes 8589934592 \
    --auto-compaction-retention 1000 \
    --auto-compaction-mode revision \
    --peer-auto-tls \
    --client-cert-auth --trusted-ca-file=/path/to/ca.pem \
    --cert-file=/path/to/client-cert.pem \
    --key-file=/path/to/client-key.pem \
    --initial-cluster-state new \
    --initial-cluster-token etcd-token \
    --listen-client-urls https://${NODE_IP}:${CLIENT_PORT} \
    --advertise-client-urls https://${NODE_IP}:${CLIENT_PORT} \
    --listen-peer-urls https://${NODE_IP}:${PEERS_PORT} \
    --initial-advertise-peer-urls https://${NODE_IP}:${PEERS_PORT} \
    --initial-cluster etcd-${NODE1_IP}=https://${NODE1_IP}:${PEERS_PORT},etcd-${NODE2_IP}=https://${NODE2_IP}:${PEERS_PORT},etcd-${NODE3_IP}=https://${NODE3_IP}:${PEERS_PORT}

Reload and start the etcd3 systemd service

$ systemctl daemon-reload
$ systemctl enable etcd3.service
$ systemctl start  etcd3.service

Installation Verification

The etcdctl binary is installed at /usr/local/sbin on the nodes.

$ ssh $NODE # Any node running the new etcd
$ ETCDCTL_API=3 etcdctl --endpoints=http://127.0.0.1:${CLIENT_PORT} member list # CLIENT_PORT is the client port configured earlier
66946cff1224bb5, started, etcd-b94bqkb9rf,  http://172.28.0.1:2380, http://172.28.0.1:2379
17e7256953f9319b, started, etcd-gjr25s4sdr, http://172.28.0.2:2380, http://172.28.0.2:2379
8b698843a4658823, started, etcd-rqdf9thx5p, http://172.28.0.3:2380, http://172.28.0.3:2379
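To turn this listing into a scripted check, the number of members in the `started` state can be counted. The sketch below parses the simple comma-separated output shown above from standard input; `started_members` is an illustrative name, not an etcdctl subcommand.

```shell
# Count members reported as "started" in `etcdctl member list` output
# (comma-separated format, read from standard input).
started_members() {
  awk -F', *' '$2 == "started" { n++ } END { print n + 0 }'
}
```

For example, `ETCDCTL_API=3 etcdctl --endpoints=http://127.0.0.1:2379 member list | started_members` should print 3 on a healthy three-node cluster.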

Read the etcd operations page for our etcd recommendations.

Installation - Ansible

For a repeatable and automated installation, we recommend using a configuration management tool such as Ansible. StorageOS provides an Ansible playbook to help you deploy etcd on standalone virtual machines.

Clone StorageOS deployment repository

git clone https://github.com/storageos/deploy.git
cd deploy/k8s/deploy-storageos/etcd-helpers/etcd-ansible-systemd

Edit the inventory file

The inventory file targets the nodes that will run etcd. The file hosts is an example of such an inventory file.

$ cat hosts
[nodes]
centos-1 ip="10.64.10.228" fqdn="ip-10-64-10-228.eu-west-2.compute.internal"
centos-2 ip="10.64.14.233" fqdn="ip-10-64-14-233.eu-west-2.compute.internal"
centos-3 ip="10.64.12.111" fqdn="ip-10-64-12-111.eu-west-2.compute.internal"

# Edit the inventory file
$ vi hosts # Or your own inventory file

The ip or fqdn value is used to build the advertise-client-urls of etcd. Failing to provide a valid ip or fqdn will cause all client connections to fail, and StorageOS will be unable to communicate with etcd.
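Since a missing ip or fqdn only surfaces later as failed client connections, it can help to check the inventory before running the playbook. The following sketch assumes the INI-style inventory format shown above; `check_inventory` is a hypothetical helper and is not part of the StorageOS playbook.

```shell
# Print inventory host entries that lack an ip="..." or fqdn="..." variable
# and exit non-zero if any are found. Blank lines, comments and [group]
# headers are skipped.
check_inventory() {
  awk '
    /^[ \t]*$/  { next }
    /^[ \t]*#/  { next }
    /^[ \t]*\[/ { next }
    !/ip="[^"]+"/ || !/fqdn="[^"]+"/ {
      print "incomplete host entry: " $1
      bad = 1
    }
    END { exit bad + 0 }
  ' "$1"
}
```

Running `check_inventory hosts` prints nothing and returns zero when every host line carries both variables.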

Edit the etcd configuration

If targeting Kubernetes Master nodes, you must change etcd_port_client and etcd_port_peers.

$ cat group_vars/all
etcd_version: "3.4.9"
etcd_port_client: "2379"
etcd_port_peers: "2380"
etcd_quota_bytes: 8589934592  # 8 GB
etcd_auto_compaction_mode: "revision"
etcd_auto_compaction_retention: "1000"
members: "{{ groups['nodes'] }}"
installation_dir: "/var/lib/storageos-etcd"
advertise_format: 'fqdn' # fqdn || ip
backup_file: "/tmp/backup.db"

tls:
  enabled: false
  ca_common_name: "eu-west-2.compute.internal"
  etcd_common_name: "*.eu-west-2.compute.internal"
  cert_dir: "/etc/etcdtls"
  ca_cert_file: "etcd-ca.pem"
  etcd_server_cert_file: "server.pem"
  etcd_server_key_file: "server-key.pem"
  etcd_client_cert_file: "etcd-client.crt"
  etcd_client_key_file: "etcd-client.key"

$ vi group_vars/all

Choose between using IP addressing or FQDN in the advertise_format parameter. It allows you to decide how Etcd advertises its address to clients. This is particularly relevant when using TLS.

If enabling TLS, it is recommended to generate your own CA certificate and key. You can generate the CA on the machine running Ansible with: ansible-playbook create_ca.yaml.

Install
```
ansible-playbook -i hosts install.yaml
```

Installation Verification

The playbook installs the etcdctl binary on the nodes, at /usr/local/bin.

$ ssh $NODE # Any node running the new etcd
$ ETCDCTL_API=3 etcdctl --endpoints=127.0.0.1:2379 member list
66946cff1224bb5, started, etcd-b94bqkb9rf,  http://172.28.0.1:2380, http://172.28.0.1:2379
17e7256953f9319b, started, etcd-gjr25s4sdr, http://172.28.0.2:2380, http://172.28.0.2:2379
8b698843a4658823, started, etcd-rqdf9thx5p, http://172.28.0.3:2380, http://172.28.0.3:2379

Benefits of Running External to Kubernetes

Etcd is a distributed key-value store database focused on strong consistency. That means that etcd nodes perform operations across the cluster to ensure quorum. If quorum is lost, an etcd node stops and marks its contents as read-only, because another peer might have a newer version that has not been committed to the database. Quorum is fundamental for etcd operations.

In a Kubernetes environment, applications are scheduled across nodes, and in some scenarios such as “DiskPressure” they may need to be evicted from a node and scheduled onto a different one. With an application such as etcd, this can result in quorum being lost, leaving the cluster unable to recover automatically. Usually a 3 node etcd cluster can survive losing one node and recover. However, losing a second node at the same time, or even a network partition between the nodes, will result in loss of quorum.

Bind Etcd IPs to Kubernetes Service

Kubernetes external services use a DNS name to reference external endpoints, making them easy to reference from inside the cluster. You can use the example from the helper GitHub repository to deploy the external Service. Using an external service can also make monitoring of etcd from Prometheus easier.
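One common Kubernetes pattern for this is a Service without a selector, paired with an Endpoints object of the same name that lists the external IPs. The sketch below prints such a manifest. It is an illustration of the general pattern, not the example from the helper repository, and the function name `external_etcd_manifest`, the Service name `storageos-etcd` and the port name `client` are assumptions.

```shell
# Print a selector-less Service and a matching Endpoints object that point
# at external etcd members.
# $1: namespace, $2: client port, remaining arguments: etcd member IPs.
external_etcd_manifest() {
  namespace="$1"
  client_port="$2"
  shift 2
  cat <<END
apiVersion: v1
kind: Service
metadata:
  name: storageos-etcd
  namespace: ${namespace}
spec:
  ports:
  - name: client
    port: ${client_port}
---
apiVersion: v1
kind: Endpoints
metadata:
  name: storageos-etcd
  namespace: ${namespace}
subsets:
- addresses:
END
  # One address entry per external etcd member.
  for ip in "$@"; do
    echo "  - ip: ${ip}"
  done
  cat <<END
  ports:
  - name: client
    port: ${client_port}
END
}

external_etcd_manifest storageos-etcd 2379 10.64.10.228 10.64.14.233 10.64.12.111
```

The output can be reviewed and then piped to `kubectl apply -f -`. With the names assumed here, etcd would be reachable inside the cluster at storageos-etcd.storageos-etcd.svc:2379.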

Using Etcd with StorageOS

During installation of StorageOS the kvBackend.address parameter of the StorageOS operator is used to specify the address of the etcd cluster. See the StorageOS cluster operator configuration examples for more information.