StorageOS Best Practices
Use an external etcd cluster
StorageOS uses the etcd distributed key-value store to store essential cluster metadata and manage distributed configuration state. For production environments and for testing production workloads, we recommend deploying an external etcd cluster. For details on how to run etcd, including a worked example, see the External etcd Operations page.
It is highly recommended to use an external etcd cluster in cloud environments and to place it on stable nodes. Placing etcd on nodes that are recycled frequently can disrupt normal StorageOS operations.
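Once the external etcd cluster is running, its health can be verified with etcd's own etcdctl client. The endpoints and certificate paths below are placeholders for illustration; substitute the values for your deployment:
# Check the health of each etcd endpoint (v3 API); endpoints and TLS paths are examples only
ETCDCTL_API=3 etcdctl \
    --endpoints=https://etcd-1:2379,https://etcd-2:2379,https://etcd-3:2379 \
    --cacert=/etc/etcd/ca.crt --cert=/etc/etcd/etcd.crt --key=/etc/etcd/etcd.key \
    endpoint health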
Etcd low latency IO
It is recommended to run etcd on low-latency disks and to keep other IO-intensive applications away from the etcd nodes. Etcd is very sensitive to IO latency, so disk contention can cause etcd downtime.
Batch jobs such as backups, builds or application bundling can easily drive disk usage high enough to make etcd unstable. It is recommended to run such workloads away from the etcd servers.
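To gauge whether a disk is fast enough for etcd before placing it in production, a write test that forces an fsync per write gives a reasonable indication of latency. The following fio invocation is one commonly used sketch; the directory is a placeholder and the sizes roughly approximate etcd's write pattern:
# Measure fsync latency on a candidate etcd data disk; writes ~22MB of test data
mkdir -p /var/lib/etcd-disk-test
fio --rw=write --ioengine=sync --fdatasync=1 \
    --directory=/var/lib/etcd-disk-test --size=22m --bs=2300 --name=etcd-disk-check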
Setup of storage on the hosts
We recommend creating a separate filesystem for StorageOS to mitigate the risk of filling the root filesystem on nodes. This must be done on every node in the cluster.
Follow the managing host storage best practices page for more details.
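As an illustration, on a node with a spare block device the separate filesystem might be created and mounted over the StorageOS data directory as follows. The device name is a placeholder, and the path should be adjusted if your installation uses a non-default data directory:
# /dev/sdb is a placeholder device; /var/lib/storageos is the default data directory
sudo mkfs.ext4 /dev/sdb
sudo mkdir -p /var/lib/storageos
sudo mount /dev/sdb /var/lib/storageos
# Persist the mount across reboots
echo '/dev/sdb /var/lib/storageos ext4 defaults 0 0' | sudo tee -a /etc/fstab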
Resource reservations
StorageOS resource consumption depends on the workloads and the StorageOS features in use.
The recommended minimum memory reservation for the StorageOS Pods is 512MB for non-production environments. However, it is recommended to provision nodes so that StorageOS can operate with at least 1-2GB of memory. StorageOS frees memory when possible.
For production environments, we recommend a minimum of 4GB of memory and 1 CPU, and advise testing StorageOS with realistic workloads and tuning resources accordingly.
The resource allocation of the StorageOS Pods directly affects the availability of volumes if a Pod is evicted or restarted because of a resource limit. It is recommended not to set resource limits on the StorageOS Pods.
StorageOS implements a storage engine, therefore limiting CPU consumption might affect the I/O throughput of your volumes.
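When tuning reservations, it helps to observe what the StorageOS Pods actually consume under a representative workload. Assuming the metrics-server is installed and the Pods run in a namespace named storageos (adjust to your installation), per-container usage can be inspected with:
# Show current CPU and memory usage of the StorageOS containers
kubectl top pods -n storageos --containers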
Setting a Kubernetes PID limit
StorageOS recommends that a PID cgroup limit of 32768 be set. StorageOS is a multi-threaded application, and while most Kubernetes distributions set the PID cgroup limit to 32768, some environments set a limit as low as 1024. The StorageOS init container prints a warning log message if the PID cgroup limit is too low. See our prerequisites for more information.
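The effective limit can be checked directly on a node. The cgroup paths below are typical examples and vary by distribution, cgroup version and cgroup driver, and the kubelet configuration path assumes a kubeadm-style install:
# PID limit applied to the Kubernetes Pods cgroup (pick the path that exists on your node)
cat /sys/fs/cgroup/pids/kubepods.slice/pids.max   # cgroup v1, systemd driver
cat /sys/fs/cgroup/kubepods.slice/pids.max        # cgroup v2
# If the kubelet sets its own per-Pod limit, it appears as podPidsLimit
grep -i podPidsLimit /var/lib/kubelet/config.yaml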
Maintain a sufficient number of nodes for replicas to be created
To ensure that a new replica can always be created, an additional node should be available. To guarantee high availability, clusters using volumes with 1 replica must have at least 3 storage nodes; volumes with 2 replicas require at least 4 storage nodes, volumes with 3 replicas require 5 nodes, and so on.
Minimum number of storage nodes = 1 (primary) + N (replicas) + 1
For more information, see the section on replication.
StorageOS API username/password
The API grants full access to StorageOS functionality, therefore we recommend that the default administrative password of ‘storageos’ is reset to something unique and strong.
You can change the default parameters by encoding the apiUsername and apiPassword values (in base64) into the storageos-api secret.
To generate a unique password, a technique such as the following, which generates a pseudo-random 24-character string, may be used:
# Generate strong password
PASSWORD=$(tr -dc 'a-zA-Z0-9!@#$%^&*()_+~-' < /dev/urandom | fold -w 24 | head -n 1)
# Convert password to base64 representation for embedding in a K8S secret
BASE64PASSWORD=$(printf '%s' "$PASSWORD" | base64)
Note that the Kubernetes secret containing a strong password must be created before bootstrapping the cluster. Multiple installation procedures use this Secret to create a StorageOS account when the cluster first starts.
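As a sketch of how the generated password might be placed into that Secret, the following creates it with kubectl; --from-literal takes the plain value and stores it base64-encoded. The namespace shown is only an example, and the exact namespace, labels and Secret type expected depend on your installation method, so consult the installation docs for your environment:
# Namespace is an example; use the location and labels your installation expects
kubectl create namespace storageos
kubectl -n storageos create secret generic storageos-api \
    --from-literal=apiUsername=storageos \
    --from-literal=apiPassword="$PASSWORD"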
StorageOS Pod placement
StorageOS must run on all nodes that will contribute storage capacity to the cluster or that will host Pods which use StorageOS volumes. For production environments, it is recommended to avoid placing StorageOS Pods on Master nodes.
StorageOS is deployed with a DaemonSet controller, which tolerates the standard node.kubernetes.io/unschedulable:NoSchedule taint. If that is the only taint placed on master or cordoned nodes, StorageOS Pods may be scheduled onto them (see the Kubernetes docs for more details). To avoid scheduling StorageOS Pods on master nodes, add an arbitrary taint to them for which the StorageOS DaemonSet does not have a toleration.
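For example, an arbitrary taint could be applied to each master node; the taint key and value here are placeholders:
# Taint a master node; the StorageOS DaemonSet has no toleration for this arbitrary taint
kubectl taint nodes <master-node-name> storageos=disabled:NoSchedule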
Dedicated instance groups
Cloud environments give users the ability to quickly scale the number of nodes in a cluster in response to their needs. Because of the ephemeral nature of the cloud, StorageOS recommends setting conservative downscaling policies.
For production clusters, it is recommended to use dedicated instance groups for stateful applications, which allows different scaling policies to be set and StorageOS pools to be defined based on node selectors to collocate volumes.
Losing several nodes at the same time can cause data loss, even when volume replicas are used.
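As a sketch, the nodes in the dedicated stateful instance group could be labelled so that StorageOS pools and stateful workloads can target them with node selectors; the label key and value are placeholders:
# Label the nodes belonging to the dedicated stateful instance group
kubectl label nodes <node-name> storageos-role=stateful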
Port blocking
StorageOS exposes several ports in order to operate. It is recommended that these ports are not accessible from outside the scope of your cluster.
StorageOS in Docker EE
StorageOS does not support running on Swarm nodes or on mixed (Kubernetes and Swarm) nodes. StorageOS volumes must be provisioned and used from Kubernetes nodes.