Node name different from Hostname


StorageOS nodes can’t join the cluster and show the following log entries.

time="2018-09-24T13:47:02Z" level=error msg="failed to start api" error="error verifying UUID: UUID aed3275f-846b-1f75-43a1-adbfec8bf974 has already been registered and has hostname 'debian-4', not 'node4'" module=command


During registration, StorageOS uses the hostname of the node where the StorageOS container is running, as provided by DockerCE. As a prestart check, StorageOS also verifies the network hostname of the OS to make sure it can communicate with other nodes. If the two names don’t match, StorageOS cannot start.


Make sure the OS hostnames match the names advertised by DockerCE. If you have changed the hostname of a node, restart that node to apply the change.
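
The comparison can be scripted. The sketch below is illustrative only: names_match is a hypothetical helper that compares two strings, and it assumes that `docker info --format '{{.Name}}'` returns the name Docker advertises for the node.

```shell
#!/bin/sh
# names_match OS_NAME DOCKER_NAME
# Hypothetical helper: reports whether the two names are identical.
names_match() {
    os_name="$1"
    docker_name="$2"
    if [ "$os_name" = "$docker_name" ]; then
        echo "OK: hostname and Docker name are both '$os_name'"
        return 0
    fi
    echo "MISMATCH: OS hostname is '$os_name', Docker reports '$docker_name'"
    return 1
}

# On a node, feed it the real values:
#   names_match "$(hostname)" "$(docker info --format '{{.Name}}')"
```

A non-zero exit status means the two names differ and the node should be renamed or restarted before StorageOS is started again.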

One node clusters


StorageOS nodes have started creating multiple clusters of one node, rather than one cluster of many nodes.

~# storageos -H node1 node ls
NAME                ADDRESS             HEALTH                   SCHEDULER           VOLUMES             TOTAL
node1             Healthy About a minute   true                M: 0, R: 0          8.699GiB
~# storageos -H node2 node ls
NAME                ADDRESS             HEALTH                   SCHEDULER           VOLUMES             TOTAL
node2             Healthy About a minute   true                M: 0, R: 0          8.699GiB
~# storageos -H node3 node ls
NAME                ADDRESS             HEALTH                   SCHEDULER           VOLUMES             TOTAL
node3             Healthy About a minute   true                M: 0, R: 0          8.699GiB
~# storageos -H node4 node ls
NAME                ADDRESS             HEALTH                   SCHEDULER           VOLUMES             TOTAL
node4             Healthy About a minute   true                M: 0, R: 0          8.699GiB


The JOIN variable has been misconfigured. A common mistake is to set it to localhost or to the value of ADVERTISE_IP.

Installations with Helm might cause this behaviour unless the JOIN parameter is explicitly defined.

StorageOS uses the JOIN variable to discover other nodes in the cluster during the node bootstrapping process. It must be set to one or more active nodes.

You don’t need to list every node. Once a new StorageOS node connects to one member of the cluster, the gossip protocol discovers the full list of members. For high availability during bootstrap, list as many nodes as possible, so that if one node is unavailable the next in the list is queried.


Define the JOIN variable according to the discovery documentation.
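
As a rough sanity check before starting the container, the two mistakes described above can be caught with a small script. This is a sketch under assumptions: check_join is a hypothetical helper, and JOIN is assumed to be a comma-separated list of hostnames or IP addresses. The discovery documentation remains the authority on accepted values.

```shell
#!/bin/sh
# check_join JOIN ADVERTISE_IP
# Hypothetical helper: rejects an empty JOIN, a JOIN that is only this node's
# own ADVERTISE_IP, and a JOIN that points at the loopback interface.
check_join() {
    join="$1"
    advertise_ip="$2"
    if [ -z "$join" ]; then
        echo "JOIN is empty"
        return 1
    fi
    if [ "$join" = "$advertise_ip" ]; then
        echo "JOIN only contains this node's ADVERTISE_IP ($advertise_ip)"
        return 1
    fi
    # Assumption: entries are separated by commas.
    for entry in $(printf '%s' "$join" | tr ',' ' '); do
        case "$entry" in
            localhost|127.0.0.1)
                echo "JOIN contains '$entry', which only reaches the local node"
                return 1
                ;;
        esac
    done
    echo "JOIN looks plausible: $join"
}

# Example with placeholder addresses:
#   check_join "10.0.0.1,10.0.0.2,10.0.0.3" "10.0.0.2"
```

The check is deliberately conservative: a list that includes the node's own address alongside other members passes, because the other entries still give the node a peer to contact.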

Peer discovery - Networking


StorageOS nodes can’t join the cluster and show the following logs after one minute of container uptime.

time="2018-09-24T13:40:20Z" level=info msg="not first cluster node, joining first node" action=create address= category=etcd host=node3 module=cp target=
time="2018-09-24T13:40:20Z" level=error msg="could not retrieve cluster config from api" status_code=503
time="2018-09-24T13:40:20Z" level=error msg="failed to join existing cluster" action=create category=etcd endpoint=",,," error="503 Service Unavailable" module=cp
time="2018-09-24T13:40:20Z" level=info msg="retrying cluster join in 5 seconds..." action=create category=etcd module=cp


StorageOS uses a gossip protocol to discover the nodes in the cluster. When StorageOS starts, one or more nodes can be referenced so new nodes can query existing ones for the list of members. This error indicates that the node can’t connect to any of the nodes in the known list. The known list is defined in the JOIN variable.


It is likely that the ports are blocked by a firewall.

SSH into one of your nodes and check connectivity to the rest of the nodes.

# Successful execution:
~# nc -zv node04 5705
Ncat: Version 7.50 (  )
Ncat: Connected to
Ncat: 0 bytes sent, 0 bytes received in 0.01 seconds.
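
To repeat the nc check against every node in one pass, a loop along these lines may help. It is a minimal sketch: probe and check_nodes are hypothetical helpers, the node names in the usage comment are placeholders, and only port 5705 is taken from the example above. The prerequisites page lists the full set of ports to test.

```shell
#!/bin/sh
# probe HOST PORT
# The only function that touches the network: -z connects without sending
# data and -w 3 gives up after three seconds.
probe() {
    nc -z -w 3 "$1" "$2" >/dev/null 2>&1
}

# check_nodes PORT NODE...
# Prints one line per node and returns non-zero if any node is unreachable.
check_nodes() {
    port="$1"
    shift
    failed=0
    for node in "$@"; do
        if probe "$node" "$port"; then
            echo "$node:$port reachable"
        else
            echo "$node:$port BLOCKED or down"
            failed=1
        fi
    done
    return "$failed"
}

# Example with placeholder node names:
#   check_nodes 5705 node01 node02 node03 node04
```

Run it from each node in turn, since a firewall rule may block traffic in one direction only.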

StorageOS exposes network diagnostics in its API, viewable from the CLI. To use this feature, the CLI must query the API of a running node. The diagnostics show information from all known cluster members. If all the ports are blocked during the first bootstrap of the cluster, the diagnostics won’t show any data as nodes couldn’t register.

StorageOS network diagnostics are available in storageos-rc5 and storageos-cli-rc3 and above.

# Example:
~# storageos cluster connectivity
node4   node2.nats  1.949275ms   OK
node4   node3.api  3.070574ms   OK
node4   node3.nats  2.989238ms   OK
node4   node2.directfs  2.925707ms   OK
node4   node3.etcd  2.854726ms   OK
node4   node3.directfs  2.833371ms   OK
node4   node1.api  2.714467ms   OK
node4   node1.nats  2.613752ms   OK
node4   node1.etcd  2.594159ms   OK
node4   node1.directfs  2.601834ms   OK
node4   node2.api  2.598236ms   OK
node4   node2.etcd  16.650625ms  OK
node3   node4.nats  1.304126ms   OK
node3   node4.api  1.515218ms   OK
node3   node2.directfs  1.359827ms   OK
node3   node1.api  1.185535ms   OK
node3   node4.directfs  1.379765ms   OK
node3   node1.etcd  1.221176ms   OK
node3   node1.nats  1.330122ms   OK
node3   node2.api  1.238541ms   OK
node3   node1.directfs  1.413574ms   OK
node3   node2.etcd  1.214273ms   OK
node3   node2.nats  1.321145ms   OK
node1   node4.directfs  1.140797ms   OK
node1   node3.api  1.089252ms   OK
node1   node4.api  1.178439ms   OK
node1   node4.nats  1.176648ms   OK
node1   node2.directfs  1.529612ms   OK
node1   node2.etcd  1.165681ms   OK
node1   node2.api  1.29602ms    OK
node1   node2.nats  1.267454ms   OK
node1   node3.nats  1.485657ms   OK
node1   node3.etcd  1.469429ms   OK
node1   node3.directfs  1.503015ms   OK
node2   node4.directfs  1.484ms      OK
node2   node1.directfs  1.275304ms   OK
node2   node4.nats  1.261422ms   OK
node2   node4.api  1.465532ms   OK
node2   node3.api  1.252768ms   OK
node2   node3.nats  1.212332ms   OK
node2   node3.directfs  1.192792ms   OK
node2   node3.etcd  1.270076ms   OK
node2   node1.etcd  1.218522ms   OK
node2   node1.api  1.363071ms   OK
node2   node1.nats  1.349383ms   OK
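
On larger clusters the connectivity listing gets long, so it can help to filter it down to the problem rows. The example output only shows successful checks, so the sketch below assumes that a failing check has something other than OK in its last column. failed_checks is a hypothetical helper.

```shell
#!/bin/sh
# failed_checks
# Reads `storageos cluster connectivity` output on stdin and prints every
# non-empty row whose last column is not OK (assumed to mark a failure).
failed_checks() {
    awk 'NF > 0 && $NF != "OK"'
}

# Usage on a running node:
#   storageos cluster connectivity | failed_checks
```

Empty output means every listed check reported OK.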


Open ports following the prerequisites page.

LIO Init:Error


The StorageOS init container fails with the following log.

~# docker logs enable-lio
Checking configfs
configfs mounted on sys/kernel/config
Module target_core_mod is not running
executing modprobe -b target_core_mod
Module tcm_loop is not running
executing modprobe -b tcm_loop
modprobe: FATAL: Module tcm_loop not found.


This error indicates that one or more kernel modules cannot be loaded because they are not installed on the system.


Install the appropriate kernel modules (usually found in the linux-image-extra-$(uname -r) package of your distribution) on your nodes following this prerequisites page. Then delete the StorageOS pods so that the DaemonSet creates them again.

The container logs, as in the example above, indicate which kernel modules could not be loaded or are not properly configured.

LIO not enabled


A StorageOS node can’t start and shows the following log entries.

time="2018-09-24T14:34:40Z" level=error msg="liocheck returned error" category=liocheck error="exit status 1" module=dataplane stderr="Sysfs root '/sys/kernel/config/target' is missing, is kernel configfs present and target_core_mod loaded? category=fslio level=warn\nRuntime error checking stage 'target_core_mod': SysFs root missing category=fslio level=warn\nliocheck: FAIL (lio_capable_system() returns failure) category=fslio level=fatal\n" stdout=
time="2018-09-24T14:34:40Z" level=error msg="failed to start dataplane services" error="system dependency check failed: exit status 1" module=command


This indicates that one or more kernel modules required for StorageOS are not loaded.


The tcm_loop, target_core_mod, target_core_file and configfs kernel modules must be loaded on the host. Check which of them are present with:

lsmod  | egrep "^tcm_loop|^target_core_mod|^target_core_file|^configfs"


Install the required kernel modules (usually found in the linux-image-extra-$(uname -r) package of your distribution) on your nodes following this prerequisites page and restart the container.
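
To see at a glance which of the required modules are absent, the lsmod output can be compared against the list. This is a sketch: missing_modules is a hypothetical helper, and the module names are the ones from the lsmod command above. On some kernels configfs is built in rather than loaded as a module, in which case it may be reported as missing even though it is available.

```shell
#!/bin/sh
# missing_modules
# Reads lsmod output on stdin and prints each required module that is not
# listed in the first column.
missing_modules() {
    loaded=$(awk '{print $1}')
    for mod in tcm_loop target_core_mod target_core_file configfs; do
        if ! printf '%s\n' "$loaded" | grep -qx "$mod"; then
            echo "$mod"
        fi
    done
}

# Usage on a node:
#   lsmod | missing_modules
```

Empty output means all four modules appear in lsmod.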