Erlang cluster peer discovery on Kubernetes

Chanaka Fernando
May 2, 2021 · 11 min read

Modern software deployments, especially microservice-based applications, use Kubernetes extensively in both production and staging environments. When deploying Erlang applications on K8s, peer discovery is one of the challenges. Although Erlang has native clustering capabilities that distributed applications leverage, running a distributed Erlang application cluster on K8s requires additional effort, as we will discuss.

In order to form an Erlang cluster, a few requirements must be satisfied.

The first is that each node in the cluster must have the same Erlang cookie, which is the shared secret used for authentication between cluster nodes. This can easily be fulfilled by storing a predefined cookie value in a ConfigMap or Secret resource and sharing it among the pods as read-only storage.

The next is that each Erlang node in the cluster must know the name (node@host.domain) of every other node in the cluster and must be able to communicate with it. If epmd is operational within each pod (the default configuration), the node names and associated port numbers of the Erlang nodes registered with epmd in that pod can be discovered. But to access the epmd service, the challenge is that each pod must be able to discover, resolve the FQDN (host.domain) of, and communicate with every other pod. In this article I will describe a straightforward solution to this challenge.
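
For instance, this is a minimal sketch of querying epmd on a remote host from an Erlang shell; the host name below is only an illustrative pod FQDN, and the call only succeeds if that FQDN resolves and epmd is reachable on it:

%% Ask the epmd instance running on the given host (default port 4369) which
%% Erlang nodes it has registered; returns {ok, [{NodeName, DistributionPort}]}.
1> net_adm:names("erlang-k8s-cluster-1.erlang-k8s-service.default.svc.cluster.local").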

In a Kubernetes cluster, a Service acts as a service discovery mechanism. When a Service is deployed, a DNS name is assigned to it. These DNS names can be used to communicate with the Services and the upstream backing pods.

The Kubernetes DNS service is named kube-dns and has a static cluster IP. The address of this name server is written to /etc/resolv.conf in each pod; queries for names outside the cluster domain are generally forwarded to the upstream name servers taken from the host's /etc/resolv.conf. DNS naming rules resolve pod and Service DNS names to their corresponding cluster IPs. Kubernetes DNS supports forward lookups (A records), port lookups (SRV records) and some other record types.

When a regular Service resource is deployed, K8s allocates an IP address (cluster IP) for it. This is used as the single entry point for the pods backed by the Service, which enables the Service to act as a proxy for the upstream pods. Regular Services are also assigned a DNS A record, which resolves to the cluster IP of the Service, and DNS SRV records are created for the named ports of the Service, which resolve to the port number and the domain name.

In the case of a headless Service, K8s does not allocate a cluster IP for it. Headless Services are also assigned a DNS A record, but it resolves to the set of IPs of the back-end pods managed by the Service, which allows clients to interact directly with the pods instead of going through a proxy. When a headless Service is deployed, K8s also creates DNS A records for the individual upstream pods, so the headless Service domain name can be resolved to the IP addresses of the connected pods. Additionally, DNS SRV records are created for headless Services, which resolve to the set of upstream pod FQDNs (i.e. domain names of the form <pod name>.<service name>.<namespace>.<cluster base DNS>).
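
As a hedged illustration (using the service and namespace names that appear later in this article), such an SRV record can be inspected directly with the Erlang resolver; each answer is a {Priority, Weight, Port, Target} tuple, where Target is a pod FQDN:

%% Resolve the SRV record of the headless service and extract the pod FQDNs.
SrvName = "erlang-k8s-service.default.svc.cluster.local",
Records = inet_res:lookup(SrvName, in, srv),
PodFqdns = [Target || {_Prio, _Weight, _Port, Target} <- Records].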

Therefore, when all the pods are backed by a K8s headless Service, each pod can discover every other pod's FQDN and establish direct communication with it.

The upstream pods can be deployed as a regular deployment or as a statefulset.

In regular Deployments, pod names are randomly generated and the pod FQDNs are derived from the pod IP address and the service name, i.e. A-B-C-D.<service name>.<namespace>.<cluster base DNS>, where A-B-C-D is the pod IP address with '.' replaced by '-'.

However, in StatefulSets, pods are allocated names derived from the StatefulSet resource name, suffixed with a sequential ordinal number, and the pod FQDNs are derived from the pod name and the service name, i.e. <pod name>.<service name>.<namespace>.<cluster base DNS>, where <pod name> would be of the form <StatefulSet resource name>-0.

The pod FQDNs in the headless Service's DNS SRV records are in this form for each deployment type. Therefore, the host.domain part of the Erlang node name is required to be in this form (i.e. the pod FQDN), because a DNS SRV query against the front-ending headless Service is used to discover the upstream pods.

Since StatefulSets give pods user-friendly names, which makes debugging simpler, in this discussion the Erlang cluster is created as a StatefulSet backed by a headless Service to demonstrate peer discovery.

We will now see how the solution discussed above can be implemented in an Erlang application. The complete code related to this discussion is available in the GitHub repository mentioned in the Resources section.

Functionality

Below is the code snippet which is executed (peer discovery) at time intervals when the name of the headless Service is configured during deployment instead of predefined host names. In this configuration, horizontal scaling is expected and pod names must be discovered dynamically.

Fig-1.1 code snippet (peer discovery with k8s-headless-service)

This code snippet resolves a DNS record of the specified type (SRV) for the specified host (the FQDN of the headless Service). On success, the function returns a hostent() record with dns_data() elements in its address list field (h_addr_list).

The FQDN host names of the pods backed by the headless Service are derived from this address list. Then, by calling net_adm:world_list/2, the names and associated port numbers of all the Erlang nodes registered with epmd at the specified hosts are obtained (for this to succeed, epmd must be operational), and ping(Node) is evaluated on all those nodes. The call returns the list of all nodes that were successfully pinged (information about which nodes are being pinged is printed to stdout if the verbosity level is set to verbose; the default is silent).
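
The exact implementation is in the repository; a minimal, self-contained sketch of the same idea (module, function and variable names here are illustrative, not the repository's) could look like this:

-module(peer_discovery_sketch).
-export([discover_peers/1]).

-include_lib("kernel/include/inet.hrl").

%% Discover Erlang peers behind a K8s headless service.
%% Service is the FQDN of the headless service, e.g.
%% "erlang-k8s-service.default.svc.cluster.local".
discover_peers(Service) ->
    case inet_res:getbyname(Service, srv) of
        {ok, #hostent{h_addr_list = SrvRecords}} ->
            %% Each SRV answer is {Priority, Weight, Port, PodFqdn}.
            Hosts = [list_to_atom(Fqdn) || {_Prio, _Weight, _Port, Fqdn} <- SrvRecords],
            %% Ask epmd on every pod for its registered nodes and ping them;
            %% returns the list of nodes that answered the ping.
            net_adm:world_list(Hosts, verbose);
        {error, Reason} ->
            {error, Reason}
    end.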

Peer discovery during horizontal scaling is supported in this configuration. Pods added or removed during scaling, and pods that temporarily leave and re-join during a restart, are detected on the next peer-discovery cycle.
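
One simple way to run such a discovery pass periodically (again only a sketch under the same naming assumptions, not the repository's exact code) is a gen_server that re-arms a timer after every pass:

-module(discovery_loop_sketch).
-behaviour(gen_server).
-export([start_link/1, init/1, handle_info/2, handle_call/3, handle_cast/2]).

-define(INTERVAL, 10000). %% illustrative: run discovery every 10 seconds

start_link(Service) ->
    gen_server:start_link(?MODULE, Service, []).

init(Service) ->
    erlang:send_after(?INTERVAL, self(), discover),
    {ok, Service}.

handle_info(discover, Service) ->
    peer_discovery_sketch:discover_peers(Service),
    erlang:send_after(?INTERVAL, self(), discover),
    {noreply, Service};
handle_info(_Msg, State) ->
    {noreply, State}.

handle_call(_Req, _From, State) -> {reply, ok, State}.
handle_cast(_Msg, State) -> {noreply, State}.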

Below is the code snippet which is executed (peer discovery) at time intervals when the list of Erlang host names is known and pre-configured during deployment, instead of configuring the K8s headless Service name (DNS resolution for the pre-configured host names is still required). In this configuration, horizontal scaling with dynamic discovery of pod names is not expected.

Fig-1.2 code snippet (peer discovery with predefined FQDN hostnames)

This code snippet reads the FQDN host names of the pods from the system configuration, which is pre-configured during deployment. Then, as in the previous scenario, the names and associated port numbers of the Erlang nodes registered with epmd at the specified hosts are obtained and ping(Node) is evaluated on all those nodes.
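
A hedged sketch of this variant (the application and parameter names are assumptions, not necessarily those used in the repository):

%% Read a pre-configured host list from the application environment, e.g.
%% {world_list, ['erlang.k8s.node-1', 'erlang.k8s.node-2']} in sys.config,
%% then query epmd on each host and ping the nodes it has registered.
Hosts = application:get_env(erlang_k8s_cluster, world_list, []),
net_adm:world_list(Hosts, verbose).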

Peer discovery during horizontal scaling is not supported in this configuration, as the list of cluster host names is predefined at deployment time. Pods that temporarily leave and re-join during a restart are still detected on the next peer-discovery cycle.

Packaging

The application is packaged as a Docker image with a multi-stage build.

Fig-2.1 Dockerfile

The application is built in the first stage, in a similar environment (the same base image). The built artefacts are then copied from the first stage into the final image created in the second stage.

The ENTRYPOINT shell script contains the command to start the application in foreground mode, as depicted below (in this mode, Erlang logs are printed to stdout).

#!/bin/sh
# work dir = /opt/erlang_k8s_cluster/system/
./bin/erlang_k8s_cluster-0.0.1 foreground

Running application

Deploy on docker engine with docker-compose

Peer discovery with a fixed (predefined) number of Erlang nodes (without horizontal scaling support) can be demonstrated by deploying the Erlang cluster with docker-compose.

Part of the docker-compose file is depicted below (the complete file can be found in the GitHub location mentioned at the end), where each Erlang cluster member is defined as a service.

version: '3'
networks:
app-tier:
driver: bridge
services:
erlang.k8s.node-1:
...
networks:
- app-tier
environment:
- ERLANG_NODENAME='erlang_k8s@erlang.k8s.node-1'
- CLUSTER_ERLANG_COOKIE='erlang.k8s.cluster'
- K8S_HEADLESS_SVC=[]
- WORLD_LIST=['erlang.k8s.node-1', 'erlang.k8s.node-2']
erlang.k8s.node-2:
...
networks:
- app-tier
environment:
- ERLANG_NODENAME='erlang_k8s@erlang.k8s.node-2'
- CLUSTER_ERLANG_COOKIE='erlang.k8s.cluster'
- K8S_HEADLESS_SVC=[]
- WORLD_LIST=['erlang.k8s.node-1', 'erlang.k8s.node-2']

A user-defined bridge network is used because the default network in Docker does not support DNS resolution of host names.

The application node name should be in FQDN format according to the configuration file vm.args.src (i.e. the host name part should contain at least a single dot). As service names are used as host names in docker-compose, they should comply with this FQDN naming requirement.

$ vi vm.args.src
-name ${ERLANG_NODENAME}
-setcookie ${CLUSTER_ERLANG_COOKIE}
+K true
+A30

The required system configuration is passed as environment variables. Here ERLANG_NODENAME is defined as an FQDN. CLUSTER_ERLANG_COOKIE, which is used as the node cookie, is the same on all members. K8S_HEADLESS_SVC is set to [], indicating that peer discovery should be done with the predefined host names (FQDNs) defined in WORLD_LIST.
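
The values are quoted as Erlang terms ([] and a list of atoms), which suggests they are substituted into the release configuration at start-up, the same way vm.args.src is processed. A hedged sketch of what a corresponding sys.config.src could look like (the application and key names are assumptions):

[
 {erlang_k8s_cluster, [
   %% replaced with the environment variable values when the release starts
   {k8s_headless_svc, ${K8S_HEADLESS_SVC}},
   {world_list, ${WORLD_LIST}}
 ]}
].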

The output when the application is deployed with docker-compose (foreground) is depicted below.

$ docker-compose up
Fig-3.1 docker-compose output

As the verbosity of the net_adm:world_list/2 call is set to verbose, the pinging result is printed to stdout. As shown above, pinging succeeds from every node to every other node (successful peer discovery).

Deploy on K8s cluster with K8s-manifests

Peer discovery in an Erlang cluster with a dynamic number of nodes (with horizontal scaling support) can be demonstrated by deploying the Erlang pods as a K8s StatefulSet backed by a K8s headless Service.

Part of the K8s manifest file is depicted below (the complete file can be found in the GitHub location mentioned at the end), where the Erlang cluster is defined as a K8s StatefulSet.

apiVersion: apps/v1
kind: StatefulSet
metadata:
name: erlang-k8s-cluster
...
spec:
selector:
...
serviceName: "erlang-k8s-service"
replicas: 2
template:
...
spec:
terminationGracePeriodSeconds: 5
containers:
- name: erlang-k8s-node
...
env:
- name: ERLANG_POD_NAME
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: metadata.name
- name: ERLANG_POD_NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
- name: K8S_HEADLESS_SVC
value: '"erlang-k8s-service.$(ERLANG_POD_NAMESPACE).svc.cluster.local"'
- name: ERLANG_NODENAME
value: erlang-k8s@$(ERLANG_POD_NAME).erlang-k8s-service.$(ERLANG_POD_NAMESPACE).svc.cluster.local
- name: CLUSTER_ERLANG_COOKIE
valueFrom:
secretKeyRef:
name: erlang-k8s-secret
key: ERLANG_COOKIE
- name: WORLD_LIST
value: "[]"
restartPolicy: Always

The required system configuration is passed as environment variables. Here ERLANG_NODENAME is defined as an FQDN derived from ERLANG_POD_NAME and ERLANG_POD_NAMESPACE, which are retrieved through the K8s downward API (e.g. erlang-k8s@erlang-k8s-cluster-0.erlang-k8s-service.default.svc.cluster.local for pod 0 in the default namespace). CLUSTER_ERLANG_COOKIE, the node cookie, is read from the defined K8s Secret. K8S_HEADLESS_SVC is derived from the headless Service name and ERLANG_POD_NAMESPACE. WORLD_LIST is set to [], as peer discovery is dynamic and based on the K8s headless Service.

The output when the application is deployed on the K8s cluster is depicted below.

# Deploying of k8s manifest 
$ kubectl apply -f erlang-k8s-cluster.yaml
secret/erlang-k8s-secret created
service/erlang-k8s-service created
statefulset.apps/erlang-k8s-cluster created

The deployment can be verified as below:

# Get Service information
$ kubectl get services -l app=erlang -o wide
Fig-3.2 output ( headless services)
# Get Statefulset information
$ kubectl get statefulsets -l app=erlang -o wide
Fig-3.3 output (statefulset)
# Get pods information
$ kubectl get pods -l app=erlang -o wide
Fig-3.4 output (pods)

The above outputs indicate that the K8s deployment succeeded. We can then attach to the shell of a running pod and execute nslookup for the headless Service FQDN.

$ kubectl exec -it erlang-k8s-cluster-0 -- sh

After connecting to the shell, the following commands can be executed to get the DNS 'A' records and 'SRV' records separately.

# 'A' dns query type of service FQDN
$ nslookup -q=A erlang-k8s-service.default.svc.cluster.local
Fig-3.5 output (Service dns A records)

As indicated in the ANSWER SECTION, the Service FQDN has been resolved to two different IPs, which are the actual IPs of the upstream pods of the headless Service.

# 'SRV' dns query type of service FQDN
$ nslookup -q=SRV erlang-k8s-service.default.svc.cluster.local
Fig-3.6 output (Service dns SRV records)

As indicated in the ANSWER SECTION, the Service FQDN has been resolved to two different host names, which are the actual FQDNs of the upstream pods of the headless Service.

The IP address of each upstream pod FQDN can also be resolved, as below:

$ nslookup -q=A erlang-k8s-cluster-0.erlang-k8s-service.default.svc.cluster.local
Fig-3.7 output (pod-0 dns A record)
$ nslookup -q=A erlang-k8s-cluster-1.erlang-k8s-service.default.svc.cluster.local
Fig-3.8 output (pod-1 dns A record)

After that, the Erlang log printed to stdout can be checked to verify peer discovery.

$ kubectl logs -f erlang-k8s-cluster-0
Fig-3.9 output when with 2 pods (pod erlang log)

According to the above output, the Erlang log on the pod indicates that this Erlang node has identified its peer node and connected successfully.
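
The connected peers can also be checked from a remote console on the pod (assuming the release start script supports the usual relx remote_console command, as suggested by the entrypoint shown earlier); with two pods, pod 0 should list pod 1, along the lines of:

%% attach first, e.g.:
%% kubectl exec -it erlang-k8s-cluster-0 -- ./bin/erlang_k8s_cluster-0.0.1 remote_console
1> nodes().
['erlang-k8s@erlang-k8s-cluster-1.erlang-k8s-service.default.svc.cluster.local']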

Then we can try horizontal scaling.

$ kubectl scale statefulsets erlang-k8s-cluster --replicas=3
statefulset.apps/erlang-k8s-cluster scaled

After that, the Erlang log printed to stdout can be checked again on the same pod to verify whether the newly joined peer has been discovered.

$ kubectl logs -f erlang-k8s-cluster-0
Fig-3.10 output when with 3 pods (pod erlang log)

According to the above output, the Erlang log on the pod indicates that this Erlang node has identified the newly joined peer node and connected successfully.

Likewise, peer discovery during horizontal scaling can be demonstrated by increasing (scaling out) or decreasing (scaling in) the number of replicas.

Finally, the deployment can be cleaned up.

$ kubectl delete -f erlang-k8s-cluster.yaml
secret "erlang-k8s-secret" deleted
service "erlang-k8s-service" deleted
statefulset.apps "erlang-k8s-cluster" deleted

When horizontal scaling is not expected, it is also possible to deploy the configuration described in the previous section on a K8s cluster by setting the headless Service name to [] (even though the Service still fronts the pods) and configuring the host names generated by the StatefulSet as the predefined host names during deployment. As only a few simple changes to the manifest file are required to demonstrate this, the scenario should be obvious.

Conclusion

As described, one straightforward method for Erlang node peer discovery in a K8s cluster is to deploy the Erlang node cluster as a K8s StatefulSet backed by a K8s headless Service. Although we discussed a single StatefulSet here, multiple StatefulSets can also be backed by a single headless Service for peer discovery purposes.

It is also worth mentioning that, although a StatefulSet is the simplest solution for an Erlang cluster as discussed, using a Deployment can have advantages. For example, during a rolling update of a Deployment, each pod is temporarily scaled up to two and then back down to one at the end, so a single pod instance running normally is enough to ensure no downtime. In a StatefulSet, however, each pod is stopped and then started again, so at least two pods are required to ensure no downtime during rolling updates.

Resources :

Git Hub repository

https://github.com/myErlangProjects/erlang_k8s_cluster
