Erlang and Elixir distribution without epmd
- Erlang Solutions Team
- 26th Oct 2016
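When you deploy a distributed Erlang application, you’ll most likely have to answer the question “which ports need to be open in the firewall?” Unless configured otherwise, the answer is: port 4369 for epmd, the Erlang Port Mapper Daemon, plus one unpredictable port, assigned by the operating system, for each Erlang node.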
That answer usually doesn’t translate neatly into firewall rules. The usual solution is to use the kernel application parameters inet_dist_listen_min and inet_dist_listen_max (described in the kernel documentation) to limit the distribution ports to a small port range, or even a single port.
But if we’ve limited the Erlang node to a single port, do we really need a port mapper? Running epmd has a few potential disadvantages, which become clear once we look at how it behaves.
So let’s explore how epmd works, and what we can do to run an Erlang cluster without it.
Let’s have a look at the code! epmd is started in the function start_epmd in erlexec.c. In fact, epmd is started unconditionally every time a distributed node is started. If an epmd instance is already running, the new epmd will fail to listen on port 4369, and thus exit silently.
In fact, that’s what will happen even if the existing epmd instance was started by another user. Any epmd instance is happy to serve Erlang nodes started by any user, so usually this doesn’t cause any problem.
Who can connect to epmd? Anyone! By default, epmd listens on all available interfaces, and responds to queries about what nodes are present, and what ports they are listening on. Hearing this tends to make sysadmins slightly nervous.
You can change that by manually starting epmd and specifying the -address option, or by setting the ERL_EPMD_ADDRESS environment variable before epmd gets started. This is described in the epmd documentation. It only takes effect if the place where you do this is actually the first place where epmd gets started on the host; otherwise, the existing epmd instance will keep running unperturbed.
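To see for yourself what epmd reveals to anyone who can reach it, you can ask it from an Erlang shell. The following is a minimal sketch built on net_adm:names/1 from the standard library; the module name epmd_query_sketch is made up for illustration.

```erlang
-module(epmd_query_sketch).
-export([nodes_on/1]).

%% Ask the epmd instance on Host which nodes are registered there.
%% Returns a list of {NodeName, Port} tuples, or [] if epmd could not
%% be reached (for example because no epmd is running on that host).
nodes_on(Host) ->
    case net_adm:names(Host) of
        {ok, Names} ->
            Names;
        {error, _Reason} ->
            []
    end.
```

Calling epmd_query_sketch:nodes_on("localhost") on a host with running distributed nodes should list each node name together with its distribution port, which is exactly the information a firewall would otherwise keep private.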
Clearly, epmd is just the middleman. Can we cut out the middleman?
We could make every Erlang node listen on a well-known port — perhaps use the port reserved for epmd, 4369, if we’re going to get rid of epmd. But that means that we can only run one Erlang node on each host (of course, for some use cases that might be enough).
So let’s specify some other port number. I mentioned inet_dist_listen_min and inet_dist_listen_max earlier. Those two variables define a port range, but if we set them to the same value, we narrow down the “range” to a single port:
    erl -sname foo \
        -kernel inet_dist_listen_min 4370 \
                inet_dist_listen_max 4370
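The same restriction can also be kept in a configuration file instead of on the command line. A minimal sketch, assuming a file called sys.config (the file name is only an example; any name passed to -config works):

```erlang
%% sys.config (hypothetical file name)
%% Pin the distribution listener to a single port by making the
%% lower and upper bounds of the range identical.
[{kernel,
  [{inet_dist_listen_min, 4370},
   {inet_dist_listen_max, 4370}]}].
```

The node would then be started with erl -sname foo -config sys.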
That’s all well and good, but we’d also need a way to tell other nodes not to bother asking epmd about the port number, and just use this number instead. And if we have several nodes on the same host, we’d need some kind of configuration to specify the different port numbers for those nodes.
In Erlang/OTP 19.0, there are two new command line options:
- With -start_epmd false, Erlang won’t try to start epmd when starting a distributed node.
- -epmd_module foo lets you specify a different module to use for node name registration and lookup, instead of the default erl_epmd.
Those are the building blocks we need!
I want to use a state-less scheme for this: since the connecting node already knows the name of the node it wants to connect to, I use that as the source of the port number. I pick a “base” port number: why not 4370, one port higher than epmd? Then I extract the number at the end of the “node” part of the node name, such that myapp3@foo.example.com becomes 3. Then I add that number to the base port number. As a result, I know a priori that the node myapp3@foo.example.com is listening on port 4373. If there is no number in the node name, I treat that as a zero. This means that the nodes myapp3 and myotherapp3 couldn’t run on the same host, but I’m ready to live with that. (Thanks to Luca Favatella for perfecting this idea.)
Let’s write a little module for that:
    -module(epmdless).
    -export([dist_port/1]).

    %% Return the port number to be used by a certain node.
    dist_port(Name) when is_atom(Name) ->
        dist_port(atom_to_list(Name));
    dist_port(Name) when is_list(Name) ->
        %% Figure out the base port.  If not specified using the
        %% inet_dist_base_port kernel environment variable, default to
        %% 4370, one above the epmd port.
        BasePort = application:get_env(kernel, inet_dist_base_port, 4370),

        %% Now, figure out our "offset" on top of the base port.  The
        %% offset is the integer just to the left of the @ sign in our node
        %% name.  If there is no such number, the offset is 0.
        %%
        %% Also handle the case when no hostname was specified.
        NodeName = re:replace(Name, "@.*$", ""),
        Offset =
            case re:run(NodeName, "[0-9]+$", [{capture, first, list}]) of
                nomatch ->
                    0;
                {match, [OffsetAsString]} ->
                    list_to_integer(OffsetAsString)
            end,

        BasePort + Offset.
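To sanity-check the naming rule, here is a small self-contained sketch that applies the same mapping without regular expressions and with the base port fixed at 4370. The module name port_scheme_sketch and the examples/0 helper are made up for illustration and are not part of the code above.

```erlang
-module(port_scheme_sketch).
-export([dist_port/1, examples/0]).

%% Assumption: the base port is hard-coded here rather than read from
%% the kernel application environment.
-define(BASE_PORT, 4370).

dist_port(Name) when is_atom(Name) ->
    dist_port(atom_to_list(Name));
dist_port(Name) when is_list(Name) ->
    %% Keep only the part before the first @ sign, if there is one.
    NodePart = lists:takewhile(fun(C) -> C =/= $@ end, Name),
    %% Collect the digits at the very end of the node part.
    IsDigit = fun(C) -> C >= $0 andalso C =< $9 end,
    Digits = lists:reverse(lists:takewhile(IsDigit, lists:reverse(NodePart))),
    Offset = case Digits of
                 [] -> 0;
                 _ -> list_to_integer(Digits)
             end,
    ?BASE_PORT + Offset.

%% A few node names and the ports they map to.
examples() ->
    [{Name, dist_port(Name)}
     || Name <- ['myapp3@foo.example.com', foo1, 'foo12@host', foo]].
```

With these inputs, myapp3@foo.example.com maps to 4373, foo1 to 4371, foo12@host to 4382, and foo (no trailing number) to the base port 4370.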
And a module to use as the -epmd_module. One slight complication here is that 19.0 expects the module to export register_node/2, while from 19.1 onwards it’s register_node/3. Let’s include both functions to be sure:
    -module(epmdless_epmd_client).

    %% epmd_module callbacks
    -export([start_link/0,
             register_node/2,
             register_node/3,
             port_please/2,
             names/1]).

    %% The supervisor module erl_distribution tries to add us as a child
    %% process.  We don't need a child process, so return 'ignore'.
    start_link() ->
        ignore.

    register_node(_Name, _Port) ->
        %% This is where we would connect to epmd and tell it which port
        %% we're listening on, but since we're epmd-less, we don't do that.

        %% Need to return a "creation" number between 1 and 3.
        Creation = rand:uniform(3),
        {ok, Creation}.

    %% As of Erlang/OTP 19.1, register_node/3 is used instead of
    %% register_node/2, passing along the address family, 'inet_tcp' or
    %% 'inet6_tcp'.  This makes no difference for our purposes.
    register_node(Name, Port, _Family) ->
        register_node(Name, Port).

    port_please(Name, _IP) ->
        Port = epmdless:dist_port(Name),
        %% The distribution protocol version number has been 5 ever since
        %% Erlang/OTP R6.
        Version = 5,
        {port, Port, Version}.

    names(_Hostname) ->
        %% Since we don't have epmd, we don't really know what other nodes
        %% there are.
        {error, address}.
As you can see, most things are essentially stubbed out:
- start_link/0 is invoked as this module is added as a child of the erl_distribution supervisor. We don’t actually need to start a process here, so we just return ignore.
- The register_node function would normally connect to epmd and tell it what port number we use. In return, epmd would return a “creation” number. The “creation” number is an integer between 1 and 3. epmd keeps track of the creation number for each node name, and increments it whenever a node with a certain name reconnects. That means that it’s possible to distinguish e.g. pids from a previous “life” of a certain node.
  Since we don’t have epmd, we don’t have the benefit of it tracking the life span of the nodes. Let’s return a random number here, which has a 2 in 3 chance of being different from the previous “creation” number.
- port_please/2 gets the IP address of the remote host in order to connect to its epmd, but we don’t care; we use our algorithm to figure out the port number.
  We also need to return a distribution protocol version number. It has been 5 ever since Erlang/OTP R6 (see the Distribution Protocol documentation), so that’s simple.
- names/1 is called to list the Erlang nodes on a certain host. We have no way of knowing that, so let’s pretend that we couldn’t connect.
So far, so good, but we need a way to make sure that we’re listening on the right port. The best way I could think of is to write a new distribution protocol module, one that just sets the port number and then lets the real protocol module do its job:
    -module(epmdless_dist).

    -export([listen/1,
             select/1,
             accept/1,
             accept_connection/5,
             setup/5,
             close/1,
             childspecs/0]).

    listen(Name) ->
        %% Here we figure out what port we want to listen on.
        Port = epmdless:dist_port(Name),

        %% Set both "min" and "max" variables, to force the port number to
        %% this one.
        ok = application:set_env(kernel, inet_dist_listen_min, Port),
        ok = application:set_env(kernel, inet_dist_listen_max, Port),

        %% Finally run the real function!
        inet_tcp_dist:listen(Name).

    select(Node) ->
        inet_tcp_dist:select(Node).

    accept(Listen) ->
        inet_tcp_dist:accept(Listen).

    accept_connection(AcceptPid, Socket, MyNode, Allowed, SetupTime) ->
        inet_tcp_dist:accept_connection(AcceptPid, Socket, MyNode, Allowed, SetupTime).

    setup(Node, Type, MyNode, LongOrShortNames, SetupTime) ->
        inet_tcp_dist:setup(Node, Type, MyNode, LongOrShortNames, SetupTime).

    close(Listen) ->
        inet_tcp_dist:close(Listen).

    childspecs() ->
        inet_tcp_dist:childspecs().
Mostly stubs here; it’s just the listen/1 function that sets the inet_dist_listen_min and inet_dist_listen_max variables according to our node name, before passing control to the real module, inet_tcp_dist.
(Note that while inet_tcp_dist is the default module, it only provides unencrypted connections over IPv4. If you want to use IPv6, you would use inet6_tcp_dist, and if you want to use Erlang distribution over TLS, that would be inet_tls_dist or inet6_tls_dist. Adding that flexibility is left as an exercise for the reader.)
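One possible starting point for that exercise is to make the underlying module configurable. The sketch below is only an illustration: the kernel parameter epmdless_transport and the module name epmdless_transport_sketch are invented here and are not part of Erlang/OTP or of the code above.

```erlang
-module(epmdless_transport_sketch).
-export([real_module/0]).

%% Hypothetical kernel application parameter naming the distribution
%% module that does the actual work.
-define(PARAM, epmdless_transport).

%% Return the module to delegate to, defaulting to inet_tcp_dist.
real_module() ->
    Mod = application:get_env(kernel, ?PARAM, inet_tcp_dist),
    Known = [inet_tcp_dist, inet6_tcp_dist, inet_tls_dist, inet6_tls_dist],
    case lists:member(Mod, Known) of
        true ->
            Mod;
        false ->
            erlang:error({unsupported_transport, Mod})
    end.
```

Each function in the wrapper module could then call Mod = epmdless_transport_sketch:real_module() and delegate with Mod:listen(Name), Mod:select(Node) and so on, with the choice made at start-up via something like -kernel epmdless_transport inet6_tcp_dist. Note that the TLS modules need additional configuration of their own, which this sketch does not cover.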
And we’re ready! Now we can start two nodes, foo1 and foo2, and have them connect to each other:
erl -proto_dist epmdless -start_epmd false -epmd_module epmdless_epmd_client -sname foo1
erl -proto_dist epmdless -start_epmd false -epmd_module epmdless_epmd_client -sname foo2
System working?
    (foo2@poki-sona-sin)1> net_adm:ping('foo1@poki-sona-sin').
    pong
Seems to be!
Of course, since Erlang and Elixir run on the same virtual machine, there is nothing stopping us from doing all of this in Elixir instead.
In Elixir, we can put all the code in a single file, and the compiler will compile it into the different modules we require:
    # A module containing the function that determines the port number
    # based on a node name.
    defmodule Epmdless do
      def dist_port(name) when is_atom(name) do
        dist_port Atom.to_string name
      end

      def dist_port(name) when is_list(name) do
        dist_port List.to_string name
      end

      def dist_port(name) when is_binary(name) do
        # Figure out the base port.  If not specified using the
        # inet_dist_base_port kernel environment variable, default to
        # 4370, one above the epmd port.
        base_port = :application.get_env :kernel, :inet_dist_base_port, 4370

        # Now, figure out our "offset" on top of the base port.  The
        # offset is the integer just to the left of the @ sign in our node
        # name.  If there is no such number, the offset is 0.
        #
        # Also handle the case when no hostname was specified.
        node_name = Regex.replace ~r/@.*$/, name, ""
        offset =
          case Regex.run ~r/[0-9]+$/, node_name do
            nil ->
              0
            [offset_as_string] ->
              String.to_integer offset_as_string
          end

        base_port + offset
      end
    end

    defmodule Epmdless_dist do
      def listen(name) do
        # Here we figure out what port we want to listen on.
        port = Epmdless.dist_port name

        # Set both "min" and "max" variables, to force the port number to
        # this one.
        :ok = :application.set_env :kernel, :inet_dist_listen_min, port
        :ok = :application.set_env :kernel, :inet_dist_listen_max, port

        # Finally run the real function!
        :inet_tcp_dist.listen name
      end

      def select(node) do
        :inet_tcp_dist.select node
      end

      def accept(listen) do
        :inet_tcp_dist.accept listen
      end

      def accept_connection(accept_pid, socket, my_node, allowed, setup_time) do
        :inet_tcp_dist.accept_connection accept_pid, socket, my_node, allowed, setup_time
      end

      def setup(node, type, my_node, long_or_short_names, setup_time) do
        :inet_tcp_dist.setup node, type, my_node, long_or_short_names, setup_time
      end

      def close(listen) do
        :inet_tcp_dist.close listen
      end

      def childspecs do
        :inet_tcp_dist.childspecs
      end
    end

    defmodule Epmdless_epmd_client do
      # erl_distribution wants us to start a worker process.  We don't
      # need one, though.
      def start_link do
        :ignore
      end

      # As of Erlang/OTP 19.1, register_node/3 is used instead of
      # register_node/2, passing along the address family, 'inet_tcp' or
      # 'inet6_tcp'.  This makes no difference for our purposes.
      def register_node(name, port, _family) do
        register_node(name, port)
      end

      def register_node(_name, _port) do
        # This is where we would connect to epmd and tell it which port
        # we're listening on, but since we're epmd-less, we don't do that.

        # Need to return a "creation" number between 1 and 3.
        creation = :rand.uniform 3
        {:ok, creation}
      end

      def port_please(name, _ip) do
        port = Epmdless.dist_port name
        # The distribution protocol version number has been 5 ever since
        # Erlang/OTP R6.
        version = 5
        {:port, port, version}
      end

      def names(_hostname) do
        # Since we don't have epmd, we don't really know what other nodes
        # there are.
        {:error, :address}
      end
    end
When starting Elixir, we need to pass some of the parameters with --erl in order for them to make it through:
iex --erl "-proto_dist Elixir.Epmdless -start_epmd false -epmd_module Elixir.Epmdless_epmd_client" --sname foo3
Let’s try to ping the two Erlang nodes we started earlier:
    iex(foo3@poki-sona-sin)1> Node.ping :"foo1@poki-sona-sin"
    :pong
    iex(foo3@poki-sona-sin)2> Node.ping :"foo2@poki-sona-sin"
    :pong
All connected, and no epmd in sight!
This is just one possible scheme for Erlang distribution without epmd; I’m sure you can come up with something else that fits your requirements better. I hope the example code above proves useful as a guide!