MongooseIM 6.1: Handle more traffic, consume fewer resources

MongooseIM is a highly customisable instant messaging backend that can handle millions of messages per minute, exchanged between millions of users from thousands of dynamically configurable XMPP domains. With the new 6.1.0 release, it becomes even more cost-efficient, flexible and robust thanks to the new arm64 Docker containers and the C2S process rework.

Arm64 Docker containers

Modern applications are often deployed in Docker containers. This solution simplifies deployment to cloud-based environments, such as Amazon Web Services (AWS) and Google Cloud. We believe this is a great choice for MongooseIM, and we also support Kubernetes by providing Helm Charts. Docker images are independent of the host operating system, but they need to be built for specific processor architectures. Amd64 (x86-64) CPUs have dominated the market for a long time, but recently arm64 (AArch64) has been gaining ground. Notable examples include the Apple Silicon and AWS Graviton processors. We therefore decided to start publishing arm64-compatible Docker images with our latest 6.1.0 release.

To ensure top performance, we have been load-testing MongooseIM for many years using our own tools, such as amoc and amoc-arsenal-xmpp.

When we tested the latest Docker image on both amd64 and arm64 AWS EC2 instances, the results turned out to be much better than before – especially for arm64. The tested MongooseIM cluster consisted of two nodes, which is less than the recommended production size of three nodes, because the goal was to determine the maximum capability of a simple installation. Various compute-optimized instances were tested – including the 5th, 6th and 7th generations, all in the xlarge size. PostgreSQL (db.m6g.xlarge) was used for persistent storage, and three Amoc nodes (m6g.xlarge) were used for load generation. The three best-performing instance types were c6id (Intel Xeon Scalable, amd64), c6gd (AWS Graviton2, arm64) and c7g (AWS Graviton3, arm64).

The two most important test scenarios were:

One-to-one messaging, where each user chats with their contacts.
Multi-user chat, where each user sends messages to chat rooms with 5 participants each.

Several extensions were enabled to resemble a real-life use case. The most important are:

Message Archive Management (MAM) – message archive, allowing users to query for incoming and outgoing messages.
Inbox – view of recent and unread messages.
Multi-User Chat (MUC) Light – multi-user chat implementation optimized for performance.

The first two extensions perform database write operations for each message, and disabling them would improve performance.

The results are summarized in the table below:

Node instance type (size: xlarge)	c6id	c6gd	c7g
One-to-one messages per minute per node	240k	240k	300k
Multi-user chat messages per minute per node	120k sent / 600k received	120k sent / 600k received	150k sent / 750k received
On-demand AWS instance pricing per node per hour (USD)	0.2016	0.1536	0.1445
Instance cost per billion delivered one-to-one chat messages (USD)	14.00	10.67	8.03
Instance cost per billion delivered multi-user chat messages (USD)	5.60	4.27	3.21

For each instance type, the table shows the highest message rates achievable without performance degradation. The load was scaled up for the c7g instances thanks to their better performance, making it possible to handle 600k one-to-one messages per minute in the whole cluster, which is 300k messages per minute per node. Should you need more, you can scale horizontally or vertically; further tests showed an almost linear increase in performance. There are limits, of course (especially for the cluster size), but they are high. Maximum message rates for MUC Light were different because each message was routed to five recipients, making it possible to send up to 300k messages per minute in the whole cluster, and to deliver 1.5 million.

The results allowed calculating the costs of MongooseIM instances per 1 billion delivered messages, which are presented in the table above. Of course it might be difficult to reach these numbers in production environments because of the necessary margin for handling bursts of traffic, but during heavy load you can get close to these numbers. The database cost was actually higher than the cost of MongooseIM instances themselves.
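The cost figures can be reproduced from the hourly price and the sustained message rate. The following is a minimal Python sketch of that arithmetic; the input numbers are copied from the table, while the function and variable names are only illustrative.

```python
# Per-node figures copied from the results table (xlarge instances).
# Each entry: (USD per hour, one-to-one msgs/min, delivered multi-user msgs/min)
INSTANCES = {
    "c6id": (0.2016, 240_000, 600_000),
    "c6gd": (0.1536, 240_000, 600_000),
    "c7g": (0.1445, 300_000, 750_000),
}


def cost_per_billion(usd_per_hour, messages_per_minute):
    """Instance cost in USD for one billion messages at a sustained rate."""
    messages_per_hour = messages_per_minute * 60
    return usd_per_hour / messages_per_hour * 1_000_000_000


def cost_table():
    """Return {instance: (one-to-one cost, multi-user cost)}, rounded to cents."""
    return {
        name: (
            round(cost_per_billion(price, one_to_one), 2),
            round(cost_per_billion(price, muc_delivered), 2),
        )
        for name, (price, one_to_one, muc_delivered) in INSTANCES.items()
    }


if __name__ == "__main__":
    for name, (one_to_one, muc) in cost_table().items():
        print(f"{name}: one-to-one {one_to_one:.2f} USD, multi-user {muc:.2f} USD")
```

The multi-user figures are lower because the cost is divided by delivered messages, and every sent message is delivered to five recipients.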

C2S process rework

We have completely reimplemented the handling of C2S (client-to-server) connections. Although the changes are mostly internal, you can benefit from them, even if you are not interested in the implementation details.

The first change is about accepting incoming connections – instead of custom listener processes, the Ranch 2.1 library is now used. This introduces some new options, e.g. max_connections and reuse_port.

Prior to version 6.1.0, each open C2S connection was handled by two Erlang processes – the receiver process was responsible for XML parsing, while the C2S process would handle the decoded XML elements. They are now integrated into one, which means that the footprint of each session is smaller, and there is less internal messaging.

C2S State Machine: Separation of concerns

The core XMPP operations are defined in RFC 6120, and we have reimplemented them from scratch in the new mongoose_c2s module. The most important benefit of this change from the user perspective is the vastly improved separation of concerns, making feature development much easier. A simplified version of the C2S state machine diagram is presented below. Error handling is omitted for simplicity. The “wait for session” state is optional, and you can disable it with the backwards_compatible_session configuration option.

A similar diagram for version 6.0 would be much more complicated, because the former implementation had parts of multiple extensions scattered around its code:

Functionality	Described in	Moved out to
Stream resumption	XEP-0198 Stream Management	mod_stream_management
AMP event triggers	XEP-0079 Advanced Message Processing	mod_amp
Stanza buffering for CSI	XEP-0352 Client State Indication	mod_csi
Roster subscription handling	RFC 6121 Instant Messaging and Presence	mod_roster
Presence tracking	RFC 6121 Instant Messaging and Presence	mod_presence
Broadcasting PEP messages	XEP-0163 Personal Eventing Protocol	mod_pubsub
Handling and using privacy lists	XEP-0016 Privacy Lists	mod_privacy
Handling and using blocking commands	XEP-0191 Blocking Command	mod_blocking

It is important to note that mod_presence is the only new module in the list. The others existed before, but parts of their code were in the C2S module. By disabling unnecessary extensions, you can gain performance. For example, by omitting [mod_presence] from your configuration file you can skip all the server-side presence handling. Our load tests have shown that this could significantly reduce the total time needed to establish a connection. Moreover, disabling extensions is now 100% reliable and guarantees that no unwanted code will be executed.

Easier extension development

If you are interested in developing your own custom extensions, it is now easier than ever, because mongoose_c2s uses the new C2S-related hooks and handlers and several new features of the gen_statem behaviour. C2S hooks can be divided into the following categories, depending on the events that trigger them:

Trigger	Hooks
User session opening	`user_open_session`
User sends an XML element	`user_send_packet, user_send_xmlel, user_send_message, user_send_presence, user_send_iq`
User receives an XML element	`user_receive_packet, user_receive_xmlel, user_receive_message, user_receive_presence, user_receive_iq, xmpp_presend_element`
User session closing	`user_stop_request, user_socket_closed, user_socket_error, reroute_unacked_messages`
`mongoose_c2s:call/3`, `mongoose_c2s:cast/3`	`foreign_event`

Most of the hooks are triggered by XMPP traffic. The only exception is foreign_event, which can be triggered by modules on demand, making it possible to execute code in the context of a specific user’s C2S process.

Modules add handlers to selected hooks. Such a handler performs module-specific actions and returns an accumulator, which can contain special options, allowing the module to:

Store module-specific data using state_mod, or replace the whole C2S state data with c2s_data.
Transition to a new state with c2s_state.
Perform arbitrary gen_statem transition actions with actions.
Stop the state machine gracefully (stop) or forcefully (hard_stop).
Deliver XML elements to the user with (route, flush) or without (socket_send) triggering hooks.
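To illustrate the idea of handlers passing an accumulator along, here is a small conceptual model in Python. It is not MongooseIM code: the `Acc` class, the `run_hook` function and both handlers are hypothetical, and only the field names (state_mod, c2s_state, actions, route, socket_send) are borrowed from the options listed above.

```python
from dataclasses import dataclass, field


@dataclass
class Acc:
    """Hypothetical stand-in for the accumulator returned by hook handlers."""
    state_mod: dict = field(default_factory=dict)    # per-module data, keyed by module name
    c2s_state: object = None                          # requested state transition, if any
    actions: list = field(default_factory=list)       # extra transition actions
    route: list = field(default_factory=list)         # elements delivered with hooks triggered
    socket_send: list = field(default_factory=list)   # elements written without triggering hooks


def run_hook(handlers, acc, element):
    """Pass the accumulator through every handler registered for one hook."""
    for handler in handlers:
        acc = handler(acc, element)
    return acc


def track_presence(acc, element):
    """Hypothetical handler: remember the last presence type as module data."""
    acc.state_mod["presence_tracker"] = element["type"]
    return acc


def acknowledge(acc, element):
    """Hypothetical handler: queue a reply that bypasses further hooks."""
    acc.socket_send.append({"name": "ack", "for": element["name"]})
    return acc


if __name__ == "__main__":
    result = run_hook([track_presence, acknowledge], Acc(),
                      {"name": "presence", "type": "available"})
    print(result.state_mod, result.socket_send)
```

Each handler only touches its own part of the accumulator, and the state machine decides what to do with the collected options once all handlers have run.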

Example

Let’s take a look at the handlers of the new mod_presence module. For user_send_presence and user_receive_presence hooks, it updates the module-specific state (state_mod) storing the presence state. The handler for foreign_event is more complicated, because it handles the following events:

Event	Handler logic	Trigger
`{mod_presence, get_presence | get_subscribed}`	Get user presence information / subscribed users	`mongoose_c2s:call(Pid, mod_presence, get_presence | get_subscribed)`
`{mod_presence, {set_presence, Presence}}`	Set user presence information	`mongoose_c2s:cast(Pid, mod_presence, {set_presence, Presence})`
`{mod_roster, RosterItem}`	Update roster subscription state	`mongoose_c2s:cast(Pid, mod_roster, RosterItem)`

The example shows how the coupling between extension modules remains loose and modules don’t call each other’s code directly.

The benefits of gen_statem

The following new gen_statem features are used in mongoose_c2s:

Arbitrary term state – with the handle_event_function callback mode it is possible to use tuples for state names. An example is {wait_for_sasl_response, cyrsasl:sasl_state(), retries()}, which has the state of the SASL authentication process and the number of authentication retries left encoded in the state tuple. Apart from the states shown in the diagram above, modules can introduce their own external states – they have the format {external, StateName}. An example is mod_stream_management, which causes transition to the {external, resume} state when a session is closed.
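The value of tuple states is that the data needed to decide the next transition travels with the state itself. The following Python sketch models this with plain tuples; it is a simplification, and the event names, the "authenticated" and "stopped" states and the retry rule are assumptions made for illustration rather than the actual mongoose_c2s logic.

```python
def handle_event(state, event):
    """Return the next state for a (state, event) pair; states are plain tuples."""
    if state[0] == "wait_for_sasl_response":
        _, sasl_state, retries_left = state
        if event == "auth_success":
            # Hypothetical follow-up state, used here only as a placeholder.
            return ("authenticated",)
        if event == "auth_failure":
            if retries_left > 0:
                # Same state name, but one retry fewer encoded in the tuple.
                return ("wait_for_sasl_response", sasl_state, retries_left - 1)
            return ("stopped", "too_many_retries")
    # Unknown combinations leave the state unchanged in this sketch.
    return state


if __name__ == "__main__":
    state = ("wait_for_sasl_response", "sasl-step-1", 1)
    for event in ["auth_failure", "auth_failure"]:
        state = handle_event(state, event)
        print(event, "->", state)
```

No separate counter has to be kept in the state data, because the number of retries left is part of the state term.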

Multiple callback modules – to handle an external state, the callback module has to be changed, e.g. mod_stream_management uses the {push_callback_module, ?MODULE} transition action to provide its own handle_event function for the {external, resume} state.

State timeouts – for all states before wait_for_session, the session terminates after the configurable c2s_state_timeout. The timeout tuple itself is {state_timeout, Timeout, state_timeout_termination}.

Named timeouts – modules use these to trigger specific actions, e.g. mod_ping uses several timeouts to schedule ping requests and to wait for responses. The timeout tuple has the format {{timeout, ping | ping_timeout | send_ping}, Interval, fun ping_c2s_handler/2}. This feature is also used for traffic shaping to pause the state machine if the traffic volume exceeds the limit.

Self-generated events – this feature is used very often, for example when incoming XML data is parsed, an event {next_event, internal, XmlElement} is generated for each parsed XML element. The route and flush options of the c2s accumulator generate internal events as well.
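In gen_statem, events generated with next_event are processed before any events that are already waiting in the queue, and their relative order is preserved. The Python sketch below models only this ordering rule; the event names are made up, and splitting a string on whitespace is a stand-in for real XML parsing.

```python
from collections import deque


def handle(event):
    """Return the list of self-generated events for one processed event."""
    kind, payload = event
    if kind == "tcp_data":
        # Stand-in for XML parsing: one internal event per parsed "element".
        return [("internal", element) for element in payload.split()]
    return []


def run(initial_events, handler):
    """Process events in order, inserting generated events at the front of the queue."""
    queue = deque(initial_events)
    processed = []
    while queue:
        event = queue.popleft()
        processed.append(event)
        generated = handler(event)
        # extendleft reverses its input, so reverse first to keep the original order.
        queue.extendleft(reversed(generated))
    return processed


if __name__ == "__main__":
    for event in run([("tcp_data", "a b"), ("info", "tick")], handle):
        print(event)
```

In this model both parsed elements are handled before the unrelated ("info", "tick") event, even though that event was queued earlier.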

Summary

MongooseIM 6.1.0 is full of improvements on many levels – both on the outside, like the arm64 Docker images, and deep inside, like the separation of concerns in mongoose_c2s. What they all have in common is that we have load-tested them extensively, making sure that our new messaging server delivers what it promises and that the performance is better than ever. There are no unpleasant surprises hidden underneath. After all, it is open source, and you are welcome to download, deploy, use and extend it free of charge. However, should you have a special use case, high performance requirements or a need to reduce costs, don’t hesitate to contact us, and we will be able to help you deploy, load test and maintain your messaging solution.

Pawel Chrzaszcz

Paweł started his journey with Erlang back in 2007. After graduating from the AGH University of Science and Technology in Kraków, Poland, he worked as an Erlang developer at Klarna AB, Stockholm, Sweden, and was teaching computer science at a university for seven years. He has a PhD in Computer Science, and his dissertation was about automatic creation of semantic dictionaries. Since 2012 he has been working at Erlang Solutions, and for the last few years he has been a Software Architect and Team Lead of the MongooseIM project. Paweł enjoys mountain biking and sometimes even wins a race.