Necessary Changes In AMQP

AMQP is a fast binary protocol!

If you've looked a little at AMQP, you'll see that it's a binary protocol. This means that methods like Queue.Create are defined as binary frames in which every field sits in a well-defined place, encoded carefully to make the best use of space. Binary encoding of numbers is much more compact than string encoding. "123" needs one octet in binary and three as a string. Because AMQP is a binary protocol, it is very fast to parse. Strings are safe, since there is no text parsing to do. AMQP's binary encoding means AMQP software is not at risk of buffer overflow attacks. And it's easy to generate the encoding/decoding routines, since AMQP is defined as easy-to-parse XML.

Overall, AMQP's binary encoding is a big win!

Or not. I think the evidence shows that AMQP's binary encoding was a mistake (mine, originally). Let's look at the advantages, and costs, of this approach, and let's deconstruct the basic assumptions it was based on. Finally, let's compare this with an alternative approach based on what I believe are more accurate assumptions.

Advantages of binary encoding:

  • It is more compact.
  • It is faster to parse than a text format.
  • It is safer to parse strings.
  • The codecs can be fully generated.
  • It is easy to process in silicon.

Costs of binary encoding:

  • You need codecs in the first place.
  • It creates endless incompatible versions of AMQP.
  • It is more complex to understand and use than string encoding.
  • There is a lot of emphasis on data types.
  • Even the simplest client API is significantly complex.

Now, a fast, compact wire-level encoding is surely worth the hassle. After all - so went the argument - AMQP is designed for speed, and it's enterprise technology, and complex APIs are not a big problem. It's true that text-based protocols like HTTP can be played with using a TELNET client but in reality no-one writes HTTP clients, they use existing libraries.

So the decision was that the costs of binary encoding were a necessary price for a fast, reliable protocol. Our view was that we could not hope to achieve the necessary performance over a text-encoded protocol like HTTP.

The main assumption underlying AMQP's encoding was that it is necessary for performance reasons (speed of parsing, compactness of data). If I can show that this assumption is wrong, I have removed the main justification for binary encoding.

A really fast protocol

Here is a pop quiz to test your knowledge of protocols. What is the fastest common messaging protocol, built into every modern operating system, integrated into every browser, and capable of saturating ordinary networks? It is faster than HTTP, and much, much faster than AMQP. In fact if implementations of this protocol were not dependent on reading and writing everything to disk, they would probably score as the fastest messaging application ever designed.

The answer is FTP, the humble file transfer protocol, beloved of network engineers who want to check whether a network link is configured for 100Mbps or 1Gbps: FTP is capable of shoving data fast enough down the line to prove without doubt how fast the network is.

Now, the interesting part and the reason for my question. What is special about FTP that lets it transfer data so rapidly? And what lessons does this provide for AMQP?

Incidentally, my views on AMQP performance come from the work done by iMatix and FastMQ on ZeroMQ, an AMQ-inspired framework that can transmit millions of messages per second. ZeroMQ is so fast because it uses the same techniques as FTP, rather than those used by AMQP.

FTP wins because it uses one connection for control commands, and one for message transfer. This is something that later protocols, like HTTP, did not do. But FTP is mostly faster and simpler than HTTP. Faster and simpler are desirable features.

Binary encoding's broken assumptions

AMQP's main assumption that binary encoding is needed can be broken into more detailed assumptions, each wrong:

  1. That it is necessary to optimise control commands like Queue.Create. The assumption is that such commands are relevant to performance. In fact, they form a tiny fraction of the messaging activity, the almost total mass being message transfer, not control.
  2. That control commands need to occupy the same network connection as messages. The assumption is that a logical package of control commands and message data must travel on the same physical path. In fact they can travel on very different paths, even across different parts of the network.
  3. That the encodings for control commands and for message transfer need to be the same. In fact, there is no reason for trying to use a single encoding model, and a big win from allowing each part of the protocol to use the best encoding, whatever that is.

What AMQP should have done, from the start, was to separate control from data, and then use the simplest possible encoding form for commands, and the simplest possible encoding form for messages. I can't over-emphasise the importance of simplicity, especially in young protocols that have to support a lot of growth.

Separating control from data makes the dialogue between client and server much simpler. One of the big barriers to writing AMQP clients is that they have to understand a complex set of exchanges that mix control and data. It turns out that splitting these apart makes the two separate dialogues much simpler than the single combined dialogue.

Having separated the two types of work, we're free to choose the simplest encoding for each. The simplest possible encoding for commands is in the form of text, with (for example) the 'Header: value' syntax that is well known from HTTP, SMTP, etc. This is trivial to parse using regular expressions. Attacks on this kind of encoding are well understood and they are easy to deal with. There are no funny data types, everything is a string.
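As a rough illustration of how little machinery this needs, here is a minimal Python sketch that parses a text command with a regular expression. The command syntax (a command name on the first line, then 'Header: value' lines) and the names parse_command, Queue and Durable are invented for this example; nothing here is defined by any AMQP specification.

```python
import re

# Hypothetical syntax: the first line is the command name, the following
# lines are 'Header: value' pairs. None of this is defined by AMQP.
HEADER_RE = re.compile(r"^([A-Za-z][A-Za-z0-9-]*):[ \t]*(.*)$")

def parse_command(text):
    """Parse a text command into (name, headers); every value stays a string."""
    lines = text.strip().splitlines()
    if not lines:
        raise ValueError("empty command")
    name = lines[0].strip()
    headers = {}
    for line in lines[1:]:
        match = HEADER_RE.match(line.strip())
        if match is None:
            raise ValueError("malformed header line: %r" % line)
        # Header names are treated as case-insensitive, as in HTTP
        headers[match.group(1).lower()] = match.group(2).strip()
    return name, headers

name, headers = parse_command("QUEUE.CREATE\r\nQueue: orders\r\nDurable: true\r\n")
print(name, headers)
```

Anything that does not match the pattern is rejected outright, which is one simple way to deal with malformed or hostile input.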

Using a simple text encoding for commands releases AMQP from many of its shackles:

  • It becomes obvious to developers.
  • It becomes easy to have backwards compatibility.
  • It becomes easier to write clients.
  • It becomes easier to debug and test AMQP test cases.

Does the text parsing create a performance penalty? Yes, but it is irrelevant to overall performance, because control commands are used only at the start of conversations that can last for hours.

Message encoding

What then is the simplest possible encoding for messages? AMQP defines a rather impressive envelope around messages (around 60-100 octets), which may be fine for large messages and low performance goals, but is bad news for small messages and high throughput rates. When we developed ZeroMQ, we wondered just how small the message envelope could get. The answer is quite surprising: you can reduce it to a single octet.

The simplest message encoding has a 1-octet header that encodes a 7-bit size and a 1-bit continuation indicator:

[octet][0 to 127 bytes of binary data]

Empirical tests also show that this is the most efficient encoding for random message sizes. We can of course define other encodings, each with their own cost-benefit equations.
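To make the 1-octet header concrete, here is a minimal Python sketch of one way to encode and decode it. The bit layout is an assumption: the text above gives only the field widths, so this sketch puts the continuation indicator in the high bit (set means "more frames follow") and the size in the low 7 bits. The function names are also invented for the example, and this is not presented as ZeroMQ's actual wire format.

```python
MAX_CHUNK = 127    # largest size that fits in 7 bits
MORE = 0x80        # assumed layout: high bit set means "more frames follow"

def encode_message(payload):
    """Split payload into frames of [1-octet header][0..127 octets of data]."""
    out = bytearray()
    chunks = [payload[i:i + MAX_CHUNK] for i in range(0, len(payload), MAX_CHUNK)]
    if not chunks:
        chunks = [b""]   # an empty message is still one frame
    for index, chunk in enumerate(chunks):
        last = index == len(chunks) - 1
        out.append(len(chunk) | (0 if last else MORE))
        out += chunk
    return bytes(out)

def decode_message(data, offset=0):
    """Decode one message starting at offset; return (payload, next_offset)."""
    payload = bytearray()
    while True:
        if offset >= len(data):
            raise ValueError("truncated message: missing frame header")
        header = data[offset]
        size = header & 0x7F
        offset += 1
        if offset + size > len(data):
            raise ValueError("truncated message: missing frame data")
        payload += data[offset:offset + size]
        offset += size
        if not header & MORE:
            return bytes(payload), offset

wire = encode_message(b"hello")
print(wire, decode_message(wire))
```

Under these assumptions a message of up to 127 octets costs exactly one octet of envelope, and a 300-octet message is carried in three frames for three octets of overhead.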

Now, a necessary question is, "how do we mix those simple text-based control commands with that simple message encoding?" There are several answers:

  1. We can wrap binary messages in textual envelopes. This is how BEEP and HTTP work. This single-connection design looks simpler but in fact becomes quite complex, and it is inefficient.
  2. We can use distinct connections for control commands and for messages, like FTP. This is simple but means we need to manage multiple ports. This feels wrong: it should be possible to do everything on the single AMQP port 5672.
  3. We can start with a simple text-based control model and switch to simple binary message encoding if we decide to start message transfer. This is analogous to how TLS switches from an insecure to an encrypted connection.

I prefer the last option, because as I explained, it is useful to separate control and data. Mixing them, as AMQP does today, creates some extraordinarily delicate problems, such as how to handle errors that can hit both synchronous and asynchronous dialogues. AMQP's exception handling is an elegant solution but it solves a problem I would rather see disappear. By splitting control and data into two separate dialogues, error handling also becomes much more conventional.
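A minimal Python sketch of that third option follows, using an in-memory stream as a stand-in for a connection. The START command that triggers the switch, the function names, and the bit layout of the 1-octet header (high bit as continuation indicator) are all assumptions made for illustration, not part of AMQP.

```python
import io

MORE = 0x80  # assumed continuation bit in the 1-octet message header

def read_message(stream):
    """Read one message made of [header][data] frames; None at end of stream."""
    payload = bytearray()
    in_message = False
    while True:
        header = stream.read(1)
        if not header:
            if in_message:
                raise ValueError("stream ended inside a message")
            return None
        size = header[0] & 0x7F
        data = stream.read(size)
        if len(data) != size:
            raise ValueError("stream ended inside a frame")
        payload += data
        if not header[0] & MORE:
            return bytes(payload)
        in_message = True

def serve(stream):
    """Handle text control lines until START, then read binary messages."""
    commands, messages = [], []
    for raw in iter(stream.readline, b""):
        line = raw.decode("ascii").strip()
        if line == "START":      # hypothetical command that switches mode
            break
        commands.append(line)
    while True:
        message = read_message(stream)
        if message is None:
            break
        messages.append(message)
    return commands, messages

wire = io.BytesIO(b"QUEUE.CREATE orders\r\nSTART\r\n" + b"\x05hello" + b"\x02hi")
commands, messages = serve(wire)
print(commands, messages)
```

The two phases never interleave, so each loop stays simple. With a real socket, a buffered file object such as the one returned by socket.makefile("rb") would offer the same readline and read calls.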

Natural semantics

There is a concept I call "natural semantics". These are simple patterns that Just Work. Often they take a little while to appreciate. Natural semantics are like mathematical truths, they exist objectively, and independently of particular technologies or viewpoints. They are precious things. The AMQ exchange-binding-queue model is a natural semantic. A good designer should always search for these natural semantics, and then enshrine them in such ways as to make them inviolable and inevitable and trivial to use.

The natural semantic for control commands is pessimistic synchronous dialogue in which every request is acknowledged with a success / failure response. This is the model used in every significant Internet protocol. It assumes that the risk of failure is high, and that performance is not an issue. The client therefore gets an explicit response to every request and it checks this response before continuing with the next request. The synchronous semantic is slow because it involves a round trip for each step.

The natural semantic for data transfer is optimistic asynchronous monologue in which one party shoves data as fast as possible to another, not waiting for any response whatsoever. This is the model used in every performance-oriented Internet protocol. It maps to the concept of "streaming". It assumes that the risk of failure is low, and therefore the client can send data without waiting for responses. Data can be batched, and network actions minimized.
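To put rough numbers on the difference between the two semantics, here is a back-of-envelope cost model in Python. It is a deliberate simplification: it ignores reply sizes, TCP behaviour and processing time, and the figures used (1000 messages of 100 octets, a 1 ms round trip, a 1Gbps link) are illustrative assumptions, not measurements.

```python
def synchronous_seconds(count, rtt, size_bytes, bandwidth_bps):
    """Each request waits for its reply: one round trip per step."""
    return count * (rtt + size_bytes * 8 / bandwidth_bps)

def streaming_seconds(count, rtt, size_bytes, bandwidth_bps):
    """The sender never waits: one one-way delay, then back-to-back data."""
    return rtt / 2 + count * size_bytes * 8 / bandwidth_bps

# Assumed figures, for illustration only
COUNT, RTT, SIZE, BANDWIDTH = 1000, 0.001, 100, 1e9

sync_time = synchronous_seconds(COUNT, RTT, SIZE, BANDWIDTH)
stream_time = streaming_seconds(COUNT, RTT, SIZE, BANDWIDTH)
print("synchronous: %.4f s, streaming: %.4f s" % (sync_time, stream_time))
```

With these assumed figures the synchronous dialogue takes about one second, almost all of it spent waiting on round trips, while the streaming monologue takes about 1.3 milliseconds. That gap matters for data and is irrelevant for a handful of control commands.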

The asynchronous semantic needs an answer to "what happens when data gets lost". The best answer is "it depends", and we'll look at the options in a later article.

AMQP does allow both synchronous and asynchronous dialogues but it's not tied to the natural semantics of control and data. The natural semantics are weakly bounded, insufficiently inevitable. And these weak boundaries are fully exploited as people experiment with asynchronous control and synchronous data, creating unnatural semantics.

Unnatural semantics

I've said that separating AMQP's control and data into two dialogues makes things simpler. In fact there are a whole host of related simplifications we can make to AMQP:

  • AMQP uses multiplexed connections, called "channels". We took this notion from HTTP/NG and from BEEP. Multiplexing optimizes the scenario where applications open and close many connections rapidly. This is an HTTP problem. AMQP connections tend to be few and long lasting. There is thus no justification for multiplexing in AMQP. It adds non-trivial complexity, and it should be dropped.
  • AMQP uses a heartbeating mechanism to detect when a peer has stopped responding. There are simpler ways to detect and handle a dead peer. In the synchronous control dialogue, a dead peer shows up as no response within a certain timeout. In the asynchronous data dialogue, network traffic backs up and send operations block and time out. No explicit heartbeating mechanism is needed.
  • AMQP uses handshaking to close a connection. This is needed because it mixes asynchronous and synchronous dialogues on one connection. There are simpler ways to close fully synchronous and fully asynchronous connections.
  • AMQP uses flow control to switch off and on the stream of messages arriving at a node. Flow control pushes the problem upstream. If a peer cannot handle incoming traffic, the best solution is to discard the traffic, fix the problem and then resend it.
  • AMQP uses acknowledgements and transactions on messages. Acknowledgements and transactions do not, as far as I can tell, belong in the core control and message protocols.
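The point about heartbeating in the list above can be sketched in a few lines of Python: in a synchronous dialogue, a plain receive timeout is enough to notice a peer that has gone quiet. The command text, the 0.2 second timeout and the request function are assumptions chosen for the example, and a local socket pair stands in for a real client-server connection.

```python
import socket

def request(sock, command, timeout=0.2):
    """Send a control command and wait for the reply.

    Returns the reply bytes, or None if nothing arrives within the timeout,
    in which case the caller can presume the peer is dead.
    """
    sock.settimeout(timeout)
    try:
        sock.sendall(command)
        return sock.recv(4096)
    except socket.timeout:
        return None

client, server = socket.socketpair()

# A live peer: the reply is already waiting when the client asks
server.sendall(b"OK\r\n")
alive = request(client, b"QUEUE.CREATE orders\r\n")   # hypothetical command

# A silent peer: no reply is ever sent, so the request times out
silent = request(client, b"QUEUE.CREATE orders\r\n")

print(alive, silent)
client.close()
server.close()
```

An empty reply from recv would mean the peer closed the connection cleanly; a real client would treat that as a failure too.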

Conclusion: with the development of AMQP/1.0 we have a chance to learn from, and fix, some of the old design flaws that go back to the very first versions of AMQP. We can make the protocol significantly simpler, faster, more flexible, more open, and more interoperable, by splitting what is today a mixture of not two, or three, but actually four or five protocols into distinct layered protocols:

  1. A textual protocol for connection and version negotiation
  2. A textual protocol for control commands
  3. A binary protocol for messages, with a choice of envelopes
  4. Protocols for transactions and reliability, layered on top

Links