Redis vs Kafka vs RabbitMQ for 1MB messages - performance

I am currently researching a queueing solution to handle medium-sized messages of about 1MB.
Apart from the feature differences between Redis, Kafka and RabbitMQ, I cannot find any good answer about their performance with messages of around 1MB.
Does anyone know how many 1MB messages each of these can handle?
Do you know of any other queueing solutions that would perform better?

When you are evaluating Kafka vs Redis in your case, there are other factors which you have to take into account, besides message size. Here are some of them I can think of:
How many producers/consumers? Redis performance can suffer with a large number of producers/consumers because of its push-based nature: in pub/sub mode, Redis delivers each message to all subscribed consumers at once, at the moment the message is published.
Do you need speed or reliability first? If speed is of utmost importance, use Redis, since Redis pub/sub does not persist messages and delivers them faster. If you need reliability, use Kafka, since it persists messages even after they are delivered.
Do you want your consumers to fetch messages when they are ready, or do you want messages sent to the consumers immediately? In the first case use Kafka, because it is pull based (the consumer has to ask for messages). In the second case use Redis, since it is push based (a message is pushed to the consumer as soon as it is on the queue). RabbitMQ is also push based (there is a pull API, but it performs badly).
What is the expected number of messages? If it's not huge, use Redis, since you are limited by memory. Otherwise use Kafka. Best practice for RabbitMQ is to keep queues short, which means consuming messages at close to the rate at which they arrive on the queue. So if the consumer performs some long-running operation, RabbitMQ is probably not the best choice.
Scaling? Kafka scales horizontally really well (it's built with scalability in mind). RabbitMQ is usually scaled vertically. Redis also scales well horizontally if needed.
Clearly there is more than one criterion to weigh when choosing a queueing solution. There are best practices and recommendations for each of the queueing engines you are looking at. Think more about your specific use case; it's definitely worth the time, since choosing an inappropriate queueing engine will cost you more time later on.

I am answering for Kafka.
Kafka itself has very good performance even for big messages.
In our tests with 2 Kafka nodes we reached p2p throughput of 170 MB/s with smaller messages and 150 MB/s with bigger messages.
The only thing you need to remember is to configure the broker to accept bigger messages.
Here is a nice article: Configuring Kafka for Performance and Resource Management - Handling Large Messages
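As a rough sketch of what "configure the broker to accept bigger messages" can involve: the property names below are the Kafka settings I would expect to touch (please check them against the documentation for your Kafka version), while the helper function and the 10% headroom are made up purely for illustration.

```python
def large_message_settings(max_message_bytes, headroom_percent=10):
    """Return size limits to raise so that messages of max_message_bytes fit.

    Hypothetical helper: the headroom is an assumption that leaves room for
    record and batch overhead on top of the payload itself.
    """
    limit = max_message_bytes + max_message_bytes * headroom_percent // 100
    return {
        # Broker side: largest record batch accepted, and what replicas may fetch.
        "broker": {"message.max.bytes": limit, "replica.fetch.max.bytes": limit},
        # Per-topic override of the broker limit.
        "topic": {"max.message.bytes": limit},
        # Producer side: largest request the client will send.
        "producer": {"max.request.size": limit},
        # Consumer side: data fetched per partition per request.
        "consumer": {"max.partition.fetch.bytes": limit},
    }


settings = large_message_settings(1024 * 1024)  # 1MB messages
for side, props in settings.items():
    for name, value in props.items():
        print(f"{side}: {name}={value}")
```

The point is only that the limit has to be raised consistently on the broker, producer and consumer side, otherwise one of them rejects the message.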
I know of another p2p solution that might be interesting once you have concrete requirements: look at YAMI4.
I was using Redis but only for very small messages, so I cannot say anything about 1MB.

Related

Kafka: is it better to have a lot of small messages or fewer, but bigger ones?

There is a microservice which receives batches of messages from the outside and pushes them to Kafka. Each message is sent separately, so for each batch I have around 1000 messages of 100 bytes each. It seems the messages take much more space internally, because the free disk space is going down much faster than I expected.
I'm thinking about changing the producer logic so that it puts the whole batch into one message (the consumer would then split it by itself). But I haven't found any information about space or performance issues with many small messages, nor any guidelines about the balance between size and count. And I don't know Kafka well enough to draw my own conclusion.
Thank you.
The producer will, by itself, batch messages that are destined for the same partition, in order to avoid unnecessary calls.
The producer does this with its background threads. In the image, you can see how it batches 3 messages before sending them to each partition.
If you also set compression on the producer side, it will compress the messages (GZip, LZ4 and Snappy are among the valid codecs) before sending them over the wire. This property can also be set on the broker side (so the messages are sent uncompressed by the producer and compressed by the broker).
Whether you prefer a slower producer (compression will slow it down) or a bigger load on the wire depends on your network capacity. Note that setting a high compression level on big files may hurt your overall performance a lot.
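To get a feel for why batching plus compression matters for many small messages, here is a small stand-alone sketch. It does not use Kafka or its on-disk format at all; it only compares gzip applied to each message separately against gzip applied to a whole batch, using made-up JSON records.

```python
import gzip
import json


def make_messages(count=1000):
    # Hypothetical small JSON records, loosely modelled on the question's
    # "around 1000 messages per batch".
    return [
        json.dumps({
            "id": i,
            "sensor": "device-%04d" % (i % 50),
            "status": "ok",
            "value": i * 0.5,
            "unit": "celsius",
        }).encode("utf-8")
        for i in range(count)
    ]


def compressed_size_individually(messages):
    # Every message pays the gzip header/trailer and cannot share a dictionary.
    return sum(len(gzip.compress(m)) for m in messages)


def compressed_size_batched(messages):
    # One gzip stream over the whole batch can exploit repetition across messages.
    return len(gzip.compress(b"\n".join(messages)))


messages = make_messages()
print("raw bytes:             ", sum(len(m) for m in messages))
print("gzip per message bytes:", compressed_size_individually(messages))
print("gzip per batch bytes:  ", compressed_size_batched(messages))
```

The exact numbers depend entirely on the data, but repetitive small records usually compress far better as a batch than one by one.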
Anyway, I believe the big/small message problem hurts the consumer side a lot more. Sending messages to Kafka is easy and fast (the default behaviour is async, so the producer won't be too busy). But on the consumer side, you'll have to look at the way you are processing the messages:
One Consumer-Worker
Here you couple consuming with processing. This is the simplest way: the consumer runs in its own thread, reads a Kafka message and processes it, then continues the loop.
One Consumer - Many workers
Here you decouple consuming and processing. In most cases, reading from kafka will be faster than the time you need to process the message. It is just physics. In this approach, one consumer feeds many separate worker threads that share the processing load.
More info about this here, just above the Constructors area.
Why do I explain this? Well, if your messages are too big and you choose the first option, your consumer may not call poll() within the timeout interval, so it will rebalance continuously. If your messages are big (and take some time to process), it is better to implement the second option, as the consumer will carry on calling poll() without falling into rebalances.
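A minimal sketch of the "one consumer, many workers" shape, using only the Python standard library. There is no Kafka client here: poll is a stand-in for the real consumer's poll() call and process is whatever slow work you do per message; both names, and run_consumer itself, are assumptions for the example. Offset commits and error handling are deliberately left out.

```python
import queue
import threading


def run_consumer(poll, process, num_workers=4, max_buffered=100):
    """Poll in the calling thread and hand records to worker threads."""
    work = queue.Queue(maxsize=max_buffered)  # bounded, so memory stays capped
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            record = work.get()
            if record is None:  # sentinel: no more work for this worker
                break
            out = process(record)
            with lock:
                results.append(out)

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()

    while True:
        batch = poll()  # stand-in for consumer.poll(); an empty batch ends this demo
        if not batch:
            break
        for record in batch:
            # Blocks if the workers fall behind and the buffer is full. A real
            # consumer would need to keep polling (e.g. with partitions paused).
            work.put(record)

    for _ in threads:
        work.put(None)
    for t in threads:
        t.join()
    return results


batches = iter([[1, 2, 3], [4, 5], []])
print(sorted(run_consumer(lambda: next(batches), lambda r: r * r)))
```

The polling loop stays cheap because the expensive part runs in the workers, which is what keeps poll() being called within the timeout.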
If the messages are too big and too many, you may have to start thinking about different structures that can buffer the messages in memory. Pools, deques and queues, for example, are different options to accomplish this.
You may also increase the poll timeout interval. This may hide dead consumers from you, so I don't really recommend it.
So my answer would be: it depends, basically on your network capacity, your required latency and your processing capacity. If you are able to process big messages as fast as smaller ones, then I wouldn't care much.
Maybe if you need to filter and reprocess older messages I'd recommend partitioning the topics and sending smaller messages, but it's only a use-case.

MassTransit selective consumers without round tripping

I am looking at using MassTransit and need to selectively send messages to consumers at the end of unreliable and slow network links (they are in the same WAN but use a slow and expensive cellular link).
I am expecting a fanout of 1 to 200, where the sites with the lowest volume of messages and the least reliable / most expensive links need to ignore the potentially high amount of message traffic other consumers will see.
I have looked at using the Selective consumer interface, but this seems to imply that the message is always sent to all consumers and then discarded if it doesn't match the predicate. This overhead is not acceptable.
Without using an endpoint factory and manually managing URI endpoints to do a Send(), is there a nice way to do this using subscriptions?
Simple answer: nope.
You do have a few options though. Is it just routing based upon load/processing? You could use competing consumers to do load balancing. All the endpoints read off the same queue (but they must be the same consumers on every process reading from the queue) and just pick up the next one. If you're slow, you just pick off fewer messages. (You can only use competing consumers with RabbitMQ).
For MSMQ there's a distributor that was built for load balancing. You could look at rebuilding that on top of RabbitMQ, if that's your transport. It's not super complicated, but would take some effort to do.
Other than that, I think you're likely down to writing something from scratch. It's not really pub/sub any more. So it falls outside MT's wheelhouse.

Regarding Akka message transfer performance: many small messages or less large messages?

For a data-mining algorithm I am currently developing using Akka, I was wondering if Akka implements performance optimizations of the messages that are sent.
For instance, if I have an Actor that emits a very large number of messages to the same other Actor, is it good to encapsulate a set of messages into another large message? Or does Akka have some sort of buffer itself, so that not one message but many messages are transferred over the network at once?
I am asking this question because the algorithm is supposed to be executed remotely on a cluster where transfer performance is important and I currently have no option to just do benchmarks myself.
For messages passed in Akka on the same machine, I don't think it matters a lot whether you use small messages or an aggregation of messages as a single message. The additional overhead of many calls versus having to loop while processing the aggregation is minimal, I think.
I would prefer using small messages because it keeps the system simpler.
However, when sending messages over the network, each message is serialized and sent through Akka remoting (which uses a TCP-based transport by default rather than HTTP), so there is additional per-message overhead. Therefore you might choose to aggregate some messages into a single message here.
However, this also depends on your use case. Buffering implies waiting until there are enough messages (or a timeout has occurred). If you cannot wait, e.g. because you need fast responses, then you still need to send each message individually.
I don't think there is a standard Akka actor available which does some aggregation of messages. Maybe a special kind of routing could be applied which does the buffering.
Or you might have a look at Akka Streams. That does support buffering of messages.
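The "wait until there are enough messages or a timeout has occurred" idea can be sketched independently of Akka. The class below is plain Python and not an Akka API; Batcher, send, tick and the injected clock are all names invented for this example.

```python
import time


class Batcher:
    """Collect items and flush them as one batch on size or age."""

    def __init__(self, send, max_items=100, max_wait=0.05, clock=time.monotonic):
        self._send = send            # callable that ships one aggregated message
        self._max_items = max_items  # flush when this many items are buffered
        self._max_wait = max_wait    # flush when the oldest item is this old (seconds)
        self._clock = clock          # injectable so the behaviour can be tested
        self._items = []
        self._first_at = None

    def add(self, item):
        if not self._items:
            self._first_at = self._clock()
        self._items.append(item)
        if len(self._items) >= self._max_items:
            self.flush()

    def tick(self):
        """Call periodically; flushes if the oldest buffered item waited too long."""
        if self._items and self._clock() - self._first_at >= self._max_wait:
            self.flush()

    def flush(self):
        if self._items:
            batch, self._items = self._items, []
            self._send(batch)


# Demo with a fake clock so the outcome does not depend on real time.
now = [0.0]
sent = []
batcher = Batcher(sent.append, max_items=3, max_wait=1.0, clock=lambda: now[0])
for i in range(4):
    batcher.add(i)   # the third add triggers a size-based flush
now[0] = 2.0
batcher.tick()       # the leftover item is flushed because it is too old
print(sent)
```

The max_wait value is the latency you are willing to trade for fewer, larger messages, which is exactly the use-case dependence described above.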

Safe to broadcast large objects with RabbitMQ?

I am relatively new to RabbitMQ and have found it extremely handy and swift. I have used it for communicating small objects using Ruby and the bunny gem.
Now I'm trying to pass objects of around 10~20MB each to an exchange and fan them out to its subscribers.
It seems to work fine, BUT is it good practice to use RabbitMQ as a publisher for this? Or should I use something in conjunction with RabbitMQ?
You should not send files via AMQP.
Message queues are not databases. Specifically, RabbitMQ was not built with the idea of storing large objects in the queues, because messages are not supposed to be large.
Think about the real world a bit - the postal service for years (not necessarily so much anymore), was optimized for processing letters. If your letter is too fat (heavy), they charge a pretty hefty fee for additional postage. Big messages cost more to move around and disrupt the system. Additionally, your mailbox won't hold large messages - they get left somewhere else - either in a separate package drop or your front door (where they sometimes go missing).
Message queues are the same way. A message typically contains a small piece of data describing an event or other meaningful thing that happened in your application. Usually the data conveyed by a message can be communicated in 100kB or less.
As I mention in this answer, the AMQP protocol (which underlies RabbitMQ) is a fairly chatty protocol. It requires large messages be divided into multiple segments of no more than 131kB. This can add a significant amount of overhead to a large file transfer, especially when compared to other file transfer mechanisms (e.g. FTP, HTTP).
More importantly for performance, the message has to be fully processed by the broker before it is made available in a queue, and it ties up RAM on the broker while this is being done. Putting files in the broker may work for one client and one broker, but it will break quickly when scaling out is attempted. Finally, compression is often desirable when transferring files - HTTP supports gzip compression automatically, while AMQP does not.
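To put a number on the segmentation mentioned above, here is a back-of-the-envelope calculation. It assumes a negotiated frame size of 131072 bytes (the "131kB" figure) and the 8 bytes of framing that AMQP 0-9-1 puts around each frame; the function name is made up for the example.

```python
FRAME_MAX = 131072   # assumed negotiated frame size in bytes
FRAME_OVERHEAD = 8   # AMQP 0-9-1: 7-byte frame header + 1-byte frame-end marker


def body_frames(message_bytes, frame_max=FRAME_MAX):
    """Number of content-body frames needed for one message body."""
    payload_per_frame = frame_max - FRAME_OVERHEAD
    # Integer ceiling division.
    return (message_bytes + payload_per_frame - 1) // payload_per_frame


for size_mb in (10, 20):
    size = size_mb * 1024 * 1024
    # Each publish also carries one method frame and one content-header frame.
    print(f"{size_mb}MB body -> {body_frames(size)} body frames")
```

A 100kB message fits in a single body frame under these assumptions, which is one way to read the "100kB or less" rule of thumb.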
What should you do?
It is quite common in message-oriented applications to send a message containing a resource locator (e.g. URL) pointing to the larger data file, which is then accessed via appropriate means.
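A minimal sketch of that idea, assuming a shared directory stands in for whatever storage you really use (object store, file server, etc.). Nothing here talks to RabbitMQ; make_message only builds the small body you would publish, and all function and field names are invented for the example.

```python
import hashlib
import json
import os
import tempfile
import uuid


def store_payload(payload, store_dir):
    """Write the large payload to shared storage and return a small reference."""
    path = os.path.join(store_dir, uuid.uuid4().hex)
    with open(path, "wb") as f:
        f.write(payload)
    return {
        "location": path,  # would be a URL in a real deployment
        "size": len(payload),
        "sha256": hashlib.sha256(payload).hexdigest(),
    }


def make_message(payload, store_dir):
    """Build the small message body that goes through the broker."""
    return json.dumps(store_payload(payload, store_dir)).encode("utf-8")


def load_payload(message):
    """Consumer side: follow the reference and verify the checksum."""
    ref = json.loads(message)
    with open(ref["location"], "rb") as f:
        data = f.read()
    if hashlib.sha256(data).hexdigest() != ref["sha256"]:
        raise ValueError("payload does not match checksum in message")
    return data


with tempfile.TemporaryDirectory() as store:
    big = b"x" * (10 * 1024 * 1024)  # a 10MB object
    msg = make_message(big, store)
    print("message bytes:", len(msg))
    print("round trip ok:", load_payload(msg) == big)
```

The broker then only ever sees a message of a few hundred bytes, and the bulk transfer happens over a channel built for it.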
If it works and doesn't cause you any problems then great. I would suggest that there may be a time cost for converting each object to a byte array, and clearly the same applies in reverse on the consumer side. As each object is so large, that may be a consideration, unless speed is not your primary objective. Is it necessary to send such large objects?
One big problem with sending large objects is that they will block an entire connection, so if you have more than one channel publishing on the same connection, the other channels will have to wait for the connection to finish sending the large object.

Low-latency, large-scale message queuing

I'm going through a bit of a re-think of large-scale multiplayer games in the age of Facebook applications and cloud computing.
Suppose I were to build something on top of existing open protocols, and I want to serve 1,000,000 simultaneous players, just to scope the problem.
Suppose each player has an incoming message queue (for chat and whatnot), and on average one more incoming message queue (guilds, zones, instances, auction, ...) so we have 2,000,000 queues. A player will listen to 1-10 queues at a time. Each queue will have on average maybe 1 message per second, but certain queues will have much higher rate and higher number of listeners (say, a "entity location" queue for a level instance). Let's assume no more than 100 milliseconds of system queuing latency, which is OK for mildly action-oriented games (but not games like Quake or Unreal Tournament).
From other systems, I know that serving 10,000 users on a single 1U or blade box is a reasonable expectation (assuming there's nothing else expensive going on, like physics simulation or whatnot).
So, with a crossbar cluster system, where clients connect to connection gateways, which in turn connect to message queue servers, we'd get 10,000 users per gateway with 100 gateway machines, and 20,000 message queues per queue server with 100 queue machines. Again, just for general scoping. The number of connections on each MQ machine would be tiny: about 100, to talk to each of the gateways. The number of connections on the gateways would be a lot higher: 10,100 for the clients + connections to all the queue servers. (On top of this, add some connections for game world simulation servers or whatnot, but I'm trying to keep that separate for now)
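For anyone who wants to play with the scoping numbers, the arithmetic above can be written down as a small function. All inputs are the figures from the question; the function and key names are just for illustration.

```python
def scope(players=1_000_000, queues_per_player=2, users_per_gateway=10_000,
          queue_servers=100, avg_msgs_per_queue_per_sec=1):
    """Reproduce the rough sizing from the question."""
    queues = players * queues_per_player
    gateways = players // users_per_gateway
    return {
        "queues": queues,
        "gateways": gateways,
        "queues_per_queue_server": queues // queue_servers,
        # Each queue server only talks to the gateways.
        "connections_per_queue_server": gateways,
        # Each gateway holds its clients plus one connection per queue server.
        "connections_per_gateway": users_per_gateway + queue_servers,
        "avg_messages_per_second": queues * avg_msgs_per_queue_per_sec,
    }


for name, value in scope().items():
    print(f"{name}: {value:,}")
```

Changing users_per_gateway or queue_servers shows how quickly the gateway connection count, rather than the queue servers, becomes the number to watch.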
If I didn't want to build this from scratch, I'd have to use some messaging and/or queuing infrastructure that exists. The two open protocols I can find are AMQP and XMPP. The intended use of XMPP is a little more like what this game system would need, but the overhead is quite noticeable (XML, plus the verbose presence data, plus various other channels that have to be built on top). The actual data model of AMQP is closer to what I describe above, but all the users seem to be large, enterprise-type corporations, and the workloads seem to be workflow related, not real-time game update related.
Does anyone have any daytime experience with these technologies, or implementations thereof, that you can share?
@MSalters
Re 'message queue':
RabbitMQ's default operation is exactly what you describe: transient pubsub. But with TCP instead of UDP.
If you want guaranteed eventual delivery and other persistence and recovery features, then you CAN have that too - it's an option. That's the whole point of RabbitMQ and AMQP -- you can have lots of behaviours with just one message delivery system.
The model you describe is the DEFAULT behaviour, which is transient, "fire and forget", and routing messages to wherever the recipients are. People use RabbitMQ to do multicast discovery on EC2 for just that reason. You can get UDP type behaviours over unicast TCP pubsub. Neat, huh?
Re UDP:
I am not sure if UDP would be useful here. If you turn off Nagling then RabbitMQ single message roundtrip latency (client-broker-client) has been measured at 250-300 microseconds. See here for a comparison with Windows latency (which was a bit higher) http://old.nabble.com/High%28er%29-latency-with-1.5.1--p21663105.html
I cannot think of many multiplayer games that need roundtrip latency lower than 300 microseconds. You could get below 300us with TCP. TCP windowing is more expensive than raw UDP, but if you use UDP to go faster, and add a custom loss-recovery or seqno/ack/resend manager then that may slow you down again. It all depends on your use case. If you really really really need to use UDP and lazy acks and so on, then you could strip out RabbitMQ's TCP and probably pull that off.
I hope this helps clarify why I recommended RabbitMQ for Jon's use case.
I am building such a system now, actually.
I have done a fair amount of evaluation of several MQs, including RabbitMQ, Qpid, and ZeroMQ. The latency and throughput of any of those are more than adequate for this type of application. What is not good, however, is queue creation time in the midst of half a million queues or more. Qpid in particular degrades quite severely after a few thousand queues. To circumvent that problem, you will typically have to create your own routing mechanisms (smaller number of total queues, and consumers on those queues are getting messages that they don't have an interest in).
My current system will probably use ZeroMQ, but in a fairly limited way, inside the cluster. Connections from clients are handled with a custom sim. daemon that I built using libev and is entirely single-threaded (and is showing very good scaling -- it should be able to handle 50,000 connections on one box without any problems -- our sim. tick rate is quite low though, and there are no physics).
XML (and therefore XMPP) is very much not suited to this, as you'll peg the CPU processing XML long before you become bound on I/O, which isn't what you want. We're using Google Protocol Buffers, at the moment, and those seem well suited to our particular needs. We're also using TCP for the client connections. I have had experience using both UDP and TCP for this in the past, and as pointed out by others, UDP does have some advantage, but it's slightly more difficult to work with.
Hopefully when we're a little closer to launch, I'll be able to share more details.
Jon, this sounds like an ideal use case for AMQP and RabbitMQ.
I am not sure why you say that AMQP users are all large enterprise-type corporations. More than half of our customers are in the 'web' space ranging from huge to tiny companies. Lots of games, betting systems, chat systems, twittery type systems, and cloud computing infras have been built out of RabbitMQ. There are even mobile phone applications. Workflows are just one of many use cases.
We try to keep track of what is going on here:
http://www.rabbitmq.com/how.html (make sure you click through to the lists of use cases on del.icio.us too!)
Please do take a look. We are here to help. Feel free to email us at info@rabbitmq.com or hit me on twitter (@monadic).
My experience was with a non-open alternative, BizTalk. The most painful lesson we learnt is that these complex systems are NOT fast. And as you figured from the hardware requirements, that translates directly into significant costs.
For that reason, don't even go near XML for the core interfaces. Your server cluster will be parsing 2 million messages per second. That could easily be 2-20 GB/sec of XML! However, most messages will be for a few queues, while most queues are in fact low-traffic.
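The 2-20 GB/sec figure follows directly from the message rate if one assumes an XML message is somewhere between 1kB and 10kB; that size range is my reading of the numbers above, not something stated explicitly.

```python
def xml_bandwidth_gb_per_s(messages_per_second, bytes_per_message):
    """Aggregate bandwidth in GB/s (decimal gigabytes)."""
    return messages_per_second * bytes_per_message / 1e9


MESSAGES_PER_SECOND = 2_000_000  # 2,000,000 queues at about 1 message per second

for xml_bytes in (1_000, 10_000):  # assumed XML message sizes
    rate = xml_bandwidth_gb_per_s(MESSAGES_PER_SECOND, xml_bytes)
    print(f"{xml_bytes} bytes per message -> {rate} GB/s")
```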
Therefore, design your architecture so that it's easy to start with COTS queue servers and then move each queue (type) to a custom queue server when a bottleneck is identified.
Also, for similar reasons, don't assume that a message queue architecture is the best for all communication needs your application has. Take your "entity location in an instance" example. This is a classic case where you don't want guaranteed message delivery. The reason that you need to share this information is because it changes all the time. So, if a message is lost, you don't want to spend time recovering it. You'd only send the old location of the affected entity. Instead, you'd want to send the current location of that entity. Technology-wise this means you want UDP rather than TCP, plus a custom loss-recovery mechanism.
FWIW, for cases where intermediate results are not important (like positioning info) Qpid has a "last-value queue" that can deliver only the most recent value to a subscriber.
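The idea behind a last-value queue can be sketched in a few lines. This is not Qpid's API, just an illustration of the semantics: a newer value for the same key replaces any older value that has not been delivered yet. The class and method names are invented.

```python
class LastValueQueue:
    """Keep only the most recent undelivered value per key."""

    def __init__(self):
        self._latest = {}

    def publish(self, key, value):
        # Overwrites any older, still undelivered value for this key.
        self._latest[key] = value

    def drain(self):
        """Hand over everything pending and start empty again."""
        pending, self._latest = self._latest, {}
        return pending


positions = LastValueQueue()
positions.publish("entity-1", (0, 0))
positions.publish("entity-2", (5, 5))
positions.publish("entity-1", (1, 0))  # supersedes the first entity-1 update
print(positions.drain())               # only the newest position per entity
print(positions.drain())               # nothing pending any more
```

A slow subscriber therefore never works through a backlog of stale positions; it simply gets the current one when it next asks.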

Resources