ActiveMQ performance gotchas and precautions

I am going to use ActiveMQ for the first time in one of my projects (topics with durable messages). I have read that durable messages limit the number of messages per second the broker can handle. What other factors (e.g. slow consumers) limit the scale and performance of ActiveMQ, which metrics should be monitored closely, and at what values does all hell break loose?
I don't expect to be pushing more than a thousand events per second in ActiveMQ for now.

Here are a few tips:
Increase your systemUsage limits from the defaults.
Increase your JVM heap size from the defaults.
If using KahaDB, consider setting enableJournalDiskSyncs to false (helps throughput dramatically, at the cost of possibly losing the most recent writes if the machine crashes), or preferably use the new LevelDB store.
Learn about producer flow control and consider disabling it (frequently done).
Consider using virtual topics (instead of durable topic consumers).
Learn about the prefetch limit and tweak it as needed.

Two specific issues I ran into with ActiveMQ:
1) There are memory limits enforced per queue that need to be tuned. ActiveMQ won't fill up your heap unless you change the config, so you need to both set -Xmx and change the config to use more memory.
2) Related to #1, by default the sender (client) blocks when the limits are reached. In newer versions there is a setting (sendFailIfNoSpace on systemUsage) to avoid this and have an exception thrown instead. See http://activemq.apache.org/producer-flow-control.html.

Related

Redis vs Kafka vs RabbitMQ for 1MB messages

I am currently researching a queueing solution to handle medium-sized messages of about 1 MB.
Apart from the feature differences between Redis, Kafka and RabbitMQ, I cannot find any good answer about their performance with messages of around 1 MB.
Does anyone know how many 1 MB messages each of these can handle?
Do you know of any other queueing solutions that perform better?
When you are evaluating Kafka vs Redis for your case, there are other factors you have to take into account besides message size. Here are some I can think of:
How many producers/consumers? Redis performance can suffer with a larger number of producers/consumers because of the nature of Redis (a push-based queue): Redis delivers the message to all the consumers at once, at the moment the message is put on the queue.
Do you need speed or reliability first? If speed is of utmost importance, use Redis, since it does not persist messages and will deliver them faster. If you need reliability, use Kafka, since it persists messages even after they are delivered.
Do you want your consumers to get messages when they are ready, or do you want messages sent to the consumers immediately? In the first case use Kafka, because it is a pull-based mechanism (consumers have to ask for messages). In the second case use Redis, since it is a push-based mechanism (a message is pushed to the consumer as soon as it is on the queue). RabbitMQ is also push-based (although there is a pull API with bad performance).
What is the expected number of messages? If it's not huge, use Redis, since you are limited by memory. Otherwise use Kafka. The best practice for RabbitMQ is to keep queues short, which means consuming messages at close to the rate at which they arrive on the queue. So if you have some long-running operation on the consumer side, RabbitMQ is probably not the best choice.
Scaling? Kafka scales horizontally really well (it's built with scalability in mind). RabbitMQ is usually scaled vertically. Redis also scales well horizontally if needed.
Clearly there is more than one criterion when you evaluate a queueing solution. There are best practices and recommendations for each of the queueing engines you are looking at. Think more about your specific use case; it's definitely worth the time, since choosing an inappropriate queueing engine will cost you more time later on.
I am answering for Kafka.
Kafka itself has very good performance even for big messages.
In our tests with 2 Kafka nodes we reached 170 MB/s for p2p communication with smaller messages and 150 MB/s with bigger messages.
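To relate those figures to the question, here is a back-of-the-envelope conversion from a byte rate to a message rate. It is only a sketch: it assumes the 150 MB/s "bigger messages" figure carries over to 1 MB messages, which the tests above do not state, and the function name is made up for illustration.

```python
def messages_per_second(throughput_mb_per_s: float, message_size_mb: float) -> float:
    """Convert a byte-rate throughput (MB/s) into a message rate (messages/s)."""
    if message_size_mb <= 0:
        raise ValueError("message_size_mb must be positive")
    return throughput_mb_per_s / message_size_mb


# 150 MB/s is the figure quoted above; 1 MB is the message size from the question.
# Assumption: the measured throughput also holds for 1 MB messages.
rate = messages_per_second(150.0, 1.0)
print(f"roughly {rate:.0f} messages/s")
```

So under that assumption a two-node setup like ours would be in the region of a hundred or so 1 MB messages per second; measure on your own hardware before relying on it.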
The only thing you need to remember is to configure the broker to accept bigger messages.
Here is a nice article: Configuring Kafka for Performance and Resource Management - Handling Large Messages
I know another p2p solution that might be interesting if you have concrete requirements: take a look at YAMI4.
I was using Redis but only for very small messages, so I cannot say anything about 1MB.

Tibco - Max flow limit property

I have a process with max flow limit enabled, with the value set to 10. It is an async process that receives thousands of messages daily. We noticed that at peak times, as the number of messages queued on the EMS server increases, the performance of the TIBCO process declines. Is there any dependency between slowness in TIBCO and an increased inflow of EMS messages? How do we calculate the right flow limit for a process? Is there a standard procedure?
The FlowLimit configuration setting is a BusinessWorks setting, so I am assuming that you have BusinessWorks engines that are consuming messages from an EMS queue.
The concept of flow control exists to ensure that the number of incoming events to a BusinessWorks engine does not cause the JVM to exceed its available memory resources. BusinessWorks implements flow control by temporarily disabling the process starter until the number of jobs in memory falls below a threshold. In the case of EMS-based process starters this entails closing the MessageConsumer, which causes EMS to stop delivering messages to the process. In high-volume messaging scenarios this will cause a backlog of messages on the EMS server. Additionally, it will cause any messages in the client-side prefetch cache to be re-prioritized for re-delivery on the EMS server side. When this happens you will notice that your outbound message count is greater than your inbound message count in your EMS statistics.
You are best off avoiding flow-controlled scenarios altogether. Is your current FlowLimit parameter realistic for the heap size you are allotting your JVM and the message payload sizes you are working with? Can you increase your JVM heap size and also your FlowLimit? Are you able to run multiple instances of the BusinessWorks application dispatching off the same queue in order to increase scalability? These approaches may help you scale and avoid message backlogs.
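As a rough way to sanity-check a FlowLimit against heap size and payload size, something like the following could be used. To be clear, this is not a TIBCO formula: the function name, the reserved fraction and the per-job memory figure are all assumptions for illustration, and the per-job footprint would have to be measured on your own processes.

```python
def estimate_flow_limit(heap_mb: float, reserved_fraction: float, per_job_mb: float) -> int:
    """Rough upper bound on concurrent in-memory jobs for a given JVM heap.

    heap_mb           -- JVM max heap (-Xmx) in MB
    reserved_fraction -- share of the heap kept back for the engine itself (assumed)
    per_job_mb        -- measured memory footprint of one job, payload included (assumed)
    """
    if per_job_mb <= 0:
        raise ValueError("per_job_mb must be positive")
    if not 0 <= reserved_fraction < 1:
        raise ValueError("reserved_fraction must be in [0, 1)")
    usable_mb = heap_mb * (1.0 - reserved_fraction)
    # Never suggest less than one job, otherwise the starter would never run.
    return max(1, int(usable_mb // per_job_mb))


# Hypothetical numbers: 1 GB heap, half of it reserved, 8 MB per job.
print(estimate_flow_limit(1024, 0.5, 8))
```

If the result is well above your current FlowLimit, the limit may be needlessly low; if it is below, the heap is probably the thing to raise first.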

ActiveMQ load balancing to achieve high throughput

Currently my ActiveMQ configuration (non-persistent messaging) allows me to achieve 2000 msgs/sec. There are four queues and four consumers consuming the messages. There's only one ActiveMQ broker in this configuration. I would like to achieve a higher throughput of about 5000 msgs/sec (by adding brokers). I'm pretty clueless about how to achieve this without splitting individual queues onto individual ActiveMQ instances. Which topologies support higher throughput than a single instance without splitting the queues among instances?
Adding a network of brokers might help. That is if you have a decent number of consumers and a decent number of producers connecting to different brokers.
If you have a single producer or a single consumer, all traffic will still go over one of the brokers, making it the bottleneck in any case. So, your actual setup of the servers using the AMQ broker is important.
You will also need to check what the bottleneck of your physical machines is. Is it I/O? CPU? Memory usage/heap size? Even link speed? Use OS tools together with VisualVM to track this down. Then you at least know what kind of server you need next.
In any case, some semi-manual load balancing is always possible over several nodes, whether you are using a network of brokers or not. Just make sure messages are routed through certain brokers depending on their content or whatnot. If you cannot distinguish between different message types in any logical way, you can do things like finding some integer in the message (be it the client IP, yesterday's temperature in Celsius or whatever) and computing that number modulo <num brokers>. Then route the message to the destination you selected. Round robin is also an option. There is almost always a way to distribute the load in a logical way among several brokers.
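The modulo trick can be sketched like this. The broker URLs and function names are invented for the example, and the crc32 variant for string keys is my own addition for when the message has no handy integer in it.

```python
import zlib
from typing import Sequence


def pick_broker_by_number(number: int, brokers: Sequence[str]) -> str:
    """Route on an integer taken from the message: number modulo <num brokers>."""
    if not brokers:
        raise ValueError("at least one broker is required")
    # Python's % returns a non-negative result for a positive divisor,
    # so negative inputs (e.g. a temperature below zero) are fine too.
    return brokers[number % len(brokers)]


def pick_broker_by_key(key: str, brokers: Sequence[str]) -> str:
    """Same idea for string keys; crc32 is stable across processes, unlike hash()."""
    return pick_broker_by_number(zlib.crc32(key.encode("utf-8")), brokers)


# Hypothetical broker URLs, not taken from the question.
BROKERS = ["tcp://broker-a:61616", "tcp://broker-b:61616", "tcp://broker-c:61616"]

print(pick_broker_by_number(7, BROKERS))
print(pick_broker_by_key("client-42", BROKERS))
```

Note that the same key always lands on the same broker only as long as the number of brokers stays the same; adding a broker reshuffles the mapping.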

Is it possible to declare a maximum queue size with AMQP?

As the title says — is it possible to declare a maximum queue size and broker behaviour when this maximum size is reached? Or is this a broker-specific option?
I ask because I'm trying to learn about AMQP, not because I have this specific problem with any specific broker… But broker-specific answers would still be insightful.
AFAIK you can't declare a maximum queue size with RabbitMQ.
Also, there's no such setting in the AMQP spec:
http://www.rabbitmq.com/amqp-0-9-1-quickref.html#queue.declare
Depending on why you're asking, you might not actually need a maximum queue size. Since version 2.0, RabbitMQ will seamlessly persist large queues to disk instead of storing all the messages in RAM. So if your concern is the broker crashing because it exhausts its resources, this actually isn't much of a problem in most circumstances, assuming you aren't strapped for hard disk space.
In general this persistence actually has very little performance impact, because by definition the only "hot" parts of the queue are the head and tail, which stay in RAM; the majority of the backlog is "cold" so it makes little difference that it's sitting on disk instead.
We've recently discovered that at high throughput it isn't quite that simple - under some circumstances the throughput can deteriorate as the queue grows, which can lead to unbounded queue growth. But when that happens is a function of CPU, and we went for quite some time without hitting it.
You can read about RabbitMQ's maximum queue length implementation here: http://www.rabbitmq.com/maxlength.html
It does not block incoming messages; instead it drops messages from the head of the queue.
You should definitely read about Flow control here:
http://www.rabbitmq.com/memory.html
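To make the drop-from-head behaviour concrete, here is a tiny in-memory simulation using a bounded deque. It only imitates the semantics described above; it does not talk to RabbitMQ, and the function name is made up.

```python
from collections import deque
from typing import Iterable, List


def publish_all(messages: Iterable[str], max_length: int) -> List[str]:
    """Simulate a length-limited queue that drops from the head when full."""
    # A deque with maxlen discards items from the opposite end on append,
    # so appending at the tail evicts the oldest message at the head.
    queue = deque(maxlen=max_length)
    for message in messages:
        queue.append(message)  # publishing never blocks
    return list(queue)


# Five messages into a queue limited to three: the two oldest are gone.
print(publish_all(["m1", "m2", "m3", "m4", "m5"], 3))  # ['m3', 'm4', 'm5']
```

The point to take away is that the publisher never notices anything; the oldest messages silently disappear.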
With Qpid, yes.
You can configure a maximum queue size and a policy for what happens when the maximum is reached: ring, ignore (drop) messages, or break the connection.
You also have LVQ (last value) queues, which are very configurable.
There are some things that you can't do with brokers, but you can do in your app. For instance, there are two AMQP methods, basic.get and queue.declare, which return the number of messages in the queue. You can use this to periodically get a count of outstanding messages and take action (like start new consumer processes) if the message count gets too high.
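The decision part of such a watcher could look like the sketch below. The queue depth is passed in as a plain integer; in a real app it would come from the message count returned by queue.declare or basic.get. The function name and the thresholds are assumptions for the example.

```python
import math


def consumers_needed(queue_depth: int, messages_per_consumer: int, max_consumers: int) -> int:
    """How many consumer processes to run for the current backlog.

    messages_per_consumer -- backlog one consumer is allowed to own (assumed tuning knob)
    max_consumers         -- hard cap so a runaway queue cannot fork unbounded processes
    """
    if messages_per_consumer <= 0 or max_consumers <= 0:
        raise ValueError("messages_per_consumer and max_consumers must be positive")
    wanted = math.ceil(queue_depth / messages_per_consumer)
    # Always keep at least one consumer alive, never exceed the cap.
    return max(1, min(wanted, max_consumers))


# Hypothetical sample: 2500 outstanding messages, 1000 per consumer, cap of 8.
print(consumers_needed(2500, 1000, 8))  # 3
```

Call it from whatever periodic job polls the count, and start or stop consumer processes to match the returned number.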

Are there any tools to optimize the number of consumer and producer threads on a JMS queue?

I'm working on an application that is distributed over two JBoss instances and that produces/consumes JMS messages on several JMS queues.
When we configured the application we had to determine which threading model we would use, in particular the number of producing and consuming threads per queue. We have done this in a rather ad-hoc fashion but after reading the most recent columns by Herb Sutter in Dr Dobbs (in particular this one) I would like to size our threads in a more rigorous manner.
Are there any methods/tools to measure the throughput of JMS queues (in particular JBoss Messaging queues) as a function of the number of producing/consuming threads?
This is not really about a specific tool, but may be helpful.
Consumers:
Not sure what your inner architecture is, but let's assume it's an MDB reading in messages. I assert that your only requirement here for rigorous thread count sizing is to choose a maximum cap. If your MDB uses resources from a finite supplier like a JDBC connection pool, consider the maximum cap as the highest number of concurrent instances from that resource that you can tolerate taking. If the MDB's queue is remote, you probably want to consider remote connections (or technically, JMS sessions) a finite resource. If the MDB has less finite requirements (and the queue is local), your maximum cap becomes the number of threads, memory used and/or flat out CPU consumed by the working threads. The reasoning here is that the JBoss MDB container will simply keep allocating more MDB instances (and therefore threads) until the queue is empty or the maximum cap is reached. The only reason I can think of that you would really agonize over the minimum would be if the container's elapsed time or overhead to create new instances is above your tolerance and those operations are usually pretty small potatoes.
Producers:
A general axiom of messaging is that producers nearly always outperform consumers. You would think this is pretty arbitrary, but it is a pattern I see recurring all the time, even in widely different messaging scenarios. Anyways, it's tough to say how the threading should work for the producer without knowing a bit about the application, but are you basically capable of [indefinitely] proportionally increasing the number of producer threads and the number of messages generated, or do you have some sort of cap where additional threads simply do not generate more messages ? I would guess it is the latter since most useful work has some limited data or calculation supplier. As I see it, the two drivers here are ordering and persistence.
First off, if you have strict message ordering, where messages must be processed in First Produced First Processed (FPFP) order, then you're in a bit of a bind, because you almost have to drop down to single-threaded throughput unless you can devise some form of logical message demarcation (e.g. a client number where any given client's messages are always sent to the same queue, but you may have multiple queues each serviced by one thread, so each client is effectively FPFP).
Ordering aside, persistence is the next consideration in that if you have reliable and extensive message persistence, (or have a very high tolerance for message loss) just let the producer threads go to town. The messages will queue up reliably and eventually the consumers will [hopefully] catch up. However, if your message persistence message count or simple queue depths can potentially give you the willies when they get too high, here's where a tool might come in useful. If your producer thread count can be dynamically modified (which they can in many Java ThreadPool implementations) then you could sample the queue depths and raise or lower the producer thread count in accordance with the queue depth ranges you define, optionally to the point where if the consumers basically stall, so will the producers. I do not know of a specific tool that does this but between two JBoss servers this is fairly simple to whip up. Picking your queue depth-->producer thread count will be trickier.
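The queue depth --> producer thread count mapping could be as simple as a table of depth bands. This is only a sketch: the band values are invented, and how you actually sample the depth and resize the pool depends on your JMS provider and thread pool.

```python
from typing import Sequence, Tuple


def producer_threads_for_depth(
    depth: int,
    bands: Sequence[Tuple[int, int]],
    max_threads: int,
) -> int:
    """Map a sampled queue depth to a producer thread count.

    bands -- (upper_depth_bound, thread_count) pairs sorted by ascending bound.
    A depth beyond the last band returns 0, i.e. the producers stall
    until the consumers catch up.
    """
    for upper_bound, threads in bands:
        if depth <= upper_bound:
            return min(threads, max_threads)
    return 0


# Hypothetical bands: shallow queue -> full speed, deep queue -> throttle.
BANDS = [(1_000, 8), (10_000, 4), (50_000, 1)]

for sampled_depth in (500, 5_000, 20_000, 100_000):
    print(sampled_depth, producer_threads_for_depth(sampled_depth, BANDS, max_threads=8))
```

A scheduled task would sample the depth every few seconds and resize the producer pool to the returned value. Picking the band boundaries is the tricky part and really needs measurements from your own system.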
Having said all that, I am going to actually read the article you linked to.....
I've got the perfect thing for you: IBM provide a free command line tool called perfharness.
It's aimed at benchmarking JMS providers, i.e. measuring the throughput of queues (single or multiple) given different numbers of producing or consuming threads.
Some features:
Send and consume messages at a fixed rate (msg/s) or at maximum rate possible on the queue
Use a specific number of threads
Use either JMS or native MQ
Can use data either generated randomly or taken from a file
Generates statistics telling you exactly how fast your queue is performing
The only down side is that it's not super intuitive, given the number of operations it supports. And IBM haven't open sourced it, which is a shame. However it sounds perfect for your purposes.
