Is it possible to declare a maximum queue size with AMQP? - amqp

As the title says — is it possible to declare a maximum queue size and broker behaviour when this maximum size is reached? Or is this a broker-specific option?
I ask because I'm trying to learn about AMQP, not because I have this specific problem with any specific broker… But broker-specific answers would still be insightful.

AFAIK you can't declare maximum queue size with RabbitMQ.
Also there's no such setting in the AMQP sepc:
http://www.rabbitmq.com/amqp-0-9-1-quickref.html#queue.declare

Depending on why you're asking, you might not actually need a maximum queue size. Since version 2.0 RabbitMQ will seamlessly persist large queues to disk instead of storing all the messages in RAM. So if your concern the broker crashing because it exhausts its resources, this actually isn't much of a problem in most circumstances - assuming you aren't strapped for hard disk space.
In general this persistence actually has very little performance impact, because by definition the only "hot" parts of the queue are the head and tail, which stay in RAM; the majority of the backlog is "cold" so it makes little difference that it's sitting on disk instead.
We've recently discovered that at high throughput it isn't quite that simple - under some circumstances the throughput can deteriorate as the queue grows, which can lead to unbounded queue growth. But when that happens is a function of CPU, and we went for quite some time without hitting it.

You can read about RabbitMQ maximum queue implementation here http://www.rabbitmq.com/maxlength.html
They do not block the incoming messages addition but drop the messages from the head of the queue.
You should definitely read about Flow control here:
http://www.rabbitmq.com/memory.html

With qpid, yes
you can confire maximun queue size and politic in case raise the maximum. Ring, ignore messages,broke connection.
you also have lvq queues (las value) very configurable

There are some things that you can't do with brokers, but you can do in your app. For instance, there are two AMQP methods, basic.get and queue.declare, which return the number of messages in the queue. You can use this to periodically get a count of outstanding messages and take action (like start new consumer processes) if the message count gets too high.

Related

Kafka: is it better to have a lot of small messages or fewer, but bigger ones?

There is a microservice, which receives the batch of the messages from the outside and push them to kafka. Each message is sent separately, so for each batch I have around 1000 messages 100 bytes each. It seems like the messages take much more space internally, because the free space on the disk going down much faster than I expected.
I'm thinking about changing the producer logic, the way it will put all the batch in one message (the consumer then will split them by itself). But I haven't found any information about space or performance issues with many small messages, neither any guildlines about balance between size and count. And I don't know Kafka enough to have my own conclusion.
Thank you.
The producer will, by itself, batch messages that are destined to the same partition, in order to avoid unnecesary calls.
The producer makes this thanks to its background threads. In the image, you can see how it batches 3 messages before sending them to each partition.
If you also set compression in the producer-side, it will also compress (GZip, LZ4, Snappy are the valid codecs) the messages before sending it to the wire. This property can also can be set on the broker-side (so the messages are sent uncompressed by the producer, and compressed by the broker).
It depends on your network capacity to decide wether you prefer a slower producer (as the compression will slow it) or bigger load on the wire. Note that setting a big compression level on big files may affect a lot your overall performance.
Anyway, I believe the big/small msg problem hurts a lot more to the consumer side; Sending messages to Kafka is easy and fast (the default behaviour is async, so the producer won't be too busy). But on the consumer side, you'll have to look the way you are processing the messages:
One Consumer-Worker
Here you couple consuming with processing. This is the simplest way: the consumer sets its own thread, reads a kafka msg and process it. Then continues the loop.
One Consumer - Many workers
Here you decouple consuming and processing. In most cases, reading from kafka will be faster than the time you need to process the message. It is just physics. In this approach, one consumer feeds many separate worker threads that share the processing load.
More info about this here, just above the Constructors area.
Why do I explain this? Well, if your messages are too big, and you choose the first option, your consumer may not call poll() within the timeout interval, so it will rebalance continuosly. If your messages are big (and take some time to be processed), better choose to implement the second option, as the consumer will continue its own way, calling poll() without falling in rebalances.
If the messages are too big and too many, you may have to start thinking about different structures than can buffer the messages into your memory. Pools, deques, queues, for example, are different options to acomplish this.
You may also increase the poll timeout interval. This may hide you about dead consumers, so I don't really recommend it.
So my answer would be: it depends, basicallty on: your network capacity, your required latency, your processing capacity. If you are able to process big messages equally fast as smaller ones, then I wouldn't care much.
Maybe if you need to filter and reprocess older messages I'd recommend partitioning the topics and sending smaller messages, but it's only a use-case.

SQS maximum inflight message count is lower than documentation says

According to the SQS documentation, the maximum number of inflight messages is set to 120,000 for standard queues. However, sometimes I see my queues maxing out at lower numbers, such as here:
Does anyone know why this might be the case? I have code that dynamically changes the number of SQS listeners depending on the number of messages in the queue, but I don't want to do anything if I've hit the maximum. My problem is now that the max limit doesn't seem to be consistent. Some queues go to 120K, but this one is stuck at 100K instead, and as far as I can tell there is no setting that allows me to set this limit.
approximateNumberOfMessagesNotVisible indicates the number of messages in-flight, as you are rightly said. It depends on how many consumers you have, and what is througput of each consumer.
If the actual number is caping at 100k, then your consumers are swamped and have no more receiving capacity.
Anyways, it's better if you provide more info on the use-case as 100k in-flight messages look out of ordinary and you may be not using correct solution for your problem.

JMS Priority Messages Causing Starvation of Lower Priority Message

I have a queue that is loaded with high priority JMS messages throughout the day, I want to get them out the door quickly. The queue is also being loaded periodically with lower priority messages in large batches. The problem that I see on busy days, is that there are always enough high priority messages at the front of the queue that none of the lower priority messages get selected until that volume drops off. Often they will sit on the queue until they middle of the night. The app is distributed over a number of servers, but the CPUs are not even breathing hard, the JMS seems to be the choak point.
My hunch is to implement some sort of aging algorithm that increases priority for messages that have been on the queue for a very long time, but of course, that is what middleware is supposed to do for me. I can't imagine that the JMS provider (IBM WebsphereMQ) or the application server (TIBCO BusinessWorks) doesn't have some sort of facility to cope with this. So before I go write some code, I thought I would ask, is there any way to get either of these technologies to help me out with this problem?
The BusinessWorks activity that is reading the queue is a JMS SOAP Event Source, but I could turn it into a JMS Queue Receiver activity or whatever.
All thoughts on how to solve this are welcome :-) TIA
That's like tying 1 hand behind your back and then complaining that you cannot swim properly. D'oh! First off, who's bright idea was it to mix messages. Just because you can do something does not mean you should.
The app is distributed over a number of servers, but the CPUs are not
even breathing hard, the JMS seems to be the choak point.
Well then, the solution is easy. Put high priority messages into queue "A" (the existing queue) and low priority messages into a new queue "B". Next, startup another instance of your JMS application to read the messages off queue "B".
Also, JMS is probably not the choke-point. It is what the application is doing with the message data after the JMS layer picks up the message that is taking a long time (i.e. backend work).
Finally, how many instances of your JMS application is running against the existing queue? If you are only running 1 instance, why? If you have lots of CPU capacity then why don't you run 10 instances of your JMS application. Do some true parallel processing of messages.
If you really want to keep you messages mixed on the same queue and have the high priority messages processed first, and yet your volume of messages is such that you cannot work through all the volume sometimes until the middle of the night, then you quite simply do not have enough processing applications. MQ is a parallel processing system, it is designed to allow many applications to put or get from a queue at once. Make use of this by running more of your getting applications at the same time. They will work through your high priority messages quicker and then get back to processing the lower priority ones.
From your description it's clear that you want the high priority messages to processed first. In such a case lower priority messages will have to wait.
MQ will not increase the priority of messages if they are sitting in queue for long time. How will it know that it has to change property of a message :)?. You will need to develop an application to do that.
I would think segregating messages based on priority, for example, high priority messages are put to one queue and lower priority messages to another queue could be one option you could look at.
Second option would be to look at the changing the delivery sequence (MSGDLVSQ) to FIFO. This makes to messages to be delivered to consumers in the order they arrived into queue. But note this will ignore the message priority, meaning if there is a lower priority message followed by a higher priority message, then higher priority message will wait till the lower priority message is delivered.

activemq performance gotchas and precautions

I am going to use ActiveMQ for the first time in one of my projects (topics for durable messages). I have read that durable messages enforce a limit to the scale of number of messages per second. What are the other factors that I should be aware of (e.g. slow consumers) that puts a limit to the scale and performance characteristics of activemq and what metrics should be closely monitored and what are the values at which all hell breaks lose.
I don't expect to be pushing more than a thousand events per second in ActiveMQ for now.
here are a few tips...
increase your systemUsage limits from the defaults
increase your JVM heap size from the defaults
if using KahaDB, consider setting enableJournalDiskSyncs to false (helps throughput dramatically) or preferably use the new LevelDB
learn about producer flow control and consider disabling (frequently done)
consider using virtual topics (instead of durable topic consumers)
learn about prefetch-limit and tweak as needed
Two specific issues I ran into with activeMQ:
1) There are memory limits enforced per queue that need to be tuned. ActiveMQ won't fill up your heap unless you change the config. So you need to set -Xmx and change the config to use more memory.
2) Related to #1, by default the sender (client) blocks when limits are reached. In newer versions, there is a setting to avoid this and have an exception thrown instead. See http://activemq.apache.org/producer-flow-control.html.

Are there any tools to optimize the number of consumer and producer threads on a JMS queue?

I'm working on an application that is distributed over two JBoss instances and that produces/consumes JMS messages on several JMS queues.
When we configured the application we had to determine which threading model we would use, in particular the number of producing and consuming threads per queue. We have done this in a rather ad-hoc fashion but after reading the most recent columns by Herb Sutter in Dr Dobbs (in particular this one) I would like to size our threads in a more rigorous manner.
Are there any methods/tools to measure the throughput of JMS queues (in particular JBoss Messaging queues) as a function of the number of producing/consuming threads?
This is not really about a specific tool, but may be helpful.
Consumers:
Not sure what your inner architecture is, but let's assume it's an MDB reading in messages. I assert that your only requirement here for rigorous thread count sizing is to choose a maximum cap. If your MDB uses resources from a finite supplier like a JDBC connection pool, consider the maximum cap as the highest number of concurrent instances from that resource that you can tolerate taking. If the MDB's queue is remote, you probably want to consider remote connections (or technically, JMS sessions) a finite resource. If the MDB has less finite requirements (and the queue is local), your maximum cap becomes the number of threads, memory used and/or flat out CPU consumed by the working threads. The reasoning here is that the JBoss MDB container will simply keep allocating more MDB instances (and therefore threads) until the queue is empty or the maximum cap is reached. The only reason I can think of that you would really agonize over the minimum would be if the container's elapsed time or overhead to create new instances is above your tolerance and those operations are usually pretty small potatoes.
Producers
A general axiom of messaging is that producers nearly always outperform consumers. You would think this is pretty arbitrary, but it is a pattern I see recurring all the time, even in widely different messaging scenarios. Anyways, it's tough to say how the threading should work for the producer without knowing a bit about the application, but are you basically capable of [indefinitely] proportionally increasing the number of producer threads and the number of messages generated, or do you have some sort of cap where additional threads simply do not generate more messages ? I would guess it is the latter since most useful work has some limited data or calculation supplier. As I see it, the two drivers here are ordering and persistence.
First off, if you have strict message ordering where messages must be processed in strict (FPFP) First Produced First Processed then you're in a bit of a bind because you almost have to drop down to single threaded throughput unless you can devise some form of logical message demarcation (eg. a client number where any given client's messages are always sent to the same queue, but you may have multiple queues each serviced by one thread so each client is effectively FPFP).
Ordering aside, persistence is the next consideration in that if you have reliable and extensive message persistence, (or have a very high tolerance for message loss) just let the producer threads go to town. The messages will queue up reliably and eventually the consumers will [hopefully] catch up. However, if your message persistence message count or simple queue depths can potentially give you the willies when they get too high, here's where a tool might come in useful. If your producer thread count can be dynamically modified (which they can in many Java ThreadPool implementations) then you could sample the queue depths and raise or lower the producer thread count in accordance with the queue depth ranges you define, optionally to the point where if the consumers basically stall, so will the producers. I do not know of a specific tool that does this but between two JBoss servers this is fairly simple to whip up. Picking your queue depth-->producer thread count will be trickier.
Having said all that, I am going to actually read the article you linked to.....
I've got the perfect thing for you: IBM provide a free command line tool called perfharness.
It's aimed at benchmarking JMS providers, i.e. measuring the throughput of queues (single or multiple) given different numbers of producing or consuming threads.
Some features:
Send and consume messages at a fixed rate (msg/s) or at maximum rate possible on the queue
Use a specific number of threads
Use either JMS or native MQ
Can use data either generated randomly or taken from a file
Generates statistics telling you exactly how fast your queue is performing
The only down side is that it's not super intuitive, given the number of operations it supports. And IBM haven't open sourced it, which is a shame. However it sounds perfect for your purposes.

Resources