JMS queue consumer: synchronous receive() or single-threaded onMessage() - performance

I need to consume from a queue and stamp a sequence key on each message to indicate the ordering, i.e. the consumption needs to be sequential. From a performance/throughput point of view, would I be better off using a blocking receive() call, or an async listener with a single-threaded configuration for onMessage()?

There are many aspects that will affect the performance and throughput; in pure JMS terms it's not really possible to state that the sync or async model of getting messages will be any less or more efficient. It will depend on a large number of factors from how the application is written, other resources it's using, implementation of your chosen messaging provider and other factors such as machine performance and configuration of both client and server machines.
This discussion, Single vs Multi-threaded JMS Producer, covered some of these topics.
As for the sequence: if you are single-threaded with a single session, the JMS specification gives some assurances on message ordering; it is best to review the spec to see whether it matches your overall requirements.
Often people will insert an application sequence number at message production time; the consumer can therefore check they are getting the correct message in order. Adding a sequence number at consumption time won't specifically help that consumer.
Keep in mind that the stricter the requirement for message ordering, the more restrictive the overall architecture gets and the harder it is to implement horizontal scalability.
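To illustrate the point about producer-side sequence numbers, here is a minimal sketch of a consumer-side check in plain Java. The class name SequenceChecker and its result values are hypothetical and not part of the JMS API; in a real consumer the number would presumably be read from a message property that the producer set (for example with getLongProperty on a property name of your choosing).

```java
// Hypothetical helper, not part of JMS: tracks the next sequence number
// the consumer expects and classifies each arriving number.
final class SequenceChecker {

    enum Result { IN_ORDER, DUPLICATE, GAP }

    private long expected;

    SequenceChecker(long firstExpected) {
        this.expected = firstExpected;
    }

    Result check(long seq) {
        if (seq == expected) {
            expected++;
            return Result.IN_ORDER;
        }
        if (seq < expected) {
            // Already seen (or older than anything still expected).
            return Result.DUPLICATE;
        }
        // One or more messages were skipped; resynchronize on the new number.
        expected = seq + 1;
        return Result.GAP;
    }
}
```

What to do on DUPLICATE or GAP (drop, park, alert) is an application decision that this sketch leaves open.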


Spring Kafka Accumulator Use Case

I am developing a Spring Boot application which consumes events from a Kafka (broker version is 2.6) input topic and produces an event into an output topic.
In order to respect some business constraints, the component should wait until it has at least X messages (the batch size) or until a timeout expires. In short, it should act like an accumulator.
A further mandatory requirement is to respect exactly-once semantics.
The first solution I tried was to keep events in memory until the constraints are satisfied and then publish the output messages. To get at-least-once semantics I used the manual_immediate ack mode: I store the latest ack for each partition in memory and acknowledge after processing has ended (this may cause duplicates in race conditions, but that is acceptable).
In order to increase reliability I enforced upstream transactionality and set read_committed mode on listener.
I was wondering whether this is a correct approach or whether there is a more suitable solution, such as a batch-mode listener.
At first glance that looks ideal, but it does not seem to allow accumulating by number of records, only by data size in bytes.
Thanks in advance,
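For reference, the accumulator behaviour described above (flush at X records or after a timeout, whichever comes first) can be sketched in plain Java as follows. This is only an illustration under stated assumptions: the class Accumulator and its methods are hypothetical, the clock is passed in as a parameter to keep the logic testable, the class is not thread-safe, and publishing the batch and committing offsets afterwards are left out entirely.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical in-memory accumulator; not a Spring Kafka class.
final class Accumulator<T> {
    private final int batchSize;
    private final long maxWaitMillis;
    private final List<T> buffer = new ArrayList<>();
    private long firstArrivalMillis;

    Accumulator(int batchSize, long maxWaitMillis) {
        this.batchSize = batchSize;
        this.maxWaitMillis = maxWaitMillis;
    }

    // Adds a record; returns the full batch once batchSize is reached,
    // otherwise an empty list.
    List<T> add(T record, long nowMillis) {
        if (buffer.isEmpty()) {
            firstArrivalMillis = nowMillis;
        }
        buffer.add(record);
        return buffer.size() >= batchSize ? drain() : Collections.<T>emptyList();
    }

    // Meant to be called periodically; returns the pending records if the
    // oldest one has waited at least maxWaitMillis, otherwise an empty list.
    List<T> pollTimeout(long nowMillis) {
        if (!buffer.isEmpty() && nowMillis - firstArrivalMillis >= maxWaitMillis) {
            return drain();
        }
        return Collections.<T>emptyList();
    }

    private List<T> drain() {
        List<T> batch = new ArrayList<>(buffer);
        buffer.clear();
        return batch;
    }
}
```

Because the buffer lives only in memory, the offsets of buffered records must not be committed before the corresponding output has been published, which matches the acknowledge-after-processing approach described in the question.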

How do I ensure that only one consumer actually consumes a published message?

I use RabbitMQ with a microservice architecture. I use topic and direct exchanges for many of my use cases, and it works fine. However, I have a use case where I have to delete a record from the database. When I delete the record, several other services need to be called to maintain/delete the referenced records. I could achieve that by simply calling those services with a direct exchange, but I read that choreography is preferred over orchestration. That means I should implement the publish/subscribe pattern (fanout in RabbitMQ).
My question is: if I use the publish/subscribe pattern in a distributed system, how do I make sure that only one instance per service consumes each published message?
Your question doesn't deal so much with publish-subscribe, as it does with basic message processing. The fundamental issue is whether or not you can guarantee that an operation will be performed exactly one time. The short answer is that you probably want to use a direct exchange such that a message goes into one queue and is processed by one (of possibly many) consumers.
The long answer is that "exactly once" cannot be guaranteed, so you need to make this part of your design.
It is best practice to have message processing be an idempotent operation. In fact, idempotency is a critical design assumption of almost any external interface (and I would argue it is equally-important in internal interfaces).
Additionally, you should be aware of the fact it is not possible to guarantee "exactly once" delivery. Mathematically, no such guarantee can be made. Instead, you can have one of two things (being mutually exclusive):
At most once delivery (0 <= n <= 1)
At least once delivery (1 <= n)
From the RabbitMQ documentation:
Use of acknowledgements guarantees at-least-once delivery. Without acknowledgements, message loss is possible during publish and consume operations and only at-most-once delivery is guaranteed.
Several things are happening when messages are published and consumed. Because of the asynchronous nature of message handling systems, and the AMQP protocol in particular, there is no way to guarantee exactly-once processing while still yielding the performance you would need from a messaging system (essentially, trying to ensure exactly-once would force everything through a serial process at the point of de-duplication).
Design Implications
Given the above, it is important that your design rely upon "at least once" delivery. For a delete operation, this involves re-writing the definition of that operation to be assertive rather than procedural (e.g. "Delete this" becomes "Ensure this does not exist."). The difference is that you describe the end-state rather than the process.
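A minimal sketch of what "Ensure this does not exist" can look like in code, using an in-memory map as a stand-in for the database. The class RecordStore and its method names are hypothetical and chosen only for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a database table.
final class RecordStore {
    private final Map<String, String> records = new ConcurrentHashMap<>();

    void put(String id, String value) {
        records.put(id, value);
    }

    // Assertive delete: describes the end state. It succeeds whether or not
    // the record is present, so a redelivered message is harmless.
    void ensureAbsent(String id) {
        records.remove(id);
    }

    boolean exists(String id) {
        return records.containsKey(id);
    }
}
```

Handling the same delete message twice leaves the store in the same state as handling it once, which is the property that at-least-once delivery requires of the handler.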
I think you should have a separate queue for each service whose instances should be notified about the DB record deletion. The exchange puts a copy of the message in every queue. Instances of a service compete for access to that service's dedicated queue (only one gets the message).
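The topology in this answer can be modelled in a few lines of plain Java. This is an in-memory illustration only, not RabbitMQ client code; the class FanoutExchange, its methods, and the service names in the usage note are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Queue;

// In-memory model: one queue per service, every queue gets a copy.
final class FanoutExchange {
    private final Map<String, Queue<String>> queuesByService = new LinkedHashMap<>();

    void bindServiceQueue(String service) {
        queuesByService.putIfAbsent(service, new ArrayDeque<>());
    }

    // Fanout: copy the message into every bound queue.
    void publish(String message) {
        for (Queue<String> queue : queuesByService.values()) {
            queue.add(message);
        }
    }

    // Any instance of a service polls that service's shared queue;
    // each copy is handed out once, so only one instance receives it.
    // Returns null when the queue is empty or the service is unknown.
    String poll(String service) {
        Queue<String> queue = queuesByService.get(service);
        return queue == null ? null : queue.poll();
    }
}
```

With two bound services, say "billing" and "search", one published message can be polled exactly once per service, regardless of how many instances of each service are polling.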

Suggestion regarding max concurrent consumer

I have a Spring Integration application and I am using a message-driven adapter to consume messages from external systems. To handle the messages concurrently I have set concurrent consumers (5) and maximum concurrent consumers (20), which is working fine.
For production, however, I want to fine-tune it further. Is there any standard guidance on how far the maximum concurrent consumers value can be increased? I understand that this depends on the application and on how much traffic it receives, but I hope there is some standard process for working out this number. If we blindly increase it to an arbitrary value like 1000, it might lead to resource starvation, conflicts, etc., so I am trying to understand how to go about fine-tuning this property.
There is no standard process, as there is no standard performance requirement. It all depends on your SLA: a performant system is one that meets its SLA (there is no such thing as beating an SLA).
The main caveat when it comes to concurrent consumers is the order of messages. Basically, once you introduce more than one consumer you cannot and should not assume any guarantees of message ordering.

Regarding Akka message transfer performance: many small messages or fewer large messages?

For a data-mining algorithm I am currently developing using Akka, I was wondering if Akka implements performance optimizations of the messages that are sent.
For instance, if I have an Actor that emits a very large number of messages to the same other Actor, is it good to encapsulate a set of messages into another large message? Or does Akka have some sort of buffer itself so that not one message but many messages are transferred over the network at once?
I am asking this question because the algorithm is supposed to be executed remotely on a cluster where transfer performance is important and I currently have no option to just do benchmarks myself.
For messages passed in Akka on the same machine, I don't think it matters much whether you use small messages or an aggregation of messages as a single message. The additional overhead of many calls versus having to loop while processing the aggregation is minimal, I think.
I would prefer using small messages because it keeps the system simpler.
However, when sending messages over the network, every message has to be serialized and sent through Akka's remoting transport (which is TCP-based by default in classic remoting, not HTTP), so there is additional per-message overhead. Therefore you might choose to aggregate some messages into a single message here.
However, this also depends on your use case. Buffering implies waiting until there are enough messages (or a timeout has occurred). If you cannot wait, e.g. because you need fast responses, then you still need to send each message individually.
I don't think there is a standard Akka actor available which does some aggregation of messages. Maybe a special kind of routing could be applied which does the buffering.
Or you might have a look at Akka Streams. That does support buffering of messages.

Are there any tools to optimize the number of consumer and producer threads on a JMS queue?

I'm working on an application that is distributed over two JBoss instances and that produces/consumes JMS messages on several JMS queues.
When we configured the application we had to determine which threading model we would use, in particular the number of producing and consuming threads per queue. We have done this in a rather ad-hoc fashion but after reading the most recent columns by Herb Sutter in Dr Dobbs (in particular this one) I would like to size our threads in a more rigorous manner.
Are there any methods/tools to measure the throughput of JMS queues (in particular JBoss Messaging queues) as a function of the number of producing/consuming threads?
This is not really about a specific tool, but may be helpful.
Not sure what your inner architecture is, but let's assume it's an MDB reading in messages. I assert that your only requirement here for rigorous thread count sizing is to choose a maximum cap. If your MDB uses resources from a finite supplier like a JDBC connection pool, consider the maximum cap as the highest number of concurrent instances from that resource that you can tolerate taking. If the MDB's queue is remote, you probably want to consider remote connections (or technically, JMS sessions) a finite resource. If the MDB has less finite requirements (and the queue is local), your maximum cap becomes the number of threads, memory used and/or flat out CPU consumed by the working threads. The reasoning here is that the JBoss MDB container will simply keep allocating more MDB instances (and therefore threads) until the queue is empty or the maximum cap is reached. The only reason I can think of that you would really agonize over the minimum would be if the container's elapsed time or overhead to create new instances is above your tolerance and those operations are usually pretty small potatoes.
A general axiom of messaging is that producers nearly always outperform consumers. You would think this is pretty arbitrary, but it is a pattern I see recurring all the time, even in widely different messaging scenarios. Anyways, it's tough to say how the threading should work for the producer without knowing a bit about the application, but are you basically capable of [indefinitely] proportionally increasing the number of producer threads and the number of messages generated, or do you have some sort of cap where additional threads simply do not generate more messages ? I would guess it is the latter since most useful work has some limited data or calculation supplier. As I see it, the two drivers here are ordering and persistence.
First off, if you have strict message ordering where messages must be processed in strict First Produced First Processed (FPFP) order, then you're in a bit of a bind, because you almost have to drop down to single-threaded throughput unless you can devise some form of logical message demarcation (e.g. a client number where any given client's messages are always sent to the same queue, but you may have multiple queues each serviced by one thread, so each client is effectively FPFP).
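The client-number demarcation can be sketched as a stable mapping from client id to one of N queues, each drained by a single thread. The class ClientRouter below is hypothetical, and hashing the id is just one assumed way of picking the queue.

```java
// Hypothetical router: maps each client to a fixed queue index so that
// all of a client's messages land on the same single-threaded queue.
final class ClientRouter {
    private final int queueCount;

    ClientRouter(int queueCount) {
        if (queueCount <= 0) {
            throw new IllegalArgumentException("queueCount must be positive");
        }
        this.queueCount = queueCount;
    }

    // floorMod keeps the index non-negative even for negative hash codes.
    int queueFor(String clientId) {
        return Math.floorMod(clientId.hashCode(), queueCount);
    }
}
```

Ordering is then preserved per client rather than globally, and changing the number of queues changes the mapping, so it would have to be done while no messages are in flight.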
Ordering aside, persistence is the next consideration: if you have reliable and extensive message persistence (or a very high tolerance for message loss), just let the producer threads go to town. The messages will queue up reliably and eventually the consumers will [hopefully] catch up. However, if your persisted message count or simple queue depths can potentially give you the willies when they get too high, here's where a tool might come in useful. If your producer thread count can be dynamically modified (which it can in many Java thread pool implementations), then you could sample the queue depths and raise or lower the producer thread count in accordance with the queue depth ranges you define, optionally to the point where, if the consumers basically stall, so will the producers. I do not know of a specific tool that does this, but between two JBoss servers it is fairly simple to whip up. Picking your mapping from queue depth to producer thread count will be trickier.
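One possible shape for that mapping is sketched below: full speed under a low watermark, a full stop at a high watermark, and a linear ramp in between. The class ProducerThrottle, the watermark parameters, and the linear ramp are all assumptions made for illustration; how the queue depth is sampled and how the result is applied to the pool (for instance through ThreadPoolExecutor.setCorePoolSize, with pausing at zero handled separately) is left out.

```java
// Hypothetical mapping from sampled queue depth to producer thread count.
final class ProducerThrottle {
    private final int maxThreads;
    private final long lowWatermark;
    private final long highWatermark;

    ProducerThrottle(int maxThreads, long lowWatermark, long highWatermark) {
        if (maxThreads <= 0 || lowWatermark < 0 || lowWatermark >= highWatermark) {
            throw new IllegalArgumentException("invalid throttle configuration");
        }
        this.maxThreads = maxThreads;
        this.lowWatermark = lowWatermark;
        this.highWatermark = highWatermark;
    }

    int targetThreads(long queueDepth) {
        if (queueDepth <= lowWatermark) {
            return maxThreads;            // consumers are keeping up
        }
        if (queueDepth >= highWatermark) {
            return 0;                     // consumers stalled: stop producing
        }
        // Fraction of headroom left between the two watermarks.
        double headroom = (double) (highWatermark - queueDepth)
                / (highWatermark - lowWatermark);
        // Keep at least one producer running while below the high watermark.
        return Math.max(1, (int) Math.round(headroom * maxThreads));
    }
}
```

With 8 threads and watermarks of 100 and 1100, a sampled depth of 600 yields 4 producer threads.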
Having said all that, I am going to actually read the article you linked to.....
I've got the perfect thing for you: IBM provide a free command line tool called perfharness.
It's aimed at benchmarking JMS providers, i.e. measuring the throughput of queues (single or multiple) given different numbers of producing or consuming threads.
Some features:
Send and consume messages at a fixed rate (msg/s) or at maximum rate possible on the queue
Use a specific number of threads
Use either JMS or native MQ
Can use data either generated randomly or taken from a file
Generates statistics telling you exactly how fast your queue is performing
The only downside is that it's not super intuitive, given the number of operations it supports. And IBM hasn't open-sourced it, which is a shame. However, it sounds perfect for your purposes.
