Azure Service Bus Queue (Java): receive or complete messages in batch for better performance

I am trying to receive messages from a queue. I experimented with different methods and ran into performance issues. Below are the metrics for each type of run:
Receive mode = peek and lock; 1000 messages took 2.5 minutes, as I had to complete each message one by one.
Receive mode = receive and delete; 1000 messages took 1.5 minutes on average.
Receive mode = receive and delete (with prefetch count 100); 1000 messages took 3 seconds, but I ended up losing the 100 messages that were still in the prefetch buffer when execution ended.
Receive mode = peek and lock (with prefetch count 100); 1000 messages took 2 minutes, as I again had to complete each message one by one. It would only have solved the problem if there were a way to complete them in batch.
Below is my code for reference:
ServiceBusSessionReceiverClient sessionReceiverClient = new ServiceBusClientBuilder()
        .connectionString(System.getenv("QueueConnectionString"))
        .sessionReceiver()
        .maxAutoLockRenewDuration(Duration.ofMinutes(2))
        .receiveMode(ServiceBusReceiveMode.PEEK_LOCK)
        .queueName(queueName)
        .buildClient();
ServiceBusReceiverClient receiverClient = sessionReceiverClient.acceptSession(System.getenv("QueueSessionName"));
ObjectMapper objectMapper = new ObjectMapper();
do {
    receiverClient.receiveMessages((int) prefetchCount).stream().forEach(message -> {
        try {
            String body = message.getBody().toString();
            final T dataDto = objectMapper.readValue(body, returnType);
            dataDtoList.add(dataDto);
            receiverClient.complete(message);
        } catch (Exception e) {
            AzFaUtil.getLogger().severe("Message processing failed. Error: " + e.getMessage() + e
                    + "\n Payload: " + message);
        }
    });
} while (dataDtoList.size() < numberOfMessages);
receiverClient.close();
sessionReceiverClient.close();
Possible solutions that I can think of:
If there is a way to complete messages in batch instead of completing them one by one.
If there is a way to requeue the messages sitting in the prefetch buffer back to the queue.
Note: This API needs to be synchronous. I only experimented with 1000 entries, but I am working with 30000 entries, so performance matters. Also, the queue is both session-enabled and partition-enabled.

As per this issue, Microsoft has yet to test the performance of their Service Bus. As FIFO (First In, First Out) was a requirement for my message queue, I used JMS, and it performed almost 10x faster on average, but there is a drawback. Currently, JMS doesn't support session-based queues, so I had to disable sessions, and to ensure FIFO I then also had to disable partitioning on the queue. This is a partial and temporary solution for better performance until either Microsoft improves the performance of ServiceBusReceiverClient or enables sessions on JMS.
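For reference, a minimal sketch of the JMS-style consumer, assuming the azure-servicebus-jms library's ServiceBusJmsConnectionFactory (any JMS 2.0 ConnectionFactory pointed at Service Bus over AMQP should look much the same; the queue name, timeout and message count below are placeholders, not the exact code I run). A nice side effect of CLIENT_ACKNOWLEDGE mode is that calling acknowledge() on a message acknowledges everything the session has consumed so far, which is effectively the batch completion missing from the ServiceBusReceiverClient approach:

// Minimal sketch, assuming the azure-servicebus-jms library's ServiceBusJmsConnectionFactory.
import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.JMSException;
import javax.jms.Message;
import javax.jms.MessageConsumer;
import javax.jms.Queue;
import javax.jms.Session;
import javax.jms.TextMessage;
import com.microsoft.azure.servicebus.jms.ServiceBusJmsConnectionFactory;
import com.microsoft.azure.servicebus.jms.ServiceBusJmsConnectionFactorySettings;

public class JmsQueueReader {
    public static void main(String[] args) throws JMSException {
        ConnectionFactory factory = new ServiceBusJmsConnectionFactory(
                System.getenv("QueueConnectionString"),
                new ServiceBusJmsConnectionFactorySettings());
        try (Connection connection = factory.createConnection()) {
            connection.start();
            // CLIENT_ACKNOWLEDGE: acknowledge() on any message acknowledges
            // everything the session has consumed so far (a batch "complete").
            Session session = connection.createSession(false, Session.CLIENT_ACKNOWLEDGE);
            Queue queue = session.createQueue("my-queue");           // placeholder queue name
            MessageConsumer consumer = session.createConsumer(queue);

            Message last = null;
            for (int i = 0; i < 1000; i++) {
                Message message = consumer.receive(5000);            // 5 s receive timeout
                if (message == null) {
                    break;                                           // queue drained
                }
                String body = ((TextMessage) message).getText();     // assuming text payloads
                // ... deserialize body and collect it ...
                last = message;
            }
            if (last != null) {
                last.acknowledge();                                  // batch-acknowledge the session
            }
        }
    }
}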

Related

Does Spring allow configuring a retry and recovery mechanism for the KafkaTemplate send method?

I am trying to build a list of possible errors that can happen during the execution of the kafkaTemplate.send() method:
Errors related to the serialization process;
Network issues, or the broker being down;
Technical issues on the broker side, for example an acknowledgement not being received from the broker, etc.
Now I need to find a way to handle all possible errors in the right way. Based on the business requirements, in case of any exception I need to do the following:
Retry 3 times;
If all 3 retries fail, log an appropriate message.
I found that the configuration property spring.kafka.producer.retries is available, and I believed it was exactly what I needed.
But how can I configure a recovery method (a method that will be executed when all retries have failed)?
That spring.kafka.producer.retries property is probably not what you are looking for.
This auto-configuration property is mapped directly to ProducerConfig:
map.from(this::getRetries).to(properties.in(ProducerConfig.RETRIES_CONFIG));
and then we go and read docs for that ProducerConfig.RETRIES_CONFIG property:
private static final String RETRIES_DOC = "Setting a value greater than zero will cause the client to resend any record whose send fails with a potentially transient error."
+ " Note that this retry is no different than if the client resent the record upon receiving the error."
+ " Allowing retries without setting <code>" + MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION + "</code> to 1 will potentially change the"
+ " ordering of records because if two batches are sent to a single partition, and the first fails and is retried but the second"
+ " succeeds, then the records in the second batch may appear first. Note additionally that produce requests will be"
+ " failed before the number of retries has been exhausted if the timeout configured by"
+ " <code>" + DELIVERY_TIMEOUT_MS_CONFIG + "</code> expires first before successful acknowledgement. Users should generally"
+ " prefer to leave this config unset and instead use <code>" + DELIVERY_TIMEOUT_MS_CONFIG + "</code> to control"
+ " retry behavior.";
As you can see, spring-retry is not involved in the process at all; the retries are done directly inside the Kafka client and its KafkaProducer infrastructure.
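If you still want to tune that client-internal behaviour from Spring Boot, the documented recommendation above translates into something like the following application.properties entries (property names per Spring Boot's spring.kafka.producer.* mapping; the values are only illustrative):

# Bound the client-internal retrying by the delivery timeout instead of a retry count
spring.kafka.producer.properties.delivery.timeout.ms=120000
# Preserve ordering if the client does retry internally
spring.kafka.producer.properties.max.in.flight.requests.per.connection=1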
That is not the whole story, though. Pay attention to the KafkaProducer.send() contract:
Future<RecordMetadata> send(ProducerRecord<K, V> record);
It returns a Future. If we take a closer look at the implementation, we will see that there is a synchronous part - the topic metadata request and serialization - followed by enqueuing the record into a batch for asynchronous sending to the Kafka broker. The mentioned ProducerConfig.RETRIES_CONFIG has an effect only in that Sender.completeBatch().
I believe the Future is completed with an error when those internal retries are exhausted. So you should probably think about using a RetryTemplate manually in the service method around the KafkaTemplate call, to be able to control retry (and recovery, respectively) around the metadata request and serialization, which really are synchronous and blocking in the current call. You can also put the actual send under that retry, but only if you call Future.get() to block for a response or error from the Kafka client.
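A minimal sketch of that idea, assuming spring-retry is on the classpath and using RetryTemplate's builder (the topic name, timeout and logging are placeholders, not a definitive implementation):

import java.util.concurrent.TimeUnit;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.kafka.support.SendResult;
import org.springframework.retry.support.RetryTemplate;

public class RetryingSender {

    private static final Logger log = LoggerFactory.getLogger(RetryingSender.class);

    private final KafkaTemplate<String, String> kafkaTemplate;

    // 3 attempts with a fixed 1 second back-off between them
    private final RetryTemplate retryTemplate = RetryTemplate.builder()
            .maxAttempts(3)
            .fixedBackoff(1000)
            .build();

    public RetryingSender(KafkaTemplate<String, String> kafkaTemplate) {
        this.kafkaTemplate = kafkaTemplate;
    }

    public void sendWithRetry(String payload) throws Exception {
        retryTemplate.<SendResult<String, String>, Exception>execute(
                context -> {
                    // Blocking on get() surfaces broker/ack errors inside the retry,
                    // so serialization, metadata and send failures all go through
                    // the same retry policy.
                    return kafkaTemplate.send("my-topic", payload).get(30, TimeUnit.SECONDS);
                },
                context -> {
                    // Recovery callback: runs once all 3 attempts have failed.
                    log.error("Sending failed after {} attempts", context.getRetryCount(),
                            context.getLastThrowable());
                    return null;
                });
    }
}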

ActiveMQ messageId not working to stop duplication

I am using ActiveMQ for messaging, and there is a requirement that if a message is a duplicate, it should be handled by AMQ automatically.
For that I generate a unique message key and set it in the message post-processor.
The following is the code:
jmsTemplate.convertAndSend(dataQueue, event, messagePostProcessor -> {
    LocalDateTime dt = LocalDateTime.now();
    long ms = dt.get(ChronoField.MILLI_OF_DAY) / 1000;
    String messageUniqueId = event.getResource() + event.getEntityId() + ms;
    System.out.println("messageUniqueId : " + messageUniqueId);
    messagePostProcessor.setJMSMessageID(messageUniqueId);
    messagePostProcessor.setJMSCorrelationID(messageUniqueId);
    return messagePostProcessor;
});
As can be seen, the code generates a unique id and then sets it on the message post-processor.
Can someone help me with this? Is there any other configuration that I need to do?
A consumer can receive duplicate messages for two main reasons: a producer sent the same message more than once, or a consumer received the same message more than once.
Apache ActiveMQ Artemis includes powerful automatic duplicate message detection, which filters out messages sent more than once by a producer.
To prevent a consumer from receiving the same message more than once, an idempotent consumer must be implemented; e.g. Apache Camel provides an Idempotent Consumer component that works with any JMS provider, see: http://camel.apache.org/idempotent-consumer.html
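Assuming the broker is ActiveMQ Artemis, the producer-side duplicate detection keys on a string property named _AMQ_DUPL_ID rather than on the JMSMessageID (which the JMS provider overwrites when the message is actually sent). A sketch of the post-processor from the question adapted under that assumption:

jmsTemplate.convertAndSend(dataQueue, event, message -> {
    // The ID must be the same every time the same logical event is sent,
    // so the broker can recognise and drop the duplicate.
    String messageUniqueId = event.getResource() + event.getEntityId();
    message.setStringProperty("_AMQ_DUPL_ID", messageUniqueId);
    return message;
});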

DefaultMessageListenerContainer stops processing messages

I'm hoping this is a simple configuration issue but I can't seem to figure out what it might be.
Set-up
Spring Boot 2.2.2.RELEASE
cloud-starter
cloud-starter-aws
spring-jms
spring-cloud-dependencies Hoxton.SR1
amazon-sqs-java-messaging-lib 1.0.8
Problem
My application starts up fine and begins to process messages from Amazon SQS. After some amount of time I see the following warning
2020-02-01 04:16:21.482 LogLevel=WARN 1 --- [ecutor-thread14] o.s.j.l.DefaultMessageListenerContainer : Number of scheduled consumers has dropped below concurrentConsumers limit, probably due to tasks having been rejected. Check your thread pool configuration! Automatic recovery to be triggered by remaining consumers.
The above warning gets printed multiple times and eventually I see the following two INFO messages
2020-02-01 04:17:51.552 LogLevel=INFO 1 --- [ecutor-thread40] c.a.s.javamessaging.SQSMessageConsumer : Shutting down ConsumerPrefetch executor
2020-02-01 04:18:06.640 LogLevel=INFO 1 --- [ecutor-thread40] com.amazon.sqs.javamessaging.SQSSession : Shutting down SessionCallBackScheduler executor
The above 2 messages will display several times, and at some point no more messages are consumed from SQS. I don't see any other messages in my log to indicate an issue, but I get no indication from my handlers that they are processing messages (I have 2~), and I can see the number of messages and their age growing in the AWS SQS queue.
~: This exact code was working fine when I had a single handler; this problem started when I added the second one.
Configuration/Code
The first "WARNing" I realize is caused by the currency of the ThreadPoolTaskExecutor, but I can not get a configuration which works properly. Here is my current configuration for the JMS stuff, I have tried various levels of max pool size with no real affect other than the warings start sooner or later based on the pool size
public ThreadPoolTaskExecutor asyncAppConsumerTaskExecutor() {
    ThreadPoolTaskExecutor taskExecutor = new ThreadPoolTaskExecutor();
    taskExecutor.setThreadGroupName("asyncConsumerTaskExecutor");
    taskExecutor.setThreadNamePrefix("asyncConsumerTaskExecutor-thread");
    taskExecutor.setCorePoolSize(10);
    // Allow the thread pool to grow up to 4 times the core size, evidently not
    // having the pool be larger than the max concurrency causes the JMS queue
    // to barf on itself with messages like
    // "Number of scheduled consumers has dropped below concurrentConsumers limit, probably due to tasks having been rejected. Check your thread pool configuration! Automatic recovery to be triggered by remaining consumers"
    taskExecutor.setMaxPoolSize(10 * 4);
    taskExecutor.setQueueCapacity(0); // do not queue up messages
    taskExecutor.setWaitForTasksToCompleteOnShutdown(true);
    taskExecutor.setAwaitTerminationSeconds(60);
    return taskExecutor;
}
Here is the JMS Container Factory we create
public DefaultJmsListenerContainerFactory jmsListenerContainerFactory(SQSConnectionFactory sqsConnectionFactory, ThreadPoolTaskExecutor asyncConsumerTaskExecutor) {
    DefaultJmsListenerContainerFactory factory = new DefaultJmsListenerContainerFactory();
    factory.setConnectionFactory(sqsConnectionFactory);
    factory.setDestinationResolver(new DynamicDestinationResolver());
    // The JMS processor will start 'concurrency' number of tasks
    // and supposedly will increase this to the max of '10 * 3'
    factory.setConcurrency(10 + "-" + (10 * 3));
    factory.setTaskExecutor(asyncConsumerTaskExecutor);
    // Let the task process 100 messages, default appears to be 10
    factory.setMaxMessagesPerTask(100);
    // Wait up to 5 seconds for a timeout, this keeps the task around a bit longer
    factory.setReceiveTimeout(5000L);
    factory.setSessionAcknowledgeMode(Session.CLIENT_ACKNOWLEDGE);
    return factory;
}
I added the setMaxMessagesPerTask and setReceiveTimeout calls based on things I found on the internet; the problem persists without them and at various settings (50, 2500L, 25, 1000L, etc.).
We create a default SQS connection factory
public SQSConnectionFactory sqsConnectionFactory(AmazonSQS amazonSQS) {
    return new SQSConnectionFactory(new ProviderConfiguration(), amazonSQS);
}
Finally the handlers look like this
#JmsListener(destination = "consumer-event-queue")
public void receiveEvents(String message) throws IOException {
MyEventDTO myEventDTO = jsonObj.readValue(message, MyEventDTO.class);
//messageTask.process(myEventDTO);
}
#JmsListener(destination = "myalert-sqs")
public void receiveAlerts(String message) throws IOException, InterruptedException {
final MyAlertDTO myAlert = jsonObj.readValue(message, MyAlertDTO.class);
myProcessor.addAlertToQueue(myAlert);
}
You can see that in the first function (receiveEvents) we just take the message from the queue and exit; we have not implemented the processing code for it yet.
The second function (receiveAlerts) gets the message, and the myProcessor.addAlertToQueue function creates a runnable object and submits it to a thread pool to be processed at some point in the future.
The problem (the warning, the info messages, and the failure to consume messages) only started when we added the receiveAlerts function; previously, when the other function was the only one present, we did not see this behavior.
More
This is part of a larger project and I am working on breaking this code out into a smaller test case to see if I can duplicate this issue. I will post a follow-up with the results.
In the Meantime
I'm hoping this is just a config issue and someone more familiar with this can tell me what I'm doing wrong, or that someone can provide some thoughts and comments on how to correct this to work properly.
Thank you!
After fighting this one for a bit I think I finally resolved it.
The issue appears to be due to the DefaultJmsListenerContainerFactory: this factory creates a new DefaultMessageListenerContainer for EACH method with a @JmsListener annotation. The person who originally wrote the code thought it was only called once for the application and that the created container would be re-used. So the issue was two-fold:
1. The ThreadPoolTaskExecutor attached to the factory had 40 threads. When the application had one @JmsListener method this worked fine, but when we added a second method, each method got 10 threads (20 in total) for listening.
2. That by itself is fine; however, since we stated that each listener could grow up to 30 consumers, we quickly ran out of threads in the pool mentioned in 1 above. This caused the "Number of scheduled consumers has dropped below concurrentConsumers limit" error.
This is probably obvious given the above, but I wanted to call it out explicitly: in the listener factory we set the concurrency to "10-30"; however, all of the listeners have to share that pool. As such, the max concurrency has to be set up so that each listener's max value is small enough that, even if every listener reaches its maximum, the total does not exceed the number of threads in the pool (e.g. if we have 2 @JmsListener annotated methods and a pool with 40 threads, then the max value can be no more than 20).
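In concrete terms, a minimal sketch of a sizing that satisfies that rule, assuming two @JmsListener methods sharing one executor (numbers are illustrative and the bean wiring is trimmed):

// Sizing sketch: 2 listeners, each allowed to scale to 20 consumers,
// backed by a shared pool of 2 * 20 threads.
public ThreadPoolTaskExecutor asyncConsumerTaskExecutor() {
    ThreadPoolTaskExecutor taskExecutor = new ThreadPoolTaskExecutor();
    taskExecutor.setThreadNamePrefix("asyncConsumerTaskExecutor-thread");
    taskExecutor.setCorePoolSize(2 * 20);
    taskExecutor.setMaxPoolSize(2 * 20);
    taskExecutor.setQueueCapacity(0);
    return taskExecutor;
}

public DefaultJmsListenerContainerFactory jmsListenerContainerFactory(
        SQSConnectionFactory sqsConnectionFactory, ThreadPoolTaskExecutor asyncConsumerTaskExecutor) {
    DefaultJmsListenerContainerFactory factory = new DefaultJmsListenerContainerFactory();
    factory.setConnectionFactory(sqsConnectionFactory);
    // 2 listeners * max 20 consumers each == 40 threads in the shared pool above
    factory.setConcurrency("10-20");
    factory.setTaskExecutor(asyncConsumerTaskExecutor);
    return factory;
}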
Hopefully this might help someone else with a similar issue in the future....

Rocketmq:MQBrokerException: CODE: 2 DESC: [TIMEOUT_CLEAN_QUEUE]

When I send a message to the broker, this exception occasionally occurs:
MQBrokerException: CODE: 2 DESC: [TIMEOUT_CLEAN_QUEUE]broker busy, start flow control for a while
This means the broker is too busy (when tps > 1,5000) to handle so many send-message requests.
What would be the most likely reason for this - disk, CPU, or something else? How can I fix it?
There are many possible causes.
The root cause is that some messages have been waiting for a long time with no worker thread processing them, so RocketMQ triggers its fast-failure mechanism.
So the possible causes are:
1. Too many threads are working and they are storing messages very slowly, which makes the queued requests time out.
2. The jobs themselves take a long time to store the messages.
This may be because:
2.1 Storing messages is busy, especially when SYNC_FLUSH is used.
2.2 Syncing messages to the slave takes a long time when SYNC_MASTER is used.
In broker/src/main/java/org/apache/rocketmq/broker/latency/BrokerFastFailure.java you can see:
final long behind = System.currentTimeMillis() - rt.getCreateTimestamp();
if (behind >= this.brokerController.getBrokerConfig().getWaitTimeMillsInSendQueue()) {
    if (this.brokerController.getSendThreadPoolQueue().remove(runnable)) {
        rt.setStopRun(true);
        rt.returnResponse(RemotingSysResponseCode.SYSTEM_BUSY, String.format(
                "[TIMEOUT_CLEAN_QUEUE]broker busy, start flow control for a while, period in queue: %sms, size of queue: %d",
                behind, this.brokerController.getSendThreadPoolQueue().size()));
    }
}
In common/src/main/java/org/apache/rocketmq/common/BrokerConfig.java, the getWaitTimeMillsInSendQueue() method returns:
public long getWaitTimeMillsInSendQueue() {
    return waitTimeMillsInSendQueue;
}
The default value of waitTimeMillsInSendQueue is 200, so you can simply set it higher to let requests wait longer in the send queue. But if you want to solve the problem completely, you should follow Jaskey's advice and check your code.
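For example, assuming you tune it through broker.conf (where BrokerConfig fields are normally set), an entry like the following raises the threshold above the 200 ms default; the value is only illustrative and needs tuning against your load:

# broker.conf -- let send requests wait longer before the fast-failure cleanup triggers
waitTimeMillsInSendQueue=500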

Using the majordomo broker with asynchronous clients

While reading the ZeroMQ guide, I came across client code which sends 100k requests in a loop and then receives the replies in a second loop.
#include "../include/mdp.h"
#include <time.h>
int main (int argc, char *argv [])
{
int verbose = (argc > 1 && streq (argv [1], "-v"));
mdp_client_t *session = mdp_client_new ("tcp://localhost:5555", verbose);
int count;
for (count = 0; count < 100000; count++) {
zmsg_t *request = zmsg_new ();
zmsg_pushstr (request, "Hello world");
mdp_client_send (session, "echo", &request);
}
printf("sent all\n");
for (count = 0; count < 100000; count++) {
zmsg_t *reply = mdp_client_recv (session,NULL,NULL);
if (reply)
zmsg_destroy (&reply);
else
break; // Interrupted by Ctrl-C
printf("reply received:%d\n", count);
}
printf ("%d replies received\n", count);
mdp_client_destroy (&session);
return 0;
}
I have added a counter to count the number of replies that the worker (test_worker.c) sends to the broker, and another counter in mdp_broker.c to count the number of replies the broker sends to a client. Both of these count up to 100k, but the client is receiving only around 37k replies.
If the number of client requests is set to around 40k, then it receives all the replies. Can someone please tell me why packets are lost when the client sends more than 40k asynchronous requests?
I tried setting the HWM to 100k for the broker socket, but the problem persists:
static broker_t *
s_broker_new (int verbose)
{
    broker_t *self = (broker_t *) zmalloc (sizeof (broker_t));
    int64_t hwm = 100000;

    //  Initialize broker state
    self->ctx = zctx_new ();
    self->socket = zsocket_new (self->ctx, ZMQ_ROUTER);
    zmq_setsockopt (self->socket, ZMQ_SNDHWM, &hwm, sizeof (hwm));
    zmq_setsockopt (self->socket, ZMQ_RCVHWM, &hwm, sizeof (hwm));
    self->verbose = verbose;
    self->services = zhash_new ();
    self->workers = zhash_new ();
    self->waiting = zlist_new ();
    self->heartbeat_at = zclock_time () + HEARTBEAT_INTERVAL;
    return self;
}
Without setting the HWM and using the default TCP settings, packet loss was being incurred with just 50k messages.
The following helped to mitigate the packet loss at the broker:
Setting the HWM for the zeromq socket.
Increasing the TCP send/receive buffer size.
This helped only up to a certain point. With two clients, each sending 100k messages, the broker was able to manage fine. But when the number of clients was increased to three, they stopped receiving all the replies.
Finally, what has helped me to ensure no packet loss is to change the design of the client code in the following way:
A client can send up to N messages at once. The client's RCVHWM and the broker's SNDHWM should be sufficiently high to hold a total of N messages.
After that, for every reply received by the client, it sends two requests.
You send 100k messages and only then begin to receive them, so the 100k messages have to be stored in a buffer. When the buffer is exhausted and cannot store any more messages, you hit ZeroMQ's high water mark. The behaviour at the high water mark is specified in the ZeroMQ documentation.
In the case of the above code, the broker may discard some of the messages, since a majordomo broker uses a ROUTER socket. One resolution would be to split the send/receive loops into separate threads.
Why lost?
In ZeroMQ v2.1, the default value for ZMQ_HWM was INF (infinity), which helped the said test to be somewhat meaningful, but at the cost of a heavy risk of memory-overflow crashes, as the buffer allocation policy was not constrained/controlled so as to hit some physical limit.
As of ZeroMQ v3.0+, ZMQ_SNDHWM / ZMQ_RCVHWM default to 1000, and the values can be set explicitly afterwards.
You may also read an explicit warning, that
ØMQ does not guarantee that the socket will accept as many as ZMQ_SNDHWM messages, and the actual limit may be as much as 60-70% lower depending on the flow of messages on the socket.
Will splitting the sending / receiving part into separate threads help?
No.
Quick fix?
Yes, for the purpose of demo-test experimenting, set infinite high-water marks again, but be careful to avoid such practice in any production-grade software.
Why test ZeroMQ performance in this way?
As said above, the original demo-test seems to have had some meaning in its v2.1 implementation.
Since those days, ZeroMQ has evolved a lot. A very nice read for your particular interest in performance envelopes, which may help you build further insight into this domain, is the step-by-step guide with code examples in the ZeroMQ protocol overheads/performance case study on large file transfers:
... we already run into a problem: if we send too much data to the ROUTER socket, we can easily overflow it. The simple but stupid solution is to put an infinite high-water mark on the socket. It's stupid because we now have no protection against exhausting the server's memory. Yet without an infinite HWM, we risk losing chunks of large files.
Try this: set the HWM to 1,000 (in ZeroMQ v3.x this is the default) and then reduce the chunk size to 100K so we send 10K chunks in one go. Run the test, and you'll see it never finishes. As the zmq_socket() man page says with cheerful brutality, for the ROUTER socket: "ZMQ_HWM option action: Drop".
We have to control the amount of data the server sends up-front. There's no point in it sending more than the network can handle. Let's try sending one chunk at a time. In this version of the protocol, the client will explicitly say, "Give me chunk N", and the server will fetch that specific chunk from disk and send it.
The best part, as far as I know, is the commented progression of the resulting performance toward the "model 3" flow control, and one can learn a lot from the great chapters and real-life remarks in the ZeroMQ Guide.
