We have set up a component that runs Spring Batch jobs remotely. The batch client sends a JMS message with the job XML path, name, parameters, etc., and then waits for a response from the server.
The server reads the queue and calls the appropriate method to run the job and return the result, which our messaging framework does by:
this.jmsTemplate.send(queueName, messageCreator);
this.LOGGER.debug("Message sent to '" + queueName + "'");
try {
final Destination replyTo = messageCreator.getReplyTo();
final String correlationId = messageCreator.getMessageId();
this.LOGGER.debug("Waiting for the response '" + correlationId + "' back on '" + replyTo + "' ...");
final BytesMessage message = (BytesMessage) this.jmsTemplate.receiveSelected(replyTo, "JMSCorrelationID='"
+ correlationId + "'");
this.LOGGER.debug("Response received");
Ideally, we want to be able to call our runJobSync method twice and have two jobs run simultaneously. We have a unit test that does something similar, without jobs. I realize this code isn't great, but here it is:
final List result = Collections.synchronizedList(new ArrayList());
Thread thread1 = new Thread(new Runnable(){
@Override
public void run() {
client.pingWithDelaySync(1000);
result.add(Thread.currentThread().getName());
}
}, "thread1");
Thread thread2 = new Thread(new Runnable(){
@Override
public void run() {
client.pingWithDelaySync(500);
result.add(Thread.currentThread().getName());
}
}, "thread2");
thread1.start();
Thread.sleep(250);
thread2.start();
thread1.join();
thread2.join();
Assert.assertEquals("both thread finished", 2, result.size());
Assert.assertEquals("thread2 finished first", "thread2", result.get(0));
Assert.assertEquals("thread1 finished second", "thread1", result.get(1));
When we run that test, thread 2 completes first since it only has a 500 millisecond wait, while thread 1 waits 1 second:
Thread.sleep(delayInMs);
return result;
That works great.
When we run two remote jobs in the wild, one which takes about 50 seconds to complete and one which is designed to fail immediately and return, this does not happen.
Start the 50 second job, then immediately start the instant fail job. The client prints that we sent a message requesting that the job run, the server prints that it received the 50 second request, but waits until that 50 second job is completed before handling the second message at all, even though we use the ThreadPoolExecutor.
We are running transactional with Auto acknowledge.
Doing some remote debugging, the Consumer from AbstractPollingMessageListenerContainer shows no unhandled messages (so consumer.receive() obviously just returns null over and over). The web GUI for the AMQ broker shows 2 enqueues, 1 dequeue, 1 dispatched, and 1 in the dispatched queue. This suggests to me that something is preventing AMQ from letting the consumer "have" the second message. (Prefetch is 1000, by the way.)
This shows as the only consumer for the particular queue.
A few other developers and I have poked around for the last few days and are getting pretty much nowhere. Any suggestions on either what we have misconfigured, if this is expected behavior, or what might be broken here?
Does the method that is being remotely called matter at all? Currently the job handler method uses an executor to run the job in a different thread and does a future.get() (the extra thread is for reasons related to logging).
Any help is greatly appreciated
Not sure I follow completely, but off the top of my head, you should try the following:
set the concurrentConsumers/maxConcurrentConsumers greater than the default (1) on the MessageListenerContainer
set the prefetch to 0 to better promote balancing messages between consumers, etc.
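To illustrate why a single consumer serializes the two jobs, here is a minimal sketch that uses only the Java standard library (no Spring or ActiveMQ). The names `ConsumerPoolSketch`, `Job`, and `run` are hypothetical. Each consumer thread takes one request and stays blocked until that job finishes, much like a listener that calls future.get(). With one consumer the fast job waits behind the slow one; with two consumers it can finish first.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.LinkedBlockingQueue;

class ConsumerPoolSketch {

    /** A job request: a name plus how long the job takes (stand-in for the real batch job). */
    static final class Job {
        final String name;
        final long durationMs;

        Job(String name, long durationMs) {
            this.name = name;
            this.durationMs = durationMs;
        }
    }

    /**
     * Starts the given number of consumer threads. Each takes one job at a time and
     * blocks until it finishes. Returns the job names in completion order.
     */
    static List<String> run(int consumers, List<Job> jobs) {
        BlockingQueue<Job> queue = new LinkedBlockingQueue<>(jobs);
        List<String> finished = new CopyOnWriteArrayList<>();
        List<Thread> threads = new ArrayList<>();

        for (int i = 0; i < consumers; i++) {
            Thread t = new Thread(() -> {
                Job job;
                // poll() returns null once the queue is drained, which ends this consumer
                while ((job = queue.poll()) != null) {
                    try {
                        Thread.sleep(job.durationMs); // the consumer is busy while the job runs
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                    finished.add(job.name);
                }
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return finished;
    }

    public static void main(String[] args) {
        List<Job> jobs = List.of(new Job("slow", 400), new Job("fast", 0));
        System.out.println("1 consumer:  " + run(1, jobs));
        System.out.println("2 consumers: " + run(2, jobs));
    }
}
```

In the real setup the equivalent knob is the number of concurrent consumers on the MessageListenerContainer; the sketch only models the blocking behavior, not prefetch or broker dispatch.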
I am trying to use the MassTransit batching technique to process multiple messages at once and reduce the number of individual queries to the database (read and write).
If there is an exception while processing one or more of the messages, the expectation is to fault only the affected messages and still be able to process the rest.
This is a common scenario in my use case. What I am trying to establish here is a way to perform batch processing that caters for poisoned messages.
For example, if I have a batch size of 10 messages, 10 in the queue and 1 persistently fails, I still need a means of ensuring the other 9 can be processed successfully. It is fine if all 10 need to be returned to the queue and subset re-consumed - but the poisoned message needs to be eliminated somehow. Does this requirement discount the use of batching?
I have tried the following, but it did not solve my use case:
Catching the exception and raising NotifyFaulted for that specific message.
Modifying the Sample-Twitch application to throw an exception, as shown below, based on https://github.com/MassTransit/Sample-Twitch/blob/master/src/Sample.Components/BatchConsumers/RoutingSlipBatchEventConsumer.cs:
public Task Consume(ConsumeContext<Batch<RoutingSlipCompleted>> context)
{
if (_logger.IsEnabled(LogLevel.Information))
{
_logger.Log(LogLevel.Information, "Routing Slips Completed: {TrackingNumbers}",
string.Join(", ", context.Message.Select(x => x.Message.TrackingNumber)));
}
for (int i = 0; i < context.Message.Length; i++)
{
try
{
if (i % 2 != 0)
throw new System.Exception("business error -message failed");
}
catch (System.Exception ex)
{
context.Message[i].NotifyFaulted(TimeSpan.Zero, "batch routing silp faulted", ex);
}
}
return Task.CompletedTask;
}
I have dug into a few more threads that look similar to this issue, for reference:
Masstransit error handling for batch consumer
If you want to use batch, and have a message in that batch that cannot be processed, you should catch the exception and do something else with the poison message. You could write it someplace else, publish some type of event, or whatever else. But MassTransit does not allow you to partially complete/fault messages of a batch.
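The advice above (catch the failure inside the batch consumer and divert the poison message yourself) can be sketched independently of MassTransit. What follows is a hypothetical Java illustration, not MassTransit API: `PoisonAwareBatch`, `Result`, and `process` are invented names, and the "dead letter" list is an in-memory stand-in for whatever error queue, table, or event you would actually use.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

class PoisonAwareBatch {

    /** Outcome of one batch: what was handled and what was set aside. */
    static final class Result<T> {
        final List<T> succeeded = new ArrayList<>();
        final List<T> deadLettered = new ArrayList<>();
    }

    /**
     * Runs the handler on every message of the batch. A failure on one message is
     * caught and that message is set aside, so the batch as a whole still completes.
     */
    static <T> Result<T> process(List<T> batch, Consumer<T> handler) {
        Result<T> result = new Result<>();
        for (T message : batch) {
            try {
                handler.accept(message);
                result.succeeded.add(message);
            } catch (RuntimeException e) {
                // Stand-in for "write it someplace else" or "publish some type of event"
                result.deadLettered.add(message);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Result<Integer> r = process(List.of(1, 2, 3, 4), n -> {
            if (n == 3) {
                throw new IllegalStateException("business error");
            }
        });
        System.out.println("ok=" + r.succeeded + " dead=" + r.deadLettered);
    }
}
```

Because the consumer itself never throws, the batch is completed as a unit, which matches the constraint that a batch cannot be partially faulted.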
I have a subscriber which collects messages until it reaches a specified limit and then passes the collected messages to a processor to perform some operations. The code works fine; the problem is that the subscriber waits until it has collected the specified number of messages. If there are fewer messages, control never passes to the processor.
For example, let's say my chunk size is 100. If I have 100 messages, or a multiple of 100, the program works fine. But if I have fewer than 100 messages, or 150, some of the messages are read by the subscriber but never passed to the processor. Is there a way to find out whether the queue is empty using RabbitTemplate, so that I can check that condition and break the loop?
@RabbitListener(id="messageListener",queues = "#{rabbitMqConfig.getSubscriberQueueName()}",containerFactory="queueListenerContainer")
public void receiveMessage(Message message, Channel channel, @Header("id") String messageId,
@Header("amqp_deliveryTag") Long deliveryTag) {
LOGGER.info(" Message:"+ message.toString());
if(messageList.size() < appConfig.getSubscriberChunkSize() ) {
messageList.add(message);
deliveryTagList.add(deliveryTag);
if(messageList.size() == appConfig.getSubscriberChunkSize()) {
LOGGER.info("------------- Calling Message processor --------------");
Message [] messageArry = new Message[messageList.size()];
messageArry = messageList.toArray(messageArry);
LOGGER.info("message Array Length: "+messageArry.length);
messageProcessor.process(messageArry);
messageList = new ArrayList<Message>(Arrays.asList(messageArry));
LOGGER.info("message Array to List conversion Size: "+messageList.size());
LOGGER.info("-------------- Completed Message processor -----------");
eppQ2Publisher.sendMessages(messageList, channel, deliveryTagList);
messageList.clear();
deliveryTagList.clear();
}
} else {
// do nothing..
}
}
There are two ways to achieve this.
Add an @EventListener to listen for ListenerContainerIdleEvents, which are published when no messages have been received for some time; set the container's idleEventInterval property. The source of the event is the listener container; it contains the @RabbitListener's id. See Detecting Idle Consumers.
Use RabbitAdmin.getQueueProperties().
You can use RabbitAdmin.getQueueInfo("queue name").getMessageCount(), which will be 0 for an empty queue.
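Whichever signal you use (an idle event or a message count of 0), the listener needs a way to hand off a partial chunk. Here is a minimal sketch using only the standard library; `ChunkBuffer`, `add`, and `flush` are hypothetical names, and in a Spring AMQP application `flush()` would be called from the idle-event listener. Both methods are synchronized on the assumption that the idle signal may arrive on a different thread than the message listener.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

class ChunkBuffer<T> {
    private final int chunkSize;
    private final Consumer<List<T>> processor;
    private final List<T> pending = new ArrayList<>();

    ChunkBuffer(int chunkSize, Consumer<List<T>> processor) {
        this.chunkSize = chunkSize;
        this.processor = processor;
    }

    /** Called for every incoming message; hands off a full chunk as soon as one is available. */
    synchronized void add(T message) {
        pending.add(message);
        if (pending.size() >= chunkSize) {
            flush();
        }
    }

    /** Called when the source has been idle or the queue is known to be empty. */
    synchronized void flush() {
        if (pending.isEmpty()) {
            return;
        }
        List<T> chunk = new ArrayList<>(pending);
        pending.clear();
        processor.accept(chunk);
    }

    public static void main(String[] args) {
        ChunkBuffer<Integer> buffer = new ChunkBuffer<>(100, chunk ->
                System.out.println("processing " + chunk.size() + " messages"));
        for (int i = 0; i < 150; i++) {
            buffer.add(i);
        }
        buffer.flush(); // idle signal: push out the leftover partial chunk
    }
}
```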
I have a single ActorA that reads from an input stream and sends messages to a group of ActorB's. When ActorA reaches the end of the input stream it cleans up its resources, broadcasts a Done message to the ActorB's, and shuts itself down.
I have approx 12 ActorB's that send messages to a group of ActorC's. When an ActorB receives a Done message from ActorA then it cleans up its resources and shuts itself down, with the exception of the last surviving ActorB which broadcasts a Done message to the ActorC's before it shuts itself down.
I have approx 24 ActorC's that send messages to a single ActorD. Similar to the ActorB's, when each ActorC gets a Done message it cleans up its resources and shuts itself down, with the exception of the last surviving ActorC which sends a Done message to ActorD.
When ActorD gets a Done message it cleans up its resources and shuts itself down.
Initially I had the ActorB's and ActorC's immediately propagate the Done message when they received it, but this might cause the ActorC's to shut down before all of the ActorB's have finished processing their queues; likewise the ActorD might shut down before the ActorC's have finished processing their queues.
My solution is to use an AtomicInteger that is shared among the ActorB's:
class ActorB(private val actorCRouter: ActorRef,
private val actorCount: AtomicInteger) extends Actor {
private val init = {
actorCount.incrementAndGet()
()
}
def receive = {
case Done => {
if(actorCount.decrementAndGet() == 0) {
actorCRouter ! Broadcast(Done)
}
// clean up resources
context.stop(self)
}
}
}
ActorC uses similar code, with each ActorC sharing an AtomicInteger.
At present all actors are initialized in a web service method, with the downstream ActorRef's passed in the upstream actors' constructors.
Is there a preferred way to do this, e.g. using calls to Akka methods instead of an AtomicInteger?
Edit: I'm considering the following as a possible alternative: when an actor receives a Done message it sets the receive timeout to 5 seconds (the program will take over an hour to run, so delaying cleanup/shutdown by a few seconds won't impact the performance); when the actor gets a ReceiveTimeout it broadcasts Done to the downstream actors, cleans up, and shuts down. (The routers for ActorB and ActorC are using a SmallestMailboxRouter)
class ActorB(private val actorCRouter: ActorRef) extends Actor {
def receive = {
case Done => {
context.setReceiveTimeout(Duration.create(5, SECONDS))
}
case ReceiveTimeout => {
actorCRouter ! Broadcast(Done)
// clean up resources
context.stop(self)
}
}
}
Sharing actorCount among related actors is not a good thing to do. An actor should only use its own state to handle messages.
How about having an ActorBCompletionHandler actor for the actors of type ActorB? Every ActorB holds a reference to the ActorBCompletionHandler. Each time an ActorB receives a Done message, it does the necessary cleanup and simply passes the Done message on to the ActorBCompletionHandler. The ActorBCompletionHandler maintains a state variable for the count and updates it every time it receives a Done message. Since this variable belongs solely to that actor, it does not need to be atomic and there is no need for any explicit locking. The ActorBCompletionHandler sends the Done message to the ActorC's once it has received the last Done message.
This way the count is not shared among actors but managed only by the ActorBCompletionHandler. The same thing can be repeated for the other types.
A -> B's -> BCompletionHandler -> C's -> CCompletionHandler -> D
Another approach could be to have one monitoring actor for every related group of actors. Using the watch API and the Terminated message on the monitor, you can decide what to do once the last child has terminated.
val child = context.actorOf(Props[ChildActor])
context.watch(child)
case Terminated(child) => {
log.info(child + " Child actor terminated")
}
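The completion-handler idea can be sketched without Akka to show why the counter needs no atomics. This is a plain-Java illustration under stated assumptions, not Akka code: `CompletionHandlerSketch`, `done`, and `simulate` are hypothetical names, and a single-thread executor stands in for the actor's mailbox so that the counter is only ever touched by one thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class CompletionHandlerSketch {
    // Stand-in for an actor mailbox: messages are handled one at a time on one thread.
    private final ExecutorService mailbox = Executors.newSingleThreadExecutor();
    private final Runnable onAllDone;
    private int remaining; // plain int: only modified on the mailbox thread

    CompletionHandlerSketch(int workers, Runnable onAllDone) {
        this.remaining = workers;
        this.onAllDone = onAllDone;
    }

    /** Called by a worker after its own cleanup; equivalent to sending Done to the handler. */
    void done() {
        mailbox.execute(() -> {
            remaining--;
            if (remaining == 0) {
                onAllDone.run(); // e.g. broadcast Done to the next stage
            }
        });
    }

    /** Lets the given number of workers report Done concurrently; returns how often the callback fired. */
    static int simulate(int workers) {
        AtomicInteger fired = new AtomicInteger(); // only used to observe the result from outside
        CompletionHandlerSketch handler = new CompletionHandlerSketch(workers, fired::incrementAndGet);
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < workers; i++) {
            Thread t = new Thread(handler::done);
            threads.add(t);
            t.start();
        }
        try {
            for (Thread t : threads) {
                t.join();
            }
            handler.mailbox.shutdown();
            handler.mailbox.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return fired.get();
    }

    public static void main(String[] args) {
        System.out.println("downstream notified " + simulate(12) + " time(s)");
    }
}
```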
I've written a continuous JMS message receiver:
Here, I'm using CLIENT_ACKNOWLEDGE because I don't want this thread to acknowledge the messages.
(...)
connection.start();
session = connection.createQueueSession(true, Session.CLIENT_ACKNOWLEDGE);
queue = session.createQueue(QueueId);
receiver = session.createReceiver(queue);
while (true) {
message = receiver.receive(1000);
if ( message != null ) {
// NB : I can only pass Strings to the other thread
sendMessageToOtherThread( message.getText() , message.getJMSMessageID() );
}
// TODO Implement criteria to exit the loop here
}
In another thread, I'll do something as follows (after successful processing) :
This is in a distinct JMS Connection executed simultaneously.
public void AcknowledgeMessage(String messageId) {
if (this.first) {
this.connection.start();
this.session = this.connection.createQueueSession( false, Session.AUTO_ACKNOWLEDGE );
this.queue = this.session.createQueue(this.QueueId);
}
QueueReceiver receiver = this.session.createReceiver(this.queue, "JMSMessageID='" + messageId + "'");
Message AckMessage = receiver.receive(2000);
receiver.close();
}
It appears that the message is not found (AckMessage is null after the timeout) even though it does exist in the queue.
I suspect the message is being blocked by the continuous input thread; indeed, when AcknowledgeMessage() is fired on its own, it works fine.
Is there a cleaner way to retrieve a single message based on its QueueId and messageId?
Also, I feel there could be a risk of a memory leak in the continuous reader if it has to hold on to the Messages or IDs for a long time. Is that concern justified?
If I use a QueueBrowser to avoid impacting the acknowledge thread, it looks like I cannot have this continuous input feed. Is that right?
More context : I'm using ActiveMQ and the 2 threads are 2 custom "Steps" of a Pentaho Kettle transformation.
NB : Code samples are simplified to focus on the issue.
Well, you can't read that message twice, since you have already read it in the first thread.
ActiveMQ will not delete the message since you have not acknowledged it, but it won't be visible to other consumers until you drop the JMS connection (I'm not sure whether there is also a long timeout for this in ActiveMQ).
So you will have to use the original message and do: message.acknowledge();.
Note, however, that sessions are not thread safe, so be careful if you do this in two different threads.
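One way to keep the acknowledgement on the thread that owns the session is to have the processing thread send the message id back and let the consumer thread acknowledge between receive() calls. The sketch below uses only the standard library: `AckRelaySketch` and its methods are hypothetical, and `Acknowledgeable` is a stand-in for the JMS Message so the example runs without a broker. Keep in mind that in standard CLIENT_ACKNOWLEDGE mode, acknowledging one message acknowledges every message the session has consumed so far; ActiveMQ offers an INDIVIDUAL_ACKNOWLEDGE mode if per-message acknowledgement is needed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

class AckRelaySketch {

    /** Stand-in for a JMS message; only the acknowledge() call matters here. */
    interface Acknowledgeable {
        void acknowledge();
    }

    // Touched only by the consumer thread, like the JMS session itself.
    private final Map<String, Acknowledgeable> unacked = new HashMap<>();
    // Thread-safe hand-off from the processing thread back to the consumer thread.
    private final Queue<String> ackRequests = new ConcurrentLinkedQueue<>();

    /** Consumer thread: remember a received message until its processing is confirmed. */
    void track(String messageId, Acknowledgeable message) {
        unacked.put(messageId, message);
    }

    /** Any thread: report that the message with this id was processed successfully. */
    void requestAck(String messageId) {
        ackRequests.add(messageId);
    }

    /** Consumer thread: call between receive() calls; returns how many messages were acknowledged. */
    int drainAcks() {
        int count = 0;
        String id;
        while ((id = ackRequests.poll()) != null) {
            Acknowledgeable message = unacked.remove(id);
            if (message != null) {
                message.acknowledge();
                count++;
            }
        }
        return count;
    }

    /** Number of messages still waiting for confirmation. */
    int pending() {
        return unacked.size();
    }

    public static void main(String[] args) {
        AckRelaySketch relay = new AckRelaySketch();
        List<String> acked = new ArrayList<>();
        relay.track("ID:1", () -> acked.add("ID:1"));
        relay.track("ID:2", () -> acked.add("ID:2"));
        relay.requestAck("ID:2"); // would normally come from the other thread
        System.out.println("acknowledged " + relay.drainAcks() + ", still pending " + relay.pending());
    }
}
```

Only in-flight messages are held in the map, which bounds the memory the continuous reader needs as long as the downstream thread eventually confirms or rejects each id.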
The run method of my worker role is:
public override void Run()
{
Message msg=null;
while (true)
{
msg = queue.GetMessage();
if(msg!=null && msg.DequeueCount==1){
//delete message
...
//execute operations
...
}
else if(msg!=null && msg.DequeueCount>1){
//delete message
...
}
else{
int randomTime = ...
Thread.Sleep(randomTime);
}
}
}
For performance tests I would like each message to be processed by only one worker (I am not considering worker failures).
But from my tests it seems that two workers can pick up the same message and both read a DequeueCount equal to 1. Is that possible?
Is there a way to let just one worker read a message, in a "mutex" fashion?
How is your "getAMessage(queue)" method defined? If you do PeekMessage(), a message will be visible to all workers. If you do GetMessage(), the message will be received only by the worker that gets it first, but only for the duration of the visibility timeout, either the one you specify or the default (30 sec.). You have to delete the message before the visibility timeout expires.
Check out the Queue Service API for more information. I am sure that there is something wrong in your code. I use queues and they behave as documented in both dev storage and production storage. You may want to explicitly set a higher visibility timeout when you call GetMessage, and make sure you do not sleep longer than the visibility timeout.
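The visibility-timeout behavior described above can be modeled in a few lines. This is a simplified in-memory sketch, not the Azure storage client: `VisibilityQueueSketch`, `Msg`, and the explicit `nowMs` clock parameter are assumptions made so the example is deterministic. It shows that a second worker can only receive a message once the timeout has passed without a delete, and that it then sees a dequeue count of 2.

```java
import java.util.ArrayList;
import java.util.List;

class VisibilityQueueSketch {

    static final class Msg {
        final String body;
        int dequeueCount;      // how many times the message has been handed out
        long invisibleUntilMs; // the message is hidden from getMessage until this time

        Msg(String body) {
            this.body = body;
        }
    }

    private final List<Msg> messages = new ArrayList<>();
    private final long visibilityTimeoutMs;

    VisibilityQueueSketch(long visibilityTimeoutMs) {
        this.visibilityTimeoutMs = visibilityTimeoutMs;
    }

    synchronized void add(String body) {
        messages.add(new Msg(body));
    }

    /** Returns the first visible message and hides it for the timeout, or null if none is visible. */
    synchronized Msg getMessage(long nowMs) {
        for (Msg m : messages) {
            if (m.invisibleUntilMs <= nowMs) {
                m.dequeueCount++;
                m.invisibleUntilMs = nowMs + visibilityTimeoutMs;
                return m;
            }
        }
        return null;
    }

    /** Must be called before the timeout expires, otherwise the message reappears. */
    synchronized void delete(Msg m) {
        messages.remove(m);
    }

    public static void main(String[] args) {
        VisibilityQueueSketch queue = new VisibilityQueueSketch(30_000);
        queue.add("job");
        Msg first = queue.getMessage(0);
        System.out.println("worker A count=" + first.dequeueCount);
        System.out.println("worker B at 10s sees " + queue.getMessage(10_000));
        Msg again = queue.getMessage(30_000); // A never deleted it, so it reappears
        System.out.println("worker B at 30s count=" + again.dequeueCount);
    }
}
```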