Transactional Executor with Retry? - spring

I'd appreciate a bit of advice on the following:
I am using a ThreadPoolTaskExecutor to execute slow external tasks like sending emails.
I need to improve this:
1) When a task is passed to the executor, it must not be executed until the transaction of the submitting operation has completed.
Example: it makes no sense to send an email when the order process fails with an exception at commit time.
2) When the task fails, some retry mechanism should be used to try the task again.
Example: when sending the email fails, it should be retried after 5 and then 10 minutes, and after that an exception should be thrown.
How should I deal with these issues? Or should I just integrate some queue that offers this functionality?
Ed

I would say: yes, use a queue in a messaging infrastructure.
Personally I would use Camel for this, because I am completely smitten by Camel and would use it even if I were reprogramming my toaster to toast the slices golden brown at breakfast.
Since you are going to send mail, it will be message-based anyway, so using a message-based system already reduces the impedance mismatch.
Things such as transactions, retries and parking the message on a dead letter queue come as standard with these tools. This is nice, because you can then script your way out of trouble when an email server disaster hits, by resubmitting the messages from the dead letter queue.
Integrating ActiveMQ or Camel is just a matter of adding a couple of dependencies and 5-10 lines in your Spring configuration.
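To make that concrete, here is a rough sketch of what those few lines could look like as a Camel RouteBuilder (the queue names, the emailService bean and the delays are made up for illustration, not taken from your setup): the consumer is transacted, failed deliveries are retried with a delay, and exhausted messages are parked on a dead letter queue for later resubmission.

```java
import org.apache.camel.builder.RouteBuilder;

// Hypothetical names throughout: the "email" queue, "email.dlq" and the emailService bean.
public class EmailRoute extends RouteBuilder {

    @Override
    public void configure() {
        // Retry twice with a delay, then park the message on a dead letter queue
        // from which it can be resubmitted once the mail server is back.
        errorHandler(deadLetterChannel("activemq:queue:email.dlq")
                .maximumRedeliveries(2)
                .redeliveryDelay(5 * 60 * 1000)   // 5 minutes between attempts
                .useExponentialBackOff());

        // The order-processing code just sends to activemq:queue:email inside its
        // transaction; the message only becomes visible to this consumer on commit.
        from("activemq:queue:email?transacted=true")
            .routeId("email-sender")
            .to("bean:emailService?method=send");
    }
}
```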
Once it is in there, it is a beautiful way to organize background processing, notify remote systems, automate email responses, notify sysadmins of impending doom, and so on. You send a message, continue what you're doing, respond to the customer, while in the background the wheels are turning.
OK, sorry: I got carried away and got way too lyrical.

Related

How to know the running status of a Spring Integration flow

I have a simple integration flow that polls data from a database based on a cron job, publishes it on a DirectChannel, then does splitting and transformations, publishes on another executor service channel, does some operations, and finally publishes to an output channel; it is written in the DSL style.
Also, I have an endpoint where I might receive an HTTP request to trigger this flow; at that point I send the messages to one of the mentioned channels to trigger the flow.
I want to make sure that the manual trigger doesn't happen if the flow is already running due to either the cron job or another request.
I have used the isRunning method of the StandardIntegrationFlow, but it seems that it's not thread-safe.
I also tried using .wireTap(myService) and .handle(myService) where this service has an AtomicBoolean flag, but it gets set for every message, which is not a solution.
I want to know if the flow is running without much intervention from my side, and if this is not supported, how can I apply the AtomicBoolean logic to the overall flow and not to every message?
How can I simulate the race condition in a test in order to make sure my implementation prevents this?
The IntegrationFlow is just a logical container for the configuration phase. It does have those lifecycle methods, but only for internal framework logic. Even though they are there, they don't help, because endpoints have to be running anyway if you want them to react to some event or input message.
It is hard to control all of that since it is in an async state, as you explain. Even if we stop a SourcePollingChannelAdapter at the beginning of that flow to let your manual call do something, it doesn't mean that messages on other threads are no longer being processed. The AtomicBoolean cannot help here for the same reason: even if you set it to true in MessageSourceMutator.beforeReceive() and reset it back to false in its afterReceive() when the message is null, it still doesn't mean that messages you pushed downstream on other threads have already been processed.
You might consider using an aggregator to reset the AtomicBoolean at the end of a batch: since you mention that you pull data from a DB, perhaps there is a number of records per poll you can track downstream. This way your manual call could be skipped until the aggregator collects the results for that batch.
You also need to think about stopping the SourcePollingChannelAdapter at the moment the manual action is permitted, so there won't be any further race conditions with the cron job.
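As a rough sketch only (the adapter id, channel name and the shared AtomicBoolean bean are hypothetical, and this assumes you gave the polling endpoint an explicit id via e.poller(...).id("dbPollingAdapter") in the DSL), the manual trigger could stop the adapter, check a flag that your downstream aggregator resets at the end of a batch, and only then send the message:

```java
import java.util.concurrent.atomic.AtomicBoolean;

import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.integration.endpoint.SourcePollingChannelAdapter;
import org.springframework.integration.support.MessageBuilder;
import org.springframework.messaging.MessageChannel;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class ManualTriggerController {

    private final SourcePollingChannelAdapter dbPollingAdapter; // endpoint id assigned in the DSL
    private final MessageChannel flowInputChannel;              // the channel the cron flow also feeds
    private final AtomicBoolean batchInFlight;                  // set true per poll, reset by the aggregator

    public ManualTriggerController(@Qualifier("dbPollingAdapter") SourcePollingChannelAdapter dbPollingAdapter,
                                   @Qualifier("flowInputChannel") MessageChannel flowInputChannel,
                                   AtomicBoolean batchInFlight) {
        this.dbPollingAdapter = dbPollingAdapter;
        this.flowInputChannel = flowInputChannel;
        this.batchInFlight = batchInFlight;
    }

    @PostMapping("/trigger")
    public ResponseEntity<String> trigger() {
        dbPollingAdapter.stop();                 // no new cron polls while the manual run is considered
        try {
            if (batchInFlight.get()) {           // a previous batch is still flowing through the async channels
                return ResponseEntity.status(HttpStatus.CONFLICT).body("flow is busy");
            }
            flowInputChannel.send(MessageBuilder.withPayload("manual-trigger").build());
            return ResponseEntity.accepted().body("triggered");
        } finally {
            dbPollingAdapter.start();            // resume cron-driven polling
        }
    }
}
```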

ActiveMQ - Can I Replace The Scheduler Plugin With A Delayed Message Queue?

I worked a little with the ActiveMQ scheduler plugin. This simplifies scheduling messages for delivery with a delay at low volume, but as I get into the hundreds of thousands of messages the system breaks down in two key ways.
It's very slow (compared to queues) to enqueue messages in the scheduler.
Attempting to view the schedules in the dashboard crashes the ActiveMQ instance.
The existing scheduler feels a little bolted on and does not perform as expected. So, rethinking the problem, I would like to have a jobs queue and a jobs-scheduled queue. Messages sent to the jobs-scheduled queue will have a ttl header with the Unix timestamp for when each should be delivered. A process running on a cron job will take messages from the jobs-scheduled queue and send them to the jobs queue, using a selector to pick out only the messages with an elapsed ttl: convert_string_expressions:ttl < %(now)s.
My two questions are:
Will this strategy work for delaying messages at scale or will I find scaling pains around the selector? These messages will be persisted if that makes a difference.
Is there an existing feature in ActiveMQ that will allow me to send messages from one queue to another with a selector query?
ActiveMQ is a message broker, not a job scheduler, so what you are trying to do is really outside the scope of what the broker is intended to do. Yes, ActiveMQ does have a scheduled message feature, but this is not intended for large-scale job-queue-type work; it is a simple feature to provide some minimal delayed delivery.
What you are looking for sounds more like Quartz or some other batch job scheduling library. You could develop your own job scheduler implementation for ActiveMQ, or do something in a plugin, but you are really running against the grain of what a broker is meant to do, which is to deliver messages as quickly as possible in a decoupled manner.
Side note, potentially off-topic:
I've had to solve a similar situation in the past where it made a lot of sense to load up the queues with messages ahead of time to cut down on the total transfer time.
I solved it by using Camel routes and side-channel activation. Camel allows you to programmatically start and stop routes, so you can load up a queue, with no consumers, with the data for a given time period. Then, using a dedicated control queue, you send a 'start' message. The control route receives the 'start' message and then activates the main data processing route. You then need to configure some sort of 'stop' message semantics to be ready for the next time period's run.
Effectively, you get the delayed behavior pattern with much more control over scheduling, and you cut down on the data-to-queue loading time problem. You can also solve the scaling problem by loading the data across more than one queue.
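A skeletal version of that pattern (Camel 3 APIs; the queue names, route ids and jobProcessor bean are placeholders) could look like this:

```java
import org.apache.camel.builder.RouteBuilder;

public class DelayedBatchRoutes extends RouteBuilder {

    @Override
    public void configure() {
        // Main consumer route: does not start with the context, so the jobs
        // queue can be pre-loaded with messages for a future time period.
        from("activemq:queue:jobs")
            .routeId("jobs-consumer")
            .autoStartup(false)
            .to("bean:jobProcessor");

        // Control route: a 'start' or 'stop' message on the side channel
        // switches the consumer route on and off.
        from("activemq:queue:jobs-control")
            .routeId("jobs-control")
            .process(exchange -> {
                String command = exchange.getIn().getBody(String.class);
                if ("start".equalsIgnoreCase(command)) {
                    exchange.getContext().getRouteController().startRoute("jobs-consumer");
                } else if ("stop".equalsIgnoreCase(command)) {
                    exchange.getContext().getRouteController().stopRoute("jobs-consumer");
                }
            });
    }
}
```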

How do I achieve a redelivery delay in Azure Service Bus with AMQP using rhea

I'm using rhea in a Node.js application to send messages over Azure Service Bus using AMQP. My problem is as follows:
Sometimes a message processing attempt can fail because of something that is out of our hands. For instance, a call to some API could fail because a service is down. At that point we unlock the message so it can be picked up at a later time or by another instance. After a certain number of retries (when the delivery count has hit a certain maximum) it just ends up in the DLQ.
What I want to achieve is that between each delivery attempt there is an increasing pause, so the X retries don't just occur in rapid succession until the max is hit. This way I can give whatever is causing the failure some time to come back up, if it's just a matter of waiting for some service to become available again. If that doesn't work, the message can go to the DLQ anyway.
Is there some setting in Azure Service Bus that will achieve this, or will I have to program this into my own application?
If you explicitly want to delay processing, you can enqueue a new message with ScheduledEnqueueTime set for later delivery (the message.Clone() function can help in creating the cloned message). You also have the ability to call message.Defer(), in which case the message will not be delivered again until you call Receive(Sequenceid) for that specific message at a later time.
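I can't give you the exact rhea incantation, but to illustrate the scheduled re-enqueue idea, here is roughly what it looks like with the Azure Service Bus Java SDK (azure-messaging-servicebus); the connection string, queue name, payload and backoff are placeholders, and growing the backoff per delivery attempt gives you the increasing pause you're after:

```java
import java.time.Duration;
import java.time.OffsetDateTime;

import com.azure.messaging.servicebus.ServiceBusClientBuilder;
import com.azure.messaging.servicebus.ServiceBusMessage;
import com.azure.messaging.servicebus.ServiceBusSenderClient;

public class DelayedRetrySender {

    public static void main(String[] args) {
        ServiceBusSenderClient sender = new ServiceBusClientBuilder()
                .connectionString("<connection-string>")   // placeholder
                .sender()
                .queueName("orders")                       // placeholder
                .buildClient();

        // Re-enqueue a copy of the failed message; it stays invisible until the scheduled time.
        ServiceBusMessage retry = new ServiceBusMessage("payload-of-failed-message");
        Duration backoff = Duration.ofMinutes(5);          // grow this per attempt for increasing pauses
        sender.scheduleMessage(retry, OffsetDateTime.now().plus(backoff));

        sender.close();
    }
}
```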

What are the retry settings for a subscriber in Pub/Sub and how do I set them correctly in a Spring application?

I have a Spring service subscribing to messages from a topic in Google Cloud Pub/Sub (pulling). It is working correctly in general, but I want to have more control over resent messages. My service sometimes needs to nack a message, or just let the ackDeadline pass so that I get the message again later. While testing with single messages, a nacked message comes back to me almost immediately, and the ones I don't ack or nack at all come back after the 10-second default ackDeadline. I would like to postpone the repeated consumption of these messages. I thought the retry settings were designed for such cases.
I should also mention that I am currently testing locally with an emulator and creating the subscription from code; I am using PubSubAdmin for managing it.
Following this documentation, I have tried to set this configuration in my profile config, like this:
spring.cloud.gcp.pubsub.subscriber.retry.initial-retry-delay-second: 4
spring.cloud.gcp.pubsub.subscriber.retry.max-attempts: 5
spring.cloud.gcp.pubsub.subscriber.retry.initial-rpc-timeout-seconds: 4
spring.cloud.gcp.pubsub.subscriber.retry.max-rpc-timeout-seconds: 8
spring.cloud.gcp.pubsub.subscriber.retry.max-retry-delay-seconds: 7
spring.cloud.gcp.pubsub.subscriber.retry.total-timeout-seconds: 3000
but it had no effect on when the messages reoccur.
Do I misunderstand the meaning of the retry settings? Maybe they only take effect when there are connection problems, but not in nacking or missing-acknowledgment cases? Or do I have to set them while using Deployment Manager to create the subscriptions, and am not allowed to set them from code? Or maybe setting them in (development) profile configs won't work with PubSubAdmin?
Thanks for any suggestions!
Edit: I want the first retry to happen after 5 seconds, the next retry 10 seconds later, etc. Plus I want to set the maximum number of retries. So what I am not interested in is just setting the ackDeadline to a bigger number.
Edit 2: Why nacking: one of the services (let's call it a bridge) subscribes to the messages, has to validate each message and, if it is OK, pass it on to another external system. This service acts as a bridge for that system, as we can't work on the second system directly. In some cases a message needs some extra information, so the bridge tries to fetch it somewhere else (there are a lot of microservices involved), and it sometimes happens that at that moment the extra information is not there (yet). So the first idea was to not ack the message and let it come again later. But I don't want to ask every 10 seconds for the next 7 days (with the ackDeadline); I want to try just a few times, and if the information is not there after 2 hours, it will never come. So we tried to nack and hoped the retry settings could help manage the resending. Since they don't, I suppose the only way to go will be to build something in the bridge myself to manage these messages. Maybe store message ids and the number of retries, so that I can ack after, for example, 5 attempts and push the message to another topic to deal with it differently. Or are there any better solutions known?
Cloud Pub/Sub does not provide exponential backoff for specific messages. A nack has no effect other than to tell Cloud Pub/Sub that you were not able to handle the message.
I could provide a more useful answer if you were to document why you needed to nack the messages. If you are unable to handle the current load, you can use the flow control options described here to reduce the number of outstanding messages or bytes to your client. If you have messages that are known to be bad, you should instead ack them after pushing them to another dead letter topic to be handled separately.
Response to edit 2:
If you have this scenario where the action to supplement the messages can fail, implement whatever backoff mechanism you want on that action yourself in your service. Set the max ack extension period when constructing your subscriber (setMaxAckExtensionPeriod in Java) to ensure that your client will extend the ack deadline for each message long enough for your chain of retries.
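For reference, a bare-bones sketch of that with the Java client library (the project, subscription and handler body are placeholders; the builder takes an org.threeten.bp.Duration):

```java
import com.google.cloud.pubsub.v1.AckReplyConsumer;
import com.google.cloud.pubsub.v1.MessageReceiver;
import com.google.cloud.pubsub.v1.Subscriber;
import com.google.pubsub.v1.ProjectSubscriptionName;
import com.google.pubsub.v1.PubsubMessage;
import org.threeten.bp.Duration;

public class BridgeSubscriber {

    public static void main(String[] args) {
        String subscription = ProjectSubscriptionName.format("my-project", "my-subscription"); // placeholders

        MessageReceiver receiver = (PubsubMessage message, AckReplyConsumer consumer) -> {
            try {
                // Your own retry/backoff loop around the "fetch extra info" call goes here;
                // ack only once the message has been fully handled.
                consumer.ack();
            } catch (Exception e) {
                consumer.nack(); // redelivered soon; the backoff is your responsibility
            }
        };

        Subscriber subscriber = Subscriber.newBuilder(subscription, receiver)
                // Keep extending the ack deadline for up to 2 hours, so the message is
                // not redelivered while your own retries are still running.
                .setMaxAckExtensionPeriod(Duration.ofHours(2))
                .build();

        subscriber.startAsync().awaitRunning();
    }
}
```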
Edit 2
Note that Pub/Sub now has built-in support for Dead Lettering.
You can use PubSubSubscriberTemplate.modifyAckDeadline() to programmatically extend the deadlines of a batch of messages retrieved through pull. Each individual AcknowledgeablePubsubMessage also has a modifyAckDeadline() method, if you only need to extend the deadline for a select few stragglers.
If all messages on that particular subscription need to have a longer acknowledgement period, a default can be set in GCP Console by editing the subscription and updating the "Acknowledgement Deadline" field.
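As a rough sketch (the subscription name and deadline are placeholders, and the package names are from the Spring Cloud GCP 1.x line), extending the deadline for a pulled batch might look like this:

```java
import java.util.List;

import org.springframework.cloud.gcp.pubsub.core.subscriber.PubSubSubscriberTemplate;
import org.springframework.cloud.gcp.pubsub.support.AcknowledgeablePubsubMessage;
import org.springframework.stereotype.Service;

@Service
public class DeadlineExtendingPuller {

    private final PubSubSubscriberTemplate subscriberTemplate;

    public DeadlineExtendingPuller(PubSubSubscriberTemplate subscriberTemplate) {
        this.subscriberTemplate = subscriberTemplate;
    }

    public void pullAndExtend() {
        // Pull up to 10 messages, returning immediately if none are available.
        List<AcknowledgeablePubsubMessage> messages =
                subscriberTemplate.pull("my-subscription", 10, true);

        // Give the whole batch another 10 minutes before Pub/Sub redelivers it.
        subscriberTemplate.modifyAckDeadline(messages, 600);

        // ... process, then ack() or nack() each message individually ...
    }
}
```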

How do I get a return value from a sent message and include the returned value as part of a second message in MSMQ?

I'm pretty new to MSMQ 4.0. I got stuck with the scenario below:
Service A takes user details and returns a user ID.
Then Service B takes billing details along with the user ID.
Now I have to queue these steps. I'm planning to use a transactional queue.
Could someone please help me with:
1) Getting the ID from the first message and including it in the second message.
2) If at least one step fails I have to roll back (the transactional queue does it for me), retry up to 5 times, and if it still fails, move the message to a VerifyAdminQueue for verification by an admin. I don't like using the dead letter queue, etc.
Thanks in advance.
Services built with MSMQ queues are truly one-way. This means that there is no built-in concept of a response. There are many ways you can implement a request-response communication pattern using MSMQ, but with all of them you will need to construct and send the response back to the caller yourself.
With one-way actions, rollback is very simple, and indeed MSMQ will roll back any failed steps in the transmission of a message. More complex operations such as request-response, however, lack any concept of a transaction in MSMQ, so any rollback across more than one message transmission step will require you to write compensating code.
