How to suspend Mass Transit processing messages from the queue - masstransit

I have a Mass Transit Service Bus that is listening to several queues and processing the messages. I would like to somehow pause the processing of new requests and wait for the current requests to complete so that I can run some housekeeping tasks.
A few of my own thoughts:
I have investigated the service bus BeforeConsumingMessage handler and although this would allow me to check for a 'Pause processing' flag in my database, I unsure how I would then actually pause the processing!
We are using RabbitMQ - could I use this to put the queues in a suspend state?
I have found so little on this subject that I wonder if it is an 'anti-pattern' and I should just stop my Mass Transit services if I want to run some housekeeping jobs and trust in any partially complete sagas to be picked up when the service bus starts back up. (Rather not go for this option, though).
So my question is: Is there a way to instruct the service bus to finish processing the current sagas but do not take any more messages from the queue?

Cleanly shutting down a MT service will wait for any messages in process to completely finished. Why sometimes it takes a little while for a service to shutdown. Shutting down the service is the best way to handle this, you are sure MT is not pulling any new messages.
If your sagas are serialized to a backing store, e.g. NHibernate, then the state will be saved until the service is restarted and the sagas will pick up in the state they were left after the last message was processed. You should be in good shape. We do this all the time for any maintenance periods.
If you REALLY must leave the service running, call Dispose on the IServiceBus instance. This will do the same thing, letting the current consumers finish then releasing all your resources. Once you have done maintenance you can create a new IServiceBus instance as needed.

Related

Service synchronization issue

I've created two services.
One of them (scheduler) only requests to the other (backoffice) for performing some "large" operations.
When backoffice receives a request:
first creates a mark (key on redis) in order to set that the process has started.
Each time a request is reached:
backoffice checks if the mark exist.
When it exists means that the previous process has not yet finished, and escape it.
Perform the large process.
When process is finished, the previous key in redis is removed.
It would be something like this:
if (key exists)
return;
make long process... (1);
remove key;
The problem arises when service is destroyed when the process has not already finished and then it doesn't removes the mark on redis. It means the process will never run again.
Is there any way to solve this kind of problems?
The way to solve this problem is use an existing engine as building custom scalable and robust solution for reliable service orchestration is really hard.
I recommend looking at Uber Cadence Workflow which would allow to convert your pseudocode into a real production application with minor changes.
You can fire a background job that updates timestamp under the key, e.g. every minute.
When service attempts to start the process it must verify key existence (as it does now) + timestamp under the key. If it is more than 1 minute ago then the previous attempt is stale and you can start over.
Sounds like you should be using a messaging queue to schedule tasks for the back office service. Queuing solutions like RabbitMQ allow you to manually acknowledge (or “ack”) that the process is complete. Whenever a subscriber crashes, the queue detects that the connection dropped without acknowledgement and will re-enqueue the same task which will be picked up by the next available subscriber. Here’s another thread talking about this problem specifically focused on messaging queues:
What happens to fetched messages when RabbitMQ consumer crashes?

How to pause #RabbitListener without destroying AnonymousQueue

I have a Spring Boot application that is generally message driven but on special occasions, the incoming messages need to be stopped. However I cannot loose those messages, I need to buffer them and receive them in the correct order later.
There are myriads of questions asked and answered about stopping the Listener via the ListernerEndpointRegistry, such as here.
However when I stop the container the AnonymousQueue seems to disappear. I want the queue to stay on the exchange and buffer any messages, and receive them, when I restart. Is this possible or do I need to buffer them inside my application?
There are two options.
Don't use an anonymous (auto-delete) queue and use stop/start.
Don't stop the container; simply block the listener thread(s) when you want to suspend message delivery and wake them when you want to restart.
However I cannot lose those messages
If you cannot lose messages, you should NEVER use an auto-delete queue - you can lose messages at any time if you have a simple network glitch.

How do I achieve a redelivery delay in azure service bus with amqp using rhea

I'm using rhea in a nodejs application to send messages around over Azure Service Bus using AMQP. My problem is as follows:
Sometimes a message processing attempt can fail because of something that is out of our hands. For instance, a call to some API could fail because a service is down. At that point we unlock the message so it can be picked up at a later time or by another instance. After a certain amount of retries (when delivery-count has hit a certain max) it just ends up in DLQ.
What I want to achieve is that between each delivery attempt there is an increasing pause so the X amount of retries don't just occur in rapid succession until the max is hit. This way I can give whatever is causing the failure some time to come back up if it's just a matter of waiting for some service to become available again. If that doesn't work the message can go to DLQ anyway.
Is there some setting in azure service bus that will achieve this or will I have to program this into my own application?
if you explicitly want to delay processing you can en-queue a new message with ScheduledEnqueueTime set of later delivery (using the message.Clone() function can help in creating the cloned message). You also have the ability to call message.Defer() and will not deliver this message again until you call Receive(Sequenceid) for that specific message at a later time .

Send a message from one microservice to another in Azure Service Fabric (APIs)

What is the best architecture, using Service Fabric, to guarantee that the message I need to send from Service 1 (mostly API) to Service 2 (mostly API) does not get ever lost (black arrow)?
Ideas:
1
1.a. Make service 1 and 2 stateful services. Is it a bad call to have a stateful Web API?
1.b. Use Reliable Collections to send the message from API code to Service 2.
2
2.a. Make Service 1 and 2 stateless services
2.b. Add a third service
2.c. Send the message over a queuing system (i.e.: Service Bus) from service 1
2.d. To be picked up by the third service. Notice: this third service would also have access to the DB that service 2 (API) has access to. Not an ideal solution for a microservice architecture, right?
3
3.a. Any other ideas?
Keep in mind that the goal is to never lose the message, not even when service 2 is completely down or temporary removed… so no direct calls.
Thanks
I'd introduce a third (Stateful) service that holds a queue, 'service 3'.
Service 1 would enqueue the message. Service 3 would run an infinite loop, trying to deliver the message to service 2.
You could use the pub/sub package for this. Service 1 is the publisher, Service 2 is the subscriber.
(If you rely on an external queue system like Service Bus, you'll lower the overall availability of the system. Service Bus downtime would lead to messages being undeliverable.)
​
I think that there is never completely any solution that is 100% sure to never loose a message between two parties. Even if you had a service bus for instance in between two services, there is always the chance (possibly very small, but never null) that the service bus goes down, or that the communication to the service bus goes down. With that being said, there are of course models that are less likely to very seldom loose a message, but you can't completely get around the fact that you still have to handle errors in the client.
In fact, Service Fabric fault handling is mainly designed around clients retrying communication, rather than having the service or an intermediary do that. There are many reasons for this (I guess) but one is the nature of distributed, replicated, reliable services. If a service primary goes down, a replica picks up the responsibility, but it won't know what the primary was doing right at the moment it died (unless it replicated over it's state, but it might have died even before that). The only one that really knows what it wants to do in this scenario is the client. The client knows what it is doing and can react to different fault scenarios in te service. In Fabric Transport, most know exceptions that could "naturally" occur, such as the service dying or the network cable being cut of by the janitor are actuallt retried automatically. This includes re-resolving the address just in case the service primary was replaced with a secondary.
The same actually goes for a scenario where you introduce a third service or a service bus. What if the network goes down before the message has completely reached the service? In this case only the client knows that something went wrong and what it intended to send. What if it goes down after it reached the service but before the response was sent? In this case the client has to assume the message never reached and try to resend it. This is also why service methods are recommended to be idempotent - the same call can be made a number of times by the same client.
Even if you were to introduce a secondary part, like the service bus, there is still the same risk that the service bus goes down, or more likely, the network connecting to the service bus goes down. So, client needs to retry, and when it has retried a number of times, all it can do is put the message in a queue of failed messages or simply just log it, or throw an exception back to the original caller (in your scenario, the browser).
Ok, that's was me being pessimistic. But it could happen. All of the things above, its just that some are not very likely to happen. But they might happen.
On to your questions:
1) the problem with making a stateless service stateful is that you now have to handle partitions in your caller. You can put up Http listeners for stateful services, but you have to include the partition and replica information in the Uri, and that won't work with the load balancer, so in this case the browser has to select partition when calling the API. Not an ideal solution.
2) yes, you could do this, i.e. introduce something else in between that queues messages for you. There is nothing that says that a Service Bus or a Database is more reliable than a Stateful service with a reliable queue there, it's just up to you to go for what you are most comfortable with. I would go for a Stateful service, just so I can easily keep everything within my SF application. But again, this is not 100% protection from disgruntled janitor with scissors, for that you still need clients that can handle faults.
3) make sure you have a way of handling the errors (retry) and logging or storing the messages that fail (after retries) with the client (Service 1).
3.a) One way would be to have it store it localy on the node it is running and periodically (RunAsync for instance) try to re-run those failed messages. This might be dangerous in the scenario where the node it is running on is completely nuked and looses it data though, that data won't be replicated.
3.b) Another would be to use semantic logging with ETW and include enough data in the events to be able to re-create the message from the logged and build some feature, a manual UI perhaps, where you can re-run it from the logged information. Much like you would retry a failed message on an error queue in a service bus.
3.c) Store the failed messages to anything else (database, service bus, queue) that doesn't fail for the same reasons your communication with Service 2.
My main point here is (and I could maybe have started with that) is that there are plenty of scenarios where only the client knows enough to handle the situation. So, make sure you have a strategy for handling faults in your clients.

Scheduling a MDB

I'm looking for a way to schedule a MDB. My requirement is that the MDB is set to feed a system from the company. This system goes out for maintenance every night, but the other systems don't know about it and may keep trying to feed it. A persistent queue is great in the way that my messages could be pilled until system goes back online.
How could I manage that? I've run into that already: schedule a message driven bean to access a queue during certain times? but it uses java 7, and worst, message is lost if the server restarts (messages is taken out of the JMS Queue and kept in memory until timer process it).
Another use of this would be to implement a "retry" queue. In case of error I want to retry processing my message, but not immediately, after a certain amount time only.
Any ideas to keep my MDB offline for a certain amount of time?
Most versions of JBoss publish a management MBean that allows you to stop delivery on a MDB.
If you're using EJB3, however, they auto-start, so you will need to register a startup class to stop starting MDBs at boot time if boots occur in your MDB's blackout period. Once past that snafu, you can schedule a simple quartz job to start and stop the MDBs according to your delivery windows.
Well, it looks like there is no way to pause a MDB in a generic way. The best solution is, like most people will answer, to use the DLQ (or DMQ).
Now, if I want to introduce a timer on a message, I set the time to live of the producer to the amount of time I want the message to wait. Then I send it to a normal queue, lets say waitingQueue which has no consumer. After expiration, the message is sent to default destination (mq.sys.dmq for Glassfish MQ, make sure to create a jms resource with mq.sys.dmq as imqDestinationName). I have a MDB listening to the error queue and responsible of sending the message again. Now, if I want to "close" a queue for some time, when a message arrives in the queue, I check if current time is allowed or not. Just set the time to live to the amount of time before next opening hours and send it to waitingQueue.
The reason I didn't use it since the beginning is that I fell into a few pitfalls. Here are a few useful properties to set when using DMQ with Glassfish 3.1.1 and its embedded MQ.
imq.message.expiration.interval=1 that's for the poll interval on each queue before sending timed out messages to the DMQ. Default is 60 seconds. If like me you want to test your application with little latency, this is what you need.

Resources