We have a RabbitMQ cluster (3 nodes: 1 primary and 2 secondaries, RabbitMQ version 3.7.24) between 2 of our microservices. The first service produces messages and pushes them to 2 different HA-enabled queues. The second microservice consumes the messages from both queues. This ran well for several months until a few days ago.
Two days ago our first service pushed around 100 messages to both queues at different times throughout the day. Around 60 of them were consumed immediately, but the remaining 40 were delivered and consumed only the next day, all at the same time, from both queues.
We don't have any delay settings, our microservices have not been restarted for several months, and there was no RabbitMQ connection issue either, since messages were received throughout the day. It's very hard to find out why this happened. Does anyone have a clue how this could have happened? Thank you.
I have a question about some strange consumer behaviour.
Recently we had a strange situation in our production environment. Two consumers in two different microservices got stuck on some messages. The first one was holding 20 messages from a RabbitMQ queue and the second one 2 messages, and neither was processing them. These messages were visible as Unacked in RabbitMQ for two days. They went back to the Ready state only when the two microservices were restarted. At the time the consumers took these messages, the whole system was processing thousands of messages per hour, so our Saga and all the other consumers were working. Once the messages went back to the Ready state they were processed within a second, so I don't think the problem is with the messages themselves.
The messages are published by the Saga to an exchange. Besides the two stuck consumers we also have an EventLogger consumer subscribed to all messages, and this EventLogger processed these 22 messages normally without any problems (from its own queue). We also have Application Insights connected to the consumers, and it shows no record of these 22 messages being received by the two consumers (there are records of them being received by EventLogger).
The other day we had the same issue with one message in the test environment.
Recently we updated MassTransit in our project from version 6.2.0 to 7.1.6. Before that we hadn't noticed any similar issues with consumers, but maybe it's just a coincidence. We also have retry, redelivery, circuit breaker and in-memory outbox mechanisms, but I don't think the problem is with them because the consumers didn't even start to process these 22 messages.
Do you have any suggestions as to what could have happened to these consumers?
Usually when a consumer doesn't even start to consume a message once it has been delivered to MassTransit by RabbitMQ, it could be an issue resolving the consumer from the container, such as a dependency on another backing service (database, log server, file, network connection, device, etc.).
The message remains unacknowledged on the broker because the transport/delivery mechanism to the consumer is waiting for a resource to become available. If there isn't anything in the logs for that time period indicating an issue with a resource, it's hard to know what could have blocked those messages from being consumed. The fact that they were ultimately consumed once the services were restarted seems to indicate the message content itself was fine.
Monitoring the lack of message consumption (and likely an associated queue depth increase) would give an indication that the situation has occurred. If it happens again, I'd increase the logging detail levels so that the next occurrence can be identified.
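As a rough illustration of that kind of monitoring, here is a minimal sketch in Python. It assumes you already collect periodic readings of a queue's pending message count and cumulative ack count from somewhere (how you collect them is up to you). The names `QueueSample` and `is_stalled` are hypothetical and are not part of MassTransit or RabbitMQ.

```python
from dataclasses import dataclass


@dataclass
class QueueSample:
    """One periodic reading of a queue's counters (illustrative names)."""
    timestamp: float   # seconds, any monotonic or epoch clock
    pending: int       # ready + unacked messages at this moment
    acked_total: int   # cumulative number of acks observed so far


def is_stalled(samples, window_seconds):
    """Return True if messages were pending for the whole window with no acks.

    Assumes `samples` is sorted by timestamp, oldest first.
    """
    if len(samples) < 2:
        return False
    latest = samples[-1]
    # Keep only the readings that fall inside the window ending at the latest one.
    recent = [s for s in samples if latest.timestamp - s.timestamp <= window_seconds]
    oldest = recent[0]
    covers_window = latest.timestamp - oldest.timestamp >= window_seconds
    no_progress = latest.acked_total == oldest.acked_total
    always_pending = all(s.pending > 0 for s in recent)
    return covers_window and no_progress and always_pending
```

An alert on `is_stalled(samples, 600)` would have flagged messages sitting unacknowledged for ten minutes rather than two days. The window length is an arbitrary choice here and should be tuned to the normal processing time.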
We're currently having problems with our ActiveMQ 5.16.1 broker, which suddenly starts piling up messages for no apparent reason. The following image shows the ActiveMQ QueueSize:
ActiveMQ is used as a JMS message broker without any additional components for, e.g., high availability or load balancing. Several producers (around 20 in total, worst case) produce small, simple JSON messages which are sent to the broker and consumed by a Java-based microservice. The microservice processes each message and saves the data to an Oracle database. Average processing time for one request is about 30 ms. Of those 20 producers only some are active at the same time, varying between 2 and 10. Each producer sends a message every 3 seconds, resulting in 20 messages/min per producer. E.g., with 10 producers the broker gets 200 messages/min, or about 3.3 messages/sec. Preserving the order is crucial, so I'm working with JMSXGroupIds, which has worked well so far. Messages are sent via MQTT and routed (via Camel) to a JMS queue:
<route id="handleData">
    <from uri="activemq://topic:some.topic.here?clientId=uniqueClientId" />
    <setHeader headerName="tName">
        <constant>ABC123</constant>
    </setHeader>
    <setHeader headerName="JMSXGroupId">
        <jsonpath>$.producerId</jsonpath>
    </setHeader>
    <to uri="activemq://queue:myQueue" />
</route>
But for some reason the messages get stuck after some time and I can't find any significant hint as to why that happens. There is nothing in the log files or the OS event log. I have to restart the ActiveMQ service in order to "reanimate" it. Afterwards all stuck messages are processed and everything works fine until the next "accident". This time it took about 10 days before the messages got stuck.
I already checked whether there might be a network or database-related issue. I even moved ActiveMQ to a fresh server in order to ensure that nothing else is influencing the ActiveMQ processes, but I couldn't find any hints either. I watched the JVM, heap space growth, memory usage, etc., and everything was unremarkable.
Does anybody have an idea what else I could check to find out what the problem is?
Add the advisoryForSlowConsumers destinationPolicy setting for those topics and watch the topic://ActiveMQ.Advisory.. for any enqueue counts that would indicate that a slow consumer occurred.
Your Camel route is almost identical to what a VirtualTopic does. You'll see better consistency with server-side routing in these types of scenarios, since there is no remote process (i.e. the Camel route) to manage connections, sessions, etc.
Bonus: the MQTT transport supports using Virtual Topics to back-end the subscriptions, so any MQTT topic consumers would automatically pull from the queue.
ref: https://activemq.apache.org/virtual-destinations
If I set QueueExpiration in MassTransit during configuration to 5 hours, does that mean that the queue will be deleted if no activity has happened in the queue for 5 hours, or will it delete itself even if there is activity, after 5 hours?
Edit: I am using RabbitMQ transport, and I am setting it inside the IOC configuration step.
The queue will be deleted if there is no activity for five hours. A connected bus with a receive endpoint on the queue is considered "activity" even if there are no messages received.
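To make the distinction concrete, here is a toy model in Python of idle-based expiry as described in this answer. It is only a sketch of the semantics, not MassTransit or RabbitMQ code, and the class and method names are hypothetical.

```python
class ExpiringQueue:
    """Toy model of idle-based queue expiry (illustrative only)."""

    def __init__(self, expiry_seconds, now):
        self.expiry_seconds = expiry_seconds
        self.last_used = now
        self.consumers = 0

    def connect_consumer(self):
        # A connected receive endpoint counts as activity, even with no messages.
        self.consumers += 1

    def disconnect_consumer(self, now):
        self.consumers -= 1
        if self.consumers == 0:
            # The idle clock starts when the last consumer leaves.
            self.last_used = now

    def is_expired(self, now):
        if self.consumers > 0:
            return False
        return now - self.last_used >= self.expiry_seconds
```

In this model a queue with a five-hour expiration survives indefinitely while a consumer is connected, and is only removed five hours after the last consumer disconnects.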
We just developed a system that integrates an Azure queue with an Azure Cloud Service to process batch items. One requirement we had was to schedule items to be processed in the future. So for example, we batch it now, but tell it not to start for 5 hours.
This is built right into Azure queues' AddMessage using initialVisibilityDelay, so we did not see this as being an issue. However, we just noticed that when we add auto scale on our Cloud Service, it scales based on the total number of items in the queue. In our situation we added 100,000 queue items to be sent 5 days from now, but it is scaling as if these 100,000 are ready to go right now.
So in our situation, we would basically have dozens of instances of our app running until these messages are even due, 5 days from now.
I feel like there is something simple we are missing here.
Any feedback would be very helpful.
Anthony
Have you considered using one queue for the waiting messages and another queue for the actual messages to be processed and scaling on that latter queue?
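A minimal in-memory sketch of that idea in Python follows. It assumes some periodic process moves due items from the waiting queue to the processing queue, and that auto scale is pointed only at the processing queue. The class and method names are hypothetical, and real queues would replace the in-memory lists.

```python
import heapq


class TwoStageQueue:
    """Waiting queue for future items plus a ready queue that drives scaling."""

    def __init__(self):
        self._waiting = []  # min-heap of (due_time, sequence, payload)
        self.ready = []     # items that can be processed right now
        self._seq = 0       # tie-breaker so payloads are never compared

    def schedule(self, payload, due_time):
        heapq.heappush(self._waiting, (due_time, self._seq, payload))
        self._seq += 1

    def promote_due(self, now):
        """Move every item whose due time has passed to the ready queue."""
        moved = 0
        while self._waiting and self._waiting[0][0] <= now:
            _, _, payload = heapq.heappop(self._waiting)
            self.ready.append(payload)
            moved += 1
        return moved

    def waiting_depth(self):
        return len(self._waiting)

    def scaling_depth(self):
        # Only this number should feed the auto scale rule.
        return len(self.ready)
```

With this split, 100,000 items scheduled five days out contribute nothing to `scaling_depth()` until `promote_due` moves them, so no extra instances are started early.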
I have an online service that receives incoming events (a few every second). The service needs to run a job when there have been no events for 30 seconds or more. The service is distributed across several PCs and uses Amazon Web Services (SQS and SimpleDB) as a backbone.
I understand how I can schedule a job when there IS an incoming event (just put a message into a message queue and you are done), but how can I schedule a job when the condition is "NO EVENTS FOR X SECONDS"?
Ideally I would want a message queue that does not allow duplicate messages, allows scheduling for the future and allows adjusting "delivery date" on each message.
Is there such a message queue implementation?
Can this problem be solved at all without persisting some data in a database?
Thank you
Both BizTalk and SQL Server Service Broker fit your requirements. If they are too heavyweight, you could write a simple service that peeks the queue every couple of seconds and times out if it does not see anything in 30 seconds. That would be more difficult to scale horizontally across machines, however.
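The "simple service" variant could look roughly like the Python sketch below. It is a single-process illustration only: the `IdleTrigger` name is hypothetical, and the caller is assumed to call `on_event` for each incoming event and `poll` every couple of seconds. Running it across several machines would need the last-event timestamp stored somewhere shared, which is the scaling difficulty mentioned above.

```python
class IdleTrigger:
    """Fires once each time no event has been seen for `idle_seconds`."""

    def __init__(self, idle_seconds, now):
        self.idle_seconds = idle_seconds
        self.last_event = now
        self.fired = False

    def on_event(self, now):
        # Any event restarts the quiet period and re-arms the trigger.
        self.last_event = now
        self.fired = False

    def poll(self, now):
        """Return True exactly once per quiet period, when the job should run."""
        if not self.fired and now - self.last_event >= self.idle_seconds:
            self.fired = True
            return True
        return False
```

The `fired` flag is what prevents duplicate jobs: repeated polls during the same quiet period return False until a new event arrives.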