MassTransit: How should I handle error queues?

I saw the question and answer below:
https://stackoverflow.com/a/46128844/7419921
I understand that I cannot do anything about the error queue via MassTransit, but how should I handle the error queue?
Error messages keep accumulating and are using up storage capacity.
It seems there is nothing I can do with the error queue. Is there no choice but to delete the messages?
If so, I cannot see the point of the error queue.

The meaning of error queues is very simple. Messages come to error queues because, well, of errors! When you fix the issues in your application, you can move messages from the error queue back to the regular queue using the Shovel plugin, and voila, you have recovered lost data. We do this very often.
If you cannot move them back because the messages are no longer relevant or contain wrong data, they are still very valuable: you can use them to reproduce the issue and see whether you can fix the sender.

Related

Kafka consumer and fails while handling some messages

I have a Spring Boot app with a single Kafka consumer that gets messages from a topic.
But sometimes errors occur while handling a message.
I want to continue to receive the following messages as usual and at the same time be able not to lose that message and receive it, for example, the next time the service is restarted with the consumer after fixing it.
Is it possible to do this?
I understand that I need to disable auto-commit and commit successful messages manually, but, in this case, if I don't throw any exception for this exception case and commit each next successful message manually, then I will lose the previous unsuccessful one, right?

If I understand your question correctly, your assumption is that the exception occurs due to a problem in your code and not while reading the message from the topic. In that case no retry or other measures will solve your problem.
What we usually do is catch the exception and send the failed message to another Kafka topic. Ideally, you also add some details on why, or in which part of the code, the exception occurred. After you have fixed the bug in your application, you can consume the messages from that other topic.
I understand that I need to disable auto-commit and commit successful messages manually, but, in this case, if I don't throw any exception for this exception case and commit each next successful message manually, then I will lose the previous unsuccessful one, right?
Yes, your understanding is correct. To be more precise, you will not "lose" the message, but as soon as your ConsumerGroup commits a higher offset it will never try to read the lower offset again without manual intervention.
Alternative
If you expect an exception to be thrown only in very rare cases and simply skip the failed message, you can always use the consumer.seek() method of the plain Kafka consumer
public void seek(TopicPartition partition, long offset)
to start reading from a particular offset out of a topic partition.
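The offset behaviour described above can be illustrated with a small in-memory model in plain Java. This is only a sketch and does not use the Kafka client: the class `OffsetModel` and its `poll`, `commit`, `restart` and `seek` methods are hypothetical stand-ins that mimic one rule, namely that a consumer group resumes from its committed offset unless the position is moved explicitly.

```java
import java.util.ArrayList;
import java.util.List;

/** In-memory stand-in for one topic partition read by one consumer group (not the Kafka API). */
public class OffsetModel {
    private final List<String> log;   // records of the partition; list index == offset
    private long committed = 0;       // offset the group resumes from after a restart
    private long position = 0;        // next offset the running consumer will read

    public OffsetModel(List<String> records) {
        this.log = new ArrayList<>(records);
    }

    /** Returns the record at the current position and advances, or null at the end of the log. */
    public String poll() {
        if (position >= log.size()) {
            return null;
        }
        return log.get((int) position++);
    }

    /** Commits the current position, which implicitly covers every lower offset. */
    public void commit() {
        committed = position;
    }

    /** Simulates a consumer restart: reading resumes from the committed offset. */
    public void restart() {
        position = committed;
    }

    /** Moves the read position explicitly, analogous in spirit to seek(partition, offset). */
    public void seek(long offset) {
        position = offset;
    }
}
```

In this model, if the record at offset 1 fails and is skipped but the record at offset 2 is committed, a restart resumes after offset 2 and the failed record is never read again. Calling `seek(1)` is what brings it back.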

Yes, you have to commit them manually. Retry a particular message 2-3 times. If it still fails after the retries, move it to another topic and consume it from there once you have fixed whatever is causing the failure. This will not block your queue, and you won't lose any messages either.
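The retry-then-park idea can be sketched in plain Java without any Kafka dependency. Everything here is an assumption made for illustration: `RetryThenDeadLetter`, `process` and the returned list (standing in for the "other topic") are hypothetical names, and in a real consumer the dead-lettered messages would be published with a producer rather than collected in memory.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Sketch of "retry a few times, then move the message aside" using in-memory lists. */
public class RetryThenDeadLetter {

    /**
     * Hands each message to the handler up to maxAttempts times.
     * Returns the messages that failed on every attempt (the stand-in for a dead-letter topic).
     */
    public static List<String> process(List<String> messages,
                                       Consumer<String> handler,
                                       int maxAttempts) {
        List<String> deadLetters = new ArrayList<>();
        for (String message : messages) {
            boolean done = false;
            for (int attempt = 1; attempt <= maxAttempts && !done; attempt++) {
                try {
                    handler.accept(message);
                    done = true;
                } catch (RuntimeException e) {
                    // Handler failed; fall through and retry until attempts run out.
                }
            }
            if (!done) {
                // Park the message instead of blocking the ones behind it.
                deadLetters.add(message);
            }
        }
        return deadLetters;
    }
}
```

Because a failing message is set aside after its last attempt, the messages behind it are still processed, and the parked ones can be replayed after the bug is fixed.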

I want to continue to receive the following messages as usual and at
the same time be able not to lose that message and receive it, for
example, the next time the service is restarted with the consumer
after fixing it.
Is it possible to do this?
You don't need to do a manual commit. Instead, you can implement a retry mechanism by publishing the event to another queue and consuming it after a delay. Amazon SQS has delay queues, but unfortunately Kafka has no such feature, so you have to write the implementation yourself.
If you retry message processing, the order of the messages can change depending on your implementation. Please keep that in mind.
Also remember that Kafka considers a consumer dead if message processing takes longer than max.poll.interval.ms.

Receiver of ActiveMQ: not able to retrieve whats in the Queue

I have run the hello world program described at the link below: http://www.coderpanda.com/jms-example-using-apache-activemq/ I have also downloaded the ActiveMQ jar and related files as described there. I am able to compile and run all the Java files. The receiver compiles successfully, but when it runs, no message is printed on the console. The message sent to the queue is not being retrieved. I can see the message count increasing in the ActiveMQ UI (hosted on localhost) on each hit, but the message put on the queue is still not printed/retrieved. Can anyone suggest any other implementation for publisher subscriber, if any? Or your thoughts on JMS Q...

The answer from Vihar is right.
When you see the dequeued count increasing, it is clear that the message was successfully consumed by some consumer. And why did the consumer count on the queue increase when you ran your receiver? Are there multiple instances, or did you not close the connection properly?
I did not close the connection, so earlier consumers consumed the messages in the queue when I ran the receiver multiple times. I had no clue why or how it happened until I ran it one instance at a time while keeping an eye on the queue.

MQ error 2102(Resource_Prob in datastage)

When a DataStage job tries to put a message on the queue, this error occurs. I guess the issue may be the size of the file we were trying to post, which was around 3MB; when we tried with a file of around 2MB, it was posted to the queue successfully.
Can somebody advise on this?

MQ error 2102 indicates "insufficient system resources to complete the call successfully".
According to your question, you get this error when you post a bigger message.
That makes me believe your server is running out of space, which is causing the issue.

JMS Messages not consumed till producer connection close :-(

I am relatively new to JMS and have encountered a weird problem implementing my first real application. I'm desperate for any help or advice.
Background: I use ActiveMQ (Java) as the message broker with non-transacted, non-persistent queues.
The Design: I have a straightforward producer/consumer system based around a single queue. A number of nodes (currently 2) place messages onto and consume from the queue. Selectors are used to filter which messages a node receives.
The Problem: The producer successfully places its items onto the queue (I have verified they are there using the web interface), however the consumers remain blocked and do not read them. Only when I close the JMS connection in the producer do the consumers jump into life and consume the messages as expected.
This behaviour seems very weird to me; surely you shouldn't have to completely hang up the producer connection for the consumers to be able to read from the queue. I must have made a mistake somewhere (possibly with sessions), but at the moment the number of things that could be wrong is too large and I have no idea what would cause this behaviour.
Any hints as to a solution, the cause of the problem or just how to continue debugging would be greatly appreciated.
Thanks for your time,
P.S. If you require any additional information, I am happy to provide it.

Hard to say without seeing the code, but it sounds like the producer is transacted. You should not have to close the producer in order for the consumers to receive a message, but a transacted producer won't send its messages until you call commit. Another thing to check is that the connection has been started. Also, if you have many consumers, you should look at the prefetch setting to ensure that one consumer doesn't hog all the messages; setting prefetch to 1 might be needed, but it is hard to say without further insight into your use case.

WebSphere MQ: Message keeps toggling between input queue and backout queue

The logic flow is like this
A message is sent to an input queue
A ProcessorMDB's onMessage() is invoked. Within this method several operations/validations are done
In case of a poison message(msg that application code cannot handle) a RuntimeException is thrown.
This should rollback the transaction. We are seeing evidence in the log file.
There is a backout threshold defined with a backout queue name
Once the threshold is reached, the message is sent to the backout queue
But immediately it starts going back and forth between the input queue and the backout queue.
We are using the MQMON tool to observe this weird behavior. It continues almost forever, even after the app server (where the MDB is running) is shut down.
We are using Weblogic 10.3.1 and WebSphere MQ 6.02
Any help will be much appreciated, looks like we are running out of ideas.

This sounds like a syncpoint issue. If the QMgr were to issue a COMMIT when a message is requeued inside of a unit of work it would affect all messages under syncpoint inside of that thread. This would cause serious problems if an application had performed several PUT or GET calls prior to hitting the poison message. Rather than issue a COMMIT outside of the program's control, the QMgr just leaves the message on the backout queue inside the unit of work and waits for the program to issue the COMMIT. This can lead to some unexpected behavior such as what you are seeing where a message lands back on the input queue.
If another message is in the queue behind the "bad" one and it is processed successfully by the same thread, everything works out perfectly. The app issues a COMMIT on the new message and this also affects the poison message on the Backout Queue. However if the thread were to exit uncleanly (without an explicit disconnect or COMMIT) then the transaction is rolled back and the poison message is returned to the input queue.
The usual way of dealing with this is that the next good message (or batch of messages if transactions are batched) in the input queue will force the COMMIT. However in some cases where the owning thread gets no new work (perhaps it was performing a GET by Correlation ID) there is nothing to push the bad message through. In these cases, it is important to make sure that the application issues a COMMIT before ending. One way to do this is to write the code to perform the GET by CORRELID with a wait interval. If the wait interval expires, the application would get a return code of 2033 and then issue a COMMIT before closing the thread. If the reply message is legitimately late for whatever reason, the COMMIT will have no effect. But if the message arrived and had been backed out and requeued, the COMMIT will cause it to stay in the Backout Queue.
One way to see exactly what is going on is to run a trace against the queue in question. You can use the built-in trace function - strmqtrc - which has a few more options in V7 than does the V6 version. However if you want very fine grained control you can use the trace exit in SupportPac MA0W. With MA0W you can see exactly what API calls are made by the program and those made on its behalf.
[EDIT] Updating the response with some info from the PMR:
The following is from the WMQ V7 Infocenter:
"MessageConsumers are single threaded below the Session level, and any requeuing of poison messages takes place within the current unit of work. This does not affect the operation of the application, however when poison messages are requeued under a transacted or Client_acknowledge Session, the requeue action itself will not be committed until the current unit of work is committed by the application code or, if appropriate, the application container code."
Hence, if it is important for the customer to have poison messages committed immediately after they are backed out, it is recommended they either make use of the Application Server Facilities (ConnectionConsumer) which can commit the message immediately, or another mechanism to move poison messages from the queue.
Here is the link to this information in the V6 and V7 Information Centers. Since you are using the V6 client, you would want to refer to the V6 Infocenter. Note that with the V6 client, there is no mention in the Infocenter of ASF being able to commit the poison message immediately, even when using a ConnectionConsumer. The way I read it, this means you will probably need to upgrade to the V7 client to get the behavior you are looking for. I will be interested to see if the PMR results in a similar recommendation.
