I think I tried to start a channel that is already running. Whenever I start the sender channel, the receiver channel goes to a PAUSED state. I looked it up and found something about the AdoptNewMCA configuration, but I'm not sure how to set it at the queue manager level. How do I fix this cleanly? Merely stopping and restarting the channels does not do it.
Error log says:
08/02/2012 12:38:41 PM - Process(19161.269) User(mqm) Program(amqrmppa)
Host() Installation(Installation1)
VRMF(7.1.0.0) QMgr(QM_TEST2)
AMQ9514: Channel 'QM_TEST1.TO.QM_TEST2' is in use.
EXPLANATION: The requested operation failed because channel
'QM_TEST1.TO.QM_TEST2' is currently active. ACTION: Either end the channel
manually, or wait for it to close, and retry the operation.
----- amqrcsia.c : 1042 -------------------------------------------------------
08/02/2012 12:38:41 PM - Process(19161.269) User(mqm) Program(amqrmppa)
Host(...) Installation(Installation1)
VRMF(7.1.0.0) QMgr(QM_TEST2)
AMQ9999: Channel 'QM_TEST1.TO.QM_TEST2' to host '17.2.33.44' ended abnormally.
EXPLANATION: The channel program running under process ID 19161 for
channel 'QM_TEST1.TO.QM_TEST2' ended abnormally. The host name is
'17.2.33.44'; in some cases the host name cannot be
determined and so is shown as '????'. ACTION: Look at previous error
messages for the channel program in the error logs to determine the
cause of the failure. Note that this message can be excluded
completely or suppressed by tuning the "ExcludeMessage" or
"SuppressMessage" attributes under the "QMErrorLog" stanza in qm.ini.
Further information can be found in the System Administration Guide.
----- amqrmrsa.c : 887 --------------------------------------------------------
When looking these things up, I'd start first with the product manuals. In this case, the Infocenter topic on channel states says that a channel in PAUSED state is waiting on a retry interval. The sub-topic on channel errors explains why sending or receiving channels can be in retry:
If a channel is unable to put a message to the target queue because
that queue is full or put inhibited, the channel can retry the
operation a number of times (specified in the message-retry count
attribute) at a time interval (specified in the message-retry interval
attribute). Alternatively, you can write your own message-retry exit
that determines which circumstances cause a retry, and the number of
attempts made. The channel goes to PAUSED state while waiting for the
message-retry interval to finish.
So if you stop your channels, you should see a message in the XMitQ on the sending side. If you GET-enable that queue you can browse the message, look at the header and see which queue it is destined for. On the receiving side, look to see if that queue is full.
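For example, a minimal sketch of that check on the sending queue manager (assuming the transmission queue is named QM_TEST2, matching the remote queue manager, and the MQ samples are installed in the default Linux location):

* in runmqsc QM_TEST1: allow the XMitQ to be opened for browsing
ALTER QLOCAL(QM_TEST2) GET(ENABLED)

# then browse the queued messages from a shell; amqsbcg dumps each message
# including the MQXQH transmission header, which names the destination queue
/opt/mqm/samp/bin/amqsbcg QM_TEST2 QM_TEST1

Re-instate the queue's original GET setting when you are done.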
This is the classic fast-sender, slow-consumer problem. If the consumer can't keep up, messages back up on the receiving QMgr, the channel goes into retry, and they then begin to back up on the sending QMgr. You've got to monitor depth and input handles on the request queues.
Make sure a DLQ is set.
Try reducing the message retry count to 1 to speed up use of the DLQ.
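As a hedged example (queue manager and channel names are taken from the error log above; SYSTEM.DEAD.LETTER.QUEUE is the conventional DLQ name and must actually exist), in runmqsc on the receiving queue manager QM_TEST2:

* point the queue manager at a dead-letter queue
ALTER QMGR DEADQ(SYSTEM.DEAD.LETTER.QUEUE)
* retry the failing MQPUT only once before routing the message to the DLQ
ALTER CHANNEL(QM_TEST1.TO.QM_TEST2) CHLTYPE(RCVR) MRRTY(1)

MRRTY is the receiver channel's message-retry count; the matching retry interval attribute is MRTMR.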
Related
We have a PHP app that forwards messages from RabbitMQ to connected devices down a WebSocket connection (PHP AMQP pecl extension v1.7.1 & RabbitMQ 3.6.6).
Messages are consumed from an array of queues (1 per websocket connection), and are acknowledged by the consumer when we receive confirmation over the websocket that the message has been received (so we can requeue messages that are not delivered in an acceptable timeframe). This is done in a non-blocking fashion.
99% of the time this works perfectly, but very occasionally we receive the error "RabbitMQ PRECONDITION_FAILED - unknown delivery tag". This closes the channel. In my understanding, this exception is the result of one of the following conditions:
The message has already been acked or rejected.
An ack is attempted over a channel the message was not delivered on.
An ack is attempted after the message timeout (ttl) has expired.
We have implemented protections for each of the above cases, yet the problem continues.
I realise there are a number of implementation details that could impact this, but at a conceptual level, are there any other failure cases that we have not considered and should be handling? Or is there a better way of achieving the functionality described above?
"PRECONDITION_FAILED - unknown delivery tag" usually happens because of double ack-ing, ack-ing on wrong channels or ack-ing messages that should not be ack-ed.
So in same case you are tying to execute basic.ack two times or basic.ack using another channel
(Solution below)
Quoting Jan Grzegorowski from his blog:
If you are struggling with the 406 error message which is included in
title of this post you may be interested in reading the whole story.
Problem
I was using amqplib for connecting a NodeJS-based message processor with the RabbitMQ broker. Everything seemed to be working fine, but from time to time a 406 (PRECONDITION-FAILED) message showed up in the log:
"Error: Channel closed by server: 406 (PRECONDITION-FAILED) with message "PRECONDITION_FAILED - unknown delivery tag 1"
Solution
Keeping things simple:
You have to ACK messages in the same order as they arrive in your system.
You can't ACK messages on a different channel than the one they arrived on.
If you break either of these rules you will face the 406 (PRECONDITION-FAILED) error message.
Original answer
It can happen if you set the no-ack option of a Consumer to true, which means you shouldn't call the ack function manually:
https://www.rabbitmq.com/amqp-0-9-1-reference.html#basic.consume.no-ack
The solution: set the no-ack flag to false.
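For illustration, a minimal sketch using the Go client (github.com/rabbitmq/amqp091-go), with an assumed local broker and a hypothetical queue named "work": the consumer is registered with autoAck=false (the AMQP no-ack flag off) and each delivery is acked exactly once, on the channel it arrived on.

package main

import (
    "log"

    amqp "github.com/rabbitmq/amqp091-go"
)

func main() {
    conn, err := amqp.Dial("amqp://guest:guest@localhost:5672/")
    if err != nil {
        log.Fatal(err)
    }
    defer conn.Close()

    ch, err := conn.Channel()
    if err != nil {
        log.Fatal(err)
    }
    defer ch.Close()

    // autoAck=false: the broker expects an explicit basic.ack from us.
    deliveries, err := ch.Consume("work", "", false, false, false, false, nil)
    if err != nil {
        log.Fatal(err)
    }

    for d := range deliveries {
        // ... process d.Body here ...
        if err := d.Ack(false); err != nil { // ack once, on the delivering channel
            log.Printf("ack failed: %v", err)
        }
    }
}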
If you acknowledge the same message twice you can get this error.
A variation of what they said above about acking it twice:
there is an "obscure" situation where you are acking a message more than once, which is when you ack a message with multiple parameter set to true, which means all previous messages to the one you are trying to ack, will be acked too.
And so if you try to ack one of the messages that were "auto acked" by setting multiple to true then you would be trying to "ack" it multiple times and so the error, confusing but hope you understand it after a few reads.
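A small sketch of that pitfall with the same Go client (d1, d2 and d3 are hypothetical deliveries received in that order on one channel):

package example

import amqp "github.com/rabbitmq/amqp091-go"

// ackAll shows how a "multiple" ack silently covers earlier deliveries.
func ackAll(d1, d2, d3 amqp.Delivery) error {
    if err := d3.Ack(true); err != nil { // multiple=true: also acks d1 and d2
        return err
    }
    // d1 was already covered by the multiple ack above; acking it again is an
    // unknown delivery tag -> 406 PRECONDITION_FAILED and the channel closes.
    return d1.Ack(false)
}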
Make sure you have the correct application.properties:
If you use the RabbitTemplate without any channel configuration, use "simple":
spring.rabbitmq.listener.simple.acknowledge-mode=manual
In this case, if you use "direct" instead of "simple", you will get the same error message. The "direct" variant looks like this:
spring.rabbitmq.listener.direct.acknowledge-mode=manual
I see the SENDER channel go into RETRY mode once long retries (LONGRTS) start. It remains in RETRY mode and retries again after LONGTMR (1200) seconds. My question is: does the sender channel come back to RUNNING as soon as a message arrives, without waiting for LONGTMR to complete, or does it wait for the full LONGTMR interval?
A SENDER channel will go into STATUS(RETRY) - a.k.a. Retry Mode - when the connection to its partner fails.
To begin with, on the assumption that many network failures are very short lived, a SENDER channel will make a small number of fairly closely spaced attempts to re-make the network connection. It will try 10 times, 60 seconds apart. These are known as the "short retries".
These values, 10 and 60, are coded in the SENDER channel attributes called SHORTRTY and SHORTTMR.
If after these first 10 attempts, the SENDER channel has still not managed to get reconnected to the network partner, it will now move to "long retries". It is now operating with the assumption that the network outage is a longer one, for example the partner queue manager machine is having maintenance applied, or there has been some other major outage, and not just a network blip.
The SENDER channel will now try what it hopes is an infinite number of slightly more spaced apart attempts to re-make the connection. It will try 999999999 times at 1200 seconds apart, to re-make the connection.
These values, 999999999 and 1200, are coded in the SENDER channel attributes called LONGRTY and LONGTMR.
You can see how many attempts are left by using the DISPLAY CHSTATUS command and looking at the SHORTRTS and LONGRTS fields. These show how many of the 10 or 999999999 attempts are left. If SHORTRTS(0) then you know the SENDER is into "long retry mode".
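For example (the channel name here is just a placeholder), in runmqsc:

DISPLAY CHSTATUS(QM1.TO.QM2) STATUS SHORTRTS LONGRTS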
If, on any of these attempts to re-make the connection, it is successful, it will stop retrying and you will see the SENDER channel show STATUS(RUNNING). Note that the success is due to the network connection having been successfully made, and has nothing to do with whether a message arrives or not.
It will not continue making retry attempts after it successfully connects to the partner (until the next time the connection is lost of course).
If your channel is in STATUS(RETRY) you should look in the AMQERR01.LOG to discover the reason for the failure. It may be something you can fix at the SENDER end or it may be something that needs to be fixed at the RECEIVER end, for example restarting the queue manager or the listener.
If I acknowledge the same message twice using the Delivery.Ack method, my consumer channel just closes by itself.
Is this expected behaviour? Has anyone experienced this ?
The reason I am acknowledging the same message twice is a special case where I have to break the original message into copies and process them on the consumer. Once the consumer processes everything, it loops and acks everything. Since there are copies of the entity, it acks the same message twice and my consumer channel shuts down
According to the AMQP reference, a channel exception is raised when a message gets acknowledged for the second time:
A message MUST not be acknowledged more than once. The receiving peer
MUST validate that a non-zero delivery-tag refers to a delivered
message, and raise a channel exception if this is not the case.
A second call to Ack(...) for the same message will not return an error, but the channel gets closed due to this exception received from the server:
Exception (406) Reason: "PRECONDITION_FAILED - unknown delivery tag ?"
It is possible to register a listener via Channel.NotifyClose to observe this exception.
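A minimal sketch of such a listener with the Go client (github.com/rabbitmq/amqp091-go):

package example

import (
    "log"

    amqp "github.com/rabbitmq/amqp091-go"
)

// watchClose logs the server's reason (e.g. 406 PRECONDITION_FAILED)
// whenever the given channel is closed.
func watchClose(ch *amqp.Channel) {
    closes := ch.NotifyClose(make(chan *amqp.Error, 1))
    go func() {
        if err := <-closes; err != nil {
            log.Printf("channel closed by server: code=%d reason=%q", err.Code, err.Reason)
        }
    }()
}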
I am trying to setup message channel on IBM MQ v8.
I am running IBM MQ Server 8.x on Ubuntu Linux.
I have 2 queue managers QM1, and QM2.
On QM1, I have created a Sender Channel, and on QM2, I have created a Receiver channel.
On QM1:
Remote queue definition
DEFINE QREMOTE(RMQ1) DESCR('Remote queue for QM2') REPLACE +
PUT(ENABLED) XMITQ(QM2) RNAME(Q_ON_QM2) RQMNAME(QM2)
Transmission queue definition
DEFINE QLOCAL(QM2) DESCR('Transmission queue to QM2') REPLACE +
USAGE(XMITQ) PUT(ENABLED) GET(ENABLED) TRIGGER TRIGTYPE(FIRST) +
TRIGDATA(QM1.TO.QM2) INITQ(SYSTEM.CHANNEL.INITQ)
Sender channel definition for a TCP/IP connection:
DEFINE CHANNEL(QM1.TO.QM2) CHLTYPE(SDR) TRPTYPE(TCP) +
REPLACE DESCR('Sender channel to QM2') XMITQ(QM2) +
CONNAME('127.0.0.1(1491)') //-- QM2's listener is on 1490
On 2nd Queue manager (QM2)
Local queue definition
DEFINE QLOCAL(Q_ON_QM2) REPLACE PUT(ENABLED) GET(ENABLED) +
DESCR('Local queue ')
Receiver channel definition
For a TCP/IP connection:
DEFINE CHANNEL(QM1.TO.QM2) CHLTYPE(RCVR) TRPTYPE(TCP) +
REPLACE DESCR('Receiver channel from QM1')
At the end of configuration, my sender channel remains in "Retrying" state, and Receiver channel remains in "inactive" state.
How do I get this channel running?
At first glance, it appears the problem is with your port.
The conname for connection should specify the port where the listener is actually running. Is it 1491 or 1490?
CONNAME('127.0.0.1(1491)') //-- QM2's listener is on 1490
Verify the listener is running for the receiving qmgr and specify that port in your conname.
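A hedged MQSC sketch of those checks (the listener name is illustrative): on QM2, make sure a listener is actually running on 1490; on QM1, point the sender channel's conname at that port and restart the channel.

* on QM2
DEFINE LISTENER(QM2.LISTENER.TCP) TRPTYPE(TCP) PORT(1490) CONTROL(QMGR)
START LISTENER(QM2.LISTENER.TCP)
DISPLAY LSSTATUS(*) PORT STATUS

* on QM1
ALTER CHANNEL(QM1.TO.QM2) CHLTYPE(SDR) CONNAME('127.0.0.1(1490)')
STOP CHANNEL(QM1.TO.QM2)
START CHANNEL(QM1.TO.QM2)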
There could be many reasons for a sender channel going into retrying state.
1. Wrong parameters.
Check connection name as Valerie suggested. Make sure IP address and port number points to the receiver queue manager.
2. Transmission queue unavailable.
Make sure the transmission queue is available. Note: sometimes the transmission queue will be available but GET disabled; in that case too the sender channel will go into retrying state. The sender channel opens the transmission queue for exclusive input, which means that if the transmission queue is already open for input by another application (say RFHUTIL), the sender channel will not be able to access it and hence will go into retrying state. So make sure the transmission queue is not open in some other application.
3. Receiver channel unavailable.
This could be the case when the receiver queue manager is down.
Also, make sure the name of the receiver channel is the same as that of the sender channel (which is correct in your case).
4. Receiver channel and sender channel go out of sequence
The receiver channel and sender channel maintain a sequence number for message transmission. Due to environmental issues such as network glitches, the sequence number can become inconsistent between the sender and receiver channel.
RESET your sender and receiver channels to overcome this issue.
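For example (channel name from the question), in runmqsc on the sending queue manager:

RESET CHANNEL(QM1.TO.QM2) SEQNUM(1)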
We are attempting to consolidate the DLQs across the board in our enterprise, into a single Q (an Enterprise_DLQ if you will...). We have a mix of QMs on various platforms - Mainframe, various Unix flavours - Linux,AIX,Solaris etc., Windows, AS/400....
The idea was to configure the DLQ on each QM (set the DEADQ attribute on the QM) to the ENTERPRISE_DLQ, which is a cluster queue. All the QMs in the enterprise are members of the cluster. This approach, however, did not work when we tested it.
I tested this by setting up a simple cluster with 4 QMs. On one of the QMs, I defined a QRemote to a non-existent QM and non-existent Q, but with a valid XMitQ, and configured the requisite SDR channel between the QMs as follows:
QM_FR - Full_Repos
QM1, QM2, QM3 - members of the Cluster
QM_FR hosts ENTERPRISE_DLQ which is advertised to the Cluster
On QM3 setup the following:
QM3.QM1 - sdr to QM1, ql(QM1) with usage xmitq, qr(qr.not_exist) rqmname(not_exist) rname(not_exist) xmitq(qm1), setup QM1 to trigger-start QM3.QM1 when a msg arrives on QM1
On QM1:
QM3.QM1 - rcvr chl, ql(local_dlq), ql(qa.enterise_dlq), qr(qr.enterprise.dlq)
Test 1:
Set deadq on QM1 to ENTERPRISE_DLQ, write a msg to QR.NOT_EXIST on QM3
Result: Msg stays put on QM1, QM3.QM1 is RETRYING, QM1 error logs complain about not being able to MQOPEN the Q - ENTERPRISE_DLQ!!
ql(qm1) curdepth(1)
Test 2:
Set deadq on QM1 to qr.enterprise.dlq, write a msg to QR.NOT_EXIST on QM3
Result: Msg stays put on QM1, QM3.QM1 is RETRYING, QM1 error logs complain about not being able to MQOPEN the Q - qr.enterprise.dlq (all caps)!!
ql(qm1) curdepth(2)
Test 3:
Set deadq on QM1 to qa.enterise_dlq, write a msg to QR.NOT_EXIST on QM3
Result: Msg stays put on QM1, QM3.QM1 is RETRYING, QM1 error logs complain about not being able to MQOPEN the Q - qa.enterise_dlq (all caps)!!
ql(qm1) curdepth(3)
Test 4:
Set deadq on QM1 to local_dlq, write a msg to QR.NOT_EXIST on QM3
Result: Msg stays put on QM1, QM3.QM1 is RUNNING, all msgs on QM3 ql(QM1) make it to local_dlq on QM3.
ql(qm1) curdepth(0)
Now the question: it looks like the DLQ on a QM must be a local queue. Is this a correct conclusion? If not, how can I make all the DLQ msgs go to a single Q - the ENTERPRISE_DLQ above?
One obvious solution is to define a trigger on local_dlq on QM3 (and do the same on the other QMs) which will read the msg and write it to the cluster Q - ENTERPRISE_DLQ. But this involves additional moving parts - a trigger and a trigger monitor on each QM. It would be most desirable to be able to configure a cluster Q/QRemote/QAlias as the DLQ on the QM. Thoughts/ideas???
Thanks
-Ravi
Per the documentation here:
A dead-letter queue has no special requirements except that:
It must be a local queue
Its MAXMSGL (maximum message length) attribute must enable the queue to accommodate the largest messages that the queue manager has
to handle plus the size of the dead-letter header (MQDLH)
The DLQ provides a means for a QMgr to handle messages that a channel was unable to deliver. If the DLQ were not local then the error handling for channels would itself be dependent on channels. This would present something of an architectural design flaw.
The prescribed way to do what you require is to trigger a job to forward the messages to the remote queue. This way, whenever a message hits the DLQ, the triggered job fires up and forwards the messages. If you didn't want to write such a program, you could easily use a bit of shell or Perl code and the Q program from SupportPac MA01. It would be advisable to set the channels used to send such messages off the QMgr so that they do not themselves use the DLQ. Ideally, these would exist in a dedicated cluster so that DLQ traffic did not mix with application traffic.
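As a rough MQSC sketch of that triggering approach (the DLQ name, process name and forwarding script are illustrative, and a trigger monitor such as runmqtrm must be running against the initiation queue):

* fire a process when the first message lands on the local DLQ
DEFINE PROCESS(DLQ.FORWARD) APPLTYPE(UNIX) +
       APPLICID('/usr/local/bin/forward_dlq.sh')
ALTER QLOCAL(LOCAL_DLQ) TRIGGER TRIGTYPE(FIRST) +
      PROCESS(DLQ.FORWARD) INITQ(SYSTEM.DEFAULT.INITIATION.QUEUE)

The forwarding script itself would do the gets from LOCAL_DLQ and the puts to the ENTERPRISE_DLQ, or simply invoke the MA01 Q program to do the copy.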
Also, be aware that one of the functions of the DLQ is to move messages out of the XMitQ if a conversion error prevents them from being sent. Forwarding them to a central location would have the effect of putting them back onto the cluster XMitQ. Similarly, if the destination filled up, these messages would also sit on the sending QMgr's cluster XMitQ. If they built up there in sufficient numbers, a full cluster XMitQ would prevent all cluster channels from working. In that event you'd need some kind of tooling to let you selectively delete or move messages out of the cluster XMitQ, which would be a bit challenging.
With all that in mind, the requirement would seem to present more challenges than it solves. Recommendation: error handling for channels is best handled without further use of channels - i.e. locally.