How to resolve this channel issue from WMQ?

Below is the related part from a QMGR log file about a WMQ channel issue:
-------------------------------------------------------------------------------
2012-7-23 10:35:25 - Process(340.1) User(MUSR_MQADMIN) Program(runmqchl.exe)
AMQ9206: Error sending data to host 86.0.223.5(1602).
EXPLANATION:
An error occurred sending data over TCP/IP to 86.0.223.5(1602). This may be due to
a communications failure.
ACTION:
The return code from the TCP/IP(send) call was 10054 X('2746'). Record these
values and tell your systems administrator.
----- amqccita.c : 2612 -------------------------------------------------------
2012-7-23 10:35:25 - Process(340.1) User(MUSR_MQADMIN) Program(runmqchl.exe)
AMQ9999: Channel program ended abnormally.
EXPLANATION:
Channel program 'CZWJNS.CZWJCZ' ended abnormally.
ACTION:
Look at previous error messages for channel program 'CZWJNS.CZWJCZ' in the
error files to determine the cause of the failure.
----- amqrccca.c : 834 --------------------------------------------------------
2012-7-23 10:35:35 - Process(3616.1) User(MUSR_MQADMIN) Program(runmqchl.exe)
AMQ9002: Channel 'CZWJNS.CZWJCZ' is starting.
EXPLANATION:
Channel 'CZWJNS.CZWJCZ' is starting.
ACTION:
None.
-------------------------------------------------------------------------------
2012-7-23 10:40:35 - Process(3616.1) User(MUSR_MQADMIN) Program(runmqchl.exe)
AMQ9206: Error sending data to host 86.0.223.5(1602).
EXPLANATION:
An error occurred sending data over TCP/IP to 86.0.223.5(1602). This may be due to
a communications failure.
ACTION:
The return code from the TCP/IP(send) call was 10054 X('2746'). Record these
values and tell your systems administrator.
----- amqccita.c : 2612 -------------------------------------------------------
2012-7-23 10:40:35 - Process(3616.1) User(MUSR_MQADMIN) Program(runmqchl.exe)
AMQ9999: Channel program ended abnormally.
EXPLANATION:
Channel program 'CZWJNS.CZWJCZ' ended abnormally.
ACTION:
Look at previous error messages for channel program 'CZWJNS.CZWJCZ' in the
error files to determine the cause of the failure.
----- amqrccca.c : 834 --------------------------------------------------------
2012-7-23 10:40:45 - Process(4848.1) User(MUSR_MQADMIN) Program(runmqchl.exe)
AMQ9002: Channel 'CZWJNS.CZWJCZ' is starting.
EXPLANATION:
Channel 'CZWJNS.CZWJCZ' is starting.
ACTION:
None.
-------------------------------------------------------------------------------
Right now, the target channel (CZWJNS.CZWJCZ) does eventually run, but only after a few retry attempts, and this happens often. All the messages are delivered successfully to the target queue on the remote QMGR host. However, they are always delayed by the repeated retry attempts.
I searched the internet for return code 10054, and it means the connection was reset by the peer.
My WMQ version is 6.0.10 on Windows 2003.

"Connection reset by peer" means that something between this node and the other node closed the connection. The cause can range from a dodgy or noisy network, to a firewall timing out the connection, to channel exits that refuse the connection, among many others.
The key to diagnosis in these cases is to narrow down the cause. This requires looking at the error logs on both QMgrs (or the client and the QMgr) for the same event. In the case of a channel exit, a look at the channel definitions on both sides reveals whether such an exit is in place. If it is, you need to look at the exit's configuration and logs as well.
If the problem is in the network, the error logs from both QMgrs will show similar errors. However, if one QMgr closed the connection intentionally, you will see that in its log files.
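To make the side-by-side comparison less tedious, here is a minimal Python sketch that splits AMQERR01.LOG-style text into entries and pulls out the message ID and the TCP/IP return code. The function name and the regular expressions are my own assumptions, based only on the log excerpt in the question; they are not part of any IBM tooling, and other MQ versions lay out the entries slightly differently.

```python
import re
from typing import Dict, List, Optional

# Entries are separated by lines that start with a run of dashes,
# e.g. "----- amqccita.c : 2612 -----...".
ENTRY_SEPARATOR = re.compile(r"^-{5,}.*$", re.MULTILINE)
# Message IDs look like "AMQ9206:" at the start of a line.
MESSAGE_ID = re.compile(r"^(AMQ\d{4}):", re.MULTILINE)
# Matches "return code from the TCP/IP(send) call was 10054" and
# "return code from TCP/IP is 10060".
RETURN_CODE = re.compile(r"return code from.*?(?:was|is)\s+(\d+)", re.DOTALL)


def parse_error_log(text: str) -> List[Dict[str, Optional[str]]]:
    """Split error-log text into entries and keep the fields worth comparing."""
    entries = []
    for chunk in ENTRY_SEPARATOR.split(text):
        chunk = chunk.strip()
        id_match = MESSAGE_ID.search(chunk)
        if not id_match:
            continue  # empty chunk or text that is not a log entry
        code_match = RETURN_CODE.search(chunk)
        entries.append({
            "header": chunk.splitlines()[0],  # timestamp, process, user, program
            "message_id": id_match.group(1),
            "return_code": code_match.group(1) if code_match else None,
        })
    return entries
```

Run it over the logs from both QMgrs and line the entries up by timestamp: matching AMQ9206/AMQ9208 entries on both sides point at the network, while an entry on one side explaining why the channel was ended points at that QMgr.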

Related

MQ queue manager error logs showing "The error code returned was '701'." on MQ V9.2.0.0

One of our MQ queue managers has the errors below in its error logs, and because of them an SSL channel is stuck in the retrying state. Does anyone know about this kind of error?
An error indicating a software problem was returned from a function which is
used to provide SSL or TLS support. The error code returned was '701'. The
function call was 'ccigsk_attrib_set_enum - GSK_TRUNCATE_PEER_CERTCHAIN'.
The channel is XXX; in some cases its name cannot be
determined and so is shown as '????'. The channel did not start.

IBM MQ client 7.5 MQRC_HOST_NOT_AVAILABLE

We tried to test the connection to the remote queue manager after installing MQ client v7.5 on Windows Server 2019. We used Rfhutilc for this and got 'Host not available', in spite of the fact that a telnet connection to the corresponding address was established successfully. We also tried to connect using MQ client v9.0, with the same result.
AMQERR01.LOG (client v.7.5) reported following details:
29.09.2020 15:36:10 - Process(10828.2) User(Администратор) Program(rfhutilc.exe)
Host(-) Installation(Installation1)
VRMF(7.5.0.6)
AMQ9208: Error on receive from host 'X.X.X.X'.
EXPLANATION: An error occurred receiving data from 'X.X.X.X' over TCP/IP. This may be due to a communications failure.
ACTION: The return code from the TCP/IP recv() call was 10054 (X'2746'). Record these values and tell the systems administrator.
----- amqccita.c : 4065 -------------------------------------------------------
29.09.2020 15:37:56 - Process(10828.1) User(Администратор) Program(rfhutilc.exe)
Host(-) Installation(Installation1)
VRMF(7.5.0.6)
AMQ9202: Remote host 'X.X.X.X' not available, retry later.
EXPLANATION: The attempt to allocate a conversation using TCP/IP to host 'X.X.X.X' was not successful. However the error may be a transitory one and it may be possible to successfully allocate a TCP/IP conversation later.
ACTION: Try the connection again later. If the failure persists, record the error values and contact your systems administrator. The return code from TCP/IP is 10060 (X'274C'). The reason for the failure may be that this host cannot reach the destination host. It may also be possible that the listening program at host 'X.X.X.X' was not running. If this is the case, perform the relevant operations to start the TCP/IP listening program, and try again.
Here is an example of what the traffic data looks like when Rfhutilc fails to connect to the queue.
Since the picture suggested a code page issue, we tried setting the MQCCSID environment variable to 1208, and it helped.
A connection attempt via Rfhutilc also succeeded when running under another user with the login "admin", even without setting the MQCCSID variable.
But I failed to find an explanation for this. What did the CCSID of the MQ client differ from: the system code page, or something else? And how can I find out the default CCSID of the MQ client?
MQ client v7.5 worked just fine on the Windows Server 2012 R2 right after installing. Rfhutilc v7.5 was used both on Server 2012 and Server 2019 for testing.

WebSphereMQ + Centos 7

I am trying to install WebSphere MQ v8 on CentOS 7. I had no problem installing the server and testing it. When I configure the client, everything is OK until I try to put a message on the queue with the following command: ./amqsputc queue manager. Here is part of the log file.
-------------------------------------------------------------------------------
08/01/15 13:16:17 - Process(37991.4) User(mqm) Program(amqrmppa)
Host(localhost.localdomain) Installation(Installation1)
VRMF(8.0.0.0) QMgr(my.manager)
AMQ9776: Channel was blocked by userid
EXPLANATION: The inbound channel 'CANAL1' was blocked from address
'127.0.0.1' because the active values of the channel were mapped to a
userid which should be blocked. The active values of the channel were
'MCAUSER(mqm) CLNTUSER(mqm) ADDRESS(localhost)'. ACTION: Contact the
systems administrator, who should examine the channel authentication
records to ensure that the correct settings have been configured. The
ALTER QMGR CHLAUTH switch is used to control whether channel
authentication records are used. The command DISPLAY CHLAUTH can be
used to query the channel authentication records.
----- cmqxrmsa.c : 1257 -------------------------------------------------------
08/01/15 13:16:17 - Process(37991.4) User(mqm) Program(amqrmppa)
Host(localhost.localdomain) Installation(Installation1)
VRMF(8.0.0.0) QMgr(my.manager)
AMQ9999: Channel 'CANAL1' to host '127.0.0.1' ended abnormally.
EXPLANATION: The channel program running under process ID 37991 for
channel 'CANAL1' ended abnormally. The host name is '127.0.0.1'; in
some cases the host name cannot be determined and so is shown as
'????'. ACTION: Look at previous error messages for the channel
program in the error logs to determine the cause of the failure. Note
that this message can be excluded completely or suppressed by tuning
the "ExcludeMessage" or "SuppressMessage" attributes under the
"QMErrorLog" stanza in qm.ini. Further information can be found in the
System Administration Guide.
----- amqrmrsa.c : 925 --------------------------------------------------------
I appreciate all the help you can give me, thanks in advance.

It is very easy to troubleshoot why you have been blocked by a CHLAUTH rule. There is a blog post on it: "I'm being blocked by CHLAUTH - how can I work out why?"
However, I can tell you from here exactly which rule is blocking you: it is the default rule that bans remote privileged access, i.e. mqm access from client connections. If you want to have access without being privileged, read "A non-privileged MQ administrator". Alternatively, if you do want to allow the risky remote access from privileged users, read "CHLAUTH - Allow some privileged admins".

It's worth noting MQ v8 doesn't support CentOS at all, and hasn't declared support for RHEL 7 either.
http://www-969.ibm.com/software/reports/compatibility/clarity-reports/report/html/softwareReqsForProduct?deliverableId=1350550241693&osPlatform=Linux
At time of writing MQ v8 supports the following Linux distros:
Asianux 3.0
RHEL 6
SLES 11
Ubuntu 12.04
That said, the error 'AMQ9776: Channel was blocked by userid' you pasted above shows that your client is failing the channel authentication checks.
You can check this by disabling channel authentication via the following MQSC command:
'ALTER QMGR CHLAUTH(DISABLED)'
There's a good article on developerWorks that explains how to work out why your connection attempt was blocked:
https://www.ibm.com/developerworks/community/blogs/aimsupport/entry/blocked_by_chlauth_why?lang=en
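If you would rather find the matching rule than switch CHLAUTH off, the MQSC command DISPLAY CHLAUTH with MATCH(RUNCHECK) reports which record a given inbound connection would hit. As a rough sketch, the Python below pulls the channel name, address, and client user out of an AMQ9776 entry and builds that command for pasting into runmqsc. The function name and regular expressions are hypothetical and based only on the message text quoted in the question; check the exact DISPLAY CHLAUTH syntax against the documentation for your MQ version.

```python
import re
from typing import Optional

# "The inbound channel 'CANAL1' was blocked from address '127.0.0.1' ..."
# The log wraps lines, so whitespace between words is matched loosely.
BLOCKED = re.compile(
    r"inbound\s+channel\s+'([^']+)'\s+was\s+blocked\s+from\s+address\s+'([^']+)'"
)
# "... 'MCAUSER(mqm) CLNTUSER(mqm) ADDRESS(localhost)'"
CLIENT_USER = re.compile(r"CLNTUSER\(([^)]*)\)")


def runcheck_command(amq9776_text: str) -> Optional[str]:
    """Build a DISPLAY CHLAUTH ... MATCH(RUNCHECK) command from an AMQ9776 entry."""
    blocked = BLOCKED.search(amq9776_text)
    user = CLIENT_USER.search(amq9776_text)
    if not blocked or not user:
        return None  # not an AMQ9776 entry in the expected layout
    channel, address = blocked.groups()
    # Quoting the channel name keeps MQSC from folding it to upper case.
    return (
        f"DISPLAY CHLAUTH('{channel}') MATCH(RUNCHECK) "
        f"ADDRESS('{address}') CLNTUSER('{user.group(1)}')"
    )
```

For the log entry in the question this produces a command for channel CANAL1, address 127.0.0.1, and client user mqm.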

AMQ9999 occurring in AMQERR01.LOG

I have the following error showing up in AMQERR01.LOG
AMQ9999: Channel 'MGATESrvChannel' to host 'Mgate (127.0.0.1)' ended
abnormally.
EXPLANATION:
The channel program running under process ID 1060(4364) for channel
'MGATESrvChannel' ended abnormally. The host name is 'Mgate (127.0.0.1)'; in
some cases the host name cannot be determined and so is shown as '????'.
This error is preceded with following message:
AMQ9508: Program cannot connect to the queue manager.
EXPLANATION:
The connection attempt to queue manager 'MGATE.QM' failed with reason code
2059.
ACTION:
Ensure that the queue manager is available and operational.
According to what I have been told, this can be caused by an application that is using the queue manager. However, it seems to me that it has more to do with the way the queue manager was set up, or something similar. Can anyone please shed some light on this?
Thanks in advance!

The 2059 says that a connection request was received and refused because the QMgr was not available. We used to see this a lot when the listener was run as a separate process or when inetd was used to start channels. This is because the listener was there to accept the connection but the QMgr processes were not.
Now that the listener is run as a child process of the QMgr, it is quite rare to see this in the WMQ error logs, though clients commonly see it. This is because, when the listener is run as a child process of the QMgr and the QMgr is down, there is nothing listening to receive the connection request, so it bounces off the host's IP stack before ever getting to MQ code.
The AMQ9999 message says that a channel program, one of the QMgr's child processes, died or was killed and this caused the channel to terminate. There are many reasons for a channel process to die, including being killed by the OS if resources are short, or being killed by a human operator. Other than that, the most common cause is that the channel runs in trusted (fastpath) mode and the attached program corrupts it.
It would help to narrow down the field to know the details of the QMgr in question - version and fix pack, how the listeners are started, channel settings, etc.

Start your listener. You may also want to check the Control property of that listener, so that it starts automatically when the queue manager restarts.

keepalive timeout on unix/windows

What error is returned on AIX/Linux when a connection breaks down due to keepalive activity? Is it a unique error code that can be distinguished from other socket errors?
On Windows this can be either WSAECONNRESET or WSAENETRESET.
Is there a way to differentiate the error due to keepalive activity when WSAECONNRESET is returned?
WSAECONNRESET
10054
Connection reset by peer.
An existing connection was forcibly closed by the remote host. This normally results if the peer application on the remote host is suddenly stopped, the host is rebooted, the host or remote network interface is disabled, or the remote host uses a hard close (see setsockopt for more information on the SO_LINGER option on the remote socket). This error may also result if a connection was broken due to keep-alive activity detecting a failure while one or more operations are in progress. Operations that were in progress fail with WSAENETRESET. Subsequent operations fail with WSAECONNRESET.

Is there a way to differentiate the error due to keepalive activity when WSAECONNRESET is returned?
No. The underlying condition is a 'connection reset' in all cases.
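To illustrate, all the caller ever gets is the numeric code, and the same number covers every cause. Below is a minimal Python sketch of a lookup for the Winsock codes mentioned on this page, printing them in the decimal-plus-hex form the MQ error logs use. The table and function name are my own for illustration; the names and texts follow the Windows Sockets error code list.

```python
# Winsock codes seen in the logs on this page, plus WSAENETRESET from the
# keepalive question. A single code can have many underlying causes.
WINSOCK_CODES = {
    10052: ("WSAENETRESET", "Network dropped connection on reset"),
    10054: ("WSAECONNRESET", "Connection reset by peer"),
    10060: ("WSAETIMEDOUT", "Connection timed out"),
}


def describe_winsock_code(code: int) -> str:
    """Render a code as decimal plus hex, followed by its Winsock name and text."""
    name, text = WINSOCK_CODES.get(code, ("UNKNOWN", "not in this table"))
    return f"{code} (X'{code:X}') {name}: {text}"


print(describe_winsock_code(10054))
print(describe_winsock_code(10060))
```

Nothing in the value 10054 records whether keepalive, a firewall, or a hard close on the remote side triggered the reset, so telling them apart takes evidence from outside the socket API, such as logs from the peer or a network trace.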
