MQ process amqrmppa get defunct status in Linux - ibm-mq

I've got a problem with the amqrmppa process showing defunct status. Details below:
mqm 2055 2912 0 Sep01 ? 00:00:31 [amqrmppa] <defunct>
mqm 2524 2912 0 Sep02 ? 00:00:23 [amqrmppa] <defunct>
mqm 2570 2912 0 Sep03 ? 00:00:21 [amqrmppa] <defunct>
mqm 4754 2912 0 Sep03 ? 00:00:19 [amqrmppa] <defunct>
mqm 5628 2912 0 Sep02 ? 00:00:23 [amqrmppa] <defunct>
I checked the error log file but found no clues there. Can you help me figure out what is going on and how to handle it?
Thanks
WebSphere MQ for Linux (x86-64 platform)
7.0.1.5
Linux 2.6.32.12-0.7-default

Those are the channel pooling processes. There are several possible reasons for this state, the most common being that channels were preemptively killed or that the parent process for these was killed. Less common causes include improperly coded channel exits or the occasional bug. (Yes, amqrmppa shows up in the APAR list from time to time.) It is really difficult to give a more specific answer without seeing the channel configurations. If you cannot post them for security reasons, you'll need to open a PMR and let IBM look at them. Specifically, I'd be looking to see whether the channels run in FASTPATH or trusted mode, whether exits are defined, etc. I'd also look at the version and fix pack level, then check the APAR list for later fixes to see if any address the problem. If these do not point to potential issues, then it's time to run through the MustGather procedure and open a PMR.
For what it's worth, it is usually a good idea to go to the MustGather page and drill down by platform and problem category to get some really helpful diagnostic advice.
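A defunct entry stays in the process table until its parent collects its exit status, so a useful first check is which parent the defunct processes share. The following is a minimal sketch, assuming the `ps -ef` column layout shown in the question (UID, PID, PPID, ...); the function name `defunct_by_parent` is hypothetical and the sample text is copied from the question.

```python
from collections import defaultdict

# Sample `ps -ef` lines copied from the question above.
PS_OUTPUT = """\
mqm 2055 2912 0 Sep01 ? 00:00:31 [amqrmppa] <defunct>
mqm 2524 2912 0 Sep02 ? 00:00:23 [amqrmppa] <defunct>
mqm 2570 2912 0 Sep03 ? 00:00:21 [amqrmppa] <defunct>
mqm 4754 2912 0 Sep03 ? 00:00:19 [amqrmppa] <defunct>
mqm 5628 2912 0 Sep02 ? 00:00:23 [amqrmppa] <defunct>
"""


def defunct_by_parent(ps_text):
    """Group the PIDs of <defunct> processes by their parent PID (PPID)."""
    groups = defaultdict(list)
    for line in ps_text.splitlines():
        fields = line.split()
        # Assumed `ps -ef` columns: UID PID PPID C STIME TTY TIME CMD...
        if len(fields) < 8 or fields[-1] != "<defunct>":
            continue
        pid, ppid = int(fields[1]), int(fields[2])
        groups[ppid].append(pid)
    return dict(groups)


if __name__ == "__main__":
    for ppid, pids in defunct_by_parent(PS_OUTPUT).items():
        print(f"parent {ppid} has {len(pids)} defunct children: {pids}")
```

In the question's output all five entries share parent 2912, so that is the process whose identity and state are worth including in any diagnostics sent to IBM.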

Related

Sphero 2 core api commands get timeout responses

Although I can command Sleep and Ping, other Core DID 0 commands do not work for me.
For example CID 40h Diagnostic Level 1 gets an MRSP of 35h.
Am I missing something?
Thanks
Error 35h means the state machine that processes the API commands timed out. Double-check the packet you sent; it probably starts fine but then lacks one or more bytes.
Dan
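One way to check for missing bytes is to build the packet in code and compare its length and checksum with what you actually send. This is only a sketch under an assumed layout, which you should verify against the Sphero API document: SOP1 (FFh), SOP2 (FFh), DID, CID, SEQ, DLEN, optional data, CHK, where DLEN counts the data bytes plus the checksum byte and CHK is the modulo-256 sum of DID through the last data byte, bit-inverted. The function names are hypothetical.

```python
def build_packet(did, cid, seq, data=b"", sop2=0xFF):
    """Build a command packet using the layout assumed in the text above."""
    dlen = len(data) + 1  # assumed: DLEN = data bytes + 1 checksum byte
    body = bytes([did, cid, seq, dlen]) + bytes(data)
    chk = (~sum(body)) & 0xFF  # modulo-256 sum of DID..data, bit-inverted
    return bytes([0xFF, sop2]) + body + bytes([chk])


def looks_complete(packet):
    """Return True if the length and checksum agree with the assumed layout."""
    if len(packet) < 7 or packet[0] != 0xFF:
        return False
    dlen = packet[5]
    if len(packet) != 6 + dlen:  # 6 header bytes, then DLEN bytes
        return False
    return (~sum(packet[2:-1])) & 0xFF == packet[-1]


if __name__ == "__main__":
    # DID 00h, CID 40h (the Level 1 diagnostics command from the question).
    pkt = build_packet(0x00, 0x40, seq=0x01)
    print(pkt.hex(" "), looks_complete(pkt))
    # Dropping the last byte simulates a short packet.
    print(looks_complete(pkt[:-1]))
```

If a packet you send by hand fails `looks_complete`, a short or miscounted packet is a plausible cause of the timeout described above.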

The Cluster refresh solution

Update: We are using an AIX environment.
We have been facing some random issues with our queues (cluster queues), like:
2189 Cluster resolution error (Most frequent one)
2270 MQRC_NO_DESTINATIONS_AVAILABLE
2053 Queue full error (the weirdest one): posting a single message succeeds, but when we post 3-4 messages, this error is thrown for the rest of the messages.
All these issues get resolved once we do a cluster refresh. But, I want to know the root cause, why we get these errors. What goes wrong?
How does a cluster refresh resolve these errors?
It could be a socket issue. You can monitor sockets with the tools for your OS. On Windows, for example, you can run:
netstat -a -b -o >/newfile.txt
You could also use TCPView on Windows (a single exe from Microsoft Sysinternals): http://technet.microsoft.com/en-us/sysinternals/bb897437.aspx. In fact, all the Sysinternals tools should be on your production box if it runs Windows.
For sockets on Linux/Unix there are other tools; some are just ls commands into RAM, depending on the version. A web search may help.
Also, if you are on Windows, consider moving some things to Linux. It will be painful in the beginning but will get better.
If this did not help, you should add your environment and any other details to your question. If you can, get JProfiler into production and use it when the issue happens.
At the very least you can do a jstack and jmap.
What are the names and versions of your OS, Java, and WebSphere?
If it is a socket issue, you can try increasing the number of sockets (registry) and then profiling your code to see what is opening too many sockets and what needs to be throttled or rewritten.
Remember that every page, every DB connection, every external cache hit (if you use one), and any other URL work or remote connection is usually a socket.

how to back trace in android native code

My APK is receiving a SIGSTOP at epoll_wait. I am unable to figure out which part of the code is calling this. I know this is a part of libc. It tells me the line number for this epoll_wait. When I tried the addr2line tool, I did not have much luck.
./toolchains/arm-linux-androideabi-4.4.3/prebuilt/linux-x86/bin/arm-linux-androideabi-addr2line -C -f -e ~/SoxPlayer/obj/local/armeabi/libc.so 0xafd0c75c
??
??:0
This is what I get ..
I want to know what part in my code is forking the thread 1 which calls the epoll_wait.
Thread [1] (Suspended: Signal 'SIGSTOP' received. Description: Stopped (signal).)
3 epoll_wait() 0xafd0c75c
2 <symbol is not available> 0xa8125706
1 <symbol is not available> 0xa8125706
Not much information here.
gdb will also provide the same information. The ndk-stack tool won't help, as I don't have the kind of SIGSEGV fault or app crash that such a tool needs. My APK is waiting for a signal or event in my code. What tool can I use to find out what is calling epoll_wait in my code? I am able to see what is happening in my code but not able to pinpoint where exactly this epoll_wait gets called. I need advice about a tool I can use to figure this out.
Thanks

Handling repeated events in a log

I have a logging system where some events are repeated indefinitely. For example:
12:03 - Restart attempted
12:03 - Restart failed
12:02 - Restart attempted
12:02 - Restart failed
12:01 - Restart attempted
12:01 - Restart failed
This might go on for days. I imagine there are standard ways that systems deal with spammy events like this.
What are the common ways logging systems deal with these kinds of events without flooding the log system?
One approach would be to coalesce matching entries that repeat within some time delta of each other, something like
12:03 - Restart attempted [3 times since 12:01]
12:03 - Restart failed [3 times since 12:01]
12:02 - Something
11:23 - Restart attempted [17 times since 11:21]
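The coalescing idea above can be sketched as follows. This is one possible implementation, not how any particular logging system does it: the function names `coalesce` and `render` and the five-minute default window are assumptions chosen for illustration. Entries with the same message are merged into a run as long as each repeat arrives within the window of the previous one.

```python
from datetime import datetime, timedelta


def coalesce(events, window=timedelta(minutes=5)):
    """Merge repeats of the same message that arrive within `window` of each other.

    `events` is an iterable of (datetime, message) pairs in chronological order.
    Returns one dict per run with keys: message, first, last, count.
    """
    runs = []        # every run, in order of first occurrence
    latest = {}      # message -> most recent run for that message
    for when, message in events:
        run = latest.get(message)
        if run is not None and when - run["last"] <= window:
            # Same message seen again soon enough: extend the existing run.
            run["last"] = when
            run["count"] += 1
        else:
            # First sighting, or the gap was too long: start a new run.
            run = {"message": message, "first": when, "last": when, "count": 1}
            runs.append(run)
            latest[message] = run
    return runs


def render(run):
    """Format a run in the style of the example above."""
    line = f"{run['last']:%H:%M} - {run['message']}"
    if run["count"] > 1:
        line += f" [{run['count']} times since {run['first']:%H:%M}]"
    return line


if __name__ == "__main__":
    day = datetime(2024, 1, 1)
    events = []
    for minute in (1, 2, 3):
        stamp = day.replace(hour=12, minute=minute)
        events.append((stamp, "Restart attempted"))
        events.append((stamp, "Restart failed"))
    for run in coalesce(events):
        print(render(run))
```

Tracking the latest run per message, rather than only comparing adjacent lines, lets interleaved pairs such as "attempted" and "failed" each collapse into a single entry.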
They typically either compel you to fix the problem, or have flags to suppress them. And I think properly so.
Error logs are typically chronically underadministered and undermonitored anyway. If it's an application I'm in any way involved with, I'd as soon see as much flag-waving as possible if it gets someone's attention.

Oracle listener state blocked

I have a web app which works fine under a light load. However, when we run a lot of threads, each with its own database connection, we start getting the error
ORA-12519: TNS:no appropriate service handler found
After looking online I found that running lsnrctl services was a good diagnostic step, so I did that. The result for our service was
Service "orcl" has 1 instance(s).
Instance "orcl", status READY, has 1 handler(s) for this service...
Handler(s):
"DEDICATED" established:130 refused:0 state:blocked
LOCAL SERVER
The number of established connections is consistent with the number of threads. However, the state:blocked seems like a cause and/or symptom of this problem.
So what's my next step? The max number of open sessions is 1024, which is more than enough, and there's no limit to the number of sessions per user. I ran this test after a reboot of the machine, and no other programs were connected. I'm really not sure what to try next, so any help will be greatly appreciated.
EDIT: Upping the processes and sessions parameters seemed to do the trick. In addition to finding Matthew's suggestion helpful, this email described my problem perfectly.
Have you checked your alert log? It should tell you what is going wrong if Oracle is running out of resources. It sounds like you may be out of processes.
Run this in sqlplus:
SQL> show parameter processes
It will show you how many processes Oracle will allow. You may need to increase this a bit.
If you have a metalink account, then check article 240710.1 for more details.

Resources