I see there are hung threads in the WebSphere Application Server. How can I troubleshoot this problem? What documentation should I send to the application developer?
Thanks.
The most important thing is the thread stack - that should show up with the message indicating the hung thread, and it'll tell you what that thread was doing.
That, on its own, might not be enough, particularly if that thread is waiting on some other thread. In that case, you might need a thread dump. On non-Windows systems, that can be triggered with "kill -3" against the process ID (I'd have to do more research to tell you the equivalent on Windows, although there are tools that can simulate "kill -3"). The server can also be configured to take a thread dump when it detects a hung thread, using the JVM system property com.ibm.websphere.threadmonitor.dump.java (set to either "true" or an integer value representing the maximum number of thread dumps you want).
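If it helps, on an IBM J9-based JDK (such as the one bundled with WebSphere) a javacore can also be triggered programmatically via the com.ibm.jvm.Dump class - that class is IBM-specific, not part of standard Java, so treat this as a sketch that only compiles on IBM JDKs:

// Sketch: trigger a javacore from inside the JVM.
// Assumes an IBM J9-based JDK; com.ibm.jvm.Dump is IBM-specific.
import com.ibm.jvm.Dump;

public class TriggerJavacore {
    public static void main(String[] args) {
        // Equivalent to sending "kill -3" to the process: writes a
        // javacore...txt file to the JVM's working directory.
        Dump.JavaDump();
    }
}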
The thread dump will go to a file called "javacore...txt" (the "..." will be a long string representing stuff like the timestamp), except on Solaris, where it will go to the server's native_stdout.log. The javacore has a lot more than just thread stacks, so you can search for "Thread Details" to find that section quickly. You'll need to search on the thread name/stack from the server log to figure out which thread is the hung one and go from there.
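If you end up with a pile of javacores, a small helper that pulls out just the named thread can save time. A rough sketch in Java; it assumes the usual 3XMTHREADINFO tag that prefixes each thread entry in the "Thread Details" section of a J9 javacore, and the class name and arguments are made up:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class FindHungThread {
    public static void main(String[] args) throws IOException {
        Path javacore = Paths.get(args[0]); // e.g. a javacore...txt file
        String threadName = args[1];        // name from the hung-thread warning in the server log
        boolean inTarget = false;
        for (String line : Files.readAllLines(javacore)) {
            // Each thread entry starts with a line tagged 3XMTHREADINFO
            // (an assumption about the format; check your javacore if it differs).
            if (line.startsWith("3XMTHREADINFO ")) {
                inTarget = line.contains(threadName);
            }
            if (inTarget) {
                System.out.println(line);   // the thread's info and stack lines
            }
        }
    }
}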
If you are experiencing performance, hang, or high CPU issues with WebSphere Application Server, there is a procedure documented by the IBM support team to collect the data necessary to diagnose and resolve these kinds of issues. The procedure essentially consists of:
enabling Application Server verboseGC
running a script, at the time of the problem, which collects 3 javacores for the problematic JVM
At the end of this procedure, you need to collect:
*.tar.gz file generated by the script
javacores generated by the script
server logs (SystemOut.log, native_stderr.log,...)
and send the results to IBM Support.
To get the script and for additional information about this procedure, I suggest taking a look at the following articles:
WebSphere MustGather procedure on Linux
WebSphere MustGather procedure on Windows
A similar document also exists for the AIX platform.
I am looking for an explanation of the differences between methods of generating thread and heap dumps.
What I know so far:
a system signal, e.g. kill -3, triggers instant creation of both (thread dump and heap dump)
the script shipped with Liberty runs a Java agent which does the magic and generates customizable output: a thread dump alone, or together with a heap dump or a core dump (or both)
server javadump myserver --include=thread,heap,system
https://www.ibm.com/support/knowledgecenter/SSEQTP_liberty/com.ibm.websphere.wlp.doc/ae/rwlp_command_server.html
..so my questions are:
which is better and why?
is there any difference in the generated dumps?
which one would you use to provide an exposed, automated way of creating dumps (e.g. for developers)?
does anyone have experience with my previous point? I would highly appreciate your pro tips
..and also anything else you might consider worth mentioning here.
PS
What I've noticed: if I send the system signal multiple times in a row, nothing hangs and the number of generated dumps equals the number of attempts made. The same happens with the script-based solution (of course it takes longer).
..but if I do kill -3 <PID>; server javadump myserver --include=thread,heap then the server hangs and the dumps are not generated - this state is unrecoverable without a restart. (I've not spent much time on this behaviour, so it could just be a failure unrelated to the commands performed.)
Thank you and best regards!
I have a .NET application which spawns multiple child 'worker processes'. I am using the Windows Job Object API and the JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE setting to ensure the child processes always get killed if the parent process is terminated.
However, I have observed a number of orphaned processes still running on the machine after the parent has been closed. Using Process Explorer, I can see they are correctly still assigned to the Job, and that the Job has the correct 'Kill on Job Close' setting configured.
The documentation for JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE states:
"Causes all processes associated with the job to terminate when the last handle to the job is closed."
This would seem to imply that a handle to the Job was still open somewhere... I did a search for handles to my Job object, and found instances of WmiPrvSE.exe in the results. If I kill the relevant WmiPrvSE.exe process, the outstanding handle to Job is apparently closed, and all the orphaned application processes get terminated as expected.
How come WmiPrvSE.exe has a handle to my Job?
You may find this blog helpful in sorting out what WmiPrvSE is doing.
WmiPrvSE is the WMI Provider host. That means it hosts WMI providers, which are DLLs. So it's almost surely the case that WmiPrvSE doesn't have a handle to your job, but one of the providers it hosts does. In order to figure out which provider is the culprit, one way is to follow the process here and then see which of the separate processes holds the handle.
Once you have determined which provider is holding the handle, you can either try to deduce, based on what system components the provider manages, what kind of query would have a handle to your Job. Or you can just disable the provider, if you don't mind losing access to management of the components it provides.
If you can determine what kind of query would be holding a handle, you may be able to deduce what program is issuing the query. Or maybe the eventlog can tell you that (first link above).
To get more help please provide additional details in the OP, such as which providers are running in WmiPrvSE, any relevant eventlog events, and any other diagnostics info you obtain.
EDIT 1/27/16
An approach to finding out what caused WmiPrvSE to obtain your job's handle is to use Windbg's !htrace extension. You need to run !htrace -enable after you load your .EXE but before you execute it in Windbg. Then you can break in later and execute !htrace <handle> to see stack traces from when the handle was manipulated. You may want to start with this article on handle implementation.
I need to have multiple logins and query executions into an Oracle db, 10 users per process, 10 processes per PC.
I was thinking that I would create 10 threads, one thread per user login.
Is this feasible? Any advice is appreciated.
Very new to threads.
Update:
Thanks for all the comments and answers.
Here are some additional details:
Using Oracle 10.2, Delphi XE, and dbExpress components created on the fly.
Our design is to run 10 processes per machine and simulate 10 user logins per process. Each login runs in its own thread (in fact I need two logins in each thread, so I am actually creating 200 sessions per machine).
For this simulation exercise, after establishing a connection, each thread retrieves a bunch of data by calling several stored procedures within a loop. For each stored procedure I create a TSQLProcedure object on the fly, then close and free it after use. Now I am getting ORA-01000 (maximum open cursors exceeded), which I don't understand, since I close and free each stored procedure object.
Changing the settings on the server side is out of the question. I saw some documentation saying that on the application side you can set RELEASE_CURSOR=YES. I am guessing it's an option set at the procedure level.
Yes, it is feasible. You may need a thread for each session you need (see here for an explanation), and you have to ensure OCI is called in a thread-safe way; how to do that depends on the library you use to call OCI, if you don't call OCI directly.
Yes, it is feasible. Remember that the UI runs on its own thread and can't be accessed directly by the other threads. Also remember you can't share state between threads unless you secure it. This is a start. Here is an example of using threads with databases and the dbGo library. I suggest you give it a try and come back if you have specific questions.
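Regarding the ORA-01000 in the update: that error almost always means statement handles (cursors) are not actually being released back to the session before new ones are opened, whatever the client library. Purely as an illustration of the failure mode and the usual fix - sketched here in Java/JDBC rather than Delphi/dbExpress, with made-up connection details and procedure name:

import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.DriverManager;

public class CursorLeakDemo {
    public static void main(String[] args) throws Exception {
        // Hypothetical connection details, for illustration only.
        Connection conn = DriverManager.getConnection(
                "jdbc:oracle:thin:@dbhost:1521:ORCL", "scott", "tiger");
        for (int i = 0; i < 10000; i++) {
            // Each open statement holds a cursor in the session; if statements
            // are not reliably closed, the session eventually hits ORA-01000.
            // try-with-resources guarantees the cursor is released every pass.
            try (CallableStatement cs = conn.prepareCall("{call some_stored_proc}")) {
                cs.execute();
            }
        }
        conn.close();
    }
}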
In perfmon in Windows Server 2003, there are counter objects to get per-process processor time and memory working set statistics. The only problem is that in an environment with multiple application pools, there is no way to reliably identify the correct worker process. In perfmon, they are all called "w3wp", and if there is more than one, they are w3wp, w3wp#1, w3wp#2, and so on. Even these names are unreliable - the number depends on which one started first, and obviously changes when an app pool is recycled because the process is destroyed and restarted.
I haven't found any ASP.NET-specific counters, and for some reason, my IIS object doesn't separate instances - there's only one "global" instance.
Ultimately, I just want the "% Processor Time" and "Working Set" counters for a specific IIS App Pool. Any suggestions?
We'd always collect the stats for all the w3wp processes, and we would also capture the PID via the ID Process counter, which is one of the counters in the Process object.
There's a script that sits in Server 2003's system32 folder, called IISApp.vbs, which will list all the worker processes and their PIDs. You will need to run this to capture the PIDs.
I'm sure there has to be a better way, but this worked when we needed to do ad hoc monitoring.
The w3wp instance may not appear if the worker process has been idle for a long time.
Use the web UI for a short while so that the worker process (w3wp) shows up in the instances.
I'm working on a consumer web app that needs to do a long running background process that is tied to each customer request. By long running, I mean anywhere between 1 and 3 minutes.
Here is an example flow. The object/widget doesn't really matter.
Customer comes to the site and specifies object/widget they are looking for.
We search/clean/filter for widgets matching some initial criteria. <-- long running process
Customer further configures more detail about the widget they are looking for.
When the long running process is complete the customer is able to complete the last few steps before conversion.
Steps 3 and 4 aren't really important. I just mention them because we can buy some time while we are doing the long running process.
The environment we are working in is a LAMP stack - currently using PHP. It doesn't seem like good design to have the long-running process take up an Apache thread in mod_php (or a FastCGI process). The Apache layer of our app should be focused on serving up content, not data processing, IMO.
A few questions:
Is our thinking right in that we should separate this "long running" part out of the Apache/web app layer?
Is there a standard/typical way to break this out under Linux/Apache/MySQL/PHP (we're open to using a different language for the processing if appropriate)?
Any suggestions on how to go about breaking it out? E.g. do we create a daemon that churns through a FIFO queue?
Edit: Just to clarify, only about 1/4 of the long running process is database centric. We're working on optimizing that part. There is some work that we could potentially do, but we are limited in the amount we can do right now.
Thanks!
Consider providing the search results via AJAX from a web service instead of your application. Presumably you could offload this to another server and let your web application deal with the content as you desire.
Just curious: 1-3 minutes seems like a long time for a lookup query. Have you looked at indexes on the columns you are querying to improve the speed? Or do you need to do some algorithmic process -- perhaps you could perform some of this offline and prepopulate some common searches with hints?
As Jonnii suggested, you can start a child process to carry out background processing. However, this needs to be done with some care:
Make sure that any parameters passed through are escaped correctly
Ensure that more than one copy of the process does not run at once (see the lock-file sketch below)
If several copies of the process run, there's nothing stopping a (not even malicious, just impatient) user from hitting reload on the page which kicks it off, eventually starting so many copies that the machine runs out of RAM and grinds to a halt.
So you can use a subprocess, but do it carefully, in a controlled manner, and test it properly.
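One common way to enforce the one-copy-at-a-time rule is an exclusive lock on a well-known file, which the OS releases automatically if the process dies. A minimal sketch in Java (the lock-file path is arbitrary; PHP offers the same idea via flock()):

import java.io.RandomAccessFile;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;

public class SingleInstance {
    public static void main(String[] args) throws Exception {
        // Arbitrary lock-file path, chosen for illustration.
        RandomAccessFile file = new RandomAccessFile("/tmp/long_process.lock", "rw");
        FileChannel channel = file.getChannel();
        FileLock lock = channel.tryLock(); // non-blocking; null if another copy holds it
        if (lock == null) {
            System.err.println("Another instance is already running; exiting.");
            return;
        }
        try {
            // ... do the long-running work here ...
        } finally {
            lock.release(); // also released automatically if the process dies
            channel.close();
        }
    }
}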
Another option is to have a daemon permanently running, waiting for requests, which processes them and then records the results somewhere (perhaps in a database).
This is the poor man's solution:
exec ("/usr/bin/php long_running_process.php > /dev/null &");
Alternatively you could:
Insert a row into your database with details of the background request, which a daemon can then read and process (a sketch of such a worker follows below).
Write a message to a message queue which a daemon then reads and processes.
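Here is a minimal sketch of the database-as-queue variant, written in Java purely for illustration - the jobs table schema, connection details, and process() body are all made up:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Statement;

public class QueueWorker {
    public static void main(String[] args) throws Exception {
        // Hypothetical schema: jobs(id, payload, status) with status in ('new', 'done').
        Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost/app", "user", "password");
        while (true) {
            try (Statement st = conn.createStatement();
                 ResultSet rs = st.executeQuery(
                         "SELECT id, payload FROM jobs WHERE status = 'new' LIMIT 1")) {
                if (!rs.next()) {
                    Thread.sleep(1000); // nothing queued; poll again in a second
                    continue;
                }
                long id = rs.getLong("id");
                process(rs.getString("payload")); // the long-running work
                try (PreparedStatement up = conn.prepareStatement(
                        "UPDATE jobs SET status = 'done' WHERE id = ?")) {
                    up.setLong(1, id);
                    up.executeUpdate();
                }
            }
        }
    }

    static void process(String payload) {
        // search/clean/filter work goes here
    }
}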
Here's some discussion on the Java version of this problem.
See java: what are the best techniques for communicating with a batch server
Two important things you might do:
Switch to Java and use JMS (see the sketch after this list).
Read up on JMS but use another queue manager. Unix named pipes, for instance, might be an acceptable implementation.
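A minimal JMS sketch, assuming an ActiveMQ broker on localhost (any JMS provider looks much the same; the queue name is made up). The producer half belongs in the web tier, the consumer half in the worker daemon:

import javax.jms.Connection;
import javax.jms.MessageConsumer;
import javax.jms.MessageProducer;
import javax.jms.Queue;
import javax.jms.Session;
import javax.jms.TextMessage;
import org.apache.activemq.ActiveMQConnectionFactory;

public class JmsQueueDemo {
    public static void main(String[] args) throws Exception {
        // Assumes an ActiveMQ broker on localhost:61616.
        ActiveMQConnectionFactory factory =
                new ActiveMQConnectionFactory("tcp://localhost:61616");
        Connection connection = factory.createConnection();
        connection.start();
        Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
        Queue queue = session.createQueue("widget.search.requests");

        // Web tier: enqueue the request and return to the user immediately.
        MessageProducer producer = session.createProducer(queue);
        producer.send(session.createTextMessage("widget-criteria-123"));

        // Worker daemon: block until a request arrives, then do the slow work.
        MessageConsumer consumer = session.createConsumer(queue);
        TextMessage request = (TextMessage) consumer.receive();
        System.out.println("Processing " + request.getText());

        connection.close();
    }
}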
Java servlets can do background processing. You could do something similar in any web technology with threading support. I don't know about PHP though.
Not a complete answer, but I would think of using AJAX and passing the 2nd step to something that's faster than PHP (C, C++, C#), then having a PHP function pick the results off some stack - most likely just a database.