We are trying to troubleshoot a nasty problem on a production server where the server will start misbehaving after running for awhile.
Diagnostics have led us to believe there may be a bug in a DLL that is used by one of the processes running on this server that is resulting in a global atom leak. The assumed vector is a process that is calling RegisterClass without a corresponding UnregisterClass (and the class name is using a random number as part of the name, so it's a different class name each time the process starts).
This article provided some information: https://blogs.msdn.microsoft.com/ntdebugging/2012/01/31/identifying-global-atom-table-leaks/
But we are reluctant to attempt kernel mode debugging on a production server, so we have tried installing windbg and using the !gatom command to list atoms for a given session.
I use windbg to attach to a process in one of the sessions (these processes are running as Windows Services if that matters), then invoke the !gatom command. The returned atom list doesn't have any window classes in it.
Then I read this: https://blogs.msdn.microsoft.com/oldnewthing/20150429-00/?p=44984
and it sounds like there is a separate atom table for windows classes. And no way to query it. I was hoping that we'd be able to actually see how many windows class atoms have been registered, and see if that list gets bigger over time, indicating a leak.
The documentation on !gatom is sparse, and I'm hoping I can get some expert confirmation or recommendations on how to proceed.
Does anyone have any ideas on how we can get at the list of registered Windows classes on a production server?
More detail about what happens when the server starts to misbehave:
We run many instances (>50) of the same application as separately registered services running from isolated executables and DLLs - so each of those 50 instances has their own private executables and DLLs.
During their normal run, the processes unload and reload a DLL (about every hour). There is a windows class used that's part of a "session handle" used by the DLL (the session handle is part of the registered windows class name), and that session handle is unique each time the DLL is loaded. So every hour, there is an additional Window class registration, made by a DLL (our service stays running).
After some period of time, the system will get into a state where further attempts to load the DLL in question fail. This may happen for one of the services, then gradually over time, other services will start to have the same problem.
When this happens, restarting the service does not fix the problem. The only way that we've found to get things running properly again is to reboot the server.
We are monitoring memory commit load, and we are well within the virtual memory of the server. We are even within the physical memory size.
I just did a code review the vendor of the DLL, and it looks like they are not actually calling RegisterClass from the DLL itself (they only make one RegisterClass call from the DLL, and it's a static string - not a different class name for each session). The DLL launches an EXE, and that EXE is the one that registers the session specific class name. Their EXE does call UnregisterClass (and even if it didn't, the EXE is terminated when we unload their DLL, so it seems that this may not be what is going on).
I am now out of bullets on this one. The behavior seems like some sort of resource leak or pool exhaustion. The next time this happens, I will try connecting to the failing process with windbg and see what the application atom pool looks like - but I'm not hopeful that is going to shed any light.
Update: The excellent AtomTableMonitor tool has narrowed the problem to rogue RegisterWindowMessage. I'm going to ask a more specific question focused on this exact issue: Diagnosing RegisterWindowsMessage leak
You may try using this standalone global atom monitor
The application appears to have capabilities to monitor atoms in services
that run in a different session
btw if you have narrowed it to RegisterWindowMessage
then spy++ can log the Registered messages system wide along with thread and process
spy++ (i am using it from vs2015 community)
ctrl+m select all windows in system
in the messages tab clear all and select registered
and start logging
you can also save the log (it is plain text in-spite of strange extension )
powershell -c "gc spy++.sxl -Tail 3"
<000152> 001F01A4 P message:0xC1B2 [Registered:"nsAppShell:EventID"] wParam:00000000 lParam:06EDFCE0 time:4:2
7:49.584 point:(408, 221)
<000153> 001F01A4 P message:0xC1B2 [Registered:"nsAppShell:EventID"] wParam:00000000 lParam:06EDFCE0 time:4:2
7:49.600 point:(408, 221)
<000154> 001F01A4 P message:0xC1B2 [Registered:"nsAppShell:EventID"] wParam:00000000 lParam:06EDFCE0 time:4:2
7:49.600 point:(408, 221)
Related
I need to debug a dll that's getting loaded on a specific application pool process. I'm using WinDbg, and so far I have successfully found the correct w3wp.exe process. Problem is, the app pool recycles itself, sometimes before it reaches what I want to find (a very elusive second chance exception). Then, I need to start over.
How do I configure WinDbg to automatically attach to a w3wp.exe process that's connected to a specific app pool, every time it starts?
Another solution would be to get the crash dump - I tried using ADPlus.exe for this but it also needs to be started on a process and I didn't find a way to re-run it automatically on only the process I need AND every time it starts.
So to sum it up, I need a way to get a crash dump from a w3wp.exe process that's connected to a specific IIS App Pool when it crashes on a second chance exception, while the process gets restarted once in a while (not enough time for me to run the debug tools manually each time).
Eventually, I discovered this awesome tool called DebugDiag, which is an official Microsoft tool. It has a nice interface that allows collecting dumps on certain events, such as exceptions, and creates very useful logs.
On top of that - it has a special section specifically for IIS debugging, which allowed me to select the app pool I was interested in.
Download available here.
my question may seem too weird but i thought about the windows hibernation thing and i was wondering if there is a way to hibernate a specific process or application.
i.e : when windows start up from a normal shutdown/restart it will load all startup programs normally but in addition of that it will load a specific program with it`s previous status before shutting down the computer.
I have though about reserving the memory location and retrieve it back when computer start up , but is there any application that does that in windows environment ?
That cannot work. The state of a process is almost never contained in just the process itself. A gui app creates user32 and gdi objects that are stored in a heap associated with the desktop. It makes calls to Windows that affect the window manager state. It makes I/O calls that cause code inside drivers to run. Which in turn affects allocations inside the kernel pools. Multiply the trouble by every pipe or rpc channel it opens to talk to other processes. And shared resources like the clipboard.
Only making a snapshot of the entire operating system state works.
There are multiple solutions for this now, in Linux OS: CRIU, CryoPID2, BLCR.
I think docker can be used (both for windows & linux), but it requires pre-packaging your app in a docker, which bears some overhead(s).
I have a completely random error popping up on a particular piece of software out in the field. The application is a game written in VB6 and is running on Windows 7 64-bit. Every once in a while, the app crashes, with a generic "program.exe has stopped responding" message box. This game can run fine for days on end until this message appears, or within a matter of hours. No exception is being thrown.
We run this app in Windows 2000 compatibility mode (this was its original OS), with visual themes disabled, and as an administrator. The app itself is purposely simple in terms of using external components and API calls.
References:
Visual Basic for Applications
Visual Basic runtime objects and procedures
Visual Basic objects and procedures
OLE Automation
Microsoft DAO 3.51 Object Library
Microsoft Data Formatting Object Library
Components:
Microsoft Comm Control 6.0
Microsoft Windows Common Controls 6.0 (SP6)
Resizer XT
As you can see, these are pretty straightforward, Microsoft-standard tools, for the most part. The database components exist to interact with an Access database used for bookkeeping, and the Resizer XT was inserted to move this game more easily from its original 800x600 resolution to 1920x1080.
There is no networking enabled on the kiosks; no network drivers, and hence no connections to remote databases. Everything is encapsulated in a single box.
In the Windows Application event log, when this happens, there is an Event ID 1000 faulting a seemingly random module -- so far, either ntdll.dll or lpk.dll. In terms of API calls, I don't see any from ntdll.dll. We are using kernel32, user32, and winmm, for various file system and sound functions. I can't reproduce as it is completely random, so I don't even know where to start troubleshooting. Any ideas?
EDIT: A little more info. I've tried several different versions of Dependency Walker, at the suggestion of some other developers, and the latest version shows that I am missing IESHIMS.dll and GRPSVC.dll (these two seems to be well-known bugs in Depends.exe), and that I have missing symbols in COMCTRL32.dll and IEFRAME.dll. Any clues there?
The message from the application event log isn't that useful - what you need is a post mortem process dump from your process - so you can see where in your code things started going wrong.
Every time I've seen one of these problems it generally comes down to a bad API parameter rather than something more exotic, this may be caused by bad data coming in, but usually it's a good ol fashioned bug that causes the problem.
As you've probably figured already this isn't going to be easy to debug; ideally you'd have a repeatable failure case to debug, instead of relying on capturing dump files from a remote machine, but until you can make it repeatable remote dumps are the only way forwards.
Dr Watson used to do this, but is no longer shipped, so the alternatives are:
How to use the Userdump.exe tool to create a dump file
Sysinternals ProcDump
Collecting User-Mode dumps
What you need to get is a minidump, these contain the important parts of the process space, excluding standard modules (e.g. Kernel32.dll) - and replacing the dump with a version number.
There are instructions for Automatically Capturing a Dump When a Process Crashes - which uses cdb.exe shipped with the debugging tools, however the crucial item is the registry key \\HKEY_LOCAL_MACHINE\Software\Microsoft\Windows NT\CurrentVersion\AeDebug
You can change your code to add better error handling - especially useful if you can narrow down the cause to a few procedures and use the techniques described in Using symbolic debug information to locate a program crash. to directly process the map files.
Once you've got a minidump and the symbol files WinDbg is the tool of choice for digging into these dumps - however it can be a bit of a voyage to discover what the cause is.
The only other thing I'd consider, and this depends on your application structure, is to attempt to capture all input events for replay.
Another option is to find a copy of VMWare 7.1 which has replay debugging and use that as the first step in capturing a reproducible set of steps.
Right click your executable object and let it be WINXP compatible pending
when you discover source of the problem to finally solve it
I need to monitor and, if it is needed, decline process start in the Windows XP and Vista OS?
What are known/documented/undocmented methods? What about known hacks of this methods?
(It will be used for the shareware firewall/security software).
Be very careful with any code that thinks it knows enough about what a user is doing to know whether or not to allow a process to start. It's a great way to find out how much you don't know about your users, but only if you provide an email address for the users to send complaints to.
An example was some VPN software I worked with that hooked into the Windows system to be notified whenever a DLL was loaded. It actually caused BSOD when running a very common application - Visual Studio. The manufacturer wasn't aware of how modular VS is, and that starting it loads many DLLs, and sometimes even more during execution, as new features are loaded.
When you put yourself in the position to do things for your users, you have the responsibility to know enough to do them correctly.
For monitoring you can use WMI events.
There is no[1] method to decide whether to allow the start or not. If you are on Pro/Biz/Ent/Ultimate editions group policy can be used to block specified executables from being launched, or limit to a specified list.
[1] As far as I am aware.
An application is hanging occasionally, and I would like to see the dump at the time to figure it out. I had written an application that the user can run to automatically create a dump that I can look at. However I can't seem to get the users to remember to run it when it hangs, no matter what I try. They always end up closing the program, which invokes Windows Error Reporting.
WER will create dumps in the temp directory, but unfortunately they are deleted as soon as the dialog for sending the info to Microsoft or not is closed.
Becoming an ISV and getting this info from Microsoft's error reporting servers is one solution.. but not one that is realistic at the moment.
I can't imagine that I am the only one faced with this issue. The software is used concurrently by dozens upon dozens of staff, so reaching them all and getting them to run an application or not click close on that dialog until running some other application or etc has not been working out.
The app is running on Windows Server 2003. Too bad, since I know Server 2008 has some LocalDumps options that will let me retain them.
Any ideas for somehow keeping these dumps around so I can analyze them? The obstacle is the user, in the sense that I've accepted to their stubbornness and do not expect them to run any other application or do anything special.
Thanks for any advice!
You could opt for an automatic solution. I believe there're multiple options at your disposal for detecting if you're hung.
One would be the use of SendMessageTimeout (also pay attention to SMTO_ABORTIFHUNG as one of the fuFlags values) from a separate thread in your app. Once you have determined the main thread is not responding you can save a dump file wherever you want.
There's also a IsHungAppWindow() (user32.dll) available since w2k.