Win32: Storing debug info to survive crash - windows

I'd like to store some info somewhere for debugging purposes, if my program crashes. On the next start I would read this information. I could do this in the registry or file system, but that's not very fast. I could use shared memory and a second process, but I'd like to avoid running a second program all the time.
I'd like to know if there is any faster way for this. Is there any windows mechanism (pipes, shared memory, whatever) that is fast and not lost if a program crashes?

You may consider using the Application Event Log:
(C# code here, but you can do it unmanaged as well)
string cs = "MyAppEvents";
if (!EventLog.SourceExists(cs))
EventLog.CreateEventSource(cs, "Application");
EventLog.WriteEntry(cs, message, EventLogEntryType.Error);
Events written there are sent to a service, which may be fast enough.
The architecture of the event logging service is explained here:
http://oreilly.com/catalog/winlog/chapter/ch02.html
(it is old, but has not changed much)
EDIT: it WAS changed since the days of NT4 :)
Never versions of windows (>2000 - NT5) have a more efficient feature called Event Tracing. You can read more about it here http://msdn.microsoft.com/en-us/magazine/cc163437.aspx
A reasonable implementation should use a named pipe between advapi32 in your process and the services, which on the same machine uses shared memory maps (in fact, there is a named pipe called "eventlog" among the others you can find in an windows system)

Related

Diagnosing Win32 RegisterClass leak

We are trying to troubleshoot a nasty problem on a production server where the server will start misbehaving after running for awhile.
Diagnostics have led us to believe there may be a bug in a DLL that is used by one of the processes running on this server that is resulting in a global atom leak. The assumed vector is a process that is calling RegisterClass without a corresponding UnregisterClass (and the class name is using a random number as part of the name, so it's a different class name each time the process starts).
This article provided some information: https://blogs.msdn.microsoft.com/ntdebugging/2012/01/31/identifying-global-atom-table-leaks/
But we are reluctant to attempt kernel mode debugging on a production server, so we have tried installing windbg and using the !gatom command to list atoms for a given session.
I use windbg to attach to a process in one of the sessions (these processes are running as Windows Services if that matters), then invoke the !gatom command. The returned atom list doesn't have any window classes in it.
Then I read this: https://blogs.msdn.microsoft.com/oldnewthing/20150429-00/?p=44984
and it sounds like there is a separate atom table for windows classes. And no way to query it. I was hoping that we'd be able to actually see how many windows class atoms have been registered, and see if that list gets bigger over time, indicating a leak.
The documentation on !gatom is sparse, and I'm hoping I can get some expert confirmation or recommendations on how to proceed.
Does anyone have any ideas on how we can get at the list of registered Windows classes on a production server?
More detail about what happens when the server starts to misbehave:
We run many instances (>50) of the same application as separately registered services running from isolated executables and DLLs - so each of those 50 instances has their own private executables and DLLs.
During their normal run, the processes unload and reload a DLL (about every hour). There is a windows class used that's part of a "session handle" used by the DLL (the session handle is part of the registered windows class name), and that session handle is unique each time the DLL is loaded. So every hour, there is an additional Window class registration, made by a DLL (our service stays running).
After some period of time, the system will get into a state where further attempts to load the DLL in question fail. This may happen for one of the services, then gradually over time, other services will start to have the same problem.
When this happens, restarting the service does not fix the problem. The only way that we've found to get things running properly again is to reboot the server.
We are monitoring memory commit load, and we are well within the virtual memory of the server. We are even within the physical memory size.
I just did a code review the vendor of the DLL, and it looks like they are not actually calling RegisterClass from the DLL itself (they only make one RegisterClass call from the DLL, and it's a static string - not a different class name for each session). The DLL launches an EXE, and that EXE is the one that registers the session specific class name. Their EXE does call UnregisterClass (and even if it didn't, the EXE is terminated when we unload their DLL, so it seems that this may not be what is going on).
I am now out of bullets on this one. The behavior seems like some sort of resource leak or pool exhaustion. The next time this happens, I will try connecting to the failing process with windbg and see what the application atom pool looks like - but I'm not hopeful that is going to shed any light.
Update: The excellent AtomTableMonitor tool has narrowed the problem to rogue RegisterWindowMessage. I'm going to ask a more specific question focused on this exact issue: Diagnosing RegisterWindowsMessage leak
You may try using this standalone global atom monitor
The application appears to have capabilities to monitor atoms in services
that run in a different session
btw if you have narrowed it to RegisterWindowMessage
then spy++ can log the Registered messages system wide along with thread and process
spy++ (i am using it from vs2015 community)
ctrl+m select all windows in system
in the messages tab clear all and select registered
and start logging
you can also save the log (it is plain text in-spite of strange extension )
powershell -c "gc spy++.sxl -Tail 3"
<000152> 001F01A4 P message:0xC1B2 [Registered:"nsAppShell:EventID"] wParam:00000000 lParam:06EDFCE0 time:4:2
7:49.584 point:(408, 221)
<000153> 001F01A4 P message:0xC1B2 [Registered:"nsAppShell:EventID"] wParam:00000000 lParam:06EDFCE0 time:4:2
7:49.600 point:(408, 221)
<000154> 001F01A4 P message:0xC1B2 [Registered:"nsAppShell:EventID"] wParam:00000000 lParam:06EDFCE0 time:4:2
7:49.600 point:(408, 221)

How to hibernate application?

my question may seem too weird but i thought about the windows hibernation thing and i was wondering if there is a way to hibernate a specific process or application.
i.e : when windows start up from a normal shutdown/restart it will load all startup programs normally but in addition of that it will load a specific program with it`s previous status before shutting down the computer.
I have though about reserving the memory location and retrieve it back when computer start up , but is there any application that does that in windows environment ?
That cannot work. The state of a process is almost never contained in just the process itself. A gui app creates user32 and gdi objects that are stored in a heap associated with the desktop. It makes calls to Windows that affect the window manager state. It makes I/O calls that cause code inside drivers to run. Which in turn affects allocations inside the kernel pools. Multiply the trouble by every pipe or rpc channel it opens to talk to other processes. And shared resources like the clipboard.
Only making a snapshot of the entire operating system state works.
There are multiple solutions for this now, in Linux OS: CRIU, CryoPID2, BLCR.
I think docker can be used (both for windows & linux), but it requires pre-packaging your app in a docker, which bears some overhead(s).

Flow of execution of file in WINDOWS

My question is:
What is the flow of execution of an executable file in WINDOWS? (i.e. What happens when we start a application.)
How does the OS integrate with the application to operate or handle the application?
Does it have a central control place that checks execution of each file or process?
Is the process registered in the execution directory? (if any)
What actually happens that makes the file perform its desired task within WINDOWS environment?
All help is appreciated...
There's plenty going on beyond the stages already stated (loading the PE, enumerating the dlls it depends on, calling their entry points and calling the exe entry point).
Probably the earliest action is the OS and the processor collaborating to create a new address space infrastructure (I think essentially a dedicated TLB). The OS initializes a Process Environment Block, with process-wide data (e.g., Process ID and environment variables). It initializes a Thread Environment Block, with thread-specific data (thread id, SEH root handlers, etc). Once an appropriate address is selected for every dll and it is loaded there, re-addressing of exported functions happens by the windows loader. (very briefly - at compile time, the dll cannot know the address at which it will be loaded by every consumer exe. the actual call addresses of its functions is thus determined only at load time). There are initializations of memory pages shared between processes - for windows messages, for example - and I think some initialization of disk-paging structures. there's PLENTY more going on. The main windows component involved is indeed the windows loader, but the kernel and executive are involved. Finally, the exe entry point is called - it is by default BaseProcessStart.
Typically a lot of preparation happens still after that, above the OS level, depending on used frameworks (canonical ones being CRT for native code and the CLR for managed): the framework must initialize its own internal structures to be able to deliver services to the application - memory management, exception handling, I/O, you name it.
A great place to go for such in depth discussions is Windows Internals. You can also dig a bit deeper in forums like SO, but you have to be able to break it down into more focused bits. As phrased, this is really too much for an SO post.
This is high-level and misses many details:
The OS reads the PE Header to find the dlls that the exe depends on and the exe's entry point
The DllMain function in all linked dlls is called with the DLL_PROCESS_ATTACH message
The entry point of the exe is called with the appropriate arguments
There is no "execution directory" or other central control other than the kernel itself. However, there are APIs that allow you to enumerate and query the currently-running processes, such as EnumProcesses.
Your question isn't very clear, but I'll try to explain.
When you open an application, it is loaded into RAM from your disk.
The operating system jumps to the entry point of the application.
The OS provides all the calls required for showing a window, connecting with stuff and receiving user input. It also manages processing time, dividing it evenly between applications.

Minidump creates empty dump file

We have an in-proc crash handler which is using MiniDumpWriteDump() from DbgHelp to write a minidump is case of a process crash.
I know its not the best way to do it, however, at the moment we do not have other option.
The problem is: one certain executable always creates 0 byte dumps. But it works well for other processes. What could be the possible reason behind this behavior?
We had this problem from time to time with our minidumping code. In the end we changed it to spawn a lightweight secondary process on startup and used a simple MMF to communicate with the dumper process when we needed a minidump generated.
We had all sorts of problems using MiniDumpWriteDump from within the process being dumped. Since the change to a dedicated dumping process, it's been very reliable.
If at all possible I suggest you consider the same. It ended up not being that much work.

Invoke Blue Screen of Death using Managed Code

Just curious here: is it possible to invoke a Windows Blue Screen of Death using .net managed code under Windows XP/Vista? And if it is possible, what could the example code be?
Just for the record, this is not for any malicious purpose, I am just wondering what kind of code it would take to actually kill the operating system as specified.
The keyboard thing is probably a good option, but if you need to do it by code, continue reading...
You don't really need anything to barf, per se, all you need to do is find the KeBugCheck(Ex) function and invoke that.
http://msdn.microsoft.com/en-us/library/ms801640.aspx
http://msdn.microsoft.com/en-us/library/ms801645.aspx
For manually initiated crashes, you want to used 0xE2 (MANUALLY_INITIATED_CRASH) or 0xDEADDEAD (MANUALLY_INITIATED_CRASH1) as the bug check code. They are reserved explicitly for that use.
However, finding the function may prove to be a bit tricky. The Windows DDK may help (check Ntddk.h) - I don't have it available at the moment, and I can't seem to find decisive info right now - I think it's in ntoskrnl.exe or ntkrnlpa.exe, but I'm not sure, and don't currently have the tools to verify it.
You might find it easier to just write a simple C++ app or something that calls the function, and then just running that.
Mind you, I'm assuming that Windows doesn't block you from accessing the function from user-space (.NET might have some special provisions). I have not tested it myself.
I do not know if it really works and I am sure you need Admin rights, but you could set the CrashOnCtrlScroll Registry Key and then use a SendKeys to send CTRL+Scroll Lock+Scroll Lock.
But I believe that this HAS to come from the Keyboard Driver, so I guess a simple SendKeys is not good enough and you would either need to somehow hook into the Keyboard Driver (sounds really messy) or check of that CrashDump has an API that can be called with P/Invoke.
http://support.microsoft.com/kb/244139
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\i8042prt\Parameters
Name: CrashOnCtrlScroll
Data Type: REG_DWORD
Value: 1
Restart
I would have to say no. You'd have to p/invoke and interact with a driver or other code that lives in kernel space. .NET code lives far removed from this area, although there has been some talk about managed drivers in future versions of Windows. Just wait a few more years and you can crash away just like our unmanaged friends.
As far as I know a real BSOD requires failure in kernel mode code. Vista still has BSOD's but they're less frequent because the new driver model has less drivers in kernel mode. Any user-mode failures will just result in your application being killed.
You can't run managed code in kernel mode. So if you want to BSOD you need to use PInvoke. But even this is quite difficult. You need to do some really fancy PInvokes to get something in kernel mode to barf.
But among the thousands of SO users there is probably someone who has done this :-)
You could use OSR Online's tool that triggers a kernel crash. I've never tried it myself but I imagine you could just run it via the standard .net Process class:
http://www.osronline.com/article.cfm?article=153
I once managed to generate a BSOD on Windows XP using System.Net.Sockets in .NET 1.1 irresponsibly. I could repeat it fairly regularly, but unfortunately that was a couple of years ago and I don't remember exactly how I triggered it, or have the source code around anymore.
Try live videoinput using directshow in directx8 or directx9, most of the calls go to kernel mode video drivers. I succeded in lots of blue screens when running a callback procedure from live videocaptureing source, particulary if your callback takes a long time, can halt the entire Kernel driver.
It's possible for managed code to cause a bugcheck when it has access to faulty kernel drivers. However, it would be the kernel driver that directly causes the BSOD (for example, uffe's DirectShow BSODs, Terence Lewis's socket BSODs, or BSODs seen when using BitTorrent with certain network adapters).
Direct user-mode access to privileged low-level resources may cause a bugcheck (for example, scribbling on Device\PhysicalMemory, if it doesn't corrupt your hard disk first; Vista doesn't allow user-mode access to physical memory).
If you just want a dump file, Mendelt's suggestion of using WinDbg is a much better idea than exploiting a bug in a kernel driver. Unfortunately, the .dump command is not supported for local kernel debugging, so you would need a second PC connected over serial or 1394, or a VM connected over a virtual serial port. LiveKd may be a single-PC option, if you don't need the state of the memory dump to be completely self-consistent.
This one doesn't need any kernel-mode drivers, just a SeDebugPrivilege. You can set your process critical by NtSetInformationProcess, or RtlSetProcessIsCritical and just kill your process. You will see same bugcheck code as you kill csrss.exe, because you set same "critical" flag on your process.
Unfortunately, I know how to do this as a .NET service on our server was causing a blue screen. (Note: Windows Server 2008 R2, not XP/Vista).
I could hardly believe a .NET program was the culprit, but it was. Furthermore, I've just replicated the BSOD in a virtual machine.
The offending code, causes a 0x00000f4:
string name = string.Empty; // This is the cause of the problem, should check for IsNullOrWhiteSpace
foreach (Process process in Process.GetProcesses().Where(p => p.ProcessName.StartsWith(name, StringComparison.OrdinalIgnoreCase)))
{
Check.Logging.Write("FindAndKillProcess THIS SHOULD BLUE SCREEN " + process.ProcessName);
process.Kill();
r = true;
}
If anyone's wondering why I'd want to replicate the blue screen, it's nothing malicious. I've modified our logging class to take an argument telling it to write direct to disk as the actions prior to the BSOD weren't appearing in the log despite .Flush() being called. I replicated the server crash to test the logging change. The VM duly crashed but the logging worked.
EDIT: Killing csrss.exe appears to be what causes the blue screen. As per comments, this is likely happening in kernel code.
I found that if you run taskkill /F /IM svchost.exe as an Administrator, it tries to kill just about every service host at once.

Resources