DumpCount in WER doesn't work precisely - Windows

I work on a 64-bit Windows PC. My application launches 10+ instances of a 32-bit process with the same name, Proc.exe. These instances are launched very close together in time; in a certain scenario they also crash very close together in time.
I try to collect crash dumps for these crashes using WER. I use the registry key HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\Windows Error Reporting\LocalDumps\Proc.exe, as explained in the MSDN article "Collecting User-Mode Dumps", and I request full dumps (DumpType=2). In the target dump folder I obtain a number of dump files whose names differ only by the PID: Proc.exe.1836.dmp, Proc.exe.5428.dmp, etc.
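For reference, the per-process values I set look roughly like this (C:\CrashDumps is just a placeholder folder):
HKLM\SOFTWARE\Microsoft\Windows\Windows Error Reporting\LocalDumps\Proc.exe
    DumpFolder (REG_EXPAND_SZ) = C:\CrashDumps
    DumpType   (REG_DWORD)     = 2
    DumpCount  (REG_DWORD)     = 1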
The problem: the total number of generated dump files does not always match the DumpCount value; there are often more files, and their actual number varies from run to run. I have seen up to 5 files for DumpCount=1, and up to 8 files for DumpCount=3.
I tried to work with the global LocalDumps settings (HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\Windows Error Reporting\LocalDumps) instead of the per-process settings, and the results are similar.
It is important for me to control the number of generated dump files, since free disk space is very limited.
Am I missing something, or is what I reported a bug in Windows?

Related

Memory dump for period of time

When a program is misbehaving, it is pretty easy to capture a memory dump of the process and then analyze it with a tool like WinDbg. However, this is pretty limited: you only get a snapshot of what the process is doing, and in some cases finding out why a certain part of the code was reached is really difficult.
Is there any way of capturing memory dumps for a period of time, like recording a movie rather than taking a picture, which would indicate what changed in that period of time, and the parts of the code that were executed in that time interval?
Recording many memory dumps
Is there any way of capturing memory dumps for a period of time, like recording a movie rather than taking a picture
Yes, that exists. It's called ProcDump, and you can define the number of dumps with the -n parameter and the seconds between dumps with -s. It might not work well for small values of -s, because writing a full dump can take longer than the interval itself.
Example:
procdump -ma -n 10 -s 1 <PID> ./dumps
However, this technique is usually not very helpful, because you now have 10 dumps to analyze instead of just 1 - and analyzing 1 dump is already difficult. AFAIK, there's no tool that would compare two dumps and give you the differences.
Live debugging
IMHO, what you need is live debugging. And that's possible with WinDbg, too. Development debugging (using an IDE) and production debugging are two different skills, so you don't need to install a complete IDE such as Visual Studio in your customer's production environment. Actually, if you copy an existing WinDbg installation onto a USB stick, it will run portably.
Simply start WinDbg, attach to a process (F6), start a log file (.logopen), set up Microsoft symbols, configure exceptions (sx) and let the program run (g).
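A rough session along those lines might look like this (the log path and symbol cache folder are example locations only):
$$ attach with F6, or start with: windbg -p <PID>
.logopen C:\temp\proc-debug.log
.sympath srv*C:\symcache*https://msdl.microsoft.com/download/symbols
.reload
$$ example: break on first-chance access violations; adjust with sx/sxe/sxd as needed
sxe av
$$ let the target run
g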
Remote debugging
Perhaps you even want to have a look at WinDbg's remote debugging capabilities; however, that's a bit harder to set up, usually due to IT restrictions (firewalls etc.).
Visual Studio also offers remote debugging, so you can use VS on your machine and only install a smaller program on your customer's machine. I have hardly any experience with it, so I can't tell you much.
Logging
the parts of the code that were executed in that time interval?
The most typical approach I see companies apply is turning on the logging capabilities of your application.
You can also record useful data with WPT (Windows Performance Toolkit), namely WPR (Windows Performance Recorder) and later analyze it with WPA (Windows Performance Analyzer). It will give you call stacks over time.
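For example, something along these lines with the in-box wpr.exe (GeneralProfile and CPU are built-in profiles; the output file name is arbitrary):
wpr -start GeneralProfile -start CPU
(reproduce the problem)
wpr -stop C:\traces\repro.etl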

How to enable Windows Error Reporting LocalDumps for one process only, not system-wide

The user-mode dump collection feature in Windows is well documented. As long as the LocalDumps key under HKLM\SOFTWARE\Microsoft\Windows\Windows Error Reporting is created, every application crash on the system produces a core dump using the default settings. A subkey under it, named after the executable file of the process I am interested in, overrides the global (default) settings, and I get the crash dumps exactly as configured: a full memory dump in the location I specify.
However, there is apparently no documented setting under LocalDumps that tells the system not to create dumps by default, which I could then override in my service's subkey. In our milieu, there are multiple service processes running under multiple identities, so there are going to be many core dumps scattered around all those identities' %LOCALAPPDATA%\CrashDumps directories, needlessly. These are unattended servers, and more trash means more scripted cleanup.
Is there a documented way to disable core dumps by default for all programs, and override the prohibition for just one?
It is documented that dumps are not created if the location is not writable by the crashing process's identity, but I do not want them for LocalSystem services either, and this guy can write anywhere. Also, it seems that dumps are not created if DumpFolder points to an invalid file path. To make it maybe a little less hacky, I set it to the NUL device in the device namespace (below, : stands for "value under key"):
LocalDumps:DumpFolder=\\.\NUL
LocalDumps:DumpCount=1
LocalDumps\myservice.exe:DumpFolder=C:\Program Files\myservice\data\crash_dumps
LocalDumps\myservice.exe:DumpType=2
LocalDumps\myservice.exe:DumpCount=5
This ostensibly works on Server 2012, 2012 R2 and 2016, but I have to keep my fingers crossed that the next update does not change this (undocumented) behavior.

Increasing the number of file handles in 64-bit Windows 7

I have an application that has 10,000 threads running at a time. Each thread opens the same file. The problem is that whenever I launch the application with 10K threads, it terminates after creating 500 threads (file handles). I have tried the same application on Linux, and it runs fine after I tweaked the ulimit option. Is there any limit on the file handles a process can open in Windows? I have been googling, and all I get is to change the entries in the config.nt file in C:\Windows\System32...
But I found out that the said file does not exist for 64 bit OS. Is there any way that I can change the limit in Windows?
My OS is WINDOWS 7 64 bit.
Just to make sure it really is the handle limit, first view the total number of handles (not just file handles) your application has open at a given time:
Download Process Explorer from https://technet.microsoft.com/en-us/sysinternals/processexplorer.aspx
Make sure to set an appropriate refresh speed.
Open it, go to View -> Select Columns, switch to the "Process Performance" tab, and check "Handle Count".
On 64-bit Windows 7, a process can have 16,711,680 handles open simultaneously.
If you want to check limits for yourself then read below.
Check it yourself with a tool from the Windows Internals book page (https://technet.microsoft.com/en-us/sysinternals/bb963901.aspx). The tool's name is TestLimit, and you will find it in the lower part of the page under the "Book Tools" header.
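For example, you can let it create handles until it hits the limit (if memory serves, -h is the switch that creates handles; use the 64-bit binary on a 64-bit OS):
Testlimit64.exe -h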
There are no ways to increase this limit for Windows Operating Systems as far as I know, and I looked also.
As others stated, think of a method to minimize the large number of threads.
Maybe your application closes the file, but not the handle.
My advice: if you really need a very large handle count, start a new process every time the handle count approaches 16 million.
Running out of Microsoft C runtime library file descriptors (as Harry Johnston suggested) appears to be the right diagnosis.
See https://learn.microsoft.com/en-us/cpp/c-runtime-library/file-handling. The default maximum is 512 open file descriptors. The solution, therefore, is a line of code early in main:
_setmaxstdio(newmax);
How to do that in Python is another question.
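For the C/C++ case, a minimal sketch of that line in context (2048 is an arbitrary example value; the documented ceiling for _setmaxstdio is 8192):
#include <stdio.h>

int main(void)
{
    /* Raise the CRT stream limit from its default of 512. */
    int newmax = _setmaxstdio(2048);   /* example value; documented max is 8192 */
    if (newmax == -1) {
        /* The request was rejected (out of range); the old limit remains. */
        fprintf(stderr, "_setmaxstdio failed\n");
        return 1;
    }
    printf("stdio stream limit is now %d\n", newmax);
    /* ... open files / start threads here ... */
    return 0;
}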

Why does FileCopy fail at random on Windows 7?

I have a VB6 program running on Windows 7. It is copying a large number of files, and sometimes FileCopy fails with an access violation (roughly once every 60 to 500 files).
I cannot reproduce it using a single file; the problem only happens during such mass-copying operations.
It makes no difference, if source/target are on hard disks, network shares or CD-ROMs.
What could trigger this problem?
EDIT: My question might be a little bit convoluted, so here's some more data:
Run 1:
Start copying 5,000 files
Access violation on file #983
Access violation on file #1437
Access violation on file #1499
Access violation on file #2132
Access violation on file #3456
Access violation on file #4320
Done
Run 2:
Start copying 5,000 files
Access violation on file #60
Access violation on file #3745
Done
Observations
The affected files are always different
The number of affected files tends to decrease if the same file batch is copied multiple times in succession.
Running as Administrator makes no difference
The application has read/write access to all necessary file system objects
This problem happens on Windows 7 workstations only!
Best guess: Is it possible that another user/application is using the specified file at the time the process is running (anti-virus scanner, Win7 search indexing, Windows Defender, etc.)? You might try booting the machine in safe mode to eliminate the background services/apps and run the process again to see.
Is there any consistency in the file types or size of the files causing the issue?
Is the machine low on resources? RAM/Disk Space
You said it occurs on Win7 – is it multiple Win7 machines or just one? (Helps to rule out system resources vs. software/OS.)
Any hints from the Event Viewer (Control Panel > Administrative Tools)? Doubtful, but worth checking.
Does the process take a long time to complete? If you can take the performance hit you might look at destroying and recreating the FSO object after every copy or every X files to make sure there isn’t some odd memory leak issue with Win7/VB6.
Not necessarily a recommended solution, but if all else fails you could handle that error, save the files that trigger it in a dictionary/collection, and re-loop through the process with those files when done. No guarantee it wouldn't happen again.
Not enough information (as you probably know). Do you log the activity? If not, it's a good place to start. Knowing whether certain files are the problem, and if the issue is repeatable, can help narrow it down.
In your case I would also trap (and log) all errors and retry N times after waiting N seconds. You could be trying to copy in-use files locked by another process, and a retry may allow time for that lock to go away.
Really, more data is the key, and logging is the way to get it.
Is there any chance your antivirus program or some indexer is getting in the way?
Try creating a Procmon trace while reproducing the error and see what is actually failing. With the trace you can see whether another program is causing the issue, or whether your app is trying to write somewhere it shouldn't (incorrect permissions) or can't (a temp/scratch directory without enough space).
Check out the presentations linked to on the procmon page or Mark Russinovich's blog for some cool examples of using this tool to solve various Windows/application mysteries.
Is there a hidden/system file in the directory that is potentially blocking it?
Does running the VB6 App with right-click "Run As Administrator" make a difference?
Is the point where it dies at the maximum number of files in the directory? E.g. are you sure the upper limit of whatever loop structure you are using in VB6 is correct (Count vs. Count - 1)?

Clear file cache to repeat performance testing

What tools or techniques can I use to remove cached file contents to prevent my performance results from being skewed? I believe I need to either completely clear, or selectively remove cached information about file and directory contents.
The application that I'm developing is a specialised compression utility, and is expected to do a lot of work reading and writing files that the operating system hasn't touched recently, and whose disk blocks are unlikely to be cached.
I wish to remove the variability I see in IO time when I repeat the task of profiling different strategies for doing the file processing work.
I'm primarily interested in solutions for Windows XP, as that is my main development machine, but I can also test using linux, and so am interested in answers for that environment too.
I tried SysInternals CacheSet, but clicking "Clear" doesn't result in a measurable increase (restoration to timing after a cold-boot) in the time to re-read files I've just read a few times.
Use Sysinternals' RAMMap app.
The Empty / Empty Standby List menu option will clear the Windows file cache.
For Windows XP, you should be able to clear the cache for a specific file by opening the file using CreateFile with the FILE_FLAG_NO_BUFFERING options and then closing the handle. This isn't documented, and I don't know if it works on later versions of Windows, but I used this long ago when writing test code to compare file compression libraries. I don't recall if read or write access affected this trick.
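A minimal Win32 sketch of that trick, with a placeholder file name (and keeping in mind that the cache-flushing side effect is undocumented):
#include <stdio.h>
#include <windows.h>

int main(void)
{
    /* Opening with FILE_FLAG_NO_BUFFERING and then closing the handle is an
       undocumented way to drop the file's pages from the system cache. */
    HANDLE h = CreateFileA("C:\\testdata\\big.bin",   /* placeholder path */
                           GENERIC_READ,
                           FILE_SHARE_READ | FILE_SHARE_WRITE,
                           NULL,
                           OPEN_EXISTING,
                           FILE_FLAG_NO_BUFFERING,
                           NULL);
    if (h == INVALID_HANDLE_VALUE) {
        printf("CreateFile failed: %lu\n", GetLastError());
        return 1;
    }
    CloseHandle(h);   /* closing is what appears to evict the cached data */
    return 0;
}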
A command line utility can be found here
from source:
EmptyStandbyList.exe is a command line tool for Windows (Vista and above) that can empty:
process working sets,
the modified page list,
the standby lists (priorities 0 to 7), or
the priority 0 standby list only.
Usage:
EmptyStandbyList.exe workingsets|modifiedpagelist|standbylist|priority0standbylist
A quick googling gives these options for Linux
Unmount and mount the partition holding the files
sync && echo 1 > /proc/sys/vm/drop_caches
#include <fcntl.h>
int posix_fadvise(int fd, off_t offset, off_t len, int advice);
with advice option POSIX_FADV_DONTNEED:
The specified data will not be accessed in the near future.
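A small sketch of that call for a single file (the path is a placeholder; if the file was just written, call fdatasync first, because dirty pages are not dropped):
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/tmp/testdata.bin", O_RDONLY);   /* placeholder path */
    if (fd < 0) {
        perror("open");
        return 1;
    }
    /* Offset 0 and length 0 mean "the whole file". */
    int rc = posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
    if (rc != 0)
        fprintf(stderr, "posix_fadvise: %s\n", strerror(rc));
    close(fd);
    return 0;
}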
I've found one technique (other than rebooting) that seems to work:
Run a few copies of MemAlloc
With each one, allocate large chunks of memory a few times
Use Process Explorer to observe the System Cache size reducing to very low levels
Quit the MemAlloc programs
It isn't selective though. Ideally I'd like to be able to clear the specific portions of memory being used for caching the disk blocks of files that I want to no longer be cached.
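If you don't have MemAlloc at hand, a rough C stand-in for the same idea is simply to allocate and touch memory until allocation fails (sketch only; a single 32-bit process tops out around 2 GB, so several copies may be needed):
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    /* Grab memory in 256 MB chunks and touch every byte so the OS has to
       back the pages with physical RAM, squeezing out the file cache. */
    const size_t chunk = 256u * 1024 * 1024;
    for (;;) {
        char *p = malloc(chunk);
        if (p == NULL)
            break;                 /* stop when allocation fails */
        memset(p, 0xA5, chunk);    /* touch the pages */
        /* intentionally not freed; process exit releases everything */
    }
    puts("allocation failed - cache should be squeezed out; press Enter to exit");
    getchar();
    return 0;
}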
For a much better view of the Windows XP filesystem cache, try ATM by Tim Murgent - it lets you see both the filesystem cache Working Set size and the Standby List size in a more detailed and accurate view. For Windows XP you need the old version 1 of ATM, which is available for download here, since V2 and V3 require Server 2003, Vista, or higher.
You will observe that although Sysinternals CacheSet reduces the "Cache WS Min", the actual data continues to exist in the form of standby lists, from where it can be used until it has been replaced with something else. To replace it with something else, use a tool such as MemAlloc, flushmem by Chad Austin, or Consume.exe from the Windows Server 2003 Resource Kit Tools.
As the question also asked for Linux, there is a related answer here.
The command line tool vmtouch allows for adding and removing files and directories from the system file cache, amongst other things.
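For example, to evict one test file from the cache before a timing run (the path is a placeholder; -e is vmtouch's evict option):
vmtouch -e /data/testfile.bin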
There's a Windows API call, https://learn.microsoft.com/en-us/windows/desktop/api/memoryapi/nf-memoryapi-setsystemfilecachesize, that can be used to flush the file system cache. It can also be used to limit the cache size to a very small value. Looks perfect for these kinds of tests.
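A hedged C sketch of using it that way; passing (SIZE_T)-1 for both sizes is the convention commonly used to flush the cache working set (an assumption, not something spelled out on that page), and the call needs SeIncreaseQuotaPrivilege, so run it elevated:
#include <stdio.h>
#include <windows.h>

/* Enable a named privilege on the current process token. */
static BOOL EnablePrivilege(LPCSTR name)
{
    HANDLE token;
    TOKEN_PRIVILEGES tp;
    if (!OpenProcessToken(GetCurrentProcess(),
                          TOKEN_ADJUST_PRIVILEGES | TOKEN_QUERY, &token))
        return FALSE;
    if (!LookupPrivilegeValueA(NULL, name, &tp.Privileges[0].Luid)) {
        CloseHandle(token);
        return FALSE;
    }
    tp.PrivilegeCount = 1;
    tp.Privileges[0].Attributes = SE_PRIVILEGE_ENABLED;
    BOOL ok = AdjustTokenPrivileges(token, FALSE, &tp, 0, NULL, NULL)
              && GetLastError() == ERROR_SUCCESS;
    CloseHandle(token);
    return ok;
}

int main(void)
{
    if (!EnablePrivilege("SeIncreaseQuotaPrivilege")) {
        printf("could not enable SeIncreaseQuotaPrivilege\n");
        return 1;
    }
    /* (SIZE_T)-1 for both limits is the commonly used "flush" convention. */
    if (!SetSystemFileCacheSize((SIZE_T)-1, (SIZE_T)-1, 0)) {
        printf("SetSystemFileCacheSize failed: %lu\n", GetLastError());
        return 1;
    }
    printf("file cache working set flushed\n");
    return 0;
}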

Resources