Dump file analysis - Windows

Recently I started seeing an issue on a few servers where CPU usage is higher than the usual trend. I am trying to find the root cause, so I took a dump of the w3wp process from Task Manager (right-click the process and create a dump).
The .dmp file is 14 GB and I am trying to analyze it with WinDbg, but the tool is not working and I am getting this message:
I also took a few minidumps; some of them open fine while others do not, so it is not a 32-bit vs. 64-bit mix-up. (The collected dump is 64-bit.)
I would like to know what is causing this. Is it the file size, or am I not taking the dump properly?
I checked a link on this, but it was not helpful.

WinDbg is not the right tool for this job. Dumps are only snapshots, so you have no idea what happened before. Use ETW instead, specifically CPU sampling, which sums all calls and shows the CPU usage in detail.
Install the Windows Performance Toolkit, which is part of the Windows 10 SDK (V1607 works on Win8/8.1 (Server 2012/R2) and Win10; use the V1511 SDK for Windows 7/Server 2008 R2), run WPRUi.exe, select CPU Usage and press Start. Capture 1-2 minutes of the high CPU usage, then click Save. Open the generated ETL with WPA.exe (Windows Performance Analyzer), drag and drop the CPU Usage (Sampled) graph to the analysis pane and load the debug symbols. Now select your process in the graph, zoom in and expand the stack; there you see the weight of the CPU usage of all calls.
In this sample most CPU usage from Internet Explorer comes from HTML stuff.
For .NET applications, WPA shows you .NET-related groupings like GC or JIT:
Expand the stack of the w3wp process to see what it is doing. The function names should give you a clue about what is happening.
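If clicking through WPRUi.exe on a server is impractical, the same kind of recording can probably be scripted. A minimal Python sketch, assuming wpr.exe (the command-line recorder that ships with the Windows Performance Toolkit) is on PATH, that its built-in CPU profile is what you want, and that the script runs from an elevated prompt. The function names and the output file name are made up for illustration:

```python
import subprocess
import time


def wpr_start_command(profile="CPU"):
    """Command line that starts a WPR recording with a built-in profile."""
    return ["wpr", "-start", profile]


def wpr_stop_command(etl_path):
    """Command line that stops the recording and saves it to etl_path."""
    return ["wpr", "-stop", etl_path]


def record_cpu_trace(etl_path, seconds=90, run=subprocess.run):
    """Record for `seconds` (the answer suggests 1-2 minutes), then save the ETL."""
    run(wpr_start_command(), check=True)
    try:
        time.sleep(seconds)
    finally:
        # Always stop, otherwise the recording keeps running in the background.
        run(wpr_stop_command(etl_path), check=True)
    return etl_path
```

Calling record_cpu_trace("high_cpu.etl") while the CPU is busy should leave an ETL file that opens in WPA the same way as one saved from the UI.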

Related

Debugging memory usage in a very short lived application without Windows

I have a console application that runs for less than 500ms, but that according to BenchmarkDotNet allocates more than 100 MB.
I am trying to figure out what those 100 MB are, because it does not add up. However, I cannot find a tool to do so on Linux or Mac. Once the method the app calls is over, the GC can clean up all that memory without problems, so it is not a leak I can see in a dump, unless I take the dump at the exact moment before exiting the method. It is also not clear to me at which moment the algorithm's memory usage peaks.
I can take CPU traces using dotnet-trace and show it in the browser with Speedscope, but I cannot show in Speedscope a trace when using gc-verbose or gc-collect as provider.
Is there a way with dotnet-trace to print in the console the stats of the created objects or anything like that?
Try out dotnet dump and check out this article by Tess Ferrandez.
Maybe you can also share a little more information, please.

Memory dump for period of time

When a program is misbehaving, it is pretty easy to capture a memory dump of the process, and then analyze it with a tool like WinDBG. However, this is pretty limited, you only get a snapshot of what the process is doing, and in some cases finding why a certain part of the code was reached is really difficult.
Is there any way of capturing memory dumps for a period of time, like recording a movie rather than taking a picture, which would indicate what changed in that period of time, and the parts of the code that were executed in that time interval?
Recording many memory dumps
"Is there any way of capturing memory dumps for a period of time, like recording a movie rather than taking a picture"
Yes, that exists. It's called ProcDump, and you can define the number of dumps with the -n parameter and the seconds between dumps with -s. It might not work well for small values of -s, because writing a full dump can take longer than the interval itself.
Example:
procdump -ma -n 10 -s 1 <PID> ./dumps
However, this technique is usually not very helpful, because you now have 10 dumps to analyze instead of just 1 - and analyzing 1 dump is already difficult. AFAIK, there's no tool that would compare two dumps and give you the differences.
Live debugging
IMHO, what you need is live debugging. And that's possible with WinDbg, too. Development debugging (using an IDE) and production debugging are two different skills. So you don't need to install a complete IDE such as Visual Studio on your customer's production environment. Actually, if you copy an existing WinDbg installation onto a USB stick, it will run portable.
Simply start WinDbg, attach to a process (F6), start a log file (.logopen), set up Microsoft symbols, configure exceptions (sx) and let the program run (g).
Remote debugging
Perhaps you may even want to have a look into WinDbg's remote debugging capabilities, however, that's a bit harder to set up, usually due to IT restrictions (firewall etc.).
Visual Studio also offers remote debugging, so you can use VS on your machine and just install a smaller program on your customer's machine. I have hardly any experience with it, so I can't tell you much.
Logging
"the parts of the code that were executed in that time interval?"
The most typical approach I see companies apply is turning on the logging capabilities of the application.
You can also record useful data with WPT (Windows Performance Toolkit), namely WPR (Windows Performance Recorder) and later analyze it with WPA (Windows Performance Analyzer). It will give you call stacks over time.

Visual Studio 100% disk usage

I have VS 2013 and Microsoft Windows 8.1
The issue appeared at the end of last week. Without any update or significant change, disk usage reaches 100% when I do certain things in VS. For example, when I click the "Check In" button in the "Team Explorer" window, disk usage rises to 100%. Sometimes a simple right-click in the text editor triggers the problem.
I googled the 100% disk usage problem and found some reports about it on Windows 8.1, but on my computer all other applications run without any problem; only VS2013 has the "full disk usage" problem.
Some information about my system:
OS Name: Microsoft Windows 8.1 Pro
OS Version: 6.3.9600 N/A Build 9600
System Type: x64-based PC
Processor(s): 1 Processor(s) Installed.
Intel64 Family 6 Model 60 Stepping 3 GenuineIntel ~3500 Mhz
Total Physical Memory: 8,131 MB
Available Physical Memory: 3,836 MB
Virtual Memory: Max Size: 10,947 MB
Virtual Memory: Available: 5,275 MB
Virtual Memory: In Use: 5,672 MB
Page File Location(s): C:\pagefile.sys
(Comment for others landing here, since @Marta explains that the problem no longer persists on their machine.)
In general, any performance issue in Visual Studio should be reported to Microsoft. It's easy to do this directly from VS using the Report a Problem tool. That feature will automatically attach logs/traces which are shared privately with Microsoft. Internally, tooling will analyse those attachments to assign a ticket to the relevant team. With such attachments, there is a high likelihood that the problem can be diagnosed and fixed in a future release of Visual Studio.
Instructions on the Report a Problem tool:
https://learn.microsoft.com/en-us/visualstudio/ide/how-to-report-a-problem-with-visual-studio?view=vs-2019
If you prefer to diagnose high disk IO yourself, FileMon can be a useful tool:
https://learn.microsoft.com/en-us/sysinternals/downloads/filemon
I am using Visual Studio Code 1.71.2 as well as Visual Studio 2022 (Community Edition) on Windows 10, and I am facing the same issue.
After a lot of checking, I found that disabling Superfetch mitigates the issue, but then Windows and application startup take a long time.
As a workaround, I found that clearing the %temp% folder after using Visual Studio or VS Code eliminates the issue, and disk activity returns to normal.
But I may not remember this cleanup every time, and I hate forgetting it :(
Hope this helps someone in a similar situation.
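Since the cleanup is easy to forget, it could be scripted and run on a schedule. A rough Python sketch, assuming that only files untouched for at least a day should go and that anything still locked by a running program should simply be skipped. The function names and the one-day threshold are my own choices, not something the workaround above prescribes:

```python
import os
import tempfile
import time


def stale_temp_files(root=None, older_than_days=1.0, now=None):
    """Return paths of files under root not modified for older_than_days."""
    root = root or tempfile.gettempdir()  # resolves to %TEMP% on Windows
    now = time.time() if now is None else now
    cutoff = now - older_than_days * 86400
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.path.getmtime(path) < cutoff:
                    stale.append(path)
            except OSError:
                pass  # file vanished or is inaccessible
    return stale


def clean_temp(root=None, older_than_days=1.0, dry_run=True):
    """Delete stale files; with dry_run=True only report what would be deleted."""
    selected = []
    for path in stale_temp_files(root, older_than_days):
        if not dry_run:
            try:
                os.remove(path)
            except OSError:
                continue  # still open in some program; skip it
        selected.append(path)
    return selected
```

Run clean_temp() first to see what would be removed, and clean_temp(dry_run=False) only once the list looks safe.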
It could be related to Visual Studio updates - which would show under C:\ProgramData\Package Cache.
A disk space management tool like TreeSize Pro will help figure it out though ... it will show which directory is using the most space. You can then target what aspect of Visual Studio is eating up your drive space.
There is a free trial at https://www.jam-software.com/treesize/
You can also use this tool to export the usage data, or post a screenshot of it here; that may help identify what is going on.
I had a similar issue that turned out to be the built-in git provider having issues with large codebases containing a moderate-to-large amount of changes before a commit.
Changing to a third-party one fixed the issue.
The operating system manages the resources (CPU cores, disk drives, GPU) to deliver what you have asked of it.
Ideally (what the OS designers are hoping for), when you perform an action, all the resources spin up into action and, thanks to a well-balanced system, they all go to 100% utilization for a brief length of time, then go back to idle.
This form of utilization is, in practice, impossible to achieve, as the PC builders would have to know what your system is going to be used for.
When Task Manager describes the CPU as 100% utilized, it means that all the cores on the box are busy running code and are the bottleneck.
When Task Manager describes the disk as 100% utilized, it means (as far as I can tell) that there is always a queue of items to be read from or written to the disk. Even at 100% utilization, it may be that the metric is the only reason you are concerned and the system is otherwise responsive.
In either of these cases, it shows that for a given workload, the CPU or the disk drive has become the rate-determining step.
In practice, it should not matter unless the system stays at 100% for longer than a few minutes or your machine otherwise feels sluggish.
Further diagnosis can be performed using Sysinternals Process Monitor (procmon) or the Microsoft ADK.
Using procmon, I would look at which files are being accessed during the 100% disk usage period, and decide whether:
the behavior is sensible (if not, raise a bug with Microsoft);
the machine is working usably (if not, consider a hybrid or SSD disk).
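One way to do that triage is to save the procmon capture as CSV and count I/O events per file. A rough Python sketch, assuming the export has procmon's usual "Process Name", "Operation" and "Path" columns. The function name and the choice of ReadFile/WriteFile as the interesting operations are mine:

```python
import csv
import io
from collections import Counter


def busiest_paths(csv_text, top=5, operations=("ReadFile", "WriteFile")):
    """Count read/write events per (process, path) in a procmon CSV export."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row.get("Operation") in operations:
            counts[(row.get("Process Name"), row.get("Path"))] += 1
    # Most frequently touched files first.
    return counts.most_common(top)
```

For a saved file, something like busiest_paths(open("Logfile.CSV", encoding="utf-8-sig").read()) should work; the file name is hypothetical, and the utf-8-sig encoding is there because the export may start with a byte order mark.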
I've had some exasperating problems with disk usage and Source Control Explorer.
What fixed the issue for me was never opening Source Control Explorer in more than one project at a time, keeping it closed whenever I could, and limiting the number of VS instances I had open.
An SSD may solve this... Are you sure this is caused by Visual Studio? When I was using Windows 8.1, Windows Defender pushed disk usage to 100% from time to time. If you're sure it occurs when you use Visual Studio, you can try to repair it using the installer. Hope this helps.
Try moving the source code to an SSD.
HDDs have much slower disk I/O performance than SSDs.
On many recent Windows machines, the C: drive is an SSD.

VS2013 C++ : slow linking process

Have you ever experienced a slow linking process when building a C++ project (I'm using VS2013)? Linking takes 15 minutes after a 5-minute build. A rebuild or a Visual Studio restart does not fix the issue.
Task Manager shows mspdbsrv.exe using a full core (25% on a 4-core machine), and PDB file creation is very slow: the file grows KB by KB until it reaches 50 MB.
Additional info:
The same project builds and links in acceptable time on another computer.
Antivirus is disabled.
I've tried changing the PDB file creation location: no success.
I've set the linker output to Verbose, but it stops at one debug row and then waits there the whole time.
VS2013 is updated to SP4.
The hard disk activity LED on my laptop is mostly off, so the disk is not fully busy.
I'm running Windows 7 Pro.
VS2013 is run with administrator rights.
Thank you.
Fixed by changing "Debug Information Format" from /ZI (Edit and Continue) to /Zi. I still have no clue why this causes slow linking only on one specific computer and not on the others. Hope this helps people coming here with the same issue.
Is it using Link-Time Code Generation? On VS2010 that step was single threaded so it takes much longer than the apparent compile step. On later versions this step was improved to be multithreaded, according to blog posts. It also mentioned that PDB access is a bottleneck. So, maybe you have an old mspdbsrv.exe on that system? Look at the version info on the file, compared across machines. There might also be some option to control its locking and multi-use behavior, hidden away someplace.
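If you collect the mspdbsrv.exe file version from each machine (for example from the Details tab of the file's properties), comparing them as plain strings can mislead, since "12.0.9.0" sorts after "12.0.10.0" as text. A small Python sketch of a numeric comparison; the machine names and version numbers below are made up, and the function names are hypothetical:

```python
def parse_version(text):
    """Turn a dotted version string such as '12.0.200.0' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def outdated_machines(versions):
    """Return the machines whose version is older than the newest one seen."""
    newest = max(parse_version(v) for v in versions.values())
    return sorted(
        name for name, v in versions.items() if parse_version(v) < newest
    )


# Made-up sample data: machine name -> file version of mspdbsrv.exe.
versions = {
    "slow-laptop": "12.0.100.1",
    "desktop": "12.0.200.0",
    "buildbox": "12.0.200.0",
}
```

With this sample data, outdated_machines(versions) singles out "slow-laptop" as the machine to look at first.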

What causes the MS Windows 'System' Process to go nuts when compiling?

A couple of times recently I have noticed that 'something' is causing the Windows System process to sit at 50+% CPU, and it will not stop until the PC is rebooted. This has happened on Win2k and Win XP so far.
This is particularly troublesome because it currently appears to be triggered by MSVC 2005/Incredibuild, and rebooting the build servers is not a nice thing.
At the same time the 'System Idle Process' is holding the rest of the CPU and the build steps themselves seem to be starved, i.e. a module that normally takes <5 minutes to compile is currently taking 20+.
I'd guess at a virus checker or TortoiseSVN, but would desperately like some other suggestions.
Edit:
I've been experiencing this as something that is triggered, and the culprit may not be ongoing. That's not to say that some other ongoing process hasn't done something 'stupid' and is keeping System locked up while appearing to be idle itself.
System (100% of 1 core) and System Idle Process are sharing 98-100% of the total CPU.
Occasionally mt.exe, link.exe and buildservice get a look in at 1-2%.
I'm running VNC to view the machine, so it's getting a look in on occasion.
Edit 2:
When I left the previous evening, the build process seemed to be progressing, albeit slowly, but after another 13 hours the 1-hour build process still hasn't completed. System is still hogging one core.
My understanding is that the "System" process represents time spent in the kernel (performing disk I/O, network I/O (you did mention Incredibuild) and the like). I'd check for disk fragmentation and virus checkers, and possibly look at these on other machines in your Incredibuild cluster.
As the System Idle Process runs at "Low" priority, it's a red herring that it would be "taking up CPU time"; if anything, it just shows that there is CPU time available. The fact that the processing is stuck on a single processor shows that the process is doing something that is not multi-core aware, or that someone has set its thread affinity to 1.
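To make the arithmetic concrete, here is a rough Python sketch of how per-core load maps to the whole-machine percentage Task Manager shows. It assumes a two-core build server, which is only a guess that would match System pinning one core and showing up at roughly 50%; the function names are made up for illustration:

```python
def core_share_percent(busy_cores, total_cores):
    """Whole-machine CPU percentage used by `busy_cores` fully busy cores."""
    return 100.0 * busy_cores / total_cores


def idle_percent(process_percents):
    """The idle figure is just what is left after every real process."""
    return max(0.0, 100.0 - sum(process_percents))


# Assumed two-core box: System pins one core, the linker gets about 2%.
system_share = core_share_percent(1, 2)    # 50.0
leftover = idle_percent([system_share, 2.0])  # 48.0
```

So a large System Idle Process number here means the second core has nothing to do, not that idle is taking time away from the build.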
I've noticed that the virus-checking software I use can radically slow down compilation, but the slowdown does not extend beyond the end of the build. Turning off advanced and heuristic checking improves this to the extent that I do not have to disable the scanner entirely. I have changed my scanning strategy so that I now rely on scheduled full scans more than advanced on-the-fly scanning, as the latter hurts the performance of a number of apps. (N.B. I am using the latest cut of Kaspersky.) I'm also using an automated backup tool (AJCBackup) that also needs to be restrained when compiling.
You may also want to consider disabling the Windows Indexing Service on drives that are used to create a lot of temporary and object files, as it doesn't provide much value in this context for the amount of performance it costs.
Edit: Have you checked which processes are actually hogging the CPU core and traced them back to a given app?
We've encountered issues with Kaspersky and Incredibuild in our offices: compiles and sometimes links just hang and never finish.
It only seems to affect some machines, though, which is weird, and only Windows XP (Vista seems immune from what I've seen).
The only solution I've found so far is to turn Kaspersky off entirely, so if you find a solution then let me know!
RE: smacl, work from the Windows Search/Indexing Service (WSearch) won't be attributed to the System process's CPU time, it should come from the SearchIndexer.exe/SearchFilterHost.exe services (Vista+).
The majority of activity from System you will see will be in disk activity from the lazy writer and other disk accesses. CPU activity from System will be because of kernel activity such as drivers (ISRs/DPCs) and other kernel-level filters (which could include AV file and process filters).
Process Explorer (http://technet.microsoft.com/en-us/sysinternals/bb896653.aspx) can aid in viewing CPU usage across processes, including System. You can use the public Microsoft Symbol Server and this resource to get you started.
If you can take a trace with Xperf (http://msdn.microsoft.com/en-us/performance/cc825801.aspx), I can help you analyze where the CPU time is being spent in the System (kernel) context. Xperf isn't officially supported on XP, but you can take a trace on XP and analyze it on other systems.
Xperf and Process Explorer should be able to shine a spotlight on exactly the module(s) that are causing the runaway CPU usage. Symbols may not even be necessary to diagnose the problem; simply the module name can often point to the component in question that is slowing down your system. For example, high CPU usage from ndis.sys can point to network interrupts, or activity from modules such as aavmker4.sys can point to AV software (Avast! in this case).
And as always, check if there are any updated drivers and AV software for your system.
In my office, a conflict between Incredibuild and Spyware Doctor's Immunize feature caused similar issues. Turning off Immunize solved it for us.
What anti-virus/malware do you use?
I'm having the same hangs when compiling with IncrediBuild in VS2003 on a clean Windows 7 install without any antivirus. It worked fine on the same box under XP and Vista.

Resources