ANTS 4 OMP abort error - ants

I have downloaded the trial versions of ANTS and dotTrace profilers.
When profiling in ANTS, at a certain point the program stops and I get this error in a Console window:
system error(-2147467259): __kmp_setaffinity: SetThreadAffinityMask: Unspecified error
OMP abort: fatal system error detected.
The program profiles without drama in dotTrace. What does the error mean, and is it a problem with my code or with ANTS?

ANTS Profiler needs to set the thread affinity to avoid implementing complicated cross-thread synchronization, which would slow down the application and seriously complicate the Profiler code.
This is done using the Win32 API function SetThreadAffinityMask. The failure message leaves a lot to be desired, but as a workaround you could open
%userprofile%\Local Settings\Application Data\Red Gate\ANTS Performance Profiler 5\PerformanceProfilerSettings.xml
and change UseThreadAffinity to False.
If any problems occur during profiling, you could try downloading a third-party program to set the affinity for your application.
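For context on what the profiler is setting: the mask passed to SetThreadAffinityMask is a bit field in which bit i corresponds to logical processor i. The following is a minimal Python sketch of that encoding only. It does not call the Win32 API, and the helper names cpus_to_mask and mask_to_cpus are made up for illustration.

```python
def cpus_to_mask(cpus):
    """Build an affinity mask: bit i set means logical processor i is allowed."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return mask


def mask_to_cpus(mask):
    """Decode an affinity mask back into a sorted list of processor indices."""
    return [i for i in range(mask.bit_length()) if mask & (1 << i)]


if __name__ == "__main__":
    print(hex(cpus_to_mask([0])))   # mask that allows only processor 0
    print(mask_to_cpus(0b1010))     # processors allowed by the mask 0b1010
```

A mask of 1 therefore pins a thread to the first logical processor, which is why a process with that mask never uses more than one core.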

Related

Can SysInternals' Process Monitor log when a thread blocks waiting for an event?

I need to diagnose a server that is unable to reach peak performance. CPU usage drops to zero for around 500 ms and then spikes to 100% while the server processes the queued requests. This pattern repeats for a number of hours, after which operation becomes smooth again. (Operation had been smooth for years.)
This suggests to me that the worker threads are idling while waiting for an external event to occur. The application is complex and we haven't been able to pinpoint the culprit.
Can Process Monitor be configured to log every time a thread sleeps while waiting for some event?
If possible, can the event be related to a particular stack trace?
If the above is possible, perhaps I could correlate the CPU drops with wait events and pinpoint the culprit.
I have successfully used WinDbg before to diagnose these kinds of problems. In this case, however, the wait is very brief and I'm not confident that I can make the debugger break exactly while the processor is idling.
WinDbg and ProcMon are not the right tools for this job. Install the Windows Performance Toolkit, which is part of the Windows 10 SDK, on your development machine.
Now xcopy the folder C:\Program Files (x86)\Windows Kits\10\Windows Performance Toolkit to the server, open cmd.exe as admin, run wpr.exe -start CPU && timeout -1 && wpr.exe -stop C:\Hang.etl, and minimize the cmd window.
Once the hang has occurred, switch back to cmd and press a key to stop logging.
Move Hang.etl and the NGENPDB folder to the dev PC, open Hang.etl in Windows Performance Analyzer (WPA.exe), load debug symbols, and start looking for the hang by adding the CPU Usage (Precise) graph to the analysis pane.
Make sure you see the columns NewProcess, NewThreadId, NewStack, ReadyingProcess, ReadyingThreadId, ReadyingStack, and Waits(us). Click on Waits(us) to sort the longest waits to the top. Now look for long wait times with a small Count (a few operations that each take a long time, rather than many short ones) and inspect the call stack for clues about what is happening.
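The "long wait, small Count" rule of thumb can be sketched in Python as follows. This is only an illustration of the filtering idea, not something WPA provides: the WaitRow type, its field names, the 100 ms threshold, and the sample numbers are all assumptions, loosely modeled on the columns named above.

```python
from dataclasses import dataclass


@dataclass
class WaitRow:
    # Hypothetical fields mirroring the WPA columns NewThreadId, Count and Waits(us).
    new_thread_id: int
    count: int        # number of context switches aggregated in this row
    waits_us: float   # total wait time in microseconds


def longest_waits_per_switch(rows, min_avg_wait_us=100_000):
    """Return rows whose average wait per switch meets the threshold, longest first."""
    flagged = [
        r for r in rows
        if r.count > 0 and r.waits_us / r.count >= min_avg_wait_us
    ]
    return sorted(flagged, key=lambda r: r.waits_us / r.count, reverse=True)


if __name__ == "__main__":
    sample = [  # made-up numbers, for illustration only
        WaitRow(new_thread_id=4242, count=3, waits_us=1_500_000),
        WaitRow(new_thread_id=1010, count=20_000, waits_us=2_000_000),
        WaitRow(new_thread_id=7, count=1, waits_us=200_000),
    ]
    for row in longest_waits_per_switch(sample):
        print(row.new_thread_id, row.waits_us / row.count)
```

In this made-up sample, thread 1010 has the largest total wait but is filtered out because it is spread over many short waits; the rows that remain are the ones whose stacks are worth inspecting.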

Differences between Nsight debug launch and normal OS launch

I'd like to know what sets the "Debug with Nsight" option apart from simply executing the binary through Visual Studio or the OS's command line.
I ask because my program works fine when I run it with "Debug with Nsight", but when I launch it with Visual Studio's launch button (or simply launch the executable) I get a driver crash followed by a few unspecified cudaErrors from some cudaMemcpy calls. This leads me to believe that Nsight must use some specific launch parameters that the program needs in order to run correctly.
The driver crash followed by API errors occurs when your app hits a Windows TDR (Timeout Detection and Recovery) event because a kernel takes too long to execute. You can work around this by modifying the system registry, by putting a Quadro or Tesla GPU in TCC mode, or by reducing the run time of your kernel(s).
When you debug with Nsight, your kernel execution may get halted for various reasons (single-stepping, breakpoints, and so on) and then restarted, depending on what exactly you are doing in your debug session. Halting the kernel execution allows the Windows watchdog to be satisfied without a TDR event.
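One common way to reduce the run time of an individual kernel is to split the work into several shorter launches, each handling a slice of the data. The bookkeeping for that can be sketched in Python (not CUDA). The name chunk_ranges and its parameters are hypothetical, and a suitable chunk size depends on your kernel and GPU.

```python
def chunk_ranges(total_items, max_items_per_launch):
    """Yield (offset, count) pairs that cover total_items in bounded pieces."""
    if max_items_per_launch <= 0:
        raise ValueError("max_items_per_launch must be positive")
    offset = 0
    while offset < total_items:
        count = min(max_items_per_launch, total_items - offset)
        yield offset, count
        offset += count


if __name__ == "__main__":
    # Each pair would correspond to one shorter kernel launch over
    # items [offset, offset + count) instead of one long launch over everything.
    for offset, count in chunk_ranges(total_items=10, max_items_per_launch=4):
        print(offset, count)
```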
The CUDA Nsight debugger allows you to debug CUDA kernels line by line, which you can't do with the standard Visual Studio debugger.
Presumably Nsight performs some code injection to enable it to detect the run time of kernels. It's also possible, depending on your settings, that your kernels are not executing on the GPU when you debug with Nsight. Either of these could cause errors to come and go between debuggers. I saw similar inconsistencies when I used them.
If you run your program through the Nsight profiler, it should be able to clearly log the memcpy errors for you.

Does cuda-gdb automatically pick the correct focus for a CUDA_EXCEPTION_5 Warp out-of-range address error?

I have a kernel that's failing with CUDA_EXCEPTION_5, Warp Out-of-range Address.
cuda-gdb automatically places the focus on a specific block. Is that the block where the error is occurring? Or, when the NVIDIA documentation states that CUDA_EXCEPTION_5 "is not precise", does it mean that the debugger can't determine which block/thread the exception occurred in?
If it's only granular to the warp, is there a way to find out within cuda-gdb which warp the exception occurred in and which blocks belong to that warp?
cuda-gdb is not always able to precisely detect the thread where the exception was thrown. To increase the precision, enable memcheck integration before starting your application by using the "set cuda memcheck on" command. Please note that running the application with integrated memcheck enabled degrades performance.
In CUDA 5.0, cuda-memcheck used from within cuda-gdb has memory access error detection capabilities similar to those of the standalone tool. The standalone cuda-memcheck application has additional capabilities, such as detection of race conditions and memory leaks, as well as the ability to continue past the first error. cuda-memcheck is not related to autostep, and when integrated with cuda-gdb it will stop the application precisely at the first detected error caused by an out-of-bounds or misaligned memory access.

GPU code gives different time when run from VS2008 and when running only .exe

I have CUDA events in my code to record the execution time. When I click "Start Debugging" in VS 2008, the timer gives a value of 1.5 seconds. However, when I run the program from the .exe file, it gives a time of 0.4 seconds. Why the difference?
There's no inherent reason that running attached to the debugger should introduce a performance difference, other than (just a few possibilities):
Do you have any conditional breakpoints set? Depending on the condition, these can have a dramatic impact on execution time.
Are you explicitly writing large amounts of data to the Debug or Trace listeners? (Edit: that's relevant for C#, probably not for C++.)
Is the EXE compiled in Release mode? By default, the Release config turns on optimizations that aren't present when building in Debug mode.
Is your timing code really only timing the relevant section? If you're starting the timer at the start of program execution instead of around the GPU calls that you're really interested in, you may be accidentally timing some startup tasks that are tied to running with the debugger that won't be active in a standalone app.
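The last point, about timing only the relevant section, can be illustrated with a small Python sketch. It uses wall-clock timers rather than CUDA events, and the names timed, startup and work are placeholders invented for this example; the sleeps merely stand in for real work.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Record the wall-clock duration of the enclosed block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


def startup():
    time.sleep(0.05)   # stand-in for one-off startup work


def work():
    time.sleep(0.01)   # stand-in for the section you actually care about


if __name__ == "__main__":
    results = {}
    with timed("whole_program", results):
        startup()
        with timed("work_only", results):
            work()
    print(results)
```

If the two figures differ a lot, the outer timer is mostly measuring startup cost, which may well vary between a debugger launch and a standalone run.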

What causes the MS Windows 'System' Process to go nuts when compiling?

A couple of times recently I have noticed that 'something' is causing the Windows System process to sit at 50+% CPU, and it will not quit until the PC is rebooted. It has happened on Win2k and Win XP so far.
This is particularly troublesome because it currently appears to be triggered by MSVC 2005/Incredibuild and rebooting the build servers is not a nice thing.
At the same time the 'System Idle Process' is holding the rest of the CPU and the build steps themselves seem to be starved, i.e. a module that normally takes <5 minutes to compile is currently taking 20+.
I'd take a few guesses at it maybe being the virus checker or TortoiseSVN, but would desperately like some other suggestions.
Edit:
I've been experiencing this as something that is triggered, and the culprit may not be ongoing. That's not to say that some other ongoing process hasn't done something 'stupid' and is managing an active lock-up of System while appearing to be idle itself.
System (100% of 1 core), and System Idle Process are sharing 98-100% of the total CPU.
Occasionally mt.exe, link.exe and buildservice would get a look in at 1-2%.
I'm running VNC to view the machine, so it's getting a look in on occasion.
Edit 2:
When I left the previous evening the build process seemed to be progressing, albeit slowly, but after waiting another 13 hours the 1 hour build process hasn't completed. System is still hogging the 1 core.
My understanding is that the "System" process is the time spent in the kernel (so performing disk I/O, network I/O (you did mention Incredibuild) and the like). I'd check for disk fragmentation and virus checkers, and possibly look at these on other machines in your Incredibuild cluster.
As the System Idle Process runs at "Low" priority, it's a red herring that it'd be "taking up CPU time"; if anything it's just showing that there is CPU time available. The fact that the processing is stuck on a single processor shows that the process is doing something that is not multi-core aware, or that someone has set its thread affinity to 1.
I've noticed the virus checking software that I use can radically slow down compilation, but the slowdown does not extend beyond the end of the build. Turning off advanced and heuristic checking improves this to the extent that I do not have to disable the scanner entirely. I have changed my scanning strategy so that I now rely on scheduled full scans more than advanced on-the-fly scanning, as the latter hurts the performance of a number of apps. (N.B. I am using the latest cut of Kaspersky.) I'm also using an automated backup tool (AJCBackup) that needs to be restrained when compiling.
You may also want to consider disabling the Windows Indexing service on drives that are used to create a lot of temporary and object files, as it doesn't provide much value in this context for the amount of performance it draws.
Edit: Have you checked which processes are actually hogging the CPU core and traced them back to a given app?
We've encountered issues with Kaspersky and Incredibuild in our offices - compiles and sometimes links will just hang and never finish.
It only seems to affect some machines though, which is weird, and only Windows XP (Vista seems immune from what I've seen).
Only solution I've found so far is to turn Kaspersky off entirely - so if you find a solution then let me know!
RE: smacl, work from the Windows Search/Indexing Service (WSearch) won't be attributed to the System process's CPU time, it should come from the SearchIndexer.exe/SearchFilterHost.exe services (Vista+).
The majority of activity from System you will see will be in disk activity from the lazy writer and other disk accesses. CPU activity from System will be because of kernel activity such as drivers (ISRs/DPCs) and other kernel-level filters (which could include AV file and process filters).
Process Explorer (http://technet.microsoft.com/en-us/sysinternals/bb896653.aspx) can aid in viewing CPU usage across processes, including System. You can use the public Microsoft Symbol Server and this resource to get you started.
If you can take a trace with Xperf (http://msdn.microsoft.com/en-us/performance/cc825801.aspx), I can help you analyze where the CPU time is being spent in the System (kernel) context. Xperf isn't officially supported on XP, but you can take a trace on XP and analyze it on other systems.
Xperf and Process Explorer should be able to shine a spotlight on exactly the module(s) that are causing the runaway CPU usage. Symbols may not even be necessary to diagnose the problem; simply the module name can often point to the component in question that is slowing down your system. For example, high CPU usage from ndis.sys can point to network interrupts, or activity from modules such as aavmker4.sys can point to AV software (Avast! in this case).
And as always, check if there are any updated drivers and AV software for your system.
In my office, a conflict between Incredibuild and Spyware Doctor's Immunize feature caused similar issues. Turning off Immunize solved it for us.
What anti-virus/malware do you use?
I'm having the same hangs when compiling using IncrediBuild in VS2003, on a clean Windows 7 install without any anti-virus. It worked fine on the same box in XP and Vista.
