Using Visual Studio #pragma to make project high priority - visual-studio-2010

I'm trying to test my project at high priority because the wall-clock timing I have in code seemed too long. I have already tried setting my .exe to run at high priority from the command line; it reported the same time. Someone suggested that I use Visual Studio's #pragma to make it high priority, but I cannot find how to do that.
Note, my current project is single thread.
Is there any other way?
The specific task is timing the sound initialization of the soundcard (digital->analog) and the DAQ (analog->digital). It's taking 0.998 to 1.1013 seconds, and I'm going to need it to be more precise because I'm using the DAQ to acquire data from a process that only takes around 10 ms.

If a certain function in your software needs to run at high priority, use the Windows functions SetThreadPriority or SetPriorityClass.
As soon as running at high priority is no longer required, your software should revert to normal priority. Do not try to keep your application running permanently at "high priority". That would be hostile behavior :)
More information at:
msdn.microsoft.com/en-us/library/windows/desktop/ms685100.aspx
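A minimal sketch of that pattern in C++ (assuming the timing-critical work lives in a single function):

#include <windows.h>

void RunTimingCriticalWork()
{
    HANDLE process = GetCurrentProcess();
    DWORD previous = GetPriorityClass(process);   // remember the old class

    // HIGH_PRIORITY_CLASS is usually enough; REALTIME_PRIORITY_CLASS can
    // starve system threads and requires elevated privileges.
    SetPriorityClass(process, HIGH_PRIORITY_CLASS);

    // ... the timing-sensitive code goes here ...

    // Revert as soon as the critical work is done.
    SetPriorityClass(process, previous);
}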

Related

Windows Timer Resolution vs Application Priority vs Processor scheduling

Please clarify once more the technical difference between these three things on MS Windows systems. First is the timer resolution, which you can set and query via the undocumented ntdll.dll functions NtSetTimerResolution and NtQueryTimerResolution, or inspect with Sysinternals' clockres.exe tool.
This was one of the scandalous tricks used by the Chrome browser some time ago to perform better across the web (at the moment they have kept the high-resolution trick only for the Flash plugin). https://bugs.chromium.org/p/chromium/issues/detail?id=153139
https://randomascii.wordpress.com/2013/07/08/windows-timer-resolution-megawatts-wasted/
In fact, Visual Studio and SQL Server do the trick in some cases as well. I personally feel it makes the whole system perform better and more crisply, rather than slowing it down as many people out there warn.
Second: what is the difference between the timer resolution and the application I/O and memory priority (realtime/high/above normal/normal/low/background/etc.) you can set via Task Manager, apart from the fact that the timer resolution applies to the whole system rather than a single application?
Third: what is the difference between those and the Processor scheduling option you can adjust from CMD > SystemPropertiesPerformance.exe -> Advanced tab? By default, client OS versions (XP/Vista/7/8/8.1/10) favor the performance of programs, while server versions (2k3/2k8/2k12/2k16) favor background services. How does this option interact with the two above?
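For reference, a minimal sketch of the ntdll.dll route mentioned above; since the function is undocumented it has to be resolved at run time, and the resolution argument counts 100 ns units:

#include <windows.h>

typedef LONG (NTAPI *NtSetTimerResolutionPtr)(ULONG DesiredResolution,
                                              BOOLEAN SetResolution,
                                              PULONG CurrentResolution);

int main()
{
    // ntdll.dll is already mapped into every process.
    NtSetTimerResolutionPtr setRes = (NtSetTimerResolutionPtr)
        GetProcAddress(GetModuleHandleW(L"ntdll.dll"), "NtSetTimerResolution");

    if (setRes)
    {
        ULONG actual = 0;
        setRes(5000, TRUE, &actual);   // request 0.5 ms (5000 * 100 ns)
    }
    return 0;
}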
timeBeginPeriod() is the documented API to do this. It is documented to affect the accuracy of Sleep(). Dave Cutler probably did not enjoy implementing it, but allowing Win 3.1 code to port made it necessary. The multimedia API back then was necessary to keep anemic hardware with small buffers going without stuttering.
Very crude, but there is no other good way to do it in the kernel. The normal state for a processor core is to be stopped on a HLT instruction, consuming (almost) no power; the only way to revive it is with a hardware interrupt. Which is what this does: it cranks up the clock interrupt rate. The clock normally ticks 64 times per second; you can jack it up to 1000 with timeBeginPeriod, or 2000 with the native API.
And yes, pretty bad for power consumption. The clock interrupt handler also activates the thread scheduler, a fairly unsubtle chunk of code. This is why a Sleep() call can now wake up at (almost) the clock interrupt rate. This was tinkered with in Win 8.1, by the way; the only thing I noticed about the changes is that it is not quite as responsive anymore, and a 1 msec rate can cause up to 2 msec delays.
Chrome is indeed notorious for ab/using the heck out of it. I always assumed it provided a competitive edge for a company that does big business in mobile operating systems and battery-powered devices. The guy that started this web site noticed something was wrong. The more responsible thing for a browser to do is to bump the rate up to 10 msec, which is necessary to get accurate GIF animation. Multimedia playback does not need it anymore.
This otherwise has no effect at all on scheduling priorities. One detail I did not check is whether the thread quantum changes correspondingly (the number of ticks a thread may own a core before being evicted; 3 for a workstation). I suspect it does.
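A minimal sketch of the documented route (winmm.lib must be linked, and every timeBeginPeriod needs a matching timeEndPeriod):

#include <windows.h>
#include <mmsystem.h>                  // timeBeginPeriod / timeEndPeriod
#pragma comment(lib, "winmm.lib")

int main()
{
    timeBeginPeriod(1);                // raise the clock rate to ~1000 Hz

    Sleep(1);                          // now wakes after ~1-2 ms, not ~15.6 ms

    timeEndPeriod(1);                  // restore the previous rate
    return 0;
}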

Periodic GPU performance problem

I have a WinForms application that uses XNA to animate 3D models in a control. The app had been doing just fine for months, but recently I've started to experience periodic pauses in the animation. Setting out to investigate what is going on, I have established these facts:
1. It happens on my machine only; other machines work fine.
2. Removing everything from my render loop does not improve the problem.
In 2. I didn't actually remove everything; I limited my loop to setting the viewport on my GraphicsDevice and then calling GraphicsDevice.Present.
Trying to dig further, I fired up PIX to capture some statistics. Screenshots of two PIX runs can be viewed here (Run6) and here (Run14). Run6 uses my original render loop and Run14 uses the bare-bones Present loop.
PIX tells me that the GPU is periodically doing something, and I assume this is causing the pauses. What could be the cause of this? Or how do I go about finding out what the GPU is actually doing?
Update: since I usually trust my code to be perfect (who's laughing?), I started a new XNA project from scratch to see if it exhibits the same behavior. So, starting a new XNA 3.1 Windows Game project and running PIX, I get this timeline. The same periodic pauses appear. So the problem must be lower in the stack, in XNA or Direct3D.
So PIX shows that the GPU is working on something. I can see the list of DX calls made within each frame, and the timing calculations show that the pause occurs during (or after) the IDirect3DDevice9::Present call.
Update 2: I had previously installed and uninstalled XNA 4.0 CTP on the problematic machine. I cannot be certain that this is related but I thought that perhaps a reinstall of the XNA Game Studio 3.1 bits could make a difference. Turns out it did.
The underlying question remains the same (and the bounty is still up): what could affect XNA 3.1 (or DirectX) to make it behave like this and is there any logging/tracing power tool for the DirectX and/or GPU level out there that could shed some light on what is going on?
Note: I'm using XNA 3.1 on a Windows 7 x64 dual-core machine with 8GB RAM.
Note2: also posted this question on the XNA Creators forums here.
You could try to see if you can find something with Xperf that matches your periodic problem. Do not run your application, but keep open the programs that would normally run alongside it. You could also try it again with the application running, but that could give a cluttered view.
Start the tracing; do this in an elevated prompt:
xperf -on BASE+LATENCY -stackWalk Profile
Wait for a fair amount of time to be sure that the problem is traced.
Stop the tracing and open it like this.
xperf -d trace.etl
xperfview trace.etl
Analyze by looking at the graphs and consulting the tables for specific intervals, and see if you can find something related to the problem. The highest chance of finding it is in the DPC and Interrupts sections, but it might as well be something odd in the CPU or I/O sections. Good luck!
There is more information out there on Xperf and how to obtain it; hopefully this delivers results.
If not, you can alternatively try GPUView, which has been used for improvements in DWM. It is included alongside Xperf in the Windows Performance Toolkit, so you can easily try both!
log v
... wait for a fair amount of time to be sure that the problem is traced ...
log
gpuview merged.etl
In case GPUView runs out of memory, you can try adding "/limit 3" or removing the v.
Read the documentation of the tools if you are stuck somewhere.
Hmm ... this seems to be occurring on the GPU, but it sounds like a CPU garbage-collection issue. Can you run the CLR Profiler and see if there are any spikes in GC activity that you can correlate with the slowdowns?
I agree that it sounds unlikely since you can clearly see it in PIX, but it might offer a clue as to the cause.
If it's only happening on your own machine, then could it be drivers? Forgive me for being skeptical, but it's a 64 bit machine after all :D
This looks like either a vsync issue or a GPU in its last throes. Since going back to a different version fixed it, and the "bottleneck" is in IDirect3DDevice9::Present, let's go with the former option.
I'm not familiar with XNA, so I don't know how much of the workings of D3D are exposed, but do you know what your PresentationParameters are set to?
Specifically, try setting the swap effect to Discard.
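XNA's PresentationParameters sit on top of the native D3D9 structure, so as an illustrative sketch (the field choices here are assumptions, not your actual settings), Discard at the native level looks like this:

#include <d3d9.h>

void FillPresentParams(D3DPRESENT_PARAMETERS& pp, HWND window)
{
    ZeroMemory(&pp, sizeof(pp));
    pp.Windowed             = TRUE;
    pp.hDeviceWindow        = window;
    pp.BackBufferFormat     = D3DFMT_UNKNOWN;           // match the desktop mode
    pp.SwapEffect           = D3DSWAPEFFECT_DISCARD;    // vs. COPY or FLIP
    pp.PresentationInterval = D3DPRESENT_INTERVAL_ONE;  // vsync on; IMMEDIATE = off
}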

Windows System Idle Processes interfering with performance measurement

I am doing some performance measurement of my code on a Windows box, and I am finding that I get dramatically different results between measurements. A quick bit of ad hoc exploration during a slow run shows, in Task Manager, the System Idle Process taking up almost 100% CPU.
Does anyone know what the System Idle Process actually means and what Windows features it may be running?
NB: I am not measuring performance using Task Manager; I just used it to take a look at what else was running during a particularly slow measurement.
Please think before saying this is not programming related and closing the question. I would not ask it unless I thought there were grounds to say it is. In this case I believe it clearly is because it is detrimentally affecting my development and test environments and in order to sort it out I need to know a bit more about it. Programming does not start and end with the writing of the code.
The idle process typically doesn't do any useful work except execute the HLT instruction, which puts that CPU core into a lower power state (C1). However, the fact that your benchmark is not consuming 100% CPU time does open the door for speculation about what is going on.
If your application is single-threaded and your test system is multicore/hyperthreaded/multi-CPU, then you should expect to see around 50% idle CPU time for two cores, 75% for four, etc. This is because the CPU time percentages in Task Manager include all cores. (I believe that older versions of Windows had an option to change this, but I don't see it on Vista.)
If the idle process is consuming a lot of CPU, that may indicate that your application is spending a lot of time sleeping. It might be waiting for data from some external source (e.g. a disk or network). It might be spending a lot of time waiting for synchronization objects (e.g. mutexes or events). It also might be spending a lot of time calling the Sleep() function. Profiling your code should identify where it's spending the time.
Getting fully reproducible benchmark results may require you to disable processor/disk/network-intensive background applications and services (e.g. search indexing, SMS software inventory, virus scanning, Windows Update downloads, IncrediBuild/distcc) or to connect the machine to an isolated network (or to no network at all).
I'm assuming that you wrote a benchmark for your application, and that you're just trying to use Task Manager to diagnose why the benchmark results aren't what you expected. Task Manager isn't an accurate way to measure application performance.
The System Idle Process is a sort of default process that Windows runs on the processor when it has nothing else to schedule. This process acts like a housekeeper, doing things like trying to save system power, etc.
If you're measuring the performance of your program, don't use the Windows Task manager. Use Performance Monitor instead (which you can start by typing 'perfmon' at the commandline). Or better still, use a profiler.
If you "system idle process" is taking up 100% then essentially you machine is bored, nothing is going on. If you add up everything going on in task manager, subtract this number from 100%, then you will have the value of "system idle process." Notice it consumes almost no memory at all and cannot be affecting performance.

Comparing cold-start to warm start

Our application takes significantly more time to launch after a reboot (cold start) than if it has already been opened once (warm start).
Most (if not all) of the difference seems to come from loading DLLs; when the DLLs are in cached memory pages, they load much faster. We tried using ClearMem to simulate rebooting (since it's much less time-consuming than actually rebooting) and got mixed results: on some machines it seemed to simulate a reboot very consistently, and on some not.
To sum up my questions are:
Have you experienced differences in launch time between cold and warm starts?
How have you dealt with such differences?
Do you know of a way to dependably simulate a reboot?
Edit:
Clarifications for comments:
The application is mostly native C++ with some .NET (the first .NET assembly that's loaded pays for the CLR).
We're looking to improve load time; obviously we did our share of profiling and improved the hotspots in our code.
Something I forgot to mention is that we got some improvement by rebasing all our binaries so the loader doesn't have to relocate them at load time.
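For anyone trying the same thing, a hypothetical sketch of the rebasing step (the module names and addresses are made up): give each DLL a distinct preferred base address so the loader never has to relocate it at startup.

rem Set distinct, non-overlapping preferred base addresses at link time.
link /DLL /BASE:0x62000000 /OUT:audio.dll  audio.obj
link /DLL /BASE:0x63000000 /OUT:render.dll render.obj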
As for simulating reboots, have you considered running your app from a virtual PC? Using virtualization you can conveniently replicate a set of conditions over and over again.
I would also consider some type of profiling app to spot the bit of code causing the time lag, and then make a judgment call about how much of that code is really necessary, or whether it could be achieved in a different way.
It would be hard to truly simulate a reboot in software. When you reboot, all devices in your machine get their reset bit asserted, which should cause all memory system-wide to be lost.
In a modern machine you've got memory and caches everywhere: there's the VM subsystem, which stores pages of memory for the program; then the OS caching the contents of files in memory; then the on-disk buffer of sectors on the hard drive itself. You can probably get the OS caches to be reset, but the on-disk buffer on the drive? I don't know of a way.
How did you profile your code? Not all profiling methods are equal and some find hotspots better than others. Are you loading lots of files? If so, disk fragmentation and seek time might come into play.
Maybe even sticking basic timing information into the code, writing out to a log file and examining the files on cold/warm start will help identify where the app is spending time.
Without more information, I would lean towards filesystem/disk cache as the likely difference between the two environments. If that's the case, then you either need to spend less time loading files upfront, or find faster ways to load files.
Example: if you are loading lots of binary data files, speed up loading by combining them into a single file, then slurp the whole file into memory in one read and parse its contents. Fewer disk seeks and less time spent reading off of disk. Again, maybe that doesn't apply.
I don't know offhand of any tools to clear the disk/filesystem cache, but you could write a quick application to read a bunch of unrelated files off of disk to cause the filesystem/disk cache to be loaded with different info.
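A minimal sketch of such a cache-scrubber; the file paths are hypothetical, and any set of large, unrelated files will do:

#include <cstdio>
#include <vector>

int main()
{
    // Large files that have nothing to do with the application under test.
    const char* files[] = { "C:\\scratch\\big1.bin", "C:\\scratch\\big2.bin" };
    std::vector<char> buf(1024 * 1024);   // 1 MB read buffer

    for (size_t i = 0; i < sizeof(files) / sizeof(files[0]); ++i)
    {
        FILE* f = std::fopen(files[i], "rb");
        if (!f) continue;
        // Read to the end; only the cache side effect matters, not the data.
        while (std::fread(&buf[0], 1, buf.size(), f) > 0)
            ;
        std::fclose(f);
    }
    return 0;
}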
#Morten Christiansen said:
One way to make apps start cold-start faster (sort of) is used by e.g. Adobe reader, by loading some of the files on startup, thereby hiding the cold start from the users. This is only usable if the program is not supposed to start up immediately.
That makes the customer pay for initializing our app at every boot even when it isn't used. I really don't like that option (and neither does Raymond).
One successful way to speed up application startup is to switch DLLs to delay-load. This is a low-cost change (some fiddling with project settings) but can make startup significantly faster. Afterwards, run depends.exe in profiling mode to figure out which DLLs load during startup anyway, and revert the delay-load on those. Remember that you may also delay-load most Windows DLLs you need.
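As a sketch, the same change expressed as linker options (the DLL name is hypothetical; delay-loading requires linking delayimp.lib):

rem heavy.dll is now mapped on first use instead of at process startup.
link /OUT:myapp.exe main.obj delayimp.lib /DELAYLOAD:heavy.dll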
A very effective technique for improving application cold launch time is optimizing function link ordering.
The Visual Studio linker lets you pass in a file that lists all the functions in the module being linked (or just some of them; it doesn't have to be all of them), and the linker will place those functions next to each other in memory.
When your application is starting up, there are typically calls to init functions throughout your application. Many of these calls will be to a page that isn't in memory yet, resulting in a page fault and a disk seek. That's where slow startup comes from.
Optimizing your application so all these functions are together can be a big win.
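As a sketch (the symbol names are hypothetical): list one decorated name per line, in the desired layout order, and hand the file to the linker with its /ORDER option.

startup.txt (one decorated symbol per line, in layout order):
?InitAudio@@YAXXZ
?InitGraphics@@YAXXZ
?LoadConfig@@YAXXZ

rem Then link with:
link /ORDER:@startup.txt ...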
Check out Profile Guided Optimization in Visual Studio 2005 or later. One of the things that PGO does for you is function link ordering.
It's a bit difficult to work into a build process, because with PGO you need to link, run your application, and then re-link with the output from the profile run. This means your build process needs a runtime environment and has to deal with cleaning up after bad builds and all that, but the payoff is typically a 10% or more faster cold launch with no code changes.
There's some more info on PGO here:
http://msdn.microsoft.com/en-us/library/e7k32f4k.aspx
As an alternative to a function order list, you can simply group the code that will be called at startup into the same sections:
#pragma code_seg(".startUp")
// ... functions that run during startup go here ...
#pragma code_seg            // revert to the default code section
#pragma data_seg(".startUp")
// ... data used during startup goes here ...
#pragma data_seg            // revert to the default data section
It should be easy to maintain as your code changes, and it has the same benefit as the function order list.
I am not sure whether a function order list can specify global variables as well, but using this #pragma data_seg simply works.
One way to make apps start cold-start faster (sort of) is used by e.g. Adobe reader, by loading some of the files on startup, thereby hiding the cold start from the users. This is only usable if the program is not supposed to start up immediately.
Another note: .NET 3.5 SP1 supposedly has much improved cold-start speed, though how much, I cannot say.
It could be the NICs (LAN cards), and that your app depends on certain other services that require the network to come up. Profiling your application alone may not quite tell you this, so you should examine the dependencies of your application.
If your application is not very complicated, you can just copy all the executables to another directory; it should be similar to a reboot. (Cut and paste seems not to work; Windows is smart enough to know that files moved to another folder are still cached in memory.)

Operating System Overheads while profiling?

I am doing profiling of C code in Microsoft VS 2005 on an Intel Core 2 Duo platform.
I measure the time (secs:millisecs) consumed by my function. But I have some doubts about the accuracy of this measurement, as the operating system will not run my application continuously, but will instead schedule other apps/services in between the execution of my code. (Although I have no major applications running while I do the profile run, Windows still has a lot of code of its own that it will run by preempting my app.) Because of all this, I believe the profiling number (the time taken by my app to run) is not accurate.
So my question is: is there any way to find out the operating system overhead and scheduling overhead on a typical Windows system (I run Windows XP)? E.g. if my application says it ran for 60 milliseconds, out of that 60 msec, how much time was really used by my app, and how much time was it sitting idle due to being preempted by some other task scheduled by the OS?
or
At least, is there any ballpark number for such OS overhead, based on your experience doing something similar?
#Kogus: Even if I run outside the debugger (a standalone app from a command prompt), it could still be preempted by the OS, causing an incorrect measurement of the time consumed by my app.
Isn't it?
-AD
I think you are going to have some problems with the granularity. See similar questions GetLocalTime() API time resolution and Is gettimeofday() guaranteed to be of microsecond resolution?
Also, you may want to take a look at the Windows Resource Kits Tools which include timeit.exe (similar to time on unix/linux) to give you elapsed and process times.
Suggestion: try running on multi-CPU systems.
The best way of doing this is with a dedicated profiling tool. There are lots out there. I haven't used one for C for a few years; someone else will hopefully be able to give better advice. As you are using Visual Studio 2005, this might be a good place to start: AQ, but I've never used it.
1 - Put some debug logging in your code (include timestamps of course), and run it outside of the debugger
2 - Run again in the debugger
3 - Repeat many times, to get statistically valid data.
4 - Compare.
If there is a significant difference in the average execution time of the standalone vs. the debugger, then you are right to be suspicious of the OS (or the overhead of the debugger hooks themselves...). If no difference, then don't sweat it.
Edit0: Obviously the debug messages have some overhead of their own. You may want to leave those in the code even when you are running from the debugger. That way, both the standalone and the debugger are running the very same code.
Edit1: I misunderstood the question. I thought your concern was that, while debugging, the OS might interrupt your app more frequently than in a normal mode of execution. If you want to know how much time your app actually spent working, just compare the time taken to the "CPU Time" in Task Manager.
Edit2: Compare the time returned by GetProcessTimes for your process to the actual execution time. The difference is the time spent by the CPU on somebody else.
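A minimal sketch of that comparison (Sleep stands in for the real workload; the timing-granularity caveats from the linked questions still apply):

#include <windows.h>
#include <stdio.h>

int main()
{
    DWORD wallStart = GetTickCount();

    Sleep(60);   // hypothetical stand-in for the code being measured

    DWORD wallMs = GetTickCount() - wallStart;

    FILETIME created, exited, kernel, user;
    GetProcessTimes(GetCurrentProcess(), &created, &exited, &kernel, &user);

    // FILETIME counts 100 ns units; sum kernel + user time in milliseconds.
    ULONGLONG cpuMs =
        ((((ULONGLONG)kernel.dwHighDateTime << 32) | kernel.dwLowDateTime) +
         (((ULONGLONG)user.dwHighDateTime << 32) | user.dwLowDateTime)) / 10000;

    // Whatever wall time exceeds CPU time went to other tasks or to waiting.
    printf("wall: %lu ms, CPU: %llu ms\n", wallMs, cpuMs);
    return 0;
}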
