Scheduling Windows threads to run with high-accuracy timing?

I'm asking this in a general sense without a specific language in mind.
What I want to be able to do is have a thread run every 100 ms with high accuracy. It seems that the best accuracy I'm able to get using normal threads (in a quick test application) is about 5 ms. The thread takes about 10 ms to perform its task.
Basically, is there any way to ask Windows to schedule my thread at certain delays? (To the best of its ability; I'm aware it's not a realtime operating system.)

On Windows, probably the best you can do is to use a multimedia timer. These are the platform's high resolution timers.
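For example, here is a minimal sketch using the winmm multimedia timer API (timeBeginPeriod/timeSetEvent). The 100 ms period matches the question; the callback body is illustrative only, and real code would signal a worker thread rather than doing the 10 ms of work inside the callback:

    // Minimal periodic multimedia timer sketch; links against winmm.lib.
    #include <windows.h>
    #include <mmsystem.h>
    #include <cstdio>
    #pragma comment(lib, "winmm.lib")

    void CALLBACK OnTick(UINT timerId, UINT msg, DWORD_PTR user, DWORD_PTR, DWORD_PTR)
    {
        // Keep this short: signal a worker thread instead of doing heavy work here.
        printf("tick\n");
    }

    int main()
    {
        timeBeginPeriod(1);                                  // request 1 ms timer resolution
        MMRESULT id = timeSetEvent(100, 1, OnTick, 0,
                                   TIME_PERIODIC | TIME_CALLBACK_FUNCTION);
        Sleep(2000);                                         // let the timer fire for a while
        timeKillEvent(id);
        timeEndPeriod(1);
        return 0;
    }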

Related

windows c++ and possibility of a microsecond sleep

Is there any way at all in the Windows environment to sleep for ~1 microsecond? After researching and reading many threads and various websites, I have not been able to see that this is possible. Since the scheduler appears to be the limiting factor, and it operates at the 1 millisecond level, I believe it can't be done without going to a real-time OS.
It may not be the most portable, and I've not used these functions myself, but it might be possible to use the information in the High-Resolution Timer section of the documentation (QueryPerformanceCounter) and busy-wait (block) until the target time is reached.
Despite the fact that Windows is claimed not to be a "real-time" OS, events can be generated at microsecond resolution. The use of a combination of the system time (FILETIME) and the performance counter frequency has been described elsewhere. However, a careful implementation that takes processor affinity and process/thread priorities into account opens the door to timed events at microsecond resolution.
Since the Windows scheduler and the Windows timer services rely on the system's interrupt mechanism, microsecond accuracy can only be reached by polling. Particularly on multicore systems, polling is not so ugly anymore, and it only has to last for the shortest possible interrupt period. The multimedia timer interface allows the interrupt period to be pushed down to about 1 ms, so one can sleep to near the desired (microsecond-resolution) time and the polling will last for 1 ms at most.
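As a rough illustration of that sleep-then-poll idea (not the Windows Timestamp Project code itself), something along these lines gets close to a microsecond target; the 2 ms polling margin is an assumption you would tune per machine:

    // Sleep to within ~2 ms of the target with the timer period set to 1 ms,
    // then busy-wait on QueryPerformanceCounter for the remainder.
    #include <windows.h>
    #include <mmsystem.h>
    #pragma comment(lib, "winmm.lib")

    void PreciseDelayMicroseconds(long long micros)
    {
        LARGE_INTEGER freq, start, now;
        QueryPerformanceFrequency(&freq);
        QueryPerformanceCounter(&start);
        long long target = start.QuadPart + micros * freq.QuadPart / 1000000;

        timeBeginPeriod(1);
        long long coarseMs = micros / 1000 - 2;              // leave ~2 ms for polling
        if (coarseMs > 0)
            Sleep((DWORD)coarseMs);
        timeEndPeriod(1);

        do {                                                  // poll away the last stretch
            QueryPerformanceCounter(&now);
        } while (now.QuadPart < target);
    }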
My implementation of microsecond-resolution time services for Windows, test code, and an extensive description can be found at the Windows Timestamp Project, located at windowstimestamp.com

Optimal CPU utilization thresholds

I have built software that I deploy on Windows 2003 server. The software runs as a service continuously and it's the only application on the Windows box of importance to me. Part of the time, it's retrieving data from the Internet, and part of the time it's doing some computations on that data. It's multi-threaded -- I use thread pools of roughly 4-20 threads.
I won't bore you with all those details, but suffice it to say that as I enable more threads in the pool, more concurrent work occurs, and CPU use rises. (as does demand for other resources, like bandwidth, although that's of no concern to me -- I have plenty)
My question is this: should I simply try to max out the CPU to get the best bang for my buck? Intuitively, I don't think it makes sense to run at 100% CPU; even 95% CPU seems high, almost like I'm not giving the OS much room to do what it needs to do. I don't know the right way to identify the best balance. I'm guessing I could measure and measure and probably find that the best throughput is achieved at an average CPU utilization of 90% or 91%, etc., but...
I'm just wondering if there's a good rule of thumb about this? I don't want to assume that my testing will take into account all kinds of workload variations. I'd rather play it a bit safe, but not too safe (or else I'm underusing my hardware).
What do you recommend? What is a smart, performance minded rule of utilization for a multi-threaded, mixed load (some I/O, some CPU) application on Windows?
Yep, I'd suggest 100% is thrashing, so I wouldn't want to see processes running like that all the time. I've always aimed for 80% to get a balance between utilization and room for spikes / ad-hoc processes.
An approach I've used in the past is to crank up the pool size slowly and measure the impact (both on CPU and on other constraints such as IO); you never know, you might find that suddenly IO becomes the bottleneck.
CPU utilization shouldn't matter in this I/O-intensive workload; you care about throughput. Try a hill-climbing approach: programmatically inject and remove worker threads and track completion progress...
If you add a thread and it helps, add another one. If you try a thread and it hurts remove it.
Eventually this will stabilize.
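A rough, simplified sketch of that hill-climbing loop in C++ (the simulated mixed CPU/IO task, the one-second measurement window, and the fact that thread removal is elided are all assumptions made purely for illustration):

    // Grow the worker count while measured throughput improves; reverse course when it drops.
    #include <atomic>
    #include <chrono>
    #include <thread>
    #include <vector>
    #include <cstdio>

    std::atomic<long> completed{0};
    std::atomic<bool> stop{false};

    void Worker()
    {
        while (!stop) {
            std::this_thread::sleep_for(std::chrono::milliseconds(5)); // pretend I/O wait
            volatile double x = 0;
            for (int i = 0; i < 100000; ++i) x += i * 0.5;              // pretend CPU work
            ++completed;
        }
    }

    int main()
    {
        std::vector<std::thread> pool;
        long lastRate = 0;
        int direction = +1;                          // +1 = add threads, -1 = remove them

        pool.emplace_back(Worker);
        for (int step = 0; step < 20; ++step) {
            completed = 0;
            std::this_thread::sleep_for(std::chrono::seconds(1));
            long rate = completed;
            printf("%zu threads -> %ld tasks/s\n", pool.size(), rate);

            if (rate < lastRate) direction = -direction;  // the last change hurt: reverse
            lastRate = rate;
            if (direction > 0)
                pool.emplace_back(Worker);
            // else: real code would retire one worker here; elided for brevity
        }
        stop = true;
        for (auto& t : pool) t.join();
    }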
If this is a .NET based app, hill climbing was added to the .NET 4 threadpool.
UPDATE:
Hill climbing is a control-theory-based approach to maximizing throughput. You can call it trial and error if you want, but it is a sound approach. In general, there isn't a good 'rule of thumb' to follow here because the overheads and latencies vary so much that it's not really possible to generalize. The focus should be on throughput and task/thread completion, not CPU utilization. For example, it's quite easy to peg the cores with coarse- or fine-grained synchronization without actually making any difference in throughput.
Also regarding .NET 4: if you can reframe your problem as a Parallel.For or Parallel.ForEach, the threadpool will adjust the number of threads to maximize throughput, so you don't have to worry about this.
-Rick
Assuming nothing of importance other than the OS runs on the machine, and your load is constant, you should aim for 100% CPU utilization; everything else is a waste of CPU. Remember that the OS handles the threads, so it is indeed able to run; it's hard to starve the OS with a well-behaved program.
But if your load is variable and you expect peaks that you should take into consideration, I'd say 80% CPU is a good threshold to use, unless you know exactly how that load will vary and how much CPU it will demand, in which case you can aim for the exact number.
If you simply give your threads a low priority, the OS will do the rest, and take cycles as it needs to do work. Server 2003 (and most Server OSes) are very good at this, no need to try and manage it yourself.
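A minimal sketch of that approach with the raw Win32 API; the worker function here is just a placeholder for the service's computation:

    #include <windows.h>

    DWORD WINAPI CrunchData(LPVOID)
    {
        // ... long-running computation ...
        return 0;
    }

    int main()
    {
        HANDLE h = CreateThread(nullptr, 0, CrunchData, nullptr, 0, nullptr);
        SetThreadPriority(h, THREAD_PRIORITY_BELOW_NORMAL);  // let the OS fit it around other work
        WaitForSingleObject(h, INFINITE);
        CloseHandle(h);
    }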
I have also used 80% as a general rule-of-thumb for target CPU utilization. As some others have mentioned, this leaves some headroom for sporadic spikes in activity and will help avoid thrashing on the CPU.
Here is a little (older but still relevant) advice from the Weblogic crew on this issue: http://docs.oracle.com/cd/E13222_01/wls/docs92/perform/basics.html#wp1132942
If you feel your load is very even and predictable you could push that target a little higher, but unless your user base is exceptionally tolerant of periodic slow responses and your project budget is incredibly tight, I'd recommend adding more resources to your system (adding a CPU, using a CPU with more cores, etc.) over making a risky move to try to squeeze out another 10% CPU utilization out of your existing platform.

What is the 'realtime' process priority setting for?

From what I've read in the past, you're encouraged not to change the priority of your Windows applications programmatically, and if you do, you should never change them to 'Realtime'.
What does the 'Realtime' process priority setting do, compared to 'High', and 'Above Normal'?
A realtime-priority thread can never be pre-empted by timer interrupts and runs at a higher priority than any other thread in the system. As such, a CPU-bound realtime-priority thread can totally ruin a machine.
Creating realtime priority threads requires a privilege (SeIncreaseBasePriorityPrivilege) so it can only be done by administrative users.
For Vista and beyond, one option for applications that do require that they run at realtime priorities is to use the Multimedia Class Scheduler Service (MMCSS) and let it manage your threads priority. The MMCSS will prevent your application from using too much CPU time so you don't have to worry about tanking the machine.
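A hedged sketch of registering a thread with MMCSS; "Pro Audio" is just one of the task categories defined in the system profile, so pick whichever matches your workload:

    // Registers the calling thread with MMCSS for the duration of its low-latency work.
    // Links against avrt.lib.
    #include <windows.h>
    #include <avrt.h>
    #pragma comment(lib, "avrt.lib")

    void TimeCriticalLoop()
    {
        DWORD taskIndex = 0;
        HANDLE mmcss = AvSetMmThreadCharacteristicsW(L"Pro Audio", &taskIndex);
        // ... low-latency work; MMCSS boosts this thread but caps its CPU share ...
        if (mmcss)
            AvRevertMmThreadCharacteristics(mmcss);
    }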
Simply, the "Real Time" priority class is higher than the "High" priority class. I don't think there's much more to it than that. Oh yeah - you have to have the SeIncreaseBasePriorityPrivilege to put a thread into the Real Time class.
Windows will sometimes boost the priority of a thread for various reasons, but it won't boost the priority of a thread into another priority class. It also won't boost the priority of threads in the real-time priority class. So a High priority thread won't get any automatic temporary boost into the Real Time priority class.
Russinovich's "Inside Windows" chapter on how Windows handles priorities is a great resource for learning how this works:
http://web.archive.org/web/20140909124652/http://download.microsoft.com/download/5/b/3/5b38800c-ba6e-4023-9078-6e9ce2383e65/C06X1116607.pdf
Note that there's absolutely no problem with a thread having a real-time priority on a normal Windows system - they aren't necessarily for special processes running on dedicated machines. I imagine that multimedia drivers and/or processes might need threads with a real-time priority. However, such a thread should not require much CPU - it should be blocking most of the time so that normal system events can get processed.
It would be the highest available priority setting, and would usually only be used on a box that was dedicated to running that specific program. It's actually high enough that it could starve the keyboard and mouse threads to the extent that they become unresponsive.
So basically, if you have to ask, don't use it :)
Real-time is the highest priority class available to a process. Therefore, it is different from 'High' in that it's one step greater, and 'Above Normal' in that it's two steps greater.
Similarly, real-time is also a thread priority level.
The process priority class raises or lowers all effective thread priorities in the process and is therefore considered the 'base priority'.
So, a process has a:
Base process priority class.
Individual thread priorities, offsets of the base priority class.
Since real-time is supposed to be reserved for applications that absolutely must pre-empt other running processes, there is a special security privilege to protect against haphazard use of it. This is defined by the security policy.
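To make the two levels concrete, here is a small sketch; note that asking for REALTIME_PRIORITY_CLASS without SeIncreaseBasePriorityPrivilege is silently downgraded to HIGH_PRIORITY_CLASS:

    #include <windows.h>
    #include <cstdio>

    int main()
    {
        // Base priority class for the whole process.
        SetPriorityClass(GetCurrentProcess(), REALTIME_PRIORITY_CLASS);
        // Per-thread priority, an offset within that class.
        SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_TIME_CRITICAL);
        printf("effective class: 0x%lx\n", GetPriorityClass(GetCurrentProcess()));
    }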
In NT6+ (Vista+), use of the Vista Multimedia Class Scheduler is the proper way to achieve real-time operations in what is not a real-time OS. It works, for the most part, though is not perfect since the OS isn't designed for real-time operations.
Microsoft considers this priority very dangerous, rightly so. No application should use it except in very specialized circumstances, and even then try to limit its use to temporary needs.
Once Windows learns that a program uses a higher-than-normal priority, it seems to limit the priority on the process.
Setting the priority from IDLE to REALTIME does NOT change the CPU usage.
I found on my multi-processor AMD CPU that if I drop one of the CPUs, like the LAST one, the CPU usage will MAX OUT and the last CPU remains idle. The processor speed increases to 75% on my quad AMD.
Use Task Manager -> select process -> right-click on the process -> Select -> Set Affinity.
Click all but the last processor. The CPU usage will increase to the MAX on the remaining processors, and frame counts, if you are processing video, will increase.
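The same thing can be done programmatically; this is only an illustration of the Task Manager steps above, not a recommendation:

    // Restrict the current process to all logical processors except the "last" one.
    #include <windows.h>

    int main()
    {
        DWORD_PTR processMask = 0, systemMask = 0;
        GetProcessAffinityMask(GetCurrentProcess(), &processMask, &systemMask);

        // Isolate the highest set bit (the last processor) by clearing lower bits.
        DWORD_PTR last = processMask;
        while (last & (last - 1)) last &= last - 1;

        DWORD_PTR newMask = processMask & ~last;
        if (newMask)                                   // don't end up with an empty mask
            SetProcessAffinityMask(GetCurrentProcess(), newMask);
    }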
Like all the other answers before: real time gives that program the utmost priority class. Nothing else is processed until that program has had its turn.
On my Pentium 4 machine I set Minecraft to real time a lot, since it increases the game performance a lot, and the system seems completely stable. So realtime isn't as bad as it seems. Just, if you have a multi-core machine, set the program's affinity to a specific core or cores (not all of them, so that everything else can still run in case the real-time program gets hung up) and then set the priority to real time.
It is basically higher/greater than everything else. The keyboard is a lower priority than the real-time process. This means the process will be serviced before the keyboard, and if the machine can't handle that, your keyboard input is slowed.
Realtime process priority is used mainly to make a specific process run faster at the expense of literally everything else. I personally have my Discord bot's process priority set to realtime, since the bot is lightweight and I need it to keep responding quickly. But it would be a bad idea to randomly change process priorities unless you know what you're doing: for example, if you were to set Google Chrome's priority to low, it wouldn't cause any problems, but if you were to set the registry's priority to low, it would likely cause more than enough problems. Just don't set a heavy program to realtime unless the device it's running on is completely dedicated to that program.

What is the best way to measure "spare" CPU time on a Linux based system

For some of the customers that we develop software for, we are required to "guarantee" a certain amount of spare resources (memory, disk space, CPU). Memory and disk space are simple, but CPU is a bit more difficult.
One technique that we have used is to create a process that consumes a guaranteed amount of CPU time (say 2.5 seconds every 5 seconds). We run this process at highest priority in order to guarantee that it runs and consumes all of its required CPU cycles.
If our normal applications are able to run at an acceptable level of performance and can pass all of their functionality tests while the spare time process is running as well, then we "assume" that we have met our commitment for spare CPU time.
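A stripped-down sketch of that burner process (the 2.5 s / 5 s numbers are the ones above; the priority elevation is done outside the program, e.g. with nice or chrt):

    // Spin for 2.5 s out of every 5 s so roughly 50% of one core is provably unavailable.
    #include <chrono>
    #include <thread>

    int main()
    {
        using clock = std::chrono::steady_clock;
        for (;;) {
            auto cycleStart = clock::now();
            while (clock::now() - cycleStart < std::chrono::milliseconds(2500))
                ;                                                   // burn CPU
            std::this_thread::sleep_until(cycleStart + std::chrono::seconds(5));
        }
    }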
I'm sure that there are other techniques for doing the same thing, and would like to learn about them.
So this may not be exactly the answer you're looking for, but if all you want to do is make sure your application doesn't exceed certain limits on resource consumption, and you're running on Linux, you can customize /etc/security/limits.conf (the file may differ on your distro of choice) to force limits on a particular user, and only run the process under that user. This of course assumes that you have that level of control over your client's production environment.
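For illustration, hypothetical limits.conf entries for a service account named "svcuser" (the name and numbers are made up; "cpu" is CPU time in minutes, "as" is address space in KB, "nproc" is the process count):

    # /etc/security/limits.conf (illustrative values only)
    svcuser  hard  cpu     10
    svcuser  hard  as      1048576
    svcuser  hard  nproc   50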
If I understand correctly, your concern is whether the application still runs acceptably while a given percentage of the processing power is not available.
The most incontrovertible approach is to use underpowered hardware for your testing. If the processor in your setup allows you to, you can downclock it online. The Linux kernel gives you an easy interface for doing this, see /sys/devices/system/cpu/cpu0/cpufreq/. There is also a bunch of GUI applications for this available.
If your processor isn't capable of changing clock speed online, you can do it the hard way and select a smaller multiplier in your BIOS.
I think you get the idea. If it runs at 1600 MHz instead of 2400 MHz, you can guarantee 33% of spare CPU time.
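If the governor honors it, the cap can be applied through the same sysfs interface; a tiny sketch (requires root, values in these files are in kHz, and 1600 MHz is just the example figure from above):

    #include <fstream>

    int main()
    {
        // Cap cpu0's maximum frequency; repeat per CPU as needed.
        std::ofstream f("/sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq");
        f << 1600000;          // 1,600,000 kHz = 1600 MHz
    }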
SAR is a standard *nix utility that collects information about the operational use of system resources. It also has a command-line interface that allows you to create various reports, and it's common for the data to be persisted in a database.
With a multi-core/processor system you could use Affinity to your advantage.

Multiple threads and performance on a single CPU

Is there any performance benefit to using multiple threads on a computer with a single CPU that does not have hyperthreading?
In terms of speed of computation, No. In fact things will slow down due to the overhead of managing the threads.
In terms of responsiveness, yes. You can for example have one thread wait on an IO operation and have another run a GUI at the same time.
It depends on your application. If it spends all its time using the CPU, then multithreading will just slow things down - though you may be able to use it to be more responsive to the user and thus give the impression of better performance.
However, if your code is limited by other things, for example using the file system, the network, or any other resource, then multithreading can help, since it allows your application to behave asynchronously. So while one thread is waiting for a file to load from disk, another can be querying a remote webserver and another redrawing the GUI, while another is doing various calculations.
Working with multiple threads can also simplify your business logic, since you don't have to pay so much attention to how various independent tasks need to interleave. If the operating system's scheduling logic is better than yours, then you may indeed see improved performance.
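A small illustration of that overlap on a single CPU: one thread blocks on a simulated 500 ms "disk read" while the main thread keeps computing, so the elapsed time is roughly the larger of the two rather than their sum (the durations and loop count are arbitrary):

    #include <chrono>
    #include <cstdio>
    #include <thread>

    int main()
    {
        auto start = std::chrono::steady_clock::now();

        std::thread io([] {
            std::this_thread::sleep_for(std::chrono::milliseconds(500)); // pretend disk read
        });

        volatile double sum = 0;
        for (long i = 0; i < 50000000; ++i)                              // CPU-bound work
            sum += i * 0.5;

        io.join();
        auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(
                      std::chrono::steady_clock::now() - start).count();
        printf("elapsed: %lld ms\n", (long long)ms);
    }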
You can consider using multithreading on a single CPU
If you use network resources
If you do high-intensive IO operations
If you pull data from a database
If you exploit other stuff with possible delays
If you want your app to remain highly responsive
When you should not use multithreading on a single CPU
CPU-intensive operations that use almost 100% CPU
If you are not sure how to use threads and synchronization
If your application cannot be divided into several parallel processes
Yes, there is a benefit of using multiple threads (or processes) on a single CPU - if one thread is busy waiting for something, others can continue doing useful work.
However this can be offset by the overhead of task switching. You'll have to benchmark and/or profile your application on production grade hardware to find out.
Regardless of the number of CPUs available, if you require preemptive multitasking and/or applications with asynchronous components (i.e. pretty much anything that combines a responsive GUI with a non-trivial amount of computation or continuous I/O processing), multithreading performs much better than the alternative, which is to use multiple processes for each application.
This is because threads in the same process can exchange data much more efficiently than can multiple processes, because they share the same memory context.
See this Wikipedia article on computer multitasking for a fairly concise discussion of these issues.
Absolutely! If you do any kind of I/O, there is a great advantage to having a multithreaded system. While one thread waits for an I/O operation (which is relatively slow), another thread can do useful work.
