Preventing Windows from changing process affinity - windows

I have multithreaded code that I want to run on all 4 cores of my processor, i.e. I create four threads and want each of them to run on a separate core.
What happens is that it starts running on four cores, but occasionally drops to only three. The only things running are the OS and my exe. This is somewhat disappointing, since it decreases performance by a quarter, which is significant enough for me.
The process affinity that I see in Task Manager allows the process to use any core. I tried restricting thread affinities, but it didn't help. I also tried increasing the priority of the process, but that did not help either.
So the question is: is there any way to force Windows to keep it running on all four cores? If this is not possible, can I reduce the frequency of these interruptions? Thanks!

This is not an issue of affinity unless I am very much mistaken. Certainly the system will not restrict your process's affinity to a specific set of processors. Some other program in the system would have to do that, if indeed that is happening.
Much more likely however is that, simply, there is another thread that is ready to run that the system is scheduling in a round-robin fashion. You have four threads that are always ready to run. If there is another thread that is ready to run, it will get its turn. Now there are 5 threads sharing 4 processors. When the other thread is running, only 3 of yours are able to run.
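As a rough illustration of that arithmetic, here is a minimal sketch. It assumes all ready threads have equal priority and that the cores are shared perfectly evenly, which is a simplification of the real, priority-driven Windows scheduler. The function name is made up for this example.

```cpp
#include <algorithm>

// Hypothetical helper: average fraction of one core that each ready thread
// receives, assuming equal priorities and perfectly even time slicing.
double average_core_share(unsigned cores, unsigned ready_threads) {
    if (ready_threads == 0) return 0.0;
    return std::min(1.0, static_cast<double>(cores) / ready_threads);
}
```

With 4 cores and 4 ready threads every thread has a core to itself; with a fifth ready thread, each thread gets about four fifths of a core on average under these assumptions.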
If you want to be sure that such other threads won't run then you need to do one of the following:
Stop running the other program that wants to use CPU resources.
Make the relative thread priorities such that your threads always run in preference to the other thread.
Now, of these options, the first is to be preferred. If you prioritize your threads above others, then the other threads don't get to run at all. Is that really what you want to happen?
In the question you say that there are no other processes running. If that is the case, and nobody is meddling with processor affinity, and only a subset of your threads are executing, then the only conclusion is that not all of your threads are ready to run and have work to do. That might happen if you, for instance, join your threads at the end of one part of work, before continuing on to the next.
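To see how much a join at the end of each phase can cost, the following sketch estimates core utilization for one phase. It assumes one thread per core and that no thread may start the next phase until the slowest one has finished. The helper name and the millisecond figures are illustrative only.

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

// Hypothetical helper: fraction of the available core time that is actually
// used in one phase when every thread is joined before the next phase starts.
// work_ms[i] is how long thread i computes in this phase (one thread per core).
double phase_utilization(const std::vector<double>& work_ms) {
    if (work_ms.empty()) return 0.0;
    const double longest = *std::max_element(work_ms.begin(), work_ms.end());
    if (longest <= 0.0) return 0.0;
    const double busy = std::accumulate(work_ms.begin(), work_ms.end(), 0.0);
    return busy / (longest * static_cast<double>(work_ms.size()));
}
```

If the four threads need 5, 5, 5 and 20 ms in a phase, three cores sit idle for 15 of every 20 ms, which would look in a monitoring tool as if the program had stopped using all its cores.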
Perhaps the next step for you is to narrow things down a little. Use a tool like Process Explorer to diagnose which threads are actually running.

If this is windows, try SetThreadAffinityMask():
http://msdn.microsoft.com/en-us/library/windows/desktop/ms686247(v=vs.85).aspx
I would assume that if you only set a single bit, then that forces the thread to run only on the selected processor (core).
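A minimal sketch of the single-bit idea follows. The helper names are made up for this example, and the Windows-specific part is guarded so the mask arithmetic can be tried on any platform. Note that pinning a thread only limits where it may run; it does not stop other threads from competing for the same core.

```cpp
#include <cassert>
#include <cstdint>
#ifdef _WIN32
#include <windows.h>
#endif

// Affinity mask with only the bit for `core` set (core 0 is the lowest bit).
std::uint64_t single_core_mask(unsigned core) {
    assert(core < 64);  // a processor group holds at most 64 logical processors
    return std::uint64_t{1} << core;
}

#ifdef _WIN32
// Restrict the calling thread to one core. SetThreadAffinityMask returns the
// previous mask on success and 0 on failure.
bool pin_current_thread_to_core(unsigned core) {
    const DWORD_PTR mask = static_cast<DWORD_PTR>(single_core_mask(core));
    return SetThreadAffinityMask(GetCurrentThread(), mask) != 0;
}
#endif
```

Each of the four worker threads could call pin_current_thread_to_core with its own index (0 to 3) as the first thing it does.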
other process / thread functions:
http://msdn.microsoft.com/en-us/library/windows/desktop/ms684847(v=vs.85).aspx
I use a windows video program, and it's able to keep all the cores running at near max when rendering video.

Related

Could process running only on one processor have threads running on other processors?

Is it possible, in a multiprocessor environment (PC), that a Windows process is configured to run on only one processor (affinity mask = 1 or SetProcessAffinityMask(GetCurrentProcess(),1)), but its threads are spawned on other processors?
(The question came from a discussion at one company about using synchronization objects (Events, Mutexes, Semaphores) and WinAPI calls such as WaitForSingleObject, and especially SignalObjectAndWait, for which MSDN states:
"Note that the "signal" and "wait" are not guaranteed to be performed
as an atomic operation. Threads executing on other processors can
observe the signaled state of the first object before the thread
calling SignalObjectAndWait begins its wait on the second object")
Does it mean that on a single processor it is guaranteed to be atomic?
P.S. Does Windows context switching differ between multiple processors and a single processor with multiple real cores?
P.P.S. Please be patient with this question if I didn't use exact and concrete terms; this area is still not very well known to me.
No.
The set of processor cores a thread can run on is the intersection of the process affinity mask and the thread affinity mask.
To get the behavior you describe, one would set the thread affinity mask for the main thread, and not mess with the process mask.
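The relationship between the two masks can be sketched with plain bit operations. The documentation for SetThreadAffinityMask says a thread affinity mask must be a subset of the process affinity mask, so a thread mask that names a processor outside the process mask is rejected rather than silently intersected. The function names below are made up for this example.

```cpp
#include <cstdint>

// True if thread_mask is non-empty and selects only processors that the
// process mask also allows, which is what Windows requires of a thread mask.
bool is_valid_thread_mask(std::uint64_t process_mask, std::uint64_t thread_mask) {
    return thread_mask != 0 && (thread_mask & ~process_mask) == 0;
}

// Processors a thread may actually run on: the intersection of both masks.
std::uint64_t effective_mask(std::uint64_t process_mask, std::uint64_t thread_mask) {
    return process_mask & thread_mask;
}
```

With a process mask of 1, a thread mask of 2 is not valid, so no thread of that process can end up on the second processor.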
For your followup questions: If it isn't atomic, it isn't atomic. There are additional guarantees for threads sharing a core, because preemption follows certain rules, but they are very complex, since relative priority and dynamic priority are important factors in thread scheduling. Because of the complexity, it is best to use proper synchronization.
Notably, race conditions between threads of equal priority certainly still exist on a single core (or single core restricted) system, but they are far less frequent and therefore far more difficult to find and debug.
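As one example of what "proper synchronization" can look like, here is a minimal sketch using std::atomic from the C++ standard library rather than the Win32 objects named in the question. The function name is made up for this example.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Each thread increments a shared counter `per_thread` times. std::atomic
// makes every increment indivisible, so no update is lost whether the threads
// share one core or run on several.
long count_with_atomic(unsigned threads, long per_thread) {
    std::atomic<long> counter{0};
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < threads; ++t) {
        pool.emplace_back([&counter, per_thread] {
            for (long i = 0; i < per_thread; ++i) {
                counter.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& th : pool) th.join();
    return counter.load();
}
```

With a plain long instead of std::atomic<long>, the same loop would be a data race, and the result would depend on when preemption happens, on a single core just as on several.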
Is it possible, in a multiprocessor environment (PC), that a Windows process is configured to run on only one processor (affinity mask = 1 or SetProcessAffinityMask(GetCurrentProcess(),1)), but its threads are spawned on other processors?
If CPU affinity is not restricted to one core, could one process run on multiple cores?
What's the difference between processes and threads?
Could a thread have processes, or could a process have threads?
Could a process be seen from a thread's point of view, or vice versa?
What is the notion of atomic?
When could the number 1 be seen as a multidimensional unit?
Could we divide 1 by 0? When could we, and when couldn't we?
Does it mean that for single processor it's guaranteed to be atomic?
One cpu: do you remember: run and stay resident? Good old time!
Then Unix: multiprocessing, multithreading, etc. :)
Note:
You couldn't ask a question without knowing answer to that question.
Try to ask something you don't know, that's impossible! You're asking because you have an answer. Look inside your question. Answer is evident. :)

Instruct win32 threads to run on a single processor core

I have a test program which would be much simpler if it could rely on threads being scheduled in strict priority order on Windows. I'm seeing a low priority thread running alongside higher priority threads and wonder if this is happening because the different threads are being scheduled on different processor cores.
Is there a way to force all Win32 threads in a process to use a single processor core? SetThreadAffinityMask looks like it might be interesting but its docs aren't entirely clear and I'm not sure how to use it.
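SetThreadAffinityMask function: Sets a processor affinity mask for the specified thread.
http://msdn.microsoft.com/en-us/library/windows/desktop/ms686247%28v=vs.85%29.aspx
SetThreadAffinityMask(GetCurrentThread(), (DWORD_PTR)1 << CoreNumber);
This restricts the current thread to the core whose zero-based index is CoreNumber. The cast matters: the mask is a DWORD_PTR, and shifting a plain int 1 overflows for core numbers of 31 and above.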
Even if you force all threads onto one virtual processor, you will still often have low-priority threads running while high-priority threads wait for them (priority inversion). Once a thread is scheduled by the Windows scheduler, it runs until it is either preempted or blocks (by sleeping or making some other blocking system call). You will have to change the design of your application so that it no longer assumes that a low-priority thread never runs while a high-priority thread is ready to run.
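For completeness, restricting the whole process rather than each thread can be sketched as follows, using SetProcessAffinityMask as mentioned elsewhere on this page. The helper names are made up for this example and the Windows-specific part is guarded. As the answer above points out, this alone does not guarantee strict priority ordering.

```cpp
#include <cstdint>
#ifdef _WIN32
#include <windows.h>
#endif

// Keep only the lowest set bit of an affinity mask, i.e. pick the
// lowest-numbered processor that the mask allows (0 if the mask is empty).
std::uint64_t lowest_allowed_core(std::uint64_t mask) {
    return mask & (~mask + 1);
}

#ifdef _WIN32
// Restrict every thread of the current process to a single allowed core.
bool restrict_process_to_one_core() {
    DWORD_PTR process_mask = 0, system_mask = 0;
    if (!GetProcessAffinityMask(GetCurrentProcess(), &process_mask, &system_mask)) {
        return false;
    }
    const DWORD_PTR one = static_cast<DWORD_PTR>(lowest_allowed_core(process_mask));
    return one != 0 && SetProcessAffinityMask(GetCurrentProcess(), one) != 0;
}
#endif
```

Reading the current process mask first avoids asking for a core that the process is not allowed to use.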

How expensive is a context switch? Is it better to implement a manual task switch than to rely on OS threads?

Imagine I have two (three, four, whatever) tasks that have to run in parallel. Now, the easy way to do this would be to create separate threads and forget about it. But on a plain old single-core CPU that would mean a lot of context switching - and we all know that context switching is big, bad, slow, and generally simply Evil. It should be avoided, right?
On that note, if I'm writing the software from the ground up anyway, I could go the extra mile and implement my own task switching. Split each task into parts, save the state in between, and then switch among them within a single thread. Or, if I detect that there are multiple CPU cores, I could just give each task to a separate thread and all would be well.
The second solution does have the advantage of adapting to the number of available CPU cores, but will the manual task-switch really be faster than the one in the OS core? Especially if I'm trying to make the whole thing generic with a TaskManager and an ITask, etc?
Clarification: I'm a Windows developer so I'm primarily interested in the answer for this OS, but it would be most interesting to find out about other OSes as well. When you write your answer, please state for which OS it is.
More clarification: OK, so this isn't in the context of a particular application. It's really a general question, the result of my musings about scalability. If I want my application to scale and effectively utilize future CPUs (and even different CPUs of today) I must make it multithreaded. But how many threads? If I make a constant number of threads, then the program will perform suboptimally on all CPUs which do not have the same number of cores.
Ideally the number of threads would be determined at runtime, but few are the tasks that can truly be split into arbitrary number of parts at runtime. Many tasks however can be split in a pretty large constant number of threads at design time. So, for instance, if my program could spawn 32 threads, it would already utilize all cores of up to 32-core CPUs, which is pretty far in the future yet (I think). But on a simple single-core or dual-core CPU it would mean a LOT of context switching, which would slow things down.
Thus my idea about manual task switching. This way one could make 32 "virtual" threads which would be mapped to as many real threads as is optimal, and the "context switching" would be done manually. The question just is - would the overhead of my manual "context switching" be less than that of OS context switching?
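A minimal sketch of such manual task switching on a single thread might look like this. It assumes each task can be written as a callable that does one slice of work per call and keeps its own state. Task, run_round_robin and make_countdown are names made up for this example, not an existing API.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// A task is a callable that performs one slice of work and returns true
// while it still has more to do. Its state lives in whatever it captures.
using Task = std::function<bool()>;

// Run all tasks on the calling thread, one slice each in turn, until every
// task reports completion. Returns the total number of slices executed.
std::size_t run_round_robin(std::vector<Task> tasks) {
    std::size_t slices = 0;
    while (!tasks.empty()) {
        for (std::size_t i = 0; i < tasks.size();) {
            ++slices;
            if (tasks[i]()) {
                ++i;  // not finished, revisit it in the next round
            } else {
                tasks.erase(tasks.begin() + static_cast<std::ptrdiff_t>(i));  // finished
            }
        }
    }
    return slices;
}

// Example task factory: a task that needs `steps` slices to finish.
Task make_countdown(int steps) {
    return [steps]() mutable { return --steps > 0; };
}
```

The "context switch" here is just a function return and another call, but every task has to be restructured into slices, and a slice that runs too long stalls all the others.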
Naturally, all this applies to processes which are CPU-bound, like games. For your run-of-the-mill CRUD application this has little value. Such an application is best made with one thread (at most two).
I don't see how a manual task switch could be faster, since the OS kernel is still switching other processes, including yours, in and out of the running state too. Seems like a premature optimization and a potentially huge waste of effort.
If the system isn't doing anything else, chances are you won't have a huge number of context switches anyway. The thread will use its timeslice, the kernel scheduler will see that nothing else needs to run and switch right back to your thread. Also the OS will make a best effort to keep from moving threads between CPUs so you benefit there with caching.
If you are really CPU bound, detect the number of CPUs and start that many threads. You should see nearly 100% CPU utilization. If not, you aren't completely CPU bound and maybe the answer is to start N + X threads. For very IO bound processes, you would be starting a (large) multiple of the CPU count (e.g. high traffic webservers run 1000+ threads).
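Choosing the thread count at runtime can be sketched with the C++ standard library. The function names are made up for this example, and the oversubscription factor is a tuning guess that would have to be measured for a real workload.

```cpp
#include <thread>

// Number of worker threads for CPU-bound work: one per hardware thread.
// `reported` is normally std::thread::hardware_concurrency(), which is
// allowed to return 0 when the value cannot be determined.
unsigned cpu_bound_worker_count(unsigned reported) {
    return reported == 0 ? 1u : reported;
}

// For work that spends part of its time blocked, oversubscribe by a
// caller-chosen factor (a tuning guess, not a measured value).
unsigned io_bound_worker_count(unsigned reported, unsigned factor) {
    return cpu_bound_worker_count(reported) * (factor == 0 ? 1u : factor);
}
```

A typical call would be cpu_bound_worker_count(std::thread::hardware_concurrency()), evaluated once at startup.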
Finally, for reference, both the Windows and Linux schedulers run on a periodic timer tick (roughly every 1 to 16 milliseconds, depending on the OS and its timer configuration) to check whether another thread needs to run. So even on an idle system you can see hundreds of context switches per second or more. On heavily loaded systems, I have seen over 10,000 per second per CPU without any significant issues.
The only advantage of a manual switch that I can see is that you have better control of where and when the switch happens. The ideal place is of course right after a unit of work has been completed, so that its working data can be discarded altogether instead of being evicted and reloaded in the middle of a computation. This saves you cache misses.
I advise not to spend your effort on this.
Single-core Windows machines are going to become extinct in the next few years, so I generally write new code with the assumption that multi-core is the common case. I'd say go with OS thread management, which will automatically take care of whatever concurrency the hardware provides, now and in the future.
I don't know what your application does, but unless you have multiple compute-bound tasks, I doubt that context switches are a significant bottleneck in most applications. If your tasks block on I/O, then you are not going to get much benefit from trying to out-do the OS.

WIN32: Yielding execution to another (given) thread

I am looking for a way to yield the remainder of the thread execution's scheduled time slice to a different thread. There is a SwitchToThread function in WINAPI, but it doesn't let the caller specify the thread it wants to switch to. I browsed MSDN for quite some time and haven't found anything that would offer just that.
For an operating-system-internals layman like me, it seems that a yielding thread should be able to specify which thread it wants to pass execution to. Is it possible or is it just my imagination?
The reason you can't yield processor time-slices to a designated thread is that Windows features a preemptive scheduling kernel which pretty much places the responsibility and authority of scheduling the processor time in the hands of the kernel and only the kernel.
As such threads don't have any control over when they run, if they run, and even less over which thread is switched to after their time slice is up.
However, there are a few ways you may influence context switches:
by increasing the priority of a certain thread you may force the scheduler to schedule it more often to the detriment of other threads (obviously the reverse applies as well: you can lower the priority of other threads)
you can code your process to place threads in kernel wait mode when they don't have work to do in order to help the scheduler do its job. When using proper kernel wait constructs such as Critical Sections, Mutexes, Semaphores, and Timers you effectively tell the kernel a certain thread doesn't need to be scheduled until a certain condition is met.
Note: There is rarely a reason you should tamper with task priorities so USE WITH CAUTION
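The second point, letting idle threads wait instead of spin, can be sketched with the C++ standard library in place of the Win32 objects listed above. WorkQueue is a made-up name for this example, and it stores plain ints only to keep the sketch short.

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>

// Minimal blocking queue: pop() puts the calling thread into a wait state
// until an item is available, so an idle worker consumes no CPU time.
class WorkQueue {
public:
    void push(int item) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push(item);
        }
        ready_.notify_one();  // wake one waiting consumer, if any
    }

    int pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !items_.empty(); });
        const int item = items_.front();
        items_.pop();
        return item;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<int> items_;
};
```

A worker thread that loops on pop() is only scheduled when there is something for it to do, which is the kind of hint to the scheduler that the answer describes.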
You might use 'fibers' instead of 'threads': for example there's a Win32 API named SwitchToFiber which lets you specify the fiber to be scheduled.
Take a look at UMS (User-mode scheduling) threads in Windows 7
http://msdn.microsoft.com/en-us/library/dd627187(VS.85).aspx
The second thread can simply wait for the yielding thread either by calling WaitForSingleObject() on its handle or periodically polling GetExitCodeThread(). The other answers are correct about altering the operating system's scheduling mechanisms - it is better to design the threads properly in the first place.
This is not possible. Only the kernel can decide what code runs next though you can influence it by reducing the non-waiting threads it has to choose from to run next, and by setting thread priorities with SetThreadPriority.
You can use regular synchronization primitives like events, semaphores, etc. to serialize your two threads. This does not in any way prevent the kernel from scheduling other threads in between, or in parallel on another CPU core, or virtually simultaneously on the same core. This is due to the preemptive multitasking nature of modern general-purpose operating systems.
If you want to do your own scheduling under Windows, you can use fibers, which essentially are threads that you have to schedule yourself. However, given that you describe yourself as a layman to the OS internals world, that would probably be a bad idea, as fibers are something of an advanced feature.
Can I ask why you want to use SwitchToThread?
If, for example, thread X is computing some value that you want to wait for on thread Y, then I'd really suggest looking at the Parallel Patterns Library or the Asynchronous Agents Library in Visual Studio 2010. They let you do this either with message blocks (receive on an asynchronous value) or simply with tasks: wait for a set of tasks to complete and inline their execution while waiting...
// i.e. on an arbitrary thread (task_group is in namespace concurrency, header <ppl.h>)
task_group tasks;
tasks.run([] { /* some functor */ });
A call to tasks.wait() will wait for the tasks and run any that have not started yet inline on the waiting thread.

Is it advantageous to use threads in windows?

Some of the fellows in the office think that when they've added threads to their code that windows will assign these threads to run on different processors of a multi-core or multi-processor machine. Then when this doesn't happen everything gets blamed on the existence of these threads colliding with one another on said multi-core or multi-processor machine.
Could someone debunk or confirm this notion?
When an application spawns multiple threads, it is indeed possible for them to get assigned to different processors. In fact, it is not uncommon for incorrect multi-threaded code to run ok on a single-processor machine but then display problems on a multi-processor machine. (This happens if the code is safe in the face of time slicing but broken in the face of true concurrency.)
Optimally you generally want one runnable thread per CPU, but unless your application has some explicit thread affinity to one processor then yes, Windows will assign these threads to a free processor.
Windows will automatically execute multiple threads on different processors if the machine has multiple processors. If you are running on a single processor machine, the threads are time-sliced but when you move the process to a multiple processor machine, the process will automatically take advantage of the multiple processors.
Because the code is running simultaneously, the threads may be more likely to step on each other's toes on a multi-core machine than on a single-core machine, since both threads could be writing to a shared location at the same time, rather than this happening only when a thread switch is timed just right.
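One common way to keep threads from stepping on each other is to give each thread its own output location. Below is a minimal sketch with the C++ standard library; parallel_sum is a made-up name for this example.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Sum `data` using `workers` threads. Each thread writes only to its own
// element of `partial`, so no two threads ever write the same location.
long long parallel_sum(const std::vector<int>& data, unsigned workers) {
    if (workers == 0) workers = 1;
    std::vector<long long> partial(workers, 0);
    std::vector<std::thread> pool;
    const std::size_t chunk = (data.size() + workers - 1) / workers;
    for (unsigned w = 0; w < workers; ++w) {
        const std::size_t begin = std::min(data.size(), w * chunk);
        const std::size_t end = std::min(data.size(), begin + chunk);
        pool.emplace_back([&data, &partial, w, begin, end] {
            partial[w] = std::accumulate(data.begin() + begin, data.begin() + end, 0LL);
        });
    }
    for (auto& t : pool) t.join();  // partial[] is safe to read after the joins
    return std::accumulate(partial.begin(), partial.end(), 0LL);
}
```

On a multi-core machine the OS is free to run these threads on different cores at the same time; the code gives the same result either way because the only shared data is read-only.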
Yes, threads and multithreading have almost nothing to do with the number of CPUs or cores in a machine...
EDIT ADDITION: To talk about "how many threads run on a cpu" is an oxymoron. Only one thread can ever run on a single CPU at a time. Multithreading is about multiple threads in a PROCESS, not on a CPU. Before another thread can be run on any CPU, the thread currently on that CPU has to STOP running, and its state must be preserved somewhere so that the OS can restart it when it gets its next "turn".
Code runs in "Processes", which are logical abstractions that can run one or more sequences of code instructions and manage computer resources independently from other processes. Within a process, each separate sequence of code instructions is a "thread". Which cpu they run on is irrelevant. A single thread can run on a different cpu each time it is allocated a cpu to run on... and multiple threads, as they are each allocated cpu cycles, may, by coincidence, run on the same cpu (although obviously not simultaneously).
The OS (a component of the OS) is responsible for "running" threads. It keeps an in-memory list of all threads, and constantly "switches" (it's called a context switch) among them. It does this on a single-CPU machine in almost exactly the same way as it does on a multiple-CPU machine. Even on a multiple-CPU machine, each time it "turns on" a thread, it might give it to a different CPU, or to the same CPU, as it did the last time.
There is no guarantee that threads of your process will be assigned to run on different CPUs.

Resources