Scheduling Priorities, Windows

Based on MSDN, the Windows OS schedules threads based on a base priority and uses a dynamic priority as a boost:
The system treats all threads with the same priority as equal. The
system assigns time slices in a round-robin fashion to all threads
with the highest priority. If none of these threads are ready to run,
the system assigns time slices in a round-robin fashion to all threads
with the next highest priority. If a higher-priority thread becomes
available to run, the system ceases to execute the lower-priority
thread (without allowing it to finish using its time slice), and
assigns a full time slice to the higher-priority thread.
From the above quote
The system treats all threads with the same priority as equal
Does this mean that the system actually schedules threads based on their dynamic priority, and that the base priority is just used as a lower limit for dynamic priority changes?
Thank you

Based on MSDN, the Windows OS schedules threads based on a base priority and uses a dynamic priority as a boost
Well, you follow that with a nice text snippet that shows no sign of a dynamic priority boost.
More information about that is in the documentation - for example http://msdn.microsoft.com/en-us/library/windows/desktop/ms684828(v=vs.85).aspx is a good start.
In simple words, the scheduler schedules threads based on their current (dynamic) priority, and a priority boost temporarily changes that value, so boosted threads get scheduled differently.
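The quoted behavior, round-robin within the highest ready priority and nothing lower ever running while higher-priority threads are ready, can be sketched as a toy simulation (the thread names and priority values here are made up for illustration; this is not the actual Windows scheduler):

```python
from collections import deque

def schedule(threads, slices):
    """Toy model of the quoted rules: `threads` maps a thread name
    to its priority; run `slices` time slices and return the order
    in which threads received them."""
    # Group ready threads by priority.
    queues = {}
    for name, prio in threads.items():
        queues.setdefault(prio, deque()).append(name)
    order = []
    for _ in range(slices):
        top = max(queues)        # highest priority that has ready threads
        name = queues[top].popleft()   # round-robin within that priority
        order.append(name)
        queues[top].append(name)       # still ready; goes to the back
    return order

# Two priority-8 threads share the slices round-robin; the
# priority-4 thread never runs while they are ready.
print(schedule({"a": 8, "b": 8, "c": 4}, 4))  # ['a', 'b', 'a', 'b']
```

The simulation omits boosts and blocking, but it shows why a lower-priority thread can starve entirely while higher-priority threads stay ready.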

Related

Does the Windows scheduler sometimes fail to preempt a running thread immediately to let a higher-priority thread run?

My application operates on pairs of long vectors - say it adds them together to produce a vector result. Its rules state that it must completely finish with one pair before it can be given another. I would like to use multiple threads to speed things up. I am running Windows 10.
I created an OpenMP parallel for construct and divided the vector among all the threads of the team. All threads start, all threads run pretty fast, so the multithreading is effective.
But the speedup is slight, and the reason is that some of the time, one of the worker threads takes way longer than usual. I have instrumented the operation, and I see that sometimes the worker threads take a long time to start - delay varies from 20 microseconds on average to dozens of milliseconds depending on system load. The master thread does not show this delay.
That makes me think that the scheduler is taking some time to start the worker threads. The master thread is already running, so it doesn't have to wait to be started.
But here is the nub of the question: raising the priority of the process doesn't make any difference. I can raise it to high priority or even realtime priority, and I still see that startup of the worker threads is often delayed. It looks like the Windows scheduler is not fully preemptive, and sometimes lets a lower-priority thread run when a higher-priority one is eligible. Can anyone confirm this?
I have verified that the worker threads are created with the default OS priority, namely the base priority of the class of the master process. This should be higher than the priority of any running thread, I think. Or is it normal for there to be some thread with realtime priority that might be blocking my workers? I don't see one with Task Manager.
I guess one last possibility is that the task switch might take 20 usec. Is that plausible?
I have a 4-core system without hyperthreading.

How to control how many tasks to run per executor in PySpark [duplicate]

I don't quite understand the spark.task.cpus parameter. It seems to me that a "task" corresponds to a "thread" or a "process", if you will, within the executor. Suppose that I set spark.task.cpus to 2.
How can a thread utilize two CPUs simultaneously? Couldn't it require locks and cause synchronization problems?
I'm looking at launchTask() function in deploy/executor/Executor.scala, and I don't see any notion of "number of cpus per task" here. So where/how does Spark eventually allocate more than one cpu to a task in the standalone mode?
To the best of my knowledge, spark.task.cpus controls the parallelism of tasks in your cluster in the case where some particular tasks are known to have their own internal (custom) parallelism.
In more detail:
We know that spark.cores.max defines how many threads (aka cores) your application needs. If you leave spark.task.cpus = 1, then you will have spark.cores.max concurrent Spark tasks running at the same time.
You will only want to change spark.task.cpus if you know that your tasks are themselves parallelized (maybe each of your tasks spawns two threads, interacts with external tools, etc.). By setting spark.task.cpus accordingly, you become a good "citizen". Now if you have spark.cores.max=10 and spark.task.cpus=2, Spark will only create 10/2=5 concurrent tasks. Given that your tasks need (say) 2 threads internally, the total number of executing threads will never be more than 10. This means that you never go above your initial contract (defined by spark.cores.max).
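The arithmetic above can be written down directly; this is plain Python mirroring the config-key names, not a Spark API call:

```python
def concurrent_tasks(cores_max: int, task_cpus: int) -> int:
    """Number of Spark tasks that can run at once:
    total cores divided by the cores reserved per task."""
    return cores_max // task_cpus

print(concurrent_tasks(10, 1))  # 10 concurrent tasks
print(concurrent_tasks(10, 2))  # 5 concurrent tasks
```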

What's the best Task scheduling algorithm for some given tasks?

We have a list of tasks with different lengths, a number of CPU cores, and a context-switch time.
We want to find the best scheduling of tasks among the cores to maximize processor utilization.
How could we find this?
Is it best to pick the longest available task from the list and hand tasks out one by one to whichever core is ready, or do we have to try all orderings to find out which is best?
I must add that all cores are ready at time unit 0 and the tasks are supposed to run concurrently.
The idea here is that there's no silver bullet: you must consider what types of tasks are being executed, and try to schedule them as nicely as possible.
CPU-bound tasks don't use much communication (I/O), and thus need to be executed continuously, and interrupted only when necessary -- according to the policy being used;
I/O-bound tasks may be repeatedly set aside during execution, allowing other processes to work, since they sleep for long periods waiting for data to be retrieved into primary memory;
interactive tasks must be scheduled promptly, but need not run without interruption, since they frequently block waiting for user input; they do need a high priority, so that the user does not notice delays in execution.
Considering this, and the context switch costs, you must evaluate what types of tasks you have, choosing, thus, one or more policies for your scheduler.
Edit:
I thought this was a simple conceptual question. Considering you have to implement a solution, you must analyze the requirements.
Since you have the lengths of the tasks and the context-switch times, and you have to keep the cores busy, this becomes an optimization problem: you must leave as few cores as possible idle as the processes finish, while also keeping the number of context switches to a minimum, so that your overall execution time does not grow too much.
As pointed out by svick, this sounds like a partition problem, which is NP-complete, and in which you need to divide a sequence of numbers into a given number of lists so that the sums of the lists are all equal.
In your problem you'd have a relaxation of the objective: you no longer need all the cores to execute for the same amount of time, but you want the difference between any two cores' execution times to be as small as possible.
In the reference given by svick, you can see a dynamic programming approach that you may be able to map onto your problem.
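The longest-task-first idea from the question is a well-known greedy heuristic for this relaxed partition problem (LPT, Longest Processing Time first). A minimal sketch, ignoring context-switch cost (the task lengths are made up for illustration):

```python
import heapq

def lpt_schedule(lengths, cores):
    """Assign each task (longest first) to the currently
    least-loaded core; return the makespan and per-core loads."""
    # Min-heap of (load, core id, assigned tasks); the unique core
    # id breaks ties so the task lists are never compared.
    loads = [(0, i, []) for i in range(cores)]
    heapq.heapify(loads)
    for t in sorted(lengths, reverse=True):
        load, i, tasks = heapq.heappop(loads)
        heapq.heappush(loads, (load + t, i, tasks + [t]))
    finish = [load for load, _, _ in loads]
    return max(finish), finish

makespan, per_core = lpt_schedule([7, 5, 4, 3, 2, 2], cores=2)
print(makespan)  # 12 (e.g. one core gets 7+3+2, the other 5+4+2)
```

LPT is not optimal in general (the problem is NP-complete), but it is fast and its makespan is provably within 4/3 of the optimum.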

Setting a single thread to real-time priority under windows

Is it possible to set the priority of a single thread to real-time priority although the process does not run under real-time priority?
The Windows scheduling priorities documentation describes the situation clearly.
There is no thread priority called "real-time priority". The highest priority a thread can obtain is THREAD_PRIORITY_TIME_CRITICAL. But this alone does not determine the thread's base priority, which also depends on the process's priority class.
For almost all process priority classes, setting THREAD_PRIORITY_TIME_CRITICAL will cause the thread to run at a base priority of 15. The one exception is REALTIME_PRIORITY_CLASS.
A thread set to THREAD_PRIORITY_TIME_CRITICAL within a process which is set to REALTIME_PRIORITY_CLASS runs at a base priority of 31, which is the highest priority to obtain.
The combination of NORMAL_PRIORITY_CLASS and THREAD_PRIORITY_TIME_CRITICAL already puts a thread well above most other threads (including OS threads), but I wouldn't call this real-time.
From my point of view (and as described in the "Scheduling Priorities" list), real time only starts at base priorities above 15. Therefore a thread can only obtain real-time priorities when its process runs under REALTIME_PRIORITY_CLASS.
This means the correct answer to your question is no: a thread cannot obtain real-time priority when its process has no real-time priority class.
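The class/priority combinations discussed above can be tabulated. The values below are taken from the Windows "Scheduling Priorities" table; this is plain Python for illustration, not a Windows API call:

```python
# Base priority = f(process priority class, thread priority),
# per the Windows "Scheduling Priorities" table.
BASE_PRIORITY = {
    ("NORMAL_PRIORITY_CLASS",   "THREAD_PRIORITY_NORMAL"):        8,
    ("HIGH_PRIORITY_CLASS",     "THREAD_PRIORITY_NORMAL"):        13,
    ("NORMAL_PRIORITY_CLASS",   "THREAD_PRIORITY_TIME_CRITICAL"): 15,
    ("HIGH_PRIORITY_CLASS",     "THREAD_PRIORITY_TIME_CRITICAL"): 15,
    ("REALTIME_PRIORITY_CLASS", "THREAD_PRIORITY_NORMAL"):        24,
    ("REALTIME_PRIORITY_CLASS", "THREAD_PRIORITY_TIME_CRITICAL"): 31,
}

# TIME_CRITICAL only reaches the real-time range (above 15)
# under REALTIME_PRIORITY_CLASS:
print(BASE_PRIORITY[("NORMAL_PRIORITY_CLASS", "THREAD_PRIORITY_TIME_CRITICAL")])    # 15
print(BASE_PRIORITY[("REALTIME_PRIORITY_CLASS", "THREAD_PRIORITY_TIME_CRITICAL")])  # 31
```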
It is possible using SetThreadPriority but it's not recommended.

Scenarios for Thread Ordering Service

Reading up on the new Vista/Win2008 features, I wonder what the point of the Thread Ordering Service is. In other words, in which scenarios is the "classic" scheduler's "fair to all" policy not sufficient, and a definite order of threads preferable?
To clarify. What would be a concrete application that would benefit from it?
Thanks for your answers, though.
The Thread Ordering Service does not apply to all threads, but only to those that are registered to it. You must make your program use the functionality.
The Service ensures that threads are executed in a desirable (configurable) order. That cannot be guaranteed by a "fair for all" scheduler. If your threads have no preferred execution order, the service probably does not provide extra value to you.
The Thread Ordering Service provides cooperative multi-threading in a pre-emptive multi-threading world. When you create the group you specify the maximum time slice that can be used by a thread in the group (period + timeout), and how often to run the group (period).
Your threads will then be run at most once per period, and will get an error if they exceed their maximum time slice.
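The effect, each registered thread running once per period in a fixed order, can be mimicked in plain Python with a chain of events. This is only a conceptual sketch of the ordering behavior; the real service is driven by the Avrt APIs (AvRtCreateThreadOrderingGroup, AvRtWaitOnThreadOrderingGroup, and friends) and additionally enforces the time-slice limit:

```python
import threading

def run_in_order(workers, periods):
    """Run each worker callable once per period, in list order --
    a toy model of ordered, period-based execution."""
    log = []
    lock = threading.Lock()
    events = [threading.Event() for _ in workers]
    done = threading.Event()

    def body(i, fn):
        for p in range(periods):
            events[i].wait()                 # wait for our turn
            events[i].clear()
            with lock:
                log.append(fn())
            nxt = (i + 1) % len(workers)
            if p == periods - 1 and nxt == 0:
                done.set()                   # last worker, last period
            else:
                events[nxt].set()            # hand off to the next thread

    threads = [threading.Thread(target=body, args=(i, f))
               for i, f in enumerate(workers)]
    for t in threads:
        t.start()
    events[0].set()                          # start the first period
    done.wait()
    for t in threads:
        t.join()
    return log

print(run_in_order([lambda: "a", lambda: "b"], periods=2))  # ['a', 'b', 'a', 'b']
```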
I imagine this works quite well in scenarios where there's a hard response time limit.
