I'm doing some Linux CFS analysis for my OS class and have an observation that I cannot explain.
For two otherwise identical processes, when they are executed with a SCHED_OTHER policy, I am seeing about 50% more voluntary context switches than when I execute them with a SCHED_FIFO or a SCHED_RR policy.
This wouldn't surprise me a bit for involuntary switches, since SCHED_OTHER has a much lower priority, so it has to give up the CPU. But why would this be the case for voluntary switches? Why would SCHED_OTHER volunteer to give up the CPU more often than the real-time processes? It's an identical process, so it only volunteers to give up the CPU when it switches over to I/O, right? And I don't think that the choice of policy would affect the frequency of I/O attempts.
Any Linux people have an idea? Thanks!
First understand that the scheduling policies are nothing but the scheduling algorithms implemented in the kernel. So SCHED_FIFO, SCHED_RR and SCHED_OTHER are different algorithms in the kernel. SCHED_FIFO and SCHED_RR belong to the real-time scheduling algorithm "class". SCHED_OTHER is nothing but the scheduling algorithm for normal processes in the system, more popularly known as the CFS (Completely Fair Scheduler) algorithm.
SCHED_OTHER has a much lower priority
To be precise, it doesn't have a "much" lower priority, but it does have "a" lower priority than the real-time scheduling class. There are three scheduling classes in the Linux scheduler - the real-time scheduling class, the normal-task scheduling class and the idle-task scheduling class. The priority levels, from highest to lowest, are as follows:
Real Time Scheduling Class.
Normal Task Scheduling Class.
Idle Tasks Scheduling Class.
Tasks on the system belong to one of these classes. (Note that at any point in time a task can belong to only one scheduling class, although its class can be changed.) The scheduler in Linux first checks whether there is a runnable task in the real-time class. If there is, it invokes the SCHED_FIFO or SCHED_RR algorithm, depending on the policy set for that task. Only if there are no runnable real-time tasks does the scheduler look at the normal class and invoke the CFS algorithm for whichever normal task is ready to run.
Coming to the main question: why do you see more context switches when you run the same process under two different scheduling classes? There are two cases:
Generally, on a simple system there are hardly any real-time tasks and most tasks belong to the normal class. Thus, when you run that process in the real-time class, it gets the processor almost exclusively (since the real-time scheduling class has a higher priority than the normal class, and there are no, or very few, real-time tasks to share the CPU with). When you run the same process in the normal class, it has to share the processor with various other processes, leading to more context switches.
Even if there are many real-time tasks in the system, the nature of the real-time scheduling algorithms in question, FIFO and RR, leads to fewer context switches. Under FIFO the processor is not switched to another task until the current one blocks or completes, and under RR each task gets a fixed interval (time quantum). Under CFS, the timeslice a process gets depends on the load on the runqueue: it is a function of the task's weight relative to the total weight of the runqueue, so the more runnable tasks there are, the smaller each slice becomes. I assume you are well versed in FIFO and RR since you are taking OS classes. For more information on CFS I would advise you to google it, or, if you are brave enough, to go through its source code.
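If you want to check this empirically, here is a rough sketch (assuming Linux with glibc; the workload itself is left out and error handling is minimal). It switches the calling process to the requested policy and then dumps the kernel's per-process context-switch counters:

#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Rough sketch: run the same workload under a chosen policy, then dump
 * voluntary_ctxt_switches / nonvoluntary_ctxt_switches from /proc.
 * Real-time policies need root or CAP_SYS_NICE. */
int main(int argc, char *argv[])
{
    int policy = SCHED_OTHER;
    if (argc > 1 && argv[1][0] == 'f') policy = SCHED_FIFO;
    if (argc > 1 && argv[1][0] == 'r') policy = SCHED_RR;

    struct sched_param sp = { 0 };
    if (policy != SCHED_OTHER)
        sp.sched_priority = 1;              /* RT policies need priority >= 1 */

    if (sched_setscheduler(0, policy, &sp) == -1) {
        perror("sched_setscheduler");
        return EXIT_FAILURE;
    }

    /* ... put the identical workload (CPU work + I/O) here ... */

    char cmd[64];
    snprintf(cmd, sizeof cmd, "grep ctxt_switches /proc/%d/status", getpid());
    system(cmd);                            /* prints both counters */
    return EXIT_SUCCESS;
}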
Hope the answer is complete :)
Related
The first-come, first-served (FCFS) scheduling algorithm is a non-preemptive algorithm: if a process is in the running state, it cannot be preempted until it completes. But if some kernel process arrives in the meantime, will the CPU be allocated to that kernel process?
If yes, will this be the case with any higher priority process, irrespective of whether it is a system process or not?
As mrBen told in his answer, there is no notion of priority in FCFS. A kernel process will still be treated like any other process waiting in the ready queue. Hence, this algorithm can't really be used on its own in practice.
However, that being said, there are certain situations that make FCFS practical. Consider the case where the process scheduler uses priority scheduling and two processes have the same priority. To resolve that conflict, FCFS may be used as the tie-breaker (see the sketch below).
In such cases, kernel processes will always have a higher priority than user processes. Within kernel processes, hardware interrupts have a higher priority than software interrupts, since you cannot keep a device waiting and starve it while executing signal handlers.
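A minimal sketch of that tie-breaking rule; the proc_t descriptor and its field names below are hypothetical, not taken from any real kernel:

#include <stdint.h>

/* Hypothetical descriptor: a lower 'priority' value means a higher
 * priority, and 'arrival_time' is a monotonically increasing tick
 * recorded when the process became ready. */
typedef struct {
    int      priority;
    uint64_t arrival_time;
} proc_t;

/* Returns nonzero if 'a' should run before 'b': priority decides first,
 * and FCFS (earliest arrival) breaks ties between equal priorities. */
static int runs_before(const proc_t *a, const proc_t *b)
{
    if (a->priority != b->priority)
        return a->priority < b->priority;
    return a->arrival_time < b->arrival_time;
}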
Hope I answered your question!
The specifics are obviously OS dependent, but I'm looking for algorithms that are used to assign threads to physical cores on non-uniform memory access (NUMA) architectures, i.e. where accessing different addresses takes different amounts of time. This could be, for instance, because the cache has been divided into physically distributed slices, each placed at a different location, so each has a different access time depending on its distance from the core.
Obviously, the scheduler also takes into account things like the number of threads already assigned to the processor among many other variables, but I'm specifically looking for scheduling algorithms that primarily try to minimize memory access time in NUMA architectures.
I can't say I am an expert on the topic - I am not - but so far no one else seems eager to answer, so I will give it my best shot.
It would make sense to assume that, on a NUMA system, it would be beneficial to keep running a thread on the same core as long as possible. This would essentially mean a weak form of processor affinity, where the scheduler decides on which core a thread should be run and may change it dynamically.
Basic scheduling with processor affinity is easy enough to implement: you just take an existing scheduling algorithm and modify it in such a way that each core has its own thread queue (or queues). On a NUMA system, the rest is a matter of determining when it is beneficial to migrate a thread onto another core; I don't think it is possible to give a generally applicable algorithm for that, because the benefits and costs are highly dependent on the specifics of the system in question.
Note that the kind of processor affinity the scheduler would need is weak and automatic: to which core a thread is pinned is entirely up to the scheduler and may change whenever the scheduler considers it beneficial. This is in sharp contrast to processor affinity in, for example, the Linux scheduler, where processor affinity is hard (a thread cannot be run on a core it doesn't have affinity with) and manually managed by the user (see sched_setaffinity and pthread_setaffinity_np).
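For reference, this is what the hard, user-managed affinity mentioned above looks like in practice on Linux (a minimal sketch restricting the calling process to CPU 0; the weak, scheduler-managed affinity discussed for NUMA lives inside the kernel and has no such user-facing call):

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main(void)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(0, &set);                          /* allow only CPU 0 */

    if (sched_setaffinity(0, sizeof(set), &set) == -1) {
        perror("sched_setaffinity");
        return 1;
    }

    printf("now running on CPU %d\n", sched_getcpu());
    return 0;
}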
I didn't understand this sentence. Please explain it to me in detail, in simple English:
Go routines are cooperatively scheduled, rather than relying on the kernel to manage their time sharing.
Disclaimer: this is a rough and inaccurate description of the scheduling in the kernel and in the go runtime aimed at explaining the concepts, not at being an exact or detailed explanation of the real system.
As you may (or may not) know, a CPU can't actually run two programs at the same time: a CPU has only one execution thread, which can execute one instruction at a time. The direct consequence on early systems was that you couldn't run two programs at the same time, each program needing (system-wise) a dedicated thread.
The solution currently adopted is called pseudo-parallelism: given a number of logical threads (e.g. multiple programs), the system executes one of the logical threads for a certain amount of time, then switches to the next one. Using really small amounts of time (on the order of milliseconds), you give the human user the illusion of parallelism. This operation is called scheduling.
The Go language doesn't use this system directly: it implements its own scheduler that runs on top of the system scheduler and schedules the execution of the goroutines itself, avoiding the performance cost of using a real thread for each routine. This type of system is called light (or green) threads.
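As a loose analogy (in C rather than Go, and nothing like the real Go runtime), the loop below plays the role of a user-space scheduler: it interleaves small slices of two logical tasks on one OS thread, without asking the kernel to time-share anything:

#include <stdio.h>

/* Each "logical task" does a small slice of work per call and returns
 * nonzero while it still has work left. */
static int task_a(void) { static int i = 0; if (i < 3) { printf("A%d ", i++); return 1; } return 0; }
static int task_b(void) { static int i = 0; if (i < 3) { printf("B%d ", i++); return 1; } return 0; }

int main(void)
{
    int more = 1;
    while (more) {                 /* the user-space "scheduler" loop */
        more  = task_a();          /* run a slice of logical task A   */
        more |= task_b();          /* then a slice of logical task B  */
    }
    putchar('\n');                 /* prints: A0 B0 A1 B1 A2 B2       */
    return 0;
}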
Are there any good resources (books, websites) that give very good comparison of different scheduling algorithms for a Finite State Machine (FSM) in an embedded system without an OS?
I am designing a simple embedded web server without an OS. I would like to know what are the various methods used to schedule the processing of the different events that occur in the system.
For example, if two events arrive at the same time, how are the events prioritized? If I assign different priorities to events, how do I ensure that the higher-priority event gets processed first? If an even higher-priority event comes in while an event is being processed, how can I make sure that that event is processed immediately?
I'm planning on using an FSM to check various conditions upon an event's arrival and then to properly schedule the event for processing. Because the embedded web server does not have an OS, I am considering using a cyclic executive approach. But I would like to see a comparison of the pros and cons of the different algorithms that could be used in this approach.
If I knew what the question meant, the answer would still probably be Miro Samek's Practical UML Statecharts in C/C++, Second Edition: Event-Driven Programming for Embedded Systems.
You state: "I mean for example scheduling condion in like ,if two task arrived at the same time which task need to be prioritized and simillar other situations in embedded webserver."
Which I interpret as: "What is the set of rules used to determine which task gets executed first (scheduled) when multiple tasks arrive at the same time."
I used your terminology, "task" to illustrate the similarity. But Clifford is correct. The proper term should be "event" or "message".
And when you say "scheduling condition" I think you mean "set of rules that determines a schedule of events".
The definition of algorithm is: A process or set of rules to be followed in calculations or other problem-solving operations, esp. by a computer.
From a paper entitled Scheduling Algorithms:
Consider the central processing unit of a computer that must process a
sequence of jobs that arrive over time. In what order should the jobs
be processed in order to minimize, on average, the time that a job is
in the system from arrival to completion?
Which again, sounds like what you're calling "scheduling conditions".
I bring this up because using the right words to describe what you are looking for will help us (the SO community) give you better answers. And will help you as you research further on your own.
If my interpretation of your question still isn't what you have in mind, please let me know what, in particular, I've said is wrong and I will try again. Maybe some more examples would help me better understand.
Some further reading on scheduling (which is what you asked for):
A good starting point of course is the Wikipedia article on Scheduling Disciplines
A bit lower level than you are looking for but still full of detailed information on scheduling is Scheduling Algorithms for High-Level Synthesis (NOTE: for whatever reason the PDF has the pages in reverse order, so start at the bottom)
An example of a priority interrupt scheduler:
Take an architecture where Priority Level 0 is the highest. Two events come in simultaneously. One with Priority 2 and another with Priority 3. The scheduling algorithm starts processing the one with Priority 2 because it has a higher priority.
While the event with Priority 2 is being processed, another event with Priority 0 comes in. The scheduler interrupts the event with Priority 2 and processes the event with Priority 0.
When it's finished processing the Priority 0 event, it returns to processing the Priority 2 event. When it's finished processing the Priority 2 event, it processes the Priority 3 event.
Finally, when it's done with processing all of the priority interrupts, it returns control to the main processing task which handles events where priority doesn't matter.
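Here is a toy sketch of that selection rule only (the nested preemption itself is done by the interrupt hardware and is not modelled); it assumes a hypothetical 8-level controller tracked as a bitmask, where priority 0 is the highest:

#include <stdint.h>
#include <stdio.h>

/* Bit N of 'pending' means "an event of priority N is waiting". */
static uint8_t pending;

static void raise_event(int prio)
{
    pending |= (uint8_t)(1u << prio);
}

static int next_event(void)
{
    for (int prio = 0; prio < 8; prio++) {
        if (pending & (1u << prio)) {
            pending &= (uint8_t)~(1u << prio);
            return prio;                /* highest-priority pending event */
        }
    }
    return -1;                          /* nothing pending */
}

int main(void)
{
    raise_event(2);
    raise_event(3);
    raise_event(0);                     /* arrives later, but wins        */

    int prio;
    while ((prio = next_event()) != -1)
        printf("handling priority %d event\n", prio);   /* 0, 2, then 3   */

    return 0;
}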
An illustration:
In the above image, the "task" is the super loop which DipSwitch mentioned, or the infinite loop in main() that occurs in a cyclic executive, which you mentioned. The "events" are the various routines that are run in the super loop, or the interrupts seen above if they require prioritization.
Terms to search for are Priority Interrupt and Control Flow. Some good reading material is the Toppers Kernel Spec (where I got the image from), the ARM Interrupt Architecture, and a paper on the 80196 Interrupt Architecture.
I mention the Toppers Kernel Spec just because that's where I got the image from. But at the heart of any real-time OS are its scheduling algorithm and interrupt architecture.
The "on event" processing you ask about would be handled by the microprocessor/microcontroller interrupt subsystem. How you structure the priority levels and how you handle non-priority events is what makes up the totality of your scheduling algorithm.
An example of a cooperative scheduler:
#include <stdint.h>

/* NUM_TASKS, Disable_Interrupts() and Enable_Interrupts() are
 * platform-specific and must be provided elsewhere. */

typedef struct {
    void (*task)(void);   // Pointer to the task function.
    uint32_t period;      // Period to execute with (in ticks).
    uint32_t delay;       // Delay (in ticks) before first call; must start >= 1.
} task_type;

volatile uint32_t elapsed_ticks = 0;

task_type tasks[NUM_TASKS];

void Dispatch_Tasks(void)
{
    Disable_Interrupts();
    while (elapsed_ticks > 0) {              // TRUE only if the ISR ran.
        for (uint32_t i = 0; i < NUM_TASKS; i++) {
            if (--tasks[i].delay == 0) {
                tasks[i].delay = tasks[i].period;
                Enable_Interrupts();
                tasks[i].task();             // Execute the task!
                Disable_Interrupts();
            }
        }
        --elapsed_ticks;
    }
    Enable_Interrupts();
}

// Count the number of ticks yet to be processed.
void Timer_ISR(void)
{
    ++elapsed_ticks;
}
The above example was taken from a blog post entitled "Simple Co-Operative Scheduling".
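For completeness, here is one hypothetical way to wire it up; Read_Sensors, Update_Display and Init_Timer_Tick are made-up placeholders (not part of the original post) for real task functions and for whatever configures a periodic tick to call Timer_ISR(). It assumes NUM_TASKS is at least 2:

extern void Read_Sensors(void);
extern void Update_Display(void);
extern void Init_Timer_Tick(void);

int main(void)
{
    tasks[0] = (task_type){ Read_Sensors,   10, 1 };  /* every 10 ticks */
    tasks[1] = (task_type){ Update_Display, 50, 5 };  /* every 50 ticks */

    Init_Timer_Tick();        /* start the tick that drives Timer_ISR()  */
    for (;;)
        Dispatch_Tasks();     /* the super loop / cyclic executive       */
}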
A cooperative scheduler is a combination of a super loop and a timer interrupt. From Section 2.4 in NON-BLOCKING HARDWARE CODING FOR EMBEDDED SYSTEMS:
A Cooperative scheduler is essentially a combination of the two
previously discussed schedulers. One timer is set to interrupt at a
regular interval, which will be the minimum time resolution for the
different tasks. Each task is then assigned a period that is a
multiple of the minimum resolution of the interrupt interval. A
function is then constantly called to update the interrupt count for
each task and run tasks that have reached their interrupt period. This
results in a scheduler that has the scalability of the Superloop with
the timing reliability of the Time Triggered scheduler. This is a
commonly used scheduler for sensor systems. However, this type of
scheduler is not without its limitations. It is still important that
the task calls in a cooperative scheduler are short. If one task
blocks longer than one timer interrupt period, a time-critical task
might be missed.
And for a more in depth analysis, here is a paper from the International Journal of Electrical & Computer Sciences.
Preemptive versus Cooperative:
A cooperative scheduler cannot handle asynchronous events without some sort of a preemption algorithm running on top of it. An example of this would be a multilevel queue architecture. Some discussion on this can be found in this paper on CPU Scheduling. There are, of course, pros and cons to each. A few of which are described in this short article on the RTKernel-32.
As for "any specific type preemptive scheduling scheduling process that can satisfy priority based task scheduling (like in the graph)", any priority based interrupt controller is inherently preemptive. If you schedule one task per interrupt, it will execute as shown in the graph.
As I understand it, the Windows thread scheduler does not discriminate between threads belonging to two different processes, provided all of them have the same base priority. My question is: if I have two applications, one with only one thread and the other with, say, 50 threads, all with the same base priority, does that mean that the second process enjoys more CPU time than the first one?
Scheduling in Windows is at the thread granularity. The basic idea behind this approach is that processes don't run but only provide resources and a context in which their threads run. Coming back to your question, because scheduling decisions are made strictly on a thread basis, no consideration is given to what process the thread belongs to. In your example, if process A has 1 runnable thread and process B has 50 runnable threads, and all 51 threads are at the same priority, each thread would receive 1/51 of the CPU time—Windows wouldn't give 50 percent of the CPU to process A and 50 percent to process B.
To understand the thread-scheduling algorithms, you must first understand the priority levels that Windows uses. You can refer here for quick reference.
Try reading Windows Internals for in depth understanding.
All of the above is accurate, but if you're worried about the 50-thread process hogging all the CPU, there ARE techniques you can use to ensure that no single process overwhelms the CPU.
IMHO the best way to do this is to use job objects to manage the usage of a process. First call CreateJobObject, then SetInformationJobObject to limit the max CPU usage of the processes in the job object, and finally AssignProcessToJobObject to assign the process with 50 threads to the job object. You can then let the OS ensure that the 50-thread process doesn't consume too much CPU time.
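A rough sketch of that sequence, assuming hProcess is a handle you already obtained elsewhere (e.g. from CreateProcess or OpenProcess) and that you are on Windows 8 / Server 2012 or later, where CPU rate control is available:

#include <windows.h>

/* Cap the CPU usage of an already-running process at 'percent'. The job
 * handle must stay open for as long as the cap should apply. */
static BOOL CapProcessCpu(HANDLE hProcess, DWORD percent)
{
    HANDLE job = CreateJobObjectW(NULL, NULL);
    if (job == NULL)
        return FALSE;

    JOBOBJECT_CPU_RATE_CONTROL_INFORMATION rate = { 0 };
    rate.ControlFlags = JOB_OBJECT_CPU_RATE_CONTROL_ENABLE |
                        JOB_OBJECT_CPU_RATE_CONTROL_HARD_CAP;
    rate.CpuRate = percent * 100;   /* CpuRate is in 1/100ths of a percent */

    if (!SetInformationJobObject(job, JobObjectCpuRateControlInformation,
                                 &rate, sizeof(rate)) ||
        !AssignProcessToJobObject(job, hProcess)) {
        CloseHandle(job);
        return FALSE;
    }
    return TRUE;
}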
The unit of scheduling is a thread, not a process, so a process with 50 threads, all in a tight loop, will get much more of the cpu than a process with only a single thread, provided all are running at the same priority. This is normally not a concern since most threads in the system are not in a runnable state and will not be up for scheduling; they are waiting on I/O, waiting for input from the user, and so on.
Windows Internals is a great book for learning more about the Windows thread scheduler.
That depends on the behavior of the threads. In general, with a 50:1 difference in thread count, yes, the application with more threads is going to get a lot more time. However, Windows also uses dynamic thread prioritization, which can change this somewhat. Dynamic thread prioritization is described here:
https://web.archive.org/web/20130312225716/http://support.microsoft.com/kb/109228
Relevant excerpt:
The base priority of a thread is the base level from which these upward adjustments are made. The current priority of a thread is called its dynamic priority. Interactive threads that yield before their time slice is up will tend to be adjusted upward in priority from their base priority. Compute-bound threads that do not yield, consuming their entire time slice, will tend to have their priority decreased, but not below the base level. This arrangement is often called heuristic scheduling. It provides better interactive performance and tends to lessen the system impact of "CPU hog" threads.
There is a local 'advanced' setting that purportedly can be used to shade scheduling slightly in favor of the app with focus. With the 'services' setting, there is no preference. In previous versions of Windows, this setting used to be somewhat more granular than just 'applications with focus' (slight preference to the app with focus) and 'services' (all equal weighting).
As this can be set by the user on the target machine, it seems like asking for grief to depend on this setting...