Are there any good explanations of kernel schedulers?

I've recently begun wondering about kernel schedulers and whatnot. Is there any resource that provides an overview of commonly used kernel scheduling algorithms? The CFS scheduler has a lot of literature on its implementation, but I can't seem to find much along the lines of the queuing theory behind the algorithm.

Linux Kernel Scheduler Resources:
Inside the Linux scheduler
A short history of Linux schedulers
Completely Fair Scheduler (since 2.6.23)
Multiprocessing with the Completely Fair Scheduler
Real-Time Linux Kernel Scheduler
O(1) Scheduler (prior to 2.6.23)
Linux Scheduler simulation

This set of docs has been the most helpful for me by far, although they don't talk about the queuing theory behind the algorithms.
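For orientation only, the core idea behind CFS fits in a few lines: always run the task with the smallest virtual runtime, and advance that virtual runtime more slowly for tasks with a higher weight. The sketch below is a toy model, not kernel code. It uses a heap where the kernel uses a red-black tree, and the task names, weights and slice length are arbitrary assumptions.

```python
import heapq

def cfs_sketch(weights, slices, slice_ns=1_000_000, base_weight=1024):
    """Run `slices` time slices; return how many slices each task received."""
    # Heap of (vruntime, name): the task with the smallest vruntime runs next.
    heap = [(0.0, name) for name in sorted(weights)]
    heapq.heapify(heap)
    ran = {name: 0 for name in weights}
    for _ in range(slices):
        vruntime, name = heapq.heappop(heap)
        ran[name] += 1
        # Virtual time advances inversely to weight, so heavier tasks run more often.
        vruntime += slice_ns * base_weight / weights[name]
        heapq.heappush(heap, (vruntime, name))
    return ran

# Hypothetical example: one task with twice the weight of the other.
shares = cfs_sketch({"heavy": 2048, "light": 1024}, slices=300)
```

With these assumed weights, the heavier task should end up with roughly twice as many slices as the lighter one.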

Related

Do kernel components like the scheduler execute on their own dedicated CPU/Core or do they share?

Can someone explain how the Windows scheduler executes its code? Does it work from a dedicated CPU, or does it share with all other kernel-mode/user-mode processes? I have read somewhere that modern processors offer architectural extensions providing several banks of registers that can be swapped in hardware; see this post.

They share. A typical system spends so little of its time running kernel code that dedicating an entire core to it would be an absurd waste, and the scheduler itself is a tiny fraction even of that. And in cases where it does need to run a lot of kernel code, that's exactly when you want that work shared among as many cores as possible.
I'm not sure about Windows specifically, but a common OS design is that every core executes the scheduler when it's time to decide which task that core should execute next.
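As a rough illustration of that design (not Windows or Linux code), the toy simulation below has every simulated core call the same `pick_next` routine on a shared run queue whenever it needs new work, so no core is reserved for the scheduler. The names `pick_next` and `simulate`, the task tuples and the quantum are all made up for the sketch.

```python
from collections import deque

def pick_next(run_queue):
    """The 'scheduler': runs on whichever core calls it; returns a task or None."""
    return run_queue.popleft() if run_queue else None

def simulate(num_cores, tasks, quantum=2):
    """tasks: list of (name, remaining_ticks). Returns a per-core log of task names."""
    run_queue = deque(tasks)
    log = {core: [] for core in range(num_cores)}
    while run_queue:
        # Each core, in turn, runs the scheduler itself to choose its own next task.
        for core in range(num_cores):
            task = pick_next(run_queue)
            if task is None:
                break
            name, remaining = task
            log[core].append(name)
            remaining -= quantum
            if remaining > 0:
                run_queue.append((name, remaining))  # preempted, back of the queue
    return log

log = simulate(2, [("A", 4), ("B", 2), ("C", 2)])
```

The cores are stepped one after another here for simplicity; on real hardware they would run concurrently and the run queue would need locking or per-core queues.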

How to test scheduling algorithms for parallel realtime task-models on multiprocessor

I do research on scheduling of parallel real-time tasks. I have some theories and need a way to test and play around with them. I found some simulators, but most are already outdated or have sparse documentation. Does anyone know of a simulator or framework I can use for my use case?

I know that with the StarPU task-based runtime one can implement one's own scheduler, and that execution can be simulated with SimGrid. Is that what you are looking for?

Is there a FreeRTOS howto for Cortex M7 about how to supervise/trace a system with few tasks (what features of kernel to be used)

I'm slowly assembling the picture of how to use FreeRTOS in a real-world application.
I've read about a lot of individual features (stack supervision, memory, malloc etc.).
But I haven't found good guidance anywhere on what "supervision" to use to be able to follow the performance of the tasks and the system, also after the debugger is no longer connected...
Can anyone help with some pointers or advice?
What features do you activate when a FreeRTOS app is designed?
How do you supervise what is going on with tasks?
I'd rather read something short, to try feature by feature and see how it works. Something more for beginners. I understand that I have the documentation, but what I'm after is a gradual introduction to FreeRTOS with examples. Maybe I overlooked some good info to read...
Let me illustrate it with a few questions that I don't have the answers for:
Should I have a separate supervision task that gathers info about the other tasks (state, memory, ...)?
What features should be used to supervise a FreeRTOS-based app in a "professional" way?
Should I use ITM/SWO, or maybe RTT?
Do you leave a serial console on the system to supervise it?
Thanks in advance,
regards.

> I'm slowly assembling the picture of how to use FreeRTOS in a real-world application. I've read about a lot of individual features (stack supervision, memory, malloc etc.). [...]
> Can anyone help with some pointers or advice?
On the FreeRTOS website, you'll find a lot of documentation, both introductory material and in-depth descriptions of individual features.
> I'd rather read something short, to try feature by feature and see how it works. Something more for beginners. I understand that I have the documentation, but what I'm after is a gradual introduction to FreeRTOS with examples. Maybe I overlooked some good info to read...
There is also a lot of third-party documentation. You may want to read general literature about RTOSes and how to use them: first, because much of it refers to one of the best-known OSS implementations, FreeRTOS; second, because when working with an RTOS, one has to take care of virtually the same aspects regardless of which RTOS implementation is used.
> How do you supervise what is going on with tasks?
This depends on the purpose of supervision:
If the system that runs the RTOS is critical in some sense (e.g., it implements functional safety or security requirements), you'll probably need certain supervision measures at runtime that depend on the type and level of criticality. Violating the expectations of such supervision usually triggers the system to switch off and fall into some kind of safe/secure operation mode.
More commonly, you need supervision to debug or trace the application during development and testing, to gain insight into why certain errors appear in the system's behaviour, or how long the tasks/ISRs in the system need to execute and how long they suppress other contexts in doing so. This often allows you to keep a debug/trace adapter attached to the system all the time. Violating the expectations here means guiding the developer to a remaining error in the system under development/test.
For many kinds of applications, you may have to measure (and log) the task timings over longer periods in order to get reliable statistics under controlled laboratory (or real-life) conditions. Then you usually cannot keep a debug/trace adapter dongle on the embedded system because this would disturb the procedures under test. So a logging concept/implementation is needed.
You have to evaluate the purpose of supervision. Then you can search this board and others for more specific help and post further questions you may have.
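For the long-term measurement case, the evaluation side of such a logging concept can be quite small. The host-side sketch below turns logged (task, start, end) records into per-task execution-time statistics. The record format, the microsecond unit and the task names are assumptions for illustration; how the records leave the target (serial console, RTT, flash) is left open.

```python
from collections import defaultdict

def timing_stats(records):
    """records: iterable of (task_name, start_us, end_us) tuples taken from a log.
    Returns {task_name: (count, min_us, max_us, mean_us)}."""
    durations = defaultdict(list)
    for task, start_us, end_us in records:
        durations[task].append(end_us - start_us)
    return {
        task: (len(d), min(d), max(d), sum(d) / len(d))
        for task, d in durations.items()
    }

# Hypothetical log excerpt: two runs of a "sensor" task, one of a "comms" task.
log = [("sensor", 0, 120), ("comms", 200, 900), ("sensor", 1000, 1150)]
stats = timing_stats(log)
```

Comparing the maximum against each task's deadline is usually more telling than the mean, since a single overrun is what breaks a real-time system.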
> But I haven't found good guidance anywhere on what "supervision" to use to be able to follow the performance of the tasks and the system, also after the debugger is no longer connected...
> What features do you activate when a FreeRTOS app is designed?
Everything your application requires (see above). One by one!
> Let me illustrate it with a few questions that I don't have the answers for:
> Should I have a separate supervision task that gathers info about the other tasks (state, memory, ...)?
> What features should be used to supervise a FreeRTOS-based app in a "professional" way?
> Should I use ITM/SWO, or maybe RTT?
> Do you leave a serial console on the system to supervise it?
This all depends on the answers you find about the purpose of supervision.
The professional way to deal with it is a top-down approach: focus on the system requirements (and development needs), and design/implement everything that is necessary to fulfill them.
If you are looking for a first insight into how to activate ITM/SWO tracing of FreeRTOS for educational purposes, I can recommend the beautiful tutorial in the Atollic blog, a beginners' intro spread step by step over several free articles.
For RTOS architecture hints, you may also like YouTube introductions, for example the channel of Beningo Engineering.

OpenMPI custom fault tolerance for loosely coupled parallel processes

I do computations on the Amazon EC2 platform, using multiple machines which are connected through OpenMPI. To reduce the cost of the computation, spot instances are used, which are automatically shut down when the cost of a machine goes above a preset maximum price: http://aws.amazon.com/ec2/spot-instances/ . A weird behaviour occurs: when a machine is shut down, the other processes in the MPI communicator still continue to run. I think that the network interfaces are silenced before the process has time to tell the other processes that it has received a kill signal.
I have read in multiple posts that MPI does not provide a lot of high-level resources regarding fault tolerance. On the other hand, the structure of my program is very simple: a master process is queried by slave processes for permission to execute a portion of code. The master process only keeps track of the number of queries it has replied to, and tells the slaves to stop when an upper limit is reached. There is no coupling between the slaves.
I would like to be able to detect when a process has silently died as described above. In that case I would reassign the work it was doing to a slave that is still alive. Is there a simple way to check whether a process has died? I have thought of using threads and sockets to do that independently of the rest of the MPI layer, but that seems cumbersome. I also thought of maintaining on the master process (which is launched on a non-spot instance) a list of the time of last communication with each process, and specifying a timeout, but that would not guarantee that a slave process is dead. There is also the problem that the "barrier" and "finalize" functions will not see all the processes, and may hang.
My question is then: what kind of solution would you implement to detect whether processes have silently died? And how would you modify the remainder of the code to be compatible with a reduced number of processes?

Which version of Open MPI are you using?
I'm not sure exactly what Open MPI might be doing (or not doing) such that it fails to detect that a process is gone. The usual behavior of Open MPI after a failure is that the runtime aborts the entire job.
Unfortunately, there is no mechanism in Open MPI for discovering failed processes (especially in the case where it sounds like Open MPI doesn't even know they're failed). However, there is a lot of work ongoing to add this to future versions of all MPI libraries. One of the example implementations that supports this behavior is a branch of Open MPI called ULFM (www.fault-tolerance.org). There's lots of documentation there to see exactly what's going on, but essentially, it's a new chapter in the MPI standard to add fault tolerance.
There is an older effort that's available in MPICH 3.0.3 (unfortunately, it's broken in 3.0.4, but it should be back for 3.1) (www.mpich.org). The documentation for using that work is in the README.
The problem with both of these efforts is that they aren't compliant with the MPI Standard. Eventually, there will be a chapter describing fault tolerance in MPI and all of the MPI implementations will become compatible, but in the meantime, there is no good solution for everyone.
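In the meantime, the timeout bookkeeping proposed in the question can be kept by the master outside of MPI's own machinery. The sketch below is plain Python with no MPI calls; `WorkTracker`, its method names and the 30-second timeout are hypothetical. As the question points out, a timeout only means a worker is presumed dead, so the master should accept a result at most once per work item.

```python
import time

class WorkTracker:
    """Master-side record of which worker holds which work item and when it last spoke."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock
        self.last_seen = {}   # worker id -> time of last message
        self.assigned = {}    # worker id -> work item currently held
        self.pending = []     # work items waiting to be (re)assigned

    def heard_from(self, worker):
        self.last_seen[worker] = self.clock()

    def assign(self, worker, item):
        self.assigned[worker] = item
        self.heard_from(worker)

    def completed(self, worker):
        self.heard_from(worker)
        return self.assigned.pop(worker, None)

    def reap(self):
        """Move work held by silent workers back to `pending`; return those workers."""
        now = self.clock()
        silent = [w for w, t in self.last_seen.items()
                  if w in self.assigned and now - t > self.timeout_s]
        for w in silent:
            self.pending.append(self.assigned.pop(w))
        return silent

# Usage with a fake clock so the example does not have to wait in real time.
fake_now = [0.0]
tracker = WorkTracker(timeout_s=30, clock=lambda: fake_now[0])
tracker.assign("w1", "chunk-1")
tracker.assign("w2", "chunk-2")
fake_now[0] = 20.0
tracker.completed("w2")   # w2 reports back in time
fake_now[0] = 40.0
dead = tracker.reap()     # w1 has been silent for 40 s
```

This does not address the hanging "barrier" and "finalize" calls; it only covers detecting silence and handing the work to someone else.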

PVM might be a reasonable alternative to MPI in your case. While no longer developed after it lost to MPI years ago, PVM still comes pre-packaged with most Linux distributions and provides built-in fault tolerance. Its API is conceptually very similar to that of MPI, but its execution model differs a bit. One could say that it allows for one degree less coupling between the tasks in the parallel program than MPI does.
There is an example implementation of a fault-tolerant master-worker PVM application in Beowulf Cluster Computing with Linux. Read the relevant chapter from the book here.
As for fault tolerance in MPI, the proposed addition to the standard was rejected when the MPI Forum voted for inclusion of new features in MPI-3.0. It might take much longer than anticipated before FT becomes a standard feature of MPI.

Condor, Sun Grid Engine, or something else?

I'm trying to work out whether we should try out Condor or Sun Grid Engine at work (or possibly something else).
We often have lots of unused Windows XP workstations. The hope is that we could use wake-on-LAN, run all our jobs, and then shut down automatically. We'd mainly be running Matlab, Java or Python simulations for either Monte Carlo or parameter explorations.
With my limited knowledge of Condor, it sounds like using the VM universe might be a convenient way of taking care of snapshots without having to modify existing code.
Is SGE or something else better than Condor for this kind of work?

SGE doesn't really support Windows. It comes with all kinds of caveats and missing bits on Windows.
I've been running Condor pools for many years now and it is a superb HTPC setup for both cycle-stealing and dedicated, always-on hardware, on Linux and Windows machines. The recent addition of their Rooster daemon lets you put machines to sleep between job cycles and wake them up when new work appears in the pool. They also have an active and very helpful support community. Checkpointing is the only Condor feature not available on Windows. Everything else is there. With the addition of the VM Universe, checkpointing is getting less and less useful. Really: to use checkpointing successfully you need to be able to relink your entire code stack. So if you're running Matlab jobs, even on Linux, checkpointing isn't going to be possible.
If you have specific questions about getting Condor running on Windows, I'd be happy to answer them and share my experiences with it. I run Condor across 4 pools around the globe with a total of about 1500 dedicated machines in all the pools and some 1000 or so additional desktop machines that are available as users care to donate them.

I'd start with Condor. It has good support for Windows, and newer versions have built-in support for sending wake-on-lan in a very configurable way when jobs can run on certain machines. It can also shut the machines down based on user-defined policies.
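Condor takes care of this itself, but to check whether the workstations respond to wake-on-LAN at all, the magic packet is easy to build by hand: 6 bytes of 0xFF followed by the target MAC address repeated 16 times, usually sent as a UDP broadcast. The MAC address, broadcast address and port 9 below are placeholder assumptions.

```python
import socket

def magic_packet(mac):
    """Build a wake-on-LAN magic packet for a MAC like '00:11:22:33:44:55'."""
    mac_bytes = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(mac_bytes) != 6:
        raise ValueError("MAC address must be 6 bytes")
    return b"\xff" * 6 + mac_bytes * 16

def send_wol(mac, broadcast="255.255.255.255", port=9):
    """Send the packet as a UDP broadcast (not called here; needs a real network)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(magic_packet(mac), (broadcast, port))

packet = magic_packet("00:11:22:33:44:55")  # placeholder MAC address
```

The target machine's BIOS and network adapter must have wake-on-LAN enabled for the packet to have any effect.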

After Oracle's takeover of SGE (Sun Grid Engine), there is the Open Grid Scheduler project that still offers open-source Grid Engine.
http://gridscheduler.sourceforge.net/

For dedicated hardware I'd go with Grid Engine.
For scavenging clock cycles on machines which may be in use I'd go with Condor.
For hardware which you have dedicated access to for fixed periods, such as overnight and at weekends, I'd probably still go with Condor but might be able to persuade myself to use Grid Engine.

I've had to choose between Condor and SGE for a customer project recently. I was favoring SGE (because I was more familiar with that environment), but Condor won in the end because:
the customer infrastructure is Windows-oriented, and the SGE solution requires a Unix or Linux machine for the Central Manager, plus installing MS Services for Unix on the computation hosts;
the support and installation process for Condor on Windows was much simpler.
However, you cannot use the most interesting features of Condor on Windows: checkpointing is not available, nor is the Condor-specific I/O. I'm not using the VM universe, so I cannot comment on that aspect.

I've only tried Condor, and it was a pain to attempt to set up. If you need all the clock cycles you can fully utilize, go with Condor.
I'm about to try SGE, and I'll tell you how it goes. However at my company, people have had experience setting up SGE, so I'll probably say SGE is easier.

SGE doesn't exist... it's OGE, and it's very expensive. Go with Condor.
