ksoftirqd's bottom-halves in interrupt or process context?

ksoftirqd's bottom-halves in interrupt or process context? - linux-kernel

I have been meticulously reading the book named Linux Kernel Development written by Robert Love.
In my understanding, softirqs and tasklets are run in the interrupt context. Also, ksoftirqd is a kernel thread which runs in the process context. So, I find it quite puzzling and difficult to think how ksoftirqd (process context) is employed in order to run softirqs (interrupt context).

I had similar question on my mind while reading the book, here is a link which should clarify some things: refer to this papar
"ksoftirqd is implemented as a set of threads, each of which is
constrained to only run on a specific CPU. They are scheduled (at a
very high priority) by the normal task scheduler. This implementation
has the advantage that the time spent executing the bottom halves is
accounted to a system task. It is thus possible for the user to see
that the machine is overloaded with interrupt processing, and maybe
take remedial action.
Although the work is now being done in process context rather than
bottom half context, ksoftirqd sets up an environment identical to
that found in bottom half context. Specifically, it executes the
softirq handlers with local interrupts enabled and bottom halves
disabled locally. Code which runs as a bottom half does not need to
change for ksoftirqd to run it."

Related

Go goroutine under the hood

I'm trying to understand golang architecture and what "lightweight thread" means. I've already read something, but want to ask question to clarify it.
Am I right if I'll say what "go" keyword under the hood just puts following function in queue of inner thread pool, but for user it looks like creation of thread?

This is copied from the Go FAQ:
Why goroutines instead of threads?
Goroutines are part of making concurrency easy to use. The idea, which has been around for a while, is to multiplex independently executing functions—coroutines—onto a set of threads. When a coroutine blocks, such as by calling a blocking system call, the run-time automatically moves other coroutines on the same operating system thread to a different, runnable thread so they won't be blocked. The programmer sees none of this, which is the point. The result, which we call goroutines, can be very cheap: they have little overhead beyond the memory for the stack, which is just a few kilobytes.
What's lacking here is the definition of thread. If we resort to Wikipedia, we find:
In computer science, a thread of execution is the smallest sequence of programmed instructions that can be managed independently by a scheduler, ...
but that's just a description of, well, the same thing that a goroutine is. The problem here is that the word thread tends to refer to kernel thread and/or user thread (both defined on that same Wikipedia page) and these threads are heavier-weight than the goroutine threads. Which brings us right back to this:
I'm trying to understand golang architecture and what "lightweight thread" means ...
To cut to the chase, this means "lighter than the OS-provided ones". That's really all it means. There are OS-provided threads (on multiple OSes on which Go runs), but they generally do too much and cost too much to switch between so Go provides its own language-level ones that it calls "goroutines" that are much lighter.
From comments:
Why need to move tasks from one thread to another by some planner ...
This is an implementation detail, which involves another aspect of the OS-provided kernel threads:
I can't understand how [a goroutine] can be preempted if single thread process [is] blocked by [a] system call to read [a] long file
The current Go runtime goroutine / thread / processor scheduler (see What is relationship between goroutine and thread in kernel and user state and note that there have been more than just the current implementation) predicts that some system call will block, and makes sure to assign that system call its own OS-level kernel thread (see also JimB's comment). These threads do not count against the GOMAXPROCS setting. This is in fact sometimes a problem, as it's possible for the Go runtime to try to spin off more threads than the OS allows: it might be nice if there were a system-call-thread-pool here (though there are also obvious problems with this).
So, the current runtime creates up to GOMAXPROCS kernel-style OS-level threads and uses those to multiplex up to that many goroutines onto the CPUs, but creates extra kernel-style OS-level threads whenever it wants to. As the blog post linked in the question above notes, the P entities act as queues to hold goroutines (Gs) on a per-processor basis for localized cache lookup (remember that on some systems, especially NUMA ones, it's expensive to reach out "across" CPUs: the scheduler is still willing to do this, but won't do it too often, for some definition of "too often").
Earlier versions of the current scheduler required explicit yields (runtime.Gosched()) calls or various other runtime operations to cause a switch from the current goroutine to some other goroutine. See What exactly does runtime.Gosched do? for example. In Go 1.14, some OSes provide automatic goroutine preemption; see Will Go's scheduler yield control from one goroutine to another for CPU-intensive work?

The Goroutines and the scheduled

i didn't understand this sentence plz explain me in detail and use an easy english to do that
Go routines are cooperatively scheduled, rather than relying on the kernel to manage their time sharing.

Disclaimer: this is a rough and inaccurate description of the scheduling in the kernel and in the go runtime aimed at explaining the concepts, not at being an exact or detailed explanation of the real system.
As you may (or not know), a CPU can't actually run two programs at the same time: a CPU only have one execution thread, which can execute one instruction at a time. The direct consequence on early systems was that you couldn't run two programs at the same time, each program needing (system-wise) a dedicated thread.
The solution currently adopted is called pseudo-parallelism: given a number of logical threads (e.g multiple programs), the system will execute one of the logical threads during a certain amount of time then switch to the next one. Using really small amounts of time (in the order of milliseconds), you give the human user the illusion of parallelism. This operation is called scheduling.
The Go language doesn't use this system directly: it itself implement a scheduler that run on top of the system scheduler, and schedule the execution of the goroutines itself, bypassing the performance cost of using a real thread for each routine. This type of system is called light/green thread.

Could process running only on one processor have threads running on other processors?

Is it possible, in multiprocessor environment (PC) that one windows process is configured to run only on one processor (affinity mask = 1 or SetProcessAffinityMask(GetCurrentProcess(),1)), but its thread are spawned on other processors?
(Question came from discussion started in one company, regarding using synchronization objects (Events, Mutexes, Semaphores) and WinAPIs, like WaitForSignleObject, etc, especially SignalObjectAndWait for which MSDN states
"Note that the "signal" and "wait" are not guaranteed to be performed
as an atomic operation. Threads executing on other processors can
observe the signaled state of the first object before the thread
calling SignalObjectAndWait begins its wait on the second object"
Does it mean that for single processor it's guaranteed to be atomic?
P.S. Is there any differences for Windows Context Switching that there are multiple processors or single processor with more real cores?
P.P.S. Please be patient with this question if I didn't use exact and concrete terms - this are is still not very good known for me.

No.
The set of processor cores a thread can run on is the intersection of the process affinity mask and the thread affinity mask.
To get the behavior you describe, one would set the thread affinity mask for the main thread, and not mess with the process mask.
For your followup questions: If it isn't atomic, it isn't atomic. There are additional guarantees for threads sharing a core, because preemption follows certain rules, but they are very complex, since relative priority and dynamic priority are important factors in thread scheduling. Because of the complexity, it is best to use proper synchronization.
Notably, race conditions between threads of equal priority certainly still exist on a single core (or single core restricted) system, but they are far less frequent and therefore far more difficult to find and debug.

Is it possible, in multiprocessor environment (PC) that one windows process is configured to run only on one processor (affinity mask = 1 or SetProcessAffinityMask(GetCurrentProcess(),1)), but its thread are spawned on other processors?
If not set cpu affinity to only one core, one process could run on multiple cores?
What's the difference between processes and threads?
Thread could have processes or process could have threads?
Could process seen from a thread point of view or vice verse?
What is atomic notion?
when number 1 could seen as multidimensional unit?
Could we divide 1/0 (to zero)? When could we or couldn't?
Does it mean that for single processor it's guaranteed to be atomic?
One cpu: do you remember: run and stay resident? Good old time!
Then Unix: multiprocessing, multithreading, etc. :)
Note:
You couldn't ask a question without knowing answer to that question.
Try to ask something you don't know, that's impossible! You're asking because you have an answer. Look inside your question. Answer is evident. :)

WIN32: Yielding execution to another (given) thread

I am looking for a way to yield the remainder of the thread execution's scheduled time slice to a different thread. There is a SwitchToThread function in WINAPI, but it doesn't let the caller specify the thread it wants to switch to. I browsed MSDN for quite some time and haven't found anything that would offer just that.
For an operating-system-internals layman like me, it seems that yielding thread should be able to specify which thread does it want to pass the execution to. Is it possible or is it just my imagination?

The reason you can't yield processor time-slices to a designated thread is that Windows features a preemptive scheduling kernel which pretty much places the responsibility and authority of scheduling the processor time in the hands of the kernel and only the kernel.
As such threads don't have any control over when they run, if they run, and even less over which thread is switched to after their time slice is up.
However, there are a few way you may influence context switches:
by increasing the priority of a certain thread you may force the scheduler to schedule it more often in the detriment of other threads (obviously the reverse applies as well - you can lower the priority of other threads)
you can code your process to place threads in kernel wait mode when they don't have work to do in order to help the scheduler do it's job. When using proper kernel wait constructs such as Critical Sections, Mutexes, Semaphores, and Timers you effectively tell the kernel a certain thread doesn't need to be scheduled until a certain codition is met.
Note: There is rarely a reason you should tamper with task priorities so USE WITH CAUTION

You might use 'fibers' instead of 'threads': for example there's a Win32 API named SwitchToFiber which lets you specify the fiber to be scheduled.

Take a look at UMS (User-mode scheduling) threads in Windows 7
http://msdn.microsoft.com/en-us/library/dd627187(VS.85).aspx

The second thread can simply wait for the yielding thread either by calling WaitForSingleObject() on its handle or periodically polling GetExitCodeThread(). The other answers are correct about altering the operating system's scheduling mechanisms - it is better to design the threads properly in the first place.

This is not possible. Only the kernel can decide what code runs next though you can influence it by reducing the non-waiting threads it has to choose from to run next, and by setting thread priorities with SetThreadPriority.

You can use regular synchronization primitives like events, semaphores, etc. to serialize your two threads. This does not in any form prevent the kernel from scheduling other threads in between, or in parallel on another CPU core, or virtually simultaneously on the same core. This is due to preemtive multitasking nature of modern general purpose operating systems.

If you want to do your own scheduling under Windows, you can use fibers, which essentially are threads that you have to schedule yourself. However, given that you describe yourself as a layman to the OS internals world, that would probably be a bad idea, as fibers are something of an advanced feature.

Can I ask why you want to use SwitchToThread?
If for example it's some form of because thread x is computing some value that you want to wait for on thread Y, then I'd really suggest looking at the Parallel Pattern Library or the Asynchronous Agents Library in Visual Studio 2010 which allows you to do this either with message blocks (receive on an asynchronous value) or simply via tasks : wait for a set of tasks to complete and inline their execution while waiting...
//i.e. on an arbitrary thread
task_group* tasks;
tasks->run(... / some functor/)
a call to tasks->wait() will wait and inline any tasks running.

tasklet advantage in userspace application

Got some doubts with bottom half.Here, I consider tasklets only.
Also , I consider non-preemptible kernel only.
Suppose consider a ethernet driver in which rx interrupt processing is doing some 10 functions calls.(bad programming :) )
Now, looking at performance perspective if 9 function calls can be moved to a tasklet and only 1 needs to be called in interrupt handling , Can I really get some good performance in a tcp read application.
Or in other words, when there is switch to user space application all the 9 function calls for the tasklets scheduled will be called, in effective the user space application will be able to get the packet cum data only after "all the taskets scheduled" are completed ? correct?
I understand that by having bottom half , we are enabling all interrupts .. but I have a doubt whether the application that relies on the interrupt actually gain anything by having the entire 10 functions in interrupt handler itself or in the bottom half.
In Short, by having tasklet do I gain performance improvement in user space application ,here ?

Since tasklets are not queued but scheduled, i.e. several hardware interrupts posting the same tasklet might result in a single tasklet function invocation, you would be able to save up to 90% of the processing in extreme cases.
On the other hand there's already a high-priority soft IRQ for net-rx.

In my experience on fast machines, moving work from the handler to the tasklet does not make the machine run faster. I've added macros in the handler that can turn my schedule_tasklet() call into a call to the tasklet function itself, and it's easy to benchmark both ways and see the difference.
But it's important that interrupt handlers finish quickly. As Nikolai mentioned, you might benefit if your device likes to interrupt a lot, but most high-bandwidth devices have interrupt mitigation hardware that makes this a less serious problem.
Using tasklets is the way that core kernel people are going to do things, so all else being equal, it's probably best to follow their lead, especially if you ever want to see your driver accepted into the mainline kernel.
I would also note that calling lots of functions isn't necessarily bad practice; modern branch predictors can make branch-heavy code run just as fast as non-branch-heavy code. Far more important in my opinion are the potential cache effects of having to do half the job now, and then half the job later.

A tasklet does not run in context of the user process. If your ISR schedules a tasklet, it will run immediately after your isr is done, but with interrupts enabled. The benefit of this is that your packet processing is not preventing additional interrupts.
In your TCP example, the hardware hands off the packet to the network stack and your driver is done -- the net stack handles waking up the process etc. so there really no way for the hw's driver to execute in the process context of the data's recipient, because the hw doesn't even know who that is.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio