How does the Linux kernel handle threads calling sleep or pause?

According to the Linux man pages, a thread calling sleep or pause can be woken by a signal that is not ignored. I have also learned that Linux has a so-called "sleep queue".
Does that mean the kernel has to check each thread in the "sleep queue" to see whether there is a pending signal that can be handled?
I know a process can be stopped by SIGSTOP. Is a stopped thread also in the "sleep queue"?
When a process has multiple threads, can mixing signals and sleep cause serious trouble?
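The man-page behaviour mentioned in the question can be observed from user space. The following is a minimal sketch, assuming a POSIX system and a main-thread caller; the names on_alarm and pause_until_signal are made up for illustration. It arms a one-shot timer and then blocks in pause(), which returns only once a signal with an installed handler has been delivered.

```python
import signal
import time

woken_by = []

def on_alarm(signum, frame):
    # A handler is installed, so SIGALRM is "not ignored" and ends pause().
    woken_by.append(signum)

def pause_until_signal(delay=0.2):
    """Arm a one-shot timer, then block in pause() until a handled signal arrives."""
    old_handler = signal.signal(signal.SIGALRM, on_alarm)
    try:
        start = time.monotonic()
        signal.setitimer(signal.ITIMER_REAL, delay)  # deliver SIGALRM after `delay` seconds
        signal.pause()                               # sleep until a handler has run
        return time.monotonic() - start
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)      # disarm the timer
        signal.signal(signal.SIGALRM, old_handler)   # restore the previous disposition

elapsed = pause_until_signal()
print(f"pause() returned after {elapsed:.2f}s, woken by signal {woken_by[0]}")
```

Nothing in this program polls for the signal: the thread simply sleeps, and delivery of the signal is what makes it runnable again.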

Related

Killing a process that is hanging on disk IO

I've got an SSD that is failing. Some of its data can't be read anymore.
I would like to know which files are affected.
I've written a small program that uses the regular functions (CreateFile, ReadFile) to read files.
The program has a watchdog thread that monitors the thread issuing the IO calls. If a call takes too long, the watchdog records that the file is damaged and tries to kill the IO thread and the process.
My issue is that TerminateThread and TerminateProcess do not kill the thread or the process. It hangs there forever, until I log out.
Trying to kill using TaskManager also fails, of course (it used to use NtTerminateProcess, I don't know what it does nowadays).
Does anyone know a way that would kill my process?
According to the documentation for the TerminateProcess function:
This function stops execution of all threads within the process and
requests cancellation of all pending I/O. The terminated process
cannot exit until all pending I/O has been completed or canceled. When
a process terminates, its kernel object is not destroyed until all
processes that have open handles to the process have released those
handles.
When a process terminates itself, TerminateProcess stops execution of
the calling thread and does not return. Otherwise, TerminateProcess is
asynchronous; it initiates termination and returns immediately. If you
need to be sure the process has terminated, call the
WaitForSingleObject function with a handle to the process.
I suggest you try using Job Objects.

If I use kill -9 for the SpringBoot project, what will happen to its async threads

If I use kill -9 on the SpringBoot project, what will happen to its async (@Async) threads?
And what about kill -15?
Will it wait for the async tasks to finish?
If the threads come from my own custom thread pool, what will happen to them?
Where is this implemented in the source code?
First of all, all threads live in the address space of the process that spawned them. So your async threads will be killed too (they share the same PID). More info here.
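The difference between the two signals can be seen with any process, not only a JVM. Below is a minimal sketch, assuming a POSIX system; the child program and the helper run_and_signal are hypothetical and only stand in for an application that has a graceful shutdown path. The child installs a SIGTERM handler; the parent sends it SIGTERM (kill -15) once and SIGKILL (kill -9) once, then compares how it ended.

```python
import signal
import subprocess
import sys

# Hypothetical child program: it handles SIGTERM by exiting cleanly.
CHILD = r"""
import signal, sys, time

def on_term(signum, frame):
    sys.exit(0)                      # graceful shutdown path

signal.signal(signal.SIGTERM, on_term)
print("ready", flush=True)           # tell the parent the handler is installed
while True:
    time.sleep(1)
"""

def run_and_signal(sig):
    """Start the child, send it `sig`, and return its exit code."""
    proc = subprocess.Popen([sys.executable, "-c", CHILD],
                            stdout=subprocess.PIPE, text=True)
    proc.stdout.readline()           # block until the child has printed "ready"
    proc.send_signal(sig)
    code = proc.wait(timeout=10)
    proc.stdout.close()
    return code

term_code = run_and_signal(signal.SIGTERM)   # handler runs, child exits by itself
kill_code = run_and_signal(signal.SIGKILL)   # cannot be caught, no handler runs
print(term_code, kill_code)
```

With SIGTERM the child gets to run its own shutdown code and reports its own exit status. With SIGKILL the exit code is negative, which is how subprocess reports death by signal: the process, and every thread in it, is removed without any user-space code running.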
The difference between kill -9 and -15 is given in good detail here.

How does os kernel keep track of the locks that user threads are waiting on?

I know that when a user thread tries to acquire a lock (an event, a semaphore, and so on), the kernel changes the thread's state to waiting, so the thread is not scheduled to run until the kernel finds that the lock is available.
My question is: how does the kernel track the state of these locks? By polling or by notification?
By notifying. Before the thread goes to sleep, it adds itself to the wakeup list for whatever kernel object corresponds to the thing it's waiting for.
This works precisely the same way all other waits work. Say, for example, the process does a blocking read on a file and the process has to sleep until the read completes. Or say the process accesses some code that hasn't been read in from disk yet. In all of these cases, the process is added to the appropriate wakeup notification scheme when it puts itself to sleep.
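The wakeup-list idea can be modelled in user space. This is only a toy sketch, not kernel code: WaitableObject and its methods are hypothetical names, and Python's threading.Event stands in for putting a thread to sleep. Each waiter registers itself on the object's list before sleeping, and whoever signals the object walks that list and wakes the sleepers.

```python
import threading

class WaitableObject:
    """Toy stand-in for a kernel object that owns a wakeup list."""

    def __init__(self):
        self._lock = threading.Lock()
        self._waiters = []       # wakeup list: one Event per sleeping thread
        self._signaled = False

    def wait(self):
        with self._lock:
            if self._signaled:   # already available, no need to sleep
                return
            me = threading.Event()
            self._waiters.append(me)   # register before going to sleep
        me.wait()                      # sleep until signal() wakes us

    def signal(self):
        with self._lock:
            self._signaled = True
            waiters, self._waiters = self._waiters, []
        for w in waiters:              # notify exactly the registered sleepers
            w.set()
        return len(waiters)

obj = WaitableObject()
finished = []

def worker(name):
    obj.wait()
    finished.append(name)

threads = [threading.Thread(target=worker, args=(i,)) for i in range(3)]
for t in threads:
    t.start()
obj.signal()
for t in threads:
    t.join(timeout=5)
print(sorted(finished))
```

The check of the signaled flag and the registration happen under the same lock, so a wakeup cannot slip in between them and be lost. No thread ever loops to check whether the object has become available.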
What you are asking is highly system specific and lock specific. For example, quality operating systems have lock management facilities that will detect deadlocks.
Some locks might be implemented as spin locks where there is no process hibernation and no operating system notification at all.
In the case where waiting suspends a process, all the operating system needs to keep track of is the lock itself. When a process releases the lock, the operating system can send a notification to all the waiting processes; no polling is necessary.

Restriction on interrupt routines in linux kernel drivers

Every device driver book talks about not using functions that sleep in interrupt routines.
What issues arise from calling these functions in ISRs?
A total lockdown of the kernel is the issue here. The kernel is in interrupt context when executing interrupt handlers, that is, the interrupt handler is not associated with any process (the current macro cannot be used).
If the handler were able to sleep, you would never get back to the interrupted code, since the scheduler would not know how to return to it.
Holding a lock in the interrupt handler and then sleeping would let another process run; if the interrupt handler were then entered again and tried to re-acquire the lock, the kernel would deadlock.
If you read more about how scheduling works in the kernel, you will soon see why sleeping is a no-go in certain contexts.

Process under FreeBSD 9.0 hangs in uninterruptible sleep with apparently no syscall (empty wchan)

I have a custom logging process that is reading from STDIN and sending the data out via TCP to a scribed logging server.
STDIN is in my case an access log that is attached to Apache httpd 2.2 like this in httpd.conf:
CustomLog "|/usr/local/bin/serelog" default
My serelog process sometimes goes into uninterruptible sleep under FreeBSD 9.0 and does not return from it. It works reliably under other operating systems, though, including FreeBSD 8, Linux 2.6 and Linux 3.1.
How can I find out the reason for the uninterruptible sleep?
The overall structure is like this:
httpd --[PIPE]--> serelog --[TCP-CONNECTION]--> scribed
So far I have done the following analysis:
Using ps: stat is "D" and wchan is "-". So there is apparently no syscall, which doesn't make much sense to me, as the process is in uninterruptible sleep and should be in kernel land.
As the process is in state "D", it does not react to kill -9, as expected.
Attaching truss to serelog externally from a shell: as long as truss is attached, serelog runs smoothly.
Shortly (seconds) after detaching truss from serelog, serelog goes into "D" state.
When attaching truss to serelog AFTER it has entered "D" state, truss prints nothing.
In "D" state, lsof shows that the incoming PIPE is full. This is expected, as in "D" state the process "sleeps" and cannot read any longer. The outgoing TCP-CONNECTION is empty.
If I kill the "surrounding" Apache httpd server, the serelog process eventually terminates after (e.g.) 40 minutes.
Checking what others report in forums about uninterruptible sleep did not help: my setup has no NFS, and as it is a server there is also no user interaction with CD drives or pluggable hardware.
So I am now stuck with a process that is uninterruptible, is apparently not in a syscall, and works reliably when traced. The only good thing is that I can reproduce the behavior within a few seconds or minutes by sending a lot of HTTP requests with a JMeter load test (5 threads in JMeter).
Any tips on debugging or kernel parameter tuning are appreciated.
Greetings
The issue turned out to be an actual FreeBSD kernel bug, and it is now fixed in the kernel.
Link to the PR: http://www.freebsd.org/cgi/query-pr.cgi?pr=166340
Proposed Patch: http://lists.freebsd.org/pipermail/freebsd-bugs/2012-May/048610.html
