How does the idle process and init process schedule? - linux-kernel

This code comes from the Linux kernel:
kernel/init/main.c
static noinline void __init_refok rest_init(void)
{
int pid;
rcu_scheduler_starting();
/*
* We need to spawn init first so that it obtains pid 1, however
* the init task will end up wanting to create kthreads, which, if
* we schedule it before we create kthreadd, will OOPS.
*/
kernel_thread(kernel_init, NULL, CLONE_FS | CLONE_SIGHAND);
numa_default_policy();
pid = kernel_thread(kthreadd, NULL, CLONE_FS | CLONE_FILES);
rcu_read_lock();
kthreadd_task = find_task_by_pid_ns(pid, &init_pid_ns);
rcu_read_unlock();
complete(&kthreadd_done);
/*
* The boot idle thread must execute schedule()
* at least once to get things moving:
*/
init_idle_bootup_task(current);
schedule_preempt_disabled();
/* Call into cpu_idle with preempt disabled */
cpu_startup_entry(CPUHP_ONLINE);
}
I know that from kernel start, process 0 initializes everything during boot until it reaches the function rest_init.
Here it creates the init process, which we call process 1:
kernel_thread(kernel_init, NULL, CLONE_FS | CLONE_SIGHAND);
After this call runs, there should be two processes: 0 and 1.
Questions:
Are processes 0 and 1 on the same thread list on the same CPU at this point (on a platform with, say, 4 or 8 CPUs)? How are the two processes dispatched?
If they are on a thread list on the same CPU, then when process 0 calls schedule_preempt_disabled(), I understand that to mean scheduling stops. Process 0 then enters cpu_startup_entry() and idles, so which process sets the need_resched flag to make the idle process (0) schedule? Does that mean process 1 won't run again?
Or could you explain in detail how processes 0 and 1 are scheduled at this point?

Process 0 calls schedule_preempt_disabled(), which does the following:
1, sched_preempt_enable_no_resched(); //enable preemption
2, schedule(); //switch to another process (1-init or 2-kthreadd_task)
3, preempt_disable(); //when all other processes give up the CPU,
//the scheduler picks 0-idle to run again;
//0-idle disables preemption and enters cpu_idle_loop;
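A toy model may help show why the idle task only runs again when nothing else is runnable. The sketch below is not kernel code: Task, pick_next, and tasks_after_rest_init are hypothetical names chosen for illustration, and the real scheduler uses per-CPU run queues and scheduling classes rather than a linear scan.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified task record (not the kernel's task_struct).
struct Task {
    int pid;
    std::string name;
    bool runnable;
};

// Pick the first runnable task other than idle (pid 0).
// Fall back to pid 0 only when nothing else wants the CPU.
int pick_next(const std::vector<Task>& tasks) {
    for (const Task& t : tasks) {
        if (t.pid != 0 && t.runnable) {
            return t.pid;
        }
    }
    return 0;  // nothing else runnable: the idle task runs
}

// Hypothetical helper: the task set right after rest_init() has
// created init (pid 1) and kthreadd (pid 2).
std::vector<Task> tasks_after_rest_init() {
    return {
        {0, "idle", true},
        {1, "init", true},
        {2, "kthreadd", true},
    };
}
```

In this model, pick_next returns a non-idle pid for as long as init or kthreadd is runnable, and returns 0 only once both have blocked, which corresponds to step 3 in the list above.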

Related

Command mqreply.c timeout

My colleague and I built mqreply.sh from https://github.com/ibm-messaging/mq-rfhutil/tree/master/mqperf
But we did not expect the mqreply command to have a timeout after which the process exits.
Here is our parameter file for running mqreply:
[header]
qname=DEV.QUEUE.1
qmgr=QM1
msgcount=10
msgtype=2
format="MQSTR"
codepage=1208
persist=0
replyq=DEV.QUEUE.2
sleeptime=1000
maxWaitTime=5
maxtime=60
waitTime=60
replyFilename=/tmp/msqtoload.dat
I tried setting maxWaitTime, maxtime, and waitTime, but none of them affects how long the process stays alive.
How can I keep mqreply from closing, or at least increase the timeout?
Thank you
The while loop around the MQGET in the mqreply sample you link to does this:-
while ((compcode == MQCC_OK) && (0 == terminate) && ((0 == parms.totcount) || (msgsRead < parms.totcount)))
{
Also, the MQGET will only wait for 1 second. There is a comment thus:-
/* since we have a signal handler installed, we do not want to be in an MQGET for a long time */
This suggests that if you want to keep mqreply open and running for longer, you need to specify msgcount as a number bigger than 10.
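The loop condition quoted above can be modeled in isolation to see how the message count ends the run. This is a sketch, not the sample's code: keep_reading and messages_processed are hypothetical helpers, kCompCodeOk is a local stand-in for MQCC_OK, and it assumes that msgcount in the parameter file is what populates parms.totcount.

```cpp
#include <cassert>

// Stand-in for MQCC_OK so the sketch compiles without the MQ headers.
const int kCompCodeOk = 0;

// Mirrors the quoted while-condition: keep going while the last call
// succeeded, no termination was requested, and either no limit is set
// (totcount == 0) or fewer than totcount messages have been read.
bool keep_reading(int compcode, int terminate, long totcount, long msgsRead) {
    return (compcode == kCompCodeOk) && (terminate == 0) &&
           ((totcount == 0) || (msgsRead < totcount));
}

// Hypothetical driver: counts loop iterations, assuming every
// iteration reads one message and at most `available` messages arrive.
long messages_processed(long totcount, long available) {
    long msgsRead = 0;
    while (keep_reading(kCompCodeOk, 0, totcount, msgsRead) &&
           msgsRead < available) {
        ++msgsRead;
    }
    return msgsRead;
}
```

Under that assumption, a count of 10 stops the loop after ten messages, while a count of 0 leaves only the termination flag and the completion code to end it.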

Can you compare values across probes in a multi-CPU safe way in DTrace?

I'm trying to write a DTrace script which does the following:
Whenever a new thread is started, increment a count.
Whenever one of these threads exits, decrement the count, and exit the script if the count is now zero.
I have something like this:
BEGIN {
threads_alive = 0;
}
proc:::lwp-start /execname == $$1/ {
self->started = timestamp;
threads_alive += 1;
}
proc:::lwp-exit /self->started/ {
threads_alive -= 1;
if (threads_alive == 0) {
exit(0);
}
}
However, this doesn't work, because threads_alive is a scalar variable and thus it is not multi-cpu safe. As a result, multiple threads will overwrite each other's changes to the variable.
I have also tried using an aggregate variable instead:
@thread_count = sum(1)
//or
@threads_entered = count();
@threads_exitted = count();
Unfortunately, I haven't found syntax that lets me do something like @thread_count == 0 or @threads_entered == @threads_exitted.
DTrace doesn't have facilities for doing the kind of thread-safe data sharing you're proposing, but you have a few options depending on precisely what you're trying to do.
If the executable name is unique, you can use the proc:::start and proc:::exit probes for the start of the first thread and the exit of the last thread respectively:
proc:::start
/execname == $$1/
{
my_pid = pid;
}
proc:::exit
/pid == my_pid/
{
exit(0);
}
If you're using the -c option to dtrace, the BEGIN probe fires very shortly after the corresponding proc:::start. Internally, dtrace -c forks the specified command and then starts tracing at one of four points: exec (before the first instruction of the new program), preinit (after ld has loaded all libraries), postinit (after each library's _init has run), or main (right before the first instruction of the program's main function, though this is not supported in macOS).
If you use dtrace -x evaltime=exec -c <program>, BEGIN will fire right before the first instruction of the program executes:
# dtrace -xevaltime=exec -c /usr/bin/true -n 'BEGIN{ts = timestamp}' -n 'pid$target:::entry{printf("%dus", (timestamp - ts)/1000); exit(0); }'
dtrace: description 'BEGIN' matched 1 probe
dtrace: description 'pid$target:::entry' matched 1767 probes
dtrace: pid 1848 has exited
CPU ID FUNCTION:NAME
10 16757 _dyld_start:entry 285us
The 285us is due to the time it takes dtrace to resume the process via /proc or ptrace(2) on macOS. Rather than proc:::start or proc:::lwp-start you may be able to use BEGIN, pid$target::_dyld_start:entry, or pid$target::main:entry.

why does the linux kernel thread hog up cpu

I have created a kernel thread using kthread_run in a kernel module.
The thread is very simple, as shown below.
static int my_thread_func(void *data)
{
    int a = 0;
    DBG_PRINT("policy:%u; prio:%d", current->policy, current->prio);
    /* Busy loop: never sleeps or yields until the thread is asked to stop. */
    while (!kthread_should_stop())
    {
        a++;
    }
    return 0;
}
However, after I loaded the module, the system stopped responding.
So I wondered what the scheduling policy and priority of this kernel thread were.
I printed the scheduling policy and priority of the kernel thread
and got the output below.
policy:0; prio:120
policy:0 means SCHED_NORMAL;
prio:120 this is also not high.
Since the thread does not have a SCHED_FIFO or SCHED_RR scheduling policy, why can it hog the CPU?
I also found that if I insert some sleep code in the loop body of the thread, the system remains responsive.
And when I run a userspace program implemented as below, the system remains responsive, too.
int main(int argc, char *argv[])
{
int a;
while (1) a++;
return 0;
}
So can anyone tell me why the kernel thread can hog the CPU?
When you say the priority is 120, are you observing the priority of kthreadd, or the actual kernel thread that was created for you?
Please see http://lxr.free-electrons.com/source/kernel/kthread.c#L310 for the function that is used to create new kernel threads.
Excerpt:
struct task_struct *kthread_create_on_node(int (*threadfn)(void *data),
void *data, int node,
const char namefmt[],
...)
{
...
if (!IS_ERR(task)) {
static const struct sched_param param = { .sched_priority = 0 };
...
/*
* root may have changed our (kthreadd's) priority or CPU mask.
* The kernel thread should not inherit these properties.
*/
sched_setscheduler_nocheck(task, SCHED_NORMAL, &param);
set_cpus_allowed_ptr(task, cpu_all_mask);
}
...
}
It appears that the sched_priority is set to 0.
Now, please look at http://lxr.free-electrons.com/source/include/linux/sched/prio.h#L9
Excerpt:
/*
* Priority of a process goes from 0..MAX_PRIO-1, valid RT
* priority is 0..MAX_RT_PRIO-1, and SCHED_NORMAL/SCHED_BATCH
* tasks are in the range MAX_RT_PRIO..MAX_PRIO-1. Priority
* values are inverted: lower p->prio value means higher priority.
*
* The MAX_USER_RT_PRIO value allows the actual maximum
* RT priority to be separate from the value exported to
* user-space. This allows kernel threads to set their
* priority to a value higher than any user task. Note:
* MAX_RT_PRIO must not be smaller than MAX_USER_RT_PRIO.
*/
Please note the second paragraph: This allows kernel threads to set their priority to a value higher than any user task.
Summary:
It appears that newly created kernel threads are set to sched_priority of 0 even with SCHED_NORMAL. Priority of 0 is not a normal priority for SCHED_NORMAL, and thus the kernel thread will take priority over any other thread (that isn't using an RT policy---and I don't think any processes in the kernel use RT by default).
Addendum:
Note: I am not 100% sure if this is the reason. But, if you look at the comments in the kernel for kernel threads, they all seem to imply that a kernel thread keeps running UNLESS:
The kernel thread itself yields or calls do_exit.
Someone else calls kthread_stop() (which makes kthread_should_stop() return true).
Which to me, sounds like the kernel thread runs as long as it wants, until it decides to stop, or someone else explicitly tells it to stop.

Condition Variable alternatives (c/c++ on windows xp)

I want to write a thread which runs tasks from an unlimited-size container of tasks.
While the task-list is empty the thread trying to get a task should be blocked.
Coming from Linux, I wanted to use a condition variable that is signaled when a task is added and waited on while the list is empty.
I found that CONDITION_VARIABLE is available only from Windows Vista onward, so it is out of the question.
Semaphores are problematic too, because the container has unlimited size.
Is there any appropriate substitute?
Thanks
Why do you say that semaphores are problematic? Linux and Windows both have semaphores with a maximum count that can realistically be described as 'unlimited'.
Use James' suggestion on Windows - it will work fine. Init. your semaphore with zero count. Add a task to your big (thread-safe), container, then signal the semaphore. In the thread, wait on the semaphore, then get a task from your container and process it. You can pass the semaphore instance to multiple threads if you wish - that will work OK as well.
Rgds,
Martin
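The pattern described above (a semaphore that counts queued tasks, plus a lock around the container) can be sketched portably. This is an illustration only: CountingSemaphore and TaskQueue are hypothetical names, and the semaphore here is built from standard C++ primitives purely as a stand-in for a Win32 semaphore handle, which on Windows XP you would create with CreateSemaphore and drive with ReleaseSemaphore and WaitForSingleObject.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>

// Portable stand-in for a Win32 semaphore (illustrative only).
class CountingSemaphore {
public:
    // Increment the count and wake one waiter (like ReleaseSemaphore).
    void release() {
        {
            std::lock_guard<std::mutex> lock(m_);
            ++count_;
        }
        cv_.notify_one();
    }
    // Block until the count is positive, then decrement it
    // (like WaitForSingleObject on a semaphore handle).
    void acquire() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return count_ > 0; });
        --count_;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    long count_ = 0;  // starts at zero: no tasks queued yet
};

// Hypothetical task queue: the semaphore count tracks how many queued
// tasks have not yet been claimed, so pop() blocks while none are left.
class TaskQueue {
public:
    void push(int task) {
        {
            std::lock_guard<std::mutex> lock(m_);
            tasks_.push(task);
        }
        available_.release();  // signal only after the task is visible
    }
    int pop() {
        available_.acquire();  // waits while there are no tasks
        std::lock_guard<std::mutex> lock(m_);
        int task = tasks_.front();
        tasks_.pop();
        return task;
    }
private:
    std::mutex m_;
    std::queue<int> tasks_;
    CountingSemaphore available_;
};
```

Releasing the semaphore only after the task is in the container means a worker that wakes up always finds something to pop, and several workers can share one TaskQueue.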
Sounds like you want a Win32 kernel event. See CreateEvent.
WaitForSingleObject and CreateSemaphore?
Thanks all,
that's my conclusion:
void ThreadPool::ThreadStartPoint(ThreadPool* tp)
{
while (1)
{
WaitForSingleObject(tp->m_taskCountSemaphore,INFINITE); // while (num of tasks==0) block; decrement num of tasks
BaseTask* current_task = 0;
// get top priority task
EnterCriticalSection (&tp->m_mutex);
{
current_task = tp->m_tasksQue.top();
tp->m_tasksQue.pop();
}
LeaveCriticalSection (&tp->m_mutex);
current_task->operator()(); // this is not critical section
current_task->PostExec();
}
}
void ThreadPool::AddTask(BaseTask& _task)
{
EnterCriticalSection (&m_mutex);
{
m_tasksQue.push(&_task);
_task.PrepareTask(m_mutex);
}
LeaveCriticalSection (&m_mutex);
if (!ReleaseSemaphore(m_taskCountSemaphore,
1, // increment num of tasks by 1
NULL // don't store previous num of tasks value
))
{//if failed
throw ("semaphore release failed");
}
}

How do I automatically destroy child processes in Windows?

In a C++ Windows app, I launch several long-running child processes (currently I use CreateProcess(...) to do this).
I want the child processes to be closed automatically if my main process crashes or is closed.
Because of the requirement that this needs to work for a crash of the "parent", I believe this would need to be done using some API/feature of the operating system. So that all the "child" processes are cleaned up.
How do I do this?
The Windows API supports objects called "Job Objects". The following code will create a "job" that is configured to shut down all processes when the main application ends (when its handles are cleaned up). This code should only be run once:
HANDLE ghJob = CreateJobObject( NULL, NULL); // GLOBAL
if( ghJob == NULL)
{
::MessageBox( 0, "Could not create job object", "TEST", MB_OK);
}
else
{
JOBOBJECT_EXTENDED_LIMIT_INFORMATION jeli = { 0 };
// Configure all child processes associated with the job to terminate when the last handle to the job is closed
jeli.BasicLimitInformation.LimitFlags = JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE;
if( 0 == SetInformationJobObject( ghJob, JobObjectExtendedLimitInformation, &jeli, sizeof(jeli)))
{
::MessageBox( 0, "Could not SetInformationJobObject", "TEST", MB_OK);
}
}
Then, when each child process is created, execute the following code to launch the child process and add it to the job object:
STARTUPINFO info={sizeof(info)};
PROCESS_INFORMATION processInfo;
// Launch child process - example is notepad.exe
if (::CreateProcess( NULL, "notepad.exe", NULL, NULL, TRUE, 0, NULL, NULL, &info, &processInfo))
{
::MessageBox( 0, "CreateProcess succeeded.", "TEST", MB_OK);
if(ghJob)
{
if(0 == AssignProcessToJobObject( ghJob, processInfo.hProcess))
{
::MessageBox( 0, "Could not AssignProcessToObject", "TEST", MB_OK);
}
}
// Can we free handles now? Not sure about this.
//CloseHandle(processInfo.hProcess);
CloseHandle(processInfo.hThread);
}
VISTA NOTE: See the question "AssignProcessToJobObject always return 'access denied' on Vista" if you encounter access-denied issues with AssignProcessToJobObject() on Vista.
One somewhat hackish solution would be for the parent process to attach to each child as a debugger (use DebugActiveProcess). When a debugger terminates all its debuggee processes are terminated as well.
A better solution (assuming you wrote the child processes as well) would be to have the child processes monitor the parent and exit if it goes away.
Windows Job Objects sound like a good place to start. The name of the Job Object would have to be well-known, or passed to the children (or they could inherit the handle). The children would need to notice when the parent dies, either through a failed IPC "heartbeat" or just WFMO/WFSO on the parent's process handle. At that point any child process could call TerminateJobObject to bring down the whole group.
You can keep a separate watchdog process running. Its only task is watching the current process space to spot situations like you describe. It could even re-launch the original application after a crash or provide different options to the user, collect debug information, etc. Just try to keep it simple enough so that you don't need a second watchdog to watch the first one.
You can assign the parent process to a job before creating the child processes:
static HANDLE hjob_kill_on_job_close = NULL; // CreateJobObject returns NULL on failure, so NULL means "no job"
void init(){
hjob_kill_on_job_close = CreateJobObject(NULL, NULL);
if (hjob_kill_on_job_close){
JOBOBJECT_EXTENDED_LIMIT_INFORMATION jobli = { 0 };
jobli.BasicLimitInformation.LimitFlags = JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE;
SetInformationJobObject(hjob_kill_on_job_close,
JobObjectExtendedLimitInformation,
&jobli, sizeof(jobli));
AssignProcessToJobObject(hjob_kill_on_job_close, GetCurrentProcess());
}
}
void deinit(){
if (hjob_kill_on_job_close) {
CloseHandle(hjob_kill_on_job_close);
}
}
JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE causes all processes associated with the job to terminate when the last handle to the job is closed. By default, all child processes will be assigned to the job automatically, unless you passed CREATE_BREAKAWAY_FROM_JOB when calling CreateProcess. See https://learn.microsoft.com/en-us/windows/win32/procthread/process-creation-flags for more information about CREATE_BREAKAWAY_FROM_JOB.
You can use Process Explorer from Sysinternals to make sure all processes are assigned to the job.
You'd probably have to keep a list of the processes you start, and kill them off one by one when you exit your program. I'm not sure of the specifics of doing this in C++ but it shouldn't be hard. The difficult part would probably be ensuring that child processes are shut down in the case of an application crash. .NET has the ability to add a function that gets called when an unhandled exception occurs. I'm not sure if C++ offers the same capabilities.
You could encapsulate each process in a C++ object and keep a list of them in global scope. The destructors can shut down each process. That will work fine if the program exits normally, but if it crashes, all bets are off.
Here is a rough example:
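class myprocess
{
public:
    explicit myprocess(HANDLE hProcess)
        : _hProcess(hProcess)
    { }
    // Non-copyable: a copy's destructor would terminate the process early.
    myprocess(const myprocess&) = delete;
    myprocess& operator=(const myprocess&) = delete;
    ~myprocess()
    {
        TerminateProcess(_hProcess, 0);
    }
private:
    HANDLE _hProcess;
};
std::list<myprocess> allprocesses;
Then whenever you launch one, call allprocesses.emplace_back(hProcess); so the object is constructed in place (push_back would create a temporary whose destructor terminates the process immediately).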
