I have to validate the state of some data in an MPI program. The program will run on a supercomputer with a distributed-memory system. Some quick research on the C standard's assert macro revealed that assert internally uses the abort() function to terminate the program. I haven't found much information on how abort() works in a multi-process program, especially in an MPI context, which is very different from a POSIX environment. Does abort() only terminate the process in which it is called, or can it terminate all the processes?
And finally, how do I actually terminate all processes of an MPI program when a condition fails? Is there a built-in assert in the MPI library?
abort() only terminates the MPI task that invokes it.
It is very likely that this will be detected by mpirun and/or the resource manager, which will then kill the whole MPI job (i.e. all the MPI tasks on all nodes).
That being said, this is library/system dependent, and you should double check that first.
The right way to terminate an MPI job is to call
MPI_Abort(MPI_COMM_WORLD, errorcode)
errorcode is an int and is generally assigned a strictly positive value.
This is a purely academic question; I don't really need this information for anything, but I would like to understand the kernel a bit better :-)
According to the kernel documentation at http://www.tldp.org/LDP/tlk/kernel/processes.html, processes in the Linux kernel have the following states:
Running
The process is either running (it is the current process in the
system) or it is ready to run (it is waiting to be assigned to one of
the system's CPUs).
Waiting
The process is waiting for an event or for a resource. Linux
differentiates between two types of waiting process; interruptible and
uninterruptible. Interruptible waiting processes can be interrupted by
signals whereas uninterruptible waiting processes are waiting directly
on hardware conditions and cannot be interrupted under any
circumstances.
Stopped
The process has been stopped, usually by receiving a signal. A process
that is being debugged can be in a stopped state.
Zombie
This is a halted process which, for some reason, still has a
task_struct data structure in the task vector. It is what it sounds
like, a dead process.
As you can see, when I take a snapshot of process states with a command like ps, a process shown in the Running state was either literally running or just waiting to be assigned to a CPU by the kernel.
In my opinion, these two states (which are actually both represented by one state in task_struct) are quite different.
Why is there no state like "Ready" that would mean the process is ready to run but has not yet been assigned to a CPU, so that task_struct would be clearer about the real state? Is it even possible to retrieve this information, or is it hidden for some reason which process is literally running on the CPU?
The struct task_struct contains a long to represent current state:
volatile long state; /* -1 unrunnable, 0 runnable, >0 stopped */
This simply indicates if a process is 'runnable'.
To see the currently executing process you should look at the runqueue. Specifically a struct rq (as defined in kernel/sched/sched.h) contains:
struct task_struct *curr, *idle, *stop;
The pointer *curr is the currently running process on this runqueue (there exists a runqueue per CPU).
You should consult files under kernel/sched/ to see how the Kernel determines which processes should be scheduled according to the different scheduling algorithms if you are interested in exactly how it arrives at the running state.
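From user space you can see the same single "runnable" state that ps reports by reading /proc/<pid>/stat, where the state letter R covers both "on a CPU right now" and "waiting on a runqueue". Here is a rough sketch; the function name proc_state is my own, and it assumes a Linux system with /proc mounted.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: return the state letter of a process as reported
 * by /proc/<pid>/stat ('R', 'S', 'D', 'T', 'Z', ...), or '\0' on error.
 * pid is given as text, e.g. "1234" or "self". */
char proc_state(const char *pid)
{
    char path[64];
    char line[1024];

    snprintf(path, sizeof path, "/proc/%s/stat", pid);
    FILE *f = fopen(path, "r");
    if (!f)
        return '\0';
    char *ok = fgets(line, sizeof line, f);
    fclose(f);
    if (!ok)
        return '\0';

    /* The format is "pid (comm) state ...". comm may itself contain
     * spaces or parentheses, so search for the LAST ')' in the line. */
    char *p = strrchr(line, ')');
    if (!p || p[1] != ' ' || p[2] == '\0')
        return '\0';
    return p[2];
}
```

A process that asks for its own state, proc_state("self"), gets R, since it is necessarily executing at that moment; for any other process R only tells you it is runnable, which is exactly the ambiguity the question is about.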
This is not a Linux-kernel answer but a more general one about scheduling ^^
A core part of any OS is the Scheduler: http://en.wikipedia.org/wiki/Process_scheduler
Many of them work by giving every process a time slice of execution and letting each one do a little bit of work before switching (referred to as a context switch) to another process.
Since the length of a time slice is on the order of milliseconds, by the time the information you requested is shown the state has almost certainly changed, so differentiating between "really running" and "ready but not really running" would (most of the time) give inaccurate information.
Suppose that in a two-process environment one process is scheduled for execution by the kernel and requests some data that is not available in RAM. The CPU signals to the kernel that something is not available, and the process is suspended. The kernel then loads the second process for execution on the CPU, starts looking for the data in secondary storage (say, virtual memory), fetches it, brings it into main memory by swapping out memory that is currently inactive, and puts the first process back in the ready queue for execution.
We know that everything in a computer system is carried out by the CPU. If the CPU is busy continuously executing process code, then who executes the kernel code that performs the kernel's tasks?
Please let me know whether I have explained the scenario clearly.
At any point in time, the CPU(s) will be doing one of the following:
Running a process in user mode.
Running on behalf of a process in kernel mode to execute a privileged instruction or access hardware (for example, when a read / write system call is issued).
Running in response to a hardware interrupt, i.e. running in interrupt context (not associated with any particular process), which is also in kernel mode.
Running some kernel threads to serve deferred work like soft IRQs (tasklet / softirq).
Running the CPU idle thread if there is nothing else to execute.
If you are asking about scheduling in particular, then:
Suppose a process is running and issues a read call to retrieve data from the hard disk. The read system call switches the CPU from user mode to kernel mode. The kernel, running on behalf of the process, prepares the hard disk read operation and then calls the schedule() function, which removes the process from the CPU.
Suppose a hardware interrupt arrives. The currently running process is suspended, and the interrupt service handler for that interrupt begins to execute, in kernel mode of course.
Basically, the kernel runs in between user processes!
Clear now ?
The kernel runs either as a result of a hardware interrupt, or as a result of being invoked by a process to do something. In both cases the code which was executing at that moment stops running until the kernel finishes its job.
It is similar to a function call: when function A calls function B, function A has to wait until function B is done doing what it does, and returns control to function A. You do not need multiple CPUs, or any kind of magic to accomplish this.
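One way to see that kernel code runs on the same CPU on behalf of your process is to look at how the process's CPU time is split between user mode and kernel mode. This is only a small sketch using getrusage; the struct and function names (cpu_split, cpu_time_split) are invented for the example.

```c
#include <sys/resource.h>
#include <sys/time.h>

/* Hypothetical result type: CPU seconds spent in each mode. */
typedef struct {
    double user_s;    /* time executing the process's own code */
    double kernel_s;  /* time the kernel spent working for the process */
} cpu_split;

/* Fill *out with the calling process's CPU time so far.
 * Returns 0 on success, -1 if getrusage fails. */
int cpu_time_split(cpu_split *out)
{
    struct rusage ru;

    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
    out->user_s   = ru.ru_utime.tv_sec + ru.ru_utime.tv_usec / 1e6;
    out->kernel_s = ru.ru_stime.tv_sec + ru.ru_stime.tv_usec / 1e6;
    return 0;
}
```

A program that makes many system calls (lots of small reads, for example) will typically show kernel_s growing alongside user_s, which is the "kernel runs when the process invokes it" case described above.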
The CPU is not continuously executing process code. The CPU is interrupted to perform various operations. Interrupts can occur for various reasons: a resource becomes available, a previous action completes, or simply a timer goes off.
I recommend this series of videos for more in-depth information: http://academicearth.org/courses/operating-systems-and-system-programming
I have a Win32 native VC++ application that upon entering WinMain() starts a separate thread, then does some useful job while that other thread is running, then simply exits WinMain() - the other thread is not explicitly stopped.
This blog post says that a .NET application will not terminate in this case since the other thread is still running. Does the same apply to native Win32 applications?
Do I have to stop all threads prior to exiting?
Yes, you have to if you are simply exiting or terminating the main thread via ExitThread or TerminateThread; otherwise your application may not fully shut down. I recommend reading Raymond Chen's excellent blog posts on this topic:
The old-fashioned theory on how processes exit
Quick overview of how processes exit on Windows XP
How my lack of understanding of how processes exit on Windows XP forced a security patch to be recalled
During process termination, the gates are now electrified
If you return from the main thread, does the process exit?
But please note in particular that if you properly return from the main or WinMain function, the process will exit, as described by the ExitProcess API documentation and the last of Raymond Chen's posts linked above!
The short of it is:
For a native Win32 process to terminate, one of two conditions must be met:
Someone calls ExitProcess or TerminateProcess.
All the threads exit (by returning from their ThreadProc, including the WinMainEntryPoint that is the first thread created by Windows), close (by calling ExitThread), or are terminated (someone calls TerminateThread).
(The first condition is actually the same as the second: ExitProcess and TerminateProcess, as part of their cleanup, both call TerminateThread on each thread in the process.)
The C runtime imposes different conditions. For a C/C++ application to terminate, you must either:
return from main (or WinMain), or
call exit().
Calling exit() or returning from main() both cause the C runtime to call ExitProcess(), which is how C and C++ applications exit without cleaning up their threads. I personally think this is a bad thing.
However, non-trivial Win32 processes can never terminate simply by having all their threads exit, because many otherwise perfectly reasonable Win32 subsystems (Winsock, OLE, etc.) create worker threads and do not provide any way to make those threads close.
No. When WinMain returns, the process is terminated, which means all threads spawned by the process are terminated too, though they might not be shut down gracefully.
However, it is possible for the primary thread to terminate while other threads are still running, in which case the application keeps running. If you call ExitThread (not exit or ExitProcess) in WinMain while there are running threads (created, directly or indirectly, by the primary thread), you may observe this behavior. Nonetheless, simply returning from WinMain calls ExitProcess, and that means all threads are terminated.
Correct me if I'm wrong.
I think you can first close all your windows (so the user no longer sees your application) and then set an exit flag. Your thread should check the flag periodically and return once it finds it set.
After setting the flag, your main thread can call ::WaitForSingleObject() or ::WaitForMultipleObjects() for a while (say, three seconds); if the thread(s) have not returned by then, just kill them with ::TerminateThread().
Short answer: yes.
I want to write a program that launches a child process. The child process may be a windowed or a console-mode program.
I want to monitor the child process's status and resource usage, e.g. I want to know whether the child process is still running or has terminated. If it has terminated, I want to know the reason (did it terminate normally or because of a crash?).
While the child process is running and/or after it has terminated, I want to know its resource usage, especially CPU time (user time, system time) and memory usage (virtual size and/or RSS). It is OK if the numbers are not very accurate.
In Unix terminology, I want to fork, exec, waitpid and getrusage. And fork+setrlimit+exec can limit the child's resource usage. But I don't know how to do these on the Windows platform.
Please point me to the Windows API names. I can study the rest myself.
I would prefer not to use any library other than the Windows API, and I would prefer that the parent not act as a debugger attached to the child process. These are only preferences; either would still be acceptable.
When you call CreateProcess, it returns a handle to the process.
WaitForSingleObject on a process handle will block until the process has exited or time-out has expired. A timeout of zero will return immediately and indicate if the process is still running.
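#include <windows.h>

// Returns TRUE while the process has not exited (a zero timeout only polls).
BOOL IsProcessRunning(HANDLE process)
{
    return WaitForSingleObject(process, 0) == WAIT_TIMEOUT;
}

// Blocks until the process has exited.
void WaitForProcessToExit(HANDLE process)
{
    WaitForSingleObject(process, INFINITE);
}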
To get the exit code of a process, you can use GetExitCodeProcess; while the process is still running it reports STILL_ACTIVE (259) instead. You'll need to interpret what the exit code means, however. 0xC0000005 is typical for an access violation, but not all crashes result in this exit code.
For resource usage, you can call GetProcessTimes to get total CPU time, GetGuiResources to get GDI handle info, GetProcessMemoryInfo to get memory stats, and GetProcessIoCounters to get IO info.
Is there any sort of relationship between fork() and CreateThread? Does CreateThread internally call fork()?
In NT, the fundamental working unit is called a thread (i.e. NT schedules threads, not processes). User threads run in the context of a process. When you call CreateThread, you ask the NT kernel to allocate a working unit within the context of your process (there are also fibers, which are basically threads you schedule yourself, but that's beyond the scope of your question).
When you call CreateThread you provide the function with an entry point that is going to be run after the function is called. The code must be within the virtual space of the process and the page must have execution rights. Put simply, you give a function pointer. ;)
fork() is a Unix function that asks the kernel to create a copy of the running process. The parent process gets the pid of the child process and the child process gets 0 (this way each one knows which it is).
If you wish to create a process in Windows, you call the CreateProcess function, but that doesn't behave like fork(). The reason being that most of the time you will create threads, not processes.
As you can see, there is no relation between CreateThread and fork.
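To make the fork() side concrete, here is a small sketch showing the two return values in action together with waitpid. The function name run_child_and_wait is just for illustration, and this is Unix-only code; nothing like it exists behind CreateThread.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that exits immediately with `code`,
 * wait for it in the parent, and return the exit status the parent saw.
 * Returns -1 if fork/waitpid fails or the child did not exit normally. */
int run_child_and_wait(int code)
{
    pid_t pid = fork();

    if (pid < 0)
        return -1;            /* fork failed: no child was created */

    if (pid == 0)
        _exit(code);          /* child: fork() returned 0 */

    /* parent: fork() returned the child's pid */
    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Both branches run from the same code because the child is a full copy of the parent process, which is quite different from handing CreateThread a function pointer to run inside the current process.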
fork() only exists on Unix systems and it creates a new process with the same state as the caller. CreateThread() creates a new thread in the same process.
The Windows and Unix process models are fundamentally different, so there is no way of directly mapping the API of one onto the other.
fork() clones the current process into two. In the parent process, fork() returns the pid, and in the child it returns 0. This is typically used like this:
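#include <sys/types.h>   // pid_t
#include <unistd.h>      // fork

pid_t pid = fork();
if (pid < 0) {
    // fork failed; no child process was created
} else if (pid > 0) {
    // this code is executed in the parent; pid is the child's process ID
} else {
    // this code is executed in the child
}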
Cygwin is an emulation layer for building and running Unix applications on Windows which emulates the behavior of fork() using CreateProcess().
CreateThread is for creating threads; fork is for creating a duplicate process. There is no native way to get fork functionality on Windows (at least through Win32).
You might want to know that Microsoft provides fork() in high-end versions of Windows through a component called Subsystem for UNIX-based Applications (SUA). You can find details in my answer here.
I found this link, which I believe could be helpful in clearing up a few facts about forking and threading.
Sharing it here: http://www.geekride.com/index.php/2010/01/fork-forking-vs-threading-thread-linux-kernel/