Unexpected blocking behavior on OS X: pthreads and calls to system(3)

I am working on a program that starts a long-running process (afplay, with long sound files) using system() and may later decide to terminate that process. It seemed straightforward to invoke a system("prog") call and then later a system("killall prog") call. Using pthreads, I fire up a thread to invoke the initial system("prog") call; later, if the application detects that it's time to terminate early, the main thread calls system("killall prog").

Through print statements I can see that the main thread correctly detects the condition to stop, but its subsequent system() call blocks until the original system() call has finished (the main thread doesn't appear to block before this point; other activity does progress past the thread creation for the initial system() call). If I run the killall from a separate shell after my program invokes the prog, killall works as you'd expect.

I know that macOS requires programs that interact with the UI libraries to perform such activity from the main thread only. Are there other requirements for programs shelling out to system(3) that I am clearly ignorant of?

On Windows, the only difference in the code is the choice of "prog", and the behavior works as I expect.

system() is expected to block until the launched program exits -- if it didn't, there would be no way for system() to return the child process's exit status as part of its return value.
If you want your thread to continue executing in parallel with the child process, you will need to use a different API instead: typically fork(), followed by calling exec() from the child-process's branch of the fork.
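For the original question, a minimal sketch of that approach might look like the following (assuming a POSIX system; afplay and the sound file name are placeholders, and error handling is omitted):

/* Fork a child, exec the player in it, and keep the pid so the parent
 * can terminate it later instead of relying on system()/killall. */
#include <signal.h>
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    pid_t player = fork();
    if (player == 0) {
        /* Child branch: replace this process with the player. */
        execlp("afplay", "afplay", "long_sound.aiff", (char *)NULL);
        _exit(127);                  /* only reached if exec fails */
    }

    /* Parent continues immediately; do other work here... */
    sleep(2);

    /* Decide to stop early: signal just that child, then reap it. */
    kill(player, SIGTERM);
    int status;
    waitpid(player, &status, 0);
    printf("player exited, status %d\n", status);
    return 0;
}

The waitpid() call both retrieves the exit status and keeps the terminated player from lingering as a zombie -- the bookkeeping that system() would otherwise have done for you.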

Related

How can I handle a `system` call which fails?

I have a Perl script which calls an external program. (Right now I'm actually using backticks, but I could just as easily use system or something from CPAN.) Sometimes the program fails, causing Windows to create a dialog box "(external program) has stopped working" with the text
Windows is checking for a solution to the problem...
shortly replaced with
A problem caused the program to stop working correctly. Windows will close the program and notify you if a solution is available.
Unfortunately, this error message stops the process from dying, causing Perl to not return until the user (me!) clicks "Cancel" or "Close Program". Is there a way to avoid this behavior?
In my use case it is acceptable to have the program fail -- it does useful but strictly not necessary work. But as it needs to run unattended I can't have it block the program's remaining work.
The problem with your current approach is that backticks and system block while the external program is running or hanging. Possible other approaches include:
Using threads and various modules from the Win32 family to busy-wait for the process to end or to click on the dialog box. This is probably overkill.
Using an alarm signal or event to wake up your program when the external program has taken 'too long' to respond.
Using an IPC module to open the program and monitor its progress.
If you don't need the child program's return value, STDOUT or STDERR, simbabque's exec option has merit, but if you need to keep a handle on the process, try Win32::Process. I've found this useful on many an occasion. The module's Wait method can be an excellent alternative to my alarm suggestion or simbabque's sleep suggestion, with the added benefit that your program will not sleep longer than required by the child.
If you do not need to wait for the external program to finish running to continue, you can do exec instead of system and it will never return.
You could always add a sleep $n afterwards to make it wait for the external program to theoretically finish.
exec('maybe_dies.exe');
sleep 1; # make sure it does stuff before it dies, or not, or whatever...

Bash scripting, react immediately to signals

I have a shell script which runs very large simulation binaries. This becomes problematic when I want the script to output the values of some variables. For instance, when I run 10 large simulations, I want to be able to print which iteration I am on without having to wait a minute or two for the current simulation to terminate.
Currently, I am using the trap command. However, the script does not react immediately to signals; it only executes the bound function once the current iteration terminates. I will post the code if anyone needs it.
You should start threads for each large thing you're going to run. Have those threads dump their results somewhere, leaving your main process free to interrogate the results on the fly.

How to exit the entire call stack of shell scripts if a child script fails?

I have a set of shell scripts, around 20-30, that are used to perform one big task as a whole. The wrapper script mainly calls the high-level task scripts, but internally those scripts call other scripts, and the flow goes on in a nested manner.
I want to know if there is a way to exit the entire call stack if some critical script fails. Normally I run an exit 125 command and then catch that in the caller script, and so on, but I find that a little complicated. Is there a special exit that will abort the entire call stack? I don't want to use the kill command to abort the wrapper script process.
You could have your main wrapper script start every sub-script in its own process group, using e.g. chpst -P.
Then the sub-scripts, as well as their children, could kill their own process group by sending it a KILL signal, and this would not affect the main wrapper script.
I think this would be a bad idea, though, and that what you're currently doing is the better approach (because it makes the code easier to follow).
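For what it's worth, the process-group mechanism the chpst -P suggestion relies on can be sketched at the system-call level. This is a minimal, illustrative C example (not the shell setup itself; the command and timings are placeholders) showing that killing a child's own process group leaves the parent untouched:

/* The child moves into its own process group; killing that group via a
 * negative pid then cannot reach the parent. */
#include <signal.h>
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    pid_t child = fork();
    if (child == 0) {
        setpgid(0, 0);                        /* child: new process group */
        execl("/bin/sh", "sh", "-c", "sleep 60", (char *)NULL);
        _exit(127);                           /* only reached if exec fails */
    }

    setpgid(child, child);                    /* parent sets it too, avoiding a race */
    sleep(1);                                 /* pretend some work happens */

    kill(-child, SIGTERM);                    /* kill the child's whole group */
    waitpid(child, NULL, 0);                  /* reap the child */
    puts("parent still running");
    return 0;
}

chpst -P performs essentially that setpgid step for you before exec'ing the sub-script.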

Are Process::detach and Process::wait mutually exclusive (Ruby)?

I'm refactoring a bit of concurrent processing in my Ruby on Rails server (running on Linux) to use Spawn. Spawn::fork_it documentation claims that forked processes can still be waited on after being detached: https://github.com/tra/spawn/blob/master/lib/spawn.rb (line 186):
# detach from child process (parent may still wait for detached process if they wish)
Process.detach(child)
However, the Ruby Process::detach documentation says you should not do this: http://www.ruby-doc.org/core/classes/Process.html
Some operating systems retain the status of terminated child processes until the parent collects that status (normally using some variant of wait()). If the parent never collects this status, the child stays around as a zombie process. Process::detach prevents this by setting up a separate Ruby thread whose sole job is to reap the status of the process pid when it terminates. Use detach only when you do not intend to explicitly wait for the child to terminate.
Yet Spawn::wait effectively allows you to do just that by wrapping Process::wait. On a side note, I specifically want to use the Process::waitpid2 method to wait on the child processes, instead of using the Spawn::wait method.
Will detach-and-wait not work correctly on Linux? I'm concerned that this may cause a race condition between the detached reaper thread and the waiting parent process, as to who collects the child status first.
The answer to this question is there in the documentation. Are you writing code for your own use in a controlled environment? Or to be used widely by third parties? Ruby is written to be widely used by third parties, so their recommendation is to not do something that could fail on "some operating systems". Perhaps the Spawn library is designed primarily for use on Linux machines and tested only on a small subset thereof where this tactic works.
If you're distributing the code you're writing to be used by anyone and everyone, I would take Ruby's approach.
If you control the environment where this code will be run, I would write two tests:
A test that spawns a process, detaches it and then waits for it.
A test that spawns a process and then just waits for it.
Count the failure rate for both and if they are equal (within a margin that you feel is acceptable), go for it!
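The race the question worries about is easy to demonstrate at the system-call level. This is a purely illustrative C sketch (not the Ruby or Spawn code) in which a reaper thread and the main thread both wait on the same child:

/* Two waiters race for one child's status: whichever calls waitpid()
 * second usually gets ECHILD, because the status was already collected. */
#include <pthread.h>
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

static pid_t child;

static void *reaper(void *arg) {
    (void)arg;
    int status;
    if (waitpid(child, &status, 0) == -1)
        perror("reaper waitpid");
    else
        printf("reaper collected the status\n");
    return NULL;
}

int main(void) {
    child = fork();
    if (child == 0) {
        sleep(1);
        _exit(42);
    }

    pthread_t t;
    pthread_create(&t, NULL, reaper, NULL);   /* stand-in for the detached reaper thread */

    int status;
    if (waitpid(child, &status, 0) == -1)
        perror("main waitpid");               /* often ECHILD: the reaper won */
    else
        printf("main collected the status\n");

    pthread_join(t, NULL);
    return 0;
}

Whichever waiter loses typically sees ECHILD, and that is exactly the kind of failure the two proposed tests would be counting.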

Send CTRL+C to subprocess tree on Windows

I would like to run arbitrary console-based sub-processes and manage them from a single master process. The console based sub-processes communicate via stdin, stdout and stderr, and if you run them in a genuine console they terminate cleanly when you press CTRL+C. Some of them may in fact be a tree of processes, such as a batch script that runs an executable which may in turn run another executable to do some work. I would like to redirect their standard I/O (for example, so that I can show their output in a GUI window) and in certain circumstances to send them a CTRL+C event so that they will give up and terminate cleanly.
The following two diagrams show first the normal structure - one master process has four worker sub-processes, and some of those workers have their own subprocesses; and then what should happen when one of the workers needs to be stopped - it and all of its children should get the CTRL+C event, but no other processes should receive the CTRL+C event.
Additionally, I would much prefer that there are no extra windows visible to the user.
Here's what I've tried (note that I'm working in Python, but solutions for C would still be helpful):
Spawning an extra intermediate process with CREATE_NEW_CONSOLE, and then having it spawn the worker process. Then have it call GenerateConsoleCtrlEvent(CTRL_C_EVENT, 0) when we want to kill the worker. Unfortunately, CREATE_NEW_CONSOLE seems to prevent me from redirecting the standard I/O channels, so I'm left with no easy way to get the output back to the main program.
Spawning an extra intermediate process with CREATE_NEW_PROCESS_GROUP, and then having it spawn the worker process. Then have it call GenerateConsoleCtrlEvent(CTRL_C_EVENT, 0) when we want to kill the worker. Somehow, this manages to send the CTRL+C only to the master process, which is completely useless. On closer inspection, GenerateConsoleCtrlEvent says that CTRL+C cannot be sent to process groups.
Spawning the subprocess with CREATE_NEW_PROCESS_GROUP. Then call GenerateConsoleCtrlEvent(CTRL_BREAK_EVENT, pid) to kill the worker. This is not ideal, because CTRL+BREAK is less friendly than CTRL+C and will probably result in a messier termination. (E.g. if it's a Python process, no KeyboardInterrupt can be caught and no finally blocks run.)
Is there any good way to do what I want? I can see that I could theoretically build on the first attempt and find some other way to communicate between the processes, but I am worried it will turn out to be extremely awkward. Are there good examples of other programs that achieve the same effect? It seems so simple that it can't be all that uncommon a requirement.
I don't know about managing/redirecting stdin et al., but for managing the subprocess tree, have you considered using the Windows Job Objects API?
There are several other questions about managing process trees (How do I automatically destroy child processes in Windows? Performing equivalent of "Kill Process Tree" in C++ on Windows), and it looks like the cleanest method if you can use it.
Chapter 5 of Windows via C/C++ by Jeffrey Richter has a good discussion of using CreateJobObject and the related APIs.
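As a rough C illustration of the Job Object route (worker.bat and the timings are placeholders; note that this terminates the tree outright rather than delivering CTRL+C, so it covers the "kill the whole subtree" requirement but not a graceful shutdown):

/* Every process the worker spawns inherits membership in the job by
 * default, so terminating the job tears down the whole tree at once. */
#include <windows.h>
#include <stdio.h>

int main(void) {
    HANDLE job = CreateJobObjectA(NULL, NULL);

    /* Also kill everything in the job if the job handle gets closed. */
    JOBOBJECT_EXTENDED_LIMIT_INFORMATION info = {0};
    info.BasicLimitInformation.LimitFlags = JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE;
    SetInformationJobObject(job, JobObjectExtendedLimitInformation,
                            &info, sizeof(info));

    STARTUPINFOA si = { sizeof(si) };
    PROCESS_INFORMATION pi;
    char cmd[] = "cmd.exe /c worker.bat";     /* placeholder worker */
    if (!CreateProcessA(NULL, cmd, NULL, NULL, FALSE,
                        CREATE_SUSPENDED, NULL, NULL, &si, &pi)) {
        fprintf(stderr, "CreateProcess failed: %lu\n", GetLastError());
        return 1;
    }

    /* Put the worker (and, by inheritance, its children) in the job,
     * then let it run. */
    AssignProcessToJobObject(job, pi.hProcess);
    ResumeThread(pi.hThread);

    Sleep(5000);                              /* pretend to do other work */

    TerminateJobObject(job, 1);               /* kill the entire tree */

    CloseHandle(pi.hThread);
    CloseHandle(pi.hProcess);
    CloseHandle(job);
    return 0;
}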

Resources