Why doesn't this fork loop work as expected? - windows

Why does the following code:
#!/usr/bin/perl
use strict;
use warnings;
use Parallel::ForkManager;
my $pm = new Parallel::ForkManager(5);
my #all_data = ('a','b','c','d','e','f');
foreach my $data (#all_data) {
# Forks and returns the pid for the child:
my $pid = $pm->start and next;
print "Hello $pid\n";
$pm->finish; # Terminates the child process
}
$pm->wait_all_children;
print:
Hello 0
Hello 0
Hello 0
Hello 0
Hello 0
I am new to Perl and I am trying to catch up on multiprocessing in Perl

From the docs for the start method:
This method does the fork. It returns the pid of the child process for the parent, and 0 for the child process.
As it happens, the fork function does the same, which is why start mirrors it.
The parent may need the child PID to control the child – sending signals and stuff – but the child knows its own PID via the $$ variable:
foreach my $data (#all_data) {
$pm->start and next;
print "Hello $$\n";
$pm->finish;
}
Example output:
Hello 22215
Hello 22218
Hello 22219
Hello 22217
Hello 22220
Hello 22216

my $pid = $pm->start and next;
"and" logic will evaluate to true if both of the arguments are true. If the first argument is false, then the "and" logic will shortcut and will not evaluate the second argument
You might want to use "or" logic instead.

Related

Perl Windows alternatives for "system" ( opening multiple processes )

So long story short, I'm trying to run a linux perl script in Windows ( with few modifications ).
On Unix it works just fine, but on Windows I come to the conclusion that calling for system doesn't work the same as on Unix and so it doesn't create multiple processes.
Below is the code :
use strict;
use warnings;
open (FIN, 'words.txt'); while (<FIN>) {
chomp;
my $line = $_;
system( "perl script.pl $line &" );
}
close (FIN);
So basically, I have 5 different words in "words.txt" which I want each and every one to be used one by one when calling for script.pl , which means :
word1 script.pl
word2 script.pl
word3 script.pl
etc
As of now it opens just the first word in words.txt and it loops with that one only. As I said, on Unix it works perfectly, but not on Windows.
I've tried to use "start" system( "start perl script.pl $line &" ); and it works...except it opens 5 additional CMDs to do the work. I want it to do the work on the same window.
If anyone has any idea how this can work on window, i'll really appreciate it.
Thanks!
According to perlport :
system
(Win32) [...] system(1, #args) spawns an external process and
immediately returns its process designator, without waiting for it to
terminate. Return value may be used subsequently in wait or waitpid.
Failure to spawn() a subprocess is indicated by setting $? to 255 <<
8. $? is set in a way compatible with Unix (i.e. the exit status of the subprocess is obtained by $? >> 8, as described in the
documentation).
I tried this:
use strict;
use warnings;
use feature qw(say);
say "Starting..";
my #pids;
for my $word (qw(word1 word2 word3 word3 word5)) {
my $pid = system(1, "perl script.pl $word" );
if ($? == -1) {
say "failed to execute: $!";
}
push #pids, $pid;
}
#wait for all children to finish
for my $pid (#pids) {
say "Waiting for child $pid ..";
my $ret = waitpid $pid, 0;
if ($ret == -1) {
say " No such child $pid";
}
if ($? & 127) {
printf " child $pid died with signal %d\n", $? & 127;
}
else {
printf " child $pid exited with value %d\n", $? >> 8;
}
}
say "Done.";
With the following child script script.pl :
use strict;
use warnings;
use feature qw(say);
say "Starting: $$";
sleep 2+int(rand 5);
say "Done: $$";
sleep 1;
exit int(rand 10);
I get the following output:
Starting..
Waiting for child 7480 ..
Starting: 9720
Starting: 10720
Starting: 9272
Starting: 13608
Starting: 13024
Done: 13608
Done: 10720
Done: 9272
Done: 9720
Done: 13024
child 7480 exited with value 9
Waiting for child 13344 ..
child 13344 exited with value 5
Waiting for child 17396 ..
child 17396 exited with value 3
Waiting for child 17036 ..
child 17036 exited with value 6
Waiting for child 17532 ..
child 17532 exited with value 8
Done.
Seems to work fine..
You can use Win32::Process to get finer control over creating a new process than system gives you on Windows. In particular, the following doesn't create a new console for each process like using system("start ...") does:
#!/usr/bin/env perl
use warnings;
use strict;
use feature qw/say/;
# Older versions don't work with an undef appname argument.
# Use the full path to perl.exe on them if you can't upgrade
use Win32::Process 0.17;
my #lines = qw/foo bar baz quux/; # For example instead of using a file
my #procs;
for my $line (#lines) {
my $proc;
if (!Win32::Process::Create($proc, undef, "perl script.pl $line", 1,
NORMAL_PRIORITY_CLASS, ".")) {
$_->Kill(1) for #procs;
die "Unable to create process: $!\n";
}
push #procs, $proc;
}
$_->Wait(INFINITE) for #procs;
# Or
# use Win32::IPC qw/wait_all/;
# wait_all(#procs);
As Yet Another Way To Do It, the start command takes a /b option to not open a new command prompt.
system("start /b perl script.pl $line");

How to wait for grandchild process (`bash` retval becomes -1 in Perl due to SIG CHLD)

I have a Perl script (snippet below) that runs in cron to perform system checks. I fork a child as a timeout and reap it with SIG{CHLD}. Perl does several system calls of Bash scripts and checks their exit status. One bash script fails about 5% of the time with no error. The Bash scripts exists with 0 and Perl sees $? as -1 and $! as "No child processes".
This bash script tests compiler licenses, and Intel icc is left around after the Bash script completes (ps output below). I think the icc zombie completes, forcing Perl into SIG{CHLD} handler, which blows away the $? status before I'm able to read it.
Compile status -1; No child processes
#!/usr/bin/perl
use strict;
use POSIX ':sys_wait_h';
my $GLOBAL_TIMEOUT = 1200;
### Timer to notify if this program hangs
my $timer_pid;
$SIG{CHLD} = sub {
local ($!, $?);
while((my $pid = waitpid(-1, WNOHANG)) > 0)
{
if($pid == $timer_pid)
{
die "Timeout\n";
}
}
};
die "Unable to fork\n" unless(defined($timer_pid = fork));
if($timer_pid == 0) # child
{
sleep($GLOBAL_TIMEOUT);
exit;
}
### End Timer
### Compile test
my #compile = `./compile_test.sh 2>&1`;
my $status = $?;
print "Compile status $status; $!\n";
if($status != 0)
{
print "#compile\n";
}
END # Timer cleanup
{
if($timer_pid != 0)
{
$SIG{CHLD} = 'IGNORE';
kill(15, $timer_pid);
}
}
exit(0);
#!/bin/sh
cc compile_test.c
if [ $? -ne 0 ]; then
echo "Cray compiler failure"
exit 1
fi
module swap PrgEnv-cray PrgEnv-intel
cc compile_test.c
if [ $? -ne 0 ]; then
echo "Intel compiler failure"
exit 1
fi
wait
ps
exit 0
The wait doesn't really wait because cc calls icc which creates a zombie grandchild process that wait (or wait PID) doesn't block for. (wait `pidof icc`, 31589 in this case, gives "not a child of this shell")
user 31589 1 0 12:47 pts/15 00:00:00 icc
I just don't know how to fix this in Bash or Perl.
Thanks, Chris
Isn't this a use case for alarm? Toss out your SIGCHLD handler and say
local $? = -1;
eval {
local $SIG{ALRM} = sub { die "Timeout\n" };
alarm($GLOBAL_TIMEOUT);
#compile = `./compile_test.sh 2>&1`;
alarm(0);
};
my $status = $?;
instead.
I thought the quickest solution would be to add sleep of a second or two at the bottom of the bash script to wait for the zombie icc to complete. But that didn't work.
If I didn't already have a SIG ALRM (in the real program) I agree the best choice would be to wrap the whole thing in a eval. Even thought that would be pretty ugly for a 500 line program.
Without the local($?), every `system` call gets $? = -1. The $? I need in this case is after waitpid, then unfortunately set to -1 after the sig handler exits. So I find this works. New lines shown with ###
my $timer_pid;
my $chld_status; ###
$SIG{CHLD} = sub {
local($!, $?);
while((my $pid = waitpid(-1, WNOHANG)) > 0)
{
$chld_status = $?; ###
if($pid == $timer_pid)
{
die "Timeout\n";
}
}
};
...
my #compile = `./compile_test.sh 2>&1`;
my $status = ($? == -1) ? $chld_status : $?; ###
...
We had a similar issue, here is our solution: Leak a write-side file descriptor into the grandchild and read() from it which will block until it exits.
See also: wait for children and grand-children
use Fcntl;
# OCF scripts invoked by Pacemaker will be killed by Pacemaker with
# a SIGKILL if the script exceeds the configured resource timeout. In
# addition to killing the script, Pacemaker also kills all of the children
# invoked by that script. Because it is a kill, the scripts cannot trap
# the signal and clean up; because all of the children are killed as well,
# we cannot simply fork and have the parent wait on the child. In order
# to work around that, we need the child not to have a parent proccess
# of the OCF script---and the only way to do that is to grandchild the
# process. However, we still want the parent to wait for the grandchild
# process to exit so that the OCF script exits when the grandchild is
# done and not before. This is done by leaking the write file descriptor
# from pipe() into the grandchild and then the parent reads the read file
# descriptor, thus blocking until it gets IO or the grandchild exits. Since
# the file descriptor is never written to by the grandchild, the parent
# blocks until the child exits.
sub grandchild_wait_exit
{
# We use "our" instead of "my" for the write side of the pipe. If
# we did not, then when the sub exits and $w goes out of scope,
# the file descriptor will close and the parent will exit.
pipe(my $r, our $w);
# Enable leaking the file descriptor into the children
my $flags = fcntl($w, F_GETFD, 0) or warn $!;
fcntl($w, F_SETFD, $flags & (~FD_CLOEXEC)) or die "Can't set flags: $!\n";
# Fork the child
my $child = fork();
if ($child) {
# We are the parent, waitpid for the child and
# then read to wait for the grandchild.
close($w);
waitpid($child, 0);
<$r>;
exit;
}
# Otherwise we are the child, so close the read side of the pipe.
close($r);
# Fork a grandchild, exit the child.
if (fork()) {
exit;
}
# Turn off leaking of the file descriptor in the grandchild so
# that no other process can write to the open file descriptor
# that would prematurely exit the parent.
$flags = fcntl($w, F_GETFD, 0) or warn $!;
fcntl($w, F_SETFD, $flags | FD_CLOEXEC) or die "Can't set flags: $!\n";
}
grandchild_wait_exit();
sleep 1;
print getppid() . "\n";
print "$$: gc\n";
sleep 30;
exit;

pgrep -P, but for grandchildren not just children

I am using:
pgrep -P $$
to get the child pids of $$. But I actually want a list of grandchildren and great grandchild too.
How do I do this tho? With a regular programming language we would do that with recursion for example, but with bash? Perhaps use a bash function?
I've already posted an attempted solution. It's short and effective, and seems in line with the OP's question, so I'll leave it as it is. However, it has some performance and portability problems that mean it's not a good general solution. This code attempts to fix the problems:
top_pid=$1
# Make a list of all process pids and their parent pids
ps_output=$(ps -e -o pid= -o ppid=)
# Populate a sparse array mapping pids to (string) lists of child pids
children_of=()
while read -r pid ppid ; do
[[ -n $pid && pid -ne ppid ]] && children_of[ppid]+=" $pid"
done <<< "$ps_output"
# Add children to the list of pids until all descendants are found
pids=( "$top_pid" )
unproc_idx=0 # Index of first process whose children have not been added
while (( ${#pids[#]} > unproc_idx )) ; do
pid=${pids[unproc_idx++]} # Get first unprocessed, and advance
pids+=( ${children_of[pid]-} ) # Add child pids (ignore ShellCheck)
done
# Do something with the list of pids (here, just print them)
printf '%s\n' "${pids[#]}"
The basic approach of using a breadth-first search to build up the tree has been retained, but the essential information about processes is obtained with a single (POSIX-compliant) run of ps. pgrep is no longer used because it is not in POSIX and it could be run many times. Also, a very inefficient way of removing items from the queue (copy all but one element of it) has been replaced with manipulation of an index variable.
Average (real) run time is 0.050s when run on pid 0 on my oldish Linux system with around 400 processes.
I've only tested it on Linux, but it only uses Bash 3 features and POSIX-compliant features of ps so it should work on other systems too.
Using nothing but bash builtins (not even ps or pgrep!):
#!/usr/bin/env bash
collect_children() {
# format of /proc/[pid]/stat file; group 1 is PID, group 2 is its parent
stat_re='^([[:digit:]]+) [(].*[)] [[:alpha:]] ([[:digit:]]+) '
# read process tree into a bash array
declare -g children=( ) # map each PID to a string listing its children
for f in /proc/[[:digit:]]*/stat; do # forcing initial digit skips /proc/net/stat
read -r line <"$f" && [[ $line =~ $stat_re ]] || continue
children[${BASH_REMATCH[2]}]+="${BASH_REMATCH[1]} "
done
}
# run a fresh collection, then walk the tree
all_children_of() { collect_children; _all_children_of "$#"; }
_all_children_of() {
local -a immediate_children
local child
read -r -a immediate_children <<<"${children[$1]}"
for child in "${immediate_children[#]}"; do
echo "$child"
_all_children_of "$child"
done
}
all_children_of "$#"
On my local system, time all_children_of 1 >/dev/null (invoking the function in an already-running shell) clocks in the neighborhood of 0.018s -- typically, 0.013s for the collect_children stage (the one-time action of reading the process tree), and 0.05s for the recursive walk of that tree triggered by the initial call of _all_children_of.
Prior timings were testing only the time needed for the walk, discarding the time needed for the scan.
The code below will print the PIDs of the current process and all its descendants. It uses a Bash array as a queue to implement a breadth-first search of the process tree.
unprocessed_pids=( $$ )
while (( ${#unprocessed_pids[#]} > 0 )) ; do
pid=${unprocessed_pids[0]} # Get first elem.
echo "$pid"
unprocessed_pids=( "${unprocessed_pids[#]:1}" ) # Remove first elem.
unprocessed_pids+=( $(pgrep -P $pid) ) # Add child pids
done
Probably a simple loop would do it:
# set a value for pid here
printf 'Children of %s:\n' $pid
for child in $(pgrep -P $pid); do
printf 'Children of %s:\n' $child
pgrep -P $child
done
If pgrep doesn't do what you want, you can always use ps directly. Options will be somewhat platform-dependent.
ps -o ppid,pid |
awk -v pid=$$ 'BEGIN { parent[pid] = 1 } # collect interesting parents
{ child[$2] = $1 } # collect parents of all processes
$1 == pid { parent[$2] = 1 }
END { for (p in child)
if (parent[child[p]])
print p }'
The variable names are not orthogonal -- parent collects the processes which are pid or one of its children as keys, i.e. the "interesting" parents, and child contains the parent of each process, with the process as the key and the parent as the value.
I ended up doing this with node.js and bash:
const async = require('async');
const cp = require('child_process');
export const getChildPids = (pid: number, cb: EVCb<Array<string>>) => {
const pidList: Array<string> = [];
const getMoreData = (pid: string, cb: EVCb<null>) => {
const k = cp.spawn('bash');
const cmd = `pgrep -P ${pid}`;
k.stderr.pipe(process.stderr);
k.stdin.end(cmd);
let stdout = '';
k.stdout.on('data', d => {
stdout += String(d || '').trim();
});
k.once('exit', code => {
if (code > 0) {
log.warning('The following command exited with non-zero code:', code, cmd);
}
const list = String(stdout).split(/\s+/).map(v => String(v || '').trim()).filter(Boolean);
if (list.length < 1) {
return cb(null);
}
for (let v of list) {
pidList.push(v);
}
async.eachLimit(list, 3, getMoreData, cb);
});
};
getMoreData(String(pid), err => {
cb(err, pidList);
});
};

How to do parallel programming in perl on Windows?

I have used the following pattern of code in perl on a Unix system, but it crashes on Windows. How can I achieve the same thing using either forking or threads on Windows using perl?
use Parallel::ForkManager;
my $pm = Parallel::ForkManager->new($MAX_PROCESSES);
DATA_LOOP:
foreach my $data (#all_data) {
# Forks and returns the pid for the child:
my $pid = $pm->start and next DATA_LOOP;
# ... do some work with $data in the child process ...
$pm->finish; # Terminates the child process
}
Here is one example using fork:
#!/usr/bin/perl -w
use strict;
foreach my $data (#all_data) {
my $pid;
next if $pid = fork; # Parent goes to next server.
die "fork failed: $!" unless defined $pid;
# From here on, we're in the child. Do whatever the
# child has to do... The server we want to deal
# with is in $data.
exit; # Ends the child process.
}
# The following waits until all child processes have
# finished, before allowing the parent to die.
1 while (wait() != -1);
print "All done!\n";

Perl trapping Ctrl-C (sigint) in bash

I'm reading How do we capture CTRL ^ C - Perl Monks, but I cannot seem to get the right info to help with my problem.
The thing is - I have an infinite loop, and 'multiline' printout to terminal (I'm aware I'll be told to use ncurses instead - but for short scripts, I'm more comfortable writing a bunch of printfs). I'd like to trap Ctrl-C in such a way, that the script will terminate only after this multiline printout has finished.
The script is (Ubuntu Linux 11.04):
#!/usr/bin/env perl
use strict;
use warnings;
use Time::HiRes;
binmode(STDIN); # just in case
binmode(STDOUT); # just in case
# to properly capture Ctrl-C - so we have all lines printed out
# unfortunately, none of this works:
my $toexit = 0;
$SIG{'INT'} = sub {print "EEEEE"; $toexit=1; };
#~ $SIG{INT} = sub {print "EEEEE"; $toexit=1; };
#~ sub REAPER { # http://www.perlmonks.org/?node_id=436492
#~ my $waitedpid = wait;
#~ # loathe sysV: it makes us not only reinstate
#~ # the handler, but place it after the wait
#~ $SIG{CHLD} = \&REAPER;
#~ print "OOOOO";
#~ }
#~ $SIG{CHLD} = \&REAPER;
#~ $SIG{'INT'} = 'IGNORE';
# main
# http://stackoverflow.com/questions/14118/how-can-i-test-stdin-without-blocking-in-perl
use IO::Select;
my $fsin = IO::Select->new();
$fsin->add(\*STDIN);
my ($cnt, $string);
$cnt=0;
$string = "";
while (1) {
$string = ""; # also, re-initialize
if ($fsin->can_read(0)) { # 0 timeout
$string = <STDIN>;
}
$cnt += length($string);
printf "cnt: %10d\n", $cnt;
printf "cntA: %10d\n", $cnt+1;
printf "cntB: %10d\n", $cnt+2;
print "\033[3A"; # in bash - go three lines up
print "\033[1;35m"; # in bash - add some color
if ($toexit) { die "Exiting\n" ; } ;
}
Now, if I run this, and I press Ctrl-C, I either get something like this (note, the _ indicates position of terminal cursor after script has terminated):
MYPROMPT$ ./test.pl
cnEEEEEcnt: 0
MYPROMPT$ _
cntB: 2
Exiting
or:
MYPROMPT$ ./test.pl
cncnt: 0
MYPROMPT$ _
cntB: 2
Exiting
... however, I'd like to get:
MYPROMPT$ ./test.pl
cncnt: 0
cntA: 1
cntB: 2
Exiting
MYPROMPT$ _
Obviously, handlers are running - but not quite in the timing (or order) I expect them to. Can someone clarify how do I fix this, so I get the output I want?
Many thanks in advance for any answers,
Cheers!
Hmmm... seems solution was easier than I thought :) Basically, the check for "trapped exit" should run after the lines are printed - but before the characters for "go three lines up" are printed; that is, that section should be:
printf "cnt: %10d\n", $cnt;
printf "cntA: %10d\n", $cnt+1;
printf "cntB: %10d\n", $cnt+2;
if ($toexit) { die "Exiting\n" ; } ;
print "\033[3A"; # in bash - go three lines up
print "\033[1;35m"; # in bash - add some color
... and then the output upon Ctrl-C seems to be like:
MYPROMPT$ ./test.pl
cnt: 0
^CcntA: 1
cntB: 2
Exiting
MYPROMPT$ _
Well, hope this may help someone,
Cheers!

Resources