Why is bash breaking MPI job control loop

I'm attempting to use a simple bash script to sequentially run a batch of MPI jobs. This script works perfectly when running serial code (I am using Fortran 90), but for some reason bash breaks out of the loop when I attempt to execute MPI code.
I already found a work-around to the problem. I just wrote essentially the exact same script in Perl and it worked like a charm. I just really want to understand the issue here because I prefer the simplicity of bash and it perfectly fits my own scripting needs in almost all other cases.
I've tried running the MPI code as a background process and using wait, with the same result. If I run the jobs in the background without wait, bash does not break out of the loop, but the jobs pile up until it eventually crashes. The goal is to run the executable sequentially for each parameter set anyway; I only mention this because the loop is not broken in that case.
Bash script, interp.sh: Usage --> $ ./interp.sh program inputfile
#!/bin/bash
PROG=$1
IFILE=$2
kount=0 # Counter variable for looping through input file
sys=0 # Counter variable to store how many times model has been run
while IFS="\n" read -r line
do
    kount=$(( $kount + 1 ))
    if [ $(( kount % 2 )) -eq 1 ] # if kount is odd, then expect headers
    then
        unset name defs
        sys=$(( $sys + 1 ))
        name=( $line ) # parse headers
        defs=${#name[*]}
        k=$(( $defs - 1 ))
    else # if kount is even, then expect numbers
        unset vals
        vals=( $line ) # parse parameters
        for i in $( seq 0 $k )
        do
            # Define variables using header names and set their values
            printf -v "${name[i]}" "${vals[i]}"
        done
        # Print input variable values
        echo $a $b $c $d $e $nPROC
        # Run executable
        mpiexec -np $nPROC --oversubscribe --hostfile my_hostfile $PROG
    fi
done < $IFILE
Input file, input.dat:
a b c d e nPROC
1 2 3 4 5 2
nPROC
3
nPROC
4
nPROC
5
nPROC
6
nPROC
7
nPROC
8
Sample MPI f90 code, main.f90:
program main
  use mpi
  implicit none
  integer :: i, ierr, myID, nPROC
  integer, parameter :: foolen = 100000
  double precision, dimension(0:foolen) :: foo
  call MPI_INIT(ierr)
  call MPI_COMM_SIZE(MPI_COMM_WORLD, nPROC, ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD, myID, ierr)
  if ( myID .eq. 0 ) then
    do i=0,foolen
      foo(i) = i
    end do
  else
    do i=0,foolen
      foo(i) = i
    end do
  end if
  call MPI_FINALIZE(ierr)
end program
Sample makefile:
COMP=mpif90
EXT=f90
CFLAGs=-Wall -Wextra -Wimplicit-interface -fPIC -fmax-errors=1 -g -fcheck=all \
       -fbacktrace
MPIflags=--oversubscribe --hostfile my_hostfile
PROG=main.x
INPUT=input.dat
OUTPUT=output
OBJS=main.o
$(PROG): $(OBJS)
	$(COMP) $(CFLAGs) -o $(PROG) $(OBJS) $(LFLAGS)
main.o: main.f90
	$(COMP) -c $(CFLAGs) main.f90
%.o: %.f90
	$(COMP) -c $(CFLAGs) $<
run:
	make && make clean
	./interp.sh $(PROG) $(INPUT)
clean:
	rm -f *.o DONE watch
Hostfile, my_hostfile:
localhost slots=4
Note that if the mpiexec line is commented out, the script runs as expected. The output looks like this:
1 2 3 4 5 2
1 2 3 4 5 3
1 2 3 4 5 4
1 2 3 4 5 5
1 2 3 4 5 6
1 2 3 4 5 7
1 2 3 4 5 8
These are the parameter values which are supposed to be passed to the MPI code in each loop. However, when mpiexec is called in the script, only the first set of parameters is read and passed.
I apologize if all that is a bit excessive, I just wanted to provide all that is needed for testing. Any help solving the issue in bash or explanation of why this happens would be greatly appreciated!

mpiexec consumes stdin, so it reads all the remaining lines that were meant for the loop. After the first iteration stdin is empty, read fails, and the loop ends.
This is not specific to mpiexec: any command that reads stdin by default, such as ssh, has the same effect when called from within such a loop.
The general solution is to add < /dev/null so that the offending command reads from /dev/null instead of the loop's stdin. Some commands have a dedicated flag that replaces the redirect, such as ssh -n.
So the solution in this case is to add the redirect at the end of the line where mpiexec is called:
mpiexec -np $nPROC --oversubscribe --hostfile my_hostfile $PROG < /dev/null
In the case of mpiexec there are some standard I/O issues to pay attention to, detailed here: https://www.open-mpi.org/doc/v3.0/man1/mpiexec.1.php#toc14
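The effect can be reproduced without MPI. The sketch below is only an illustration: cat stands in for any command that reads stdin (it says nothing about how mpiexec is implemented), and the function names and the three-line sample input are made up for the demo.

```shell
#!/bin/bash
# Hypothetical stand-in for mpiexec: any command that reads stdin will do.
greedy() { cat > /dev/null; }

# Broken: greedy inherits the loop's stdin and swallows the remaining lines.
broken_loop() {
    while read -r line; do
        echo "got $line"
        greedy
    done
}

# Fixed: greedy reads from /dev/null, so the loop input is left alone.
fixed_loop() {
    while read -r line; do
        echo "got $line"
        greedy < /dev/null
    done
}

printf '1\n2\n3\n' | broken_loop   # prints only "got 1"
printf '1\n2\n3\n' | fixed_loop    # prints "got 1", "got 2", "got 3"
```

The only difference between the two functions is the redirect on the command inside the loop, which is the same change as in the mpiexec line.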

Related

Pause ‘for’ after every 5 loops

I’ve got this bash script to download 52k files:
for i in {1..52000};
do wget -c "download.hebrewbooks.org/downloadhandler.ashx?req=$i" ;
done
However, the server gives me a 429 error.
How can I pause the loop for X amount of time after every 5 files that are downloaded?

If i is a multiple of five, sleep.
for i in {1..52000}; do
    wget -c "download.hebrewbooks.org/downloadhandler.ashx?req=$i"
    ((i % 5)) || sleep $X
done
Note that ((expr)) succeeds (exit status 0) when expr evaluates to a non-zero value and fails (exit status 1) when it evaluates to 0. i % 5 is 0 exactly when i is a multiple of five, so the sleep has to be chained with OR || instead of AND &&. If that's too confusing, use this instead: ((i % 5 == 0)) && ...
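To check the two forms without downloading anything, here is a small sketch, assuming bash. The function names and the upper bound of 12 are made up for the demo, and echo replaces the real sleep.

```shell
#!/bin/bash
# Report which iterations would trigger the pause.
pause_points() {
    local i
    for i in {1..12}; do
        (( i % 5 )) || echo "pause after $i"
    done
}

# Same selection written with an explicit comparison and &&.
pause_points_explicit() {
    local i
    for i in {1..12}; do
        (( i % 5 == 0 )) && echo "pause after $i"
    done
    return 0   # the last test fails for i=12; do not leak that status
}

pause_points            # prints "pause after 5" and "pause after 10"
pause_points_explicit   # prints the same two lines
```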

Makefile: Running a specific target in parallel

I am learning Makefile and trying to implement parallelism. I am aware of the "-j" option. However, for example having the following makefile (on Windows)-
all: a b c d
a:
# some build rule
b:
# some build rule with parallelism
c:
# some build rule
d:
#some build rule
I am trying to run make all with only target "b" running in parallel. Passing the -j option with the build rule for "b" doesn't work. Any pointers?

You could get b's recipe to run in the background as so:
all: a b c d
	@echo running $@
.PHONY: a b c d all
a c d: | b
	@echo -n _$@0 && \
	sleep 1 && echo -n _$@1 && \
	sleep 1 && echo _$@2
b:
	@(echo -n _$@0 && \
	sleep 2 && echo -n _$@1 && \
	sleep 2 && echo -n _$@2\
	) &
Which outputs:
_b0_a0_a1_b1_a2
_c0_c1_b2_c2
_d0_d1_d2
running all
The order-only dependency on b makes b run first; otherwise, with -j1, it wouldn't start until after a completes. It does of course mean that b gets built whenever you build a, c, or d.
Alternatively, (and I'm not recommending this) you could use some manual locking mechanism such as flock to prevent a, c, and d from running in parallel (note that the flock only protects a single shell, so you would have to collapse your recipes into a single line protected by flock for this to work).

Execute a code for 5 seconds and then stop it

The goal of the code is to generate random TCP traffic using iperf and capture it over 5, 10, 15, and 20 seconds using tcpdump. Capturing the throughput is also important to me. My problem is that I would like to execute code1, code2, code3, and code4 for 5, 10, 15, and 20 seconds in bash, but I don't know how to express that condition. Here is my code:
for Test_duration in 5 10 15 20
do
    echo “Test performing with $Test_duration duration”
    sudo tcpdump -G 10 -W 2 -w /tmp/scripttest_$Test_duration -i h1-eth0 &
    while true; do
        #code1
        time2=$(($RANDOM%20+1))&
        pksize2=$(($RANDOM%1000+200))&
        iperf -c 10.0.0.2 -t $time2 -r -l $pksize2 >> /media/sf_sharedsaeed/throughtput/iperthroughput_host2_$Test_duration.txt &\
        #code2
        time3=$(($RANDOM%20+1))&
        pksize3=$(($RANDOM%1000+200))&
        iperf -c 10.0.0.3 -t $time3 -r -l $pksize3 >> /media/sf_sharedsaeed/throughtput/iperthroughput_host3_$Test_duration.txt &\
        #code3
        time4=$(($RANDOM%20+1))&
        pksize4=$(($RANDOM%1000+200))&
        iperf -c 10.0.0.4 -t $time4 -r -l $pksize4 >> /media/sf_sharedsaeed/throughtput/iperthroughput_host4_$Test_duration.txt &\
        #code4
        time5=$(($RANDOM%20+1))&
        pksize5=$(($RANDOM%1000+200))&
        iperf -c 10.0.0.5 -t $time5 -r -l $pksize5 >> /media/sf_sharedsaeed/throughtput/iperthroughput_host5_$Test_duration.txt &\
    done
done
Another constraint is that code1, code2, code3, and code4 should be executed at the same time, so I used &.
What should I use instead of while true; so that the codes are executed for each duration? Can anybody help me?

You could do that by using a background subshell that creates a simple file lock on expiration that you detect from your while loops. Here is an example based on a simplified version of your code:
for Test_duration in 5 10 15 20
do
    # TIMEOUT_LOCK will be your file lock
    rm -f TIMEOUT_LOCK
    # next command will run in a parallel subshell in the background
    (sleep $Test_duration; touch TIMEOUT_LOCK) &
    echo “Test performing with $Test_duration duration”
    while true; do
        # check whether current timeout (5 or 10 or 15 ...) has occurred
        if [ -f TIMEOUT_LOCK ]; then rm -f TIMEOUT_LOCK; break; fi
        # do your stuff here - I'm just outputting dots and sleeping
        echo -n "."
        sleep 1
    done
    echo ""
done
The output of this code is:
“Test performing with 5 duration”
.....
“Test performing with 10 duration”
..........
“Test performing with 15 duration”
...............
“Test performing with 20 duration”
....................
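A different technique that avoids the lock file, sketched here on the assumption that the script runs under bash: the built-in SECONDS variable counts whole seconds since the shell started, so the loop can compare it against a deadline. The helper name run_for and the tick workload are made up for illustration. As with the lock file, the deadline is only checked between iterations, so a long-running command inside the loop is not interrupted, and because SECONDS has one-second granularity the loop can end up to a second early.

```shell
#!/bin/bash
# run_for DURATION CMD [ARGS...]
# Hypothetical helper: repeat CMD until about DURATION seconds have elapsed.
run_for() {
    local duration=$1
    shift
    local deadline=$(( SECONDS + duration ))
    while (( SECONDS < deadline )); do
        "$@"
    done
}

# Placeholder workload; replace with the real commands.
tick() { printf '.'; sleep 0.2; }

for test_duration in 1 2; do
    echo "Test performing with $test_duration duration"
    run_for "$test_duration" tick
    echo ""
done
```

The durations 1 and 2 are shortened so the demo finishes quickly; the values 5 10 15 20 from the question would go in the same place.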

Do calculation in the Makefile

I am confused by Makefiles. I am trying to run a simple command in a Makefile, but it gives me the error "/bin/bash: line 3: :=: command not found". I am using the shell to run this Makefile.
This is part of my Makefile:
all:
	vlog Benchmarks/$(NAME)/Syn/*.v
	$(eval tux_number := 1)
	$(eval range := 1)
	$(eval ssh_log := 255)
	echo "Start Range: ${range}"
	echo "tux-number: ${tux_number}"
	while [[ $$range -le 50 ]] ; do \
		ssh -l yazdanbakhsh tux-$(tux_number).cae.wisc.edu exit ; \
		echo "range: ${range}" ; \
		eval $$range := $$((${range}+1)) ; \
	done
Thanks

all:
	@range=1; \
    while [ $$range -le 10 ] ; \
    do echo Range: $$range; \
    let range=range+1 ; \
    done;
Note that the whitespace in front of @range... is the only TAB.

Just to fix your obvious problems with Makefile syntax, here is an attempt at refactoring your attempt into valid code.
tux_number := 1
ssh_log := 255 # not used anywhere
all:
	vlog Benchmarks/$(NAME)/Syn/*.v
	echo "Start Range: 1" # This is probably no longer very useful output
	echo "tux-number: ${tux_number}"
	range=1; while [ $$range -le 50 ] ; do \
		ssh -l yazdanbakhsh tux-$(tux_number).cae.wisc.edu exit ; \
		echo "range: $$range" ; \
		range=$$(expr "$$range" + 1); \
	done
Notice how tux_number and ssh_log are Makefile variables, while range only exists in the shell which executes the while loop. I have avoided the Bashisms in order to make this portable. (If portability is not important, you might want to refactor it back to Bash syntax and use for ((range=1; range<=50; range++)); do... instead.)
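For comparison, here are the two loop styles mentioned above as plain shell, outside of Make. This is only a sketch: the function names are made up, and echo stands in for the ssh call. Inside a recipe every $ has to be doubled ($$range) so that Make passes it through to the shell; in a standalone script a single $ is used.

```shell
#!/bin/bash
# Portable (POSIX sh) counter using expr.
count_posix() {
    range=1
    while [ "$range" -le "$1" ]; do
        echo "range: $range"
        range=$(expr "$range" + 1)
    done
}

# Bash-only equivalent with a C-style for loop.
count_bash() {
    local range
    for (( range = 1; range <= $1; range++ )); do
        echo "range: $range"
    done
}

count_posix 3   # prints range: 1, range: 2, range: 3
count_bash 3    # prints the same three lines
```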
Your use of eval is misguided. As you can see, I simply lifted out the Makefile variables outside the recipe where they don't belong. What you were doing was (1) have Make evaluate the expression range := 1 (which evaluates to itself) and (2) use the output as a shell command in a recipe. Since it's not a valid shell command, you got the syntax error from Bash. Without further ado, I'll just take the easy way out here and say that eval is a complex subject, and until you get more experience with Make, it's probably just best to forget that it exists.
In order to properly make use of Make's facilities, I would make this parallelizable, i.e. split it up into 50 individual targets. This is a bit clumsy (there's probably a better way to define range here), but at least it should illustrate a number of differences to your approach. (If you don't insist on having range count up from 1, making it zero-based would make this a little less clumsy. This exploits the fact that the empty string is harmless in a shell snippet, so we can use it instead of a zero prefix. Again, this could be simplified if you don't care about the human readability of the range index.)
digits := 0 1 2 3 4 5 6 7 8 9
deca := "" 1 2 3 4
range := $(filter-out ""0,$(foreach d,$(deca),$(foreach i,$(digits),$d$i))) 50
# Or, at the expense of an external process,
# range := $(shell perl -le 'print $$_ for 1..50')
.PHONY: all
all: $(patsubst %,ssh-%,$(range))
.PHONY: ssh-%
ssh-%:
	ssh -l yazdanbakhsh tux-$(tux_number).cae.wisc.edu exit
	echo "range: $*"
This can be run with something like make -j 5 to execute these in parallel batches of five, for example.
Incidentally, the commented-out $(shell ...) call might be the actual answer to your question, if what you really wanted to do was to use Make to drive an external program to calculate something for you.

Command arguments being interpreted as command file name

I'm trying to create a bash script that builds up a command to execute (that includes arguments). The name of the command executable is ms (which lives in the ms directory) and it takes a bunch of parameters that I compute and store in a string. When it comes time to execute the command, I try:
GENETREES=$(../ms/${x:1})
but am getting the error message:
./simulate.sh: line 21: ../ms/ms 6 1 -T -I 6 1 1 1 1 1 1 -ej 0.059851352500000010170566611122922040522098541259765625 1 4 -es 0.059851352500000010170566611122922040522098541259765625 4 0.3457841801761454281205487859551794826984405517578125 -ej 0.059851352500000010170566611122922040522098541259765625 2 3 -es 0.059851352500000010170566611122922040522098541259765625 3 0.54870128110803395582451003065216355025768280029296875 -ej 0.0897770262499999471828004971030168235301971435546875 7 9 -es 0.089777026250000002693951728360843844711780548095703125 3 0.8097582153199012200417428175569511950016021728515625 -ej 0.119702699999999995217336845598765648901462554931640625 3 4 -ej 0.125827642499999947656164067666395567357540130615234375 9 10 -es 0.1258276425000000031673152989242225885391235351562500 4 0.28069295861466903030390085405088029801845550537109375 -ej 0.13195258499999995560614252099185250699520111083984375 8 10 -ej 0.1817980399999999663318561715641408227384090423583984375 6 5 -ej 0.2525933399999999717788767839010688476264476776123046875 10 4 -ej 0.41145434999999996872332985731191001832485198974609375 4 5 : File name too long
I think bash thinks that I intended all those command parameters to be a part of the executable name. But this is not my intent.
What am I doing wrong?
Thanks.
Update - More info requested on how x is constructed
MSOUT=$(java -jar ./NetworkSearchGen.jar ms $3 $TRUENETWORK $4)
OIFS=$IFS
IFS='}'
first="true"
for x in $MSOUT
do
    if [ $first = "true" ]; then
        echo "$x"
    else
        GENETREES=$(../ms/${x:1})
    fi
    first="false"
done

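Setting IFS was what broke things. With IFS='}' still in effect, the unquoted expansion ${x:1} is no longer split on spaces, so the whole string is treated as the name of the executable. Setting IFS=" " just before executing the command got the script working again.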
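The effect of IFS on an unquoted expansion can be seen with a small sketch. The helper count_words and the shortened command string are made up for the demo; nothing here runs ms itself.

```shell
#!/bin/bash
# Count how many words an unquoted expansion produces.
count_words() { echo "$#"; }

cmd="ms 6 1 -T"     # shortened, hypothetical command string

OIFS=$IFS
IFS='}'
count_words $cmd    # prints 1: no splitting on spaces, the string stays one word
IFS=$OIFS
count_words $cmd    # prints 4: the default IFS splits on whitespace again
```

Restoring the saved value with IFS=$OIFS, as the question's script already prepares for, has the same effect for this purpose as setting IFS=" ".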
