How to wait in bash for completion of forked processes that run in the background

I wonder if I could achieve something like the following logic:
Given a set of jobs to be done, fold_num, and a limit on the number of worker processes, say worker_num, I hope to run worker_num processes in parallel until all fold_num jobs are done. Finally, there is some further processing on the results of all these jobs. We can assume fold_num is always several times worker_num.
I haven't got the following snippet working so far, even with tips from "How to wait in bash for several subprocesses to finish and return exit code !=0 when any subprocess ends with code !=0?".
#!/bin/bash
worker_num=5
fold_num=10
pids=""
result=0
for fold in $(seq 0 $(( $fold_num-1 ))); do
    pids_idx=$(( $fold % ${worker_num} ))
    echo "pids_idx=${pids_idx}, pids[${pids_idx}]=${pids[${pids_idx}]}"
    wait ${pids[$pids_idx]} || let "result=1"
    if [ "$result" == "1" ]; then
        echo "some job is abnormal, aborting"
        exit
    fi
    cmd="echo fold$fold" # use echo as an example, real command can be time-consuming to run
    $cmd &
    pids[${pids_idx}]="$!"
    echo "pids=${pids[*]}"
done
# when the for-loop completes, do something else...
The output looks like:
pids_idx=0, pids[0]=
pids=5846
pids_idx=1, pids[1]=
fold0
pids=5846 5847
fold1
pids_idx=2, pids[2]=
pids=5846 5847 5848
fold2
pids_idx=3, pids[3]=
pids=5846 5847 5848 5849
fold3
pids_idx=4, pids[4]=
pids=5846 5847 5848 5849 5850
pids_idx=0, pids[0]=5846
fold4
./test_wait.sh: line 12: wait: pid 5846 is not a child of this shell
some job is abnormal, aborting
Question:
1. It seems the pids array has recorded the correct process IDs, but they fail to be wait-ed for. Any ideas how to fix this?
2. Do we need to use wait after the for-loop? If so, what should be done after the for-loop?
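For reference, here is a minimal sketch (not from the original post) of one common way to get this pattern, assuming bash 4.3 or newer for wait -n; it keeps at most worker_num jobs running, and it answers question 2 with a plain wait after the loop:
#!/bin/bash
worker_num=5
fold_num=10
running=0
for fold in $(seq 0 $(( fold_num - 1 ))); do
    if (( running >= worker_num )); then
        # wait -n (bash 4.3+) returns when any one background job finishes,
        # with that job's exit status
        wait -n || { echo "some job is abnormal, aborting"; exit 1; }
        (( running-- ))
    fi
    echo "fold$fold" &   # placeholder for the real, time-consuming command
    (( running++ ))
done
wait    # question 2: a final plain wait blocks until the remaining jobs finish
# when everything has completed, do something else...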

Alright, I guess I got a working solution with tips from folks on GNU parallel.
export worker_names=("foo" "bar")
export worker_num=${#worker_names[@]}
function some_computation {
    fold=$1
    cmd="..." # involves worker_names and fold
    echo $cmd; $cmd
}
export -f some_computation # important, to make this function visible to subprocesses
for fold in $(seq 0 $(( $fold_num-1 ))); do
    sem -j $worker_num some_computation $fold
done
sem --wait # wait for all jobs to complete
# do something below
A couple of things here:
I didn't get parallel working because of the post-computation processing I need to do after those parallel jobs; the parallel version I tried failed to wait for job completion. So I used GNU sem, which stands for semaphore.
Exporting the variables is crucial for the computation function to access them in this situation; otherwise those global variables are invisible to it.
Exporting the computation function is also necessary, for the same reason. Notice the -f option.
sem --wait perfectly fulfills the need to wait for the parallel jobs.
HTH.

Related

Bash script - check how many times public IP changes

I am trying to create my first bash script. The goal of this script is to check at what rate my public IP changes. It is a fairly straight forward script. First it checks if the new address is different from the old one. If so then it should update the old one to the new one and print out the date along with the new IP address.
At this point I have created a simple script in order to accomplish this. But I have two main problems.
First, the script keeps printing out the IP even though it hasn't changed, and even though I have updated PREV_IP with CUR_IP.
My second problem is that I want the output to go to a file instead of the terminal.
The interval is currently set to 1 second for test purposes. This will change to a higher interval in the final product.
#!/bin/bash
while true
PREV_IP=00
do
    CUR_IP=$(curl https://ipinfo.io/ip)
    if [ $PREV_IP != "$CUR_IP" ]; then
        PREV_IP=$CUR_IP
        "$(date)"
        echo "$CUR_IP"
        sleep 1
    fi
done
I also get a really weird output. I have edited my public IP to xx.xxx.xxx.xxx:
Sat 20 Mar 09:45:29 CET 2021
xx.xxx.xxx.xxx
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:--
while true
PREV_IP=00
do
is the reason you are seeing the IP each loop. It's the same as while true; PREV_IP=00; do. The exit status of true; PREV_IP=00 is the exit status of the last command, and the exit status of an assignment is 0 (success), so the loop will always execute. But PREV_IP will be reset to 00 on each iteration... This is a typo: you meant to set the variable once, before the loop starts.
"$(date)"
will try to execute the output of the date command as the next command. So it will print:
$ "$(date)"
bash: sob, 20 mar 2021, 10:57:02 CET: command not found
And finally, to silence curl, read man curl first and then find out about -s. I use -sS so errors are also visible.
Do not use uppercase variables in your scripts; prefer lowercase ones. Check your scripts with http://shellcheck.net. Quote variable expansions.
I would sleep each loop. Your script could look like this:
#!/bin/bash
prev=""
while true; do
    cur=$(curl -sS https://ipinfo.io/ip)
    if [ "$prev" != "$cur" ]; then
        prev="$cur"
        echo "$(date) $cur"
    fi
    sleep 1
done
that I want the output to go to a file instead of the terminal.
Then research how redirection works in shell and how to use it. The simplest would be to redirect echo output.
echo "$(date) $cur" >> "a_file.txt"
The interval is currently set to 1 second for test purposes. This will change to a higher interval in the final product.
You are still limited by the time it takes to connect to https://ipinfo.io/ip. And from the ipinfo.io documentation:
Free usage of our API is limited to 50,000 API requests per month.
And finally, I once wrote a script, get_ip_external, that tries many public services for getting the external IP address. You may use multiple public services to get the IPv4 address and pick one at random or round-robin, so that rate limiting doesn't kick in as fast, as sketched below.
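A rough sketch of that round-robin idea (the extra service URLs below are only examples, not an endorsement, and not part of the original script):
# example services; ipinfo.io is from above, the others are common plain-text IP endpoints
services=("https://ipinfo.io/ip" "https://api.ipify.org" "https://ifconfig.me")
n=0
get_ip() {
    local url=${services[n]}
    n=$(( (n + 1) % ${#services[@]} ))
    curl -sS "$url"
}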

Can we write logic in Terraform code (IaC), something like using count.index[ ]?

While I was working with Terraform I had a question. I am able to destroy specific resources using terraform destroy --target [] --target [] or terraform state rm; this is okay if we have 50 servers, but what if I have 1000 servers and would like to terminate, say, only the odd-numbered or even-numbered instances using their index numbers in the list? Or could we write a reusable script that gathers all the corrupted instances and terminates them?
Is there any way to do this? I have searched all over the internet but couldn't find a solution; maybe this question is dumb, but I was just curious.
Is there any documentation that explains whether this is possible through Terraform?
You could expose the count as an output:
output "server_count" {
value = var.server_count
}
and write a script (shell/Python/etc) that takes that count as an argument and uses it to taint every odd resource:
#!/bin/bash
# usage: taint_odd_servers.sh <num servers>
SERVER_COUNT=$1
i=0
while [ $i -lt $SERVER_COUNT ]
do
    REMAINDER=$(( $i % 2 ))
    if [ $REMAINDER -ne 0 ]
    then
        terraform taint "your_server_resource[${i}]"
    fi
    i=$(($i+1))
done
You could then call that script like:
taint_odd_servers.sh $(terraform output server_count)
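Note that on Terraform versions that quote string outputs (0.14 and later), you may need the -raw flag to get the bare number, e.g.:
taint_odd_servers.sh $(terraform output -raw server_count)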

SLURM batch array loop?

I'm somewhat bash-challenged and trying to send a large job array through SLURM on my institution's cluster. I am way over my limit (which appears to be 1000 jobs per job array) and am having to iteratively parcel the list out into blocks of 1000, which is tedious:
sbatch --array=17001-18000 -p <server-name> --time=12:00:00 <my-bash-script>
How might I write a loop to do this? Each job takes about 11 minutes, so I would need to build in a pause in the loop. Otherwise, I suspect SLURM will reject the new batch job. Anyone out there know what to do? Thanks in advance!
Something like this should do what you want
START=1
END=10000
STEP=1000
SLEEP=700 # just over 11 minutes (in seconds)
for i in $(seq $START $STEP $END) ; do
    JSTART=$i
    JEND=$(( JSTART + STEP - 1 ))
    echo "Submitting with ${JSTART} and ${JEND}"
    sbatch --array=${JSTART}-${JEND} -p <server-name> --time=12:00:00 <my-bash-script>
    sleep $SLEEP
done
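If you'd rather not guess at the pause length, a hedged alternative, assuming your SLURM supports --parsable and job dependencies and that the limit applies to running rather than queued jobs, is to chain each chunk on the previous one so a new array only becomes eligible once the previous one has finished:
prev_jobid=""
for i in $(seq $START $STEP $END) ; do
    JSTART=$i
    JEND=$(( JSTART + STEP - 1 ))
    if [ -z "$prev_jobid" ]; then
        prev_jobid=$(sbatch --parsable --array=${JSTART}-${JEND} -p <server-name> --time=12:00:00 <my-bash-script>)
    else
        prev_jobid=$(sbatch --parsable --dependency=afterany:${prev_jobid} --array=${JSTART}-${JEND} -p <server-name> --time=12:00:00 <my-bash-script>)
    fi
done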

How to break shell script if a script it calls produces an error

I'm currently debugging a shell script, which acts as a master-script in a data pipeline. In order to run the pipeline, you feed a bunch of arguments into the shell script. From there, the shell script sequentially calls 6 different scripts [4 in R, 2 in Python], writes out stuff to log files, and so on. Basically, my idea is to use this script to automate a data pipeline that takes a long time to run.
Right now, if any of the individual R or Python scripts break within the shell script, it just jumps to the next script that it's supposed to call. However, running script 03.py requires the data input to scripts 01.R and 02.R to be fully run and processed, otherwise 03 will produce erroneous output data which will then be written out and further processed in later scripts.
What I want to do is,
1. Break the overall shell script if there's an error in any of the R scripts
2. Output a message telling me where this error happened [line of individual R / python script]
Here's a sample of the master.sh shell script which calls the individual scripts.
#############
# STEP 2 : RUNNING SCRIPTS
#############
# A - 01.R
#################################################################
# log_file - this needs to be reassigned for every individual script
log_file=01.log
current_time=$(date)
echo "Current time: $current_time"
echo "Now running script 01. Log file output being written to $log_file_dir$log_file."
Rscript 01.R -f $input_file -s $sql_db > $log_file_dir$log_file
# current time/date
current_time=$(date)
echo "Current time: $current_time"
# B - 02.R
#################################################################
log_file=02.log
current_time=$(date)
echo "Current time: $current_time"
echo "Now running script 02. Log file output being written to $log_file_dir$log_file"
Rscript 02.R -f $input_file -s $sql_db > $log_file_dir$log_file
# PRINT OUT TIMINGS
current_time=$(date)
echo "Current time: $current_time"
This sequence is repeated throughout the master.sh script until script 06.R, after which it collates some data retrieved from output files and log files, and prints them to stout.
Here's some sample output that gets printed by my current master.sh, which shows how the script just keeps moving even though 01.R has produced an error.
file: test-data/minisample.txt
There are a total of 101 elements in file.
Using the main database.
Writing log-files to this directory: log_files/minisample/.
Writing output-csv with classifications to output/minisample.csv.
Current time: Wed Nov 14 18:19:53 UTC 2018
Now running script 01. Log file output being written to log_files/minisample/01.log.
Loading required package: stringi
Loading required package: dplyr
Attaching package: ‘dplyr’
The following objects are masked from ‘package:stats’:
filter, lag
The following objects are masked from ‘package:base’:
intersect, setdiff, setequal, union
Loading required package: RMySQL
Loading required package: DBI
Loading required package: methods
Loading required package: hms
Error: The following 2 arguments need to be provided:
-f <input file>.csv
-s <MySQL db name>
Execution halted
Current time: Wed Nov 14 18:19:54 UTC 2018
./master.sh: line 95: -1: substring expression < 0
./master.sh: line 100: -1: substring expression < 0
./master.sh: line 104: -1: substring expression < 0
Total time taken to run script 01.R:
Average time taken per user to run script 01.R:
Total time taken to run pipeline so far [01/06]:
Average time taken per user to run pipeline so far [01/06]:
Current time: Wed Nov 14 18:19:54 UTC 2018
Now running script 02. Log file output being written to log_files/minisample/02.log
Seeing as the R script 01.R produces an error, I want the script master.sh to stop. But how?
Any help would be greatly appreciated, thanks in advance!
As another user mentioned, simply running set -e will make your script terminate on first error. However, if you want more control, you can also check the exit status with ${?}, or simply $?, assuming your program gives an exit code of 0 on success and non-zero otherwise.
#!/bin/bash
url=https://nosuchaddress1234.com/nosuchpage.html
error_file=errorFile.txt
wget ${url} 2> ${error_file}
exit_status=${?}
if [ ${exit_status} -ne 0 ]; then
    echo -n "wget ${url} "
    if [ ${exit_status} -eq 4 ]; then
        echo "- Network failure."
    elif [ ${exit_status} -eq 8 ]; then
        echo "- Server issued an error response."
    else
        echo "- Other error"
    fi
    echo "See ${error_file} for more details"
    exit ${exit_status}
fi
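Applied to the master.sh from the question, a minimal sketch of that explicit check (using the variable names shown above) might look like:
Rscript 01.R -f $input_file -s $sql_db > $log_file_dir$log_file
exit_status=$?
if [ ${exit_status} -ne 0 ]; then
    echo "Script 01.R failed with exit status ${exit_status}, aborting pipeline. See $log_file_dir$log_file"
    exit ${exit_status}
fi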
I like to put some boilerplate at the top of most scripts like this -
trap 'echo >&2 "ERROR in $0 at line $LINENO, Aborting"; exit $LINENO;' ERR
set -u
While coding and debugging, I usually add
set -x
And a lot of trace "comments" with colons -
: this will parse its args but only show under set -x
Then the trick is to make sure any errors you know are ok are handled.
Conditionals consume the errors, so those are safe.
if grep foo nonexistantfile
then : do the success stuff
else : if you *want* a failout here, just call false
    false here will abort # args don't matter :)
fi
By the same token, if you just want to catch and ignore a known possible error -
ls $mightNotExist ||: # || says "do on fail"; : is an alias for "true"
Just always check your likely errors. Then the only thing that will crash your script is a genuine failure.

Batch files processing in bash with full processor occupancy

Maybe really simple question, but I don't know where to dig.
I have a list of files (random names), and I want to process them using some command
processing_command $i ${i%.*}.txt
I want to speed this up by using all processors. How can I make the script occupy 10 processors simultaneously (by processing 10 files at a time)? processing_command is not parallel by default. Thank you!
the trivial approach would be to use:
for i in $items
do
    processing_command $i ${i%.*}.txt &
done
which will start a new (parallel) instance of processing_command for each $i (the trick is the trailing &, which backgrounds the process).
The drawback is that if you have e.g. 1000 items, this will start 1000 parallel processes, which (while occupying all 10 cores) will be busy context-switching rather than doing the actual processing.
If you have as many (or fewer) items as cores, then this is a good and simple solution.
Usually you don't want to start more processes than you have cores.
A simplistic approach (assuming that all items take about the same time to process) is to split the original "items" list into number_of_cores equally long lists. The following is a slightly modified version of an example taken from an article in the German Linux-Magazin:
#!/bin/bash
## number of processors
PMAX=$(ls -1d /sys/devices/system/cpu/cpu[0-9]* | wc -l)

## call processing_command on each argument:
doSequential() {
    local i
    for i in "$@"; do
        processing_command $i ${i%.*}.txt
    done
}

## run PMAX parallel processes
doParallel() {
    # split the arguments into PMAX equally sized lists
    local items item currentProcess=0
    for item in "$@"; do
        items[$currentProcess]="${items[$currentProcess]} $item"
        let currentProcess=$(( (currentProcess+1)%PMAX ))
    done
    # run PMAX processes, each with the shorter list of items
    currentProcess=0
    while [ $currentProcess -lt $PMAX ]; do
        [ -n "${items[$currentProcess]}" ] &&
            eval doSequential ${items[$currentProcess]} &
        currentProcess=$((currentProcess+1))
    done
    wait
}

# $ITEMS holds the whitespace-separated list of files to process
doParallel $ITEMS
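For completeness, a shorter hedged alternative is to let xargs manage the worker pool (assuming the filenames contain no whitespace or newlines):
# -P 10 keeps up to 10 processing_command instances running at once;
# each invocation receives one filename as $1
printf '%s\n' $items | xargs -P 10 -I{} sh -c 'processing_command "$1" "${1%.*}.txt"' _ {}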
