I am wondering how to wait for any process to finish in macOS, since wait -n doesn't work. I have a script doing several things, and at some point it will enter a loop calling another script in the background to exploit some parallelism, but never more than X at a time, since more wouldn't be efficient. Thus, I need to wait for any child process to finish before creating new processes.
I have seen this question but it doesn't answer the "any" part; it just says how to wait for a specific process to finish.
I've thought of storing all the PIDs and actively checking with ps whether they're still running, but that's slapdash and resource-consuming. I also thought about upgrading bash to a newer version (if that's even possible on macOS without breaking how bash already works), but I would be very disappointed if there were no other way to actually wait for any process to finish; it's such a basic feature... Any ideas?
A basic version of my code would look like this:
for vid_file in $VID_FILES
do
    my_script.sh $vid_file other_args &
    ((TOTAL_PROCESSES=TOTAL_PROCESSES+1))
    if [ $TOTAL_PROCESSES -ge $MAX_PROCESS ]; then
        wait -n
        ((TOTAL_PROCESSES=TOTAL_PROCESSES-1))
    fi
done
My neither elegant nor performant approach to substituting for wait -n:
NUM_PROCC=$MAX_PROCESS
while [ $NUM_PROCC -ge $MAX_PROCESS ]
do
    sleep 5
    NUM_PROCC=$(ps | grep "my_script.sh" | wc -l | tr -d " \t")
    # grep command will count as one so we need to remove it
    ((NUM_PROCC=NUM_PROCC-1))
done
PS: This question could be closed and merged with the one I mentioned above. I've just created this new one because stackoverflow wouldn't let me comment or ask...
PS2: I do understand that my objective could be achieved by other means. If you don't have an answer for the specific question itself but rather a workaround, please let other people answer the question about "waiting any" since it would be very useful for me/everyone in the future as well. I will of course welcome and be thankful for the workaround too!
Thank you in advance!
It seems like you just want to limit the number of processes that are running at the same time. Here's a rudimentary way to do it with bash <= 4.2:
#!/bin/bash

MAX_PROCESS=2
INPUT_PATH=/somewhere

for vid_file in "$INPUT_PATH"/*
do
    while [[ "$(jobs -pr | wc -l)" -ge "$MAX_PROCESS" ]]; do sleep 1; done
    my_script.sh "$vid_file" other_args &
done
wait
Here's the bash >= 4.3 version:
#!/bin/bash

MAX_PROCESS=2
INPUT_PATH=/somewhere

for vid_file in "$INPUT_PATH"/*
do
    [[ "$(jobs -pr | wc -l)" -ge "$MAX_PROCESS" ]] && wait -n
    my_script.sh "$vid_file" other_args &
done
wait
GNU make has parallelization capabilities and the following Makefile should work even with the very old make 3.81 that comes with macOS. Replace the 4 leading spaces before my_script.sh by a tab and store this in a file named Makefile:
.PHONY: all $(VID_FILES)

all: $(VID_FILES)

$(VID_FILES):
    my_script.sh "$@" other_args
And then to run 8 jobs max in parallel:
$ make -j8 VID_FILES="$VID_FILES"
Make can do even better: avoid redoing things that have already been done:
TARGETS := $(patsubst %,.%.done,$(VID_FILES))

.PHONY: all clean

all: $(TARGETS)

$(TARGETS): .%.done: %
    my_script.sh "$<" other_args
    touch "$@"

clean:
    rm -f $(TARGETS)
With this last version an empty tag file .foo.done is created for each processed video foo. If, later, you re-run make and video foo did not change, it will not be re-processed. Type make clean to delete all tag files. Do not forget to replace the leading spaces by a tab.
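For reference (my addition, assuming VID_FILES is already set in your shell as in the first example), a typical run of this second Makefile could look like this:
$ make -j8 VID_FILES="$VID_FILES"   # process up to 8 videos in parallel, skipping ones already done
$ make clean                        # remove the .*.done tag files to force reprocessing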
Suggestion 1: Completion indicator file
Suggesting to have my_script.sh create a task-completion indicator file when it finishes.
Like this, as the last line of my_script.sh:
touch "$0.$(date +%F_%T).done"
And in your deployment script, test whether the completion indicator file exists.
rm -f my_script.sh.*.done
my_script.sh "$vid_file" other_args &
while ! compgen -G "my_script.sh.*.done" > /dev/null; do
    sleep 5
done
Don't forget to clean up the completion indicator files.
Advantages for this approach:
Simple
Supported in all shells
Retain a history/audit trail on completion
Disadvantages for this approach:
Requires modification to original script my_script.sh
Requires cleanup.
Uses a polling loop
Suggestion 2: Using wait command with pgrep command
Suggesting to learn more about the wait command (see help wait in bash).
Suggesting to learn more about the pgrep command (see man pgrep).
my_script.sh "$vid_file" other_args &
wait $(pgrep -f "my_script.sh $vid_file")
Advantages for this approach:
Simple
Readable
Disadvantages for this approach:
Can be confused if multiple users run the same command at the same time
The wait builtin's exact behavior varies between shells; check what your shell supports.
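As a side note (my addition, not part of the original suggestion): if you only need to wait for the child you just started, capturing its PID with $! avoids the pgrep lookup entirely:
my_script.sh "$vid_file" other_args &
child_pid=$!        # PID of the background job started on the previous line
wait "$child_pid"   # returns once that specific child has exited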
With GNU Parallel it would look something like:
parallel my_script.sh {} other_args ::: $VID_FILES
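If you want to cap the number of simultaneous jobs instead of letting parallel default to one job per CPU core, you can pass -j; for example (my addition, reusing $MAX_PROCESS from the question):
parallel -j "$MAX_PROCESS" my_script.sh {} other_args ::: $VID_FILES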
I have a bash script that looks like below.
$TOOL is another script which runs twice with different inputs (VAR1 and VAR2).
#Iteration 1
${TOOL} -ip1 ${VAR1} -ip2 ${FINAL_PML}/$1$2.txt -p ${IP} -output_format ${MODE} -o ${FINAL_MODE_DIR1}
rename mods mode_c_ ${FINAL_MODE_DIR1}/*.xml
#Iteration 2
${TOOL} -ip1 ${VAR2} -ip2 ${FINAL_PML}/$1$2.txt -p ${IP} -output_format ${MODE} -o ${FINAL_MODE_DIR2}
rename mods mode_c_ ${FINAL_MODE_DIR2}/*.xml
Can I make these 2 iterations in parallel inside a bash script without submitting it in a queue?
If I read this right, what you want is to run them in the background.
c.f. https://linuxize.com/post/how-to-run-linux-commands-in-background/
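A minimal sketch of that idea using the variables from the question (my illustration, assuming the two iterations are independent of each other):
# Start both tool invocations in the background, then wait for both.
${TOOL} -ip1 ${VAR1} -ip2 ${FINAL_PML}/$1$2.txt -p ${IP} -output_format ${MODE} -o ${FINAL_MODE_DIR1} &
pid1=$!
${TOOL} -ip1 ${VAR2} -ip2 ${FINAL_PML}/$1$2.txt -p ${IP} -output_format ${MODE} -o ${FINAL_MODE_DIR2} &
pid2=$!
wait $pid1 $pid2    # or simply: wait

# The renames depend on each run's output, so do them after the waits.
rename mods mode_c_ ${FINAL_MODE_DIR1}/*.xml
rename mods mode_c_ ${FINAL_MODE_DIR2}/*.xml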
More importantly, if you are going to be writing scripts, PLEASE read the following closely:
https://www.gnu.org/software/bash/manual/html_node/index.html#SEC_Contents
https://mywiki.wooledge.org/BashFAQ/001
Noob question, I'm sure... but this noob needs help!
I have a script that downloads a ham radio file, does some stuff with it, then sends me an email with its final work. The problem is, sometimes it comes up short. I'm not sure where it is dropping some data, and quite frankly I'm not sure I care (though I probably should...). My solution was to run the script, check the size of the output file, and if it's over 96k, email it. If not, re-run the script.
It fails on the 'until' test even if the file is above the correct size.
While I'm sure this could be done in other languages, Bash is what I'm currently familiar enough with to try and make this work better, so a Bash solution is what I'm looking for. I'm also fine with any streamlining that could be done, though running this currently isn't intensive by any means!
Here's what I have..
dt=`date '+%D %T.%6N'`
#
wget -O ~/file1 "https://www.radioid.net/static/users.csv"
egrep 'Washington,United States|Oregon,United States|Idaho,United States|Montana,United States|British Columbia' ~/file1 > ~/PNW1
awk 'BEGIN {FS=OFS=","} {sub(/ .*/, "", $3)} {gsub("Washington", "WA",$5}{gsub("Idaho", "ID",$5)} {gsub("Montana", "MT",$5)} {gsub("Oregon", "OR",$5)} {gsub("Brit$
sed "s/'/ /g" ~/PNW_Contact.txt > ~/PNW_Contacts.txt
rm ~/PNW_Contact.txt
rm ~/file1
rm ~/PNW1
sudo cp ~/PNW_Contacts.txt /var/www/html/PNW_Contacts.txt
until [[ $(find /home/kc7aad/PNW_Contacts.txt -type f -size +96000c 2>/dev/null) ]]; do
    echo "$dt" - Failed >> ~/ids.log
    sleep 10
done
echo "$dt" - Success >> ~/ids.log
mail -s "PNW DMR Contacts Update" kc7aad@gmail.com -A ~/PNW_Contacts.txt < /home/kc7aad/PNW_Message.txt
If I run this script manually, it does succeed. If I let Cron try to complete this, it fails.
I think that's all the detail that is needed. Please let me know if there are any questions!
Thanks.
Coming from Powershell and trying to figure out how to do an equivalent sort of operation in bash.
Can someone please clarify for me how to do this? I've been researching for a bit with no luck
While **ls /etc/test | grep "test.conf" -q** -ne 0 ; do
echo test
sleep 2
done
I am trying to make the bolded section of the statement run first so that the resulting true/false can be evaluated.
What you really need is
until [ -f "/etc/test/test.conf" ] ; do
    echo test
    sleep 2
done
The direct solution to your problem is that the (negated) exit status of grep is the only thing the while loop needs:
while ! ls /etc/test | grep -q "test.conf"; do
There is also the until loop which handles the negation for you:
until ls /etc/test | grep -q "test.conf"; do
However, this is fragile (ls breaks if a file name contains a newline) and expensive (multiple extra processes are created). Instead, let the shell check directly whether the desired file exists yet:
until [[ -f "/etc/test/test.conf" ]]; do
(and we arrive at @anishane's original answer).
An even better approach, though, is to ask the operating system to tell us when the file is created, without having to ask every two seconds. How you do this is necessarily OS-specific; I'll use Linux as an example. For this, you would need to install the inotify-tools package first.
The simplest thing is to just use a call to inotifywait, watching the directory (inotifywait cannot watch a file that does not exist yet):
inotifywait -e create /etc/test
But if the file already exists, you'll wait forever for it to be created. You might think, "I'll check if it exists first, then wait if necessary", with
[[ -f /etc/test/test.conf ]] || inotifywait -e create /etc/test
but now we have a race condition: it's possible that the file is created after you look for it, but before you start waiting. So the full solution is to first wait for it to be created in the background, then kill the background job if the file does exist, then wait for the background to complete if it is still running. It sounds a little convoluted, but it's efficient and it works.
inotifywait -e create /etc/test & WAIT_PID=$!
[[ -f /etc/test/test.conf ]] && kill $WAIT_PID
wait $WAIT_PID
I'm working on a script that has gotten so complex I want to include an easy option to update it to the most recent version. This is my approach:
set -o errexit
SELF=$(basename $0)
UPDATE_BASE=http://something
runSelfUpdate() {
    echo "Performing self-update..."

    # Download new version
    wget --quiet --output-document=$0.tmp $UPDATE_BASE/$SELF

    # Copy over modes from old version
    OCTAL_MODE=$(stat -c '%a' $0)
    chmod $OCTAL_MODE $0.tmp

    # Overwrite old file with new
    mv $0.tmp $0
    exit 0
}
The script seems to work as intended, but I'm wondering if there might be caveats with this kind of approach. I just have a hard time believing that a script can overwrite itself without any repercussions.
To be more clear, I'm wondering, if, maybe, bash would read and execute the script line-by-line and after the mv, the exit 0 could be something else from the new script. I think I remember Windows behaving like that with .bat files.
Update: My original snippet did not include set -o errexit. To my understanding, that should keep me safe from issues caused by wget.
Also, in this case, UPDATE_BASE points to a location under version control (to ease concerns).
Result: Based on the input from these answers, I constructed this revised approach:
runSelfUpdate() {
    echo "Performing self-update..."

    # Download new version
    echo -n "Downloading latest version..."
    if ! wget --quiet --output-document="$0.tmp" $UPDATE_BASE/$SELF ; then
        echo "Failed: Error while trying to wget new version!"
        echo "File requested: $UPDATE_BASE/$SELF"
        exit 1
    fi
    echo "Done."

    # Copy over modes from old version
    OCTAL_MODE=$(stat -c '%a' $SELF)
    if ! chmod $OCTAL_MODE "$0.tmp" ; then
        echo "Failed: Error while trying to set mode on $0.tmp."
        exit 1
    fi

    # Spawn update script
    cat > updateScript.sh << EOF
#!/bin/bash
# Overwrite old file with new
if mv "$0.tmp" "$0"; then
    echo "Done. Update complete."
    rm \$0
else
    echo "Failed!"
fi
EOF

    echo -n "Inserting update process..."
    exec /bin/bash updateScript.sh
}
(At least it doesn't try to continue running after updating itself!)
The thing that makes me nervous about your approach is that you're overwriting the current script (mv $0.tmp $0) as it's running. There are a number of reasons why this will probably work, but I wouldn't bet large amounts that it's guaranteed to work in all circumstances. I don't know of anything in POSIX or any other standard that specifies how the shell processes a file that it's executing as a script.
Here's what's probably going to happen:
You execute the script. The kernel sees the #!/bin/sh line (you didn't show it, but I presume it's there) and invokes /bin/sh with the name of your script as an argument. The shell then uses fopen(), or perhaps open() to open your script, reads from it, and starts interpreting its contents as shell commands.
For a sufficiently small script, the shell probably just reads the whole thing into memory, either explicitly or as part of the buffering done by normal file I/O. For a larger script, it might read it in chunks as it's executing. But either way, it probably only opens the file once, and keeps it open as long as it's executing.
If you remove or rename a file, the actual file is not necessarily immediately erased from disk. If there's another hard link to it, or if some process has it open, the file continues to exist, even though it may no longer be possible for another process to open it under the same name, or at all. The file is not physically deleted until the last link (directory entry) that refers to it has been removed, and no processes have it open. (Even then, its contents won't immediately be erased, but that's going beyond what's relevant here.)
And furthermore, the mv command that clobbers the script file is immediately followed by exit 0.
BUT it's at least conceivable that the shell could close the file and then re-open it by name. I can't think of any good reason for it to do so, but I know of no absolute guarantee that it won't.
And some systems do stricter file locking than most Unix systems do. On Windows, for example, I suspect that the mv command would fail because a process (the shell) has the file open. Your script might fail on Cygwin. (I haven't tried it.)
So what makes me nervous is not so much the small possibility that it could fail, but the long and tenuous line of reasoning that seems to demonstrate that it will probably succeed, and the very real possibility that there's something else I haven't thought of.
My suggestion: write a second script whose one and only job is to update the first. Put the runSelfUpdate() function, or equivalent code, into that script. In your original script, use exec to invoke the update script, so that the original script is no longer running when you update it. If you want to avoid the hassle of maintaining, distributing, and installing two separate scripts, you could have the original script create the update script with a unique name in /tmp; that would also solve the problem of updating the update script. (I wouldn't worry about cleaning up the autogenerated update script in /tmp; that would just reopen the same can of worms.)
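A rough sketch of that suggestion (my own illustration, not code from this answer): the running script downloads the new version, writes a tiny updater to a unique temp file, and then replaces itself with that updater via exec, so nothing is still executing the old file when it gets overwritten.
runSelfUpdate() {
    local new_version updater
    new_version=$(mktemp) || exit 1
    wget --quiet --output-document="$new_version" "$UPDATE_BASE/$SELF" || exit 1
    chmod "$(stat -c '%a' "$0")" "$new_version"

    # Write the one-job updater to a unique file in /tmp.
    updater=$(mktemp) || exit 1
    cat > "$updater" << EOF
#!/bin/bash
mv "$new_version" "$0" && echo "Update complete."
rm -- "\$0"    # the updater removes itself when it is done
EOF

    # Replace the current process with the updater; the old script
    # is no longer running when the overwrite happens.
    exec /bin/bash "$updater"
}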
Yes, but ... I would recommend you keep a more layered version of your script's history, unless the remote host can also perform version-control with histories. That being said, to respond directly to the code you have posted, see the following comments ;-)
What happens when wget has a hiccup and quietly leaves you with only a partial or otherwise corrupt copy? Your next step does mv $0.tmp $0, so you've lost your working version. (I hope you have it in version control on the remote!)
You can check to see if wget returns any error messages
if ! wget --quiet --output-document=$0.tmp $UPDATE_BASE/$SELF ; then
    echo "error on wget on $UPDATE_BASE/$SELF"
    exit 1
fi
Also, rule-of-thumb sanity tests will help, e.g.
if (( $(wc -c < $0.tmp) >= $(wc -c < $0) )); then
    mv $0.tmp $0
fi
but they are hardly foolproof.
If your $0 could wind up with spaces in it, it's better to quote all references to it, like "$0".
To be super bullet-proof, consider checking all command returns AND that OCTAL_MODE has a reasonable value:
OCTAL_MODE=$(stat -c '%a' $0)
case ${OCTAL_MODE:--1} in
    -[1] )
        printf "Error : OCTAL_MODE was empty\n"
        exit 1
        ;;
    777|775|755 ) : nothing ;;
    * )
        printf "Error in OCTAL_MODE, found value=${OCTAL_MODE}\n"
        exit 1
        ;;
esac

if ! chmod $OCTAL_MODE $0.tmp ; then
    echo "error on chmod $OCTAL_MODE $0.tmp from $UPDATE_BASE/$SELF, can't continue"
    exit 1
fi
I hope this helps.
Very late answer here, but as I just solved this too, I thought it might help someone to post the approach:
#!/usr/bin/env bash
#
set -fb
readonly THISDIR=$(cd "$(dirname "$0")" ; pwd)
readonly MY_NAME=$(basename "$0")
readonly FILE_TO_FETCH_URL="https://your_url_to_downloadable_file_here"
readonly EXISTING_SHELL_SCRIPT="${THISDIR}/somescript.sh"
readonly EXECUTABLE_SHELL_SCRIPT="${THISDIR}/.somescript.sh"
function get_remote_file() {
    readonly REQUEST_URL=$1
    readonly OUTPUT_FILENAME=$2
    readonly TEMP_FILE="${THISDIR}/tmp.file"
    if [ -n "$(which wget)" ]; then
        wget -O "${TEMP_FILE}" "$REQUEST_URL" > /dev/null 2>&1
        if [[ $? -eq 0 ]]; then
            mv "${TEMP_FILE}" "${OUTPUT_FILENAME}"
            chmod 755 "${OUTPUT_FILENAME}"
        else
            return 1
        fi
    fi
}
function clean_up() {
    # clean up code (if required) that has to execute every time here
    :
}

function self_clean_up() {
    rm -f "${EXECUTABLE_SHELL_SCRIPT}"
}
function update_self_and_invoke() {
    get_remote_file "${FILE_TO_FETCH_URL}" "${EXECUTABLE_SHELL_SCRIPT}"
    if [ $? -ne 0 ]; then
        cp "${EXISTING_SHELL_SCRIPT}" "${EXECUTABLE_SHELL_SCRIPT}"
    fi
    exec "${EXECUTABLE_SHELL_SCRIPT}" "$@"
}
function main() {
    cp "${EXECUTABLE_SHELL_SCRIPT}" "${EXISTING_SHELL_SCRIPT}"
    # your code here
}
if [[ $MY_NAME = \.* ]]; then
    # invoke real main program
    trap "clean_up; self_clean_up" EXIT
    main "$@"
else
    # update myself and invoke updated version
    trap clean_up EXIT
    update_self_and_invoke "$@"
fi
This question already has answers here:
How to limit number of threads/sub-processes used in a function in bash
I have a large set of files for which some heavy processing needs to be done.
This processing is single-threaded, uses a few hundred MiB of RAM (on the machine used to start the job), and takes a few minutes to run.
My current usecase is to start a hadoop job on the input data, but I've had this same problem in other cases before.
In order to fully utilize the available CPU power I want to be able to run several of those tasks in parallel.
However a very simple example shell script like this will trash the system performance due to excessive load and swapping:
find . -type f | while read name ;
do
    some_heavy_processing_command ${name} &
done
So what I want is essentially similar to what "gmake -j4" does.
I know bash supports the "wait" command, but that only waits until all child processes have completed. In the past I've created scripting that does a 'ps' command and then greps the child processes out by name (yes, I know... ugly).
What is the simplest/cleanest/best solution to do what I want?
Edit: Thanks to Frederik: Yes indeed this is a duplicate of How to limit number of threads/sub-processes used in a function in bash
The "xargs --max-procs=4" works like a charm.
(So I voted to close my own question)
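For completeness, a sketch of that xargs approach (my wording; -P is the short form of the --max-procs flag the poster mentions), limiting the run to 4 concurrent processes:
# -print0/-0 keep odd file names intact; -n 1 passes one file per invocation;
# -P 4 caps the number of concurrent some_heavy_processing_command processes.
find . -type f -print0 | xargs -0 -n 1 -P 4 some_heavy_processing_command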
I know I'm late to the party with this answer but I thought I would post an alternative that, IMHO, makes the body of the script cleaner and simpler. (Clearly you can change the values 2 & 5 to be appropriate for your scenario.)
function max2 {
    while [ `jobs | wc -l` -ge 2 ]
    do
        sleep 5
    done
}

find . -type f | while read name ;
do
    max2; some_heavy_processing_command ${name} &
done
wait
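One caveat worth noting (my observation, not the answerer's): because the while loop sits on the right-hand side of a pipe, it runs in a subshell, so the final wait in the parent shell never sees those background jobs. Feeding the loop with process substitution keeps it in the current shell:
while read -r name
do
    max2; some_heavy_processing_command "${name}" &
done < <(find . -type f)
wait    # now actually waits for the background jobs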
#! /usr/bin/env bash
set -o monitor
# means: run background processes in a separate processes...
trap add_next_job CHLD
# execute add_next_job when we receive a child complete signal
todo_array=($(find . -type f)) # places output into an array
index=0
max_jobs=2
function add_next_job {
    # if still jobs to do then add one
    if [[ $index -lt ${#todo_array[*]} ]]
    # apparently stackoverflow doesn't like bash syntax
    # the hash in the if is not a comment - rather it's bash awkward way of getting its length
    then
        echo adding job ${todo_array[$index]}
        do_job ${todo_array[$index]} &
        # replace the line above with the command you want
        index=$(($index+1))
    fi
}

function do_job {
    echo "starting job $1"
    sleep 2
}
# add initial set of jobs
while [[ $index -lt $max_jobs ]]
do
    add_next_job
done
# wait for all jobs to complete
wait
echo "done"
Having said that, Fredrik makes the excellent point that xargs does exactly what you want...
With GNU Parallel it becomes simpler:
find . -type f | parallel some_heavy_processing_command {}
Learn more: https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1
I think I found a more handy solution using make:
#!/usr/bin/make -f

THIS := $(lastword $(MAKEFILE_LIST))
TARGETS := $(shell find . -name '*.sh' -type f)

.PHONY: all $(TARGETS)

all: $(TARGETS)

$(TARGETS):
    some_heavy_processing_command $@

$(THIS): ; # Avoid trying to remake this makefile
Call it e.g. 'test.mak', and add execute rights. If you call ./test.mak it will run some_heavy_processing_command on the files one by one. But if you call it as ./test.mak -j 4, it will run four subprocesses at once. You can also use it in a more sophisticated way: run it as ./test.mak -j 5 -l 1.5, and it will run a maximum of 5 sub-processes while the system load is under 1.5, but it will limit the number of processes if the system load exceeds 1.5.
It is more flexible than xargs, and make is part of the standard distribution, unlike parallel.
This code worked quite well for me, but I noticed one issue where the script couldn't end: if max_jobs is greater than the number of elements in the array, the script will never quit.
To prevent that scenario, I've added the following right after the "max_jobs" declaration.
if [ $max_jobs -gt ${#todo_array[*]} ];
then
    # max_jobs is greater than the number of elements in the array, so cap it at the array length
    max_jobs=${#todo_array[*]}
fi
Another option:
PARALLEL_MAX=...
function start_job() {
    while [ $(ps --no-headers -o pid --ppid=$$ | wc -l) -gt $PARALLEL_MAX ]; do
        sleep .1 # Wait for background tasks to complete.
    done
    "$@" &
}
start_job some_big_command1
start_job some_big_command2
start_job some_big_command3
start_job some_big_command4
...
Here is a very good function I used to control the maximum # of jobs from bash or ksh. NOTE: the - 1 in the pgrep subtracts the wc -l subprocess.
function jobmax
{
    typeset -i MAXJOBS=$1
    sleep .1
    while (( ($(pgrep -P $$ | wc -l) - 1) >= $MAXJOBS ))
    do
        sleep .1
    done
}

nproc=5
for i in {1..100}
do
    sleep 1 &
    jobmax $nproc
done
wait # Wait for the rest