Bash function: start program in background, determine PID and pipe output

I would like to have a bash function that starts a program in the background, determines the PID of that program, and also pipes its output to sed. I know how to do each of these separately, but not how to achieve all of them at once.
What I have so far is this:
# Start a program in the background
#
# Arguments:
# 1 - Variable in which to "write" the PID
# 2 - App to execute
# 3 - Arguments to app
#
function start_program_in_background() {
    RC=$1; shift
    # Start program in background and determine PID
    BIN=$1; shift
    ( $BIN "$@" & echo $! >&3 ) 3>PID | stdbuf -o0 sed -e 's/a/b/' &
    # ALTERNATIVE: $BIN "$@" > >( sed .. ) &
    # Write PID to variable given as argument 1
    PID=$(<PID)
    # when using ALTERNATIVE: PID=$!
    eval "$RC=$PID"
    echo "$BIN ---PID---> $PID"
}
The way I extract the PID is inspired by [1]. There is a second variant in the comments. Both of them show the output of the background processes when executing a script that starts programs using the above function, but there is no output when I pipe
[1] How to get the PID of a process that is piped to another process in Bash?
Any ideas?

Solution:
I came up with this myself, thanks to some of the helpful comments. In order to be able to mark this question as solved, I'm posting the working solution here.
# Start a program in the background
#
# Arguments:
# 1 - Variable in which to "write" the PID
# 2 - App to execute
# 3 - Arguments to app
#
function start_program_in_background() {
    RC=$1; shift
    # Create a temporary file to store the PID
    FPID=$(mktemp)
    # Start program in background and determine PID
    BIN=$1; shift
    APP=$(basename $BIN)
    ( stdbuf -o0 $BIN "$@" 2>&1 & echo $! >&3 ) 3>$FPID | \
        stdbuf -i0 -o0 sed -e "s/^/$APP: /" | \
        stdbuf -i0 -o0 tee /tmp/log_${APP} &
    # Need to sleep a bit to make sure PID is available in file
    sleep 1
    # Write PID to variable given as argument 1
    PID=$(<$FPID)
    eval "$RC=$PID"
    rm $FPID # Remove temporary file holding PID
}
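For illustration, here is a minimal usage sketch of the function above; my_server and its --port flag are made-up placeholders:

# Start a hypothetical server and capture its PID in SERVER_PID
start_program_in_background SERVER_PID ./my_server --port 8080
echo "Server running as PID ${SERVER_PID}"
# ... later, stop it again
kill "${SERVER_PID}"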

Related

pgrep -P $$ gives a nonexistent process id

#!/usr/bin/env bash
sleep 3 & # Spawn a child
trap '
    pgrep -P $$ # Outputs one PID as expected
    PIDS=( $( pgrep -P $$ ) ) # Saves an extra nonexistent PID
    echo "PIDS: ${PIDS[@]}" # You can see it is the last one
    ps -o pid= "${PIDS[@]:(-1)}" ||
        echo "Dafuq is ${PIDS[@]:(-1)}?" # Yep, it does not exist!
' 0 1 2 3 15
It outputs
11800
PIDS: 11800 11802
Dafuq is 11802?
It only happens with traps.
Why is a nonexistent PID appended to the array? And how to avoid this odd behaviour?
By using $(...), you've created a subprocess which will execute that code.
Naturally, the parent of that process will be the current shell, so it's going to be listed.
As for the workaround, you could remove that PID from the list. First you have to know how to access the subshell PID (see "$$ in a script vs $$ in a subshell"). Now you can filter it out (nope, it doesn't work):
PIDS=( $( pgrep -P $$ | grep -v ^$BASHPID$ ) )
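A more robust alternative, where it fits your script, is to skip pgrep entirely and record each child's PID yourself with $! as you spawn it, so no extra subshell ever enters the list; a minimal sketch:

#!/usr/bin/env bash
PIDS=()
sleep 3 & PIDS+=("$!") # record each child as it is spawned
sleep 5 & PIDS+=("$!")
trap 'echo "PIDS: ${PIDS[*]}"' EXIT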

How to redirect stderr and stdout to 2 distinct functions without pipes/fifos?

My aim is to monitor child process without creating any pipe while still being able to discriminate stderr from stdout, and being able to retrieve exit code.
I would like to avoid using named pipes or /dev/shm since they wouldn't be cleaned up in case of SIGKILL (yes, I have nice and tactful users)
After picking up ideas on Stack Overflow, I came to this. Not sure it is the smartest way; feel free to correct/enhance!
% monitor "echo stdout:" "echo stderr:" /bin/bash -c "echo FOO;BAR"
stdout: FOO
stderr: /bin/bash: BAR: command not found
% echo Exit-code: $?
Exit-code: 127
There's room for improvement, as I suspect it keeps spawning 2 child processes, 1 per redirection.
#
# PURPOSE
# redirect stdout and stderr to 2 distinct functions and retrieve exit code
#
# USAGE
# monitor HANDLER_OUT HANDLER_ERR COMMAND ...
# HANDLER_OUT and HANDLER_ERR can be commands or functions
# COMMAND is the full command whose output is being monitored
#
# RESULT
# returns COMMAND exit-code
#
# EXAMPLES
# monitor "echo OUT=" "echo ERR=" /bin/bash -c "FOOBAR; echo Done."
# monitor myfunction1 myfunction2 /bin/bash -c "FOOBAR; echo Done.;/bin/false"
#
monitor () {
    monitor_exit_code "$@" 1001>&1 1002>&2
}
monitor_exit_code () {
    local code=0
    while read data
    do
        code=$data
    done < <( monitor_stdout "$@" 1000>&1 )
    return $code
}
monitor_stdout () {
    local parser_out=$1; shift
    while read data
    do
        ($parser_out "$data") >&1001
    done < <( monitor_stderr "$@" 1001>&1 )
}
monitor_stderr () {
    local parser_err=$1; shift
    while read data
    do
        ($parser_err "$data") >&1002
    done < <( "$@" 2>&1 1>&1001 ; echo >&1000 $? )
}

How to write a process pool in a bash shell script

I have more than 10 tasks to execute, and the system restricts me to at most 4 tasks running at the same time.
My task can be started like:
myprog taskname
How can I write a bash shell script to run these tasks? The most important thing is that when one task finishes, the script can start another immediately, keeping the running task count at 4 at all times.
Use xargs:
xargs -P <maximum-number-of-process-at-a-time> -n <arguments-per-process> <command>
Details here.
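Applied to this question's myprog taskname jobs, that could look like the following sketch (assuming a file tasks.txt with one task name per line):

# run at most 4 instances of myprog at a time, one task name each
xargs -P 4 -n 1 myprog < tasks.txt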
I chanced upon this thread while looking into writing my own process pool and particularly liked Brandon Horsley's solution, though I couldn't get the signals working right, so I took inspiration from Apache and decided to try a pre-fork model with a fifo as my job queue.
The following function is the function that the worker processes run when forked.
# \brief the worker function that is called when we fork off worker processes
# \param[in] id the worker ID
# \param[in] job_queue the fifo to read jobs from
# \param[in] result_log the temporary log file to write exit codes to
function _job_pool_worker()
{
    local id=$1
    local job_queue=$2
    local result_log=$3
    local line=

    exec 7<> ${job_queue}
    while [[ "${line}" != "${job_pool_end_of_jobs}" && -e "${job_queue}" ]]; do
        # workers block on the exclusive lock to read the job queue
        flock --exclusive 7
        read line <${job_queue}
        flock --unlock 7
        # the worker should exit if it sees the end-of-job marker or run the
        # job otherwise and save its exit code to the result log.
        if [[ "${line}" == "${job_pool_end_of_jobs}" ]]; then
            # write it one more time for the next sibling so that everyone
            # will know we are exiting.
            echo "${line}" >&7
        else
            _job_pool_echo "### _job_pool_worker-${id}: ${line}"
            # run the job
            { ${line} ; }
            # now check the exit code and prepend "ERROR" to the result log entry
            # which we will use to count errors and then strip out later.
            local result=$?
            local status=
            if [[ "${result}" != "0" ]]; then
                status=ERROR
            fi
            # now write the error to the log, making sure multiple processes
            # don't trample over each other.
            exec 8<> ${result_log}
            flock --exclusive 8
            echo "${status}job_pool: exited ${result}: ${line}" >> ${result_log}
            flock --unlock 8
            exec 8>&-
            _job_pool_echo "### _job_pool_worker-${id}: exited ${result}: ${line}"
        fi
    done
    exec 7>&-
}
You can get a copy of my solution at Github. Here's a sample program using my implementation.
#!/bin/bash
. job_pool.sh
function foobar()
{
    # do something
    true
}
# initialize the job pool to allow 3 parallel jobs and echo commands
job_pool_init 3 0
# run jobs
job_pool_run sleep 1
job_pool_run sleep 2
job_pool_run sleep 3
job_pool_run foobar
job_pool_run foobar
job_pool_run /bin/false
# wait until all jobs complete before continuing
job_pool_wait
# more jobs
job_pool_run /bin/false
job_pool_run sleep 1
job_pool_run sleep 2
job_pool_run foobar
# don't forget to shut down the job pool
job_pool_shutdown
# check the $job_pool_nerrors for the number of jobs that exited non-zero
echo "job_pool_nerrors: ${job_pool_nerrors}"
Hope this helps!
Using GNU Parallel you can do:
cat tasks | parallel -j4 myprog
If you have 4 cores, you can even just do:
cat tasks | parallel myprog
From http://git.savannah.gnu.org/cgit/parallel.git/tree/README:
Full installation
Full installation of GNU Parallel is as simple as:
./configure && make && make install
Personal installation
If you are not root you can add ~/bin to your path and install in
~/bin and ~/share:
./configure --prefix=$HOME && make && make install
Or if your system lacks 'make' you can simply copy src/parallel
src/sem src/niceload src/sql to a dir in your path.
Minimal installation
If you just need parallel and do not have 'make' installed (maybe the
system is old or Microsoft Windows):
wget http://git.savannah.gnu.org/cgit/parallel.git/plain/src/parallel
chmod 755 parallel
cp parallel sem
mv parallel sem dir-in-your-$PATH/bin/
Test the installation
After this you should be able to do:
parallel -j0 ping -nc 3 ::: foss.org.my gnu.org freenetproject.org
This will send 3 ping packets to 3 different hosts in parallel and print
the output when they complete.
Watch the intro video for a quick introduction:
https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1
I would suggest writing four scripts, each one of which executes a certain number of tasks in series. Then write another script that starts the four scripts in parallel. For instance, if you have scripts script1.sh, script2.sh, script3.sh, and script4.sh, you could have a script called headscript.sh like so.
#!/bin/sh
./script1.sh &
./script2.sh &
./script3.sh &
./script4.sh &
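Each scriptN.sh would then simply run its share of the tasks in series, for example (the task names are placeholders):

#!/bin/sh
# script1.sh - first quarter of the tasks, run sequentially
myprog taskname1
myprog taskname2
myprog taskname3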
I found the best solution proposed in the "A Foo Walks into a Bar..." blog, using the built-in functionality of the well-known xargs tool.
First create a file commands.txt with the list of commands you want to execute:
myprog taskname1
myprog taskname2
myprog taskname3
myprog taskname4
...
myprog taskname123
and then pipe it to xargs like this to execute them in a pool of 4 processes:
cat commands.txt | xargs -I CMD --max-procs=4 bash -c CMD
You can adjust the number of processes via --max-procs.
Following @Parag Sardas' answer and the documentation linked, here's a quick script you might want to add to your .bash_aliases.
Relinking the doc link because it's worth a read
#!/bin/bash
# https://stackoverflow.com/a/19618159
# https://stackoverflow.com/a/51861820
#
# Example file contents:
# touch /tmp/a.txt
# touch /tmp/b.txt
if [ "$#" -eq 0 ]; then
echo "$0 <file> [max-procs=0]"
exit 1
fi
FILE=${1}
MAX_PROCS=${2:-0}
cat $FILE | while read line; do printf "%q\n" "$line"; done | xargs --max-procs=$MAX_PROCS -I CMD bash -c CMD
I.e.
./xargs-parallel.sh jobs.txt 4 # maximum of 4 processes, commands read from jobs.txt
You could probably do something clever with signals.
Note this is only to illustrate the concept, and thus not thoroughly tested.
#!/usr/local/bin/bash
this_pid="$$"
jobs_running=0
sleep_pid=
# Catch alarm signals to adjust the number of running jobs
trap 'decrement_jobs' SIGALRM
# When a job finishes, decrement the total and kill the sleep process
decrement_jobs()
{
    jobs_running=$(($jobs_running - 1))
    if [ -n "${sleep_pid}" ]
    then
        kill -s SIGKILL "${sleep_pid}"
        sleep_pid=
    fi
}

# Check to see if the max jobs are running, if so sleep until woken
launch_task()
{
    if [ ${jobs_running} -gt 3 ]
    then
        (
            while true
            do
                sleep 999
            done
        ) &
        sleep_pid=$!
        wait ${sleep_pid}
    fi

    # Launch the requested task, signalling the parent upon completion
    (
        "$@"
        kill -s SIGALRM "${this_pid}"
    ) &
    jobs_running=$((${jobs_running} + 1))
}
# Launch all of the tasks, this can be in a loop, etc.
launch_task task1
launch_task task2
...
launch_task task99
This tested script runs 5 jobs at a time and will start a new job as soon as one finishes (due to the kill of the sleep 10.9 when we get a SIGCHLD). A simpler version of this could use direct polling (change the sleep 10.9 to sleep 1 and get rid of the trap).
#!/usr/bin/bash
set -o monitor
trap "pkill -P $$ -f 'sleep 10\.9' >&/dev/null" SIGCHLD
totaljobs=15
numjobs=5
worktime=10
curjobs=0
declare -A pidlist
dojob()
{
    slot=$1
    time=$(echo "$RANDOM * 10 / 32768" | bc -l)
    echo Starting job $slot with args $time
    sleep $time &
    pidlist[$slot]=`jobs -p %%`
    curjobs=$(($curjobs + 1))
    totaljobs=$(($totaljobs - 1))
}
# start
while [ $curjobs -lt $numjobs -a $totaljobs -gt 0 ]
do
    dojob $curjobs
done

# Poll for jobs to die, restarting while we have them
while [ $totaljobs -gt 0 ]
do
    for ((i=0;$i < $curjobs;i++))
    do
        if ! kill -0 ${pidlist[$i]} >&/dev/null
        then
            dojob $i
            break
        fi
    done
    sleep 10.9 >&/dev/null
done
wait
The other answer about 4 shell scripts does not fully satisfy me, as it assumes that all tasks take approximately the same time and because it requires manual setup. But here is how I would improve it.
The main script will create symbolic links to executables following a certain naming convention. For example,
ln -s executable1 ./01-task.01
The first prefix is for sorting and the suffix identifies the batch (01-04).
Now we spawn 4 shell scripts that take the batch number as input and do something like this:
for t in $(ls ./*-task.$batch | sort); do
    "$t"
    rm "$t"
done
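The main script could then start the four batch workers along these lines (worker.sh is a made-up name for a script containing the loop above, taking the batch number as its argument):

for batch in 01 02 03 04; do
    ./worker.sh $batch &
done
wait # all batches done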
Look at my implementation of job pool in bash: https://github.com/spektom/shell-utils/blob/master/jp.sh
For example, to run at most 3 processes of cURL when downloading from a lot of URLs, you can wrap your cURL commands as follows:
./jp.sh "My Download Pool" 3 curl http://site1/...
./jp.sh "My Download Pool" 3 curl http://site2/...
./jp.sh "My Download Pool" 3 curl http://site3/...
...
Here is my solution. The idea is quite simple. I create a fifo as a semaphore, where each line stands for an available resource. When reading the queue, the main process blocks if there is nothing left. And, we return the resource after the task is done by simply echoing anything to the queue.
function task() {
    local task_no="$1"
    # doing the actual task...
    echo "Executing Task ${task_no}"
    # which takes a long time
    sleep 1
}

function execute_concurrently() {
    local tasks="$1"
    local ps_pool_size="$2"

    # create an anonymous fifo as a semaphore
    local sema_fifo
    sema_fifo="$(mktemp -u)"
    mkfifo "${sema_fifo}"
    exec 3<>"${sema_fifo}"
    rm -f "${sema_fifo}"

    # every 'x' stands for an available resource
    for i in $(seq 1 "${ps_pool_size}"); do
        echo 'x' >&3
    done

    for task_no in $(seq 1 "${tasks}"); do
        read dummy <&3 # blocks until a resource is available
        (
            trap 'echo x >&3' EXIT # returns the resource on exit
            task "${task_no}"
        )&
    done
    wait # wait until all forked tasks have finished
}
execute_concurrently 10 4
The script above will run 10 tasks, with at most 4 of them running concurrently at any time. You can change the $(seq 1 "${tasks}") sequence to the actual task queue you want to run.
I made my modifications based on the methods introduced in "Writing a process pool in Bash".
#!/bin/bash
#set -e # this doesn't work here for some reason
POOL_SIZE=4 # number of workers running in parallel
#######################################################################
# populate jobs #
#######################################################################
declare -a jobs
for (( i = 1988; i < 2019; i++ )); do
    jobs+=($i)
done
echo '################################################'
echo ' Launching jobs'
echo '################################################'
parallel() {
    local proc procs jobs cur
    jobs=("$@") # input jobs array
    declare -a procs=() # processes array
    cur=0 # current job idx

    morework=true
    while $morework; do
        # if process array size < pool size, try forking a new proc
        if [[ "${#procs[@]}" -lt "$POOL_SIZE" ]]; then
            if [[ $cur -lt "${#jobs[@]}" ]]; then
                proc=${jobs[$cur]}
                echo "JOB ID = $cur; JOB = $proc."
                ###############
                # do job here #
                ###############
                sleep 3 &
                # add to current running processes
                procs+=("$!")
                # move to the next job
                ((cur++))
            else
                morework=false
                continue
            fi
        fi
        for n in "${!procs[@]}"; do
            kill -0 "${procs[n]}" 2>/dev/null && continue
            # if process is not running anymore, remove from array
            unset procs[n]
        done
    done
    wait
}
parallel "${jobs[@]}"
xargs with -P and -L options does the job.
You can extract the idea from the example below:
#!/usr/bin/env bash
workers_pool_size=10
set -e
function doit {
    cmds=""
    for e in 4 8 16; do
        for m in 1 2 3 4 5 6; do
            cmd="python3 ./doit.py -e $e -m $m"
            cmds="$cmd\n$cmds"
        done
    done
    echo -e "All commands:\n$cmds"
    echo "Workers pool size = $workers_pool_size"
    echo -e "$cmds" | xargs -t -P $workers_pool_size -L 1 time > /dev/null
}
doit
#! /bin/bash
doSomething() {
    <...>
}

getCompletedThreads() {
    _runningThreads=("$@")
    removableThreads=()
    for pid in "${_runningThreads[@]}"; do
        if ! ps -p $pid > /dev/null; then
            removableThreads+=($pid)
        fi
    done
    echo "${removableThreads[@]}"
}

releasePool() {
    while [[ ${#runningThreads[@]} -eq $MAX_THREAD_NO ]]; do
        echo "releasing"
        removableThreads=( $(getCompletedThreads "${runningThreads[@]}") )
        if [ ${#removableThreads[@]} -eq 0 ]; then
            sleep 0.2
        else
            for removableThread in "${removableThreads[@]}"; do
                runningThreads=( ${runningThreads[@]/$removableThread} )
            done
            echo "released"
        fi
    done
}

waitAllThreadComplete() {
    while [[ ${#runningThreads[@]} -ne 0 ]]; do
        removableThreads=( $(getCompletedThreads "${runningThreads[@]}") )
        for removableThread in "${removableThreads[@]}"; do
            runningThreads=( ${runningThreads[@]/$removableThread} )
        done
        if [ ${#removableThreads[@]} -eq 0 ]; then
            sleep 0.2
        fi
    done
}
MAX_THREAD_NO=10
runningThreads=()
sequenceNo=0
for i in {1..36}; do
    releasePool
    ((sequenceNo++))
    echo "added $sequenceNo"
    doSomething &
    pid=$!
    runningThreads+=($pid)
done
waitAllThreadComplete

When I run /usr/bin/time my_program, how do I kill my_program?

I want to check how long my program takes, so I run /usr/bin/time my_program. When it takes more than 5 seconds, I want to kill it. I tried kill -9 TIME_S_PID; time is killed, but my_program is still running. So how do I kill my_program?
Thanks.
I resolved this using the 2 scripts below.
The first I named mytime:
#!/bin/bash
##############################################################################################################
# This script is used to perform a /usr/bin/time over a command and deliver the kill signal to the good exe
# Usage : mytime command args
# The result will be in a file name /tmp/command.PID.txt where pid is the current pid
# example:
# ./mytime sleep 10000 &
# kill pid of mytime
# cat /tmp/sleep.*.txt
##############################################################################################################
# store the current pid
P=$$
CMD=$1
signalisation ()
{
    # deliver the signal to the good pid, i.e. the child of /usr/bin/time
    S=$1
    # echo Signal $S >>/tmp/trace_$P.txt
    PF=$(cat /tmp/pid$P.txt)
    rm /tmp/pid$P.txt
    # echo pid of time's child: $PF >>/tmp/trace_$P.txt
    kill $S $PF
}
trap "signalisation 1" 1
trap "signalisation 2" 2
trap "signalisation 3" 3
trap "signalisation 15" 15
/usr/bin/time -f '%e %M # Elapsed real time (in seconds) and Memory Max' -o /tmp/$CMD.$P.txt storepid /tmp/pid$P.txt $* &
wait %1
exit $?
and this one, used by the previous, I named storepid:
#!/bin/bash
###########################################################################################
# this script store the pid in the file specified by the first argument
# then it executes the command defined by the remaining arguments
# example:
# storepid /tmp/mypid.txt sleep 100
# result sleep 100 secondes , but you can kill it before using kill $(cat /tmp/mypid.txt)
###########################################################################################
WHERE=$1
shift
echo $$ >$WHERE
exec $*
Note that killall, as its name suggests, will kill all instances of your program. A better way would be to make a wrapper script:
#!/bin/sh
echo "$$" # print the PID of the current process
exec my_program
Then you execute /usr/bin/time my_wrapper_script, which prints the PID of this instance of your program, and you can kill it whenever you want with kill -9 "$my_prog_pid".
As @Soulseekah said, killall my_program works.
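Note that on systems with GNU coreutils you can also use timeout, which addresses the original requirement directly, without any wrapper:

/usr/bin/time timeout 5 my_program

timeout kills my_program after 5 seconds (and exits with status 124 in that case), while time still reports its statistics.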

How do you run multiple programs in parallel from a bash script?

I am trying to write a .sh file that runs many programs simultaneously.
I tried this
prog1
prog2
But that runs prog1 then waits until prog1 ends and then starts prog2...
So how can I run them in parallel?
How about:
prog1 & prog2 && fg
This will:
Start prog1.
Send it to background, but keep printing its output.
Start prog2, and keep it in foreground, so you can close it with ctrl-c.
When you close prog2, you'll return to prog1's foreground, so you can also close it with ctrl-c.
To run multiple programs in parallel:
prog1 &
prog2 &
If you need your script to wait for the programs to finish, you can add:
wait
at the point where you want the script to wait for them.
If you want to be able to easily run and kill multiple process with ctrl-c, this is my favorite method: spawn multiple background processes in a (…) subshell, and trap SIGINT to execute kill 0, which will kill everything spawned in the subshell group:
(trap 'kill 0' SIGINT; prog1 & prog2 & prog3)
You can have complex process execution structures, and everything will close with a single ctrl-c (just make sure the last process is run in the foreground, i.e., don't include a & after prog1.3):
(trap 'kill 0' SIGINT; prog1.1 && prog1.2 & (prog2.1 | prog2.2 || prog2.3) & prog1.3)
If there is a chance the last command might exit early and you want to keep everything else running, add wait as the last command. In the following example, sleep 2 would have exited first, killing sleep 4 before it finished; adding wait allows both to run to completion:
(trap 'kill 0' SIGINT; sleep 4 & sleep 2 & wait)
You can use wait:
some_command &
P1=$!
other_command &
P2=$!
wait $P1 $P2
It assigns the background program PIDs to variables ($! is the last launched process' PID), then the wait command waits for them. It is nice because if you kill the script, it kills the processes too!
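Since wait PID returns that process's exit status, you can also check each command individually:

some_command & P1=$!
other_command & P2=$!
wait $P1; S1=$? # exit status of some_command
wait $P2; S2=$? # exit status of other_command
echo "exit statuses: $S1 $S2"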
With GNU Parallel http://www.gnu.org/software/parallel/ it is as easy as:
(echo prog1; echo prog2) | parallel
Or if you prefer:
parallel ::: prog1 prog2
Learn more:
Watch the intro video for a quick introduction:
https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1
Walk through the tutorial (man parallel_tutorial). Your command line
will love you for it.
Read: Ole Tange, GNU Parallel 2018.
xargs -P <n> allows you to run <n> commands in parallel.
While -P is a nonstandard option, both the GNU (Linux) and macOS/BSD implementations support it.
The following example:
runs at most 3 commands in parallel at a time,
with additional commands starting only when a previously launched process terminates.
time xargs -P 3 -I {} sh -c 'eval "$1"' - {} <<'EOF'
sleep 1; echo 1
sleep 2; echo 2
sleep 3; echo 3
echo 4
EOF
The output looks something like:
1 # output from 1st command
4 # output from *last* command, which started as soon as the count dropped below 3
2 # output from 2nd command
3 # output from 3rd command
real 0m3.012s
user 0m0.011s
sys 0m0.008s
The timing shows that the commands were run in parallel (the last command was launched only after the first of the original 3 terminated, but executed very quickly).
The xargs command itself won't return until all commands have finished, but you can execute it in the background by terminating it with control operator & and then using the wait builtin to wait for the entire xargs command to finish.
{
xargs -P 3 -I {} sh -c 'eval "$1"' - {} <<'EOF'
sleep 1; echo 1
sleep 2; echo 2
sleep 3; echo 3
echo 4
EOF
} &
# Script execution continues here while `xargs` is running
# in the background.
echo "Waiting for commands to finish..."
# Wait for `xargs` to finish, via special variable $!, which contains
# the PID of the most recently started background process.
wait $!
Note:
BSD/macOS xargs requires you to specify the count of commands to run in parallel explicitly, whereas GNU xargs allows you to specify -P 0 to run as many as possible in parallel.
Output from the processes run in parallel arrives as it is being generated, so it will be unpredictably interleaved.
GNU parallel, as mentioned in Ole's answer (does not come standard with most platforms), conveniently serializes (groups) the output on a per-process basis and offers many more advanced features.
#!/bin/bash
prog1 2> .errorprog1.log &
prog2 2> .errorprog2.log &
Redirect errors to separate logs. Note that the redirection has to come before the &; writing prog1 & 2> file would apply the redirection to an empty command rather than to prog1.
Here is a function I use in order to run at max n process in parallel (n=4 in the example):
max_children=4

function parallel {
    local time1=$(date +"%H:%M:%S")
    local time2=""

    # for the sake of the example, I'm using $2 as a description; you may be interested in another description
    echo "starting $2 ($time1)..."
    "$@" && time2=$(date +"%H:%M:%S") && echo "finishing $2 ($time1 -- $time2)..." &

    local my_pid=$$
    local children=$(ps -eo ppid | grep -w $my_pid | wc -w)
    children=$((children-1))

    if [[ $children -ge $max_children ]]; then
        wait -n
    fi
}
parallel sleep 5
parallel sleep 6
parallel sleep 7
parallel sleep 8
parallel sleep 9
wait
If max_children is set to the number of cores, this function will try to avoid idle cores. Note that wait -n requires bash 4.3 or newer.
There is a very useful program called nohup.
nohup - run a command immune to hangups, with output to a non-tty
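For example (prog1 and prog2 stand in for your programs):

nohup prog1 > prog1.out 2>&1 &
nohup prog2 > prog2.out 2>&1 &

Both run in parallel, and they keep running even after the shell that launched them exits.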
This works beautifully for me (found here):
sh -c 'command1 & command2 & command3 & wait'
It outputs all the logs of each command intermingled (which is what I wanted), and all are killed with ctrl+c.
I had a similar situation recently where I needed to run multiple programs at the same time, redirect their outputs to separate log files, and wait for them to finish, and I ended up with something like this:
#!/bin/bash
# Add the full path processes to run to the array
PROCESSES_TO_RUN=("/home/joao/Code/test/prog_1/prog1" \
                  "/home/joao/Code/test/prog_2/prog2")
# You can keep adding processes to the array...

for i in "${PROCESSES_TO_RUN[@]}"; do
    ${i%/*}/./${i##*/} > ${i}.log 2>&1 &
    # ${i%/*} -> Get folder name until the /
    # ${i##*/} -> Get the filename after the /
done
# Wait for the processes to finish
wait
Source: http://joaoperibeiro.com/execute-multiple-programs-and-redirect-their-outputs-linux/
You can try ppss (abandoned). ppss is rather powerful - you can even create a mini-cluster.
xargs -P can also be useful if you've got a batch of embarrassingly parallel processing to do.
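For instance, to compress a batch of log files four at a time (a sketch; adjust the glob to your files):

printf '%s\0' *.log | xargs -0 -P 4 -n 1 gzip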
Process Spawning Manager
Sure, technically these are processes, and this program should really be called a process spawning manager, but this is only due to the way that BASH works when it forks using the ampersand: it uses the fork() or perhaps clone() system call, which clones into a separate memory space, rather than something like pthread_create(), which would share memory. If BASH supported the latter, each "sequence of execution" would operate just the same and could be termed a traditional thread, while gaining a more efficient memory footprint. Functionally, however, it works the same, though a bit more awkwardly, since GLOBAL variables are not available in each worker clone; hence the use of the inter-process communication file and the rudimentary flock semaphore to manage critical sections.
Forking from BASH is of course the basic answer here, but I feel as if people know that and are really looking to manage what is spawned rather than just fork it and forget it. This demonstrates a way to manage up to 200 instances of forked processes all accessing a single resource. Clearly this is overkill, but I enjoyed writing it, so I kept on. Increase the size of your terminal accordingly. I hope you find this useful.
ME=$(basename $0)
IPC="/tmp/$ME.ipc" #interprocess communication file (global thread accounting stats)
DBG=/tmp/$ME.log
echo 0 > $IPC #initialize counter
F1=thread
SPAWNED=0
COMPLETE=0
SPAWN=1000 #number of jobs to process
SPEEDFACTOR=1 #dynamically compensates for execution time
THREADLIMIT=50 #maximum concurrent threads
TPS=1 #threads per second delay
THREADCOUNT=0 #number of running threads
SCALE="scale=5" #controls bc's precision
START=$(date +%s) #whence we began
MAXTHREADDUR=6 #maximum thread life span - demo mode
LOWER=$[$THREADLIMIT*100*90/10000] #90% worker utilization threshold
UPPER=$[$THREADLIMIT*100*95/10000] #95% worker utilization threshold
DELTA=10 #initial percent speed change
threadspeed() #dynamically adjust spawn rate based on worker utilization
{
    #vaguely assumes thread execution average will be consistent
    THREADCOUNT=$(threadcount)
    if [ $THREADCOUNT -ge $LOWER ] && [ $THREADCOUNT -le $UPPER ] ;then
        echo SPEED HOLD >> $DBG
        return
    elif [ $THREADCOUNT -lt $LOWER ] ;then
        #if maxthread is free speed up
        SPEEDFACTOR=$(echo "$SCALE;$SPEEDFACTOR*(1-($DELTA/100))"|bc)
        echo SPEED UP $DELTA% >> $DBG
    elif [ $THREADCOUNT -gt $UPPER ];then
        #if maxthread is active then slow down
        SPEEDFACTOR=$(echo "$SCALE;$SPEEDFACTOR*(1+($DELTA/100))"|bc)
        DELTA=1 #begin fine grain control
        echo SLOW DOWN $DELTA% >> $DBG
    fi
    echo SPEEDFACTOR $SPEEDFACTOR >> $DBG

    #average thread duration (total elapsed time / number of threads completed)
    #if threads completed is zero (less than 100), default to maxdelay/2 maxthreads
    COMPLETE=$(cat $IPC)
    if [ -z $COMPLETE ];then
        echo BAD IPC READ ============================================== >> $DBG
        return
    fi
    #echo Threads COMPLETE $COMPLETE >> $DBG
    if [ $COMPLETE -lt 100 ];then
        AVGTHREAD=$(echo "$SCALE;$MAXTHREADDUR/2"|bc)
    else
        ELAPSED=$[$(date +%s)-$START]
        #echo Elapsed Time $ELAPSED >> $DBG
        AVGTHREAD=$(echo "$SCALE;$ELAPSED/$COMPLETE*$THREADLIMIT"|bc)
    fi
    echo AVGTHREAD Duration is $AVGTHREAD >> $DBG

    #calculate timing to achieve spawning each worker fast enough
    # to utilize threadlimit - average time it takes to complete one thread / max number of threads
    TPS=$(echo "$SCALE;($AVGTHREAD/$THREADLIMIT)*$SPEEDFACTOR"|bc)
    #TPS=$(echo "$SCALE;$AVGTHREAD/$THREADLIMIT"|bc) # maintains pretty good
    #echo TPS $TPS >> $DBG
}

function plot()
{
    echo -en \\033[${2}\;${1}H
    if [ -n "$3" ];then
        if [[ $4 = "good" ]];then
            echo -en "\\033[1;32m"
        elif [[ $4 = "warn" ]];then
            echo -en "\\033[1;33m"
        elif [[ $4 = "fail" ]];then
            echo -en "\\033[1;31m"
        elif [[ $4 = "crit" ]];then
            echo -en "\\033[1;31;4m"
        fi
    fi
    echo -n "$3"
    echo -en "\\033[0;39m"
}

trackthread() #displays thread status
{
    WORKERID=$1
    THREADID=$2
    ACTION=$3 #setactive | setfree | update
    AGE=$4

    TS=$(date +%s)
    COL=$[(($WORKERID-1)/50)*40]
    ROW=$[(($WORKERID-1)%50)+1]

    case $ACTION in
        "setactive" )
            touch /tmp/$ME.$F1$WORKERID #redundant - see main loop
            #echo created file $ME.$F1$WORKERID >> $DBG
            plot $COL $ROW "Worker$WORKERID: ACTIVE-TID:$THREADID INIT " good
            ;;
        "update" )
            plot $COL $ROW "Worker$WORKERID: ACTIVE-TID:$THREADID AGE:$AGE" warn
            ;;
        "setfree" )
            plot $COL $ROW "Worker$WORKERID: FREE " fail
            rm /tmp/$ME.$F1$WORKERID
            ;;
        * )
            ;;
    esac
}

getfreeworkerid()
{
    for i in $(seq 1 $[$THREADLIMIT+1])
    do
        if [ ! -e /tmp/$ME.$F1$i ];then
            #echo "getfreeworkerid returned $i" >> $DBG
            break
        fi
    done
    if [ $i -eq $[$THREADLIMIT+1] ];then
        #echo "no free threads" >> $DBG
        echo 0
        #exit
    else
        echo $i
    fi
}

updateIPC()
{
    COMPLETE=$(cat $IPC)    #read IPC
    COMPLETE=$[$COMPLETE+1] #increment IPC
    echo $COMPLETE > $IPC   #write back to IPC
}

worker()
{
    WORKERID=$1
    THREADID=$2
    #echo "new worker WORKERID:$WORKERID THREADID:$THREADID" >> $DBG

    #accessing common terminal requires critical blocking section
    (flock -x -w 10 201
        trackthread $WORKERID $THREADID setactive
    )201>/tmp/$ME.lock

    let "RND = $RANDOM % $MAXTHREADDUR +1"
    for s in $(seq 1 $RND) #simulate random lifespan
    do
        sleep 1;
        (flock -x -w 10 201
            trackthread $WORKERID $THREADID update $s
        )201>/tmp/$ME.lock
    done

    (flock -x -w 10 201
        trackthread $WORKERID $THREADID setfree
    )201>/tmp/$ME.lock

    (flock -x -w 10 201
        updateIPC
    )201>/tmp/$ME.lock
}

threadcount()
{
    TC=$(ls /tmp/$ME.$F1* 2> /dev/null | wc -l)
    #echo threadcount is $TC >> $DBG
    THREADCOUNT=$TC
    echo $TC
}

status()
{
    #summary status line
    COMPLETE=$(cat $IPC)
    plot 1 $[$THREADLIMIT+2] "WORKERS $(threadcount)/$THREADLIMIT SPAWNED $SPAWNED/$SPAWN COMPLETE $COMPLETE/$SPAWN SF=$SPEEDFACTOR TIMING=$TPS"
    echo -en '\033[K' #clear to end of line
}

function main()
{
    while [ $SPAWNED -lt $SPAWN ]
    do
        while [ $(threadcount) -lt $THREADLIMIT ] && [ $SPAWNED -lt $SPAWN ]
        do
            WID=$(getfreeworkerid)
            worker $WID $SPAWNED &
            touch /tmp/$ME.$F1$WID #if this loops faster than file creation in the worker thread it steps on itself, thread tracking is best in main loop
            SPAWNED=$[$SPAWNED+1]
            (flock -x -w 10 201
                status
            )201>/tmp/$ME.lock
            sleep $TPS
            if ((! $[$SPAWNED%100]));then
                #rethink thread timing every 100 threads
                threadspeed
            fi
        done
        sleep $TPS
    done

    while [ "$(threadcount)" -gt 0 ]
    do
        (flock -x -w 10 201
            status
        )201>/tmp/$ME.lock
        sleep 1;
    done

    status
}
clear
threadspeed
main
wait
status
echo
Since for some reason I can't use wait, I came up with this solution:
# create a hashmap of the tasks: name -> its command
declare -A tasks=(
    ["Sleep 3 seconds"]="sleep 3"
    ["Check network"]="ping imdb.com"
    ["List dir"]="ls -la"
)

# execute each task in the background, redirecting their output to a custom file descriptor
fd=10
for task in "${!tasks[@]}"; do
    script="${tasks[${task}]}"
    eval "exec $fd< <(${script} 2>&1 || (echo $task failed with exit code \${?}! && touch tasks_failed))"
    ((fd+=1))
done

# print the outputs of the tasks and wait for them to finish
fd=10
for task in "${!tasks[@]}"; do
    cat <&$fd
    ((fd+=1))
done

# determine the exit status
# by checking whether the file "tasks_failed" has been created
if [ -e tasks_failed ]; then
    echo "Task(s) failed!"
    exit 1
else
    echo "All tasks finished without an error!"
    exit 0
fi
Your script should look like:
prog1 &
prog2 &
.
.
progn &
wait
progn+1 &
progn+2 &
.
.
Assuming your system can take n jobs at a time, use wait to run only n jobs at once.
If you're:
On Mac and have iTerm
Want to start various processes that stay open long-term / until Ctrl+C
Want to be able to easily see the output from each process
Want to be able to easily stop a specific process with Ctrl+C
One option is scripting the terminal itself if your use case is more app monitoring / management.
For example I recently did the following. Granted it's Mac specific, iTerm specific, and relies on a deprecated Apple Script API (iTerm has a newer Python option). It doesn't win any elegance awards but gets the job done.
#!/bin/sh
root_path="~/root-path"
auth_api_script="$root_path/auth-path/auth-script.sh"
admin_api_proj="$root_path/admin-path/admin.csproj"
agent_proj="$root_path/agent-path/agent.csproj"
dashboard_path="$root_path/dashboard-web"
osascript <<THEEND
tell application "iTerm"
  set newWindow to (create window with default profile)
  tell current session of newWindow
    set name to "Auth API"
    write text "pushd $root_path && $auth_api_script"
  end tell
  tell newWindow
    set newTab to (create tab with default profile)
    tell current session of newTab
      set name to "Admin API"
      write text "dotnet run --debug -p $admin_api_proj"
    end tell
  end tell
  tell newWindow
    set newTab to (create tab with default profile)
    tell current session of newTab
      set name to "Agent"
      write text "dotnet run --debug -p $agent_proj"
    end tell
  end tell
  tell newWindow
    set newTab to (create tab with default profile)
    tell current session of newTab
      set name to "Dashboard"
      write text "pushd $dashboard_path; ng serve -o"
    end tell
  end tell
end tell
THEEND
If you have a GUI terminal, you could spawn a new tabbed terminal instance for each process you want to run in parallel.
This has the benefit that each program runs in its own tab where it can be interacted with and managed independently of the other running programs.
For example, on Ubuntu 20.04:
gnome-terminal --tab -- bash -c 'prog1'
gnome-terminal --tab -- bash -c 'prog2'
To run certain programs or other commands sequentially, you can add ;
gnome-terminal --tab -- bash -c 'prog1_1; prog1_2'
gnome-terminal --tab -- bash -c 'prog2'
I've found that for some programs, the terminal closes before they start up. For these programs I append ; wait or ; sleep 1 to the terminal command:
gnome-terminal --tab -- bash -c 'prog1; wait'
For Mac OS, you would have to find an equivalent command for the terminal you are using - I haven't tested on Mac OS since I don't own a Mac.
There're a lot of interesting answers here, but I took inspiration from this answer and put together a simple script that runs multiple processes in parallel and handles the results once they're done. You can find it in this gist, or below:
#!/usr/bin/env bash
# inspired by https://stackoverflow.com/a/29535256/2860309
pids=""
failures=0
function my_process() {
    seconds_to_sleep=$1
    exit_code=$2
    sleep "$seconds_to_sleep"
    return "$exit_code"
}
(my_process 1 0) &
pid=$!
pids+=" ${pid}"
echo "${pid}: 1 second to success"
(my_process 1 1) &
pid=$!
pids+=" ${pid}"
echo "${pid}: 1 second to failure"
(my_process 2 0) &
pid=$!
pids+=" ${pid}"
echo "${pid}: 2 seconds to success"
(my_process 2 1) &
pid=$!
pids+=" ${pid}"
echo "${pid}: 2 seconds to failure"
echo "..."
for pid in $pids; do
    if wait "$pid"; then
        echo "Process $pid succeeded"
    else
        echo "Process $pid failed"
        failures=$((failures+1))
    fi
done
echo
echo "${failures} failures detected"
This results in:
86400: 1 second to success
86401: 1 second to failure
86402: 2 seconds to success
86404: 2 seconds to failure
...
Process 86400 succeeded
Process 86401 failed
Process 86402 succeeded
Process 86404 failed
2 failures detected
With bashj ( https://sourceforge.net/projects/bashj/ ), you should be able to run not only multiple processes (the way others suggested) but also multiple threads in one JVM controlled from your script. But of course this requires a Java JDK. Threads consume fewer resources than processes.
Here is a working code:
#!/usr/bin/bashj
#!java
public static int cnt=0;
private static void loop() {u.p("java says cnt= "+(cnt++));u.sleep(1.0);}
public static void startThread()
{(new Thread(() -> {while (true) {loop();}})).start();}
#!bashj
j.startThread()
while [ j.cnt -lt 4 ]
do
echo "bash views cnt=" j.cnt
sleep 0.5
done
