How to wait for wget to finish to get more resources - bash

I am new to bash.
I want to wget some resources in parallel.
What is the problem in the following code:
for item in $list
do
if [ $i -le 10 ];then
wget -b $item
let "i++"
else
wait
i=1
fi
done
When I execute this script, the following error is thrown:
fork: Resource temporarily unavailable
My question is how to use wget the right way.
Edit:
My problem is that there are about four thousand URLs to download. If I let all these jobs run in parallel, fork: Resource temporarily unavailable is thrown. I don't know how to limit how many run in parallel.

Use jobs|grep to check background jobs:
#!/bin/bash
urls=('www.cnn.com' 'www.wikipedia.org') ## input data
for ((i=-1;++i<${#urls[@]};)); do
curl -L -s ${urls[$i]} >file-$i.html & ## background jobs
done
until [[ -z `jobs|grep -E -v 'Done|Terminated'` ]]; do
sleep 0.05; echo -n '.' ## do something while waiting
done
echo; ls -l file*\.html ## list downloaded files
Results:
............................
-rw-r--r-- 1 xxx xxx 155421 Jan 20 00:50 file-0.html
-rw-r--r-- 1 xxx xxx 74711 Jan 20 00:50 file-1.html
Another variant, with the tasks run in simple parallel:
#!/bin/bash
urls=('www.yahoo.com' 'www.hotmail.com' 'stackoverflow.com')
_task1(){ ## task 1: download files
for ((i=-1;++i<${#urls[@]};)); do
curl -L -s ${urls[$i]} >file-$i.html &
done; wait
}
_task2(){ echo hello; } ## task 2: a fake task
_task3(){ echo hi; } ## task 3: a fake task
_task1 & _task2 & _task3 & ## run them in parallel
wait ## and wait for them
ls -l file*\.html ## list results of all tasks
echo done ## and do something
Results:
hello
hi
-rw-r--r-- 1 xxx xxx 320013 Jan 20 02:19 file-0.html
-rw-r--r-- 1 xxx xxx 3566 Jan 20 02:19 file-1.html
-rw-r--r-- 1 xxx xxx 253348 Jan 20 02:19 file-2.html
done
Example with limit how many downloads in parallel at a time (max=3):
#!/bin/bash
m=3 ## max jobs (downloads) at a time
t=4 ## retries for each download
_debug(){ ## list jobs to see (debug)
printf ":: jobs running: %s\n" "$(echo `jobs -p`)"
}
## sample input data
## is redirected to filehandle=3
exec 3<<-EOF
www.google.com google.html
www.hotmail.com hotmail.html
www.wikipedia.org wiki.html
www.cisco.com cisco.html
www.cnn.com cnn.html
www.yahoo.com yahoo.html
EOF
## read data from filehandle=3, line by line
while IFS=' ' read -u 3 -r u f || [[ -n "$f" ]]; do
[[ -z "$f" ]] && continue ## ignore empty input line
while [[ $(jobs -p|wc -l) -ge "$m" ]]; do ## while $m or more jobs in running
_debug ## then list jobs to see (debug)
wait -n ## and wait for some job(s) to finish
done
curl --retry $t -Ls "$u" >"$f" & ## download in background
printf "job %d: %s => %s\n" $! "$u" "$f" ## print job info to see (debug)
done
_debug; wait; ls -l *\.html ## see final results
Outputs:
job 22992: www.google.com => google.html
job 22996: www.hotmail.com => hotmail.html
job 23000: www.wikipedia.org => wiki.html
:: jobs running: 22992 22996 23000
job 23022: www.cisco.com => cisco.html
:: jobs running: 22996 23000 23022
job 23034: www.cnn.com => cnn.html
:: jobs running: 23000 23022 23034
job 23052: www.yahoo.com => yahoo.html
:: jobs running: 23000 23034 23052
-rw-r--r-- 1 xxx xxx 61473 Jan 21 01:15 cisco.html
-rw-r--r-- 1 xxx xxx 155055 Jan 21 01:15 cnn.html
-rw-r--r-- 1 xxx xxx 12514 Jan 21 01:15 google.html
-rw-r--r-- 1 xxx xxx 3566 Jan 21 01:15 hotmail.html
-rw-r--r-- 1 xxx xxx 74711 Jan 21 01:15 wiki.html
-rw-r--r-- 1 xxx xxx 319967 Jan 21 01:15 yahoo.html
After reading your updated question, I think it is much easier to use lftp, which can log and download (automatically follow links, retry downloads, and continue downloads); you'll never need to worry about job/fork resources because you run only a few lftp commands. Just split your download list into some smaller lists, and lftp will download them for you:
$ cat downthemall.sh
#!/bin/bash
## run: lftp -c 'help get'
## to know how to use lftp to download files
## with automatically retry+continue
p=() ## pid list
for l in *\.lst; do
lftp -f "$l" >/dev/null & ## run proccesses in parallel
p+=("--pid=$!") ## record pid
done
until [[ -f d.log ]]; do sleep 0.5; done ## wait for the log file
tail -f d.log ${p[@]} ## print results when downloading
Outputs:
$ cat 1.lst
set xfer:log true
set xfer:log-file d.log
get -c http://www.microsoft.com -o micro.html
get -c http://www.cisco.com -o cisco.html
get -c http://www.wikipedia.org -o wiki.html
$ cat 2.lst
set xfer:log true
set xfer:log-file d.log
get -c http://www.google.com -o google.html
get -c http://www.cnn.com -o cnn.html
get -c http://www.yahoo.com -o yahoo.html
$ cat 3.lst
set xfer:log true
set xfer:log-file d.log
get -c http://www.hp.com -o hp.html
get -c http://www.ibm.com -o ibm.html
get -c http://stackoverflow.com -o stack.html
$ rm *log *html;./downthemall.sh
2018-01-22 02:10:13 http://www.google.com.vn/?gfe_rd=cr&dcr=0&ei=leVkWqiOKfLs8AeBvqBA -> /tmp/1/google.html 0-12538 103.1 KiB/s
2018-01-22 02:10:13 http://edition.cnn.com/ -> /tmp/1/cnn.html 0-153601 362.6 KiB/s
2018-01-22 02:10:13 https://www.microsoft.com/vi-vn/ -> /tmp/1/micro.html 0-129791 204.0 KiB/s
2018-01-22 02:10:14 https://www.cisco.com/ -> /tmp/1/cisco.html 0-61473 328.0 KiB/s
2018-01-22 02:10:14 http://www8.hp.com/vn/en/home.html -> /tmp/1/hp.html 0-73136 92.2 KiB/s
2018-01-22 02:10:14 https://www.ibm.com/us-en/ -> /tmp/1/ibm.html 0-32700 131.4 KiB/s
2018-01-22 02:10:15 https://vn.yahoo.com/?p=us -> /tmp/1/yahoo.html 0-318657 208.4 KiB/s
2018-01-22 02:10:15 https://www.wikipedia.org/ -> /tmp/1/wiki.html 0-74711 60.7 KiB/s
2018-01-22 02:10:16 https://stackoverflow.com/ -> /tmp/1/stack.html 0-253033 180.8
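If your real list has thousands of URLs, a small helper can generate those .lst files for you. This is only a sketch, assuming a file urls.txt with one URL per line; the output name is derived crudely from the URL and n is the number of lists you want:
n=10                                  ## number of .lst files / lftp processes
i=0
rm -f ./*.lst
while read -r u; do
  [[ -z "$u" ]] && continue           ## skip blank lines
  f="${u##*/}"                        ## crude output name taken from the URL
  l="$(( i % n )).lst"
  if [[ ! -s "$l" ]]; then            ## write the log settings once per list
    printf 'set xfer:log true\nset xfer:log-file d.log\n' >"$l"
  fi
  printf 'get -c %s -o %s\n' "$u" "${f:-index.html}" >>"$l"
  ((i++))
done <urls.txt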

With the updated question, here is an updated answer.
The following script launches 10 (can be changed to any number) wget processes in the background and monitors them. Once one of the processes finishes, it picks up the next URL in the list and tries to keep the same $maxn (10) processes running in the background, until it runs out of URLs from the list ($urlfile). There are inline comments to help understand it.
$ cat wget.sh
#!/bin/bash
wget_bg()
{
> ./wget.pids # Start with empty pidfile
urlfile="$1"
maxn=$2
cnt=0;
while read -r url
do
if [ $cnt -lt $maxn ] && [ ! -z "$url" ]; then # Only maxn processes will run in the background
echo -n "wget $url ..."
wget "$url" &>/dev/null &
pidwget=$! # This gets the backgrounded pid
echo "$pidwget" >> ./wget.pids # fill pidfile
echo "pid[$pidwget]"
((cnt++));
fi
while [ $cnt -eq $maxn ] # Start monitoring as soon the maxn process hits
do
while read -r pids
do
if ps -p $pids > /dev/null; then # Check pid running
:
else
sed -i "/$pids/d" wget.pids # If not remove it from pidfile
((cnt--)); # decrement counter
fi
done < wget.pids
done
done < "$urlfile"
}
# This runs 10 wget processes at a time in the bg. Modify for more or less.
wget_bg ./test.txt 10
To run:
$ chmod u+x ./wget.sh
$ ./wget.sh
wget blah.com ...pid[13012]
wget whatever.com ...pid[13013]
wget thing.com ...pid[13014]
wget foo.com ...pid[13015]
wget bar.com ...pid[13016]
wget baz.com ...pid[13017]
wget steve.com ...pid[13018]
wget kendal.com ...pid[13019]

Add this in your if statement:
until wget -b $item; do
printf '.'
sleep 2
done
The loop will wait until the process finishes, printing a "." every 2 seconds.
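Alternatively, you can keep background downloads but cap how many run at once by checking the job count before starting a new one. A minimal sketch, assuming bash 4.3+ for wait -n; note that wget -b detaches from the shell's job control (which is why the wait in the original loop has nothing to wait for), so a plain & is used instead:
max=10
for item in $list; do
  while [[ $(jobs -pr | wc -l) -ge $max ]]; do
    wait -n              ## block until at least one download finishes
  done
  wget -q "$item" &      ## plain & keeps the job under the shell's job control
done
wait                     ## wait for the remaining downloads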

How to choose any number of elements through user input in bash?

I have created this script, which currently takes a list of arguments from the command line, but what I want is to let the user pass a single numerical value, which would then run the loop that many times. The script is currently run like this, for example: ./testing.sh launch 1 2 3 4 5 6 7 8. How can I make the user pass a numerical value like 8, which would then loop over the IPs, instead of passing 1 2 3 4 5 6 7 8? Also, is there a better way to deal with the many IPs hard-coded in the script, for example mapping them and reading them from a file?
#!/bin/bash
#!/usr/bin/expect
ips=()
tarts=()
launch_tarts () {
local tart=$1
local ip=${ip[tart]}
echo " ---- Launching Tart $1 ---- "
sshpass -p "tart123" ssh -Y -X -L 5900:$ip:5901 tarts#$ip <<EOF1
export DISPLAY=:1
gnome-terminal -e "bash -c \"pwd; cd /home/tarts; pwd; ./launch_tarts.sh exec bash\""
exit
EOF1
}
kill_tarts () {
local tart=$1
local ip=${ip[tart]}
echo " ---- Killing Tart $1 ---- "
sshpass -p "tart123" ssh -tt -o StrictHostKeyChecking=no tarts#$ip <<EOF1
. ./tartsenvironfile.8.1.1.0
nohup yes | kill_tarts mcgdrv &
nohup yes | kill_tarts server &
pkill -f traf
pkill -f terminal-server
exit
EOF1
}
tarts_setup () {
local tart=$1
local ip=${ip[tart]}
echo " ---- Setting-Up Tart $1 ---- "
sshpass -p "root12" ssh -tt -o StrictHostKeyChecking=no root#$ip <<EOF1
pwd
nohup yes | /etc/rc.d/init.d/lifconfig
su tarts
nohup yes | vncserver
sleep 10
exit
exit
EOF1
}
ip[1]=10.171.0.10
ip[2]=10.171.0.11
ip[3]=10.171.0.12
ip[4]=10.171.0.13
ip[5]=10.171.0.14
ip[6]=10.171.0.15
ip[7]=10.171.0.16
ip[8]=10.171.0.17
ip[9]=10.171.0.18
ip[10]=10.171.0.19
ip[11]=10.171.0.20
ip[12]=10.171.0.21
ip[13]=10.171.0.100
ip[14]=10.171.0.101
ip[15]=10.171.0.102
ip[16]=10.171.0.103
ip[17]=10.171.0.104
ip[18]=10.171.0.105
ip[19]=10.171.0.106
ip[20]=10.171.0.107
case $1 in
kill) function=kill_tarts;;
launch) function=launch_tarts;;
setup) function=tarts_setup;;
*) exit 1;;
esac
shift
for tart in "$#"; do
($function $tart) &
ips+=(${ip[tart]})
# echo $ips
tarts+=(${tart[@]})
# echo $tarts
done
wait
Can someone guide please?
Try changing the bottom loop to for ((tart=1; tart<=$2; tart++)), then use it like: ./testing.sh launch 8.
You can put multiple variable declarations on one line, so you could split the ip list into two or three columns.
Or use mapfile: mapfile -t ip < ip-list. You will need to use tart - 1 for the array index though, like "${ip[tart-1]}", as the array will start at 0, not 1.
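Putting those two suggestions together, a sketch of the bottom of the script, assuming the addresses live in a file called ip-list (one per line) and that $2 is read before the script's shift:
count=$2                               ## e.g. ./testing.sh launch 8
mapfile -t ip < ip-list                ## ip[0], ip[1], ... filled from the file
for ((tart=1; tart<=count; tart++)); do
  ($function "$tart") &
  ips+=("${ip[tart-1]}")               ## mapfile array is 0-based, tarts count from 1
done
wait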
You want the seq command:
for x in $(seq 5); do
echo $x
done
this will produce the output
1
2
3
4
5
Then just take the number of iterations you want as another parameter on the command line, and use that in place of the hard coded 5 in my example.
seq just generates a sequence of numbers. From the man page:
SYNOPSIS
seq [-w] [-f format] [-s string] [-t string] [first [incr]] last
DESCRIPTION
The seq utility prints a sequence of numbers, one per line (default), from first (default 1), to as near last as possible, in increments of incr (default 1). When first is larger than last, the default incr is -1.
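For example, a sketch of the bottom loop driven by a count passed on the command line (matching ./testing.sh launch 8, and again assuming $2 is captured before the shift):
count=$2                       ## the number the user passes, e.g. 8
for tart in $(seq "$count"); do
  ($function "$tart") &
done
wait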

How to execute ksh script from bash

I currently have a ksh script which invokes another ksh script. The "parent" ksh script needs to be invoked from a bash shell in the context of the ksh shell user. Trying the following throws back this error message
As user root in the bash shell
su - whics -c '/usr/bin/nohup /whics/t99/wv.4gm/wv99b.4gs/wv99b.sh -s 1 -m u -sleep 5 > ./nohup.out &'
/whics/t99/wv.4gm/wv99b.4gs/wv99b.sh[8]: .: wh_setENV.sh: cannot open [No such file or directory]
wh_setENV.sh is actually in /whics/t99/bin
However, when running the below commands in order I do not get this error
server:~ su - whics
server:/whics/t99 cd ./wv.4gm/wv99b.4gs
server:/whics/t99/wv.4gm/wv99b.4gs nohup ./wv99b.sh -s 1 -m u -sleep 5 &
server:/whics/t99/wv.4gm/wv99b.4gs nohup: ignoring input and appending output to `/home/whics/nohup.out'
[1] + Done nohup ./wv99b.sh -s 1 -m u -sleep 5 &
server:/whics/t99/wv.4gm/wv99b.4gs cat /home/whics/nohup.out Mon Sep 17 12:27:40 AEST 2018 : Start wv99b
wv99b.sh
#!/bin/ksh
# Copyright (C) 1992-1997 Wacher Pty. Limited
# Sccsid: %Z% %M%%Y% %Q%%I% %E%
myname=${0##*/} # a useful identifying variable
mydir=${0%$myname} # where this script is
vSFX=${myname##*.}
. wh_setENV.sh # P4813 - when using 4js:WebServices, the $fglidir/lib in LD_LIBRARY_PATH causes problems
test $debugxv && set -xv
#--------------------------------------------------------------------------------------------------------------------------------------#
wv99b_msg() {
vERR="`date` : ${vMSG}"
echo $vERR | tee -a ${vLOG}
}
#--------------------------------------------------------------------------------------------------------------------------------------#
wv99b_sysFragments() {
vSYSFRAGOK="0"
vSYSFRAGMENTS="${vTABNAME}.sysfrags.unl" ; rm -f $vSYSFRAGMENTS
$WH_ISQL $company - <<! 2>/dev/null | sed "/exprtext/d;/^$/d;s/ //g;s/[()]//g" |cut -f1 -d'=' >| ${vSYSFRAGMENTS}
select F.exprtext
from systables S, sysfragments F
where S.tabid > 99
and S.tabtype = "T"
and S.tabname = "${vTABNAME}"
and S.tabid = F.tabid
and S.tabtype = F.fragtype
and F.evalpos = 0
;
!
if [ -s ${vSYSFRAGMENTS} ] ; then
# search for the vCOLUMN in the vSYSFRAGMENTS output
vSYSFRAGOK=`grep -i ${vKEY} ${vSYSFRAGMENTS} 2>/dev/null | wc -l | awk '{print $1}'`
else
vSYSFRAGOK="0"
rm -f ${vSYSFRAGMENTS} # cleanup
fi
}
# MAIN #
vARGS="$#"
vHERE=`pwd`
vLOG="${vHERE}/errlog"
vD=0 # debug indicator
vI=0 # infile indicator
vQ=0 # email indicator
vM=0 # mode indicator
vS=0 # serial indicator
vNO_MULTI=0 # default to false
vNO_PROGI=0 # default to false
vTABLE=0 # default to 0
vSLEEP=5 # default to 0
for i in $vARGS
do
case "$i" in
-debug) vD=$2 ;;
-infile) vI=$2 ;;
-table) vTABLE=$2 ;;
-sleep) vSLEEP=$2 ;;
-no_multi) vNO_MULTI=$2 ;;
-no_progi) vNO_PROGI=$2 ;;
-m) vM=$2 ;;
-q) vQ=$2 ;;
-s) vS=$2 ;;
esac
shift
done
[[ ${vS} -eq 0 ]] && vMSG="-s parameter not supplied" && wv99b_msg && exit 1
vHERE=`pwd`
if [ ${vD} -eq 1 ] ; then
vDEBUG=" -debug 1"
else
vDEBUG=""
fi
if [ ${vI} -eq 0 ] ; then
vINFILE="wv99b.in"
else
vINFILE="${vI}"
fi
# INIT
vWVI="wv99b_I" # the name of the (I)dentify script
vWVIS="${vWVI}_${vS}" # the name of the (I)dentify script PLUS SERIAL
vWVIO="${vWVIS}.unl" # the name of the (I)dentify script
rm -f ${vWVIO}
# Check that transaction-logging is off
# check that vINFILE exists
if [ ! -s "${vINFILE}" ] ; then
vMSG="Error cannot read input file $vINFILE" ; wv99b_msg ; exit 1
fi
# Process only one(1) table
if [ ${vTABLE} != "0" ] ; then
vTABLE_FILTER=" -table ${vTABLE} "
else
vTABLE_FILTER=""
fi
# We need to check if we are running client/server
#
vDB=`echo $company | awk 'BEGIN {FS="@" } { print $1 }'`
vDBSRV=`echo $company | awk 'BEGIN {FS="@" } { print $2 }'`
case X${vDBSRV}X in
XX) vREMOTE_DB="" ;;
*) vREMOTE_DB=" -db ${vDB} -dbsrv ${vDBSRV} " ;;
esac
#_end
vMSG="Start wv99b" ; wv99b_msg
So in the wv99b.sh file, I changed
. wh_setENV.sh
to
. /whics/t99/bin/wh_setENV.sh
However, now I get the error
cannot read input file wv99b.in
I checked wv99b.in and it is in the same directory as 'wv99b.sh' (i.e. /whics/t99/wv.4gm/wv99b.4gs/ )
wh_setENV.sh
#!/usr/bin/ksh
test $debugxv && set -xv
trap door 1 2 3 5 9 15
#---------------------------------------------------------------------#
door() {
echo "`date` ERROR($?) occured in $0" >> $WH/batch.4gm/trap.log
} #end door
#---------------------------------------------------------------------#
# Script to set Environment variables for various scripts
# Stef
# Unix specific
umask 002
: ${WH:="/whics/prod"}
set -a
TERM=xterm
vHERE=`pwd`
TERMCAP=$WH/etc/termcap
vHOST=`hostname | cut -f1 -d'.'`
set +a
#LD_LIBRARY_PATH="$WH/lib.4gm/S_lib:$fglibdir/S_lib" # GUC R481
LD_LIBRARY_PATH="$LD_LIBRARY_PATH:$INFORMIXDIR/lib/c++:$INFORMIXDIR/lib/cli:$INFORMIXDIR/lib/client:$INFORMIXDIR/lib/csm:$INFORMIXDIR/lib/dmi"
LD_LIBRARY_PATH="$LD_LIBRARY_PATH:$INFORMIXDIR/lib:$INFORMIXDIR/lib/esql:$INFORMIXDIR/lib/tools.$CCODE"
export LD_LIBRARY_PATH
# EOF #
UPDATE: After OP modified/updated the question ...
wv99b.in is referenced in wv99b.sh
you don't provide the path for wv99b.in so ...
if you invoke wv99b.sh from some directory other than /whics/t99/wv.4gm/wv99b.4gs ...
when you run your test [ ! -s "${vINFILE}" ] your script (correctly) determines that "${vINFILE}" is not located in the directory where you invoked wv99b.sh from
Net result ... the original problem with wh_setENV.sh has been fixed but now the same exact problem is occurring for wv99b.in, with the same solution needed here ... invoke wv99b.sh from its home directory or provide the full path to wv99b.in.
ORIGINAL POST:
Expanding on Andre's comment ...
In wv99b.sh you have the following line:
. wh_setENV.sh
This is going to look for wh_setENV.sh in the directory where wv99b.sh is invoked from.
In your original example you've provided the full path to wv99b.sh but the question is ... what directory is that call invoked from? We can tell from the error ...
wh_setENV.sh: cannot open [No such file or directory]
... that wv99b.sh was not invoked from /whics/t99/wv.4gm/wv99b.4gs otherwise it would have found wh_setENV.sh.
From your second example it appears that the full path to wh_setENV.sh is: /whics/t99/wv.4gm/wv99b.4gs/wh_setENV.sh so you have a couple options:
in your initial example make sure you cd /whics/t99/wv.4gm/wv99b.4gs before executing wv99b.sh
or
update wv99b.sh to include the full path to the location of wh_setENV.sh, e.g.:
. /whics/t99/wv.4gm/wv99b.4gs/wh_setENV.sh
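A related option, if you would rather not hard-code any path, is to have the script cd into its own directory first; wv99b.sh already computes that directory in $mydir. This is only a sketch and assumes it is acceptable for the script to change its working directory (it also helps the later wv99b.in lookup, since that file sits next to wv99b.sh; adjust the sourced path if wh_setENV.sh really lives in /whics/t99/bin):
myname=${0##*/}             # script name, as already done in wv99b.sh
mydir=${0%$myname}          # directory part of how the script was invoked
cd "${mydir:-.}" || exit 1  # run from the script's own directory
. ./wh_setENV.sh            # relative lookups (wh_setENV.sh, wv99b.in) now resolve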

start an application with linux network namespaces using a bash function [duplicate]

This question already has answers here:
How can I execute a bash function using sudo?
(10 answers)
Closed 4 years ago.
I have this script to exec an application and handle any error on startup, but I need better control over this and want to use "network namespaces" to run the app inside the netns with id "controlnet". With the last line the script runs OK, but I am redirected to a blank screen; after exiting from it I can see the application running, but it isn't running inside the "controlnet" namespace.
If I perform the steps manually, everything is OK:
sudo ip netns exec controlnet sudo -u $USER -i
cd /home/app-folder/
./hlds_run -game cstrike -pidfile ogp_game_startup.pid +map de_dust +ip 1.2.3.4 +port 27015 +maxplayers 12
How can I add these lines to the full bash script?
Script used:
#!/bin/bash
function startServer(){
NUMSECONDS=`expr $(date +%s)`
until ./hlds_run -game cstrike -pidfile ogp_game_startup.pid +map de_dust +ip 1.2.3.4 +port 27015 +maxplayers 14 ; do
let DIFF=(`date +%s` - "$NUMSECONDS")
if [ "$DIFF" -gt 15 ]; then
NUMSECONDS=`expr $(date +%s)`
echo "Server './hlds_run -game cstrike -pidfile ogp_game_startup.pid +map de_dust +ip 1.2.3.4 +port 27015 +maxplayers 12 ' crashed with exit code $?. Respawning..." >&2
fi
sleep 3
done
let DIFF=(`date +%s` - "$NUMSECONDS")
if [ ! -e "SERVER_STOPPED" ] && [ "$DIFF" -gt 15 ]; then
startServer
fi
}
sudo ip netns exec controlnet sudo -u myuser -i && cd /home/ && startServer
The key issue here is that sudo -u myuser -i starts a new shell session. Further commands, like cd /home, aren't run inside that shell session; instead, they're run after the shell session exits.
Thus, you need to move startServer into the sudo command, instead of running it after the sudo command.
One way to do this is by passing the code that should be run under sudo via a heredoc:
#!/bin/bash
sudo ip netns exec controlnet sudo -u myuser bash -s <<'EOF'
startServer() {
local endTime startTime retval
while :; do
startTime=$SECONDS
./hlds_run -game cstrike -pidfile ogp_game_startup.pid +map de_dust +ip 1.2.3.4 +port 27015 +maxplayers 14; retval=$?
endTime=$SECONDS
if (( (endTime - startTime) > 15 )); then
echo "Server crashed with exit code $retval. Respawning..." >&2
else
echo "Server exited with status $retval after less than 15 seconds" >&2
echo " not attempting to respawn" >&2
return "$retval"
fi
sleep 3
done
}
cd /home/ || exit
startServer
EOF
What's important here is that we're no longer running sudo -i and expecting the rest of the script to be fed into the escalated shell implicitly; instead, we're running bash -s (which reads script text to run from stdin), and passing both the text of the startServer function and a command that invokes it within that stdin stream.
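As a stripped-down illustration of the same pattern, independent of the game-server bits (the user name is just an example):
sudo -u myuser bash -s <<'EOF'
whoami                 # prints myuser: we are inside the escalated shell
cd /home/ || exit      # runs inside that shell, not after it exits
echo "now in $PWD"
EOF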

download files parallely in a bash script

I am using the logic below to download 3 files from the array at once; only after all 3 have completed will the next 3 files be picked up.
parallel=3
downLoad() {
while (( "$#" )); do
for (( i=0; i<$parallel; i++ )); do
echo "downloading ${1}..."
curl -s -o ${filename}.tar.gz <download_URL> &
shift
done
wait
echo "#################################"
done
}
downLoad ${layers[@]}
But what I am expecting is that "at any point in time 3 downloads should be running" - I mean, suppose we send 3 file downloads to the background and one of the 3 completes very quickly because it is very small; I want another file from the queue to be sent for download right away.
COMPLETE SCRIPT:
#!/bin/bash
set -eu
reg="registry.hub.docker.com"
repo="hjd48"
image="redhat"
name="${repo}/${image}"
tag="latest"
parallel=3
# Get auth token
token=$( curl -s "https://auth.docker.io/token?service=registry.docker.io&scope=repository:${name}:pull" | jq -r .token )
# Get layers
resp=$(curl -s -H "Authorization: Bearer $token" "https://${reg}/v2/${name}/manifests/${tag}" | jq -r .fsLayers[].blobSum )
layers=( $( echo $resp | tr ' ' '\n' | sort -u ) )
prun() {
PIDS=()
while (( "$#" )); do
if ( kill -0 ${PIDS[@]} 2>/dev/null ; [[ $(( ${#PIDS[@]} - $? )) -lt $parallel ]])
then
echo "Download: ${1}.tar.gz"
curl -s -o $1.tar.gz -L -H "Authorization: Bearer $token" "https://${reg}/v2/${name}/blobs/${1}" &
PIDS+=($!)
shift
fi
done
wait
}
prun ${layers[@]}
If you do not mind using xargs then you can:
xargs -I xxx -P 3 sleep xxx < sleep
and the file sleep contains:
1
2
3
4
5
6
7
8
9
and if you watch the background with:
watch -n 1 -exec ps --forest -g -p your-Bash-pid
(sleep could be your array of links) then you will see that 3 jobs run in parallel and when one of the three is completed the next job is added. In fact, 3 jobs are always running until the end of the array.
sample output of watch(1):
12260 pts/3 S+ 0:00 \_ xargs -I xxx -P 3 sleep xxx
12263 pts/3 S+ 0:00 \_ sleep 1
12265 pts/3 S+ 0:00 \_ sleep 2
12267 pts/3 S+ 0:00 \_ sleep 3
xargs starts with 3 jobs and when one of them is finished it will add the next, which becomes:
12260 pts/3 S+ 0:00 \_ xargs -I xxx -P 3 sleep xxx
12265 pts/3 S+ 0:00 \_ sleep 2
12267 pts/3 S+ 0:00 \_ sleep 3
12269 pts/3 S+ 0:00 \_ sleep 4 # this one was added
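Applied to actual downloads instead of sleep, the same pattern looks roughly like this. A sketch only: urls.txt is an assumed file with one URL per line, and curl -O names each file after its remote name, so the URLs need a file component:
xargs -P 3 -n 1 curl -s -L -O < urls.txt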
I've done just this by using trap to handle SIGCHLD and start another transfer when one ends.
The difficult part is that once your script installs a SIGCHLD handler with that trap line, you can't create any child processes other than your transfer processes. For example, if your shell doesn't have a built-in echo, calling echo would spawn a child process that would cause you to start one more transfer when the echo process ends.
I don't have a copy available, but it was something like this:
startDownload() {
# only start another download if there are URLs left in
# in the array that haven't been downloaded yet
if [ -n "${urls[$fileno]}" ]; then
# start a curl download in the background and increment fileno
# so the next call downloads the next URL in the array
curl ... "${urls[$fileno]}" &
fileno=$((fileno+1))
fi
}
trap startDownload SIGCHLD
# start at file zero and set up an array
# of URLs to download
fileno=0
urls=...
parallel=3
# start the initial parallel downloads
# when one ends, the SIGCHLD will cause
# another one to be started if there are
# remaining URLs in the array
for (( i=0; i<$parallel; i++ )); do
startDownload
done
wait
That's not been tested at all, and probably has all kinds of errors.
I would read all provided filenames into three variables, and then process each stream separately, e.g.
PARALLEL=3
COUNTER=1
for FILENAME in "$@"
do
eval FILESTREAM${COUNTER}="\$FILESTREAM${COUNTER} \${FILENAME}"
COUNTER=`expr ${COUNTER} + 1`
if [ ${COUNTER} -gt ${PARALLEL} ]
then
COUNTER=1
fi
done
and now call the download function for each of the streams in parallel:
COUNTER=1
while [ ${COUNTER} -le ${PARALLEL} ]
do
eval "download \$FILESTREAM${COUNTER} &"
COUNTER=`expr ${COUNTER} + 1`
done
Besides implementing a parallel bash script from scratch, GNU parallel is an available tool which is quite suitable for performing these types of tasks.
parallel -j3 curl -s -o {}.tar.gz download_url ::: "${layers[@]}"
-j3 ensures a maximum of 3 jobs running at the same time
you can add an additional option --dry-run after parallel to make sure the built command is exactly as you want
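For example, the dry run only prints the commands that would be executed, without downloading anything:
parallel --dry-run -j3 curl -s -o {}.tar.gz download_url ::: "${layers[@]}"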

how to write a process-pool bash shell

I have more than 10 tasks to execute, and the system restricts it so that at most 4 tasks can run at the same time.
My task can be started like:
myprog taskname
How can I write a bash shell script to run these tasks? The most important thing is that when one task finishes, the script can start another immediately, keeping the running task count at 4 all the time.
Use xargs:
xargs -P <maximum-number-of-process-at-a-time> -n <arguments-per-process> <command>
Details here.
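For the task list in the question, that could look like this (a sketch; tasks.txt is an assumed file with one task name per line):
xargs -P 4 -n 1 myprog < tasks.txt   ## at most 4 copies of myprog at a time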
I chanced upon this thread while looking into writing my own process pool and particularly liked Brandon Horsley's solution, though I couldn't get the signals working right, so I took inspiration from Apache and decided to try a pre-fork model with a fifo as my job queue.
The following function is the function that the worker processes run when forked.
# \brief the worker function that is called when we fork off worker processes
# \param[in] id the worker ID
# \param[in] job_queue the fifo to read jobs from
# \param[in] result_log the temporary log file to write exit codes to
function _job_pool_worker()
{
local id=$1
local job_queue=$2
local result_log=$3
local line=
exec 7<> ${job_queue}
while [[ "${line}" != "${job_pool_end_of_jobs}" && -e "${job_queue}" ]]; do
# workers block on the exclusive lock to read the job queue
flock --exclusive 7
read line <${job_queue}
flock --unlock 7
# the worker should exit if it sees the end-of-job marker or run the
# job otherwise and save its exit code to the result log.
if [[ "${line}" == "${job_pool_end_of_jobs}" ]]; then
# write it one more time for the next sibling so that everyone
# will know we are exiting.
echo "${line}" >&7
else
_job_pool_echo "### _job_pool_worker-${id}: ${line}"
# run the job
{ ${line} ; }
# now check the exit code and prepend "ERROR" to the result log entry
# which we will use to count errors and then strip out later.
local result=$?
local status=
if [[ "${result}" != "0" ]]; then
status=ERROR
fi
# now write the error to the log, making sure multiple processes
# don't trample over each other.
exec 8<> ${result_log}
flock --exclusive 8
echo "${status}job_pool: exited ${result}: ${line}" >> ${result_log}
flock --unlock 8
exec 8>&-
_job_pool_echo "### _job_pool_worker-${id}: exited ${result}: ${line}"
fi
done
exec 7>&-
}
You can get a copy of my solution at Github. Here's a sample program using my implementation.
#!/bin/bash
. job_pool.sh
function foobar()
{
# do something
true
}
# initialize the job pool to allow 3 parallel jobs and echo commands
job_pool_init 3 0
# run jobs
job_pool_run sleep 1
job_pool_run sleep 2
job_pool_run sleep 3
job_pool_run foobar
job_pool_run foobar
job_pool_run /bin/false
# wait until all jobs complete before continuing
job_pool_wait
# more jobs
job_pool_run /bin/false
job_pool_run sleep 1
job_pool_run sleep 2
job_pool_run foobar
# don't forget to shut down the job pool
job_pool_shutdown
# check the $job_pool_nerrors for the number of jobs that exited non-zero
echo "job_pool_nerrors: ${job_pool_nerrors}"
Hope this helps!
Using GNU Parallel you can do:
cat tasks | parallel -j4 myprog
If you have 4 cores, you can even just do:
cat tasks | parallel myprog
From http://git.savannah.gnu.org/cgit/parallel.git/tree/README:
Full installation
Full installation of GNU Parallel is as simple as:
./configure && make && make install
Personal installation
If you are not root you can add ~/bin to your path and install in
~/bin and ~/share:
./configure --prefix=$HOME && make && make install
Or if your system lacks 'make' you can simply copy src/parallel
src/sem src/niceload src/sql to a dir in your path.
Minimal installation
If you just need parallel and do not have 'make' installed (maybe the
system is old or Microsoft Windows):
wget http://git.savannah.gnu.org/cgit/parallel.git/plain/src/parallel
chmod 755 parallel
cp parallel sem
mv parallel sem dir-in-your-$PATH/bin/
Test the installation
After this you should be able to do:
parallel -j0 ping -nc 3 ::: foss.org.my gnu.org freenetproject.org
This will send 3 ping packets to 3 different hosts in parallel and print
the output when they complete.
Watch the intro video for a quick introduction:
https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1
I would suggest writing four scripts, each one of which executes a certain number of tasks in series. Then write another script that starts the four scripts in parallel. For instance, if you have scripts, script1.sh, script2.sh, script3.sh, and script4.sh, you could have a script called headscript.sh like so.
#!/bin/sh
./script1.sh &
./script2.sh &
./script3.sh &
./script4.sh &
I found the best solution proposed in the A Foo Walks into a Bar... blog, using the built-in functionality of the well-known xargs tool.
First create a file commands.txt with the list of commands you want to execute:
myprog taskname1
myprog taskname2
myprog taskname3
myprog taskname4
...
myprog taskname123
and then pipe it to xargs like this to execute in 4 processes pool:
cat commands.txt | xargs -I CMD --max-procs=4 bash -c CMD
You can modify the number of processes.
Following @Parag Sardas' answer and the documentation linked, here's a quick script you might want to add to your .bash_aliases.
Relinking the doc link because it's worth a read
#!/bin/bash
# https://stackoverflow.com/a/19618159
# https://stackoverflow.com/a/51861820
#
# Example file contents:
# touch /tmp/a.txt
# touch /tmp/b.txt
if [ "$#" -eq 0 ]; then
echo "$0 <file> [max-procs=0]"
exit 1
fi
FILE=${1}
MAX_PROCS=${2:-0}
cat $FILE | while read line; do printf "%q\n" "$line"; done | xargs --max-procs=$MAX_PROCS -I CMD bash -c CMD
I.e.
./xargs-parallel.sh jobs.txt 4 (maximum of 4 processes, reading from jobs.txt)
You could probably do something clever with signals.
Note this is only to illustrate the concept, and thus not thoroughly tested.
#!/usr/local/bin/bash
this_pid="$$"
jobs_running=0
sleep_pid=
# Catch alarm signals to adjust the number of running jobs
trap 'decrement_jobs' SIGALRM
# When a job finishes, decrement the total and kill the sleep process
decrement_jobs()
{
jobs_running=$(($jobs_running - 1))
if [ -n "${sleep_pid}" ]
then
kill -s SIGKILL "${sleep_pid}"
sleep_pid=
fi
}
# Check to see if the max jobs are running, if so sleep until woken
launch_task()
{
if [ ${jobs_running} -gt 3 ]
then
(
while true
do
sleep 999
done
) &
sleep_pid=$!
wait ${sleep_pid}
fi
# Launch the requested task, signalling the parent upon completion
(
"$#"
kill -s SIGALRM "${this_pid}"
) &
jobs_running=$((${jobs_running} + 1))
}
# Launch all of the tasks, this can be in a loop, etc.
launch_task task1
launch_task task2
...
launch_task task99
This tested script runs 5 jobs at a time and will start a new job as soon as one finishes (due to the kill of the sleep 10.9 when we get a SIGCHLD). A simpler version of this could use direct polling (change the sleep 10.9 to sleep 1 and get rid of the trap).
#!/usr/bin/bash
set -o monitor
trap "pkill -P $$ -f 'sleep 10\.9' >&/dev/null" SIGCHLD
totaljobs=15
numjobs=5
worktime=10
curjobs=0
declare -A pidlist
dojob()
{
slot=$1
time=$(echo "$RANDOM * 10 / 32768" | bc -l)
echo Starting job $slot with args $time
sleep $time &
pidlist[$slot]=`jobs -p %%`
curjobs=$(($curjobs + 1))
totaljobs=$(($totaljobs - 1))
}
# start
while [ $curjobs -lt $numjobs -a $totaljobs -gt 0 ]
do
dojob $curjobs
done
# Poll for jobs to die, restarting while we have them
while [ $totaljobs -gt 0 ]
do
for ((i=0;$i < $curjobs;i++))
do
if ! kill -0 ${pidlist[$i]} >&/dev/null
then
dojob $i
break
fi
done
sleep 10.9 >&/dev/null
done
wait
The other answer about 4 shell scripts does not fully satisfy me, as it assumes that all tasks take approximately the same time and because it requires manual setup. But here is how I would improve it.
The main script will create symbolic links to executables following a certain naming convention. For example,
ln -s executable1 ./01-task.01
The first prefix is for sorting and the suffix identifies the batch (01-04).
Now we spawn 4 shell scripts that take the batch number as input and do something like this
for t in $(ls ./*-task.$batch | sort); do
"$t"
rm "$t"
done
Look at my implementation of job pool in bash: https://github.com/spektom/shell-utils/blob/master/jp.sh
For example, to run at most 3 processes of cURL when downloading from a lot of URLs, you can wrap your cURL commands as follows:
./jp.sh "My Download Pool" 3 curl http://site1/...
./jp.sh "My Download Pool" 3 curl http://site2/...
./jp.sh "My Download Pool" 3 curl http://site3/...
...
Here is my solution. The idea is quite simple. I create a fifo as a semaphore, where each line stands for an available resource. When reading the queue, the main process blocks if there is nothing left. And, we return the resource after the task is done by simply echoing anything to the queue.
function task() {
local task_no="$1"
# doing the actual task...
echo "Executing Task ${task_no}"
# which takes a long time
sleep 1
}
function execute_concurrently() {
local tasks="$1"
local ps_pool_size="$2"
# create an anonymous fifo as a Semaphore
local sema_fifo
sema_fifo="$(mktemp -u)"
mkfifo "${sema_fifo}"
exec 3<>"${sema_fifo}"
rm -f "${sema_fifo}"
# every 'x' stands for an available resource
for i in $(seq 1 "${ps_pool_size}"); do
echo 'x' >&3
done
for task_no in $(seq 1 "${tasks}"); do
read dummy <&3 # blocks until a resource is available
(
trap 'echo x >&3' EXIT # returns the resource on exit
task "${task_no}"
)&
done
wait # wait until all forked tasks have finished
}
execute_concurrently 10 4
The script above will run 10 tasks, 4 at a time concurrently. You can change the $(seq 1 "${tasks}") sequence to the actual task queue you want to run.
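For example, to drive it from a list of task arguments instead of a numeric sequence, the inner loop of execute_concurrently could read from a file. A sketch, where tasks.txt is an assumption and fd 3 is the semaphore fifo already set up above:
while read -r task_args; do
  read dummy <&3                ## block until a slot is free
  (
    trap 'echo x >&3' EXIT      ## return the slot when this task exits
    task "${task_args}"
  ) &
done < tasks.txt
wait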
I made my modifications based on methods introduced in this Writing a process pool in Bash.
#!/bin/bash
#set -e # this doesn't work here for some reason
POOL_SIZE=4 # number of workers running in parallel
#######################################################################
# populate jobs #
#######################################################################
declare -a jobs
for (( i = 1988; i < 2019; i++ )); do
jobs+=($i)
done
echo '################################################'
echo ' Launching jobs'
echo '################################################'
parallel() {
local proc procs jobs cur
jobs=("$#") # input jobs array
declare -a procs=() # processes array
cur=0 # current job idx
morework=true
while $morework; do
# if process array size < pool size, try forking a new proc
if [[ "${#procs[#]}" -lt "$POOL_SIZE" ]]; then
if [[ $cur -lt "${#jobs[#]}" ]]; then
proc=${jobs[$cur]}
echo "JOB ID = $cur; JOB = $proc."
###############
# do job here #
###############
sleep 3 &
# add to current running processes
procs+=("$!")
# move to the next job
((cur++))
else
morework=false
continue
fi
fi
for n in "${!procs[#]}"; do
kill -0 "${procs[n]}" 2>/dev/null && continue
# if process is not running anymore, remove from array
unset procs[n]
done
done
wait
}
parallel "${jobs[#]}"
xargs with -P and -L options does the job.
You can extract the idea from the example below:
#!/usr/bin/env bash
workers_pool_size=10
set -e
function doit {
cmds=""
for e in 4 8 16; do
for m in 1 2 3 4 5 6; do
cmd="python3 ./doit.py --m $m -e $e -m $m"
cmds="$cmd\n$cmds"
done
done
echo -e "All commands:\n$cmds"
echo "Workers pool size = $workers_pool_size"
echo -e "$cmds" | xargs -t -P $workers_pool_size -L 1 time > /dev/null
}
doit
#! /bin/bash
doSomething() {
<...>
}
getCompletedThreads() {
_runningThreads=("$@")
removableThreads=()
for pid in "${_runningThreads[@]}"; do
if ! ps -p $pid > /dev/null; then
removableThreads+=($pid)
fi
done
echo "${removableThreads[@]}"
}
releasePool() {
while [[ ${#runningThreads[@]} -eq $MAX_THREAD_NO ]]; do
echo "releasing"
removableThreads=( $(getCompletedThreads "${runningThreads[@]}") )
if [ ${#removableThreads[@]} -eq 0 ]; then
sleep 0.2
else
for removableThread in "${removableThreads[@]}"; do
runningThreads=( ${runningThreads[@]/$removableThread} )
done
echo "released"
fi
done
}
waitAllThreadComplete() {
while [[ ${#runningThreads[@]} -ne 0 ]]; do
removableThreads=( $(getCompletedThreads "${runningThreads[@]}") )
for removableThread in "${removableThreads[@]}"; do
runningThreads=( ${runningThreads[@]/$removableThread} )
done
if [ ${#removableThreads[@]} -eq 0 ]; then
sleep 0.2
fi
done
}
MAX_THREAD_NO=10
runningThreads=()
sequenceNo=0
for i in {1..36}; do
releasePool
((sequenceNo++))
echo "added $sequenceNo"
doSomething &
pid=$!
runningThreads+=($pid)
done
waitAllThreadComplete
