I have created this script, which currently takes a list of arguments from the command line. What I want instead is to let the user pass a single numerical value, which would then run the loop that many times. The script is currently run like this: ./testing.sh launch 1 2 3 4 5 6 7 8. How can I let the user pass a number like 8 and have the script loop over the IPs, instead of typing 1 2 3 4 5 6 7 8? Also, is there a better way to deal with the many IPs I have put in the script, for example mapping them and reading them from a file?
#!/bin/bash
#!/usr/bin/expect
ips=()
tarts=()
launch_tarts () {
local tart=$1
local ip=${ip[tart]}
echo " ---- Launching Tart $1 ---- "
sshpass -p "tart123" ssh -Y -X -L 5900:$ip:5901 tarts#$ip <<EOF1
export DISPLAY=:1
gnome-terminal -e "bash -c \"pwd; cd /home/tarts; pwd; ./launch_tarts.sh exec bash\""
exit
EOF1
}
kill_tarts () {
local tart=$1
local ip=${ip[tart]}
echo " ---- Killing Tart $1 ---- "
sshpass -p "tart123" ssh -tt -o StrictHostKeyChecking=no tarts#$ip <<EOF1
. ./tartsenvironfile.8.1.1.0
nohup yes | kill_tarts mcgdrv &
nohup yes | kill_tarts server &
pkill -f traf
pkill -f terminal-server
exit
EOF1
}
tarts_setup () {
local tart=$1
local ip=${ip[tart]}
echo " ---- Setting-Up Tart $1 ---- "
sshpass -p "root12" ssh -tt -o StrictHostKeyChecking=no root#$ip <<EOF1
pwd
nohup yes | /etc/rc.d/init.d/lifconfig
su tarts
nohup yes | vncserver
sleep 10
exit
exit
EOF1
}
ip[1]=10.171.0.10
ip[2]=10.171.0.11
ip[3]=10.171.0.12
ip[4]=10.171.0.13
ip[5]=10.171.0.14
ip[6]=10.171.0.15
ip[7]=10.171.0.16
ip[8]=10.171.0.17
ip[9]=10.171.0.18
ip[10]=10.171.0.19
ip[11]=10.171.0.20
ip[12]=10.171.0.21
ip[13]=10.171.0.100
ip[14]=10.171.0.101
ip[15]=10.171.0.102
ip[16]=10.171.0.103
ip[17]=10.171.0.104
ip[18]=10.171.0.105
ip[19]=10.171.0.106
ip[20]=10.171.0.107
case $1 in
kill) function=kill_tarts;;
launch) function=launch_tarts;;
setup) function=tarts_setup;;
*) exit 1;;
esac
shift
for tart in "$@"; do
($function $tart) &
ips+=(${ip[tart]})
# echo $ips
tarts+=(${tart[@]})
# echo $tarts
done
wait
Can someone guide please?
Try changing the bottom loop to: for ((tart=1; tart<=$2; tart++)), then use like: ./testing.sh launch 8.
You can put multiple variable declarations on one line, so you could split the ip list into two or three columns.
Or use mapfile: mapfile -t ip < ip-list. You will need to use tart - 1 for the array index though, like "${ip[tart-1]}", as the array will start at 0, not 1.
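For example, a minimal sketch combining those two suggestions (assuming the IPs live one per line in a file called ip-list, the count is passed as the second argument, and launch_tarts is kept as in your script):
mapfile -t ip < ip-list            # fills ip[0], ip[1], ... from the file
case $1 in
    launch) function=launch_tarts;;
    *) exit 1;;
esac
for ((tart=1; tart<=$2; tart++)); do
    ("$function" "$tart") &       # inside the function use ${ip[tart-1]}
done
wait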
You want the seq command:
for x in $(seq 5); do
echo $x
done
this will produce the output
1
2
3
4
5
Then just take the number of iterations you want as another parameter on the command line, and use that in place of the hard coded 5 in my example.
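For instance, a rough sketch of that idea, with the count taken from the second argument so the call becomes ./testing.sh launch 8:
count=$2                       # number passed on the command line
for x in $(seq "$count"); do
    echo "handling tart $x"    # replace with the real work, e.g. launch_tarts "$x"
done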
seq just generates a sequence of numbers. From the man page:
SYNOPSIS
seq [-w] [-f format] [-s string] [-t string] [first [incr]] last
DESCRIPTION
The seq utility prints a sequence of numbers, one per line (default), from first (default 1), to as near last as possible, in increments of incr (default 1). When first is larger than last the default incr is -1.
I have a file in which I have listed all the IP addresses. The file looks like the following:
[asad.javed@tarts16 ~]$ cat file.txt
10.171.0.201
10.171.0.202
10.171.0.203
10.171.0.204
10.171.0.205
10.171.0.206
10.171.0.207
10.171.0.208
I have been trying to loop over the IP addresses by doing the following:
launch_sipp () {
readarray -t sipps < file.txt
for i in "${!sipps[@]}";do
ip1=(${sipps[i]})
echo $ip1
sip=(${i[@]})
echo $sip
done
But when I try to access the array I get only the last IP address which is 10.171.0.208. This is how I am trying to access in the same function launch_sipp():
local sipp=$1
echo $sipp
Ip=(${ip1[*]})
echo $Ip
Currently I have IP addresses in the same script and I have other functions that are using those IPs:
launch_tarts () {
local tart=$1
local ip=${ip[tart]}
echo " ---- Launching Tart $1 ---- "
sshpass -p "tart123" ssh -Y -X -L 5900:$ip:5901 tarts#$ip <<EOF1
export DISPLAY=:1
gnome-terminal -e "bash -c \"pwd; cd /home/tarts; pwd; ./launch_tarts.sh exec bash\""
exit
EOF1
}
kill_tarts () {
local tart=$1
local ip=${ip[tart]}
echo " ---- Killing Tart $1 ---- "
sshpass -p "tart123" ssh -tt -o StrictHostKeyChecking=no tarts#$ip <<EOF1
. ./tartsenvironfile.8.1.1.0
nohup yes | kill_tarts mcgdrv &
nohup yes | kill_tarts server &
pkill -f traf
pkill -f terminal-server
exit
EOF1
}
ip[1]=10.171.0.10
ip[2]=10.171.0.11
ip[3]=10.171.0.12
ip[4]=10.171.0.13
ip[5]=10.171.0.14
case $1 in
kill) function=kill_tarts;;
launch) function=launch_tarts;;
*) exit 1;;
esac
shift
for ((tart=1; tart<=$1; tart++)); do
($function $tart) &
ips=(${ip[tart]})
tarts+=(${tart[@]})
done
wait
How can I use a different list of IPs, read from a file, for each function that serves a different purpose?
How about using GNU parallel? It's an incredibly powerful, very popular, free Linux tool that is well worth knowing and easy to install.
First, here's a basic example of parallel usage:
$ parallel echo {} :::: list_of_ips.txt
# The four colons function as file input syntax.†
10.171.0.202
10.171.0.201
10.171.0.203
10.171.0.204
10.171.0.205
10.171.0.206
10.171.0.207
10.171.0.208
†(Specific to parallel; see the parallel usage cheatsheet here.)
But you can replace echo with just about any series of commands, however complex, including calls to other scripts. parallel loops through the input it receives and performs (in parallel) the same operation on each input item.
More specific to your question, you could simply replace echo with a call to your script.
Your script would then no longer need to handle any looping through IPs itself, and could instead be designed around a single IP input. parallel will handle running the program in parallel (you can set the number of concurrent jobs with the option -j n for any integer n).*
*By default parallel sets the number of jobs to the number of vCPUs it automatically determines your machine has available.
$ parallel process_ip.sh :::: list_of_ips.txt
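For example, assuming process_ip.sh is your per-IP script, limiting the pool to 4 concurrent jobs would look something like this:
$ parallel -j 4 ./process_ip.sh {} :::: list_of_ips.txt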
In pure Bash:
#!/bin/bash
while read ip; do
echo "$ip"
# ...
done < file.txt
Or in parallel:
#!/bin/bash
while read ip; do
(
sleep "0.$RANDOM" # random execution time
echo "$ip"
# ...
) &
done < file.txt
wait
I would like to expand a little more on the "Bash - How to pass arguments to a script that is read via standard input" post.
I would like to create a script that takes standard input and runs it remotely while passing arguments to it.
Simplified contents of the script that I'm building:
ssh server_name bash <&0
How do I take the following method of accepting arguments and apply it to my script?
cat script.sh | bash /dev/stdin arguments
Maybe I am doing this incorrectly, please provide alternate solutions as well.
Try this:
cat script.sh | ssh some_server bash -s - <arguments>
ssh shouldn't make a difference:
$ cat do_x
#!/bin/sh
arg1=$1
arg2=$2
all_cmdline=$*
read arg2_from_stdin
echo "arg1: ${arg1}"
echo "arg2: ${arg2}"
echo "all_cmdline: ${all_cmdline}"
echo "arg2_from_stdin: ${arg2_from_stdin}"
$ echo 'a b c' > some_file
$ ./do_x 1 2 3 4 5 < some_file
arg1: 1
arg2: 2
all_cmdline: 1 2 3 4 5
arg2_from_stdin: a b c
$ ssh some-server do_x 1 2 3 4 5 < some_file
arg1: 1
arg2: 2
all_cmdline: 1 2 3 4 5
arg2_from_stdin: a b c
This variant on ccarton's answer also seems to work well:
ssh some_server bash -s - < script.sh <arguments>
I am using the PBS job scheduler on my cluster. In bash, I would like to monitor the job status, and once the job is done I would like to copy the results to a certain location (/data/myfolder/).
My qstat output looks like this:
JobID Username Queue Jobname SessID NDS TSK Memory Time Status
----------------------------------------------------------------
717.XXXXXX user XXXX SS 2323283 1 24 122gb -- E
Thanks in advance
There is a script here that does this (for SGE). I started to excerpt just the relevant parts for you, but it will probably be easier for you to start with the full script and just insert the qsub commands inside the submit_job function, and then put the code you want for copying the results after the wait_job_finish command in the script. You can remove the log printing at the end if you want.
#!/bin/bash
# this script will submit a qsub job and check on host information for the cluster
# node which it ends up running on
# ~~~~~ CUSTOM FUNCTIONS ~~~~~ #
submit_job () {
local job_name="$1"
qsub -j y -N "$job_name" -o :${PWD}/ -e :${PWD}/ <<E0F
set -x
hostname
cat /etc/hosts
python -c "import socket; print socket.gethostbyname(socket.gethostname())"
# sleep 5000
E0F
}
wait_job_start () {
local job_id="$1"
printf "waiting for job to start"
while ! qstat | grep "$job_id" | grep -Eq '[[:space:]]r[[:space:]]'
do
printf "."
sleep 1
done
printf "\n\n"
local node_name="$(get_node_name "$job_id")"
printf "Job is running on node $node_name \n\n"
}
wait_job_finish () {
local job_id="$1"
printf "waiting for job to finish"
while qstat | grep -q "$job_id"
do
printf "."
sleep 1
done
printf "\n\n"
}
check_for_job_submission () {
local job_id="$1"
if qstat | grep -q "$job_id" ; then
echo "it's there"
else
echo "not there"
fi
}
get_node_name () {
local job_id="$1"
qstat | grep "$job_id" | sed -e 's|^.*[[:space:]]\([a-zA-Z0-9.]*#[^ ]*\).*$|\1|g'
}
# ~~~~~ RUN ~~~~~ #
printf "Submitting cluster job to get node hostname and IP\n\n"
job_name="get_node_hostnames"
job_id="$(submit_job "$job_name")" # Your job 832606 ("get_node_hostnames") has been submitted
job_id="$(echo "$job_id" | sed -e 's|.*[[:space:]]\([[:digit:]]*\)[[:space:]].*|\1|g' )"
job_stdout_log="${job_name}.o${job_id}"
printf "Job ID:\t%s\nJob Name:\t%s\n\n" "$job_id" "$job_name"
wait_job_start "$job_id"
wait_job_finish "$job_id"
printf "\n\nReading log file ${job_stdout_log}\n\n"
[ -f "$job_stdout_log" ] && cat "$job_stdout_log"
printf "\n\nRemoving log file ${job_stdout_log}\n\n"
[ -f "$job_stdout_log" ] && rm -f "$job_stdout_log"
Sidenote: If you like Python, there is a slightly more robust equivalent here
You'll probably have to make some little tweaks to both to adjust them for your PBS system, since this was written for SGE.
You can just look for " C " with grep, but you could also just use -o [hostname:]path to stream to the final destination, as long as you have your ssh keys set up from the node for your POSIX account.
If you end up doing grep, you should be a good citizen and limit your check frequency to once or twice a minute, so as not to contribute to server spam, which can impact performance.
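As a rough sketch of that grep/poll approach (the job id, result path, and destination are placeholders, and it assumes qstat lists the Status column last, as in your output above, showing C once the job has completed):
job_id=717                              # placeholder: the numeric id from qsub
# wait until the job either leaves the queue or reaches the completed (C) state
while status=$(qstat | awk -v j="$job_id" '$1 ~ j {print $NF}') \
      && [ -n "$status" ] && [ "$status" != "C" ]; do
    sleep 60                            # poll once a minute, as advised above
done
cp -r /path/to/results /data/myfolder/  # copy the results once the job is done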
I'm not used to writing code in bash, but I'm teaching myself. I'm trying to create a script that will query info from the process list. I've done that, but I want to take it further and make it so:
The script runs with one set of commands if A OS is present.
The script runs with a different set of commands if B OS is present.
Here's what I have so far. It works on my Centos distro but won't work on my Ubuntu. Any help is greatly appreciated.
#!/bin/bash
pid=$(ps -eo pmem,pid | sort -nr -k 1 | cut -d " " -f 2 | head -1)
howmany=$(lsof -l -n -p $pid | wc -l)
nameofprocess=$(ps -eo pmem,fname | sort -nr -k 1 | cut -d " " -f 2 | head -1)
percent=$(ps -eo pmem,pid,fname | sort -k 1 -nr | head -1 | cut -d " " -f 1)
lsof -l -n -p $pid > ~/`date "+%Y-%m-%d-%H%M"`.process.log 2>&1
echo " "
echo "$nameofprocess has $howmany files open, and is using $percent"%" of memory."
echo "-----------------------------------"
echo "A log has been created in your home directory"
echo "-----------------------------------"
echo " "
echo ""$USER", do you want to terminate? (y/n)"
read yn
case $yn in
[yY] | [yY][Ee][Ss] )
kill -15 $pid
;;
[nN] | [n|N][O|o] )
echo "Not killing. Powering down."
echo "......."
sleep 2
;;
*) echo "Does not compute"
;;
esac
Here's my version of your script. It works with Ubuntu and Debian. It's probably safer than yours in some regards (there is clearly a bug in yours when a process takes more than 10% of memory, due to your awkward cut). Moreover, your ps calls are not "atomic", so things can change between the different calls of ps.
#!/bin/bash
read percent pid nameofprocess < <(ps -eo pmem,pid,fname --sort=-pmem h)
mapfile -t openfiles < <(lsof -l -n -p $pid)
howmany=${#openfiles[@]}
printf '%s\n' "${openfiles[@]}" > ~/$(date "+%Y-%m-%d-%H%M.process.log")
cat <<EOF
$nameofprocess has $howmany files open, and is using $percent% of memory.
-----------------------------------
A log has been created in your home directory
-----------------------------------
EOF
read -p "$USER, do you want to terminate? (y/n) "
case $REPLY in
[yY] | [yY][Ee][Ss] )
kill -15 $pid
;;
[nN] | [n|N][O|o] )
echo "Not killing. Powering down."
echo "......."
sleep 2
;;
*) echo "Does not compute"
;;
esac
First, check that your version of ps has the --sort flag and the h option:
--sort=-pmem tells ps to sort wrt decreasing pmem
h tells ps to not show any header
All this is given to the read bash builtin, which reads space-separated fields, here the fields pmem, pid, fname and puts these values in the corresponding variables percent, pid and nameofprocess.
The mapfile command reads standard input (here the output of the lsof command) and puts each line in an array element. The size of this array is computed by the line howmany=${#openfiles[@]}. The output of lsof, as stored in the array openfiles, is written to the corresponding file.
Then, instead of the many echos, we use a cat <<EOF, and read is used with the -p (prompt) option.
I don't know if this really answers your question, but at least you have a well-written bash script with fewer useless command calls (up to your case statement, you call 16 processes; I only call 4). Moreover, after the first ps call things can change in your script (even though it's very unlikely to happen), but not in mine.
You might also like the following which doesn't put the output of lsof in an array, but uses an extra wc command:
#!/bin/bash
read percent pid nameofprocess < <(ps -eo pmem,pid,fname --sort=-pmem h)
logfilename="~/$(date "+%Y-%m-%d-%H%M.process.log")
lsof -l -n -p $pid > "$logfilename"
howmany=$(wc -l < "$logfilename")
cat <<EOF
$nameofprocess has $howmany files open, and is using $percent% of memory.
-----------------------------------
A log has been created in your home directory ($logfilename)
-----------------------------------
EOF
read -p "$USER, do you want to terminate? (y/n) "
case $REPLY in
[yY] | [yY][Ee][Ss] )
kill -15 $pid
;;
[nN] | [n|N][O|o] )
echo "Not killing. Powering down."
echo "......."
sleep 2
;;
*) echo "Does not compute"
;;
esac
You could achieve this for example by (update)
#!/bin/bash
# place distribution independent code here
# dist=$(lsb_release -is)
if [[ -f /etc/redhat-release ]];
then # this is a Red Hat based distribution like centos, fedora, ...
dist="redhat"
elif [[ -f /etc/issue.net ]];
then
# dist=$(cat /etc/issue.net | cut -d' ' -f1) # debian, ubuntu, ...
dist="ubuntu"
else
dist="unknown"
fi
if [[ $dist == "ubuntu" ]];
then
# use your ubuntu command set
elif [[ $dist == "redhead" ]];
then
# use your centos command set
else
# do some magic here
fi
# place distribution independent code here
I have more than 10 tasks to execute, and the system restricts me to at most 4 tasks running at the same time.
My task can be started like:
myprog taskname
How can I write a bash shell script to run these tasks? The most important thing is that when one task finishes, the script should start another immediately, keeping the number of running tasks at 4 all the time.
Use xargs:
xargs -P <maximum-number-of-process-at-a-time> -n <arguments-per-process> <command>
Details here.
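For instance, with the myprog/taskname setup from the question, something like this should keep at most 4 tasks running at once (the task names are just placeholders):
printf '%s\n' task1 task2 task3 task4 task5 task6 task7 task8 task9 task10 \
    | xargs -P 4 -n 1 myprog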
I chanced upon this thread while looking into writing my own process pool and particularly liked Brandon Horsley's solution, though I couldn't get the signals working right, so I took inspiration from Apache and decided to try a pre-fork model with a fifo as my job queue.
The following function is the function that the worker processes run when forked.
# \brief the worker function that is called when we fork off worker processes
# \param[in] id the worker ID
# \param[in] job_queue the fifo to read jobs from
# \param[in] result_log the temporary log file to write exit codes to
function _job_pool_worker()
{
local id=$1
local job_queue=$2
local result_log=$3
local line=
exec 7<> ${job_queue}
while [[ "${line}" != "${job_pool_end_of_jobs}" && -e "${job_queue}" ]]; do
# workers block on the exclusive lock to read the job queue
flock --exclusive 7
read line <${job_queue}
flock --unlock 7
# the worker should exit if it sees the end-of-job marker or run the
# job otherwise and save its exit code to the result log.
if [[ "${line}" == "${job_pool_end_of_jobs}" ]]; then
# write it one more time for the next sibling so that everyone
# will know we are exiting.
echo "${line}" >&7
else
_job_pool_echo "### _job_pool_worker-${id}: ${line}"
# run the job
{ ${line} ; }
# now check the exit code and prepend "ERROR" to the result log entry
# which we will use to count errors and then strip out later.
local result=$?
local status=
if [[ "${result}" != "0" ]]; then
status=ERROR
fi
# now write the error to the log, making sure multiple processes
# don't trample over each other.
exec 8<> ${result_log}
flock --exclusive 8
echo "${status}job_pool: exited ${result}: ${line}" >> ${result_log}
flock --unlock 8
exec 8>&-
_job_pool_echo "### _job_pool_worker-${id}: exited ${result}: ${line}"
fi
done
exec 7>&-
}
You can get a copy of my solution at Github. Here's a sample program using my implementation.
#!/bin/bash
. job_pool.sh
function foobar()
{
# do something
true
}
# initialize the job pool to allow 3 parallel jobs and echo commands
job_pool_init 3 0
# run jobs
job_pool_run sleep 1
job_pool_run sleep 2
job_pool_run sleep 3
job_pool_run foobar
job_pool_run foobar
job_pool_run /bin/false
# wait until all jobs complete before continuing
job_pool_wait
# more jobs
job_pool_run /bin/false
job_pool_run sleep 1
job_pool_run sleep 2
job_pool_run foobar
# don't forget to shut down the job pool
job_pool_shutdown
# check the $job_pool_nerrors for the number of jobs that exited non-zero
echo "job_pool_nerrors: ${job_pool_nerrors}"
Hope this helps!
Using GNU Parallel you can do:
cat tasks | parallel -j4 myprog
If you have 4 cores, you can even just do:
cat tasks | parallel myprog
From http://git.savannah.gnu.org/cgit/parallel.git/tree/README:
Full installation
Full installation of GNU Parallel is as simple as:
./configure && make && make install
Personal installation
If you are not root you can add ~/bin to your path and install in
~/bin and ~/share:
./configure --prefix=$HOME && make && make install
Or if your system lacks 'make' you can simply copy src/parallel
src/sem src/niceload src/sql to a dir in your path.
Minimal installation
If you just need parallel and do not have 'make' installed (maybe the
system is old or Microsoft Windows):
wget http://git.savannah.gnu.org/cgit/parallel.git/plain/src/parallel
chmod 755 parallel
cp parallel sem
mv parallel sem dir-in-your-$PATH/bin/
Test the installation
After this you should be able to do:
parallel -j0 ping -nc 3 ::: foss.org.my gnu.org freenetproject.org
This will send 3 ping packets to 3 different hosts in parallel and print
the output when they complete.
Watch the intro video for a quick introduction:
https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1
I would suggest writing four scripts, each one of which executes a certain number of tasks in series. Then write another script that starts the four scripts in parallel. For instance, if you have scripts, script1.sh, script2.sh, script3.sh, and script4.sh, you could have a script called headscript.sh like so.
#!/bin/sh
./script1.sh &
./script2.sh &
./script3.sh &
./script4.sh &
I found the best solution proposed in the "A Foo Walks into a Bar..." blog, using the built-in functionality of the well-known xargs tool.
First create a file commands.txt with the list of commands you want to execute:
myprog taskname1
myprog taskname2
myprog taskname3
myprog taskname4
...
myprog taskname123
and then pipe it to xargs like this to execute them in a pool of 4 processes:
cat commands.txt | xargs -I CMD --max-procs=4 bash -c CMD
You can modify the number of processes as needed.
Following @Parag Sardas' answer and the documentation linked there, here's a quick script you might want to add to your .bash_aliases.
Re-linking the docs because they're worth a read.
#!/bin/bash
# https://stackoverflow.com/a/19618159
# https://stackoverflow.com/a/51861820
#
# Example file contents:
# touch /tmp/a.txt
# touch /tmp/b.txt
if [ "$#" -eq 0 ]; then
echo "$0 <file> [max-procs=0]"
exit 1
fi
FILE=${1}
MAX_PROCS=${2:-0}
cat $FILE | while read line; do printf "%q\n" "$line"; done | xargs --max-procs=$MAX_PROCS -I CMD bash -c CMD
I.e.
./xargs-parallel.sh jobs.txt 4    # maximum of 4 processes, reading commands from jobs.txt
You could probably do something clever with signals.
Note this is only to illustrate the concept, and thus not thoroughly tested.
#!/usr/local/bin/bash
this_pid="$$"
jobs_running=0
sleep_pid=
# Catch alarm signals to adjust the number of running jobs
trap 'decrement_jobs' SIGALRM
# When a job finishes, decrement the total and kill the sleep process
decrement_jobs()
{
jobs_running=$(($jobs_running - 1))
if [ -n "${sleep_pid}" ]
then
kill -s SIGKILL "${sleep_pid}"
sleep_pid=
fi
}
# Check to see if the max jobs are running, if so sleep until woken
launch_task()
{
if [ ${jobs_running} -gt 3 ]
then
(
while true
do
sleep 999
done
) &
sleep_pid=$!
wait ${sleep_pid}
fi
# Launch the requested task, signalling the parent upon completion
(
"$#"
kill -s SIGALRM "${this_pid}"
) &
jobs_running=$((${jobs_running} + 1))
}
# Launch all of the tasks, this can be in a loop, etc.
launch_task task1
launch_task task2
...
launch_task task99
This tested script runs 5 jobs at a time and will start a new job as soon as one finishes (due to the kill of the sleep 10.9 when we get a SIGCHLD). A simpler version of this could use direct polling (change the sleep 10.9 to sleep 1 and get rid of the trap).
#!/usr/bin/bash
set -o monitor
trap "pkill -P $$ -f 'sleep 10\.9' >&/dev/null" SIGCHLD
totaljobs=15
numjobs=5
worktime=10
curjobs=0
declare -A pidlist
dojob()
{
slot=$1
time=$(echo "$RANDOM * 10 / 32768" | bc -l)
echo Starting job $slot with args $time
sleep $time &
pidlist[$slot]=`jobs -p %%`
curjobs=$(($curjobs + 1))
totaljobs=$(($totaljobs - 1))
}
# start
while [ $curjobs -lt $numjobs -a $totaljobs -gt 0 ]
do
dojob $curjobs
done
# Poll for jobs to die, restarting while we have them
while [ $totaljobs -gt 0 ]
do
for ((i=0;$i < $curjobs;i++))
do
if ! kill -0 ${pidlist[$i]} >&/dev/null
then
dojob $i
break
fi
done
sleep 10.9 >&/dev/null
done
wait
The other answer about 4 shell scripts does not fully satisfy me, as it assumes that all tasks take approximately the same time and because it requires manual setup. But here is how I would improve it.
The main script will create symbolic links to executables following a certain naming convention. For example,
ln -s executable1 ./01-task.01
The prefix is for sorting and the suffix identifies the batch (01-04).
Now we spawn 4 shell scripts that take the batch number as input and do something like this (a sketch of the spawning itself follows the loop below):
for t in $(ls ./*-task.$batch | sort); do
  "$t"
  rm "$t"
done
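A rough sketch of how the main script might create those links and spawn the four batch workers (worker.sh stands for the loop above, and the executable names are placeholders):
i=0
for exe in executable1 executable2 executable3; do     # your real task programs
    batch=$(( i % 4 + 1 ))
    ln -s "$PWD/$exe" "$(printf './%02d-task.%02d' "$i" "$batch")"
    i=$(( i + 1 ))
done
for batch in 1 2 3 4; do
    ./worker.sh "$batch" &                             # worker.sh runs the loop above
done
wait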
Look at my implementation of job pool in bash: https://github.com/spektom/shell-utils/blob/master/jp.sh
For example, to run at most 3 processes of cURL when downloading from a lot of URLs, you can wrap your cURL commands as follows:
./jp.sh "My Download Pool" 3 curl http://site1/...
./jp.sh "My Download Pool" 3 curl http://site2/...
./jp.sh "My Download Pool" 3 curl http://site3/...
...
Here is my solution. The idea is quite simple. I create a fifo as a semaphore, where each line stands for an available resource. When reading the queue, the main process blocks if there is nothing left. And, we return the resource after the task is done by simply echoing anything to the queue.
function task() {
local task_no="$1"
# doing the actual task...
echo "Executing Task ${task_no}"
# which takes a long time
sleep 1
}
function execute_concurrently() {
local tasks="$1"
local ps_pool_size="$2"
# create an anonymous fifo as a Semaphore
local sema_fifo
sema_fifo="$(mktemp -u)"
mkfifo "${sema_fifo}"
exec 3<>"${sema_fifo}"
rm -f "${sema_fifo}"
# every 'x' stands for an available resource
for i in $(seq 1 "${ps_pool_size}"); do
echo 'x' >&3
done
for task_no in $(seq 1 "${tasks}"); do
read dummy <&3 # blocks until a resource is available
(
trap 'echo x >&3' EXIT # returns the resource on exit
task "${task_no}"
)&
done
wait # wait until all forked tasks have finished
}
execute_concurrently 10 4
The script above will run 10 tasks and 4 each time concurrently. You can change the $(seq 1 "${tasks}") sequence to the actual task queue you want to run.
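For example, if the tasks live one per line in a file, the inner loop could read commands from it instead of the seq counter (tasks.txt is a placeholder; this reuses the fd 3 semaphore set up in execute_concurrently above):
while read -r task_cmd; do
    read dummy <&3              # blocks until a resource is available
    (
        trap 'echo x >&3' EXIT  # return the resource on exit
        $task_cmd               # run one command from the list
    ) &
done < tasks.txt
wait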
I made my modifications based on the methods introduced in Writing a process pool in Bash.
#!/bin/bash
#set -e # this doesn't work here for some reason
POOL_SIZE=4 # number of workers running in parallel
#######################################################################
# populate jobs #
#######################################################################
declare -a jobs
for (( i = 1988; i < 2019; i++ )); do
jobs+=($i)
done
echo '################################################'
echo ' Launching jobs'
echo '################################################'
parallel() {
local proc procs jobs cur
jobs=("$#") # input jobs array
declare -a procs=() # processes array
cur=0 # current job idx
morework=true
while $morework; do
# if process array size < pool size, try forking a new proc
if [[ "${#procs[#]}" -lt "$POOL_SIZE" ]]; then
if [[ $cur -lt "${#jobs[#]}" ]]; then
proc=${jobs[$cur]}
echo "JOB ID = $cur; JOB = $proc."
###############
# do job here #
###############
sleep 3 &
# add to current running processes
procs+=("$!")
# move to the next job
((cur++))
else
morework=false
continue
fi
fi
for n in "${!procs[#]}"; do
kill -0 "${procs[n]}" 2>/dev/null && continue
# if process is not running anymore, remove from array
unset procs[n]
done
done
wait
}
parallel "${jobs[#]}"
xargs with -P and -L options does the job.
You can extract the idea from the example below:
#!/usr/bin/env bash
workers_pool_size=10
set -e
function doit {
cmds=""
for e in 4 8 16; do
for m in 1 2 3 4 5 6; do
cmd="python3 ./doit.py --m $m -e $e -m $m"
cmds="$cmd\n$cmds"
done
done
echo -e "All commands:\n$cmds"
echo "Workers pool size = $workers_pool_size"
echo -e "$cmds" | xargs -t -P $workers_pool_size -L 1 time > /dev/null
}
doit
#! /bin/bash
doSomething() {
<...>
}
getCompletedThreads() {
_runningThreads=("$#")
removableThreads=()
for pid in "${_runningThreads[#]}"; do
if ! ps -p $pid > /dev/null; then
removableThreads+=($pid)
fi
done
echo "$removableThreads"
}
releasePool() {
while [[ ${#runningThreads[@]} -eq $MAX_THREAD_NO ]]; do
echo "releasing"
removableThreads=( $(getCompletedThreads "${runningThreads[@]}") )
if [ ${#removableThreads[@]} -eq 0 ]; then
sleep 0.2
else
for removableThread in "${removableThreads[@]}"; do
runningThreads=( ${runningThreads[@]/$removableThread} )
done
echo "released"
fi
done
}
waitAllThreadComplete() {
while [[ ${#runningThreads[@]} -ne 0 ]]; do
removableThreads=( $(getCompletedThreads "${runningThreads[@]}") )
for removableThread in "${removableThreads[@]}"; do
runningThreads=( ${runningThreads[@]/$removableThread} )
done
if [ ${#removableThreads[@]} -eq 0 ]; then
sleep 0.2
fi
done
}
MAX_THREAD_NO=10
runningThreads=()
sequenceNo=0
for i in {1..36}; do
releasePool
((sequenceNo++))
echo "added $sequenceNo"
doSomething &
pid=$!
runningThreads+=($pid)
done
waitAllThreadComplete