Looping over IP addresses from a file using bash array - bash

I have a file in which I have given all the IP addresses. The file looks like the following:
[asad.javed@tarts16 ~]#cat file.txt
10.171.0.201
10.171.0.202
10.171.0.203
10.171.0.204
10.171.0.205
10.171.0.206
10.171.0.207
10.171.0.208
I have been trying to loop over the IP addresses by doing the following:
launch_sipp () {
readarray -t sipps < file.txt
for i in "${!sipps[@]}";do
ip1=(${sipps[i]})
echo $ip1
sip=(${i[@]})
echo $sip
done
But when I try to access the array I get only the last IP address which is 10.171.0.208. This is how I am trying to access in the same function launch_sipp():
local sipp=$1
echo $sipp
Ip=(${ip1[*]})
echo $Ip
Currently I have IP addresses in the same script and I have other functions that are using those IPs:
launch_tarts () {
local tart=$1
local ip=${ip[tart]}
echo " ---- Launching Tart $1 ---- "
sshpass -p "tart123" ssh -Y -X -L 5900:$ip:5901 tarts@$ip <<EOF1
export DISPLAY=:1
gnome-terminal -e "bash -c \"pwd; cd /home/tarts; pwd; ./launch_tarts.sh exec bash\""
exit
EOF1
}
kill_tarts () {
local tart=$1
local ip=${ip[tart]}
echo " ---- Killing Tart $1 ---- "
sshpass -p "tart123" ssh -tt -o StrictHostKeyChecking=no tarts@$ip <<EOF1
. ./tartsenvironfile.8.1.1.0
nohup yes | kill_tarts mcgdrv &
nohup yes | kill_tarts server &
pkill -f traf
pkill -f terminal-server
exit
EOF1
}
ip[1]=10.171.0.10
ip[2]=10.171.0.11
ip[3]=10.171.0.12
ip[4]=10.171.0.13
ip[5]=10.171.0.14
case $1 in
kill) function=kill_tarts;;
launch) function=launch_tarts;;
*) exit 1;;
esac
shift
for ((tart=1; tart<=$1; tart++)); do
($function $tart) &
ips=(${ip[tart]})
tarts+=(${tart[@]})
done
wait
How can I use a different list of IPs, read from a file, for a function created for a different purpose?

How about using GNU parallel? It is a powerful, popular, free Linux tool that is easy to install.
First, here is a basic usage example:
$ parallel echo {} :::: list_of_ips.txt
# The four colons function as file input syntax.†
10.171.0.202
10.171.0.201
10.171.0.203
10.171.0.204
10.171.0.205
10.171.0.206
10.171.0.207
10.171.0.208
†(Specific to parallel; see the parallel usage cheatsheet here.)
You can replace echo with just about any series of commands or calls to other scripts, however complex. parallel loops through the input it receives and performs the same operation on each input, in parallel.
More specific to your question, you could replace echo with a call to your script.
Your script would then no longer need to loop over the IPs itself; it can be written to handle a single IP as input. parallel takes care of running it in parallel (you can set the number of concurrent jobs with the option -j n for any integer n).*
*By default, parallel sets the number of jobs to the number of CPU cores (vCPUs) it detects on your machine.
$ parallel process_ip.sh :::: list_of_ips.txt

In pure Bash:
#!/bin/bash
while IFS= read -r ip; do
echo "$ip"
# ...
done < file.txt
Or in parallel:
#!/bin/bash
while IFS= read -r ip; do
(
sleep "0.$RANDOM" # random execution time
echo "$ip"
# ...
) &
done < file.txt
wait
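To come back to the original question (a different list of IPs for a function with a different purpose), here is a minimal sketch, not a definitive implementation. It assumes each list lives in its own file with one IP per line; the file contents, the array names tart_ips and sipp_ips, and the body of launch_sipp are made up for illustration. Each file is loaded into its own array with readarray, and the function looks up its IP by position.

```shell
#!/usr/bin/env bash
# Hypothetical input files, one IP address per line.
tart_file=$(mktemp)
sipp_file=$(mktemp)
printf '%s\n' 10.171.0.10 10.171.0.11 > "$tart_file"
printf '%s\n' 10.171.0.201 10.171.0.202 10.171.0.203 > "$sipp_file"

# Load each list into its own array; index 0 holds the first line.
readarray -t tart_ips < "$tart_file"
readarray -t sipp_ips < "$sipp_file"
rm -f "$tart_file" "$sipp_file"

# Placeholder body: the real function would start sipp against this IP.
launch_sipp () {
    local n=$1                    # 1-based position, as in the question
    local ip=${sipp_ips[n-1]}     # arrays filled by readarray start at 0
    echo "sipp $n -> $ip"
}

for i in "${!sipp_ips[@]}"; do
    launch_sipp "$((i + 1))"
done
```

The tart functions can keep using their own array (tart_ips here) in the same way, so the two lists never overwrite each other.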

Related

How to choose any number of elements through user input in bash?

I have created this script, which currently takes a list of arguments from the command line. What I want is to let the user pass a single numerical value and have the loop run that many times. The script is currently run like this, for example: ./testing.sh launch 1 2 3 4 5 6 7 8. How can I let the user pass a value like 8 and then loop over the IPs, instead of passing 1 2 3 4 5 6 7 8? Also, is there a better way to deal with the many IPs I have hard-coded in the script, for example by mapping them and reading them from a file?
#!/bin/bash
#!/usr/bin/expect
ips=()
tarts=()
launch_tarts () {
local tart=$1
local ip=${ip[tart]}
echo " ---- Launching Tart $1 ---- "
sshpass -p "tart123" ssh -Y -X -L 5900:$ip:5901 tarts@$ip <<EOF1
export DISPLAY=:1
gnome-terminal -e "bash -c \"pwd; cd /home/tarts; pwd; ./launch_tarts.sh exec bash\""
exit
EOF1
}
kill_tarts () {
local tart=$1
local ip=${ip[tart]}
echo " ---- Killing Tart $1 ---- "
sshpass -p "tart123" ssh -tt -o StrictHostKeyChecking=no tarts@$ip <<EOF1
. ./tartsenvironfile.8.1.1.0
nohup yes | kill_tarts mcgdrv &
nohup yes | kill_tarts server &
pkill -f traf
pkill -f terminal-server
exit
EOF1
}
tarts_setup () {
local tart=$1
local ip=${ip[tart]}
echo " ---- Setting-Up Tart $1 ---- "
sshpass -p "root12" ssh -tt -o StrictHostKeyChecking=no root@$ip <<EOF1
pwd
nohup yes | /etc/rc.d/init.d/lifconfig
su tarts
nohup yes | vncserver
sleep 10
exit
exit
EOF1
}
ip[1]=10.171.0.10
ip[2]=10.171.0.11
ip[3]=10.171.0.12
ip[4]=10.171.0.13
ip[5]=10.171.0.14
ip[6]=10.171.0.15
ip[7]=10.171.0.16
ip[8]=10.171.0.17
ip[9]=10.171.0.18
ip[10]=10.171.0.19
ip[11]=10.171.0.20
ip[12]=10.171.0.21
ip[13]=10.171.0.100
ip[14]=10.171.0.101
ip[15]=10.171.0.102
ip[16]=10.171.0.103
ip[17]=10.171.0.104
ip[18]=10.171.0.105
ip[19]=10.171.0.106
ip[20]=10.171.0.107
case $1 in
kill) function=kill_tarts;;
launch) function=launch_tarts;;
setup) function=tarts_setup;;
*) exit 1;;
esac
shift
for tart in "$@"; do
($function $tart) &
ips+=(${ip[tart]})
# echo $ips
tarts+=(${tart[@]})
# echo $tarts
done
wait
Can someone guide please?
Try changing the bottom loop to: for ((tart=1; tart<=$1; tart++)), then use it like: ./testing.sh launch 8. (Because the script calls shift before the loop, the count is in $1 at that point, not $2.)
You can put multiple variable declarations on one line, so you could split the IP list into two or three columns.
Or use mapfile: mapfile -t ip < ip-list. You will need to use tart - 1 for the array index though, like "${ip[tart-1]}", as the array will start at 0, not 1.
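A rough sketch of the mapfile suggestion combined with a count argument follows. The temporary file stands in for ip-list, and show_tart and run_first_n are hypothetical names; show_tart is only a placeholder for launch_tarts, kill_tarts or tarts_setup. The count is clamped so the loop never runs past the end of the list.

```shell
#!/usr/bin/env bash
# Stand-in for the ip-list file (one IP per line).
ip_list=$(mktemp)
printf '%s\n' 10.171.0.10 10.171.0.11 10.171.0.12 > "$ip_list"
mapfile -t ip < "$ip_list"
rm -f "$ip_list"

# Placeholder for the real per-tart function.
show_tart () {
    local tart=$1
    echo "tart $tart uses ${ip[tart-1]}"   # tart is 1-based, the array is 0-based
}

run_first_n () {
    local count=$1 tart
    # Never ask for more tarts than there are IPs in the file.
    (( count > ${#ip[@]} )) && count=${#ip[@]}
    for ((tart = 1; tart <= count; tart++)); do
        show_tart "$tart"
    done
}

run_first_n 2
```

In the real script the function call would be backgrounded with & and followed by wait, as in the question.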
You want the seq command:
for x in $(seq 5); do
echo $x
done
this will produce the output
1
2
3
4
5
Then just take the number of iterations you want as another parameter on the command line, and use that in place of the hard coded 5 in my example.
seq just generates a sequence of numbers. From the man page:
SYNOPSIS
seq [-w] [-f format] [-s string] [-t string] [first [incr]] last
DESCRIPTION
The seq utility prints a sequence of numbers, one per line (default), from first (default 1), to near last as possible, in increments of incr (default 1). When first is larger than last the default incr is -1.

How to convert for loop to multiple job submission?

I submit a job to cluster using qsub SubmitJob.sh. It works well but takes a long time to finish. Inside of SubmitJob.sh there is for loop which runs sequentially. I would like to convert my for loop for parallel job submission, such that each of them submits a single job (SubmitJob.sh).
#!/bin/bash
#$ -S /bin/bash
#$ -V -cwd
#$ -e ./error.$JOB_NAME.$JOB_ID
#$ -o ./outpt.$JOB_NAME.$JOB_ID
#$ -l h_vmem=256g
##$ -q long
##$ -pe smp 4
#$ -l h_rt=24:00:00
cd /mydirectroy/
for ID in $(cat FilID.txt) ; do
Do_Somthing -n $ID -o /OutputDirectory/$ID
done
I had to do something like this once or twice. The general idea is that you pass parts of an array by reference to a function and run each call as a child process. I chose the square root of the item count as the chunk size, because the workload grows linearly with the number of items to process.
#! /bin/bash
FILE="FilID.txt"
DATA=($(cat ${FILE}))
AMOUNT=${#DATA[@]}
RANGE=$(echo "sqrt(${AMOUNT})" | bc)
echo ${AMOUNT}
echo ${RANGE}
function _child {
local -n numbers=$1
echo "From ${numbers[0]} to ${numbers[-1]}"
for n in "${numbers[@]}"; do echo -n "$n, "; done
echo
}
for ((i=0; i<AMOUNT; i+=RANGE)) {
part=(${DATA[@]:$i:$RANGE})
_child part &
# wait
}
wait
exit 0
You can test the script by populating FilID.txt as follows. Uncomment the wait in the for loop for readable output.
$ seq 0 98 > FilID.txt
You might want to wait until every N child processes are finished before you start the next batch. Back when I ran the script, the load became too high and Linux chose to kill our virtual development environment :p
P.S. If FilID.txt contains filenames with spaces, you have to set IFS=$'\n' or something similar.
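The "wait after every N children" idea could look roughly like this. It is only a sketch: process_id is a hypothetical stand-in for the real per-ID command (Do_Somthing in the question), and run_in_batches is a made-up name.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the real per-ID work.
process_id () {
    echo "processed $1"
}

run_in_batches () {
    local batch_size=$1; shift
    local running=0 id
    for id in "$@"; do
        process_id "$id" &
        # After every batch_size launches, pause until that batch is done.
        (( ++running % batch_size == 0 )) && wait
    done
    wait    # collect the final, possibly partial, batch
}

run_in_batches 3 a b c d e f g
```

The drawback of plain batching is that one slow child holds up the whole batch; the process-pool answers further down avoid that.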

Grep qstat output and copy files once done

I am using the PBS job scheduler on my cluster. In bash, I would like to monitor the job status and, once the job is done, copy the results to a certain location (/data/myfolder/).
My qstat output looks like this:
JobID Username Queue Jobname SessID NDS TSK Memory Time Status
----------------------------------------------------------------
717.XXXXXX user XXXX SS 2323283 1 24 122gb -- E
Thanks in advance
There is a script here that does this (for SGE). I started to excerpt just the relevant parts for you, but it will probably be easier for you to start with the full script and just insert the qsub commands inside the submit_job function, and then put the code you want for copying the results after the wait_job_finish command in the script. You can remove the log printing at the end if you want.
#!/bin/bash
# this script will submit a qsub job and check on host information for the cluster
# node which it ends up running on
# ~~~~~ CUSTOM FUNCTIONS ~~~~~ #
submit_job () {
local job_name="$1"
qsub -j y -N "$job_name" -o :${PWD}/ -e :${PWD}/ <<E0F
set -x
hostname
cat /etc/hosts
python -c "import socket; print socket.gethostbyname(socket.gethostname())"
# sleep 5000
E0F
}
wait_job_start () {
local job_id="$1"
printf "waiting for job to start"
while ! qstat | grep "$job_id" | grep -Eq '[[:space:]]r[[:space:]]'
do
printf "."
sleep 1
done
printf "\n\n"
local node_name="$(get_node_name "$job_id")"
printf "Job is running on node $node_name \n\n"
}
wait_job_finish () {
local job_id="$1"
printf "waiting for job to finish"
while qstat | grep -q "$job_id"
do
printf "."
sleep 1
done
printf "\n\n"
}
check_for_job_submission () {
local job_id="$1"
if qstat | grep -q "$job_id" ; then
echo "its there"
else
echo "not there"
fi
}
get_node_name () {
local job_id="$1"
qstat | grep "$job_id" | sed -e 's|^.*[[:space:]]\([a-zA-Z0-9.]*@[^ ]*\).*$|\1|g'
}
# ~~~~~ RUN ~~~~~ #
printf "Submitting cluster job to get node hostname and IP\n\n"
job_name="get_node_hostnames"
job_id="$(submit_job "$job_name")" # Your job 832606 ("get_node_hostnames") has been submitted
job_id="$(echo "$job_id" | sed -e 's|.*[[:space:]]\([[:digit:]]*\)[[:space:]].*|\1|g' )"
job_stdout_log="${job_name}.o${job_id}"
printf "Job ID:\t%s\nJob Name:\t%s\n\n" "$job_id" "$job_name"
wait_job_start "$job_id"
wait_job_finish "$job_id"
printf "\n\nReading log file ${job_stdout_log}\n\n"
[ -f "$job_stdout_log" ] && cat "$job_stdout_log"
printf "\n\nRemoving log file ${job_stdout_log}\n\n"
[ -f "$job_stdout_log" ] && rm -f "$job_stdout_log"
Sidenote: If you like Python, there is a slightly more robust equivalent here
You'll probably have to do some little tweaks to both to adjust it for your PBS system, since this was written for SGE.
You can just look for " C " with grep, but you could also just use -o [hostname:]path to stream to the final destination, as long as you have your ssh keys set up from the node for your POSIX account.
If you end up doing grep, you should be a good citizen and limit your check frequency to once or twice a minute, so as not to contribute to server spam, which can impact performance.
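A low-frequency polling loop along those lines might be sketched as follows. This is an assumption-laden example, not a tested PBS recipe: job_pending assumes the job ID starts the first column of the qstat output and the status is the last column, as in the output shown in the question, and the job ID and source directory in the usage comment are hypothetical.

```shell
#!/usr/bin/env bash
# Assumption: succeeds while the job is listed by qstat with a status other than C.
job_pending () {
    qstat 2>/dev/null |
        awk -v id="$1" 'index($1, id) == 1 && $NF != "C" { found = 1 } END { exit !found }'
}

# Run the given command every $1 seconds until it stops succeeding.
wait_while () {
    local interval=$1; shift
    while "$@"; do
        sleep "$interval"
    done
}

# Usage sketch (hypothetical job ID and result directory):
# wait_while 30 job_pending 717 && cp -r ./results /data/myfolder/
```

Checking every 30 seconds keeps the load on the scheduler low while still copying the results soon after the job completes.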

Shell script port scanner

I would like to scan multiple ports on multiple hosts. I used this script, but it takes a long time to show the result.
#!/bin/bash
hosts=(
"server1"
"server2"
)
for host in "${hosts[@]}"
do
echo "=========================================="
echo "Scanning $host"
echo "=========================================="
for port in {21,22,80}
do
echo "" > /dev/tcp/$host/$port && echo "Port $port is open"
done 2>/dev/null
done
Some people suggested using telnet or NetCat instead, but I prefer to do it without installing any new packages. Is there a way to speed it up with multithreading or some other approach?
You could use GNU Parallel to run all the checks in parallel. I am not the best at using it, and @OleTange (the author) normally has to correct me, but I keep trying. So, let's try your case, building up to it slowly:
parallel echo {1} {2} ::: 192.168.0.1 192.168.0.8 ::: 21 22 80
192.168.0.8 22
192.168.0.8 80
192.168.0.8 21
192.168.0.1 80
192.168.0.1 22
192.168.0.1 21
That looks kind of hopeful to me. Then I add -k to keep the results in order, and I supply a command that takes those IP addresses and ports as arguments:
parallel -k 'echo "" > /dev/tcp/{1}/{2} && echo {1}:{2} is open' ::: 192.168.0.1 192.168.0.8 ::: 21 22 80 2>/dev/null
192.168.0.1:80 is open
192.168.0.8:21 is open
192.168.0.8:22 is open
192.168.0.8:80 is open
This will run 8 jobs in parallel if your CPU has 8 cores. However, echo is not very resource-intensive, so you can probably run 32 in parallel; to do that, add -j 32 after the -k.
If you wanted to stick closer to your own script, you can do it like this:
#!/bin/bash
hosts=(
"192.168.0.1"
"192.168.0.8"
)
for host in "${hosts[@]}"
do
for port in {21,22,80}
do
echo "(echo > /dev/tcp/$host/$port) 2>/dev/null && echo Host:$host Port:$port is open"
done
done | parallel -k -j 32
Basically, instead of running your commands, I am just sending them to the stdin of parallel so it can do its magic with them.
You could run all three pokes in the background, then wait for them all to finish, and probably slash the running time to 1/3.
for port in 21 22 80; do
(echo "" > /dev/tcp/$host/$port) 2>/dev/null &
pid[$port]=$!
done
for port in 21 22 80; do
wait "${pid[$port]}" && echo "Port $port is open"
done
You could add parallelism by running multiple hosts in the background, too, but that should be an obvious extension.
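For completeness, the multi-host extension could be sketched like this. The function names probe, scan_host and scan_all are made up for the example, and the ports are the ones from the question. Note that a /dev/tcp connection has no timeout of its own, so a filtered port can still make a probe hang for a while.

```shell
#!/usr/bin/env bash
# Succeeds if a TCP connection to host $1, port $2 can be opened.
probe () {
    (echo > "/dev/tcp/$1/$2") 2>/dev/null
}

# Probe all given ports of one host in the background, then report in port order.
scan_host () {
    local host=$1; shift
    local port
    local -a pid=()
    for port in "$@"; do
        probe "$host" "$port" &
        pid[$port]=$!
    done
    for port in "$@"; do
        wait "${pid[$port]}" && echo "$host: port $port is open"
    done
}

# Run one background scan per host, then wait for all of them.
scan_all () {
    local host
    for host in "$@"; do
        scan_host "$host" 21 22 80 &
    done
    wait
}

# Usage sketch: scan_all server1 server2
```

Lines from different hosts can arrive in any order, since each host is scanned by its own background process.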
#!/bin/bash
function alarm {
local timeout=$1; shift;
# execute command, store PID
bash -c "$@" &
local pid=$!
# sleep for $timeout seconds, then attempt to kill PID
{
sleep "$timeout"
kill $pid 2> /dev/null
} &
wait $pid 2> /dev/null
return $?
}
function scan {
if [[ -z $1 || -z $2 ]]; then
echo "Usage: ./scanner <host> <port, ports, or port-range>"
echo "Example: ./scanner google.com 79-81"
return
fi
local host=$1
local ports=()
# store user-provided ports in array
case $2 in
*-*)
IFS=- read start end <<< "$2"
for ((port=start; port <= end; port++)); do
ports+=($port)
done
;;
*,*)
IFS=, read -ra ports <<< "$2"
;;
*)
ports+=($2)
;;
esac
# attempt to write to each port, print open if successful, closed if not
for port in "${ports[@]}"; do
alarm 1 "echo >/dev/tcp/$host/$port" &&
echo "$port/tcp open" ||
echo "$port/tcp closed"
done
}
scan $1 $2

how to write a process-pool bash shell

I have more than 10 tasks to execute, and the system restricts me to at most 4 tasks running at the same time.
My task can be started like:
myprog taskname
How can I write a bash shell script to run these tasks? The most important thing is that when one task finishes, the script starts another immediately, so that the number of running tasks stays at 4 the whole time.
Use xargs:
xargs -P <maximum-number-of-process-at-a-time> -n <arguments-per-process> <command>
Details here.
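As a concrete illustration of the xargs approach, here is a small sketch. run_task is a hypothetical stand-in for myprog taskname, and run_pool is a made-up wrapper; it assumes task names contain no whitespace or quotes, since xargs splits its input on those.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for "myprog taskname".
run_task () {
    echo "finished $1"
}
export -f run_task    # make the function visible to the bash that xargs starts

run_pool () {
    # $1 = maximum number of simultaneous processes, the rest = task names
    local size=$1; shift
    printf '%s\n' "$@" | xargs -P "$size" -n 1 bash -c 'run_task "$1"' _
}

run_pool 4 task1 task2 task3 task4 task5 task6
```

With a real program the bash -c wrapper is not needed: printf '%s\n' "$@" | xargs -P 4 -n 1 myprog would start a new myprog as soon as one of the four finishes.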
I chanced upon this thread while looking into writing my own process pool and particularly liked Brandon Horsley's solution, though I couldn't get the signals working right, so I took inspiration from Apache and decided to try a pre-fork model with a fifo as my job queue.
The following function is the function that the worker processes run when forked.
# \brief the worker function that is called when we fork off worker processes
# \param[in] id the worker ID
# \param[in] job_queue the fifo to read jobs from
# \param[in] result_log the temporary log file to write exit codes to
function _job_pool_worker()
{
local id=$1
local job_queue=$2
local result_log=$3
local line=
exec 7<> ${job_queue}
while [[ "${line}" != "${job_pool_end_of_jobs}" && -e "${job_queue}" ]]; do
# workers block on the exclusive lock to read the job queue
flock --exclusive 7
read line <${job_queue}
flock --unlock 7
# the worker should exit if it sees the end-of-job marker or run the
# job otherwise and save its exit code to the result log.
if [[ "${line}" == "${job_pool_end_of_jobs}" ]]; then
# write it one more time for the next sibling so that everyone
# will know we are exiting.
echo "${line}" >&7
else
_job_pool_echo "### _job_pool_worker-${id}: ${line}"
# run the job
{ ${line} ; }
# now check the exit code and prepend "ERROR" to the result log entry
# which we will use to count errors and then strip out later.
local result=$?
local status=
if [[ "${result}" != "0" ]]; then
status=ERROR
fi
# now write the error to the log, making sure multiple processes
# don't trample over each other.
exec 8<> ${result_log}
flock --exclusive 8
echo "${status}job_pool: exited ${result}: ${line}" >> ${result_log}
flock --unlock 8
exec 8>&-
_job_pool_echo "### _job_pool_worker-${id}: exited ${result}: ${line}"
fi
done
exec 7>&-
}
You can get a copy of my solution at Github. Here's a sample program using my implementation.
#!/bin/bash
. job_pool.sh
function foobar()
{
# do something
true
}
# initialize the job pool to allow 3 parallel jobs and echo commands
job_pool_init 3 0
# run jobs
job_pool_run sleep 1
job_pool_run sleep 2
job_pool_run sleep 3
job_pool_run foobar
job_pool_run foobar
job_pool_run /bin/false
# wait until all jobs complete before continuing
job_pool_wait
# more jobs
job_pool_run /bin/false
job_pool_run sleep 1
job_pool_run sleep 2
job_pool_run foobar
# don't forget to shut down the job pool
job_pool_shutdown
# check the $job_pool_nerrors for the number of jobs that exited non-zero
echo "job_pool_nerrors: ${job_pool_nerrors}"
Hope this helps!
Using GNU Parallel you can do:
cat tasks | parallel -j4 myprog
If you have 4 cores, you can even just do:
cat tasks | parallel myprog
From http://git.savannah.gnu.org/cgit/parallel.git/tree/README:
Full installation
Full installation of GNU Parallel is as simple as:
./configure && make && make install
Personal installation
If you are not root you can add ~/bin to your path and install in
~/bin and ~/share:
./configure --prefix=$HOME && make && make install
Or if your system lacks 'make' you can simply copy src/parallel
src/sem src/niceload src/sql to a dir in your path.
Minimal installation
If you just need parallel and do not have 'make' installed (maybe the
system is old or Microsoft Windows):
wget http://git.savannah.gnu.org/cgit/parallel.git/plain/src/parallel
chmod 755 parallel
cp parallel sem
mv parallel sem dir-in-your-$PATH/bin/
Test the installation
After this you should be able to do:
parallel -j0 ping -nc 3 ::: foss.org.my gnu.org freenetproject.org
This will send 3 ping packets to 3 different hosts in parallel and print
the output when they complete.
Watch the intro video for a quick introduction:
https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1
I would suggest writing four scripts, each one of which executes a certain number of tasks in series. Then write another script that starts the four scripts in parallel. For instance, if you have scripts, script1.sh, script2.sh, script3.sh, and script4.sh, you could have a script called headscript.sh like so.
#!/bin/sh
./script1.sh &
./script2.sh &
./script3.sh &
./script4.sh &
I found the best solution in the A Foo Walks into a Bar... blog; it uses the built-in functionality of the well-known xargs tool.
First, create a file commands.txt with the list of commands you want to execute:
myprog taskname1
myprog taskname2
myprog taskname3
myprog taskname4
...
myprog taskname123
and then pipe it to xargs like this to execute them in a pool of 4 processes:
cat commands.txt | xargs -I CMD --max-procs=4 bash -c CMD
You can change the number of processes by adjusting --max-procs.
Following @Parag Sardas' answer and the documentation linked, here's a quick script you might want to add to your .bash_aliases.
Relinking the doc link because it's worth a read
#!/bin/bash
# https://stackoverflow.com/a/19618159
# https://stackoverflow.com/a/51861820
#
# Example file contents:
# touch /tmp/a.txt
# touch /tmp/b.txt
if [ "$#" -eq 0 ]; then
echo "$0 <file> [max-procs=0]"
exit 1
fi
FILE=${1}
MAX_PROCS=${2:-0}
cat $FILE | while read line; do printf "%q\n" "$line"; done | xargs --max-procs=$MAX_PROCS -I CMD bash -c CMD
For example:
./xargs-parallel.sh jobs.txt 4 # run the commands in jobs.txt, at most 4 processes at a time
You could probably do something clever with signals.
Note this is only to illustrate the concept, and thus not thoroughly tested.
#!/usr/local/bin/bash
this_pid="$$"
jobs_running=0
sleep_pid=
# Catch alarm signals to adjust the number of running jobs
trap 'decrement_jobs' SIGALRM
# When a job finishes, decrement the total and kill the sleep process
decrement_jobs()
{
jobs_running=$(($jobs_running - 1))
if [ -n "${sleep_pid}" ]
then
kill -s SIGKILL "${sleep_pid}"
sleep_pid=
fi
}
# Check to see if the max jobs are running, if so sleep until woken
launch_task()
{
if [ ${jobs_running} -gt 3 ]
then
(
while true
do
sleep 999
done
) &
sleep_pid=$!
wait ${sleep_pid}
fi
# Launch the requested task, signalling the parent upon completion
(
"$@"
kill -s SIGALRM "${this_pid}"
) &
jobs_running=$((${jobs_running} + 1))
}
# Launch all of the tasks, this can be in a loop, etc.
launch_task task1
launch_task task2
...
launch_task task99
This tested script runs 5 jobs at a time and starts a new job as soon as one finishes (because the sleep 10.9 is killed when we get a SIGCHLD). A simpler version could use direct polling (change the sleep 10.9 to sleep 1 and get rid of the trap).
#!/usr/bin/bash
set -o monitor
trap "pkill -P $$ -f 'sleep 10\.9' >&/dev/null" SIGCHLD
totaljobs=15
numjobs=5
worktime=10
curjobs=0
declare -A pidlist
dojob()
{
slot=$1
time=$(echo "$RANDOM * 10 / 32768" | bc -l)
echo Starting job $slot with args $time
sleep $time &
pidlist[$slot]=`jobs -p %%`
curjobs=$(($curjobs + 1))
totaljobs=$(($totaljobs - 1))
}
# start
while [ $curjobs -lt $numjobs -a $totaljobs -gt 0 ]
do
dojob $curjobs
done
# Poll for jobs to die, restarting while we have them
while [ $totaljobs -gt 0 ]
do
for ((i=0;$i < $curjobs;i++))
do
if ! kill -0 ${pidlist[$i]} >&/dev/null
then
dojob $i
break
fi
done
sleep 10.9 >&/dev/null
done
wait
The other answer about 4 shell scripts does not fully satisfy me, as it assumes that all tasks take approximately the same time and it requires manual setup. But here is how I would improve it.
The main script creates symbolic links to the executables following a certain naming convention. For example,
ln -s executable1 ./01-task.01
The prefix is for sorting and the suffix identifies the batch (01-04).
Now we spawn 4 shell scripts, each of which takes a batch number as input and does something like this:
for t in $(ls ./*-task.$batch | sort); do
"$t"
rm "$t"
done
Look at my implementation of job pool in bash: https://github.com/spektom/shell-utils/blob/master/jp.sh
For example, to run at most 3 processes of cURL when downloading from a lot of URLs, you can wrap your cURL commands as follows:
./jp.sh "My Download Pool" 3 curl http://site1/...
./jp.sh "My Download Pool" 3 curl http://site2/...
./jp.sh "My Download Pool" 3 curl http://site3/...
...
Here is my solution. The idea is quite simple. I create a fifo as a semaphore, where each line stands for an available resource. When reading the queue, the main process blocks if there is nothing left. And, we return the resource after the task is done by simply echoing anything to the queue.
function task() {
local task_no="$1"
# doing the actual task...
echo "Executing Task ${task_no}"
# which takes a long time
sleep 1
}
function execute_concurrently() {
local tasks="$1"
local ps_pool_size="$2"
# create an anonymous fifo as a Semaphore
local sema_fifo
sema_fifo="$(mktemp -u)"
mkfifo "${sema_fifo}"
exec 3<>"${sema_fifo}"
rm -f "${sema_fifo}"
# every 'x' stands for an available resource
for i in $(seq 1 "${ps_pool_size}"); do
echo 'x' >&3
done
for task_no in $(seq 1 "${tasks}"); do
read dummy <&3 # blocks until a resource is available
(
trap 'echo x >&3' EXIT # returns the resource on exit
task "${task_no}"
)&
done
wait # wait until all forked tasks have finished
}
execute_concurrently 10 4
The script above runs 10 tasks, at most 4 at a time. You can change the $(seq 1 "${tasks}") sequence to the actual task queue you want to run.
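Swapping the numeric sequence for a real task queue might look like the sketch below. It keeps the same fifo-as-semaphore idea; run_task is a hypothetical stand-in for myprog taskname, and run_queue is a made-up name that takes the pool size followed by the task names.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for "myprog taskname".
run_task () {
    echo "done $1"
}

run_queue () {
    local pool_size=$1; shift
    local sema_fifo name i
    # Same semaphore trick as above: a fifo kept open on fd 3.
    sema_fifo="$(mktemp -u)"
    mkfifo "${sema_fifo}"
    exec 3<>"${sema_fifo}"
    rm -f "${sema_fifo}"
    # One token per available slot.
    for ((i = 0; i < pool_size; i++)); do
        echo x >&3
    done
    for name in "$@"; do
        read -r _ <&3                 # take a token; blocks while the pool is full
        (
            trap 'echo x >&3' EXIT    # give the token back when the task exits
            run_task "${name}"
        ) &
    done
    wait        # wait for the remaining tasks
    exec 3>&-   # close the semaphore
}

run_queue 4 taskA taskB taskC taskD taskE taskF
```

The task names are passed as ordinary arguments here; they could just as well be read from a file with mapfile and passed as "${tasks[@]}".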
I made my modifications based on the methods introduced in Writing a process pool in Bash.
#!/bin/bash
#set -e # this doesn't work here for some reason
POOL_SIZE=4 # number of workers running in parallel
#######################################################################
# populate jobs #
#######################################################################
declare -a jobs
for (( i = 1988; i < 2019; i++ )); do
jobs+=($i)
done
echo '################################################'
echo ' Launching jobs'
echo '################################################'
parallel() {
local proc procs jobs cur
jobs=("$@") # input jobs array
declare -a procs=() # processes array
cur=0 # current job idx
morework=true
while $morework; do
# if process array size < pool size, try forking a new proc
if [[ "${#procs[@]}" -lt "$POOL_SIZE" ]]; then
if [[ $cur -lt "${#jobs[@]}" ]]; then
proc=${jobs[$cur]}
echo "JOB ID = $cur; JOB = $proc."
###############
# do job here #
###############
sleep 3 &
# add to current running processes
procs+=("$!")
# move to the next job
((cur++))
else
morework=false
continue
fi
fi
for n in "${!procs[@]}"; do
kill -0 "${procs[n]}" 2>/dev/null && continue
# if process is not running anymore, remove from array
unset procs[n]
done
done
wait
}
parallel "${jobs[@]}"
xargs with -P and -L options does the job.
You can extract the idea from the example below:
#!/usr/bin/env bash
workers_pool_size=10
set -e
function doit {
cmds=""
for e in 4 8 16; do
for m in 1 2 3 4 5 6; do
cmd="python3 ./doit.py --m $m -e $e -m $m"
cmds="$cmd\n$cmds"
done
done
echo -e "All commands:\n$cmds"
echo "Workers pool size = $workers_pool_size"
echo -e "$cmds" | xargs -t -P $workers_pool_size -L 1 time > /dev/null
}
doit
#! /bin/bash
doSomething() {
<...>
}
getCompletedThreads() {
_runningThreads=("$@")
removableThreads=()
for pid in "${_runningThreads[@]}"; do
if ! ps -p $pid > /dev/null; then
removableThreads+=($pid)
fi
done
echo "${removableThreads[@]}"
}
releasePool() {
while [[ ${#runningThreads[@]} -eq $MAX_THREAD_NO ]]; do
echo "releasing"
removableThreads=( $(getCompletedThreads "${runningThreads[@]}") )
if [ ${#removableThreads[@]} -eq 0 ]; then
sleep 0.2
else
for removableThread in "${removableThreads[@]}"; do
runningThreads=( ${runningThreads[@]/$removableThread} )
done
echo "released"
fi
done
}
waitAllThreadComplete() {
while [[ ${#runningThreads[@]} -ne 0 ]]; do
removableThreads=( $(getCompletedThreads "${runningThreads[@]}") )
for removableThread in "${removableThreads[@]}"; do
runningThreads=( ${runningThreads[@]/$removableThread} )
done
if [ ${#removableThreads[@]} -eq 0 ]; then
sleep 0.2
fi
done
}
MAX_THREAD_NO=10
runningThreads=()
sequenceNo=0
for i in {1..36}; do
releasePool
((sequenceNo++))
echo "added $sequenceNo"
doSomething &
pid=$!
runningThreads+=($pid)
done
waitAllThreadComplete

Resources