Slurm wait option: show time spent waiting

When you use the --wait flag with a Slurm batch job, is it possible to display how long the job has been waiting, in real time?

When sbatch is used with the --wait option, the command does not exit until the submitted job terminates.
There is no additional option available to show the pending time.
However, you can open another session and execute the following command to display the pending time (in seconds) while the job is still in the pending state:
squeue --Format=PendingTime -j <jobid> --noheader
One-time display
If you simply wish to know the elapsed time before the job was scheduled, you can add the following line to your batch script:
echo "waited: $(squeue --Format=PendingTime -j $SLURM_JOB_ID --noheader | tr -d ' ')s"
Note: the tr command is used here to delete the trailing spaces added by squeue.
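If accounting is enabled on the cluster, a related one-shot check after the fact (a sketch, assuming the sacct command is available) is to compare the submit and start timestamps; the pending time is Start minus Submit:
# Pending time = Start - Submit; Elapsed is the run time
sacct -j <jobid> --format=JobID,Submit,Start,Elapsed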
Real-time counter
If you would like to display the elapsed time in real time, you can remove the --wait option and use an sbatch wrapper such as:
#!/bin/sh
# Time before issuing another squeue command
# XXX: Ensure this is large enough to avoid flooding the Slurm controller
WAIT=20
# Convert seconds to days:hours:minutes:seconds format
seconds_to_days()
{
printf '%dd:%dh:%dm:%ds\n' $(($1/86400)) $(($1%86400/3600)) $(($1%3600/60)) $(($1%60))
}
# Convert days-hours:minutes:seconds time format to seconds
squeue_time_to_seconds()
{
local time=$(echo $1 | tr -d ' ') # Removing spaces
# Print input and return if the time format is not recognized
echo $time | grep -q ':' ||
{
printf "$time"
return
}
# Check if time contains hours, otherwise add 0 hour
[ $(echo $time | awk -F: '{print NF-1}') -eq 2 ] || time="0:$time"
# Check if time contains days, otherwise add 0 day
echo $time | grep -q '-' || time="0-$time"
# Parse and convert to seconds
echo $time | tr '-' ':' |
awk -F: '{ print ($1 * 86400) + ($2 * 3600) + ($3 * 60) + $4 }'
}
# Poll job counter with squeue
squeue_polling()
{
local counter=$1
local counter_description=$2
local jobid=$3
local prev_time="-${WAIT}"
while true; do
elapsed_time=$(squeue --Format=$counter -j $jobid --noheader || exit $?)
elapsed_time=$(squeue_time_to_seconds "$elapsed_time")
# Return in case no counter is found
if [ -z "$elapsed_time" ]; then
echo; return
fi
# Update the counter one last time if it is no longer progressing
if [ "$elapsed_time" -lt "$((prev_time + WAIT ))" ]; then
printf "\33[2K\r$counter_description: $(seconds_to_days $prev_time)\n"
return
fi
# Update the counter without calling squeue to release the pressure on
# the Slurm controller
for i in $(seq 1 $WAIT); do
printf "\33[2K\r$counter_description: $(seconds_to_days $(($elapsed_time + i)))"
sleep 1
done
prev_time=$elapsed_time
done
}
# Execute sbatch, exit on error, and display the output
OUTPUT=$(sbatch "$@") || exit $?
echo "$OUTPUT"
# Parse the job ID
JOBID=$(echo $OUTPUT | sed -rn 's/Submitted batch job ([0-9]+)/\1/p')
# Display pending time until the job is scheduled
squeue_polling 'PendingTime' 'Pending time' $JOBID
# Display the time used by the allocation until the job is over
squeue_polling 'TimeUsed' 'Allocation time' $JOBID
It will act as if you had submitted the job with the --wait flag (i.e. it returns when the job completes), and the pending time will be updated in real time:
./sbatch-wait <options> <batch script>
Submitted batch job 42
Pending time: 0d:0h:1m:0s
Allocation time: 0d:0h:1m:23s

An easy way is to (ab)use the pv command like this:
sbatch --wait ... | pv -t
It will look like this:
$ sbatch --wait --wrap "sleep 30" | pv -t
Submitted batch job 123456
0:00:42
and the stopwatch will stop when the job completes.

Related

Parallelizing a ping sweep

I'm attempting to sweep an IP block totaling about 65,000 addresses. We've been instructed to use ICMP packets specifically, with bash, and to find a way to parallelize it. Here's what I've come up with:
#!/bin/bash
ping() {
if ping -c 1 -W 5 131.212.$i.$j >/dev/null
then
((++s))
echo -n "*"
else
((++f))
echo -n "."
fi
((++j))
#if j has reached 255, set it to zero and increment i
if [ $j -gt 255 ]; then
j=0
((++i))
echo "Pinging 131.212.$i.xx IP Block...\n"
fi
}
s=0 #number of responses received
f=0 #number of failures received
i=0 #IP increment 1
j=0 #IP increment 2
curProcs=$(ps | wc -l)
maxProcs=$(getconf OPEN_MAX)
while [ $i -lt 256 ]; do
curProcs=$(ps | wc -l)
if [ $curProcs -lt $maxProcs ]; then
ping &
else
sleep 10
fi
done
echo "Found "$s" responses and "$f" timeouts."
echo /usr/bin/time -l
done
However, I've been running into the following error (on macOS):
redirection error: cannot duplicate fd: Too many open files
My understanding is I'm going over a resource limit, which I've attempted to rectify by only starting new ping processes if the existing processes count is less than the specified max, but this does not solve the issue.
Thank you for your time and suggestions.
EDIT:
There are a lot of good suggestions below for doing this with preexisting tools. Since I was limited by academic requirements, I ended up splitting the ping loops into a different process for each 12.34.x.x block, which, although ugly, did the trick in under 5 minutes. This code has a lot of problems, but it might be a good starting point for someone in the future:
#!/bin/bash
#############################
# Ping Subfunction #
#############################
# blocks with more responses will complete first since the worst-case scenario
# is O(n) if no IPs generate a response
pingSubnet() {
for ((j = 0 ; j <= 255 ; j++)); do
# send a single ping with a timeout of 1 sec, piping output to the bitbucket
if ping -c 1 -W 1 131.212."$i"."$j" >/dev/null
then
((++s))
else
((++f))
fi
done
#echo "Recieved $s responses with $f timeouts in block $i..."
# output number of success results to the pipe opened in at the start
echo "$s" >"$pipe"
exit 0
}
#############################
# Variable Declaration #
#############################
start=$(date +%s) #start of execution time
startMem=$(vm_stat | awk '/Pages free/ {print $3}' | awk 'BEGIN { FS = "\." }; {print ($1*0.004092)}' | sed 's/\..*$//');
startCPU=$(top -l 1 | grep "CPU usage" | awk '{print 100-$7;}' | sed 's/\..*$//')
s=0 #number of responses received
f=0 #number of failures received
i=0 #IP increment 1
j=0 #IP increment 2
#############################
# Pipe Initialization #
#############################
# create a pipe for child procs to write to
# child procs inherit runtime environment of parent proc, but cannot
# write back to it (like passing by value in C, but the whole env)
# hence, they need somewhere else to write back to that the parent
# proc can read back in
pipe=/tmp/pingpipe
trap 'rm -f $pipe' EXIT
if [[ ! -p $pipe ]]; then
mkfifo $pipe
exec 3<> $pipe
fi
#############################
# IP Block Iteration #
#############################
# adding an ampersand to the end forks the command to a separate, backgrounded
# child process. This allows for parallel computation but adds logistical
# challenges since children can't write the parent's variables
echo "Initiating scan processes..."
while [ $i -lt 256 ]; do
#echo "Beginning 131.212.$i.x block scan..."
#ping subnet asynchronously
pingSubnet &
((++i))
done
echo "Waiting for scans to complete (this may take up to 5 minutes)..."
peakMem=$(vm_stat | awk '/Pages free/ {print $3}' | awk 'BEGIN { FS = "\." }; {print ($1*0.004092)}' | sed 's/\..*$//')
peakCPU=$(top -l 1 | grep "CPU usage" | awk '{print 100-$7;}' | sed 's/\..*$//')
wait
echo -e "done" >$pipe
#############################
# Concat Pipe Outputs #
#############################
# read each line from the pipe we created earlier, adding the number
# of successes up in a variable
success=0
echo "Tallying responses..."
while read -r line <$pipe; do
if [[ "$line" == 'done' ]]; then
break
fi
success=$((line+success))
done
#############################
# Output Statistics #
#############################
echo "Gathering Statistics..."
fail=$((65535-success))
#output program statistics
averageMem=$((peakMem-startMem))
averageCPU=$((peakCPU-startCPU))
end=$(date +%s) #end of execution time
runtime=$((end-start))
echo "Scan completed in $runtime seconds."
echo "Found $success active servers and $fail nonresponsive addresses with a timeout of 1."
echo "Estimated memory usage was $averageMem MB."
echo "Estimated CPU utilization was $averageCPU %"
This should give you some ideas with GNU Parallel
parallel --dry-run -j 64 -k ping 131.212.{1}.{2} ::: $(seq 1 3) ::: $(seq 11 13)
ping 131.212.1.11
ping 131.212.1.12
ping 131.212.1.13
ping 131.212.2.11
ping 131.212.2.12
ping 131.212.2.13
ping 131.212.3.11
ping 131.212.3.12
ping 131.212.3.13
-j64 executes 64 pings in parallel at a time
--dry-run means do nothing but show what it would do
-k means keep the output in order (just so you can understand it)
The ::: introduces the arguments and I have repeated them with different numbers (1 through 3, and then 11 through 13) so you can distinguish the two counters and see that all permutations and combinations are generated.
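For the real sweep, something along these lines (a sketch reusing the ping flags from the question; note that -W is in seconds on Linux but milliseconds on macOS) counts the addresses that answered:
parallel -j 64 'ping -c 1 -W 5 131.212.{1}.{2} >/dev/null 2>&1 && echo 131.212.{1}.{2}' ::: $(seq 0 255) ::: $(seq 0 255) | wc -l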
Don't do that.
Use fping instead. It will probe far more efficiently than your program will.
$ brew install fping
will make it available, thanks to the magic of brew.
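For example, sweeping the whole /16 and listing the hosts that answer could look like this (a sketch: -a prints the targets that are alive, -g generates the target list from the network/mask, the unreachable-host noise on stderr is discarded, and alive.txt is just a scratch file for the results):
fping -a -g 131.212.0.0/16 2>/dev/null | tee alive.txt | wc -l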
Of course it's not as optimal as what you are trying to build above, but you could start the maximum allowed number of processes in the background, wait for them to end, and start the next batch, something like this (except that I'm using sleep 1 as the task):
for i in {1..20} # iterate some
do
sleep 1 & # start in the background
if ! ((i % 5)) # after every 5th (using mod to detect)
then
wait %1 %2 %3 %4 %5 # wait for all jobs to finish
fi
done

Grep qstat output and copy files once done

I am using the PBS job scheduler on my cluster. In bash, I would like to monitor the job status, and once the job is done I would like to copy the results to a certain location (/data/myfolder/).
My qstat output looks like this:
JobID Username Queue Jobname SessID NDS TSK Memory Time Status
----------------------------------------------------------------
717.XXXXXX user XXXX SS 2323283 1 24 122gb -- E
Thanks in advance
There is a script here that does this (for SGE). I started to excerpt just the relevant parts for you, but it will probably be easier for you to start with the full script and just insert the qsub commands inside the submit_job function, and then put the code you want for copying the results after the wait_job_finish command in the script. You can remove the log printing at the end if you want.
#!/bin/bash
# this script will submit a qsub job and check on host information for the cluster
# node which it ends up running on
# ~~~~~ CUSTOM FUNCTIONS ~~~~~ #
submit_job () {
local job_name="$1"
qsub -j y -N "$job_name" -o :${PWD}/ -e :${PWD}/ <<E0F
set -x
hostname
cat /etc/hosts
python -c "import socket; print socket.gethostbyname(socket.gethostname())"
# sleep 5000
E0F
}
wait_job_start () {
local job_id="$1"
printf "waiting for job to start"
while ! qstat | grep "$job_id" | grep -Eq '[[:space:]]r[[:space:]]'
do
printf "."
sleep 1
done
printf "\n\n"
local node_name="$(get_node_name "$job_id")"
printf "Job is running on node $node_name \n\n"
}
wait_job_finish () {
local job_id="$1"
printf "waiting for job to finish"
while qstat | grep -q "$job_id"
do
printf "."
sleep 1
done
printf "\n\n"
}
check_for_job_submission () {
local job_id="$1"
if qstat | grep -q "$job_id" ; then
echo "it's there"
else
echo "not there"
fi
}
get_node_name () {
local job_id="$1"
qstat | grep "$job_id" | sed -e 's|^.*[[:space:]]\([a-zA-Z0-9.]*#[^ ]*\).*$|\1|g'
}
# ~~~~~ RUN ~~~~~ #
printf "Submitting cluster job to get node hostname and IP\n\n"
job_name="get_node_hostnames"
job_id="$(submit_job "$job_name")" # Your job 832606 ("get_node_hostnames") has been submitted
job_id="$(echo "$job_id" | sed -e 's|.*[[:space:]]\([[:digit:]]*\)[[:space:]].*|\1|g' )"
job_stdout_log="${job_name}.o${job_id}"
printf "Job ID:\t%s\nJob Name:\t%s\n\n" "$job_id" "$job_name"
wait_job_start "$job_id"
wait_job_finish "$job_id"
printf "\n\nReading log file ${job_stdout_log}\n\n"
[ -f "$job_stdout_log" ] && cat "$job_stdout_log"
printf "\n\nRemoving log file ${job_stdout_log}\n\n"
[ -f "$job_stdout_log" ] && rm -f "$job_stdout_log"
Sidenote: If you like Python, there is a slightly more robust equivalent here
You'll probably have to make some little tweaks to both to adjust them for your PBS system, since this was written for SGE.
You can just look for " C " with grep, but you could also just use -o [hostname:]path to stream to the final destination, as long as you have your ssh keys set up from the node for your POSIX account.
If you end up doing grep, you should be a good citizen and limit your check frequency to once or twice a minute, so as not to contribute to server spam, which can impact performance.
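A minimal polling sketch along those lines (assuming the qstat column layout shown in the question, with the job status in the last column; the job ID and paths are placeholders):
#!/bin/bash
jobid="717.XXXXXX"                 # placeholder job ID
dest="/data/myfolder/"
while true; do
    # last field of the job's qstat line is its status (Q, R, E, C, ...)
    status=$(qstat "$jobid" 2>/dev/null | awk -v id="$jobid" '$1 ~ id {print $NF}')
    # finished once the job is marked C (completed) or has left the queue
    if [ -z "$status" ] || [ "$status" = "C" ]; then
        break
    fi
    sleep 60                       # be a good citizen: poll once a minute
done
cp -r /path/to/job/output/* "$dest"   # placeholder source path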

How to control the number of job executions ended with "&" within a loop [duplicate]

This question already has answers here:
Parallelize Bash script with maximum number of processes
(16 answers)
Closed 1 year ago.
Is there an easy way to limit the number of concurrent jobs in bash? By that I mean making the & block when there are more than n concurrent jobs running in the background.
I know I can implement this with ps | grep -style tricks, but is there an easier way?
If you have GNU Parallel http://www.gnu.org/software/parallel/ installed you can do this:
parallel gzip ::: *.log
which will run one gzip per CPU core until all logfiles are gzipped.
If it is part of a larger loop you can use sem instead:
for i in *.log ; do
echo $i Do more stuff here
sem -j+0 gzip $i ";" echo done
done
sem --wait
It will do the same, but give you a chance to do more stuff for each file.
If GNU Parallel is not packaged for your distribution you can install GNU Parallel simply by:
$ (wget -O - pi.dk/3 || lynx -source pi.dk/3 || curl pi.dk/3/ || \
fetch -o - http://pi.dk/3 ) > install.sh
$ sha1sum install.sh | grep 883c667e01eed62f975ad28b6d50e22a
12345678 883c667e 01eed62f 975ad28b 6d50e22a
$ md5sum install.sh | grep cc21b4c943fd03e93ae1ae49e28573c0
cc21b4c9 43fd03e9 3ae1ae49 e28573c0
$ sha512sum install.sh | grep da012ec113b49a54e705f86d51e784ebced224fdf
79945d9d 250b42a4 2067bb00 99da012e c113b49a 54e705f8 6d51e784 ebced224
fdff3f52 ca588d64 e75f6033 61bd543f d631f592 2f87ceb2 ab034149 6df84a35
$ bash install.sh
It will download, check signature, and do a personal installation if it cannot install globally.
Watch the intro videos for GNU Parallel to learn more:
https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1
A small bash script could help you:
# content of script exec-async.sh
joblist=($(jobs -p))
while (( ${#joblist[*]} >= 3 ))
do
sleep 1
joblist=($(jobs -p))
done
$* &
If you call:
. exec-async.sh sleep 10
...four times, the first three calls will return immediately, and the fourth call will block until there are fewer than three jobs running.
You need to start this script inside the current session by prefixing it with ., because jobs lists only the jobs of the current session.
The sleep inside is ugly, but I didn't find a way to wait for the first job that terminates.
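In newer bash (4.3 and later), wait -n blocks until any one background job exits, which removes the need for the sleep; a minimal variant of the same script under that assumption:
# content of script exec-async.sh (bash >= 4.3 variant)
while (( $(jobs -pr | wc -l) >= 3 ))
do
wait -n # returns as soon as any background job finishes
done
"$@" &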
The following script shows a way to do this with functions. You can either put the bgxupdate() and bgxlimit() functions in your script, or have them in a separate file which is sourced from your script with:
. /path/to/bgx.sh
It has the advantage that you can maintain multiple groups of processes independently (you can run, for example, one group with a limit of 10 and another totally separate group with a limit of 3).
It uses the Bash built-in jobs to get a list of sub-processes but maintains them in individual variables. In the loop at the bottom, you can see how to call the bgxlimit() function:
Set up an empty group variable.
Transfer that to bgxgrp.
Call bgxlimit() with the limit and command you want to run.
Transfer the new group back to your group variable.
Of course, if you only have one group, just use the bgxgrp variable directly rather than transferring in and out.
#!/bin/bash
# bgxupdate - update active processes in a group.
# Works by transferring each process to new group
# if it is still active.
# in: bgxgrp - current group of processes.
# out: bgxgrp - new group of processes.
# out: bgxcount - number of processes in new group.
bgxupdate() {
bgxoldgrp=${bgxgrp}
bgxgrp=""
((bgxcount = 0))
bgxjobs=" $(jobs -pr | tr '\n' ' ')"
for bgxpid in ${bgxoldgrp} ; do
echo "${bgxjobs}" | grep " ${bgxpid} " >/dev/null 2>&1
if [[ $? -eq 0 ]]; then
bgxgrp="${bgxgrp} ${bgxpid}"
((bgxcount++))
fi
done
}
# bgxlimit - start a sub-process with a limit.
# Loops, calling bgxupdate until there is a free
# slot to run another sub-process. Then runs it
# and updates the process group.
# in: $1 - the limit on processes.
# in: $2+ - the command to run for new process.
# in: bgxgrp - the current group of processes.
# out: bgxgrp - new group of processes
bgxlimit() {
bgxmax=$1; shift
bgxupdate
while [[ ${bgxcount} -ge ${bgxmax} ]]; do
sleep 1
bgxupdate
done
if [[ "$1" != "-" ]]; then
$* &
bgxgrp="${bgxgrp} $!"
fi
}
# Test program, create group and run 6 sleeps with
# limit of 3.
group1=""
echo 0 $(date | awk '{print $4}') '[' ${group1} ']'
echo
for i in 1 2 3 4 5 6; do
bgxgrp=${group1}; bgxlimit 3 sleep ${i}0; group1=${bgxgrp}
echo ${i} $(date | awk '{print $4}') '[' ${group1} ']'
done
# Wait until all others are finished.
echo
bgxgrp=${group1}; bgxupdate; group1=${bgxgrp}
while [[ ${bgxcount} -ne 0 ]]; do
oldcount=${bgxcount}
while [[ ${oldcount} -eq ${bgxcount} ]]; do
sleep 1
bgxgrp=${group1}; bgxupdate; group1=${bgxgrp}
done
echo 9 $(date | awk '{print $4}') '[' ${group1} ']'
done
Here’s a sample run, with blank lines inserted to clearly delineate different time points:
0 12:38:00 [ ]
1 12:38:00 [ 3368 ]
2 12:38:00 [ 3368 5880 ]
3 12:38:00 [ 3368 5880 2524 ]

4 12:38:10 [ 5880 2524 1560 ]

5 12:38:20 [ 2524 1560 5032 ]

6 12:38:30 [ 1560 5032 5212 ]

9 12:38:50 [ 5032 5212 ]

9 12:39:10 [ 5212 ]

9 12:39:30 [ ]
The whole thing starts at 12:38:00 (time t = 0) and, as you can see, the first three processes run immediately.
Each process sleeps for 10n seconds and the fourth process doesn’t start until the first exits (at time t = 10). You can see that process 3368 has disappeared from the list before 1560 is added.
Similarly, the fifth process 5032 starts when 5880 (the second) exits at time t = 20.
And finally, the sixth process 5212 starts when 2524 (the third) exits at time t = 30.
Then the rundown begins: the fourth process exits at time t = 50 (it started at t = 10 with a 40-second duration).
The fifth exits at time t = 70 (it started at t = 20 with a 50-second duration).
Finally, the sixth exits at time t = 90 (it started at t = 30 with a 60-second duration).
Or, if you prefer it in a more graphical time-line form:
Process: 1 2 3 4 5 6
-------- - - - - - -
12:38:00 ^ ^ ^ 1/2/3 start together.
12:38:10 v | | ^ 4 starts when 1 done.
12:38:20 v | | ^ 5 starts when 2 done.
12:38:30 v | | ^ 6 starts when 3 done.
12:38:40 | | |
12:38:50 v | | 4 ends.
12:39:00 | |
12:39:10 v | 5 ends.
12:39:20 |
12:39:30 v 6 ends.
Here's the shortest way:
waitforjobs() {
while test $(jobs -p | wc -w) -ge "$1"; do wait -n; done
}
Call this function before forking off any new job:
waitforjobs 10
run_another_job &
To have as many background jobs as cores on the machine, use $(nproc) instead of a fixed number like 10.
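For example, to keep one job per core (assuming nproc is available):
waitforjobs "$(nproc)"
run_another_job &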
Assuming you'd like to write code like this:
for x in $(seq 1 100); do # 100 things we want to put into the background.
max_bg_procs 5 # Define the limit. See below.
your_intensive_job &
done
Where max_bg_procs should be put in your .bashrc:
function max_bg_procs {
if [[ $# -eq 0 ]] ; then
echo "Usage: max_bg_procs NUM_PROCS. Will wait until the number of background (&)"
echo " bash processes (as determined by 'jobs -pr') falls below NUM_PROCS"
return
fi
local max_number=$((0 + ${1:-0}))
while true; do
local current_number=$(jobs -pr | wc -l)
if [[ $current_number -lt $max_number ]]; then
break
fi
sleep 1
done
}
The following function (developed from tangens' answer above; either copy it into your script or source it from a file):
job_limit () {
# Test for single positive integer input
if (( $# == 1 )) && [[ $1 =~ ^[1-9][0-9]*$ ]]
then
# Check number of running jobs
joblist=($(jobs -rp))
while (( ${#joblist[*]} >= $1 ))
do
# Wait for any job to finish
command='wait '${joblist[0]}
for job in ${joblist[@]:1}
do
command+=' || wait '$job
done
eval $command
joblist=($(jobs -rp))
done
fi
}
1) Only requires inserting a single line to limit an existing loop
while :
do
task &
job_limit `nproc`
done
2) Waits on completion of existing background tasks rather than polling, increasing efficiency for fast tasks
This might be good enough for most purposes, but is not optimal.
#!/bin/bash
n=0
maxjobs=10
for i in *.m4a ; do
# ( DO SOMETHING ) &
# limit jobs
if (( $(($((++n)) % $maxjobs)) == 0 )) ; then
wait # wait until all have finished (not optimal, but most times good enough)
echo $n wait
fi
done
If you're willing to do this outside of pure bash, you should look into a job queuing system.
For instance, there's GNU queue or PBS. And for PBS, you might want to look into Maui for configuration.
Both systems will require some configuration, but it's entirely possible to allow a specific number of jobs to run at once, only starting newly queued jobs when a running job finishes. Typically, these job queuing systems would be used on supercomputing clusters, where you would want to allocate a specific amount of memory or computing time to any given batch job; however, there's no reason you can't use one of these on a single desktop computer without regard for compute time or memory limits.
It is hard to do without wait -n (for example, the shell in busybox does not support it). So here is a workaround; it is not optimal, because it calls the 'jobs' and 'wc' commands 10 times per second. You can reduce the calls to once per second, for example, if you don't mind waiting a bit longer for each job to complete.
# $1 = maximum concurrent jobs
#
limit_jobs()
{
while true; do
if [ "$(jobs -p | wc -l)" -lt "$1" ]; then break; fi
usleep 100000
done
}
# and now start some tasks:
task &
limit_jobs 2
task &
limit_jobs 2
task &
limit_jobs 2
task &
limit_jobs 2
wait
On Linux I use this to limit the bash jobs to the number of available CPUs (possibly overridden by setting CPU_NUMBER).
[ "$CPU_NUMBER" ] || CPU_NUMBER="`nproc 2>/dev/null || echo 1`"
while [ "$1" ]; do
{
do something
with $1
in parallel
echo "[$# items left] $1 done"
} &
while true; do
# load the PIDs of all child processes to the array
joblist=(`jobs -p`)
if [ ${#joblist[*]} -ge "$CPU_NUMBER" ]; then
# when the job limit is reached, wait for *single* job to finish
wait -n
else
# stop checking when we're below the limit
break
fi
done
# it's great we executed zero external commands to check!
shift
done
# wait for all currently active child processes
wait
The wait command with the -n option waits for the next job to terminate.
maxjobs=10
# wait for the amount of processes less to $maxjobs
jobIds=($(jobs -p))
len=${#jobIds[@]}
while [ $len -ge $maxjobs ]; do
# Wait until one job is finished
wait -n
jobIds=($(jobs -p))
len=${#jobIds[@]}
done
Have you considered starting ten long-running listener processes and communicating with them via named pipes?
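A closely related pattern that is easy to get right in plain bash is to use one FIFO as a pool of job-slot tokens rather than as a task channel; a sketch (the limit of 10 and the do_work name are made up, and the read-write open of the FIFO is the Linux-friendly trick that keeps the open from blocking):
#!/bin/bash
# FIFO as a counting semaphore: take a token before starting a job,
# put it back when the job finishes.
fifo=$(mktemp -u)
mkfifo "$fifo"
exec 3<>"$fifo"                          # open read-write so nothing blocks on open
rm -f "$fifo"                            # fd 3 keeps the pipe alive
for _ in $(seq 1 10); do echo >&3; done  # 10 tokens = at most 10 concurrent jobs
for task in task_{1..40}; do             # placeholder task list
    read -r -u 3                         # take a token (blocks while none are free)
    {
        do_work "$task"                  # placeholder for the real job
        echo >&3                         # return the token
    } &
done
wait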
you can use ulimit -u
see http://ss64.com/bash/ulimit.html
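Note that ulimit -u sets the per-user process limit, so it acts as a hard cap rather than a throttle: once the limit is reached, new background jobs fail to start instead of waiting for a free slot. For example:
ulimit -u        # show the current per-user process limit
ulimit -u 200    # cap this shell and its children to 200 processes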
Bash mostly processes files line by line, so you can split the input file into chunks of N lines; then this simple pattern is applicable:
mkdir tmp ; pushd tmp ; split -l 50 ../mainfile.txt
for file in * ; do
while read a b c ; do curl -s "http://$a/$b/$c" &
done <"$file" ; wait ; done
popd ; rm -rf tmp

Bash script to run query on 28 cores

I am trying to have an OUTFILE query run as a single process per value in an array, to speed up exporting data from MySQL; I'd like to run the script on multiple cores. My bash script is:
dbquery=$(mysql -u user -p -e "SELECT distinct(ticker) FROM db.table")
array=( $( for i in $dbquery ; do echo $i ; done ) )
csv ()
{
dbquery=$(mysql -u user --password=password -e "SELECT * FROM db2.table2 WHERE symbol = '$i' INTO OUTFILE '/tmp/$i.csv' FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n'")
}
set -m
for i in `seq 28`; do #trying to run on 28 cores
for j in ${array[@]}; do
csv $j &
done
sleep 5 &
done
while [ 1 ];
do
fg 2> /dev/null; [ $? == 1 ] && break;
done
Now I ran this and it is not exporting files as I wished it to, and I cannot figure out how to kill the processes. Could you help me understand how to fix this so that it will run the OUTFILE query per ticker? Also, how do I kill the currently running script without killing other scripts and programs that are running?
You can use xargs to automatically handle job scheduling:
dbquery=$(mysql -u user -p -e "SELECT distinct(ticker) FROM db.table")
array=( $( for i in $dbquery ; do echo $i ; done ) )
csv ()
{
dbquery=$(mysql -u user --password=password -e "SELECT * FROM db2.table2 WHERE symbol = '$1' INTO OUTFILE '/tmp/$1.csv' FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n'")
}
export -f csv
echo "${array[#]}" | xargs -P 28 -n 1 bash -c 'csv "$1"' --
The problem with your approach is that because the loops are nested, you start all processes 28 times each, rather than running them once and 28 at a time.
wait will wait until all the child processes are done.
for i in `seq 28`; do #trying to run on 28 cores
for j in ${array[@]}; do
csv $j &
done
done
wait
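If GNU Parallel is an option, the same per-ticker export can be throttled to 28 at a time without nested loops (a sketch reusing the credentials and table names from the question; export_one is a made-up helper name, and the function must be exported for parallel to see it):
export_one() {
    mysql -u user --password=password -e "SELECT * FROM db2.table2 WHERE symbol = '$1' INTO OUTFILE '/tmp/$1.csv' FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n'"
}
export -f export_one
mysql -u user --password=password -N -e "SELECT DISTINCT ticker FROM db.table" | parallel -j 28 export_one {}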

Resources