SLURM batch array loop?

I'm somewhat bash challenged and trying to send a large job array through SLURM on my institution's cluster. I am way over my limit (which appears to be 1000 jobs per job array) and am having to parse the list into blocks of 1000 by hand, which is tedious:
sbatch --array=17001-18000 -p <server-name> --time=12:00:00 <my-bash-script>
How might I write a loop to do this? Each job takes about 11 minutes, so I would need to build in a pause in the loop. Otherwise, I suspect SLURM will reject the new batch job. Anyone out there know what to do? Thanks in advance!

Something like this should do what you want:
START=1
END=10000
STEP=1000
SLEEP=700   # just over 11 minutes, in seconds

for i in $(seq $START $STEP $END); do
    JSTART=$i
    JEND=$(( JSTART + STEP - 1 ))   # last index of this block
    echo "Submitting with ${JSTART} and ${JEND}"
    sbatch --array=${JSTART}-${JEND} -p <server-name> --time=12:00:00 <my-bash-script>
    sleep $SLEEP
done
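If the fixed sleep turns out to be fragile (jobs may sit in the queue longer than expected), a variant is to poll the queue and only submit the next block once your pending jobs have drained. This is only a sketch; it assumes squeue is available on the submit node and that MAX_QUEUED reflects whatever per-user limit your site enforces:
START=1
END=10000
STEP=1000
MAX_QUEUED=1000   # assumed per-user limit; adjust to your cluster's policy

for i in $(seq $START $STEP $END); do
    # -r expands job arrays so every pending/running task counts as one line
    while [ "$(squeue -h -r -u "$USER" | wc -l)" -ge "$MAX_QUEUED" ]; do
        sleep 60
    done
    JEND=$(( i + STEP - 1 ))
    sbatch --array=${i}-${JEND} -p <server-name> --time=12:00:00 <my-bash-script>
done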

Related

Shell Scripting to compare the value of current iteration with that of the previous iteration

I have an infinite loop which uses the aws cli to get the microservice names and their parameters (desired tasks, number of running tasks, etc.) for an environment.
There are hundreds of microservices running in an environment. I need to compare the value of the aws ecs "running tasks" metric for a particular microservice in the current loop iteration with its value from the previous iteration.
Say a microservice X has 5 running tasks. As it is an infinite loop, after some time the loop comes around to microservice X again. Now, let's assume the number of running tasks is 4. I want to compare the running-task count for the current iteration, which is 4, with the value from the previous run, which is 5.
If you are asking a generic question of how to keep a previous value around so it can be compared to the current value, just store it in a variable. You can use the following as a starting point:
#!/bin/bash
previousValue=0
while read v; do
    echo "Previous value=${previousValue}; Current value=${v}"
    previousValue=${v}
done
exit 0
If the above script is called testval.sh and you have an input file called test.in with the following values:
2
1
4
6
3
0
5
Then running
./testval.sh <test.in
will generate the following output:
Previous value=0; Current value=2
Previous value=2; Current value=1
Previous value=1; Current value=4
Previous value=4; Current value=6
Previous value=6; Current value=3
Previous value=3; Current value=0
Previous value=0; Current value=5
If the skeleton script works for you, feel free to modify it to do whatever comparisons you need.
Hope this helps.
I don't know how your input looks exactly, but something like this might be useful for you:
The script
#!/bin/bash
# remember the last seen task count per app in an associative array
declare -A app_stats
while read -r app tasks
do
    # report only when we already have a previous value and it differs
    if [[ -n ${app_stats[$app]} && ${app_stats[$app]} -ne $tasks ]]
    then
        echo "Number of tasks for $app has changed from ${app_stats[$app]} to $tasks"
    fi
    app_stats[$app]=$tasks
done < input.txt
The input
App1 2
App2 5
App3 6
App1 6
The output
Number of tasks for App1 has changed from 2 to 6
Regards!
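For completeness, here is roughly how the associative-array idea slots into the infinite polling loop from the question. get_running_tasks is a hypothetical placeholder for whatever aws ecs query you already run; it is assumed to print one "name count" pair per line:
#!/bin/bash
declare -A prev_tasks

get_running_tasks() {
    # hypothetical placeholder: replace with your aws ecs call that prints
    # "<microservice-name> <running-task-count>" one pair per line
    :
}

while true; do
    while read -r name count; do
        if [[ -n ${prev_tasks[$name]} && ${prev_tasks[$name]} -ne $count ]]; then
            echo "$name: running tasks changed from ${prev_tasks[$name]} to $count"
        fi
        prev_tasks[$name]=$count
    done < <(get_running_tasks)   # process substitution keeps prev_tasks in the main shell
    sleep 60   # poll interval
done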

bash asynchronous variable setting (dns lookup)

Let's say we have a loop that we want to run as quickly as possible. Let's say something is being done to a list of hosts inside that loop; just for the sake of argument, let's say it is a redis query. Let's say that the list of hosts may change occasionally due to hosts being added/removed from a pool (not load balanced); however, the list is predictable (e.g., they all start with “foo” and end with 2 digits). So we want to run this occasionally, say once every 15 minutes:
listOfHosts=$(dig +noall +ans foo{00..99}.domain | while read -r n rest; do printf '%s\n' ${n%.}; done)
to get the list of hosts. Let's say our loop looked something like this:
while :; do
    for i in $listOfHosts; do
        redis-cli -h "$i" llen something
    done
    (( $(date +%s) % (60 * 15) == 0 )) && callFunctionThatSetslistOfHosts
done
(Now obviously there are some things missing, like testing whether we've already run callFunctionThatSetslistOfHosts in the current interval and only running it once, doing something with the redis output, and maybe the list of hosts should be an array, but basically this is it.)
How can we run callFunctionThatSetslistOfHosts asynchronously so that it doesn't slow down the loop? I.e., have it running in the background, setting listOfHosts occasionally (e.g. once every 15 minutes), so that the next time the inner loop runs it gets a potentially different set of hosts to run the redis query on?
My major problem seems to be that in order to set listOfHosts in a loop, that loop has to be a subshell, and listOfHosts is local to that subshell, and setting it doesn't affect the global listOfHosts.
I may resort to pipes, but will have to poll the reader before generating a new list — not that that's terribly bad if I poll slowly, but I thought I'd present this as a problem.
Thanks.
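One way around the subshell problem is to let a background job rewrite the host list to a small temp file every 15 minutes, and have the main loop re-read that file at the top of each pass; the assignment then happens in the main shell, not in a subshell. A sketch, reusing the dig pipeline from the question and assuming a temp file is acceptable:
#!/bin/bash
hostfile=$(mktemp)

refreshHosts() {
    while :; do
        dig +noall +answer foo{00..99}.domain |
            while read -r n rest; do printf '%s\n' "${n%.}"; done > "$hostfile.tmp"
        mv "$hostfile.tmp" "$hostfile"   # swap in one step so readers never see a half-written list
        sleep 900                        # 15 minutes
    done
}

refreshHosts &                           # runs in the background, never blocks the main loop
refresher=$!
trap 'kill "$refresher"; rm -f "$hostfile"' EXIT

while :; do
    mapfile -t listOfHosts < "$hostfile" # read the current list in the main shell
    for i in "${listOfHosts[@]}"; do
        redis-cli -h "$i" llen something
    done
done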

bash how to wait for completion of forked processes that run in background

I wonder if I could achieve something like the following logic:
Given a number of jobs to be done, fold_num, and a limit on worker processes, say worker_num, I want to run worker_num processes in parallel until all fold_num jobs are done. Finally, there is some other processing on the results of all these jobs. We can assume fold_num is always several times worker_num.
I haven't got the following snippet working so far, even with tips from How to wait in bash for several subprocesses to finish and return exit code !=0 when any subprocess ends with code !=0?
#!/bin/bash
worker_num=5
fold_num=10
pids=""
result=0
for fold in $(seq 0 $(( $fold_num-1 ))); do
pids_idx=$(( $fold % ${worker_num} ))
echo "pids_idx=${pids_idx}, pids[${pids_idx}]=${pids[${pids_idx}]}"
wait ${pids[$pids_idx]} || let "result=1"
if [ "$result" == "1" ]; then
echo "some job is abnormal, aborting"
exit
fi
cmd="echo fold$fold" # use echo as an example, real command can be time-consuming to run
$cmd &
pids[${pids_idx}]="$!"
echo "pids=${pids[*]}"
done
# when the for-loop completes, do something else...
The output looks like:
pids_idx=0, pids[0]=
pids=5846
pids_idx=1, pids[1]=
fold0
pids=5846 5847
fold1
pids_idx=2, pids[2]=
pids=5846 5847 5848
fold2
pids_idx=3, pids[3]=
pids=5846 5847 5848 5849
fold3
pids_idx=4, pids[4]=
pids=5846 5847 5848 5849 5850
pids_idx=0, pids[0]=5846
fold4
./test_wait.sh: line 12: wait: pid 5846 is not a child of this shell
some job is abnormal, aborting
Questions:
1. The pids array seems to record the correct process IDs, but they fail to be wait'ed for. Any ideas how to fix this?
2. Do we need to use wait after the for-loop? If so, what should be done after the for-loop?
Alright, I guess I got a working solution with tips from folks on 'parallel'.
export worker_names=("foo" "bar")
export worker_num=${#worker_names[@]}
fold_num=10   # as in the snippet above
function some_computation {
    fold=$1
    cmd="..."   # involves worker_names and fold
    echo $cmd; $cmd
}
export -f some_computation   # important, to make this function visible to subprocesses

for fold in $(seq 0 $(( fold_num - 1 ))); do
    sem -j $worker_num some_computation $fold
done
sem --wait   # wait for all jobs to complete
# do something below
A couple of things here:
I haven't got parallel working because of the post-computation processing I need to do after those parallel jobs; the parallel version I tried failed to wait for job completion. So I used GNU sem (short for semaphore), which is part of GNU parallel.
Exporting variables is crucial so the computation function can access them in this situation; otherwise those global variables are invisible to the subprocesses.
Exporting the computation function is also necessary, for the same reason. Notice the -f option.
sem --wait perfectly fulfills the need to wait for all the parallel jobs.
HTH.
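For reference, the original pure-bash loop can probably be rescued too. The "pid ... is not a child of this shell" error most likely comes from the first few iterations, where the pids slot is still empty: wait ${pids[$pids_idx]} then expands to a bare wait, which reaps every background job, so a later wait on one of those PIDs fails. A sketch that only waits on slots that actually hold a PID, and collects the stragglers after the loop:
#!/bin/bash
worker_num=5
fold_num=10
declare -a pids
result=0

for fold in $(seq 0 $(( fold_num - 1 ))); do
    pids_idx=$(( fold % worker_num ))
    # only wait if this slot was used in an earlier round
    if [ -n "${pids[$pids_idx]}" ]; then
        wait "${pids[$pids_idx]}" || result=1
    fi
    if [ "$result" = "1" ]; then
        echo "some job is abnormal, aborting"
        exit 1
    fi
    echo "fold$fold" &        # placeholder for the real, time-consuming command
    pids[$pids_idx]=$!
done

# the last worker_num jobs are still running; wait for them before post-processing
for pid in "${pids[@]}"; do
    wait "$pid" || result=1
done
# ... do something with the results here ...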

How to run bash script from cron passing it greater argument every 15 minutes?

I have a simple script that I need to run every 15 minutes every day (until I get to the last record in my database), giving it a greater argument each time. I know how to do this with a constant argument, for example:
*/15 * * * * ./my_awesome_script 1
But I need something like this; let's say we start at 8:00 AM:
at 8:00 it should run ./my_awesome_script 1
at 8:15 it should run ./my_awesome_script 2
at 8:30 it should run ./my_awesome_script 3
at 8:45 it should run ./my_awesome_script 4
at 9:00 it should run ./my_awesome_script 5
...
How to make something like this?
I came up with a temporary solution:
#!/bin/bash
start=$1
stop=$2
for i in $(seq $start $stop)
do
    ./my_awesome_script $i
    sleep 900   # 15 minutes
done
Writing a wrapper script is pretty much necessary (for sanity's sake). The wrapper can read the previous number from a file, increment it, record the new value ready for next time, and then call the real script; then you don't need the loop at all. How are you going to tell when you've reached the end of the data in the database? You need to decide how you want to handle that, too.
New cron entry:
*/15 * * * * ./wrap_my_awesome_script
And wrap_my_awesome_script might be:
#!/bin/bash
crondir="$HOME/cron"
counter="$crondir/my_awesome_script.counter"
[ -d "$crondir" ] || mkdir -p "$crondir"     # make sure the state directory exists
[ -s "$counter" ] || echo 0 > "$counter"     # seed the counter on first run
count=$(<"$counter")
((count++))
echo "$count" > "$counter"
"$HOME/bin/my_awesome_script" "$count"
I'm not sure why you use ./my_awesome_script; it likely means your script is in your $HOME directory. I'd keep it in $HOME/bin and use that name in the wrapper script, as shown above.
Note the general insistence on putting material in some sub-directory of $HOME rather than directly in $HOME. Keeping your home directory uncluttered is generally a good idea. You can place the files and programs where you like, of course, but I recommend being as organized as possible. If you aren't organized then, in a few years time, you'll wish you had been.
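If you also need the job to stop once you've passed the last record, one option is to give the wrapper an upper bound. This is only a sketch; MAX_COUNT is a hypothetical value you would set to the final record number (or replace with a database query):
#!/bin/bash
crondir="$HOME/cron"
counter="$crondir/my_awesome_script.counter"
MAX_COUNT=500                     # hypothetical: the last argument you ever want to pass

[ -d "$crondir" ] || mkdir -p "$crondir"
[ -s "$counter" ] || echo 0 > "$counter"
count=$(<"$counter")
((count++))

if [ "$count" -gt "$MAX_COUNT" ]; then
    exit 0                        # all done; you could also remove the crontab entry here
fi

echo "$count" > "$counter"
"$HOME/bin/my_awesome_script" "$count"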

Batch files processing in bash with full processor occupancy

Maybe a really simple question, but I don't know where to dig.
I have a list of files (random names), and I want to process each of them with some command:
processing_command $i ${i%.*}.txt
I want to speed this up by using all processors. How can I make the script occupy 10 processors simultaneously (by processing 10 files at a time)? processing_command is not parallel by default. Thank you!
The trivial approach would be to use:
for i in $items
do
    processing_command $i ${i%.*}.txt &
done
which will start a new (parallel) instance of processing_command for each $i (the trick is the trailing &, which backgrounds the process).
The drawback is that if you have e.g. 1000 items, this will start 1000 parallel processes, which (while occupying all 10 cores) will be busy context switching rather than doing the actual processing.
If you have as many items as cores (or fewer), this is a good and simple solution.
Usually you don't want to start more processes than you have cores.
A simplistic approach (assuming that all items take about the same time to process) is to split the original "items" list into number_of_cores equally long lists. The following is a slightly modified version of an example taken from an article in the German Linux-Magazin:
#!/bin/bash
## number of processors
PMAX=$(ls -1d /sys/devices/system/cpu/cpu[0-9]* | wc -l)

## call processing_command on each argument, one after the other:
doSequential() {
    local i
    for i in "$@"; do
        processing_command $i ${i%.*}.txt
    done
}

## run PMAX parallel processes
doParallel() {
    # split the arguments into PMAX equally sized lists
    local items item currentProcess=0
    for item in "$@"; do
        items[$currentProcess]="${items[$currentProcess]} $item"
        currentProcess=$(( (currentProcess + 1) % PMAX ))
    done
    # run PMAX processes, each with its shorter list of items
    currentProcess=0
    while [ $currentProcess -lt $PMAX ]; do
        [ -n "${items[$currentProcess]}" ] &&
            eval doSequential ${items[$currentProcess]} &
        currentProcess=$((currentProcess + 1))
    done
    wait
}

doParallel $items    # $items is the list of files from the question
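An alternative that needs nothing beyond standard tools is to let xargs fan the work out over 10 processes with its -P option. A sketch, assuming the file names contain no whitespace (which the question's unquoted $items already implies):
printf '%s\n' $items |
    xargs -P 10 -I{} sh -c 'processing_command "$1" "${1%.*}.txt"' _ {}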
