matlab batch parallelization in bash - bash

I'm trying to run a piece of code on a large computer cluster in order to analyze different parts of the data.
I created 2 loops to assign the jobs to different nodes and to the CPUs that each node contains.
The analysis function I wrote, 'chnJob()', just needs to take an index to know what part of the data it needs to analyze (it's the shell variable called 'chn' in this case).
The loop looks like this:
for NODE in $NODES; do # Loop through nodes
for job_idx in {1..$PROCS_PER_NODE}; do # Loop through jobs per node (8 per node)
echo "this is the channel $chn"
ssh $NODE "matlab -nodisplay -nodesktop -nojvm -nosplash -r 'cd $WORK_DIR; chnJob($chn); quit'" &
let chn++
sleep 2
done
done
Even though I see that chn variable is being incremented properly, the value of chn that is passed to the matlab function is always the last value of the chn.
This is probably because matlab takes a lot of time to open on each node and bash finishes the loops by then. So the value that is being passed to each matlab instance is only the last value.
Is there a way to circumvent that? Can I 'bake' the value of that variable when I'm calling the function?
Or is the problem entirely different?

I don't think that's what's happening. Can you try running this:
cnt=0
for a in 1 2; do
for b in 1 2; do
echo --- $cnt
ssh somehost "echo result: '$cnt'" &
let cnt++
done
done
Replace somehost with a host where you have sshd running. This prints the numbers 0 to 3, returned by the remotely executed echo result: '$cnt'. So the variable substitution itself works fine.
One thing that I can suggest is for you to move your command (matlab ...) into some script in a known folder, then run that script in the above loops by giving a full path to that script. Something like:
ssh $NODE "/path/to/script.sh $cnt"
In the script, $1 will give you the value you want (i.e. $cnt from the loop). You can use echo $1 >> /tmp/values at the beginning of your script to collect all the values in file /tmp/values. Of course, rm /tmp/values before you start. This will confirm whether you are getting all the values as you want them.
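The point that the value is substituted locally, before the remote command starts, can be checked without a remote host. This is a minimal sketch in which `bash -c` stands in for `ssh somehost` (an assumption made only so the example runs anywhere); the double-quoted string is expanded by the local shell in exactly the same way in both cases.

```bash
#!/bin/bash
# bash -c is a stand-in for "ssh somehost"; both receive a string that
# the local shell has already expanded.
cnt=0
results=()
for a in 1 2; do
  for b in 1 2; do
    # $cnt is replaced by its current value before the child shell starts
    results+=("$(bash -c "echo result: '$cnt'")")
    cnt=$((cnt + 1))
  done
done
printf '%s\n' "${results[@]}"
```

Each child shell sees the value that `cnt` had when its command line was built, so a slow start on the remote side cannot change it.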

Bash can't handle variables in brace range expressions, because brace expansion is performed before variable expansion. The bounds have to be literals: {1..10}. Because of the way you have it now, the inner loop is always executed exactly once per iteration of the outer loop instead of eight times (or whatever the value of PROCS_PER_NODE is). As a result, chn advances by only the number of nodes when it should advance by the number of nodes times PROCS_PER_NODE.
Use a C-style for loop instead:
for ((job_idx=1; job_idx<=$PROCS_PER_NODE; job_idx++))
You could increment both job_idx and chn in the for (if that doesn't give you off-by-one problems):
for ((job_idx=1; job_idx<=$PROCS_PER_NODE; job_idx++, chn++))
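Put together, the corrected nested loop might look like the sketch below. The node names and the value of PROCS_PER_NODE are hypothetical, and the ssh/matlab call is replaced by an echo so that the sketch runs without a cluster.

```bash
#!/bin/bash
# Hypothetical values, for illustration only
NODES="node1 node2"
PROCS_PER_NODE=3
chn=1
launched=()

for NODE in $NODES; do
  # C-style loop: the upper bound may be a variable
  for ((job_idx = 1; job_idx <= PROCS_PER_NODE; job_idx++, chn++)); do
    launched+=("$NODE:$chn")
    # The real script would call ssh "$NODE" "matlab ... chnJob($chn); quit" & here
    echo "would run chnJob($chn) on $NODE"
  done
done
```

With two nodes and three jobs per node, chn takes the values 1 through 6, one per launched job.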

If $PBS_NODEFILE contains the filename with the list of nodes (one per line) then this should work:
seq 1 100 | parallel --slf $PBS_NODEFILE "matlab -nodisplay -nodesktop -nojvm -nosplash -r 'cd $WORK_DIR; chnJob({}); quit'"
Learn more: https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1

Related

Sequential case autoincrementing its variable

This potentially is a stupid question or an impossible request. Anyway, I'm writing a little script to launch a program chosen from a list of installed software. Basically, my script presents me a numbered list, in which the programs are listed alphabetically, and I input the number corresponding to the program I want to launch. The variable in which my choice is stored is sent to a case, which launches the corresponding software, having its location stored in each case's command list.
i=1
echo -e "Which program to launch?\n"$((i++))". Program 1\n"$((i++))". Program 2\n"
read choice
case $choice in
1) path to program 1 ;;
2) path to program 2 ;;
esac
As you see, in the echo I've used a variable which gets incremented every time a new program is listed. This, to avoid having to manually write static numbers I have to personally shift every time a new program is installed and has to be inserted in the list between two existing programs. All I need to do is copy the universal $((i++)) index and the list adjusts itself.
The problem is I don't know how to implement this in the case cycle. Supposing I install a Program 3 which has to be alphabetically put between the two existing ones, the echo gets modified this way
echo -e "Which program to launch?\n"$((i++))". Program 1\n"$((i++))". Program 3\n"$((i++))". Program 2\n"
But in the case, I manually have to change the 2) before the second program into a 3).
case $choice in
1) path to program 1 ;;
2) path to program 3 ;;
3) path to program 2 ;;
esac
This may not be a problem in this example, but it is now that I have dozens of programs, and I have to change the 5 into a 6, the 6 into a 7 and so on until more than 20.
How can I automate the case numbering, so that the script understands on its own that it has to execute the n-th case if the variable value is n?
Bash already has select which does what you need:
#!/bin/bash
select choice in ls date 'ls /' ; do
$choice
break
done
If you want to present something different to what you run, you can use an associative array:
#!/bin/bash
declare -A choices=(
[show files]=ls
[show date]=date
[list root dir]='ls /'
)
select choice in "${!choices[@]}" exit ; do
[[ $choice == exit ]] && break
${choices[$choice]}
done
exit is handled outside of the associative array because we want to keep it last, and associative arrays are unordered.
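If the menu order matters (for example, alphabetical), two indexed arrays kept in the same order are one possible alternative to the associative array. This is a sketch only: the labels and commands are reused from the answer above, and the helper names print_menu and command_for_choice are made up for illustration.

```bash
#!/bin/bash
# Keep both arrays in the same order; inserting a program in the middle
# renumbers the menu automatically.
labels=("show files" "show date" "list root dir")
commands=("ls" "date" "ls /")

# Print a numbered menu starting at 1
print_menu() {
  local i
  for i in "${!labels[@]}"; do
    printf '%d. %s\n' "$((i + 1))" "${labels[i]}"
  done
}

# Map a 1-based menu number to its command; fail on invalid input
command_for_choice() {
  local n=$1
  [[ $n =~ ^[0-9]+$ ]] || return 1
  n=$((10#$n))   # force base 10 so input like 08 is not read as octal
  (( n >= 1 && n <= ${#commands[@]} )) || return 1
  printf '%s\n' "${commands[n - 1]}"
}

print_menu
```

An interactive script could then do `read -r choice; cmd=$(command_for_choice "$choice") && $cmd`, with no case statement to renumber.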

How to repeat a function a random number of times with an upper limit on the random number in bash

I am trying to repeatedly run a python script a random number of times using bash. However, to avoid running the script a massive amount of times I want to place an upper limit on the amount of times it can run. Currently I am using the 'modulo' operator to return a remainder and then using that as a string when performing a loop:
#!/bin/bash
RANGE=1000
number=$RANDOM
let "number %= $RANGE"
for run in {1..$number}
do
python script.py
done
The random number works (i.e. $number is a random number between 1-1000), but the problem is that this only seems to be running the script once, no matter what the random number is.
What might the problem be?
Problem is this line:
for run in {1..$number}
Variables are not expanded inside a {..} range, because brace expansion happens before variable expansion, so your loop runs only once no matter what the value of $number is.
Use it like this:
#!/bin/bash
range=1000
number=$((RANDOM % range))
for ((run=1; run <= number; run++)); do
python script.py
done
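The difference between the two loop forms can be seen by counting iterations. In this sketch the count is fixed at 5 rather than random, purely so that the result is reproducible.

```bash
#!/bin/bash
number=5   # fixed value for the demonstration instead of $RANDOM

# Brace form: {1..$number} is not expanded as a range, so the loop body
# runs once with run set to the literal string {1..5}
brace_runs=0
for run in {1..$number}; do
  brace_runs=$((brace_runs + 1))
done

# C-style form: runs exactly $number times
cstyle_runs=0
for ((run = 1; run <= number; run++)); do
  cstyle_runs=$((cstyle_runs + 1))
done

echo "brace: $brace_runs, c-style: $cstyle_runs"
```

Note that RANDOM % range yields a value from 0 to range-1, so zero runs are possible; use $((RANDOM % range + 1)) if at least one run is wanted.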

Open file in bash script

I've got a bash script accepting several files as input which are mixed with various script's options, for example:
bristat -p log1.log -m lo2.log log3.log -u
I created an array where I save all the indexes at which files appear in the script's call, so in this case it would be an array of 3 elements where
arr_pos[0] = 2
arr_pos[1] = 4
arr_pos[2] = 5
Later in the script I must call "head" and "grep" on those files, and I tried it this way
head -n 1 ${arr_pos[0]}
but I get this error at runtime
head: cannot open `2' for reading: No such file or directory
I tried various parenthesis combinations, but I can't find which one is correct.
The problem here is that ${arr_pos[0]} stores the index in which you have the file name, not the file name itself -- so you can't simply head it. The array storing your arguments is given by $@.
A possible way to access the data you want is:
#! /bin/bash
declare -a arr_pos=(2 4 5)
echo ${@:${arr_pos[0]}:1}
Output:
log1.log
The expansion ${@:${arr_pos[0]}:1} takes a slice of the argument array $@ that starts at position ${arr_pos[0]} and has length 1, i.e. the single argument at that position.
Another way to do so, as pointed out by @flaschenpost, is to eval the index preceded by $, so that you access the positional parameter directly. Although it works, it may be risky depending on who is going to run your script, as they may inject commands through the argument line.
In any case, you should probably loop through the entire array of arguments at the beginning of the script and store the values you find, so that you don't run into trouble when fetching each value later. You can loop using for + case ... esac and store the values in associative arrays.
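A for + case loop of that kind might look like the sketch below. It uses a plain indexed array rather than an associative one, and it assumes that every argument starting with - is an option and everything else is a file name; the function name collect_files is hypothetical.

```bash
#!/bin/bash
# Collect the non-option arguments into the global array "files"
collect_files() {
  files=()
  local arg
  for arg in "$@"; do
    case $arg in
      -*) ;;                    # assumed to be an option: skip it
      *)  files+=("$arg") ;;    # anything else is treated as a file name
    esac
  done
}

# Same argument line as in the question
collect_files -p log1.log -m lo2.log log3.log -u
printf '%s\n' "${files[@]}"
```

After this, `head -n 1 "${files[0]}"` works on the name directly, with no index bookkeeping.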
I think eval is what you need.
#!/bin/bash
arr_pos[0]=2;
arr_pos[1]=4;
arr_pos[2]=5;
eval "cat \$${arr_pos[1]}"
For me that works.
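Bash's indirect expansion ${!name} gives the same lookup without eval. The sketch below wraps it in a hypothetical helper, file_at, that takes the position first and the argument list after it.

```bash
#!/bin/bash
# Print the argument at 1-based position $1 among the remaining arguments
file_at() {
  local pos=$1
  shift
  # ${!pos} expands to the positional parameter whose number is stored in pos
  printf '%s\n' "${!pos}"
}

arr_pos=(2 4 5)
for p in "${arr_pos[@]}"; do
  file_at "$p" -p log1.log -m lo2.log log3.log -u
done
```

Inside a real script the call would be `file_at "${arr_pos[0]}" "$@"`, and since nothing is re-parsed as code, a hostile argument cannot inject commands.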

batch job submission upon completion of job

I would like to write a script to execute the steps outlined below. If someone can provide simple examples on how to modify files and search through folders using a script (not necessarily solving my problem below), I will greatly appreciate it.
submit job MyJob in currentDirectory using myJobShellFile.sh to a queue
upon completion of MyJob, goto to currentDirectory/myJobDataFolder.
In myJobDataFolder, there are folders
myJobData.0000 myJobData.0001 myJobData.0002 myJobData.0003
I want to find the maximum number maxIteration of all the listed folders. Here it would be maxIteration=0003.
In file myJobShellFile.sh, at the last line says
mpiexec ./main input myJobDataFolder
I want to append this line to
'mpiexec ./main input myJobDataFolder 0003'
I want to submit MyJob to the queue while maxIteration < 10
Upon completion of MyJob, find the new maxIteration and change this number in myJobShellFile.sh and goto step 4.
I think people typically write python scripts to do this stuff, but I am having a hard time finding out how. I probably don't know the correct terminology for this procedure. I am also aware that the script will vary slightly depending on the queuing system, but any help will be greatly appreciated.
Quite a few aspects of your question are unclear, such as the meaning of “submit job MyJob in currentDirectory using myJobShellFile.sh to a queue”, “append this line to 'mpiexec ./main input myJobDataFolder 0003'”, how you detect when a job is done, relevant parts of myJobShellFile.sh, and some other details. If you can list the specific shell commands you use in each iteration of job submission, then you can post a better question, with a bash tag instead of python.
In the following script, I put a ### at the end of any line where I am guessing what you are talking about. Lines ending with ### may be irrelevant to whatever you actually do, or may be pseudocode. Anyway, the general idea is that the script is supposed to do the things you listed in your items 1 to 5. This script assumes that you have modified myJobShellFile.sh to say
mpiexec ./main input $1 $2
instead of
mpiexec ./main input
because it is simpler to use parameters to modify what you tell mpiexec than it is to keep modifying a shell script. Also, it seems to me you would want to increment maxIter before submitting next job, instead of after. If so, remove the # from the t=$((1$maxIter+1)); maxIter=${t#1} line. Note, see the “Parameter Expansion” section of man bash re expansion of the ${var#txt} form, and the “Arithmetic Expansion” section re $((expression)) form. The 1$maxIter and similar forms are used to change text like 0018 (which is not a valid bash number because 8 is not an octal digit) to 10018.
#!/bin/bash
./myJobShellFile.sh MyJob ###
maxIter=0
while true; do
waitforjobcompletion ###
cd ./myJobDataFolder
maxFile=$(ls -d myJobData.* | tail -1) # Last folder in sorted order
maxIter=${maxFile#myJobData.} # Get max extension
# If you want to increment maxIter, uncomment next line
# t=$((1$maxIter+1)); maxIter=${t#1}
cd ..
if [[ 1$maxIter -lt 11000 ]] ; then
./myJobShellFile.sh myJobDataFolder $maxIter
else
break
fi
done
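The two pieces of bookkeeping in the script above, finding the highest-numbered folder and incrementing a zero-padded counter, can be sketched in isolation. The helper names max_iteration and next_iteration are hypothetical, and next_iteration uses printf with a 10# base prefix as an alternative to the prefix-a-1 trick; the demo directory is a temporary stand-in for myJobDataFolder.

```bash
#!/bin/bash
# Print the suffix of the highest-numbered myJobData.NNNN folder in $1
max_iteration() {
  local dir=$1 d max=""
  # Glob results are sorted, so the last match has the largest suffix
  for d in "$dir"/myJobData.*; do
    [ -d "$d" ] || continue
    max=${d##*/myJobData.}
  done
  printf '%s\n' "$max"
}

# Increment a zero-padded number; 10# stops 0018 being parsed as octal
next_iteration() {
  printf '%04d\n' "$((10#$1 + 1))"
}

# Demonstration in a throwaway directory
demo=$(mktemp -d)
mkdir "$demo/myJobData.0000" "$demo/myJobData.0001" "$demo/myJobData.0002" "$demo/myJobData.0003"
echo "max: $(max_iteration "$demo"), after 0018 comes $(next_iteration 0018)"
rm -rf "$demo"
```

The glob avoids parsing ls output, which matters if folder names could ever contain unusual characters.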
Notes: (1) To test with fewer than 1000 submissions, replace 11000 by 10000+n; for example, to do 123 runs, replace it with 10123. (2) In writing the above script, I assumed that not-previously-known numbers of output files appear in the output directory from time to time. If instead exactly one output file appears per run, and you just want to do one run per value for the values 0000, 0001, 0002, ..., 0999, 1000, then use a script like the following. (For testing with a smaller number than 1000, replace 1000 with (e.g.) 0020. The leading zeroes in these numbers tell bash to pad the generated numbers with leading zeroes.)
#!/bin/bash
for iter in {0000..1000}; do
./myJobShellFile.sh myJobDataFolder $iter
waitforjobcompletion ###
done
(3) If the system has a command that sleeps while it waits for a job to complete on the supercomputing resource, it is reasonable to use that command in place of waitforjobcompletion in the above scripts. Otherwise, if the system has a command jobisrunning that returns true if a job is still running, replace waitforjobcompletion with something like the following:
while jobisrunning ; do sleep 15; done
This will run the jobisrunning command; if it returns true, the shell will sleep for 15 seconds and then retest. Here is an example that illustrates waiting for a file to appear and then for it to go away:
while [ ! -f abc ]; do sleep 3; echo no abc; done
while ls abc >/dev/null 2>&1; do sleep 3; echo an abc; done
The second line's test could be [ -f abc ] instead; I showed a longer example to illustrate how to suppress output and error messages by routing them to /dev/null. (4) To reverse the sense of a while statement's test, replace the word while with until. For example, while [ ! -f abc ]; ... is equivalent to until [ -f abc ]; ....
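A polling loop like the ones above can run forever if the job never finishes, so a retry limit is often useful. The sketch below is one way to add it; wait_for_file and its parameters are hypothetical names, and the marker file stands in for whatever signals completion on your system.

```bash
#!/bin/bash
# Wait until file $1 exists, checking every $2 seconds, at most $3 times.
# Returns 0 if the file appeared, 1 if the retries ran out.
wait_for_file() {
  local path=$1 interval=${2:-3} tries=${3:-20}
  until [ -f "$path" ]; do
    (( tries-- > 0 )) || return 1
    sleep "$interval"
  done
}

# Demonstration with a file that already exists, so this returns at once
marker=$(mktemp)
if wait_for_file "$marker" 1 5; then
  echo "found $marker"
fi
rm -f "$marker"
```

The same structure works with a command such as jobisrunning in place of the file test.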

Bash script execute shell command with Bash variable as argument

I have one loop that creates a group of variables like DISK1, DISK2... where the number at the end of the variable name gets created by the loop and then loaded with a path to a device name. Now I want to use those variables in another loop to execute a shell command, but the variable doesn't give its contents to the shell command.
for (( counter=1 ; counter<=devcount ; counter++))
do
TEMP="\$DISK$counter"
# $TEMP should hold the variable name of the disk, which holds the device name
# TEMP was only for testing, but still has same problem as $DISK$counter
eval echo $TEMP #This echos correctly
STATD$counter=$(eval "smartctl -H -l error \$DISK$counter" | grep -v "5.41" | grep -v "Joe")
eval echo \$STATD$counter
done
Don't use eval ever, except maybe if there is no other way AND you really know what you are doing.
The STATD$counter=$(...) should give an error. That's not a valid assignment because the string "STATD$counter" is not a valid variable name. What will happen is this (using a concrete example): if counter happened to be 3 and your pipeline in the $( ) output "output", bash will only expand that line as far as "STATD3=output", so it will try to find a command named "STATD3=output" and run it. Odds are this is not what you intended.
It sounds like everything you want to do can be accomplished with arrays instead. If you are not familiar with bash arrays take a look at Greg's Wiki, in particular this page or the bash man page to find out how to use them.
For example, in the loop you didn't post in your question: make disk (not DISK: don't use all upper case variable names) an array like so
disk+=( "new value" )
or even
disk[counter]="new value"
Then in the loop in your question, you can make statd an array as well and assign it with values from disk by
statd[counter]="... ${disk[counter]} ..."
It's worth saying again: avoid using eval.
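As a rough sketch of the array approach: the device paths below are made up, and check_disk is a hypothetical stand-in for the smartctl pipeline so that the example runs on any machine.

```bash
#!/bin/bash
# Hypothetical device list; in the real script the first loop would fill it
disk=()
disk+=("/dev/sda")
disk+=("/dev/sdb")

# Stand-in for: smartctl -H -l error "$1" | grep -v "5.41" | grep -v "Joe"
check_disk() {
  echo "status of $1"
}

statd=()
for ((counter = 0; counter < ${#disk[@]}; counter++)); do
  # Both arrays share the same index, so no variable names are built at runtime
  statd[counter]=$(check_disk "${disk[counter]}")
  echo "${statd[counter]}"
done
```

Note that the arrays here are indexed from 0, whereas the loop in the question counts from 1.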
