batch job submission upon completion of job - bash

I would like to write a script to execute the steps outlined below. If someone can provide simple examples on how to modify files and search through folders using a script (not necessarily solving my problem below), I will greatly appreciate it.
1. Submit job MyJob in currentDirectory using myJobShellFile.sh to a queue.
2. Upon completion of MyJob, go to currentDirectory/myJobDataFolder. In myJobDataFolder, there are folders
myJobData.0000 myJobData.0001 myJobData.0002 myJobData.0003
I want to find the maximum number maxIteration of all the listed folders. Here it would be maxIteration=0003.
3. In file myJobShellFile.sh, the last line says
mpiexec ./main input myJobDataFolder
I want to append this line to
'mpiexec ./main input myJobDataFolder 0003'
4. I want to submit MyJob to the queue while maxIteration < 10.
5. Upon completion of MyJob, find the new maxIteration, change this number in myJobShellFile.sh, and go to step 4.
I think people typically write python scripts to do this stuff, but I am having a hard time finding out how. I probably don't know the correct terminology for this procedure. I am also aware that the script will vary slightly depending on the queuing system, but any help will be greatly appreciated.

Quite a few aspects of your question are unclear, such as the meaning of “submit job MyJob in currentDirectory using myJobShellFile.sh to a queue” and “append this line to 'mpiexec ./main input myJobDataFolder 0003'”, how you detect when a job is done, the relevant parts of myJobShellFile.sh, and some other details. If you can list the specific shell commands you use in each iteration of job submission, then you can post a better question, with a bash tag instead of python.
In the following script, I put a ### at the end of any line where I am guessing what you are talking about. Lines ending with ### may be irrelevant to whatever you actually do, or may be pseudocode. Anyway, the general idea is that the script is supposed to do the things you listed in your items 1 to 5. This script assumes that you have modified myJobShellFile.sh to say
mpiexec ./main input $1 $2
instead of
mpiexec ./main input myJobDataFolder
because it is simpler to use parameters to modify what you tell mpiexec than it is to keep modifying a shell script. Also, it seems to me you would want to increment maxIter before submitting the next job, instead of after. If so, remove the # from the t=$((1$maxIter+1)); maxIter=${t#1} line. See the “Parameter Expansion” section of man bash for the ${var#txt} form, and the “Arithmetic Expansion” section for the $((expression)) form. The 1$maxIter and similar forms turn text like 0018 (which is not a valid bash number, because a leading 0 means octal and 8 is not an octal digit) into 10018.
#!/bin/bash
./myJobShellFile.sh MyJob ###
maxIter=0
while true; do
    waitforjobcompletion ###
    cd ./myJobDataFolder
    # -d lists the directories themselves rather than their contents
    maxFile=$(ls -d myJobData.* | tail -1)
    maxIter=${maxFile#myJobData.} #Get max extension
    # If you want to increment maxIter, uncomment next line
    # t=$((1$maxIter+1)); maxIter=${t#1}
    cd ..
    if [[ 1$maxIter -lt 11000 ]] ; then
        ./myJobShellFile.sh myJobDataFolder $maxIter
    else
        break
    fi
done
Notes: (1) To test with fewer than 1000 submissions, replace 11000 by 10000+n; for example, to do 123 runs, replace it with 10123. (2) In writing the above script, I assumed that not-previously-known numbers of output files appear in the output directory from time to time. If instead exactly one output file appears per run, and you just want to do one run per value for the values 0000, 0001, 0002, ..., 0999, 1000, then use a script like the following. (For testing with a smaller number than 1000, replace 1000 with, e.g., 0020. The leading zeroes in these numbers tell bash to pad the generated numbers with leading zeroes.)
#!/bin/bash
for iter in {0000..1000}; do
    ./myJobShellFile.sh myJobDataFolder $iter
    waitforjobcompletion ###
done
(3) If the system has a command that sleeps while it waits for a job to complete on the supercomputing resource, it is reasonable to use that command in place of waitforjobcompletion in the above scripts. Otherwise, if the system has a command jobisrunning that returns true if a job is still running, replace waitforjobcompletion with something like the following:
while jobisrunning ; do sleep 15; done
This will run the jobisrunning command; if it returns true, the shell will sleep for 15 seconds and then retest. Here is an example that illustrates waiting for a file to appear and then for it to go away:
while [ ! -f abc ]; do sleep 3; echo no abc; done
while ls abc >/dev/null 2>&1; do sleep 3; echo an abc; done
The second line's test could be [ -f abc ] instead; I showed a longer example to illustrate how to suppress output and error messages by routing them to /dev/null. (4) To reverse the sense of a while statement's test, replace the word while with until. For example, while [ ! -f abc ]; ... is equivalent to until [ -f abc ]; ....
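As a minimal sketch of the folder search and the zero-padded arithmetic described above, the two helpers below find the highest-numbered myJobData.NNNN directory and compute the next index with the same 1$maxIter trick. The function names max_iteration and next_iteration are hypothetical, and the sketch assumes all suffixes have the same number of digits so that the shell's sorted glob order matches numeric order.

```shell
#!/bin/bash
# Hypothetical helpers; the names are not from the original post.

# Print the largest suffix among DIR/myJobData.NNNN directories.
# Globs expand in sorted order, so the last match is the maximum
# as long as every suffix has the same number of digits.
max_iteration() {
    local dir=$1 d last=""
    for d in "$dir"/myJobData.[0-9]*/; do
        [ -d "$d" ] || continue   # the glob matched nothing
        d=${d%/}                  # strip the trailing slash
        last=${d##*.}             # keep the text after the last dot
    done
    printf '%s\n' "$last"
}

# Print the next index, keeping the leading zeroes.
next_iteration() {
    local t=$((1$1 + 1))          # 0018 becomes 10018 + 1 = 10019
    printf '%s\n' "${t#1}"        # drop the leading 1 again
}

# Demonstration in a throwaway directory.
tmp=$(mktemp -d)
mkdir "$tmp/myJobData.0000" "$tmp/myJobData.0001" "$tmp/myJobData.0018"
max=$(max_iteration "$tmp")
echo "max=$max next=$(next_iteration "$max")"
rm -rf "$tmp"
```

The demonstration prints max=0018 next=0019. The trick breaks when the index overflows its width (9999 would become 20000), which is fine for the limits discussed here.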

Related

compatible comments for both bash and batch

We write two similar scripts: one for bash (linux) and one for batch (dos/windows).
Even if the specific code is different we would like to visually compare (diff) both scripts and have the similar parts of code aligned side by side.
We use explicit comments with the same text to achieve this. But the comment prefix differs between the two: REM or :: in Windows batch, and # in bash.
As a result, the comment lines do not match:
linux            windows
# first step     REM first step
foo.sh           foo.bat
# second step    REM second step
bar.sh           bar.bat
Is there a way to use a common character or sequence of characters to make the comments equal?
Is the use of : #; safe for both systems/scripts?
linux              windows
: #; first step    : #; first step
foo.sh             foo.bat
: #; second step   : #; second step
bar.sh             bar.bat
Are there any unwanted side effects?
: in bash is not exactly a comment. It is a void command.
A little bit like pass in some languages.
It helps, for example, to fill empty slots, if needed
if condition
then
:
else
doSomething
fi
So, you may use it, in a way, as a sort of comment. That would work both in bash and batch (well, I know nothing of batch, but you said that :: is a comment there). But beware that it is not exactly a comment, so there are some differences.
For example
#!/bin/bash
echo one ||
:: foo
echo two
echo un ||
# bar
echo deux
Displays one, two and un but not deux.
Because echo one || prints one and then executes the following command only if it fails (which it doesn't). Here the following command is :: foo, which is not executed (you wouldn't know, since it does nothing, but it is not executed). And the echo two is a brand new unrelated command that is executed.
Whereas, on the other hand, echo un || likewise prints un, and doesn't execute the next command, since echo un did not fail. But the next command here is echo deux. Because # bar doesn't count, since it is a comment.
And that is only one of the many examples one could probably find to show that : is not a comment.
But, well, if you use it being aware of that, I suppose you could use it to insert void comments in your bash code that would also be void in batch.
Edit:
I won't edit for each new example that comes to mind. But that one is pretty important
echo un # deux
echo one : two
prints
un
one : two
: is a command. So, as with other commands like ls, not every occurrence of it is treated as a command (no more than echo ls lists the directory content; ls is just a string there).
So, you can't use it as a replacement for inline comments, only for full-line comments.
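A short sketch of the bash side of this, for illustration only: a line starting with : #; behaves like a full-line comment because everything after # is ignored, but : is still a command, so it resets the exit status and its arguments are still expanded. The variable names are hypothetical, and nothing here says anything about how batch treats the same line.

```shell
#!/bin/bash
: #; first step   (bash runs ":" and ignores everything after "#")

false
: this line is a command, not a comment
status_after_colon=$?     # ":" always succeeds, so this is 0

# Arguments of ":" are still expanded, so side effects happen.
n=0
: $((n = n + 1))

echo "status=$status_after_colon n=$n"
```

This prints status=0 n=1: the failure of false is hidden by the following : line, and the arithmetic expansion in its argument still ran.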

How can I display the run count as part of the `watch` output in a terminal?

I'd like to see a run counter at the top of my watch output.
Ex: this command should print a count value which increments every 2 seconds, in addition to the output of my main command, which, in this case is just echo "hello" for the purposes of this demonstration:
export COUNT=0 && watch -n 2 'export COUNT=$((COUNT+1)); echo "Count = $COUNT" \
&& echo "hello"'
But, all it outputs is this, with the count always being 1 and never changing:
Count = 1
hello
How can I get this Count variable to increment every 2 seconds when watch runs the command?
Thanks @Inian for pointing this out in the comments. I've consulted the cross-site duplicate, slightly modified it, and come up with this:
count=0; while sleep 2 ; do clear; echo "$((count++))"; echo "hello" ; done
Replace echo "hello" with the real command you want to run every n seconds (the sleep interval), and it works perfectly.
There's no need to use watch at all. As a matter of fact, there is no way to use watch like this, because it starts a fresh shell for every run, unless you store the counter in a file rather than in RAM. I'd like to avoid that, since it causes unnecessary write/erase wear on a solid-state drive (SSD).
Now I can run this command to repeatedly build my asciidoctor document so I can view it in a webpage, hitting only F5 each time I make and save a change to view the updated HTML page.
count=0; while sleep 2 ; do clear; echo "$((count++))"; \
asciidoctor -D temp README.adoc ; done
Sample output:
96
asciidoctor: ERROR: README.adoc: line 6: level 0 sections can only be used when doctype is book
Final answer:
And, here's a slightly better version which prints count = 2 instead of just 2, and which also runs first and then sleeps, rather than sleeping first and then running:
count=1; while true ; do clear; echo "count = $((count++))"; \
asciidoctor -D temp README.adoc; sleep 2 ; done
...but, it looks like it's not just updating the file if it's changed, it's rewriting constantly, wearing down my disk anyway. So, I'd need to write a script and only run this if the source file has changed. Oh well...for the purposes of my original question, this answer is done.
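One possible way to run the build only when the source has changed is to remember a checksum of the file in a shell variable and compare it on each pass. This is a sketch under assumptions, not part of the original answer: the function name run_if_changed and the variable last_sum are hypothetical, and it uses cksum, which only reads the file.

```shell
#!/bin/bash
# Hypothetical helper: run COMMAND only when the checksum of SRC differs
# from the one remembered in the shell variable last_sum (kept in memory).
# Usage: run_if_changed SRC COMMAND [ARGS...]
last_sum=""
run_if_changed() {
    local src=$1 sum
    shift
    sum=$(cksum < "$src") || return 1   # unreadable source: report failure
    if [ "$sum" != "$last_sum" ]; then
        last_sum=$sum
        "$@"                            # source changed: run the command
    fi
}

# Example loop, commented out so the sketch does not run forever:
# while true; do
#     run_if_changed README.adoc asciidoctor -D temp README.adoc
#     sleep 2
# done
```

The function must be called directly in the loop, not inside $(...) or a pipeline, because last_sum has to be updated in the current shell.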

Customize argument input routine for shell script

I have written a shell script for automating some tasks that I run from the terminal as -
v@ubuntu:$ ./automate.sh from:a1 to:a2 msg:'edited'
How can I (if at all) customize the script so that I enter each argument in a custom format on a separate line, and then run the script by pressing some other key? So, I would do -
v@ubuntu:$ ./automate.sh
from : a1
to : a2
msg : 'next change'
... and then hit say Ctrl+Enter or F5 to execute this particular script?
NOTE : I know there is a hacky work around by simply typing ./automate.sh \ and hitting Enter after the trailing backslash to get a new line, but I was hoping to find a more elegant way to do this from within the script itself.
Also, I've purposely changed each argument to include whitespaces and the msg argument to include a string with spaces. So if anyone can point me in the right direction as to how to accomplish that as an added bonus, I'll be really grateful :)
If you know the number of arguments it is easy. Basics first.
#!/bin/bash
if [ $# == 0 ]
then
read v1 # gets onto new line. reads the whole line until ENTER
read v2 # same
read v3 # same
fi
# Parse $v1, $v2, $v3 as needed and run your script
echo ""
echo "Got |$v1|, |$v2|, |$v3|"
When you type automate.sh and hit Enter, the script starts with no arguments. With no arguments ($# == 0) the first read is executed: it waits on the new line and, once Enter is hit, stores the typed line in $v1. Control returns to the script, the next read gets the next typed line, and so on; after the last one, execution leaves the if block and continues. Parse your variables and run the script.
Session:
> automate.sh Enter
typed line Enter
more items Enter
yet more Enter
Got |typed line|, |more items|, |yet more|
>
You don't need Control-Enter or F5, it continues after 3 (three) lines.
This also allows you to provide both behaviors. Add an else, which will be executed if there are some arguments. You can then use the script by either supplying arguments on the first line (invocation you have so far), or in this new way.
If you need an unspecified number of arguments this approach will need more work.
Read words in input line into variables
If read is followed by variable names, like read v1 v2, then it reads each word into a variable, and the last variable gets everything that may have remained on the line. So replace read lines with
read k1 p1 s1
read k2 p2 s2
read k3 p3 s3
Now $k1 contains the first word (from), and $k2 and $k3 have the first words on their lines; then $p1 (etc) have the second word (:), and $s1 (etc) have everything else to the end of their lines (a1, a2, 'next change'). So you don't need those single quotes. All this is simple to modify if you want the script to print something on each line before input.
Based on the clarification in the comment, it is indeed desirable, as one might expect, not to have to type the whole strings. This is "simple to modify":
read -p 'from :' s1
read -p 'to :' s2
read -p 'msg :' s3
Now the user only needs to enter the part after :, captured in $s variables. All else is the same.
See, for example: The section on user input in the Bash Guide; Their Advanced Guide (special variables); For a far more involved user interaction, this post. And, of course, man read.
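The word-by-word read described above can be sketched as a loop that accepts the three lines in any order and stops at end of input. The function name read_fields is hypothetical; the here-document only stands in for what a user would type, and the field names follow the question.

```shell
#!/bin/bash
# Hypothetical helper: read "key : value" lines from standard input and
# fill the variables from, to and msg. The last variable given to read
# (value) receives the rest of the line, spaces included.
read_fields() {
    local key sep value
    while read -r key sep value; do
        case $key in
            from) from=$value ;;
            to)   to=$value ;;
            msg)  msg=$value ;;
        esac
    done
}

# Demonstration: the here-document replaces interactive typing.
read_fields <<'EOF'
from : a1
to : a2
msg : next change
EOF
echo "Got |$from|, |$to|, |$msg|"
```

This prints Got |a1|, |a2|, |next change|. When run interactively without the here-document, the loop would end when the user sends end of input.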
It will be hard to bind Control-Enter in bash. If you are OK with using Control-D instead, then everything might look like:
#!/usr/local/bin/bash
read -p 'From: ' from
read -p 'To: ' to
read -p 'Msg: ' msg
# Control-D on an empty line signals end of input, so read fails; no literal "^D" is ever stored
if ! read keystroke; then
    echo "$from $to $msg"
    # do something else
fi

matlab batch parallelization in bash

I'm trying to run a piece of code on a large computer cluster in order to analyze different parts of the data.
I created 2 loops to assign the jobs to different nodes and the cpu's that the nodes contain.
The analysis function I wrote, 'chnJob()', just needs to take an index to know what part of the data it needs to analyze (it's the shell variable called 'chn' in this case).
the loop is like this:
for NODE in $NODES; do # Loop through nodes
for job_idx in {1..$PROCS_PER_NODE}; do # Loop through jobs per node (8 per node)
echo "this is the channel $chn"
ssh $NODE "matlab -nodisplay -nodesktop -nojvm -nosplash -r 'cd $WORK_DIR; chnJob($chn); quit'" &
let chn++
sleep 2
done
done
Even though I see that chn variable is being incremented properly, the value of chn that is passed to the matlab function is always the last value of the chn.
This is probably because matlab takes a lot of time to open on each node and bash finishes the loops by then. So the value that is being passed to each matlab instance is only the last value.
Is there a way to circumvent that? Can I 'bake' the value of that variable when I'm calling the function?
Or is the problem entirely different?
I don't think that's what's happening. Can you try running this:
cnt=0
for a in 1 2; do
for b in 1 2; do
echo --- $cnt
ssh somehost "echo result: '$cnt'" &
let cnt++
done
done
Replace somehost with some host where you have sshd running. This prints the numbers 0 to 3, each echoed back by the remotely executed echo result: '$cnt'. So passing the value through ssh itself works OK.
One thing that I can suggest is for you to move your command (matlab ...) into some script in a known folder, then run that script in the above loops by giving a full path to that script. Something like:
ssh $NODE "/path/to/script.sh $cnt"
In the script, $1 will give you the value you want (i.e. $cnt from the loop). You can use echo $1 >> /tmp/values at the beginning of your script to collect all the values in file /tmp/values. Of course, rm /tmp/values before you start. This will confirm whether you are getting all the values as you want them.
Bash can't handle variables in brace range expressions. They have to be literals: {1..10}. Because of the way you have it now, the inner loop is always executed exactly once per iteration of the outer loop instead of eight times (or whatever the value of PROCS_PER_NODE is). As a result, chn advances only by the number of nodes, when it should advance by the number of nodes times PROCS_PER_NODE.
Use a C-style for loop instead:
for ((job_idx=1; job_idx<=$PROCS_PER_NODE; job_idx++))
You could increment both job_idx and chn in the for (if that doesn't give you off-by-one problems):
for ((job_idx=1; job_idx<=$PROCS_PER_NODE; job_idx++, chn++))
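The difference between the two loop forms can be checked with a small sketch like the one below. The counter names brace_runs and cstyle_runs are hypothetical, and 8 is just the per-node value mentioned in the question.

```shell
#!/bin/bash
PROCS_PER_NODE=8   # example value, matching "8 per node" in the question

# Brace expansion happens before variable expansion, so this loop body
# runs once, with job_idx set to the literal text {1..8}.
brace_runs=0
for job_idx in {1..$PROCS_PER_NODE}; do
    brace_runs=$((brace_runs + 1))
done

# The C-style loop evaluates the variable arithmetically and runs 8 times.
cstyle_runs=0
for ((job_idx = 1; job_idx <= PROCS_PER_NODE; job_idx++)); do
    cstyle_runs=$((cstyle_runs + 1))
done

echo "brace: $brace_runs, C-style: $cstyle_runs"
```

In bash this prints brace: 1, C-style: 8.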
If $PBS_NODEFILE contains the filename with the list of nodes (one per line) then this should work:
seq 1 100 | parallel --slf $PBS_NODEFILE "matlab -nodisplay -nodesktop -nojvm -nosplash -r 'cd $WORK_DIR; chnJob({}); quit'"
Learn more: https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1

How to deal with NFS latency in shell scripts

I'm writing shell scripts where quite regularly some stuff is written to a file, after which an application is executed that reads that file. I find that across our company the network latency differs vastly, so a simple sleep 2 for example will not be robust enough.
I tried to write a (configurable) timeout loop like this:
waitLoop()
{
local timeout=$1
local test="$2"
if ! $test
then
local counter=0
while ! $test && [ $counter -lt $timeout ]
do
sleep 1
((counter++))
done
if ! $test
then
exit 1
fi
fi
}
This works for test="[ -e $somefilename ]". However, testing existence is not enough, I sometimes need to test whether a certain string was written to the file. I tried
test="grep -sq \"^sometext$\" $somefilename", but this did not work. Can someone tell me why?
Are there other, less verbose options to perform such a test?
You can set your test variable this way:
test='grep -sq "^sometext$" "$somefilename"'
The reason your grep isn't working is that quotes stored in a variable are not re-parsed when the variable is expanded. You'll need to use eval:
if ! eval "$test"
I'd say the way to check for a string in a text file is grep.
What's your exact problem with it?
Also you might adjust your NFS mount parameters, to get rid of the root problem. A sync might also help. See NFS docs.
If you're wanting to use waitLoop in an "if", you might want to change the "exit" to a "return", so the rest of the script can handle the error situation (there's not even a message to the user about what failed before the script dies otherwise).
The other issue is that using "$test" to hold a command means the shell does not re-parse quotes when it expands the variable; it only splits the value on whitespace. So if you say test="grep \"foo\" \"bar baz\"", rather than looking for the three-letter string foo in the file with the seven-character name bar baz, it'll look for the five-character string "foo" (quotes included) in two files named "bar and baz".
So you can either decide you don't need the shell magic, and set test='grep -sq ^sometext$ somefilename', or you can get the shell to handle the quoting explicitly with something like:
if /bin/sh -c "$test"
then
...
Try using the file modification time to detect when it is written without opening it. Something like
old_mtime=$(stat --format="%Y" file)   # %Y is the modification time in seconds since the epoch
# Write to file.
new_mtime=$old_mtime
while [[ "$old_mtime" -eq "$new_mtime" ]]; do
    sleep 2
    new_mtime=$(stat --format="%Y" file)
done
This won't work, however, if multiple processes try to access the file at the same time.
I just had the exact same problem. I used a similar approach to the timeout wait that you include in your OP; however, I also included a file-size check. I reset my timeout timer if the file had increased in size since last it was checked. The files I'm writing can be a few gig, so they take a while to write across NFS.
This may be overkill for your particular case, but I also had my writing process calculate a hash of the file after it was done writing. I used md5, but something like crc32 would work, too. This hash was broadcast from the writer to the (multiple) readers, and the reader waits until a) the file size stops increasing and b) the (freshly computed) hash of the file matches the hash sent by the writer.
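The file-size part of this approach might look roughly like the sketch below. It is an illustration under assumptions, not the poster's actual code: the function name wait_for_stable_size and its parameters are hypothetical, the timeout is counted in checks rather than seconds, and the hash comparison is left out.

```shell
#!/bin/bash
# Hypothetical helper: wait until FILE exists and its size is the same on
# two consecutive checks. Give up after TIMEOUT checks without progress;
# the counter restarts whenever the size changes.
# Usage: wait_for_stable_size FILE TIMEOUT [INTERVAL_SECONDS]
wait_for_stable_size() {
    local file=$1 timeout=$2 interval=${3:-1}
    local last=-1 size tries=0
    while [ "$tries" -lt "$timeout" ]; do
        if [ -e "$file" ]; then
            size=$(wc -c < "$file")
            size=$((size))              # normalise any padding from wc
            if [ "$size" -eq "$last" ]; then
                return 0                # size did not change: assume done
            fi
            last=$size
            tries=0                     # progress seen: restart the timeout
        fi
        sleep "$interval"
        tries=$((tries + 1))
    done
    return 1
}

# Example use (hypothetical file name):
# wait_for_stable_size /mnt/nfs/output.dat 30 2 || echo "gave up waiting" >&2
```

A stable size only suggests that writing has finished, which is why the original approach also compared a hash sent by the writer.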
We have a similar issue, but for different reasons. We are reading a file, which is sent to an SFTP server. The machine running the script is not the SFTP server.
What I have done is set it up in cron (although a loop with a sleep would work too) to do a cksum of the file. When the old cksum matches the current cksum (the file has not changed for the determined amount of time) we know that the writes are complete, and transfer the file.
Just to be extra safe, we never overwrite a local file before making a backup, and only transfer at all when the remote file has two cksums in a row that match, and that cksum does not match the local file.
If you need code examples, I am sure I can dig them up.
The shell was splitting your predicate into words. Grab it all with $@ as in the code below:
#! /bin/bash
waitFor()
{
local tries=$1
shift
local predicate="$@"
while [ $tries -ge 1 ]; do
(( tries-- ))
if $predicate >/dev/null 2>&1; then
return
else
[ $tries -gt 0 ] && sleep 1
fi
done
exit 1
}
pred='[ -e /etc/passwd ]'
waitFor 5 $pred
echo "$pred satisfied"
rm -f /tmp/baz
(sleep 2; echo blahblah >>/tmp/baz) &
(sleep 4; echo hasfoo >>/tmp/baz) &
pred='grep ^hasfoo /tmp/baz'
waitFor 5 $pred
echo "$pred satisfied"
Output:
$ ./waitngo
[ -e /etc/passwd ] satisfied
grep ^hasfoo /tmp/baz satisfied
Too bad the typescript isn't as interesting as watching it in real time.
Ok...this is a bit whacky...
If you have control over the file: you might be able to create a 'named pipe' here.
So (depending on how the writing program works) you can monitor the file in a synchronized fashion.
At its simplest:
Create the named pipe:
mkfifo file.txt
Set up the sync'd receiver:
while :
do
    process.sh < file.txt
done
Create a test sender:
echo "Hello There" > file.txt
The 'process.sh' is where your logic goes: this will block until the sender has written its output. In theory the writer program won't need modifying....
WARNING: if the receiver is not running for some reason, you may end up blocking the sender!
Not sure it fits your requirement here, but might be worth looking into.
Or, to avoid the synchronized approach, try 'lsof'?
http://en.wikipedia.org/wiki/Lsof
Assuming that you only want to read from the file when nothing else is writing to it (i.e., the writing process has finished), you could check whether anything else still has a file handle open on it.

Resources