How to deal with NFS latency in shell scripts - bash

I'm writing shell scripts in which, quite regularly, some data is written
to a file, after which an application is executed that reads that file. I find that network latency differs vastly across our company, so a simple sleep 2, for example, will not be robust enough.
I tried to write a (configurable) timeout loop like this:
waitLoop()
{
    local timeout=$1
    local test="$2"
    if ! $test
    then
        local counter=0
        while ! $test && [ $counter -lt $timeout ]
        do
            sleep 1
            ((counter++))
        done
        if ! $test
        then
            exit 1
        fi
    fi
}
This works for test="[ -e $somefilename ]". However, testing existence is not enough, I sometimes need to test whether a certain string was written to the file. I tried
test="grep -sq \"^sometext$\" $somefilename", but this did not work. Can someone tell me why?
Are there other, less verbose options to perform such a test?

Setting your test variable with command substitution, as in
test=$(grep -sq "^sometext$" $somefilename)
will not do what you want: the $( ) runs grep right away and stores its output, not the command itself. The reason your original grep isn't working is that quotes embedded in a variable are not re-parsed when the variable is expanded; they reach grep as literal characters. You'll need to use eval:
if ! eval $test
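For instance, here is a sketch of the original loop reworked around eval (assuming somefilename is a variable set elsewhere in the script, and returning 1 on timeout rather than exiting, so the caller can react):
waitLoop()
{
    local timeout=$1
    local test="$2"
    local counter=0
    # eval re-parses the string, so the embedded quotes are honored.
    until eval "$test"; do
        if [ $counter -ge $timeout ]; then
            return 1
        fi
        sleep 1
        ((counter++))
    done
}
waitLoop 10 'grep -sq "^sometext$" "$somefilename"'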

I'd say the way to check for a string in a text file is grep.
What's your exact problem with it?
Also, you might adjust your NFS mount parameters to address the root cause. A sync might also help. See the NFS docs.
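For example, a sketch of what that tuning can look like on a Linux client (the server, export, and mount point are placeholders; check your platform's nfs man page before relying on this):
# actimeo sets the attribute cache timeout in seconds, so clients
# notice file changes sooner; noac disables attribute caching entirely
# (more correct, but much more NFS traffic).
mount -o remount,actimeo=1 server:/export /mnt/share
mount -o remount,noac server:/export /mnt/share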

If you're wanting to use waitLoop in an "if", you might want to change the "exit" to a "return" so the rest of the script can handle the error situation (otherwise the script dies without even a message to the user about what failed).
The other issue is that using "$test" to hold a command means you don't get shell expansion when actually executing, just when evaluating. So if you say test="grep \"foo\" \"bar baz\"", then rather than looking for the three-letter string foo in the file with the seven-character name bar baz, it'll look for the five-character string "foo" in the nine-character file "bar baz".
So you can either decide you don't need the shell magic, and set test='grep -sq ^sometext$ somefilename', or you can get the shell to handle the quoting explicitly with something like:
if /bin/sh -c "$test"
then
    ...
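With return in place of exit and /bin/sh -c "$test" inside the loop, the caller can then handle a timeout itself, e.g. (run_application is a placeholder for whatever reads the file):
if waitLoop 10 'grep -sq ^sometext$ somefilename'; then
    run_application somefilename
else
    echo "timed out waiting for somefilename" >&2
    exit 1
fi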

Try using the file modification time to detect when it is written, without opening it. Something like
# %Y is the time of last data modification, in seconds since the epoch.
old_mtime=$(stat --format="%Y" file)
# Write to file.
new_mtime=$old_mtime
while [[ "$old_mtime" -eq "$new_mtime" ]]; do
    sleep 2
    new_mtime=$(stat --format="%Y" file)
done
This won't work, however, if multiple processes try to access the file at the same time.

I just had the exact same problem. I used a similar approach to the timeout wait that you include in your OP; however, I also included a file-size check: I reset my timeout timer whenever the file had increased in size since it was last checked. The files I'm writing can be a few gigabytes, so they take a while to write across NFS.
This may be overkill for your particular case, but I also had the writing process calculate a hash of the file after it was done writing. I used md5, but something like crc32 would work too. This hash is broadcast from the writer to the (multiple) readers, and each reader waits until a) the file size stops increasing and b) the (freshly computed) hash of the file matches the hash sent by the writer.
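For what it's worth, here is a minimal sketch of the size-stability half of that scheme (the function name and the 5-second poll interval are arbitrary, and the hash exchange is left out):
wait_size_stable()
{
    local file=$1
    local prev=-1 size
    while :; do
        # %s prints the file size in bytes; a missing file counts as -1.
        size=$(stat --format="%s" "$file" 2>/dev/null || echo -1)
        if [ "$size" -ge 0 ] && [ "$size" -eq "$prev" ]; then
            break    # unchanged since the last poll: assume the writer is done
        fi
        prev=$size
        sleep 5
    done
}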

We have a similar issue, but for different reasons. We are reading a file which is sent to an SFTP server. The machine running the script is not the SFTP server.
What I have done is set it up in cron (although a loop with a sleep would work too) to do a cksum of the file. When the old cksum matches the current cksum (i.e. the file has not changed for the determined amount of time), we know that the writes are complete, and we transfer the file.
Just to be extra safe, we never overwrite a local file without making a backup first, and we only transfer at all when the remote file has produced two matching cksums in a row, and that cksum does not match the local file's.
If you need code examples, I am sure I can dig them up.
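Along those lines, a rough sketch of one polling pass (paths and the sleep are placeholders; reading via stdin keeps the filename out of cksum's output, so checksums of different paths compare cleanly):
remote=incoming/data.csv        # hypothetical path the writer delivers to
local_copy=archive/data.csv     # hypothetical local copy

old=$(cksum < "$remote")
sleep 60
new=$(cksum < "$remote")

# Two matching cksums in a row, and different from what we already hold:
if [ "$old" = "$new" ] && [ "$new" != "$(cksum 2>/dev/null < "$local_copy")" ]; then
    cp "$local_copy" "$local_copy.bak" 2>/dev/null   # back up before overwriting
    cp "$remote" "$local_copy"
fi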

The shell was splitting your predicate into words. Grab it all with "$@" as in the code below:
#! /bin/bash
waitFor()
{
    local tries=$1
    shift
    local predicate="$@"    # joins the remaining arguments back into one command string
    while [ $tries -ge 1 ]; do
        (( tries-- ))
        if $predicate >/dev/null 2>&1; then
            return
        else
            [ $tries -gt 0 ] && sleep 1
        fi
    done
    exit 1
}
pred='[ -e /etc/passwd ]'
waitFor 5 $pred
echo "$pred satisfied"
rm -f /tmp/baz
(sleep 2; echo blahblah >>/tmp/baz) &
(sleep 4; echo hasfoo >>/tmp/baz) &
pred='grep ^hasfoo /tmp/baz'
waitFor 5 $pred
echo "$pred satisfied"
Output:
$ ./waitngo
[ -e /etc/passwd ] satisfied
grep ^hasfoo /tmp/baz satisfied
Too bad the typescript isn't as interesting as watching it in real time.

OK... this is a bit wacky...
If you have control over the file, you might be able to use a 'named pipe' here.
That way (depending on how the writing program works) you can monitor the file in a synchronized fashion.
At its simplest:
Create the named pipe:
mkfifo file.txt
Set up the sync'd receiver:
while :
do
    process.sh < file.txt
done
Create a test sender:
echo "Hello There" > file.txt
The 'process.sh' is where your logic goes: it will block until the sender has written its output. In theory the writer program won't need modifying...
WARNING: if the receiver is not running for some reason, you may end up blocking the sender!
Not sure it fits your requirement here, but might be worth looking into.
Or, to avoid the synchronization, try 'lsof'?
http://en.wikipedia.org/wiki/Lsof
Assuming that you only want to read from the file when nothing else is writing to it (i.e. the writing process has finished), you could check whether anything else has a file handle open on it.
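A sketch of that check (caveat: lsof only sees processes on the machine it runs on, so over NFS it will not notice a writer on another host):
# lsof exits non-zero when no process has the file open.
while lsof -- /path/to/file >/dev/null 2>&1; do
    sleep 1
done
# Nothing holds the file open any more; safe(ish) to read it.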

Related

Source configuration file avoiding any execution

I'm working on a bash program that processes requests submitted by users through a web interface. To give users flexibility, they can specify several parameters for each job; at the end, the request is saved in a file with a specific name, so the bash script can perform the requested task.
At the end, this file is filled like this:
ENVIRONMENT="PRO"
INTEGRATION="J050_provisioning"
FILE="*"
DIRECTORY="out"
So the bash script will source this file to perform the tasks the user requested. This works great so far, but I see a security issue: if a user enters malicious data, something like:
SOMEVAR="GONNAHACK $(rm -f some_important_file)"
OTHERVAR="DANGEROUZZZZZZ `while :; do sleep 10 & done`"
This will cause undesirable effects when sourcing the file :). Is there a way to prevent a sourced file from executing any code other than variable initializations? Or is the only way to grep the file before sourcing it, to check that it is not dangerous?
Just do not source it. Make it a configuration file composed of name=value lines (without the double quotes), read each name/value pair, and assign value to name. In order not to overwrite critical variables like PATH, prefix the name with CONF_, for example.
Crude code:
while IFS='=' read -r conf_name conf_value; do
    printf -v "CONF_$conf_name" '%s' "$conf_value" \
        || echo "Invalid configuration name '$conf_name'" >&2
done < your_configuration_file.conf
Test it works:
$ echo "${!CONF_*}"
CONF_DIRECTORY CONF_ENVIRONMENT CONF_FILE CONF_INTEGRATION CONF_OTHERVAR CONF_SOMEVAR
$ printf '%s\n' "$CONF_SOMEVAR"
GONNAHACK $(rm -f some_important_file)

batch job submission upon completion of job

I would like to write a script to execute the steps outlined below. If someone can provide simple examples on how to modify files and search through folders using a script (not necessarily solving my problem below), I will greatly appreciate it.
1. Submit job MyJob in currentDirectory using myJobShellFile.sh to a queue.
2. Upon completion of MyJob, go to currentDirectory/myJobDataFolder. In myJobDataFolder there are folders
   myJobData.0000 myJobData.0001 myJobData.0002 myJobData.0003
   I want to find the maximum number maxIteration of all the listed folders. Here it would be maxIteration=0003.
3. In the file myJobShellFile.sh, the last line says
   mpiexec ./main input myJobDataFolder
   I want to change this line to
   'mpiexec ./main input myJobDataFolder 0003'
4. Submit MyJob to the queue while maxIteration < 10.
5. Upon completion of MyJob, find the new maxIteration, change this number in myJobShellFile.sh, and go to step 4.
I think people typically write python scripts to do this kind of thing, but I'm having a hard time finding out how. I probably don't know the correct terminology for this procedure. I am also aware that the script will vary slightly depending on the queuing system, but any help will be greatly appreciated.
Quite a few aspects of your question are unclear, such as the meaning of "submit job MyJob in currentDirectory using myJobShellFile.sh to a queue", "change this line to 'mpiexec ./main input myJobDataFolder 0003'", how you detect when a job is done, the relevant parts of myJobShellFile.sh, and some other details. If you can list the specific shell commands you use in each iteration of job submission, then you can post a better question, with a bash tag instead of python.
In the following script, I put a ### at the end of any line where I am guessing what you are talking about. Lines ending with ### may be irrelevant to whatever you actually do, or may be pseudocode. Anyway, the general idea is that the script is supposed to do the things you listed in your items 1 to 5. This script assumes that you have modified myJobShellFile.sh to say
mpiexec ./main input $1 $2
instead of
mpiexec ./main input
because it is simpler to use parameters to modify what you tell mpiexec than to keep modifying a shell script. Also, it seems to me you would want to increment maxIter before submitting the next job, instead of after. If so, remove the # from the t=$((1$maxIter+1)); maxIter=${t#1} line. Note: see the "Parameter Expansion" section of man bash regarding expansion of the ${var#txt} form, and the "Arithmetic Expansion" section regarding the $((expression)) form. The 1$maxIter and similar forms are used to change text like 0018 (which is not a valid bash number, because 8 is not an octal digit) to 10018.
#!/bin/bash
./myJobShellFile.sh MyJob ###
maxIter=0
while true; do
    waitforjobcompletion ###
    cd ./myJobDataFolder
    maxFile=$(ls myJobData* | tail -1)
    maxIter=${maxFile#myJobData.}    # Get max extension
    # If you want to increment maxIter, uncomment the next line
    # t=$((1$maxIter+1)); maxIter=${t#1}
    cd ..
    if [[ 1$maxIter -lt 11000 ]] ; then
        ./myJobShellFile.sh MyJobDataFolder $maxIter
    else
        break
    fi
done
Notes: (1) To test with smaller runs than 1000 submissions, replace 11000 with 10000+n; for example, to do 123 runs, replace it with 10123. (2) In writing the above script, I assumed that not-previously-known numbers of output files appear in the output directory from time to time. If instead exactly one output file appears per run, and you just want to do one run per value for the values 0000, 0001, 0002, ..., 0999, 1000, then use a script like the following. (For testing with a smaller number than 1000, replace 1000 with, e.g., 0020. The leading zeroes in these numbers tell bash to pad the generated numbers with leading zeroes.)
#!/bin/bash
for iter in {0000..1000}; do
    ./myJobShellFile.sh MyJobDataFolder $iter
    waitforjobcompletion ###
done
(3) If the system has a command that sleeps while it waits for a job to complete on the supercomputing resource, it is reasonable to use that command in place of waitforjobcompletion in the above scripts. Otherwise, if the system has a command jobisrunning that returns true if a job is still running, replace waitforjobcompletion with something like the following:
while jobisrunning ; do sleep 15; done
This will run the jobisrunning command; if it returns true, the shell will sleep for 15 seconds and then retest. Here is an example that illustrates waiting for a file to appear and then for it to go away:
while [ ! -f abc ]; do sleep 3; echo no abc; done
while ls abc >/dev/null 2>&1; do sleep 3; echo an abc; done
The second line's test could be [ -f abc ] instead; I showed a longer example to illustrate how to suppress output and error messages by routing them to /dev/null. (4) To reverse the sense of a while statement's test, replace the word while with until. For example, while [ ! -f abc ]; ... is equivalent to until [ -f abc ]; ....

Bash: Slow redirection and filter

I have a bash script that calls a program which generates a humongous amount of output. A lot of this data comes from a Python package that I did not create and whose output I can't really control, nor does it interest me.
I tried to filter the output generated by that external Python package and redirect the "cleaned" output to a log file. If I used regular pipes and grep expressions, I lost many chunks of information. I read that this is something that can actually happen with redirections (1 and 2).
In order to fix that, I made the redirections like this:
#!/bin/bash
regexTxnFilterer="\[txn\.-[[:digit:]]+\]"
regexThreadPoolFilterer="\[paste\.httpserver\.ThreadPool\]"
bin/paster serve --reload --pid-file="/var/run/myServer//server.pid" parts/etc/debug.ini 2>&1 < "/dev/null" | while IFS='' read -r thingy ; do
    if [[ ! "$thingy" =~ $regexTxnFilterer ]] && [[ ! "$thingy" =~ $regexThreadPoolFilterer ]]; then
        echo "$thingy" >> "/var/log/myOutput.log"
    fi
done
Which doesn't lose any information (at least not that I could tell) and filters the strings I don't need (using the two regular expressions above).
The issue is that it has rendered the application (the bin/paster thing I'm executing) unbearably slow. Is there any way to achieve the same effect but with better performance?
Thank you in advance!
Update #2012-04-13: As shellter pointed out in one of the comments to this question, it may be useful to provide examples of the outputs I want to filter. Here's a bunch of them:
2012-04-13 19:30:37,996 DEBUG [txn.-1220917568] new transaction
2012-04-13 19:30:37,997 DEBUG [txn.-1220917568] commit <zope.sqlalchemy.datamanager.SessionDataManager object at 0xbf4062c>
2012-04-13 19:30:37,997 DEBUG [txn.-1220917568] commit
Starting server in PID 18262.
2012-04-13 19:30:38,292 DEBUG [paste.httpserver.ThreadPool] Started new worker -1269716112: Initial worker pool
2012-04-13 19:33:08,158 DEBUG [txn.-1244144784] new transaction
2012-04-13 19:33:08,158 DEBUG [txn.-1244144784] commit
2012-04-13 19:32:06,980 DEBUG [paste.httpserver.ThreadPool] Added task (0 tasks queued)
2012-04-13 19:32:06,980 INFO [paste.httpserver.ThreadPool] kill_hung_threads status: 10 threads (0 working, 10 idle, 0 starting) ave time N/A, max time 0.00sec, killed 0 workers
There are a few more kinds of messages involving the ThreadPool, but I couldn't catch any examples.
For one thing -- you're reopening the log file every time you want to append a line. That's silly.
Instead of this:
while ...; do
    echo "foo" >>filename
done
Do this (which opens the output file on a new, non-stdout file handle, such that you still have a clear line to stdout should you wish to write to it):
exec 4>>filename
while ...; do
    echo "foo" >&4
done
It's also possible to redirect stdout for the whole loop:
while ...; do
    echo "foo"
done >filename
...notably, this will impact more than just the "echo" line, and thus have slightly different semantics from the original.
Or, better yet -- Configure the Python logging module to filter output to only what you care about, and don't bother with shell-script postprocessing at all.
If the version of Paste you're using is sufficiently similar to modern Pyramid, you can put this in your ini file (currently parts/etc/debug.ini):
[logger_paste.httpserver.ThreadPool]
level = INFO
[logger_txn]
level = INFO
...and anything below INFO level (including the DEBUG messages) will be excluded.
It may be faster to use a grep-based solution for this:
#!/bin/bash
regexTxnFilterer="\[txn\.-[[:digit:]]+\]"
regexThreadPoolFilterer="\[paste\.httpserver\.ThreadPool\]"
bin/paster serve --reload --pid-file="/var/run/myServer//server.pid" parts/etc/debug.ini 2>&1 < "/dev/null" | grep -Evf <(echo "$regexTxnFilterer"; echo "$regexThreadPoolFilterer") >> "/var/log/myOutput.log"
Your loop may be slow because the echo "$thingy" >> "/var/log/myOutput.log" line is opening and closing the log file every time it executes. I wouldn't expect a big performance difference between grep's regex matching and bash's, but if there was, it wouldn't surprise me. (Note the -E flag above: the patterns are extended regular expressions, which plain grep would treat differently.)
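One related thing worth checking if lines seem to appear late in the log: GNU grep buffers its output when writing to a file, and its --line-buffered option flushes each line as it is written, at some throughput cost:
bin/paster serve --reload --pid-file="/var/run/myServer//server.pid" parts/etc/debug.ini 2>&1 < "/dev/null" | grep --line-buffered -Evf <(echo "$regexTxnFilterer"; echo "$regexThreadPoolFilterer") >> "/var/log/myOutput.log"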
Late Edit
There's a far simpler way to fix the performance issue caused by opening/closing the output once per line. Why this didn't occur to me before, I have no idea. Just move the >> to outside your loop
#!/bin/bash
regexTxnFilterer="\[txn\.-[[:digit:]]+\]"
regexThreadPoolFilterer="\[paste\.httpserver\.ThreadPool\]"
bin/paster serve --reload --pid-file="/var/run/myServer//server.pid" parts/etc/debug.ini 2>&1 < "/dev/null" | while IFS='' read -r thingy ; do
    if [[ ! "$thingy" =~ $regexTxnFilterer ]] && [[ ! "$thingy" =~ $regexThreadPoolFilterer ]]; then
        echo "$thingy"
    fi
done >> "/var/log/myOutput.log"
I can't see any compelling reason why this would be either faster or slower than the grep solution, but it's a lot closer to the original code and a little less cryptic.

How to parametrize verbosity of debug output (BASH)?

During the process of writing a script, I will use the command's output in varying ways and to different degrees in order to troubleshoot the task at hand. For example, take this snippet, which reads an application's icon resource and reports whether or not it has the typical .icns extension...
icns=$(defaults read "/$application/Contents/Info" CFBundleIconFile)
if ! [[ $icns =~ ^(.*)(\.icns)$ ]]; then
    echo -e "$icns is NOT OK YOU IDIOT! **** You need to add .icns to $icns."
else
    echo -e "$icns \t Homey, it's cool. That shit's got its .icns, proper."
fi
Inevitably, as each bug is squashed and the stdout starts relating more to the actual function than to the debugging process, this feedback usually ends up commented out, silenced, or deleted - for obvious reasons.
However, if one wanted to provide a simple option - either hardcoded or passed as a parameter - to optionally show some, all, or none of "this kind" of message at runtime, what is the best way to provide that functionality? I am looking to basically duplicate the functionality of set -x, but instead of a line-by-line rundown, it would print only the notifications that I had specifically architected.
It seems excessive to replace each and every echo with an if that checks a debug=1|0 flag, yet I've been unable to find a concise explanation of how to implement a getopts/getopt scheme (never can remember which one is the built-in) in my own scripts. This little expression seemed promising, but there is very little documentation on 2>&1 out there (although I'm sure it is key to this puzzle):
[ $DBG ] && DEBUG="" || DEBUG='</dev/null'
check_errs() {
    # Parameter 1 is the return code; parameter 2 is text to display on failure.
    if [ "${1}" -ne "0" ]; then
        echo "ERROR # ${1} : ${2}"
    else
        echo "SUCCESS"
    fi
}
Any concise and reusable tricks of this trade would be welcomed. And if I'm totally missing the boat, or if it was a snake and it's about to bite me, I apologize.
One easy trick is simply to replace your "logging" echo command with a variable, i.e.
TRACE=:
if test "$1" = "-v"; then
    TRACE=echo
    shift
fi
$TRACE "You passed the -v option"
You can have any number of these for different types of messages, if you wish.
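And since you mention getopts (that one is the bash builtin; getopt is the external program), here is a minimal sketch of wiring the same idea to command-line options. The option letters and the log helper are arbitrary names for illustration, not a standard API:
verbose=0
while getopts "vx" opt; do
    case $opt in
        v) verbose=$((verbose + 1)) ;;   # repeatable: -v, -vv, ...
        x) set -x ;;                     # full line-by-line trace
        *) echo "usage: $0 [-v] [-x]" >&2; exit 2 ;;
    esac
done
shift $((OPTIND - 1))

log() {
    # Print the message only when verbosity is at least the level in $1.
    local level=$1; shift
    [ "$verbose" -ge "$level" ] && echo "$@" >&2
}

log 1 "shown with -v or more"
log 2 "shown only with -vv"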
You may also check out a common open-source trace library with support for bash:
http://sourceforge.net/projects/utalm/
https://github.com/ArnoCan/utalm
WKR, Arno-Can Uestuensoez

How to improve this FTP (shell) function?

I have about a ton of scripts using the following function:
# Copies files over using FTP.
# Configuration is set at the beginning of the script.
# @param $1 = FTP host
#        $2 = FTP user
#        $3 = FTP user password
#        $4 = source file name
#        $5 = destination directory
#        $6 = local directory
doftp() {
    log_message_file "INFO" "Starting FTP"
    ftp_hst=$1
    ftp_usr=$2
    ftp_pwd=$3
    sourcefile=$4
    destdir=$5
    locdir=$6
    ftp -nv $ftp_hst << EOF 2> ftp.err.$$
quote USER $ftp_usr
quote PASS $ftp_pwd
cd $destdir
lcd $locdir
bin
put $sourcefile
bye
EOF
    if [ "$(wc ftp.err.$$|cut -d" " -f8)" != 0 ] ; then
        log_message_file "ERROR" "Problem uploading files: $(cat ftp.err.$$)"
    else
        log_message_file "INFO" "FTP finished"
    fi
    rm ftp.err.$$
}
It works and does its job, unless the FTP fails. Fortunately for me, the scripts are quite precise and the FTP almost never fails. But this is one of those rare moments when one has the chance (time) to go back and review code marked in the TODO list. The only problem is that I'm not too sure how to improve it... so I'd take your recommendations on what to change there.
One obvious problem is the error parsing of the ftp output, which is totally lame. But I'll also take thoughts on other parts of the function :)
Worth mentioning that this runs on an AIX server. Oh, and no, I cannot use SFTP :(
Thanks for any input!
ps.: log_message_file is just basic logging... it has no effect on the function.
good documentation
good variable names
good indentation; you might want to read about <<-EOF (the '-' being the item I intend to highlight with this comment) - using that feature allows you to indent the whole here-document (you must use tab characters) to match the indenting of the rest of the script
good use of a tmp filename with $$ (depending on how much this function gets used, you might want to append the parent script's name as part of the tmp name to further disambiguate, but that's low priority)
good use of $(cat ftp.err.$$), i.e. actually showing the error message rather than just something like 'error occurred' (I see this all the time - what error? what was the msg?!)
you could extend the ftp service to use mput, but then you have to understand the vagaries of your particular ftp client AND make a note to yourself that any time the ftp client changes, you'll need to check whether your mput ${fileNames} variable still works as you expect
possibly the one place to think about improvement would be to use a case statement to parse the STDERR output, but again, the added benefit may not be worth the future maintenance cost:
errMsgs="$(cat ftp.err.$$)"
case "${errMsgs}" in
    *warningStrings* ) print "warning found, msg was ${errMsgs}" ;;
    *errorStrings* ) print "error found, msg was ${errMsgs}" ;;
    *fatalStrings* ) print "fatal error found, can't continue, msg was ${errMsgs}" ;;
esac
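One more small improvement along those lines: rather than parsing wc output with cut (whose field positions differ between platforms), the standard -s test checks directly whether anything landed in the error file. A sketch:
# -s is true when the file exists and has a size greater than zero.
if [ -s ftp.err.$$ ]; then
    log_message_file "ERROR" "Problem uploading files: $(cat ftp.err.$$)"
else
    log_message_file "INFO" "FTP finished"
fi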
I hope this helps.
