Watching folder for changes - bash efficiency

I've got this:
#!/bin/bash
while :
do
    SUM=$(tree | md5sum)
    if [ "$SUMOLD" != "$SUM" ]; then
        # do something here
        SUMOLD=$SUM
        sleep 1
    fi
done
Which works just fine. But the problem is that it consumes 50% of the CPU, a Core2 Duo T8300. Why is that? How can I improve its efficiency?

This is a job for inotifywait. inotify is Linux's event-based system for monitoring files and directories for changes. Goodbye polling loops!
NAME
inotifywait - wait for changes to files using inotify
SYNOPSIS
inotifywait [-hcmrq] [-e <event> ] [-t <seconds> ] [--format <fmt> ]
[--timefmt <fmt> ] <file> [ ... ]
DESCRIPTION
inotifywait efficiently waits for changes to files using Linux's inotify(7) interface. It is suitable for waiting for changes to
files from shell scripts. It can either exit once an event occurs, or continually execute and output events as they occur.
Here's how you could write a simple loop which detects whenever files are added, modified, or deleted from a directory:
inotifywait -mq /dir | while read event; do
    echo "something happened in /dir: $event"
done
Take a look at the man page for more options. If you only care about modifications and want to ignore files simply being read, you could use -e to limit the types of events.
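For example, here is a hedged sketch of a loop that only reacts to completed writes, new files, and deletions (the directory name is a placeholder; inotifywait's default output line is the watched path, the event names, and the file name):
# only react to completed writes, new files, and deletions in /dir
inotifywait -mq -e close_write -e create -e delete /dir |
while read -r path events file; do
    echo "$events on $path$file"
done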

While an OS-specific solution like inotify would be better, you can dramatically improve your script by moving the sleep out of the if statement:
#!/bin/bash
while :
do
    SUM=$(tree | md5sum)
    if [ "$SUMOLD" != "$SUM" ]; then
        # do something here
        SUMOLD=$SUM
        # move sleep from here
    fi
    sleep 1 # to here
done
CPU usage should drop dramatically now that the script checks once per second, instead of as fast as it can whenever there are no changes. You can also replace tree with find, and sleep for longer between each check.
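As a rough sketch of that suggestion (this assumes GNU find for -printf, and the five-second interval is arbitrary):
#!/bin/bash
# checksum the file names, sizes and mtimes instead of the full tree output
while :
do
    SUM=$(find . -printf '%p %s %T@\n' | md5sum)
    if [ "$SUMOLD" != "$SUM" ]; then
        # do something here
        SUMOLD=$SUM
    fi
    sleep 5
done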

The reason is that the loop only sleeps when a change was detected; the rest of the time the script recomputes the checksum as fast as it can. My recommendation is to launch your script using the "inotifywait" or the "watch" command (an alternative solution) and avoid the hand-rolled while loop.
See: http://linux.die.net/man/1/watch
An example taken from the man page:
To watch the contents of a directory change, you could use
watch -d ls -l
watch launches your command periodically, so your script does not have to run continuously.
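For example, something along these lines would rerun a (hypothetical) check script every five seconds without a hand-written loop:
# run the hypothetical check script every 5 seconds; watch handles the looping
watch -n 5 ./check_folder.sh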

Related

whether a shell script can be executed if another instance of the same script is already running

I have a shell script which usually takes nearly 10 minutes for a single run, but I need to know: if another request to run the script arrives while an instance is already running, does the new request have to wait for the existing instance to complete, or will a new instance be started?
I need a new instance to be started whenever a request arrives for the same script.
How do I do that?
The shell script is a polling script which looks for a file in a directory and executes it. The execution takes nearly 10 minutes or more, but if a new file arrives during execution, it also has to be processed simultaneously.
The shell script is below; how do I modify it to handle multiple requests?
#!/bin/bash
while [ 1 ]; do
    newfiles=`find /afs/rch/usr8/fsptools/WWW/cgi-bin/upload/ -newer /afs/rch/usr$
    touch /afs/rch/usr8/fsptools/WWW/cgi-bin/upload/.my_marker
    if [ -n "$newfiles" ]; then
        echo "found files $newfiles"
        name2=`ls /afs/rch/usr8/fsptools/WWW/cgi-bin/upload/ -Art |tail -n 2 |head $
        echo " $name2 "
        mkdir -p -m 0755 /afs/rch/usr8/fsptools/WWW/dumpspace/$name2
        name1="/afs/rch/usr8/fsptools/WWW/dumpspace/fipsdumputils/fipsdumputil -e -$
        $name1
        touch /afs/rch/usr8/fsptools/WWW/dumpspace/tempfiles/$name2
    fi
    sleep 5
done
When writing scripts like the one you describe, I take one of two approaches.
First, you can use a pid file to indicate that a second copy should not run. For example:
#!/bin/sh
pidfile=/var/run/${0##*/}.pid
# remove pid if we exit normally or are terminated
trap "rm -f $pidfile" 0 1 3 15
# Write the pid as a symlink
if ! ln -s "pid=$$" "$pidfile"; then
    echo "Already running. Exiting." >&2
    exit 0
fi
# Do your stuff
I like using symlinks to store pid because writing a symlink is an atomic operation; two processes can't conflict with each other. You don't even need to check for the existence of the pid symlink, because a failure of ln clearly indicates that a pid cannot be set. That's either a permission or path problem, or it's due to the symlink already being there.
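As a small, purely illustrative follow-up (not part of the original answer), the pid stored in the symlink target can be read back later, e.g. to report who holds the lock:
# illustrative only: report which process holds the lock (readlink shows the symlink target)
if [ -L "$pidfile" ]; then
    echo "Lock held: $(readlink "$pidfile")" >&2    # prints e.g. pid=12345
fi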
Second option is to make it possible .. nay, preferable .. not to block additional instances, and instead configure whatever it is that this script does to permit multiple servers to run at the same time on different queue entries. "Single-queue-single-server" is never as good as "single-queue-multi-server". Since you haven't included code in your question, I have no way to know whether this approach would be useful for you, but here's some explanatory meta bash:
#!/usr/bin/env bash
workdir=/var/tmp                  # Set a better $workdir than this.
a=( $(get_list_of_queue_ids) )    # A command? A function? Up to you.
for qid in "${a[@]}"; do
    # Set a "lock" for this item .. or don't, and move on.
    if ! ln -s "pid=$$" $workdir/$qid.working; then
        continue
    fi
    # Do your stuff with just this $qid.
    ...
    # And finally, clean up after ourselves
    remove_qid_from_queue $qid
    rm $workdir/$qid.working
done
The effect of this is to transfer the idea of "one at a time" from the handler to the data. If you have a multi-CPU system, you probably have enough capacity to handle multiple queue entries at the same time.
ghoti's answer shows some helpful techniques, if modifying the script is an option.
Generally speaking, for an existing script:
Unless you know with certainty that:
the script has no side effects other than to output to the terminal or to write to files with shell-instance specific names (such as incorporating $$, the current shell's PID, into filenames) or some other instance-specific location,
OR that the script was explicitly designed for parallel execution,
I would assume that you cannot safely run multiple copies of the script simultaneously.
It is not reasonable to expect the average shell script to be designed for concurrent use.
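As a tiny illustration of the instance-specific-names case mentioned above (the names here are hypothetical):
# each concurrent run writes to its own file, because $$ is that shell's PID
outfile="/tmp/myscript.$$.out"
some_command > "$outfile"    # some_command is a placeholder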
From the viewpoint of the operating system, several processes may of course execute the same program in parallel. No need to worry about this.
However, it is conceivable that a (careless) programmer wrote the program in such a way that it produces incorrect results when two copies are executed in parallel.

Using inotifywait (or alternative) to wait for rsync transfer to complete before running script?

I would like to set up inotifywait so that it monitors a folder, and when something is copied to this folder (by lsyncd, which uses rsync), inotifywait sits tight and waits until rsync is done before calling a script to process the new folder.
I have been researching online to see if someone is doing this but I am not finding much.
I'm not the most well versed with bash scripting though I understand some basics.
Here is a little script I found that pauses for a second but it still triggers a dozen events per transfer:
EVENTS="CLOSE_WRITE,MOVED_TO"

if [ -z "$1" ]; then
    echo "Usage: $0 cmd ..."
    exit 1
fi

inotifywait -e "$EVENTS" -m -r --format '%:e %f' . | (
    WAITING=""
    while true; do
        LINE=""
        read -t 1 LINE
        if test -z "$LINE"; then
            # no event for a whole second: if changes were pending, report one "CHANGE"
            if test ! -z "$WAITING"; then
                echo "CHANGE"
                WAITING=""
            fi
        else
            WAITING=1
        fi
    done) | (
    while true; do
        read TMP
        echo "$@"
        "$@"
    done
)
I would be happy to provide more details or information.
Thank you.
Depending on what action you want to take, you may want to take a look at the tools provided by Watchman.
There are two that might be most useful to you:
If you want to initiate some action after the files are synced up, you may want to try using watchman-make. This is most appropriate if the action is to run a tool like make where the tool itself will look over the tree and produce its output (in other words: where you don't need to pass the precise list of changed files directly to your tool). You can have it run some other tool instead of make. There is a --settle option that you can use to have it wait a few moments after the latest file change notification before executing your tool.
watchman-make --make='process-folder.sh' -p '**/*.*'
watchman-wait is more closely related to inotifywait. It also waits for changes to settle before reporting files as changed, but because this tool doesn't coalesce multiple different file changes into a single event, the settle period is configured as a property of the tree being watched rather than as a command line parameter.
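A minimal invocation could be as simple as the following; this is only a sketch, so check watchman-wait --help for the exact flags and defaults of your version:
# print the names of files under /path/to/dir that change, once the tree has settled
watchman-wait /path/to/dir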
Disclaimer: I'm the creator of Watchman

How to get a full copy of a dump file while it is still being created?

Every hour the main machine takes a minute to produce a dump file of about 100MB.
A backup machine copies that, using scp, also hourly.
Both actions are triggered by cron to start at the same minutes past the hour.
The copy often contains only part of the dump file.
Although I could change the cron on the backup machine to happen 5 minutes later, that rather smells of cheating.
What is the correct way around this problem?
Leverage fuser to determine if the file is in-use; something like:
#!/bin/sh
MYFILE=myfile    #...change as necessary
while true
do
    if [ -z "$(/sbin/fuser ${MYFILE} 2> /dev/null)" ]; then
        break
    fi
    echo "...waiting..."
    sleep 10    #...adjust as necessary
done
echo "Process the file..."
Assuming you can modify the source code of the programs, you can have the "dumper" output a second file indicating that it is done dumping.
1. Dumper deletes signal file
2. Dumper dumps file
3. Dumper creates signal file
4. Backup program polls periodically until the signal file is created
5. Backup program deletes signal file
6. Backup program has complete file
Not the best form of IPC, but as it's on another machine... I suppose you could also open a network connection to do the polling.
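A rough sketch of both sides, with hypothetical paths and host names (produce_dump stands in for whatever creates the dump):
# dumper side (main machine): steps 1-3
rm -f /var/dumps/hourly.done             # 1. delete the signal file
produce_dump > /var/dumps/hourly.dump    # 2. write the dump (produce_dump is a placeholder)
touch /var/dumps/hourly.done             # 3. create the signal file

# backup side: poll over ssh until the signal file appears (step 4)
until ssh main test -e /var/dumps/hourly.done; do
    sleep 30
done
scp main:/var/dumps/hourly.dump /backups/    # copy the complete file (step 6)
ssh main rm /var/dumps/hourly.done           # remove the signal file (step 5)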
Very quick and very dirty:
my_file=/my/filename
# wait while the file exists AND was modified within the last minute
while [[ -n "$(find "$my_file")" && -n "$(find "$my_file" -mmin -1)" ]]; do
    echo "File '$my_file' exists, but has been recently modified. Waiting..."
    sleep 10
done
scp ... # if the file does not exist at all, we don't wait for it, but let scp fail
Try to copy the file from the main node to the backup node in one script:
1. dumpfile
2. scp
Then it's guaranteed that the file is finished before the scp call.
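In other words, something as simple as the following (with hypothetical paths), run from a single cron entry on the main machine:
#!/bin/sh
# produce the dump, then copy it only after the dump command has finished successfully
/usr/local/bin/make_dump > /var/dumps/hourly.dump && \
    scp /var/dumps/hourly.dump backup:/var/dumps/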

Shell script that continuously checks a text file for log data and then runs a program

I have a java program that stops often due to errors, which are logged in a .log file. What would be a simple shell script to detect a particular text in the last/latest line, say
[INFO] Stream closed
and then run the following command
java -jar xyz.jar
This should keep happening forever (checking every two minutes or so), because xyz.jar writes the log file.
The text "Stream closed" can appear many times in the log file; I only want to take action when it appears in the last line.
How about
while true
do
    sleep 120
    # -F matches the text literally (the [INFO] brackets would otherwise be a regex class);
    # restart only when the last line contains the marker
    if tail -1 logfile | grep -qF "[INFO] Stream Closed"; then
        java -jar xyz.jar &
    fi
done
There may be a condition where the tailed last log line "Stream Closed" is not the real last line and the process is still logging messages. We can avoid this by also checking whether the process is alive. If the process has exited and the last log line is "Stream Closed", then we need to restart the application.
#!/bin/bash
java -jar xyz.jar &
PID=$!
while true
do
    # $! above is the PID of the java process started in the background;
    # while the marker is in the last line and the process is still alive, just wait
    tail -1 logfile | grep -qF "Stream Closed" && kill -0 $PID 2>/dev/null && sleep 20 && continue
    java -jar xyz.jar &
    PID=$!
done
I would prefer checking whether the corresponding process is still running and restarting the program on that event. There might be other errors that cause the process to stop. You can use a cron job to perform such a check periodically (every minute, for example).
Also, you might want to improve your java code so that it does not crash that often (if you have access to the code).
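As a hedged sketch of that idea (the script path and the pgrep pattern are placeholders, not taken from the question):
#!/bin/bash
# /usr/local/bin/check_xyz.sh -- restart xyz.jar if it is not running
if ! pgrep -f 'java -jar xyz.jar' > /dev/null; then
    java -jar xyz.jar &
fi
A crontab entry such as * * * * * /usr/local/bin/check_xyz.sh would then run the check every minute.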
I solved this using a watchdog script that checks directly (with grep) whether the program(s) are running. By calling the watchdog every minute (from cron under Ubuntu), I basically guarantee (programs and environment are VERY stable) that no program will stay offline for more than 59 seconds.
This script checks a list of programs, using the names in an array, to see whether each one is running, and starts it if not.
#!/bin/bash
#
# watchdog
#
# Run as a cron job to keep an eye on what_to_monitor which should always
# be running. Restart what_to_monitor and send notification as needed.
#
# This needs to be run as root or a user that can start system services.
#
# Revisions: 0.1 (20100506), 0.2 (20100507)

# first prog to check
NAME[0]=soc_gt2
# 2nd
NAME[1]=soc_gt0
# 3rd, etc etc
NAME[2]=soc_gp00
# START=/usr/sbin/$NAME
NOTIFY=you@gmail.com
NOTIFYCC=you2@mail.com
GREP=/bin/grep
PS=/bin/ps
NOP=/bin/true
DATE=/bin/date
MAIL=/bin/mail
RM=/bin/rm

for nameTemp in "${NAME[@]}"; do
    $PS -ef | $GREP -v grep | $GREP $nameTemp >/dev/null 2>&1
    case "$?" in
        0)
            # It is running in this case so we do nothing.
            echo "$nameTemp is RUNNING OK. Relax."
            $NOP
            ;;
        1)
            echo "$nameTemp is NOT RUNNING. Starting $nameTemp and sending notices."
            START=/usr/sbin/$nameTemp
            $START 2>&1 >/dev/null &
            NOTICE=/tmp/watchdog.txt
            echo "$nameTemp was not running and was started on `$DATE`" > $NOTICE
            # $MAIL -n -s "watchdog notice" -c $NOTIFYCC $NOTIFY < $NOTICE
            $RM -f $NOTICE
            ;;
    esac
done
exit
I do not use log verification, though you could easily incorporate that into your own version (just replace the grep with a log check, for example).
If you run it from the command line (or PuTTY, if you are remotely connected), you will see what was working and what wasn't. I have been using it for months now without a hiccup. Just call it whenever you want to see what's working (regardless of it running under cron).
You could also place all your critical programs in one folder, do a directory listing and check whether every file in that folder has a program running under the same name. Or read a text file line by line, with every line corresponding to a program that is supposed to be running. Etc., etc.
A good way is to use the awk command:
tail -f somelog.log | awk '/\[INFO\] Stream Closed/ { system("java -jar xyz.jar") }'
This continually monitors the log stream, and when the regular expression matches, it fires off whatever system command you have set, which can be anything you would type into a shell.
If you really wanna be good you can put that line into a .sh file and run that .sh file from a process monitoring daemon like upstart to ensure that it never dies.
Nice and clean =D

Bash script to show updating information in terminal

I'm trying to write a bash script that displays the output from a python script. I want the output refreshed every second, so my script looks like this (run.sh):
#!/bin/bash
export INTERVAL=1
export SCRIPT="something.py"
while [ true ]
do
    clear
    python ${SCRIPT}
    sleep ${INTERVAL}
done
The screen, however, flickers while the python script works (there's some web access involved). How can I make this more sophisticated and wait for the script to finish before clearing what I used to have?
Thanks in advance!
Use watch. It will only update the screen when the entire script is done, and it'll take care of things like clearing the screen, and dealing with output that is larger than a single screen.
watch -n ${INTERVAL} 'python ${SCRIPT}'
If you want to see an example of how watch works with long-running tasks, do this:
watch 'date; echo; echo Long running task...; sleep 3; echo; date'
A quick way is to establish a temporary file:
tmpf=$(mktemp)
while [ condition ]
do
    python ${SCRIPT} > $tmpf
    clear
    cat $tmpf
    sleep ${INTERVAL}
done
rm $tmpf
This requires you to do some cleanup on exit, though. Other than that I would suggest moving the whole loop into python because really, why not? You can use subprocess to fork out another shell and even get a more generic program.
Supplement:
You can make do with the trap builtin (here's an article on it) to do the cleanup automatically when you kill your script.
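A minimal sketch of that idea, applied to the temporary file above:
tmpf=$(mktemp)
trap 'rm -f "$tmpf"' EXIT    # delete the temp file when the script exits, however it exits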
It's an ugly hack, but it doesn't use temporary files or FIFOs:
a=$(clear; python ${SCRIPT})
echo "$a"
(Quoting $a preserves the line breaks in the captured output.)
but seriously: the best way is to incorporate the screen clearing in your script. Give it a switch -clear or something like it.