Make inotifywait group multiple file updates into one? - shell

I have a folder with Sphinx docs that I watch with inotifywait (from inotify-tools). The script re-builds the html & singlehtml and refreshes Chrome.
#!/bin/sh
inotifywait -mr source --exclude _build -e close_write -e create -e delete -e move | while read file event; do
    make html singlehtml
    xdotool search --name Chromium key --window %@ F5
done
This works fine when I save a single file. However, when I hg update to an old revision or paste multiple files into the source folder, it fires the script for every single file.
Is there a simple workaround (without writing custom python scripts -- this I can do) to make it wait a fraction of a second before firing the script?

I made a slightly more complex shell script and posted it in the article:
inotifywait -mr source --exclude _build -e close_write -e create -e delete -e move --format '%w %e %T' --timefmt '%H%M%S' | while read file event tm; do
    current=$(date +'%H%M%S')
    delta=$(expr $current - $tm)
    if [ $delta -lt 2 -a $delta -gt -2 ] ; then
        sleep 1  # let file operations finish
        make html singlehtml
        xdotool search --name Chromium key --window %@ F5
    fi
done
It makes inotifywait log not only the filename and action, but also a timestamp. The script compares that timestamp with the current time and, if the delta is less than 2 seconds, runs make html. Before that it sleeps 1 second to let the file operations finish. For subsequent modified files the logged timestamp will be older, the delta will exceed 2 seconds, and nothing will be done.
I found this approach to be the least CPU-consuming and the most reliable.
I also tried running a simple Python script, but this meant that if I pasted something as big as jQuery UI into the folder, a thousand processes were spawned and then became zombies.

Try this:
last_update=0
inotifywait -mr source --exclude _build -e close_write -e create \
    -e delete -e move --format '%T' --timefmt '%s' |
while read timestamp; do
    if test $timestamp -ge $last_update; then
        sleep 1
        last_update=$(date +%s)
        make html singlehtml
        xdotool search --name Chromium key --window %@ F5
    fi
done
--format '%T' --timefmt '%s' causes a timestamp to be output for each event.
test $timestamp -ge $last_update compares the event timestamp with the timestamp of the last update. Any events that occurred during the sleep are thus skipped.
sleep 1 is added to wait for events to accumulate. A shorter duration might be good here, like sleep 0.5, but it would be less portable.
last_update=$(date +%s) sets a timestamp for the last update to be compared with the next event's timestamp. In this way, any additional events that occur during the sleep 1 are discarded during the next iteration of the loop.
Note, there is a race condition here because strftime() does not support nanoseconds. This example may run make twice if a group of events crosses a second boundary. To instead risk missing events, replace -ge with -gt.
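For completeness, here is the same loop written with that strict comparison; this is only a sketch under the same assumptions as above (a source directory, Makefile targets html/singlehtml), with the browser-refresh line omitted. It trades the possible double build for the chance of missing an event that lands in the same second as the rebuild:
last_update=0
inotifywait -mr source --exclude _build -e close_write -e create \
    -e delete -e move --format '%T' --timefmt '%s' |
while read timestamp; do
    # strictly greater: events stamped in the same second as the last rebuild
    # are ignored instead of triggering a second build
    if test "$timestamp" -gt "$last_update"; then
        sleep 1
        last_update=$(date +%s)
        make html singlehtml
    fi
done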

Related

Using bash, how do you get alerts when a file stops being updated?

I want to be alerted, say by email, if a file does not change for x minutes.
For context, an application is running 24/7 collecting tweets on an unattended system. If the file storing the tweets doesn't change for x minutes, I need to go and check the application hasn't terminated unexpectedly.
Any ideas please? I considered watch but I am new to bash and Linux in general.
Use inotifywait
#!/usr/bin/env sh
MONITOREDFILE=/path/to/monitoredfile
TIMEOUT=600 # 600s or 10 minutes
EMAIL=user@example.com
lastmodified="monitoring started on $(date -R)"
while inotifywait \
--quiet \
--timeout $TIMEOUT \
--event 'MODIFY' \
"$MONITOREDFILE"
do
printf '%s has been modified within the %s second timeout\n' \
    "$MONITOREDFILE" $TIMEOUT
lastmodified=$(date -R)
done
printf '!!! ALERT !!!\nFile %s has not been modified for %s seconds\n' \
    "$MONITOREDFILE" $TIMEOUT >&2
mailx -s "Stalled file $MONITOREDFILE" "$EMAIL" <<ENDOFMAIL
Monitored file $MONITOREDFILE has not been modified since $lastmodified.
ENDOFMAIL
A different approach is to get the file's last modification time with GNU date and keep the loop body empty:
#!/usr/bin/env sh
MONITOREDFILE=/path/to/monitoredfile
TIMEOUT=600 # 600s or 10 minutes
EMAIL=user@example.com
while inotifywait --quiet --timeout $TIMEOUT --event 'MODIFY' "$MONITOREDFILE"
do :;done
lastmodified=$(date --utc --iso-8601=seconds --reference="$MONITOREDFILE")
mailx -s "Stalled file $MONITOREDFILE" "$EMAIL" <<ENDOFMAIL
Monitored file $MONITOREDFILE has not been modified since $lastmodified.
ENDOFMAIL
With, for example, a file named "tweets" in the directory /tmp, we could run a script incorporating find with the -mmin flag; -mmin -5 matches any changes to the file within the last 5 minutes. If the count of files returned by the find command (piped through wc -l) is 0, meaning the file has not changed in that window, we then run the command to send the email.
#!/bin/bash
if [[ "$(find /tmp -name "*tweets" -mmin -5 | wc -l)" != "0" ]]
then
echo "There is an issue" | mailx -s alert someone#someemail.com
fi
This can then be set up to run every 5 minutes off a cron job.
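For instance, assuming the script above is saved as /usr/local/bin/check_tweets.sh and made executable (the path is only an assumption for illustration), the crontab entry could look like:
*/5 * * * * /usr/local/bin/check_tweets.sh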
Well, if you say the service is not that critical, you can just create a cron job that checks the modification time of the file in question and calls your alerting script when some condition is met.
If that is the case, google keywords like "crontab" and "find mmin" and build your cron job.
Otherwise, IMO, a good way could be to use something like Grafana, where you can define how you or your team get notified when some event occurs.
Your program needs to somehow register its status, e.g. as Prometheus metrics.
This way your alerting/monitoring is separated from the server your application runs on, and you can also track the service's full status history.
Consider what happens if you run a cron job or shell script on the server itself to check the file's modification timestamp and alert on some condition: if the server goes down, you won't get an alert, and you will assume the service is running fine.
Again, it all depends on how important your service is.
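As a rough sketch of the "register its status" idea (not part of the answer above; it assumes node_exporter is running with its textfile collector pointed at the directory shown, and the metric name is made up), a small cron-driven script could export the file's last-modification time for Prometheus to scrape:
#!/bin/sh
# Publish the tweets file's mtime (epoch seconds) for node_exporter's textfile collector.
# MONITOREDFILE, OUTDIR and the metric name are assumptions for illustration.
MONITOREDFILE=/path/to/tweets
OUTDIR=/var/lib/node_exporter/textfile_collector
printf 'tweets_file_mtime_seconds %s\n' "$(stat -c %Y "$MONITOREDFILE")" \
    > "$OUTDIR/tweets.prom.$$" && mv "$OUTDIR/tweets.prom.$$" "$OUTDIR/tweets.prom"
A Prometheus alert rule could then fire on something like time() - tweets_file_mtime_seconds > 600.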

Monitor Pre-existing and new files in a directory with bash

I have a script using inotify-tools.
This script notifies when a new file arrives in a folder. It performs some work with the file, and when done it moves the file to another folder. (It looks something along these lines):
inotifywait -m -e modify "${path}" |
while read NEWFILE
do
    work on/with NEWFILE
    move NEWFILE to a new directory
done
By using inotifywait, one can only monitor new files. A similar procedure using for OLDFILE in path instead of inotifywait will work for existing files:
for OLDFILE in ${path}
do
    work on/with OLDFILE
    move OLDFILE to a new directory
done
I tried combining the two loops by first running the second loop. But if files arrive quickly and in large numbers, there is a chance that files will arrive while the second loop is running. These files will then not be captured by either loop.
Given that files already exists in a folder, and that new files will arrive quickly inside the folder, how can one make sure that the script will catch all files?
Once inotifywait is up and waiting, it will print the message Watches established. to standard error. So you need to go through existing files after that point.
So, one approach is to write something that will process standard error, and when it sees that message, lists all the existing files. You can wrap that functionality in a function for convenience:
function list-existing-and-follow-modify() {
    local path="$1"
    inotifywait --monitor \
                --event modify \
                --format %f \
                -- \
                "$path" \
        2> >( while IFS= read -r line ; do
                  printf '%s\n' "$line" >&2
                  if [[ "$line" = 'Watches established.' ]] ; then
                      for file in "$path"/* ; do
                          if [[ -e "$file" ]] ; then
                              basename "$file"
                          fi
                      done
                      break
                  fi
              done
              cat >&2
            )
}
and then write:
list-existing-and-follow-modify "$path" \
    | while IFS= read -r file ; do
          # ... work on/with "$file"
          # move "$file" to a new directory
      done
Notes:
If you're not familiar with the >(...) notation that I used, it's called "process substitution"; see https://www.gnu.org/software/bash/manual/bash.html#Process-Substitution for details.
The above will now have the opposite race condition from your original one: if a file is created shortly after inotifywait starts up, then list-existing-and-follow-modify may list it twice. But you can easily handle that inside your while-loop by using if [[ -e "$file" ]] to make sure the file still exists before you operate on it (see the sketch after these notes).
I'm a bit skeptical that your inotifywait options are really quite what you want; modify, in particular, seems like the wrong event. But I'm sure you can adjust them as needed. The only change I've made above, other than switching to long options for clarity/explicitness and adding -- for robustness, is to add --format %f so that you get the filenames without extraneous details.
There doesn't seem to be any way to tell inotifywait to use a separator other than newlines, so, I just rolled with that. Make sure to avoid filenames that include newlines.
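To tie the notes together, here is a sketch of what the consuming loop might look like with the existence check applied; process and new_dir are placeholders, not part of the original answer:
list-existing-and-follow-modify "$path" \
    | while IFS= read -r file; do
          # the file may already have been moved or deleted; skip it if so
          [[ -e "$path/$file" ]] || continue
          process "$path/$file"           # placeholder for the real work
          mv -- "$path/$file" "$new_dir"/
      done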
By using inotifywait, one can only monitor new files.
I would ask for a definition of a "new file". man inotifywait specifies a list of events, which also includes events like create, delete and delete_self, and inotifywait can also watch "old files" (meaning files that existed prior to inotifywait's execution) as well as directories. You specified only a single event, -e modify, which notifies about modification of files within ${path}; that includes modification of both pre-existing files and files created after inotifywait started.
... how can one make sure that the script will catch all files?
Your script is just enough to catch all the events that happen inside the path. If you have no means of synchronization between the part that generates the files and the part that receives them, there is nothing you can do and there will always be a race condition. What if your script receives 0% of the CPU time and the part that generates the files gets 100%? There is no guarantee of CPU time between processes (unless you are using a certified real-time system...). Implement a synchronization between them.
You can watch some other event. If the generating side closes files when it is done with them, watch for the close event. You could also run work on/with NEWFILE in parallel in the background to speed up execution and the reading of new files. But if the receiving side is slower than the sending side, i.e. your script works on NEWFILEs more slowly than new files are generated, there is nothing you can do...
If you have no special characters or spaces in the filenames, I would go with:
inotifywait -m -e modify "${path}" |
while IFS=' ' read -r path event file; do
    lock "${path}"
    work on "${path}/${file}"
    # e.g. mv "${path}/${file}" "${new_location}"
    unlock "${path}"
done
where lock and unlock are some locking mechanism implemented between your script and the generating part. You can create a communication channel between the process that creates the files and the process that processes them.
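As one possible (assumed, not taken from this answer) implementation of such lock/unlock helpers, flock(1) on an agreed-upon lock file could be used, provided the file-generating side takes the same lock while it writes:
LOCKFILE=/tmp/upload-dir.lock   # assumed path; both sides must use the same file
lock()   { exec 9>"$LOCKFILE" && flock -x 9; }   # block until the lock is ours
unlock() { flock -u 9 && exec 9>&-; }            # release the lock and close the fd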
I think you could use some transactional file system that would let you "lock" a directory from the other scripts until you are done working on it, but I have no experience in that field.
I tried combining the two loops. But if files arrive quickly and in large numbers there is a chance that the files will arrive while the second loop is running.
Run the process_new_files_loop in the background prior to running the process_old_files_loop. Also, it would be nice to make sure (i.e. synchronize) that inotifywait has successfully started before you continue to the processing-existing-files loop, so that there is also no race condition between them.
Maybe a simple example and/or starting point would be:
work() {
    local file="$1"
    some work "$file"
    mv "$file" "$predefined_path"
}

process_new_files_loop() {
    # let's work on modified files in parallel, so that it is faster
    trap 'wait' INT
    inotifywait -m -e modify "${path}" |
    while IFS=' ' read -r path event file; do
        work "${path}/${file}" &
    done
}

process_old_files_loop() {
    # maybe we should process in parallel here too?
    # maybe export -f work; find "${path}" -type f | xargs -P0 -n1 -- bash -c 'work "$1"' -- ?
    find "${path}" -type f |
    while IFS= read -r file; do
        work "${file}"
    done
}

process_new_files_loop &
child=$!
sleep 1
if ! ps -p "$child" >/dev/null 2>&1; then
    echo "ERROR running processing-new-file-loop" >&2
    exit 1
fi
process_old_files_loop
wait # wait for process_new_files_loop
If you really care about execution speed and want to do it faster, switch to Python or C (or to anything but shell). Bash is not fast; it is a shell and should be used to interconnect two processes (passing the stdout of one to the stdin of another), and parsing a stream line by line with while IFS= read -r line is extremely slow in bash and should generally be used as a last resort. Maybe using xargs, like xargs -P0 -n1 sh -c 'work on "$1"; mv "$1" "$path"' --, or parallel would be a means to speed things up, but an average Python or C program will probably be many times faster.
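As a concrete (assumed) form of that xargs idea, applied to the process_old_files_loop above, exporting the bash function lets each spawned shell call it, with -P0 running as many jobs in parallel as GNU xargs allows:
export -f work   # make the bash function visible to the child shells
find "${path}" -type f -print0 |
    xargs -0 -P0 -n1 bash -c 'work "$1"' --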
A simpler solution is to add an ls in front of the inotifywait in a subshell, with awk to create output that looks like inotifywait.
I use this to detect and process existing and new files:
(ls ${path} | awk '{print "'${path}' EXISTS "$1}' && inotifywait -m ${path} -e close_write -e moved_to) |
while read dir action file; do
echo $action $dir $file
# DO MY PROCESSING
done
So it runs the ls, formats the output, and sends it to stdout, then runs inotifywait in the same subshell, also sending its output to stdout for processing.

inotifywait triggering event twice while converting docx to PDF

I have a shell script with inotifywait set up as follows:
inotifywait -r -e close_write,moved_to -m "<path>/upload" --format '%f######%e######%w'
There are some docx files residing in the watched directory, and a script converts docx to PDF via the command below:
soffice --headless --convert-to pdf:writer_pdf_Export <path>/upload/somedoc.docx --outdir <path>/upload/
Somehow the event is triggered twice as soon as the PDF is generated. The entries are as follows:
somedoc.pdf######CLOSE_WRITE,CLOSE######<path>/upload/
somedoc.pdf######CLOSE_WRITE,CLOSE######<path>/upload/
What else is wrong here?
Regards
It's triggered twice because this is how soffice appears to behave internally.
One day it may start writing the file 10 times, with a sleep 2 between such writes, during a single run; our program can't, and I believe shouldn't, anticipate that and depend on it.
So I'd try solving the problem from a different angle: let's just put the converted file into a temporary directory and then move it to the target dir, like this:
soffice --headless --convert-to pdf:writer_pdf_Export <path>/upload/somedoc.docx --outdir <path>/tempdir/ && mv <path>/tempdir/somedoc.pdf <path>/upload/
and use inotifywait in the following way:
inotifywait -r -e moved_to -m "<path>/upload" --format '%f######%e######%w'
The advantage is that you no longer depend on soffice's internal logic.
If you can't adjust the behavior of the script producing the PDF files, then indeed you'll need to resort to a workaround like the one @Tarun suggested.
I don't think you can control the external program as such. But I assume you are piping this output somewhere and feeding it to something else. In that case you can skip an event that repeats within a span of a few seconds.
So we add %T to --format and --timefmt "%s" to get the epoch time. Below is the updated command:
$ inotifywait -r -e close_write,moved_to --timefmt "%s" -m "/home/vagrant" --format '%f######%e######%w##T%T' -q | ./process.sh
test.txt######CLOSE_WRITE,CLOSE######/home/vagrant/
Skipping this event as it happened within 2 seconds. TimeDiff=2
test.txt######CLOSE_WRITE,CLOSE######/home/vagrant/
This was done by running touch test.txt multiple times within a second, and as you can see the second event was skipped. process.sh is a simple bash script:
#!/bin/bash
LAST_EVENT=
LAST_EVENT_TIME=0
while read line
do
    DEL="##T"
    EVENT_TIME=$(echo "$line" | awk -v delimiter="$DEL" '{split($0,a,delimiter)} END{print a[2]}')
    EVENT=$(echo "$line" | awk -v delimiter="$DEL" '{split($0,a,delimiter)} END{print a[1]}')
    TIME_DIFF=$(( $EVENT_TIME - $LAST_EVENT_TIME ))
    if [[ "$EVENT" == "$LAST_EVENT" ]]; then
        if [[ $TIME_DIFF -gt 2 ]]; then
            echo "$EVENT"
        else
            echo "Skipping this event as it happened within 2 seconds. TimeDiff=$TIME_DIFF"
        fi
    else
        echo $EVENT
        LAST_EVENT_TIME=$EVENT_TIME
    fi
    LAST_EVENT=$EVENT
done < "${1:-/dev/stdin}"
In your actual script you would remove the echo in the if branch; it was only there for demo purposes.

Batch processing files in a watched directory tree (inotifywait) with a delay

There's a directory tree which is watched by inotifywait. What I want to do is trigger a script (e.g. one with which I can move the files out of the tree) with a delay (e.g. 10 sec), so the script isn't triggered on every single event but on a "grouped" event.
The batch script, manage_all.sh (which is part of a bigger script that sends an email at the end, etc.), moves files to the corresponding directory of another directory tree:
#!/bin/bash
TEMPDIR="/mnt/foo/temp"
QUEUEDIR="/mnt/foo/queue"
SLOTSLEFTINQUEUE=5
for FILEPATH in $(ls -1tr $(find "$TEMPDIR" -type f -iname \*.txt) | head -n$SLOTSLEFTINQUEUE) ; do
    FILESUBPATH="${FILEPATH#$TEMPDIR/}"
    mv -f "$FILEPATH" "$QUEUEDIR/$FILESUBPATH"
done
This runs from cron now every 5 minutes and works great, but I want to use inotifywait so that I don't have to wait for up to another 5 minutes.
I have tried this, but it's not good, since it triggers the manage_all.sh script with every event:
(echo start; inotifywait -mr -e close_write,moved_to,modify "/mnt/foo/temp") | while read line; do ./manage_all.sh; done
Is it possible (without rewriting the script) to "group events together" so that the script fires only once every 10 seconds?
Thanks
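One pattern that would give this kind of grouping (a sketch, not an answer taken from the thread; it assumes bash, whose read -t provides the timeout, and that manage_all.sh is in the current directory) is to drain events until the stream has been quiet for 10 seconds and only then run the script once:
#!/bin/bash
inotifywait -mr -e close_write,moved_to,modify "/mnt/foo/temp" |
while read -r event; do
    # swallow any further events; each one restarts the 10-second quiet window
    while read -r -t 10 event; do :; done
    ./manage_all.sh
done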

Addressable timers in Bash

I am using inotifywait to run a command when a filesystem event happens. I would like this to wait for 5 seconds to see if another filesystem event happens and if another one does, I would like to reset the timer back to five seconds and wait some more. Make sense?
My problem is I'm attacking this in Bash and I don't know how I would do this. In JavaScript, I'd use setTimeout with some code like this:
function doSomething() { ... }
var timer;
function setTimer() {
window.clearTimeout(timer)
timer = window.setTimeout(doSomething, 5000);
}
// and then I'd just plug setTimer into the inotifywait loop.
But are there addressable, clearable background timers in Bash?
One idea I've had rattling around is forking out a subshell that sleeps and then runs my desired end command, and then stuffing that in the background. If it's run again, it'll pick up the previous PID and try to nuke it.
As a safety feature, after the sleep has finished, the subshell clears $PID to avoid the command being killed mid-execution.
PID=0
while inotifywait -r test/; do
    [[ $PID -gt 0 ]] && kill -9 $PID
    { sleep 5; PID=0; command; } & PID=$!
done
It's a bit messy but I've tested it and it works. If I create new files in ./test/ it sees that and if $PID isn't zero, it'll kill the previous sleeping command and reset the timer.
I provide this answer to illustrate a similar but more complex use case. Note that the code provided by @Oli is included in my answer.
I want to post process a file when it has changed. Specifically I want to invoke dart-sass on a scss file to produce a css file and its map file. Then the css file is compressed.
My problem is that editing/saving the scss source file could be done directly through vim (which uses a backup copy when writing the file) or through SFTP (specifically using macOS Transmit). That means the change could be seen by inotifywait as a pair, CREATE followed by CLOSE_WRITE,CLOSE, or as a single CREATE (due to the RENAME command used by SFTP, I think). So I have to launch the processing if I see a CLOSE_WRITE,CLOSE, or a CREATE which is not followed by anything.
Remarks:
It has to handle multiple concurrent edit/save.
The temporary files used by Transmit of the form <filename>_safe_save_<digits>.scss must not be taken into account.
The version of inotify-tools is 3.20.2.2 and has been compiled from the source (no package manager) to get a recent version with the include option.
#!/usr/bin/bash

declare -A pids

# $1: full path to source file (src_file_full)
# $2: full path to target file (dst_file_full)
function launch_dart() {
    echo "dart"
    /opt/dart-sass/sass "$1" "$2" && /usr/bin/gzip -9 -f -k "$2"
}

inotifywait -e close_write,create --include "\.scss$" -mr assets/css |
grep -v -P '(?:\w+)_safe_save_(?:\d+)\.scss$' --line-buffered |
while read dir action file; do
    src_file_full="$dir$file"
    dst_dir="${dir%assets/css/}"
    dst_file="${file%.scss}.css"
    dst_file_full="priv/static/css/${dst_dir%/}${dst_file}"
    echo "'$action' on file '$file' in directory '$dir' ('$src_file_full')"
    echo "dst_dir='$dst_dir', dst_file='$dst_file', dst_file_full='$dst_file_full'"

    # if [ "$action" == "DELETE" ]; then
    #     rm -f "$dst_file_full" "${dst_file_full}.gz" "${dst_file_full}.map"
    if [ "$action" == "CREATE" ]; then
        echo "create. file size: " $(stat -c%s "$src_file_full")
        { sleep 1; pids[$src_file_full]=0; launch_dart "$src_file_full" "$dst_file_full"; } & pids[$src_file_full]=$!
    elif [ "$action" == "CLOSE_WRITE,CLOSE" ]; then
        [[ ${pids[$src_file_full]} -gt 0 ]] && kill -9 ${pids[$src_file_full]}
        launch_dart "$src_file_full" "$dst_file_full"
    fi
done
