monitoring and searching a file with inotify, and command line tools - bash

Log files are written line by line by underwater drones to a server. When at the surface, the drones speak slowly to the server (say ~200 o/s over an unstable phone line) and only from time to time (say every ~6 h). Depending on the messages, I have to execute some commands on the server while the drones are online, and other commands when they hang up. Other processes may be looking at the same files with similar tasks.
A lot can be found on this website about somewhat similar problems, but the solution I have built on is still unsatisfactory. Presently I'm doing this with bash:
while logfile_drone=`inotifywait -e create --format '%f' log_directory`; do
logfile=log_directory/${logfile_drone}
while action=`inotifywait -q -t 120 -e modify -e close --format '%e' ${logfile}` ; do
exitCode=$?
lastLine=$( tail -n2 ${logfile} | head -n1 ) # because with tail -n1 I may get only part of the line; this happens quite often
match=$( # match set to true if lastLine matches some pattern )
if [[ $action == 'MODIFY' ]] && $match ; then # do something ; fi
if [[ $( echo $action | cut -c1-5 ) == 'CLOSE' ]] ; then
# do something
break
fi
if [[ $exitCode -eq 2 ]] ; then break ; fi
done
# do something after the drone has hung up
done # wait for a new call from the same or another drone
The main problems are:
the second inotifywait misses lines, maybe because of the other processes looking at the same file.
the way I catch the timeout doesn't seem to work.
I can't monitor two drones simultaneously.
Basically the code works more or less, but it isn't very robust. I wonder whether problem 3 could be managed by putting the second while loop in a function that is run in the background when called. Finally, I wonder whether a higher-level language (I'm familiar with PHP, which has a PECL extension for inotify) would not do this much better. However, I imagine that PHP would not solve problem 3 any better than bash.
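As a rough illustration of that backgrounding idea for problem 3, a minimal sketch (the handle_drone name and its body are only illustrative, not a tested implementation):
handle_drone() {
    local logfile=$1
    while action=$(inotifywait -q -t 120 -e modify -e close --format '%e' "$logfile"); do
        # ... per-drone processing, as in the inner loop above ...
        [[ $action == CLOSE* ]] && break
    done
    # ... do something after this drone has hung up ...
}

while logfile_drone=$(inotifywait -e create --format '%f' log_directory); do
    handle_drone "log_directory/${logfile_drone}" &   # one background handler per drone
done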
Here is the code where I'm facing the problem of the abrupt exit from the while loop, implemented according to Philippe's answer, which otherwise works fine:
while read -r action ; do
...
resume=$( grep -e 'RESUMING MISSION' <<< $lastLine )
if [ -n "$resume" ] ; then
ssh user@another_server "/usr/bin/php /path_to_matlab_command/matlabCmd.php --drone=${vehicle}" &
fi
if [ $( echo $action | cut -c1-5 ) == 'CLOSE' ] ; then ... ; sigKill=true ; fi
...
if $sigKill ; then break; fi
done < <(inotifywait -q -m -e modify -e close_write --format '%e' ${logFile})
When I comment out the line with ssh, the script exits properly via the break triggered by CLOSE; otherwise the while loop finishes abruptly after the ssh command. The ssh is put in the background because the MATLAB code runs for a long time.
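(A side note: an abrupt exit like this is often caused by the backgrounded ssh inheriting the loop's standard input, i.e. the inotifywait stream, and consuming it. A hedged fix is to detach ssh from stdin:)
# Keep the backgrounded ssh from swallowing the inotifywait output that
# feeds the surrounding while-read loop (sketch, not tested here):
ssh -n user@another_server "/usr/bin/php /path_to_matlab_command/matlabCmd.php --drone=${vehicle}" &
# equivalently: redirect its stdin, e.g.  ssh ... < /dev/null &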

The monitor mode (-m) of inotifywait may serve better here:
inotifywait -m -q -e create -e modify -e close log_directory |\
while read -r dir action file; do
...
done
Monitor mode (-m) does not buffer; it just prints all events to standard output.
To preserve variables set inside the loop (the pipeline above runs the while loop in a subshell), use process substitution instead:
while read -r dir action file; do
echo $dir $action $file
done < <(inotifywait -m -q -e create -e modify -e close log_directory)
echo "End of script"

Related

Speed up shell script/Performance enhancement of shell script

Is there a way to speed up the below shell script? It's taking a good 40 minutes to update about 150,000 files every day. Sure, given the volume of files to create and update, this may be acceptable; I don't deny that. However, if there is a much more efficient way to write this, or a way to rewrite the logic entirely, I'm open to it. I'm looking for some help.
#!/bin/bash
DATA_FILE_SOURCE="<path_to_source_data/${1}"
DATA_FILE_DEST="<path_to_dest>"
for fname in $(ls -1 "${DATA_FILE_SOURCE}")
do
for line in $(cat "${DATA_FILE_SOURCE}"/"${fname}")
do
FILE_TO_WRITE_TO=$(echo "${line}" | awk -F',' '{print $1"."$2".daily.csv"}')
CONTENT_TO_WRITE=$(echo "${line}" | cut -d, -f3-)
if [[ ! -f "${DATA_FILE_DEST}"/"${FILE_TO_WRITE_TO}" ]]
then
echo "${CONTENT_TO_WRITE}" >> "${DATA_FILE_DEST}"/"${FILE_TO_WRITE_TO}"
else
if ! grep -Fxq "${CONTENT_TO_WRITE}" "${DATA_FILE_DEST}"/"${FILE_TO_WRITE_TO}"
then
sed -i "/${1}/d" "${DATA_FILE_DEST}"/"${FILE_TO_WRITE_TO}"
"${DATA_FILE_DEST}"/"${FILE_TO_WRITE_TO}"
echo "${CONTENT_TO_WRITE}" >> "${DATA_FILE_DEST}"/"${FILE_TO_WRITE_TO}"
fi
fi
done
done
There are still parts of your published script that are unclear, like the sed command. Nevertheless, I rewrote it with saner practices and far fewer external calls, which should really speed it up.
#!/usr/bin/env sh
DATA_FILE_SOURCE="<path_to_source_data/$1"
DATA_FILE_DEST="<path_to_dest>"
for fname in "$DATA_FILE_SOURCE/"*; do
while IFS=, read -r a b content || [ "$a" ]; do
destfile="$DATA_FILE_DEST/$a.$b.daily.csv"
if grep -Fxq "$content" "$destfile"; then
sed -i "/$1/d" "$destfile"
fi
printf '%s\n' "$content" >>"$destfile"
done < "$fname"
done
Make it parallel (as much as you can).
#!/bin/bash
set -e -o pipefail
declare -ir MAX_PARALLELISM=20 # pick a limit
declare -i pid
declare -a pids
# ...
for fname in "${DATA_FILE_SOURCE}/"*; do
if ((${#pids[@]} >= MAX_PARALLELISM)); then
wait -p pid -n || echo "${pids[pid]} failed with ${?}" 1>&2
unset 'pids[pid]'
fi
while IFS= read -r line; do
FILE_TO_WRITE_TO="..."
# ...
done < "${fname}" & # forking here
pids[$!]="${fname}"
done
for pid in "${!pids[#]}"; do
wait -n "$((pid))" || echo "${pids[pid]} failed with ${?}" 1>&2
done
Here’s a directly runnable skeleton showing how the harness above works (with 36 items to process and 20 parallel processes at most):
#!/bin/bash
set -e -o pipefail
declare -ir MAX_PARALLELISM=20 # pick a limit
declare -i pid
declare -a pids
do_something_and_maybe_fail() {
sleep $((RANDOM % 10))
return $((RANDOM % 2 * 5))
}
for fname in some_name_{a..f}{0..5}.txt; do # 36 items
if ((${#pids[@]} >= MAX_PARALLELISM)); then
wait -p pid -n || echo "${pids[pid]} failed with ${?}" 1>&2
unset 'pids[pid]'
fi
do_something_and_maybe_fail & # forking here
pids[$!]="${fname}"
echo "${#pids[#]} running" 1>&2
done
for pid in "${!pids[#]}"; do
wait -n "$((pid))" || echo "${pids[pid]} failed with ${?}" 1>&2
done
Strictly avoid external processes (such as awk, grep and cut) when processing one-liners for each line. fork()ing is extremely inefficient in comparison to:
Running one single awk / grep / cut process on an entire input file (to preprocess all lines at once for easier processing in bash) and feeding the whole output into (e.g.) a bash loop.
Using Bash expansions instead, where feasible, e.g. "${line/,/.}" and other tricks from the EXPANSION section of the man bash page, without fork()ing any further processes.
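For instance, splitting the comma-separated fields of each line without fork()ing could look roughly like this (a sketch assuming the same field layout as above: the first two fields form the file name, the rest is the content):
while IFS= read -r line; do
    IFS=, read -r f1 f2 rest <<< "$line"      # split on commas with read, no awk/cut
    FILE_TO_WRITE_TO="$f1.$f2.daily.csv"
    CONTENT_TO_WRITE=$rest                    # fields 3..N, inner commas preserved
done < "$fname"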
Off-topic side notes:
ls -1 is unnecessary. First, ls won’t write multiple columns unless the output is a terminal, so a plain ls would do. Second, bash expansions are usually a cleaner and more efficient choice. (You can use nullglob to correctly handle empty directories / “no match” cases.)
Looping over the output from cat is a (less common) useless use of cat case. Feed the file into a loop in bash instead and read it line by line. (This also gives you more line format flexibility.)
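Both notes combined, a minimal bash sketch of the outer loops might be:
shopt -s nullglob                             # an empty directory then expands to nothing
for fname in "${DATA_FILE_SOURCE}"/*; do
    while IFS= read -r line; do
        printf '%s\n' "$line"                 # per-line processing goes here
    done < "$fname"
done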

Display output of command in terminal while using command substitution

So I'm trying to check for the output of a command, but I also want to be able display the output directly in the terminal.
#!/bin/bash
while :
do
OUT=$(streamlink -o "$NAME" "$STREAM" best)
echo "$OUT"
if [[ $OUT == *"No playable streams"* ]]; then
echo "Delaying!"
sleep 15s
fi
done
This is what I tried to do.
The code checks whether the output of a command contains that error substring; if so, it adds a delay. That part works well.
But it doesn't work well when the command is actually successfully downloading a file, as it won't perform that echo until it has finished the download (which can take hours). So until then I have no way of personally checking the output of the command.
Plus, the output of this particular command displays and updates the speed and file size in real time, something echo wouldn't be able to replicate.
So is there a way to display the output of a command in real time, while also command-substituting it in order to check the output for substrings after the command has finished?
Use a temporary file:
TEMP=$(mktemp) || exit 1
while true
do
streamlink -o "$NAME" "$STREAM" best |& tee "$TEMP"
OUT=$( cat "$TEMP" )
#echo "$OUT" # not longer needed
if [[ $OUT == *"No playable streams"* ]]; then
echo "Delaying!"
sleep 15s
fi
done
# not really needed here because of endless loop
rm -f "$TEMP"

Creating a for loop in a trap doesn't work in Shell script

I have been trying to create a trap in a script to basically create some logs of a script that has been running in the background.
Whenever I introduce a for loop in the trap, the script stops doing what it is supposed to do:
trap 'terminate' 10
...
write_log(){
local target=$1
local file="/tmp/"$target"_log.txt"
local lines=$(cat /tmp/"$target"_log.txt | wc -l)
printf "Log for $target\n" >> "log.txt" # This line is printed
for ((i=1;i<=$lines;i++)); # Nothing in this loop happens
do
local start_date=$(date -d "$(sed -n ""$i"p") $file | cut -f1")
local end_date=$(date -d "$sed -n ""$i"p") $file | cut -f2")
printf "Logged in $start_date, logged out $end_date" > "log.txt"
done
}
terminate(){
for target
do
echo "In the for loop!"
end_session "$target"
write_log "$target"
done
exit 0
}
When I run my script in the background and kill it with
kill -10 (process_id)
the script stops, and starts doing the cleanup, until the point where it finds a for loop. When I remove the for loop in terminate() and instead do individual calls to end_session() and write_log(), end_session() works just fine, and write_log() works fine--until it reaches the for loop.
I am probably missing something basic, but I have looked at this for a while now and can't seem to figure out what is happening. Is there any limitation to for loops in traps?
No arguments are passed to terminate when it is invoked by the trap, so its loop executes zero times (because for target; do …; done is a shorthand for for target in "$@"; do …; done, and in a function, "$@" is the list of arguments to the function, not to the shell script as a whole).
If that's not what you want to have happen, you have to arrange to pass the relevant arguments to terminate in the trap. For example, you could pass all the arguments to the script via a global array:
args=( "$#" )
and inside terminate:
for target in "${args[#]}"
However, what's best depends on what you want to achieve.
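Put together, a minimal self-contained sketch of that approach (the signal number and the sleep loop are only for illustration):
#!/bin/bash
args=( "$@" )                        # save the script's arguments in a global array

terminate(){
    for target in "${args[@]}"; do
        echo "cleaning up $target"   # end_session/write_log calls would go here
    done
    exit 0
}

trap 'terminate' 10                  # signal 10 is SIGUSR1 on Linux x86

while true; do sleep 1; done         # the script's normal work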
The function is hanging because the parentheses are messed up in the date commands. Try this:
local start_date=$(date -d "$(sed -n ${i}p "$file" | cut -f1)")
local end_date=$(date -d "$(sed -n ${i}p "$file" | cut -f2)")

How to wait till a particular line appears in a file

Is it possible to write a script that does not proceed till a given line appears in a particular file?
For example I want to do something like this:
CANARY_LINE='Server started'
FILE='/var/logs/deployment.log'
echo 'Waiting for server to start'
.... watch $FILE for $CANARY_LINE ...
echo 'Server started'
Basically, a shell script that watches a file for line (or regex).
tail -n0 -f path_to_my_log_file.log | sed '/particular_line/ q'
You can use sed's q command while parsing the input: sed quits as soon as Server started appears in /var/logs/deployment.log, which in turn terminates the tail.
tail -f /var/logs/deployment.log | sed '/Server started/ q'
Another way to do the same thing
( tail -f -n0 /var/logs/deployment.log & ) | grep -q "Server started"
Previous answer (works, but not as efficient as this one):
We have to be careful with loops.
For example, if you want to check a file before starting an algorithm, you probably have to do something like this:
FILE_TO_CHECK="/var/logs/deployment.log"
LINE_TO_CONTAIN="Server started"
SLEEP_TIME=10
while ! grep -q "${LINE_TO_CONTAIN}" "${FILE_TO_CHECK}"
do
sleep ${SLEEP_TIME}
done
# Start your algorithm here
But, in order to prevent an infinite loop you should add some bound:
FILE_TO_CHECK="/var/logs/deployment.log"
LINE_TO_CONTAIN="Server started"
SLEEP_TIME=10
COUNT=0
MAX=10
while ! grep -q "${LINE_TO_CONTAIN}" "${FILE_TO_CHECK}" && [ ${COUNT} -lt ${MAX} ]
do
sleep ${SLEEP_TIME}
COUNT=$(($COUNT + 1))
done
if grep -q "${LINE_TO_CONTAIN}" "${FILE_TO_CHECK}"
then
echo "Let's go, the file contains what we want"
# Start your algorithm here
else
echo "Timed out"
exit 10
fi
CANARY_LINE='Server started'
FILE='/var/logs/deployment.log'
echo 'Waiting for server to start'
grep -q "$CANARY_LINE" <(tail -f "$FILE")
echo 'Server started'
Source: adapted from How to wait for message to appear in log in shell
Try this:
#!/bin/bash
canary_line='Server started'
file='/var/logs/deployment.log'
echo 'Waiting for server to start'
until grep -q "${canary_line}" "${file}"
do
sleep 1s
done
echo 'Server started'
Adjust sleep's parameter to your taste.
If the line in the file needs to match exactly, i.e. the whole line, change grep's second parameter to "^${canary_line}$".
If the line contains any characters that grep treats as special, you're going to have to deal with that; one option is sketched below.
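For example, a hedged variant that matches the canary line as a literal fixed string rather than a regular expression:
# -F makes grep treat the pattern as a fixed string, so metacharacters
# such as . * [ ] in the canary line are matched literally
until grep -Fq "${canary_line}" "${file}"
do
sleep 1s
done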

Improvements to this bash script to simulate "tail --follow"

I need to remote tail log files such that the tailing continues working even when the file is rolled over.
My attempts to do so, started by directly using the tail command over ssh:
ssh root@some-remote-host tail -1000f /some-directory/application.log | tee /c/local-directory/application.log
That allows me to filter through /c/local-directory/application.log locally using OtrosLogViewer (which was the original purpose of trying to tail the log file locally).
The above stops tailing when the remote log file is rolled over (which happens every 10 MB). I do not have the access required to change the rollover settings.
Unfortunately, none of the tail versions on the remote OS (Solaris) support a --follow (or -f) option which can handle file rollovers, so I had to write the following tailer.sh script to simulate that option:
#!/bin/bash
function startTail {
echo "Path: $1"
tail -199999f "$1" 2>/dev/null & #Try to tail the file
while [[ -f $1 ]]; do sleep 1; done #Check every second if the file still exists
echo "***** File Not Available as of: $(date)" #If not then log a message and,
kill "$!" 2>/dev/null #Kill the tail process, then
echo "Waiting for file to appear" #Wait for the file to re-appear
while [ ! -f "$1" ]
do
echo -ne "." #Show progress bar
sleep 1
done
echo -ne '\n' #Advance to next line #File has appeared again
echo "Starting Tail again"
startTail "$1"
}
startTail "$1"
I am relatively happy with the above script. However, it suffers from one issue stemming from a limitation of the sleep command on the remote OS: it only accepts whole numbers, so sleep 1 is the smallest amount of time I can wait before checking for the existence of the file again. That period is enough to detect a file rollover sometimes, but it fails often enough to be a problem I want to fix.
The only other way I can think of is to implement a file-rollover check based on the file size: check the size every second, and if it is less than the previously recorded size, the file was rolled over; then re-start the tail. A rough sketch of that idea is below.
I checked for other, more reliable alternatives like inotifywait/inotify, but they are not available on the remote server(s) and I do not have the access needed to install them.
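(Untested sketch; it assumes wc is available on the remote host and would replace the plain existence check in the loop above:)
lastSize=0
while [[ -f $1 ]]; do
    size=$(wc -c < "$1")
    if [ "$size" -lt "$lastSize" ]; then
        echo "***** Rollover detected at: $(date)"
        break                          # fall through to kill and restart the tail
    fi
    lastSize=$size
    sleep 1
done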
Can you think of any other way to detect a file rollover with a bash script?
Edit: Based on Hema's answer below, the modified (working!) script is as follows:
#!/bin/bash
function startTail {
echo "Path: $1"
tail -199999f "$1" 2>/dev/null & #Try to tail the file
#Check every second if the file still exists
while [[ -f $1 ]]
do
perl -MTime::HiRes -e "Time::HiRes::sleep(0.1)"
done
echo "***** File Not Available as of: $(date)" #If not then log a message and,
kill "$!" 2>/dev/null #Kill the tail process, then
echo "Waiting for file to appear" #Wait for the file to re-appear
while [ ! -f "$1" ]
do
echo -ne "." #Show progress bar
sleep 1
done
echo -ne '\n' #Advance to next line #File has appeared again
echo "Starting Tail again"
startTail "$1"
}
startTail "$1"
For sleeping in fractions of a second (microseconds or milliseconds), you can use Perl's Time::HiRes:
perl -MTime::HiRes -e "Time::HiRes::usleep(1)" ;
perl -MTime::HiRes -e "Time::HiRes::sleep(0.001)" ;
Unfortunately, none of the tail versions on the remote OS (Solaris)
support the --follow option
That's a little harsh.
Just use -f (rather than --follow) on both Solaris and Linux. On Linux you can use --follow as a synonym for -f. On Solaris you can't.
But anyway, to be more precise: you want a follow option that handles rollovers. GNU tail (i.e. Linux) has that natively by way of the -F (capital F) option. Solaris doesn't. The GNU tail -F option can handle the file being rolled over, as long as it keeps the same name. In other words, on Solaris you would have to use the gtail command to force the use of GNU tail.
If you are at a prudent Solaris site then such GNU tools will simply be there, without you having to worry about it. You shouldn't accept a Solaris install from your sysadmin where he/she has deliberately neglected to make sure the basic GNU tools are there. On Solaris 11 (as an example) they really have to go out of their way to make that happen.
You can make your script OS-independent with the well-known method:
TAILCMD="tail"
# We need GNU tail, not any other implementation of 'tail'
if [ "$(uname -s)" == "SunOS" ]; then
TAILCMD="gtail"
fi
$TAILCMD -F myfile.log
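Tying it back to the original command, the whole rollover-handling script could then collapse to a single follow invocation (a sketch, assuming gtail, i.e. GNU tail, is installed on the remote Solaris host):
# -F re-opens application.log after every rollover, so no restart loop is needed
ssh root@some-remote-host gtail -F /some-directory/application.log | tee /c/local-directory/application.log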
