Why does my bash script stop working?

The script monitors incoming HTTP messages and forwards them to a monitoring application called Zabbix. It works fine, but after about 1-2 days it stops working. Here's what I know so far:
Using pgrep, I can see that the script is still running.
The logfile gets updated properly (the first command of the script).
The FIFO seems to be working.
The problem must be somewhere in the while loop or the tail command.
I'm new to scripting, so maybe someone can spot the problem right away?
#!/bin/bash
tcpflow -p -c -i enp2s0 port 80 | grep --line-buffered -oE 'boo.php.* HTTP/1.[01]' >> /usr/local/bin/logfile &
pipe=/tmp/fifopipe
trap "rm -f $pipe" EXIT
if [[ ! -p $pipe ]]; then
mkfifo $pipe
fi
tail -n0 -F /usr/local/bin/logfile > /tmp/fifopipe &
while true
do
if read line <$pipe; then
unset sn
for ((c=1; c<=3; c++)) # c is no of max parameters x 2 + 1
do
URL="$(echo $line | awk -F'[ =&?]' '{print $'$c'}')"
if [[ "$URL" == 'sn' ]]; then
((c++))
sn="$(echo $line | awk -F'[ =&?]' '{print $'$c'}')"
fi
done
if [[ "$sn" ]]; then
hosttype="US2G_"
host=$hosttype$sn
zabbix_sender -z nuc -s $host -k serial -o $sn -vv
fi
fi
done

You're inputting from the fifo incorrectly. By writing:
while true; do read line < $pipe ....; done
you are closing and reopening the fifo on each iteration of the loop. If the producer writing to the pipe (the tail -F) tries to write while the loop has the fifo closed, it gets a SIGPIPE and dies. Change the structure to:
while true; do read line; ...; done < $pipe
Note that every process inside the loop now has the potential to inadvertently read from the pipe, so you'll probably want to explicitly close or redirect stdin for each of them (for example, zabbix_sender ... </dev/null).
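A minimal sketch of the suggested structure, assuming bash. The helper names process_fifo and demo are hypothetical, the printf producer stands in for tail -F, and echo stands in for zabbix_sender so the sketch runs anywhere. It also replaces the per-line awk calls with parameter expansion; the sn parsing assumes query strings shaped like the ones matched by the grep in the question.

```bash
#!/bin/bash
# Hypothetical stand-in for the real loop: reads request lines from a FIFO
# and prints the Zabbix host name that would be used for each serial number.
process_fifo() {
    local pipe=$1 line sn
    while IFS= read -r line; do
        sn=
        case $line in
            *'?sn='* | *'&sn='*)
                sn=${line#*sn=}   # drop everything up to the first "sn="
                sn=${sn%% *}      # cut at the space before "HTTP/1.x"
                sn=${sn%%&*}      # cut at the next query parameter
                ;;
        esac
        if [[ -n $sn ]]; then
            # The real script would run:
            #   zabbix_sender -z nuc -s "US2G_$sn" -k serial -o "$sn" -vv </dev/null
            # </dev/null keeps the command from reading the FIFO's data.
            echo "US2G_$sn" </dev/null
        fi
    done <"$pipe"   # the FIFO is opened once, for the whole loop
}

demo() {
    local pipe
    pipe=$(mktemp -u) || return
    mkfifo "$pipe" || return
    # Producer (stands in for tail -F): writes three lines, then closes the FIFO.
    printf '%s\n' 'boo.php?sn=A1 HTTP/1.1' \
                  'boo.php?x=1 HTTP/1.1' \
                  'boo.php?sn=B2&y=3 HTTP/1.0' >"$pipe" &
    process_fifo "$pipe"
    wait
    rm -f "$pipe"
}
```

In the real script the producer never closes the FIFO, so the loop just keeps reading; the important part is that the redirection sits on done, not on read.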

Related

Speed up shell script/Performance enhancement of shell script

Is there a way to speed up the shell script below? It takes a good 40 minutes to update about 150,000 files every day. Given the volume of files to create and update, this may be acceptable; I don't deny that. However, if there is a much more efficient way to write this, or to rewrite the logic entirely, I'm open to it. I'd appreciate any help.
#!/bin/bash
DATA_FILE_SOURCE="<path_to_source_data/${1}"
DATA_FILE_DEST="<path_to_dest>"
for fname in $(ls -1 "${DATA_FILE_SOURCE}")
do
for line in $(cat "${DATA_FILE_SOURCE}"/"${fname}")
do
FILE_TO_WRITE_TO=$(echo "${line}" | awk -F',' '{print $1"."$2".daily.csv"}')
CONTENT_TO_WRITE=$(echo "${line}" | cut -d, -f3-)
if [[ ! -f "${DATA_FILE_DEST}"/"${FILE_TO_WRITE_TO}" ]]
then
echo "${CONTENT_TO_WRITE}" >> "${DATA_FILE_DEST}"/"${FILE_TO_WRITE_TO}"
else
if ! grep -Fxq "${CONTENT_TO_WRITE}" "${DATA_FILE_DEST}"/"${FILE_TO_WRITE_TO}"
then
sed -i "/${1}/d" "${DATA_FILE_DEST}"/"${FILE_TO_WRITE_TO}"
"${DATA_FILE_DEST}"/"${FILE_TO_WRITE_TO}"
echo "${CONTENT_TO_WRITE}" >> "${DATA_FILE_DEST}"/"${FILE_TO_WRITE_TO}"
fi
fi
done
done
There are still parts of your published script that are unclear, like the sed command. Still, I rewrote it with saner practices and far fewer external calls, which should speed it up considerably.
#!/usr/bin/env sh
DATA_FILE_SOURCE="<path_to_source_data/$1"
DATA_FILE_DEST="<path_to_dest>"
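for fname in "$DATA_FILE_SOURCE/"*; do
while IFS=, read -r a b content || [ "$a" ]; do
destfile="$DATA_FILE_DEST/$a.$b.daily.csv"
if [ -f "$destfile" ]; then
# Content already present: nothing to do (same as the original script)
grep -Fxq -- "$content" "$destfile" && continue
# Otherwise apply the original sed cleanup before appending
sed -i "/$1/d" "$destfile"
fi
printf '%s\n' "$content" >>"$destfile"
done < "$fname"
done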
Make it parallel (as much as you can). Note that wait -p, used below, requires bash 5.1 or newer.
#!/bin/bash
set -e -o pipefail
declare -ir MAX_PARALLELISM=20 # pick a limit
declare -i pid
declare -a pids
# ...
for fname in "${DATA_FILE_SOURCE}/"*; do
if ((${#pids[@]} >= MAX_PARALLELISM)); then
wait -p pid -n || echo "${pids[pid]} failed with ${?}" 1>&2
unset 'pids[pid]'
fi
while IFS= read -r line; do
FILE_TO_WRITE_TO="..."
# ...
done < "${fname}" & # forking here
pids[$!]="${fname}"
done
for pid in "${!pids[@]}"; do
wait -n "$((pid))" || echo "${pids[pid]} failed with ${?}" 1>&2
done
Here’s a directly runnable skeleton showing how the harness above works (with 36 items to process and 20 parallel processes at most):
#!/bin/bash
set -e -o pipefail
declare -ir MAX_PARALLELISM=20 # pick a limit
declare -i pid
declare -a pids
do_something_and_maybe_fail() {
sleep $((RANDOM % 10))
return $((RANDOM % 2 * 5))
}
for fname in some_name_{a..f}{0..5}.txt; do # 36 items
if ((${#pids[@]} >= MAX_PARALLELISM)); then
wait -p pid -n || echo "${pids[pid]} failed with ${?}" 1>&2
unset 'pids[pid]'
fi
do_something_and_maybe_fail & # forking here
pids[$!]="${fname}"
echo "${#pids[@]} running" 1>&2
done
for pid in "${!pids[@]}"; do
wait -n "$((pid))" || echo "${pids[pid]} failed with ${?}" 1>&2
done
Strictly avoid running external processes (such as awk, grep and cut) once for every line. fork()ing is extremely inefficient compared to:
Running one single awk / grep / cut process on an entire input file (to preprocess all lines at once for easier processing in bash) and feeding the whole output into (e.g.) a bash loop.
Using Bash expansions instead, where feasible, e.g. "${line/,/.}" and other tricks from the EXPANSION section of the man bash page, without fork()ing any further processes.
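As an illustration of the expansion approach, here is a hypothetical split_line helper for the question's a,b,rest line format. It assumes every line has at least three comma-separated fields; the IFS=, read -r a b content loop in the rewrite above gets the same split with a single read.

```bash
#!/bin/bash
# Hypothetical helper: split "a,b,rest..." into the destination file name and
# the content, using only parameter expansion (no awk/cut fork per line).
split_line() {
    local line=$1 a b rest
    a=${line%%,*}        # field 1
    rest=${line#*,}      # fields 2..n
    b=${rest%%,*}        # field 2
    rest=${rest#*,}      # fields 3..n
    printf '%s.%s.daily.csv\t%s\n' "$a" "$b" "$rest"
}
```

For example, split_line 'x,y,1,2,3' prints x.y.daily.csv, a tab, and 1,2,3.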
Off-topic side notes:
ls -1 is unnecessary. First, ls won’t write multiple columns unless its output is a terminal, so a plain ls would do. Second, glob expansion (for fname in "$dir"/*) is usually a cleaner and more efficient choice, and unlike word-splitting the output of ls, it handles file names that contain spaces. (You can use shopt -s nullglob to correctly handle empty directories / “no match” cases.)
Looping over the output of cat is a (less common) case of useless use of cat; for line in $(cat file) also splits on every space and tab, not only on newlines. Feed the file into a while read loop in bash instead and read it line by line. (This also gives you more flexibility with the line format.)
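A small sketch combining both notes, assuming bash: the directory is walked with a glob under nullglob, and each file is read with while read instead of $(cat). count_lines_per_file is a hypothetical name; counting lines simply stands in for real per-line work.

```bash
#!/bin/bash
# Sketch: iterate over a directory with a glob instead of parsing ls, and read
# each file line by line instead of looping over $(cat ...).
count_lines_per_file() {
    local dir=$1 f line n had_nullglob=0
    shopt -q nullglob && had_nullglob=1
    shopt -s nullglob                      # empty dir -> zero iterations
    for f in "$dir"/*; do
        n=0
        # "|| [[ -n $line ]]" keeps a final line that has no trailing newline
        while IFS= read -r line || [[ -n $line ]]; do
            n=$((n + 1))
        done <"$f"
        printf '%s: %d\n' "${f##*/}" "$n"
    done
    (( had_nullglob )) || shopt -u nullglob   # restore the caller's setting
}
```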

netcat inside a while read loop returning immediately

I am making a menu for myself, because I sometimes need to look up (with nmap) which ports are open.
I want it to behave the same as running the command on the command line.
Here is a piece of my code:
nmap $1 | grep open | while read line; do
serviceport=$(echo $line | cut -d' ' -f1 | cut -d'/' -f1);
if [ $i -eq $choice ]; then
echo "Running command: netcat $1 $serviceport";
netcat $1 $serviceport;
fi;
i=$(($i+1));
done;
It closes immediately after nmap has scanned everything.
Don't use FD 0 (stdin) for both your read loop and netcat. If you don't distinguish these streams, netcat can consume content emitted by the nmap | grep pipeline rather than leaving that content to be read by read.
This has a few undesirable effects: Further parts of the while/read loop don't get executed, and netcat sees a closed stdin stream and exits when the pipeline's contents are consumed (so you don't get interactive control of netcat, if that's what you're trying to accomplish). An easy way to work around this issue is to feed the output of your nmap pipeline in on a non-default file descriptor; below, I'm using FD 3.
There's a lot wrong with this code beyond the scope of the question, so please don't consider the parts I've copied-and-pasted an endorsement, but:
while read -r -u 3 line; do
serviceport=${line%% *}; serviceport=${serviceport%%/*}
if [ "$i" -eq "$choice" ]; then
echo "Running command: netcat $1 $serviceport"
netcat "$1" "$serviceport"
fi
i=$((i+1))
done 3< <(nmap "$1" | grep open)
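The difference can be reproduced without nmap or netcat. In this hypothetical sketch, scan fakes the nmap | grep open output and cat >/dev/null stands in for netcat, since both read whatever arrives on their standard input.

```bash
#!/bin/bash
# Fake "nmap ... | grep open" output (hypothetical).
scan() { printf '%s\n' '22/tcp open ssh' '80/tcp open http'; }

# Loop on stdin: the inner command swallows the remaining lines.
stdin_loop() {
    local line
    scan | while IFS= read -r line; do
        echo "${line%%/*}"
        cat >/dev/null        # stands in for netcat reading stdin
    done
}

# Loop on FD 3: the inner command reads the caller's stdin, not the scan output.
fd3_loop() {
    local line port
    while IFS= read -r -u 3 line; do
        port=${line%% *}      # "22/tcp"
        port=${port%%/*}      # "22"
        echo "$port"
        cat >/dev/null        # stands in for netcat reading stdin
    done 3< <(scan)
}
```

stdin_loop prints only 22, because the inner cat consumes the second line; fd3_loop prints both ports. With the FD 3 version, netcat would read from the loop's own stdin (normally the terminal), which is what makes it interactive.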

How to wait till a particular line appears in a file

Is it possible to write a script that does not proceed till a given line appears in a particular file?
For example, I want to do something like this:
CANARY_LINE='Server started'
FILE='/var/logs/deployment.log'
echo 'Waiting for server to start'
.... watch $FILE for $CANARY_LINE ...
echo 'Server started'
Basically, a shell script that watches a file for line (or regex).
tail -n0 -f path_to_my_log_file.log | sed '/particular_line/ q'
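You can use sed's q command while parsing the input: sed quits as soon as Server started appears in /var/logs/deployment.log. Depending on the tail implementation, tail may only notice that sed has exited (via SIGPIPE) the next time it writes, so the pipeline can keep waiting until something else is logged.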
tail -f /var/logs/deployment.log | sed '/Server started/ q'
Another way to do the same thing
( tail -f -n0 /var/logs/deployment.log & ) | grep -q "Server started"
Previous answer (works, but is not as efficient as this one)
We have to be careful with loops.
For example, if you want to wait for a line in a file before starting an algorithm, you'd probably have to do something like this:
FILE_TO_CHECK="/var/logs/deployment.log"
LINE_TO_CONTAIN="Server started"
SLEEP_TIME=10
while ! grep -q "${LINE_TO_CONTAIN}" "${FILE_TO_CHECK}"
do
sleep ${SLEEP_TIME}
done
# Start your algorithm here
But to prevent an infinite loop, you should add a bound:
FILE_TO_CHECK="/var/logs/deployment.log"
LINE_TO_CONTAIN="Server started"
SLEEP_TIME=10
COUNT=0
MAX=10
while ! grep -q "${LINE_TO_CONTAIN}" "${FILE_TO_CHECK}" && [ "${COUNT}" -lt "${MAX}" ]
do
sleep ${SLEEP_TIME}
COUNT=$(($COUNT + 1))
done
if grep -q "${LINE_TO_CONTAIN}" "${FILE_TO_CHECK}"
then
echo "Let's go, the file is containing what we want"
# Start your algorithm here
else
echo "Timed out"
exit 10
fi
CANARY_LINE='Server started'
FILE='/var/logs/deployment.log'
echo 'Waiting for server to start'
grep -q "$CANARY_LINE" <(tail -f "$FILE")
echo 'Server started'
Source: adapted from How to wait for message to appear in log in shell
Try this:
#!/bin/bash
canary_line='Server started'
file='/var/logs/deployment.log'
echo 'Waiting for server to start'
until grep -q "${canary_line}" "${file}"
do
sleep 1s
done
echo 'Server started'
Adjust sleep's parameter to your taste.
If the line in the file needs to match exactly, i.e. the whole line, change grep's pattern to "^${canary_line}$" (or add grep's -x option).
If the line contains characters that grep treats as special in a regular expression, add -F so the pattern is matched as a fixed string (combined with -x for a whole-line match).
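The tail-based answers above may leave tail running after the match, and the polling loops have no upper bound unless you add one. The sketch below combines the two ideas under a few assumptions (bash 4 or newer for coproc; wait_for_line and its arguments are hypothetical names; the one-second read timeout is an arbitrary choice): it follows the file with tail -F from a coprocess, matches a fixed string in bash, gives up after a deadline, and kills tail explicitly.

```bash
#!/bin/bash
# wait_for_line FILE TEXT [LIMIT]   (hypothetical helper, bash 4+ for coproc)
# Returns 0 once a line containing TEXT (a fixed string, not a regex) shows up
# in FILE, or 1 after about LIMIT seconds (default 30). Lines already in FILE
# are checked too, because tail starts from line 1.
wait_for_line() {
    local file=$1 text=$2 limit=${3:-30} line status=1
    local deadline=$((SECONDS + limit))
    coproc TAILER { exec tail -n +1 -F -- "$file" 2>/dev/null; }
    local pid=$TAILER_PID fd=${TAILER[0]}
    while (( SECONDS < deadline )); do
        # Wait at most 1 s per read so the deadline is re-checked regularly.
        # (A line that is only partly written when a read times out is not matched.)
        if IFS= read -r -t 1 -u "$fd" line && [[ $line == *"$text"* ]]; then
            status=0
            break
        fi
    done
    kill "$pid" 2>/dev/null    # stop tail explicitly instead of waiting for SIGPIPE
    wait "$pid" 2>/dev/null
    return "$status"
}
```

Usage along the lines of the question: wait_for_line /var/logs/deployment.log 'Server started' 60 && echo 'Server started'.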

FTP File Transfers Using Piping Safely

I have a file forwarding system where a bunch of files are downloaded to a directory, de-multiplexed and copied to individual machines.
Files are forwarded as soon as the master server receives them, and they normally arrive in bursts. (Authentication uses SSH keys.)
This script creates the sftp session and feeds it commands through a pipe from tail, which watches a FIFO.
HOST=$1
pipe=/tmp/pipes/${HOST%%.*}
ps aux | grep -v grep | grep sftp | grep "user@$HOST" > /dev/null
if [[ $? == 0 ]]; then
echo "FTP is Running on this Server"
exit
else
pid=`ps aux | grep -v grep | grep tail | tr -s ' ' | grep $pipe`
[[ $? == 0 ]] && kill -KILL `echo $pid | cut -f2 -d' '`
fi
if [[ ! -p $pipe ]]; then
mkfifo $pipe
fi
tail -n +1 -f $pipe | sftp -o 'ServerAliveInterval 60' user@$HOST > /dev/null &
echo cd /tmp/data >>$pipe #Sends Command to Host
echo "Started FTP to $HOST"
Update: I ended up changing the cleanup code to use ps aux to check whether an sftp session is running and, if not, whether a leftover tail -f is still running, grepping for user@host and for the name of the pipe respectively. This is done whenever the script is called, and the script is called whenever I try to upload a file.
For example:
FILENAME=`basename $1`
function transfer {
echo cd /apps/data >> $2 # For Safety
echo put $1 .$FILENAME >> $2
echo rename .$FILENAME $FILENAME >> $2
echo chmod 0666 $FILENAME >> $2
}
./ftp.sh host
[ -p $pipedir/host ] && transfer $1 $pipedir/host
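The transfer helper shown above could also be written as below. This is a hypothetical variant, not the asker's code: it emits the same four sftp commands, takes the name from parameter expansion instead of running basename, quotes the shell variables, and writes the whole batch through one redirection, so the FIFO is opened once per file rather than once per command. Like the original, it assumes file names without spaces. Separate writes from concurrent callers can still interleave, so this does not by itself make concurrent callers safe.

```bash
#!/bin/bash
# Hypothetical variant of the transfer helper: same sftp commands, quoted
# variables, no basename fork, and a single redirection for the whole batch.
transfer() {
    local src=$1 fifo=$2 name=${1##*/}
    {
        printf 'cd /apps/data\n'                 # for safety, as in the original
        printf 'put %s .%s\n' "$src" "$name"     # upload under a hidden name
        printf 'rename .%s %s\n' "$name" "$name" # then move it into place
        printf 'chmod 0666 %s\n' "$name"
    } >>"$fifo"
}
```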
Files received on the master server are caught by Incron, which writes a put command with the new file's location to the FIFO, to be sent by sftp (a rename is also performed).
My question is: is this safe? Could it crash on FTP errors or events? I'm not really worried about login errors.
The goal is to reduce the number of FTP logins (a single session, or sessions at intervals of a minute or more),
while still allowing files to be forwarded as they're received (dynamic commands).
I'd prefer to use standard Ubuntu libraries, if possible.
EDIT: After testing and working through some issues the server simply runs with
[[ -p $pipe ]] && echo FTP is Running on this Server
ln -s $pipe $lock &> /dev/null || { echo FTP is Running on this Server; exit; }
[[ ! -p $pipe ]] && mkfifo $pipe
( tail -n +1 -F $pipe & echo $! > $pipe.pid ) |
tee >( sed "/tail:/ q" >/dev/null && kill $(cat $pipe.pid) |& rm -f $pipe >/dev/null; ) |
sftp -i ~/.ssh/$HOST.rsa -oServerAliveInterval=60 user@$HOST &
rm -f $lock
It's rather simple, but it works nicely.
You might be interested in setting up a simpler (and more robust) synchronization infrastructure.
If a given host is not connected when a file arrives, it never receives it (if I understand your code correctly).
I would do something like
rsync -a -e ssh user@host:/apps/data pathToLocalDataStore
on the client machines, either periodically or triggered by an event. rsync synchronizes files intelligently by comparing their timestamps and sizes (-a implies -t).
The event could be the termination of some process, for example:
The client runs (configure private-key usage for the host in ~/.ssh/config):
#!/bin/bash
while :;do
ssh user@host /srv/bin/sleepListener 600
rsync -a -e ssh user@host:/apps/data pathToLocalDataStore
done
On the server,
/srv/bin/sleepListener is a symbolic link to /bin/sleep.
After receiving a new file, the server runs:
killall sleepListener
Note: a full check is performed every 10 minutes, so it doesn't matter if nodes go offline and come back online.

nice way to kill piped process?

I want to process each line a program writes to stdout the moment it is written. I want to grab the output of test.sh (a long-running process). My current approach is this:
./test.sh >tmp.txt &
PID=$!
tail -f tmp.txt | while read line; do
echo $line
ps ${PID} > /dev/null
if [ $? -ne 0 ]; then
echo "exiting.."
fi
done;
Unfortunately, this prints "exiting.." and then keeps waiting, because tail -f is still running. I tried both break and exit.
I run this on FreeBSD, so I cannot use the --pid= option of GNU tail found on Linux.
I could use ps and grep to get the PID of tail and kill it, but that seems very ugly to me.
Any hints?
Why do you need the tail process?
Could you instead do something along the lines of
./test.sh | while read line; do
# process $line
done
or, if you want to keep the output in tmp.txt :
./test.sh | tee tmp.txt | while read line; do
# process $line
done
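Here is a runnable sketch of the tee variant, where a hypothetical producer function stands in for ./test.sh. Because nothing sits between the producer and the loop except tee, the loop ends by itself when the producer exits, and there is no tail process to kill.

```bash
#!/bin/bash
# producer stands in for ./test.sh (hypothetical): emits lines over time, then exits.
producer() {
    local i
    for i in 1 2 3; do
        echo "line $i"
        sleep 1
    done
}

process_output() {
    local log=$1 line
    producer | tee "$log" | while IFS= read -r line; do
        printf 'processed: %s\n' "$line"
    done
    # Reached as soon as producer exits and the pipe hits EOF: nothing to kill.
    echo "exiting.."
}
```

If test.sh buffers its output when writing to a pipe, lines may arrive in bursts rather than one at a time; that is a property of the producer, not of this loop.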
If you still want to use an intermediate tail -f process, maybe you could use a named pipe (fifo) instead of a regular pipe, to allow detaching the tail process and getting its pid:
./test.sh >tmp.txt &
PID=$!
mkfifo tmp.fifo
tail -f tmp.txt >tmp.fifo &
PID_OF_TAIL=$!
while read line; do
# process $line
kill -0 "${PID}" 2>/dev/null || kill "${PID_OF_TAIL}"
done <tmp.fifo
rm tmp.fifo
I should mention, however, that such a solution has serious race-condition problems:
the PID of test.sh could be reused by another process;
if the test.sh process is still alive when you read the last line, you won't get another opportunity to detect its exit afterwards, and your loop will hang.

Resources