I am try to make a system backup script with trap "" ERR. I realized the trap doesn't get called when commands are part of pipes |.
Heres are some parts of my code that don't work with trap "" ERR ...
OpenFiles=$(lsof "$Source" | wc -l)
PackagesList=$(dpkg --get-selections | awk '!/deinstall|purge|hold/ {print $1}' | tee "$FilePackagesList")
How can I get this to work without using if [ "$?" -eq 0 ]; then, or similar coding ? Because this is the reason I declared a trap this way.
Here is the script ...
root#Lian-Li:~# cat /usr/local/bin/create_incremental_backup_of_system.sh
#!/bin/bash
# Create an incremental GNU-standard backup of important system-files.
# This script works with Debian Jessie and newer systems.
# Created for my lian-li NAS 2016-11-27.
MailTo="admin#example.com" # Mail Address of an admin
Source="boot etc root usr/local usr/lib/cgi-bin var/www"
BackupDirectory=/media/hdd1/backups/lian-li
SubDir="system.d"
FileTimeStamp=$(date "+%Y%m%d%H%M%S")
FileName=$(uname -n)
File="${BackupDirectory}/${SubDir}/${FileName}-${FileTimeStamp}.tgz"
FileIncremental="${BackupDirectory}/${SubDir}/${FileName}.gtar"
FilePackagesList="${BackupDirectory}/${SubDir}/installed_packages_on_${FileName}.txt"
# have2do ...
# Backup rotate
MailContent="None"
TimeStamp=$(date "+%F %T") # This format "2011-12-31 23:59:59" is needed to read the journal
exec 1> >(logger -i -s -t "$0" -p 3) 2>&1 # all error messages are redirected to syslog journal and after that to stdout
trap "BriefExit" ERR # Provide information for an admin (via sendmail) when an error occurred and exit the script
function BriefExit(){
rm -f "$File"
if [ "$MailContent" = "None" ]
then
case "$LANG" in
de_DE.UTF-8)
echo "Beende Skript, aufgrund vorherige Fehler." 1>&2
;;
*)
echo "Stopping script because of previous error(s)." 1>&2
;;
esac
MailContent=$(journalctl -p 3 -o "short" --since="$TimeStamp" --no-pager)
ScriptName="${0##*/}"
SystemName=$(uname -n)
MailSubject="${SystemName}: ${ScriptName}"
echo -e "Subject: ${MailSubject}\n\n${MailContent}\n" | sendmail "$MailTo"
fi
exit 1
}
if [ ! -d "${BackupDirectory}/${SubDir}" ]
then
mkdir -p "${BackupDirectory}/${SubDir}"
fi
LoopCount=0
OpenFiles=1
cd /
while [ "$OpenFiles" -ne 0 ]
do
if [ "$LoopCount" -le 180 ]
then
sleep 1
OpenFiles=$(lsof $Source | wc -l)
LoopCount=$(($LoopCount + 1))
else
echo "Closing Script. Reason: Can't create incremental backup, because some files are open." 1>&2
BriefExit
fi
done
tar -cpzf "$File" -g "$FileIncremental" $Source
chmod 0700 "$File"
PackagesList=$(dpkg --get-selections | awk '!/deinstall|purge|hold/ {print $1}' | tee "$FilePackagesList")
while read -r PackageName
do
case "$PackageName" in
minidlna)
# Code ...
;;
slapd)
# Code ...
;;
esac
done <<< "$PackagesList"
exit 0
This isn't a problem with ERR traps at all, or with command substitutions, but with pipelines.
false | true
returns true, unless the pipefail option is set.
Thus in OpenFiles=$(lsof "$Source" | wc -l), only a failure in wc will cause the pipeline to be considered a failure, or in PackagesList=$(dpkg --get-selections | awk '!/deinstall|purge|hold/ {print $1}' | tee "$FilePackagesList"), only a failure in tee will cause the command as a whole to be considered failed.
Put the command set -o pipefail at the top of your script if you want a failure from any pipeline component (as opposed to the last component alone) to cause the command as a whole to be considered failed -- and note the other caveats for ERR traps given in BashFAQ #105.
Another alternative is to look at the status for each stage in the pipeline:
# cat test_bash_return.bash
true | true | false | true
echo "${PIPESTATUS[#]}"
# ./test_bash_return.bash
0 0 1 0
Related
I'm trying to capture a live tcpdump from a custom designed linux system. The command I'm using so far is:
plink.exe -ssh user#IP -pw PW "shell nstcpdump.sh -s0 -U -w - not port 22 and not host 127.0.0.1" | "C:\Program Files\Wireshark\wireshark" -i -
This will fail since when executing the command on the remote system, this (custom) shell will output "Done" before sending data. I tried to find out a way to remove the "Done" message from the shell but doesn't appear to be any.
So I came up with this (added findstr -V):
plink.exe -ssh user#IP -pw PW "shell nstcpdump.sh -s0 -U -w - not port 22 and not host 127.0.0.1" | findstr -V "Done" | "C:\Program Files\Wireshark\wireshark" -i -
This works more or less fine since I will get some errors and the live capture will stop. I believe it might have something to do with the buffers, but I'm not sure.
Does anyone know of any other method of removing/bypassing the first bytes/chars of the output from plink/remote shell?
[edit]
as asked, nstcpdump.sh is a wrapper for tcpdump. as mentioned before, this system is highly customized. nstcpdump.sh code:
root#hostname# cat /netscaler/nstcpdump.sh
#!/bin/sh
# piping the packet trace to the tcpdump
#
# FILE: $Id: //depot/main/rs_120_56_14_RTM/usr.src/netscaler/scripts/nstcpdump.sh#1 $
# LAST CHECKIN: $Author: build $
# $DateTime: 2017/11/30 02:14:38 $
#
#
# Options: any TCPDUMP options
#
TCPDUMP_PIPE=/var/tmp/tcpdump_pipe
NETSCALER=${NETSCALER:-`cat /var/run/.NETSCALER`}
[ -r ${NETSCALER}/netscaler.conf ] && . ${NETSCALER}/netscaler.conf
TIME=${TIME:-3600}
MODE=${MODE:-6}
STARTCMD="start nstrace -size 0 -traceformat PCAP -merge ONTHEFLY -filetype PIPE -skipLocalSSH ENABLED"
STOPCMD="stop nstrace "
SHOWCMD="show nstrace "
NSCLI_FILE_EXEC=/netscaler/nscli
NSTRACE_OUT_FILE=/tmp/nstrace.out
NS_STARTTRACE_PIDFILE=/tmp/nstcpdump.pid
TRACESTATE=$(nsapimgr -d allvariables | grep tracestate | awk '{ print $2}')
trap nstcpdump_exit 1 2 15
nstcpdump_init()
{
echo "##### WARNING #####"
echo "This command has been deprecated."
echo "Please use 'start nstrace...' command from CLI to capture nstrace."
echo "trace will now start with all default options"
echo "###################"
if [ ! -d $NSTRACE_DIR ]
then
echo "$NSTRACE_DIR directory doesn't exist."
echo "Possible reason: partition is not mounted."
echo "Check partitions using mount program and try again."
exit 1
fi
if [ ! -x $NSCLI_FILE_EXEC ]
then
echo "$NSCLI_FILE_EXEC binary doesn't exist"
exit 1
fi
if [ -e $NSTRACE_OUT_FILE ]
then
rm $NSTRACE_OUT_FILE
echo "" >> $NSTRACE_OUT_FILE
fi
}
nstcpdump_start_petrace()
{
sleep 0.5;
$NSCLI_FILE_EXEC -U %%:.:. $STARTCMD >/tmp/nstcpdump.sh.out
rm -f ${NS_STARTTRACE_PIDFILE}
}
nstcpdump_start()
{
# exit if trace is already running
if [ $TRACESTATE -ne 0 ]
then
echo "Error: one instance of nstrace is already running"
exit 2
fi
nstcpdump_start_petrace &
echo $! > ${NS_STARTTRACE_PIDFILE}
tcpdump -n -r - $TCPDUMPOPTIONS < ${TCPDUMP_PIPE}
nstcpdump_exit
exit 1
}
nstcpdump_exit()
{
if [ -f ${NS_STARTTRACE_PIDFILE} ]
then
kill `cat ${NS_STARTTRACE_PIDFILE}`
rm ${NS_STARTTRACE_PIDFILE}
fi
$NSCLI_FILE_EXEC -U %%:.:. $STOPCMD >> /dev/null
exit 1
}
nstcpdump_usage()
{
echo `basename $0`: utility to view/save/sniff LIVE packet capture on NETSCALER box
tcpdump -h
echo
echo NOTE: tcpdump options -i, -r and -F are NOT SUPPORTED by this utility
exit 0
}
########################################################################
while [ $# -gt 0 ]
do
case "$1" in
-h )
nstcpdump_usage
;;
-i )
nstcpdump_usage
;;
-r )
nstcpdump_usage
;;
-F )
nstcpdump_usage
;;
esac
break;
done
TCPDUMPOPTIONS="$#"
check_ns nstcpdump
#nstcpdump_init
#set -e
if [ ! -e ${TCPDUMP_PIPE} ]
then
mkfifo $TCPDUMP_PIPE
if [ $? -ne 0 ]
then
echo "Failed creating pipe [$TCPDUMP_PIPE]"
exit 1;
fi
fi
nstcpdump_start
Regards
I saw an interesting 'read' loop in an init.d script for CentOS that can be boiled down to essentially this structure:
cat "somefile" | while read var1 var2; do
# do something with vars 1 and 2
done 3<&1
I experimentally took off the "3<&1" redirect and nothing changed in the execution or behavior... What does the final redirect "3<&1" achieve, and why is it done specifically at the end of the loop?
Below you'll find full init script, it's for Gazzang's zNcrypt service that handles key management for encrypted filesystems. The part I'm interested in occur towards the end of the 'start' and 'stop' cases.
#! /bin/sh
#
# zncrypt This script mount and umount all zncrypt directories
#
# chkconfig: - 64 36
# description: zNcrypt start script.
. /etc/rc.d/init.d/functions
if [ -r /usr/lib/zncrypt/zncrypt.functions ]; then
. /usr/lib/zncrypt/zncrypt.functions
else
echo "/usr/lib/zncrypt/zncrypt.functions: File does not exist."
exit 0
fi
ZNCRYPT_LOG_DIR="/var/log/zncrypt"
ZNCRYPT_LOG_ACCESS_FILE=$ZNCRYPT_LOG_DIR"/access.log"
# create zncrypt log directory
mkdir -p "$ZNCRYPT_LOG_DIR"
# create access log file for the kernel module
touch "$ZNCRYPT_LOG_ACCESS_FILE"
case "$1" in
start)
echo "Starting zNcrypt directories"
egrep -v "^[[:space:]]*(#|$)" "$ZTABFILE" | while read mnt src type opts; do
if ! df "$mnt" | grep "$mnt$" >/dev/null; then
action $" * Mounting $src ... " do_mount "$src" "$mnt" "$type" "$opts" < /dev/tty
fi
done 3<&1
;;
stop)
echo "Stopping zNcrypt directories"
egrep -v "^[[:space:]]*(#|$)" "$ZTABFILE" | while read mnt src type opts; do
if df "$mnt" | grep "$mnt$" >/dev/null; then
action $" * Umounting $src ... " do_umount "$mnt"
fi
done 3<&1
if /sbin/lsmod | grep ^zncryptfs &>/dev/null; then
action $" * Unloading module ... " /sbin/rmmod zncryptfs 2>/dev/null && rm /dev/zncrypt 2>/dev/null
fi
;;
status)
show_status
;;
restart)
/bin/bash $0 stop
sleep 1
/bin/bash $0 start
;;
reload|force-reload)
;;
force-start)
;;
*)
echo "Usage: `basename $0` {start|stop|status|restart}"
exit 1
;;
esac
3<&1 tells the bash shell to redirect anything coming from stdout (file descriptor 1) to file descriptor 3. File descriptor 3 would correspond to some file or device opened in the context of the cat/while construct. See this article on standard file descriptors. Also see this related post.
I'm not used to writing code in bash but I'm self teaching myself. I'm trying to create a script that will query info from the process list. I've done that but I want to take it further and make it so:
The script runs with one set of commands if A OS is present.
The script runs with a different set of commands if B OS is present.
Here's what I have so far. It works on my Centos distro but won't work on my Ubuntu. Any help is greatly appreciated.
#!/bin/bash
pid=$(ps -eo pmem,pid | sort -nr -k 1 | cut -d " " -f 2 | head -1)
howmany=$(lsof -l -n -p $pid | wc -l)
nameofprocess=$(ps -eo pmem,fname | sort -nr -k 1 | cut -d " " -f 2 | head -1)
percent=$(ps -eo pmem,pid,fname | sort -k 1 -nr | head -1 | cut -d " " -f 1)
lsof -l -n -p $pid > ~/`date "+%Y-%m-%d-%H%M"`.process.log 2>&1
echo " "
echo "$nameofprocess has $howmany files open, and is using $percent"%" of memory."
echo "-----------------------------------"
echo "A log has been created in your home directory"
echo "-----------------------------------"
echo " "
echo ""$USER", do you want to terminate? (y/n)"
read yn
case $yn in
[yY] | [yY][Ee][Ss] )
kill -15 $pid
;;
[nN] | [n|N][O|o] )
echo "Not killing. Powering down."
echo "......."
sleep 2
;;
*) echo "Does not compute"
;;
esac
Here's my version of your script. It works with Ubuntu and Debian. It's probably safer than yours in some regards (I clearly had a bug in yours when a process takes more than 10% of memory, due to your awkward cut). Moreover, your ps are not "atomic", so things can change between different calls of ps.
#!/bin/bash
read percent pid nameofprocess < <(ps -eo pmem,pid,fname --sort=-pmem h)
mapfile -t openfiles < <(lsof -l -n -p $pid)
howmany=${#openfiles[#]}
printf '%s\n' "${openfiles[#]}" > ~/$(date "+%Y-%m-%d-%H%M.process.log")
cat <<EOF
$nameofprocess has $howmany files open, and is using $percent% of memory.
-----------------------------------
A log has been created in your home directory
-----------------------------------
EOF
read -p "$USER, do you want to terminate? (y/n) "
case $REPLY in
[yY] | [yY][Ee][Ss] )
kill -15 $pid
;;
[nN] | [n|N][O|o] )
echo "Not killing. Powering down."
echo "......."
sleep 2
;;
*) echo "Does not compute"
;;
esac
First, check that your version of ps has the --sort flag and the h option:
--sort=-pmem tells ps to sort wrt decreasing pmem
h tells ps to not show any header
All this is given to the read bash builtin, which reads space-separated fields, here the fields pmem, pid, fname and puts these values in the corresponding variables percent, pid and nameofprocess.
The mapfile command reads standard input (here the output of the lsof command) and puts each line in an array field. The size of this array is computed by the line howmany=${#openfiles[#]}. The output of lsof, as stored in the array openfiles is output to the corresponing file.
Then, instead of the many echos, we use a cat <<EOF, and then the read is use with the -p (prompt) option.
I don't know if this really answers your question, but at least, you have a well-written bash script, with less multiple useless command calls (until your case statement, you called 16 processes, I only called 4). Moreover, after the first ps call, things can change in your script (even though it's very unlikely to happen), not in mine.
You might also like the following which doesn't put the output of lsof in an array, but uses an extra wc command:
#!/bin/bash
read percent pid nameofprocess < <(ps -eo pmem,pid,fname --sort=-pmem h)
logfilename="~/$(date "+%Y-%m-%d-%H%M.process.log")
lsof -l -n -p $pid > "$logfilename"
howmany=$(wc -l < "$logfilename")
cat <<EOF
$nameofprocess has $howmany files open, and is using $percent% of memory.
-----------------------------------
A log has been created in your home directory ($logfilename)
-----------------------------------
EOF
read -p "$USER, do you want to terminate? (y/n) "
case $REPLY in
[yY] | [yY][Ee][Ss] )
kill -15 $pid
;;
[nN] | [n|N][O|o] )
echo "Not killing. Powering down."
echo "......."
sleep 2
;;
*) echo "Does not compute"
;;
esac
You could achieve this for example by (update)
#!/bin/bash
# place distribution independent code here
# dist=$(lsb_release -is)
if [[ -f /etc/redheat-release ]];
then # this is a RedHead based distribution like centos, fedora, ...
dist="redhead"
elif [[ -f /etc/issue.net ]];
then
# dist=$(cat /etc/issue.net | cut -d' ' -f1) # debian, ubuntu, ...
dist="ubuntu"
else
dist="unknown"
fi
if [[ $dist == "ubuntu" ]];
then
# use your ubuntu command set
elif [[ $dist == "redhead" ]];
then
# use your centos command set
else
# do some magic here
fi
# place distribution independent code here
I have a script that I only want to be running one time. If the script gets called a second time I'm having it check to see if a lockfile exists. If the lockfile exists then I want to see if the process is actually running.
I've been messing around with pgrep but am not getting the expected results:
#!/bin/bash
COUNT=$(pgrep $(basename $0) | wc -l)
PSTREE=$(pgrep $(basename $0) ; pstree -p $$)
echo "###"
echo $COUNT
echo $PSTREE
echo "###"
echo "$(basename $0) :" `pgrep -d, $(basename $0)`
echo sleeping.....
sleep 10
The results I'm getting are:
$ ./test.sh
###
2
2581 2587 test.sh(2581)---test.sh(2587)---pstree(2591)
###
test.sh : 2581
sleeping.....
I don't understand why I'm getting a "2" when only one process is actually running.
Any ideas? I'm sure it's the way I'm calling it. I've tried a number of different combinations and can't quite seem to figure it out.
SOLUTION:
What I ended up doing was doing this (portion of my script):
function check_lockfile {
# Check for previous lockfiles
if [ -e $LOCKFILE ]
then
echo "Lockfile $LOCKFILE already exists. Checking to see if process is actually running...." >> $LOGFILE 2>&1
# is it running?
if [ $(ps -elf | grep $(cat $LOCKFILE) | grep $(basename $0) | wc -l) -gt 0 ]
then
abort "ERROR! - Process is already running at PID: $(cat $LOCKFILE). Exitting..."
else
echo "Process is not running. Removing $LOCKFILE" >> $LOGFILE 2>&1
rm -f $LOCKFILE
fi
else
echo "Lockfile $LOCKFILE does not exist." >> $LOGFILE 2>&1
fi
}
function create_lockfile {
# Check for previous lockfile
check_lockfile
#Create lockfile with the contents of the PID
echo "Creating lockfile with PID:" $$ >> $LOGFILE 2>&1
echo -n $$ > $LOCKFILE
echo "" >> $LOGFILE 2>&1
}
# Acquire lock file
create_lockfile >> $LOGFILE 2>&1 \
|| echo "ERROR! - Failed to acquire lock!"
The argument for pgrep is an extended regular expression pattern.
In you case the command pgrep $(basename $0) will evaluate to pgrep test.sh which will match match any process that has test followed by any character and lastly followed by sh. So it wil match btest8sh, atest_shell etc.
You should create a lock file. If the lock file exists program should exit.
lock=$(basename $0).lock
if [ -e $lock ]
then
echo Process is already running with PID=`cat $lock`
exit
else
echo $$ > $lock
fi
You are already opening a lock file. Use it to make your life easier.
Write the process id to the lock file. When you see the lock file exists, read it to see what process id it is supposedly locking, and check to see if that process is still running.
Then in version 2, you can also write program name, program arguments, program start time, etc. to guard against the case where a new process starts with the same process id.
Put this near the top of your script...
pid=$$
script=$(basename $0)
guard="/tmp/$script-$(id -nu).pid"
if test -f $guard ; then
echo >&2 "ERROR: Script already runs... own PID=$pid"
ps auxw | grep $script | grep -v grep >&2
exit 1
fi
trap "rm -f $guard" EXIT
echo $pid >$guard
And yes, there IS a small window for a race condition between the test and echo commands, which can be fixed by appending to the guard file, and then checking that the first line is indeed our own PID. Also, the diagnostic output in the if can be commented out in a production version.
I've written (well, remixed to arrive at) this Bash script
# pkill.sh
trap onexit 1 2 3 15 ERR
function onexit() {
local exit_status=${1:-$?}
echo Problem killing $kill_this
exit $exit_status
}
export kill_this=$1
for X in `ps acx | grep -i $1 | awk {'print $1'}`; do
kill $X;
done
it works fine but any errors are shown to the display. I only want the echo Problem killing... to show in case of error. How can I "catch" (hide) the error when executing the kill statement?
Disclaimer: Sorry for the long example, but when I make them shorter I inevitably have to explain "what I'm trying to do."
# pkill.sh
trap onexit 1 2 3 15 ERR
function onexit() {
local exit_status=${1:-$?}
echo Problem killing $kill_this
exit $exit_status
}
export kill_this=$1
for X in `ps acx | grep -i $1 | awk {'print $1'}`; do
kill $X 2>/dev/null
if [ $? -ne 0 ]
then
onexit $?
fi
done
You can redirect stderr and stdout to /dev/null via something like pkill.sh > /dev/null 2>&1. If you only want to suppress the output from the kill command, only apply it to that line, e.g., kill $X > /dev/null 2>&1;
What this does is take send the standard output (stdout) from kill $X to /dev/null (that's the > /dev/null), and additionally send stderr (the 2) into stdout (the 1).
For my own notes, here's my new code using Paul Creasey's answer:
# pkill.sh: this is dangerous and should not be run as root!
trap onexit 1 2 3 15 ERR
#--- onexit() -----------------------------------------------------
# #param $1 integer (optional) Exit status. If not set, use `$?'
function onexit() {
local exit_status=${1:-$?}
echo Problem killing $kill_this
exit $exit_status
}
export kill_this=$1
for X in `ps acx | grep -i "$1" | awk {'print $1'}`; do
kill $X 2>/dev/null
done
Thanks all!