Building a killer script in bash

I've been trying to learn the syntax of control statements in bash (how to do if/else, pipes and so on). I'm trying to build a bash script, but after 3 hours I'm still failing miserably because I don't get how this stuff works.
Here is the little script I need. I'll try to explain it with generalized code (pseudocode, or whatever you want to call it):
while variable THRESHOLD < 10
{
if netstat -anltp contains a line with port 25565
then set variable THRESHOLD to 0 and variable PROCNUM to the process ID (PID),
else add 1 to variable THRESHOLD
sleep 5 seconds
}
kill the process No. PROCNUM
restart the script
In short: once the socket closes and stays closed for a few consecutive checks, the script kills the process that was listening on that port.
I'm pretty sure it's possible, but I can't figure out how to do it properly, mostly because I don't understand pipes and am not really familiar with grep. Thanks in advance for your help.

I don't want to be offensive, but if you can write a "generalized" program, all you need is to learn the bash syntax of while and if, and read the man pages of grep, kill and so on.
Pipes are the same as in your garden. You have two things: a tap and a pond. You can fill your pond in many ways (e.g. with rain), and you can open your tap to get water. But if you want to fill the pond with water from the tap, you need a pipe. That's all. Syntax:
tap | pond
means: the output of the tap is connected with a pipe to the input of the pond.
e.g.
netstat -anltp | grep 25565
means: the output of netstat is connected with a pipe to the input of grep, which keeps only the lines containing 25565.
That's all the magic... :)
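To see the pipe idea in action, here is a small sketch. Real netstat output differs between machines, so it feeds two made-up, hypothetical netstat lines (sample_netstat is an invented helper) through the pipes instead; with the real thing you would start the pipe with netstat -anltp, and showing other users' PIDs in the last column usually needs root.

```bash
#!/bin/bash
# Stand-in for netstat output: two made-up sample lines, so the pipes can be
# tried on any machine. The PIDs and program names are hypothetical.
sample_netstat() {
  printf '%s\n' \
    'tcp  0  0 0.0.0.0:25565  0.0.0.0:*  LISTEN  1234/java' \
    'tcp  0  0 0.0.0.0:22     0.0.0.0:*  LISTEN  567/sshd'
}

# One pipe: the output of sample_netstat becomes the input of grep,
# which keeps only the line mentioning port 25565.
sample_netstat | grep ':25565 '

# Several pipes in a row: keep the 25565 line, print its 7th column
# ("1234/java"), then cut away everything from the "/" onwards.
PROCNUM=$(sample_netstat | grep ':25565 ' | awk '{print $7}' | cut -d/ -f1)
echo "PID listening on 25565: $PROCNUM"
```

Each command in the chain only sees the output of the one before it, which is why the PID can be peeled out in small steps.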
About the syntax: you tagged your question as bash, so googling for "bash while syntax" will show you this page of the Bash Guide for Beginners:
http://tldp.org/LDP/Bash-Beginners-Guide/html/sect_09_02.html
and you can read about if on the same website.
I simply can't believe that after 3 hours you can't understand basic while and if syntax well enough to write your program in bash, especially when you are able to write a "generalized" program.
It is not too hard (by modifying the first example on that page) to write:
THRESHOLD=0
while [ "$THRESHOLD" -lt 10 ]
do
    # do the if here
    THRESHOLD=$((THRESHOLD + 1))
done
and so on...
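Filling in the "do the if here" part, the whole loop from the question could look roughly like the sketch below. It is an assumption-laden sketch, not the asker's final script: check_port and wait_until_closed are hypothetical helper names, and the port test reuses the netstat | grep pipe described above.

```bash
#!/bin/bash
# Sketch of the question's loop. check_port is a hypothetical helper: it
# succeeds while something is still listening on port 25565.
check_port() {
  netstat -anltp 2>/dev/null | grep -q ':25565 '
}

# wait_until_closed MAX INTERVAL: return once check_port has failed
# MAX times in a row, checking every INTERVAL seconds.
wait_until_closed() {
  local max=${1:-10} interval=${2:-5} threshold=0
  while [ "$threshold" -lt "$max" ]; do
    if check_port; then
      threshold=0                    # port seen again: reset the counter
    else
      threshold=$((threshold + 1))   # one more miss in a row
    fi
    sleep "$interval"
  done
}
```

Usage would be wait_until_closed 10 5 followed by kill "$PROCNUM", with PROCNUM taken from netstat while the port was still open. Resetting the counter on every hit is what makes it count consecutive misses rather than total misses.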

#!/bin/bash
# print an error message to stderr and exit
function do_error {
    echo "$1" >&2
    exit 1
}
# make the user pass in the path to the executable
if [ -z "$1" ]; then
    do_error "USAGE: $(basename "$0") <path to your executable>"
fi
if [ ! -e "$1" ]; then
    do_error "Unable to find executable at $1"
fi
if [ ! -x "$1" ]; then
    do_error "$1 is not an executable"
fi
PROC="$1"
PROCNAME=$(basename "$PROC")
# forever
while true; do
    # check whether the process is up (unlike ps | grep, pgrep never matches itself)
    PROCNUM=$(pgrep -x "$PROCNAME" | head -n 1)
    # if it is not up, start it in the background (unless it's a daemon) and remember its PID
    if [ -z "$PROCNUM" ]; then
        "$PROC" &
        PROCNUM=$!
    fi
    # reinitialize the threshold
    threshold=0
    # as long as we haven't failed 10 times in a row, keep checking
    while [ "$threshold" -lt 10 ]; do
        # run netstat, look for port 25565, and see if a connection is established.
        # it would be better to also check that the process we expect
        # is the one that established the connection
        if netstat -anp 2>/dev/null | grep 25565 | grep -q ESTABLISHED; then
            # our process still has an established connection: reset the counter
            threshold=0
        else
            # increment the threshold
            threshold=$((threshold + 1))
        fi
        # I would sleep for one second
        sleep 1
    done
    kill -9 "$PROCNUM"
    # reap it if it is our own child, so it is really gone before the next round
    wait "$PROCNUM" 2>/dev/null
done

Related

Can I use timeout's command in bash for a block of code?

I would like to use the timeout command, but it only supports timing out a single command.
My goal is to do something like this, waiting until a list of ports is up:
timeout 60 for port in $ports
do
while ! nc -zv localhost $port; do
sleep 1
done
done
if [[ $? -ne 0 ]]; then
echo not all ports up on time
fi
I want the for loop to stop if 60 seconds have passed, and then check whether it succeeded.
I understand that I can achieve this by using something like:
timeout 60 bash -c "..."
But this will be very unreadable. I thought a bash function might work as a command, but it didn't.
Any ideas?
After some tests, I found a way to implement this using bash functions. Although it looks a bit weird, it is still more readable. Here is an example:
function my_loop() {
    for port in $ports; do
        while ! nc -zv localhost "$port"; do
            sleep 1
        done
    done
}
# the child bash started by timeout only sees exported functions and variables
export -f my_loop
export ports
if ! timeout 60 bash -c my_loop; then
    echo "not all ports up on time"
fi
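A different technique that avoids the child bash altogether is to keep the deadline inside the loop, using bash's built-in SECONDS variable (seconds since the shell started). This is a sketch, not the timeout command: port_up and wait_for_ports are hypothetical names, and port_up just wraps the nc -z check from the question.

```bash
#!/bin/bash
# Sketch: a deadline using bash's SECONDS counter instead of the timeout command,
# so no child bash and no export -f are needed.
# port_up is a hypothetical helper wrapping the question's nc check.
port_up() {
  nc -z localhost "$1" > /dev/null 2>&1
}

# wait_for_ports LIMIT PORT...: 0 once every port answers, 1 after LIMIT seconds
wait_for_ports() {
  local limit=$1 port
  shift
  local start=$SECONDS
  for port in "$@"; do
    until port_up "$port"; do
      if (( SECONDS - start >= limit )); then
        return 1
      fi
      sleep 1
    done
  done
  return 0
}
```

Usage: if ! wait_for_ports 60 $ports; then echo "not all ports up on time"; fi (with $ports left unquoted so it splits into one argument per port). Unlike timeout, the deadline is only checked between probes, so a single hanging nc can overshoot it; most nc variants accept -w to bound each probe.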

How to wait until Kubernetes assigned an external IP to a LoadBalancer service?

Creating a Kubernetes LoadBalancer returns immediately (e.g. kubectl create -f ... or kubectl expose svc NAME --name=load-balancer --port=80 --type=LoadBalancer).
I know a manual way to wait in shell:
external_ip=""
while [ -z "$external_ip" ]; do
    sleep 10
    external_ip=$(kubectl get svc load-balancer --template="{{range .status.loadBalancer.ingress}}{{.ip}}{{end}}")
done
This is however not ideal:
It requires at least 5 lines of Bash script.
It waits forever even in case of error (otherwise it needs a timeout, which increases the line count a lot).
It's probably not efficient; I could use --watch or --watch-only, but with those the command never returns.
Is there a better way to wait until a service's external IP (aka LoadBalancer Ingress IP) is set, or has failed to be set?
Just to add to the answers here: the best option right now is to use a bash script. For convenience, I've put it into a single line that also exports an environment variable.
Command to wait and find Kubernetes service endpoint
external_ip=""; while [ -z "$external_ip" ]; do echo "Waiting for end point..."; external_ip=$(kubectl get svc NAME_OF_YOUR_SERVICE --template="{{range .status.loadBalancer.ingress}}{{.ip}}{{end}}"); [ -z "$external_ip" ] && sleep 10; done; echo "End point ready-" && echo "$external_ip"; export endpoint="$external_ip"
I've also modified your script so it only waits if the IP isn't available yet. The last bit exports an environment variable called "endpoint". Run the line directly in your current shell rather than wrapping it in bash -c: a child bash cannot export variables back to the shell that started it, so the export would be lost as soon as it exits.
Bash Script to Check a Given Service
Save this as check-endpoint.sh and then run it with: sh check-endpoint.sh SERVICE_NAME
#!/bin/bash
# Pass the name of a service to check, e.g.: sh check-endpoint.sh staging-voting-app-vote
# Will run forever if the service never gets an external IP...
external_ip=""
while [ -z "$external_ip" ]; do
    echo "Waiting for end point..."
    external_ip=$(kubectl get svc "$1" --template="{{range .status.loadBalancer.ingress}}{{.ip}}{{end}}")
    [ -z "$external_ip" ] && sleep 10
done
echo 'End point ready:' && echo "$external_ip"
Using this in a Codefresh Step
I'm using this for a Codefresh pipeline and it passes a variable $endpoint when it's done.
GrabEndPoint:
  title: Waiting for endpoint to be ready
  image: codefresh/plugin-helm:2.8.0
  commands:
    - bash -c 'external_ip=""; while [ -z "$external_ip" ]; do echo "Waiting for end point..."; external_ip=$(kubectl get svc staging-voting-app-vote --template="{{range .status.loadBalancer.ingress}}{{.ip}}{{end}}"); [ -z "$external_ip" ] && sleep 10; done; echo "End point ready-" && echo "$external_ip"; cf_export endpoint="$external_ip"'
This is a little bit tricky, but it is a working solution:
kubectl get service -w load-balancer -o 'go-template={{with .status.loadBalancer.ingress}}{{range .}}{{.ip}}{{"\n"}}{{end}}{{.err}}{{end}}' 2>/dev/null | head -n1
We had a similar problem on AWS EKS and wanted a one-liner to use in our CI pipelines. kubectl wait would be ideal, but it cannot wait on an arbitrary jsonpath until v1.23 (see this PR).
Until then, we can use an until loop to "watch" the output of a command until a particular string appears, and then exit:
until kubectl get service/<service-name> --output=jsonpath='{.status.loadBalancer}' | grep "ingress"; do sleep 1; done
To avoid an infinite loop, you could enhance it using timeout (brew install coreutils on a Mac):
timeout 10s bash -c 'until kubectl get service/<service-name> --output=jsonpath="{.status.loadBalancer}" | grep "ingress"; do sleep 1; done'
Getting the IP after that is easy:
kubectl get service/<service-name> --output=jsonpath='{.status.loadBalancer.ingress[0].ip}'
or, on a service like AWS EKS, where you will most likely have hostname populated instead of ip:
kubectl get service/<service-name> --output=jsonpath='{.status.loadBalancer.ingress[0].hostname}'
Maybe this is not the solution you're looking for, but at least it has fewer lines of code:
until [ -n "$(kubectl get svc load-balancer -o jsonpath='{.status.loadBalancer.ingress[0].ip}')" ]; do
sleep 10
done
There's not really a "failed to set" condition, because we will retry it forever. A failure might be a transient error in the cloud provider, a quota issue that gets resolved over the course of hours or days, or any number of other things. The only failure comes from "how long are you willing to wait?", which only you can answer.
We don't have a general "wait for expression" command because it ends up being arbitrarily complex and you're better off just coding that in a real language. Ergo the bash loop above. We could do better about having a 'watch' command, but it's still a timeout in the end.
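Since the only real failure condition is a deadline you choose, one option is to put that deadline into the loop itself. The sketch below is an assumption, not an official kubectl feature: wait_for_lb is a hypothetical helper name, the deadline uses bash's SECONDS counter, the ip and hostname jsonpath expressions are placed side by side because providers usually fill in only one of them, and it assumes kubectl prints nothing (or an error that we discard) while the ingress list is still empty.

```bash
#!/bin/bash
# Sketch: wait for a LoadBalancer address, but give up after a deadline.
# wait_for_lb is a hypothetical helper; usage: wait_for_lb SERVICE [TIMEOUT] [INTERVAL]
# Prints the ingress IP or hostname and returns 0, or returns 1 after TIMEOUT seconds.
wait_for_lb() {
  local svc=$1 timeout=${2:-300} interval=${3:-10} addr
  local start=$SECONDS
  while true; do
    # Two jsonpath expressions side by side: usually only one of ip/hostname is set.
    addr=$(kubectl get svc "$svc" -o \
      jsonpath='{.status.loadBalancer.ingress[0].ip}{.status.loadBalancer.ingress[0].hostname}' \
      2>/dev/null)
    if [ -n "$addr" ]; then
      echo "$addr"
      return 0
    fi
    if (( SECONDS - start >= timeout )); then
      echo "gave up waiting for $svc after ${timeout}s" >&2
      return 1
    fi
    sleep "$interval"
  done
}
```

Usage: endpoint=$(wait_for_lb load-balancer 600) || echo "no external IP yet". The timeout value is the "how long are you willing to wait?" decision made explicit.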
This is really just a clean-up of @Dan Garfield's working example; my OCD wouldn't let this slide. In this case:
on GCP
requesting an internal lb
with an annotation in a service definition
apiVersion: v1
kind: Service
metadata:
  name: yo
  annotations:
    cloud.google.com/load-balancer-type: "Internal"
    # external-dns.alpha.kubernetes.io/hostname: vault.stage.domain.tld.
...
NOTE: I've only been able to get external-dns to associate names to public IP addresses.
This has been scripted to accept a few arguments, now it's a library; example:
myServiceLB=$1
while true; do
    successCond="$(kubectl get svc "$myServiceLB" \
        --template="{{range .status.loadBalancer.ingress}}{{.ip}}{{end}}")"
    if [[ -z "$successCond" ]]; then
        echo "Waiting for endpoint readiness..."
        sleep 10
    else
        sleep 2
        export lbIngAdd="$successCond"
        # the author's library prints this with its own pMsg helper; plain echo is used here
        echo "The Internal LoadBalancer is up!"
        break
    fi
done
Later, $lbIngAdd can be used to set records. It seems like -o jsonpath="{.status.loadBalancer.ingress[*].ip}" would work as well; whatever works.
Thanks for getting us started Dan :-)
Here's a generic bash function that watches the output of a given command for any regexp, with a timeout:
function watch_for() {
    local CMD="$1"          # Command to watch. Variables should be escaped \$
    local REGEX="$2"        # Pattern to search for
    local ATTEMPTS=${3:-10} # Timeout. Default is 10 attempts, one second apart
    local COUNT=0
    echo -e "# Watching for /$REGEX/ during $ATTEMPTS seconds, on the output of command:\n# $CMD"
    until eval "$CMD" | grep -m 1 "$REGEX"; do
        if [[ $COUNT -ge $ATTEMPTS ]]; then
            echo "# Limit of $ATTEMPTS attempts was exceeded."
            return 1
        fi
        echo -e "$(( COUNT++ ))... \c"
        sleep 1
    done
    return 0
}
And here's how I used it to wait until a worker node gets an external IP (which took more than a minute):
$ watch_for "kubectl get nodes -l node-role.kubernetes.io/worker -o wide | awk '{print \$7}'" \
"[0-9]" 100
0... 1... 2... 3... .... 63... 64... 3.22.37.41

Making bash script to check connectivity and change connection if necessary. Help me improve it?

My connection is flaky, but I have a backup one. I made some bash scripts to check for connectivity and switch connections if the current one is dead. Please help me improve them.
The scripts almost work, except that they don't wait long enough to receive an IP (the until loop moves on to the next step too quickly). Here they are:
#!/bin/bash
# Invoke this script with paths to your connection specific scripts, for example
# ./gotnet.sh ./connection.sh ./connection2.sh
until [ -z "$1" ] # Try different connections until we are online...
do
if eval "ping -c 1 google.com"
then
echo "we are online!" && break
else
$1 # Runs (next) connection-script.
echo
fi
shift
done
echo # Extra line feed.
exit 0
And here is an example of the slave scripts:
#!/bin/bash
ifconfig wlan0 down
ifconfig wlan0 up
iwconfig wlan0 key 1234567890
iwconfig wlan0 essid example
sleep 1
dhclient -1 -nw wlan0
sleep 3
exit 0
Here's one way to do it:
#!/bin/bash
while true; do
    if ! ping -c 1 google.com > /dev/null 2>&1; then  # if ping exits nonzero...
        ./connection_script1.sh  # run the first script
        sleep 10                 # give it a few seconds to complete
    fi
    if ! ping -c 1 google.com > /dev/null 2>&1; then  # if ping *still* exits nonzero...
        ./connection_script2.sh  # run the second script
        sleep 10                 # give it a few seconds to complete
    fi
    sleep 300  # check again in five minutes
done
Adjust the sleep times and ping count to your preference. This script never exits, so you would most likely want to run it in the background with the following command:
./connection_daemon.sh > /dev/null 2>&1 & disown
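Have you tried omitting the -nw option from the dhclient command? With -nw, dhclient goes into the background immediately instead of waiting until it has acquired an IP address, so your script moves on before the connection is up.
Also, remove the eval and the quotes from your if; they aren't necessary. Do it like this:
if ping -c 1 google.com > /dev/null 2>&1
Try using ConnectTimeout ${timeout} somewhere.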
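Another way to avoid guessing sleep times is to poll for connectivity for a while after each connection script, and move on only if it never comes up. This is a sketch under assumptions: is_online, wait_online and try_connections are hypothetical names, the ping target comes from the question, and WAIT (default 30) is an assumed number of one-second retries.

```bash
#!/bin/bash
# Sketch: run each connection script in turn and, after each one, poll for
# connectivity instead of sleeping a fixed time. All function names are hypothetical.

# is_online: one quiet ping, as suggested in the answers above
is_online() {
  ping -c 1 google.com > /dev/null 2>&1
}

# wait_online TRIES: succeed as soon as is_online does, checking once per second
wait_online() {
  local tries=$1
  while [ "$tries" -gt 0 ]; do
    is_online && return 0
    tries=$((tries - 1))
    sleep 1
  done
  return 1
}

# try_connections SCRIPT...: stop at the first script that gets us online
try_connections() {
  local script
  is_online && return 0
  for script in "$@"; do
    "$script"
    wait_online "${WAIT:-30}" && return 0
  done
  return 1
}
```

Usage: try_connections ./connection.sh ./connection2.sh || echo "still offline". A fast DHCP lease then costs only a second or two, while a slow one still gets up to WAIT seconds.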

How do I make sure my bash script isn't already running?

I have a bash script I want to run every 5 minutes from cron, but there's a chance the previous run of the script isn't done yet. In that case, I want the new run to just exit. I don't want to rely on just a lock file in /tmp; I want to make sure the process is actually running before I honor the lock file (or whatever).
Here is what I have stolen from the internet so far. How do I smarten it up a bit? Or is there a completely different way that's better?
if [ -f /tmp/mylockFile ] ; then
    echo 'Script is still running'
else
    echo 1 > /tmp/mylockFile
    # Do some stuff
    rm -f /tmp/mylockFile
fi
# Use a lockfile containing the pid of the running process
# If script crashes and leaves lockfile around, it will have a different pid so
# will not prevent script running again.
#
lf=/tmp/pidLockFile
# create empty lock file if none exists
cat /dev/null >> $lf
read lastPID < $lf
# if lastPID is not null and a process with that pid exists , exit
[ ! -z "$lastPID" -a -d /proc/$lastPID ] && exit
echo not running
# save my pid in the lock file
echo $$ > $lf
# sleep just to make testing easier
sleep 5
There is at least one race condition in this script: two instances started at the same moment can both read the old PID, both find no running process, and both proceed. Don't use it for a life support system, lol. But it should work fine for your example, because your environment doesn't start two scripts simultaneously. There are lots of ways to get more atomic locks, but they generally depend on having a particular tool installed, or work differently on NFS, etc.
You might want to have a look at the man page for the flock command, if you're lucky enough to get it on your distribution.
NAME
flock - Manage locks from shell scripts
SYNOPSIS
flock [-sxon] [-w timeout] lockfile [-c] command...
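For reference, a common way to use flock from inside a script is to open the lock file on a spare file descriptor and lock that descriptor. This sketch assumes the util-linux flock shown above (so Linux) and an arbitrary lock path, /tmp/myscript.lock; descriptor 9 is just a conventional free number.

```bash
#!/bin/bash
# Sketch of the file-descriptor flock idiom (util-linux flock).
# LOCKFILE is an assumed path; use one per script.
LOCKFILE=/tmp/myscript.lock

# Open (creating if needed) the lock file on descriptor 9 for the rest of the script.
exec 9> "$LOCKFILE"

# -n: don't wait; fail at once if another instance already holds the lock.
if ! flock -n 9; then
  echo "Script is still running" >&2
  exit 1
fi

# ... do some stuff ...
# The lock is released by the kernel when descriptor 9 is closed, i.e. when this
# process (and any child that inherited fd 9) exits, even after a crash.
```

Because the lock belongs to the open descriptor rather than to the file's existence, a leftover /tmp/myscript.lock from a crashed run does not block the next cron run.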
Never use a lock file; always use a lock directory.
In your specific case, it's not so important because the script is started at 5-minute intervals. But if you ever reuse this code for a web server CGI script, you are toast.
if mkdir /tmp/my_lock_dir 2>/dev/null
then
echo "running now the script"
sleep 10
rmdir /tmp/my_lock_dir
fi
This has a problem if you have a stale lock, meaning the lock directory is there but no associated process is running: your cron job will never run again.
Why use a directory? Because mkdir is an atomic operation. Only one process at a time can create a given directory; all other processes get an error. This even works across shared filesystems and probably even between different OS types.
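One way to soften the stale-lock problem is to remove the directory from a trap, so it disappears whenever the script exits normally or is stopped with INT or TERM. A minimal sketch, reusing the /tmp/my_lock_dir path from above; kill -9 or a power cut still leaves the directory behind, so this reduces stale locks rather than eliminating them.

```bash
#!/bin/bash
# Sketch: mkdir lock that cleans up after itself. LOCKDIR matches the path used above.
LOCKDIR=/tmp/my_lock_dir

# mkdir either creates the directory (we own the lock) or fails (someone else does).
if ! mkdir "$LOCKDIR" 2>/dev/null; then
  echo "already running" >&2
  exit 1
fi

# Remove the lock on every exit; turn INT/TERM into an ordinary exit so the EXIT trap runs.
trap 'rmdir "$LOCKDIR"' EXIT
trap 'exit 1' INT TERM

echo "running now the script"
```

The trap is installed only after mkdir succeeds, so a second instance that fails to get the lock never removes the first instance's directory.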
Store your PID in mylockFile. When you need to check, use ps to look for a process with the PID you read from the file. If it exists, your script is running.
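Instead of parsing ps output, the check can use kill -0, which sends no signal at all but fails if no process with that PID exists. A sketch under assumptions: is_running is a hypothetical helper, PIDFILE reuses the question's /tmp/mylockFile, and both runs happen as the same user (kill -0 also fails when you lack permission to signal the process). As with any PID file, a recycled PID can still give a false positive.

```bash
#!/bin/bash
# Sketch: PID-file check with kill -0. PIDFILE reuses the question's path.
PIDFILE=/tmp/mylockFile

# is_running: succeed if PIDFILE holds the PID of a live process
is_running() {
  local pid
  pid=$(head -n 1 "$PIDFILE" 2>/dev/null) || return 1
  # kill -0 delivers no signal; it only reports whether the PID can be signalled
  [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null
}
```

In the cron job you would exit early when is_running succeeds, and otherwise run echo $$ > "$PIDFILE" before doing the work.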
If you want to check the process's existence, just look at the output of
ps aux | grep your_script_name
If it's there, it's not dead...
As pointed out in the comments and other answers, using the PID stored in the lockfile is much safer and is the standard approach most apps take. I just do this because it's convenient and I almost never see the corner cases (e.g. editing the file when the cron executes) in practice.
If you use a lockfile, you should make sure that the lockfile is always removed. You can do this with 'trap':
lockfile=/tmp/mylockFile  # assumed path; use one per script
if ( set -o noclobber; echo "locked" > "$lockfile") 2> /dev/null; then
    trap 'rm -f "$lockfile"; exit $?' INT TERM EXIT
    echo "Locking succeeded" >&2
    # ... do the work that needs the lock here ...
    rm -f "$lockfile"
    trap - INT TERM EXIT
else
    echo "Lock failed - exit" >&2
    exit 1
fi
The noclobber option makes the creation of the lockfile atomic, like using a directory.
As a one-liner, if you do not want to use a lockfile (e.g. because of a read-only filesystem):
test "$(pidof -x "$(basename "$0")")" != $$ && exit
It checks that the full list of PIDs that bear the name of your script is equal to the current PID. The -x option makes pidof also return the PIDs of shells running the named script.
Bash makes it even shorter and faster:
[[ "$(pidof -x "$(basename "$0")")" != $$ ]] && exit
In some cases, you might want to be able to distinguish between who is running the script and allow some concurrency but not all. In that case, you can use per-user, per-tty or cron-specific locks.
You can use environment variables such as $USER or the output of a program such as tty to create the filename. For cron, you can set a variable in the crontab file and test for it in your script.
You can use this one:
pgrep -f "/bin/\w*sh .*scriptname" | grep -vq $$ && exit
I was trying to solve this problem today and I came up with the below:
COMMAND_LINE="$0 $*"
JOBS=$(SUBSHELL_PID=$BASHPID; ps axo pid,command | grep "${COMMAND_LINE}" | grep -v $$ | grep -v ${SUBSHELL_PID} | grep -v grep)
if [[ -z "${JOBS}" ]]
then
    : # not already running
else
    : # already running
fi
This relies on $BASHPID which contains the PID inside a subshell ($$ in the subshell is the parent pid). However, this relies on Bash v4 and I needed to run this on OSX which has Bash v3.2.48. I ultimately came up with another solution and it is cleaner:
JOBS=$(sh -c "ps axo pid,command | grep \"${COMMAND_LINE}\" | grep -v grep | grep -v $$")
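You can always just check whether another process with your script's name exists (pidof -o $$ leaves out the current run, which a plain ps | grep would otherwise find, together with the grep itself):
if pidof -x -o $$ scriptname > /dev/null; then
    exit
fi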
But I like the lockfile myself, so I wouldn't do this without the lock file as well.
Since a socket solution has not yet been mentioned, it is worth pointing out that sockets can be used as effective mutexes. Binding a socket to a port is an atomic operation, like mkdir is (as Gunstick pointed out): only one process can hold a given port, so a socket is suitable to use as a lock or mutex.
Tim Kay's Perl script 'Solo' is a very small and effective script to make sure only one copy of a script can run at any one time. It was designed specifically for use with cron jobs, although it works perfectly for other tasks as well, and I've used it for non-cron jobs very effectively.
Solo has one advantage over the other techniques mentioned so far in that the check is done outside of the script you only want to run one copy of. If the script is already running then a second instance of that script will never even be started. This is as opposed to isolating a block of code inside the script which is protected by a lock. EDIT: If flock is used in a cron job, rather than from inside a script, then you can also use that to prevent a second instance of the script from starting - see example below.
Here's an example of how you might use it with cron:
*/5 * * * * solo -port=3801 /path/to/script.sh args args args
# "/path/to/script.sh args args args" is only called if no other instance of
# "/path/to/script.sh" is running, or more accurately if the socket on port 3801
# is not open. Distinct port numbers can be used for different programs so that
# if script_1.sh is running it does not prevent script_2.sh from starting, I've
# used the port range 3801 to 3810 without conflicts. For Linux non-root users
# the valid port range is 1024 to 65535 (0 to 1023 are reserved for root).
* * * * * solo -port=3802 /path/to/script_1.sh
* * * * * solo -port=3803 /path/to/script_2.sh
# Flock can also be used in cron jobs with a distinct lock path for different
# programs, in the example below script_3.sh will only be started if the one
# started a minute earlier has already finished.
* * * * * flock -n /tmp/path.to.lock -c /path/to/script_3.sh
Links:
Solo web page: http://timkay.com/solo/
Solo script: http://timkay.com/solo/solo
Hope this helps.
You can use this.
I'll just shamelessly copy-paste the solution here, as it is an answer for both questions (I would argue that it's actually a better fit for this question).
Usage
include sh_lock_functions.sh
init using sh_lock_init
lock using sh_acquire_lock
check lock using sh_check_lock
unlock using sh_remove_lock
Script File
sh_lock_functions.sh
#!/bin/bash
function sh_lock_init {
    sh_lock_scriptName=$(basename "$0")
    sh_lock_dir="/tmp/${sh_lock_scriptName}.lock"  # lock directory
    sh_lock_file="${sh_lock_dir}/lockPid.txt"     # lock file
}
function sh_acquire_lock {
    if mkdir "$sh_lock_dir" 2>/dev/null; then  # check for lock
        echo "$sh_lock_scriptName lock acquired successfully." >&2
        touch "$sh_lock_file"
        echo $$ > "$sh_lock_file"  # set current pid in lockFile
        return 0
    else
        touch "$sh_lock_file"
        read sh_lock_lastPID < "$sh_lock_file"
        # if lastPID is not null and a process with that pid exists
        if [ -n "$sh_lock_lastPID" ] && [ -d "/proc/$sh_lock_lastPID" ]; then
            echo "$sh_lock_scriptName is already running." >&2
            return 1
        else
            echo "$sh_lock_scriptName stopped during execution, reacquiring lock." >&2
            echo $$ > "$sh_lock_file"  # set current pid in lockFile
            return 2
        fi
    fi
}
function sh_check_lock {
    [[ ! -f $sh_lock_file ]] && echo "$sh_lock_scriptName lock file removed." >&2 && return 1
    read sh_lock_lastPID < "$sh_lock_file"
    [[ $sh_lock_lastPID -ne $$ ]] && echo "$sh_lock_scriptName lock file pid has changed." >&2 && return 2
    echo "$sh_lock_scriptName lock still in place." >&2
    return 0
}
function sh_remove_lock {
    rm -r "$sh_lock_dir"
}
Usage example
sh_lock_usage_example.sh
#!/bin/bash
. /path/to/sh_lock_functions.sh  # load sh lock functions
sh_lock_init || exit $?
sh_acquire_lock
lockStatus=$?
[[ $lockStatus -eq 1 ]] && exit $lockStatus
[[ $lockStatus -eq 2 ]] && echo "lock is set, do some resume from crash procedures"
# monitoring example
cnt=0
while sh_check_lock  # loop while lock is in place
do
    echo "$sh_lock_scriptName running (pid $$)"
    sleep 1
    cnt=$((cnt + 1))
    [[ $cnt -gt 5 ]] && break
done
# remove lock when process finished
sh_remove_lock || exit $?
exit 0
Features
Uses a combination of file, directory and process id to lock to make sure that the process is not already running
You can detect if the script stopped before lock removal (eg. process kill, shutdown, error etc.)
You can check the lock file, and use it to trigger a process shutdown when the lock is missing
Verbose: outputs error messages for easier debugging

killall httpd for sleep process

This shell script shows the issue: after I execute the .sh file, it halts and nothing happens. Any clue where my mistake is?
It should kill httpd if there are more than 10 sleeping processes, and then start httpd again with zero sleeping processes.
#!/bin/bash
#this means loop forever
while [ 1 ];
do HTTP=`ps auwxf | grep httpd | grep -v grep | wc -l`;
#the above line counts the number of httpd processes found running
#and the following line says if there were less then 10 found running
if [ $[HTTP] -lt 10 ];
then killall -9 httpd;
#inside the if now, so there are less then 10, kill them all and wait 1 second
sleep 1;
#start apache
/etc/init.d/httpd start;
fi;
#all done, sleep for ten seconds before we loop again
sleep 10;done
Why would you kill the child processes? If you do that, you kill all ongoing sessions. Wouldn't it be easier to set up your web server configuration so that it matches your needs?
As Dennis already mentioned, your script should look like this:
#!/bin/bash
BINNAME=httpd  # Name of the process
TIMEOUT=10     # Seconds to wait until next loop
MAXPROC=10     # Maximum number of procs for given daemon
while true
do
    # Count number of procs
    HTTP=$(pgrep "$BINNAME" | wc -l)
    # Check if more than $MAXPROC are running
    if [ "$HTTP" -gt "$MAXPROC" ]
    then
        # Kill the procs
        killall -9 "$BINNAME"
        sleep 1
        # start httpd again
        /etc/init.d/httpd start
    fi
    sleep "$TIMEOUT"
done
Formatting makes code more readable ;)
I can't see anything wrong with it.
This line:
if [ $[HTTP] -lt 10 ];
should probably be:
if [ ${HTTP} -lt 10 ];
even though yours works.
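For comparison, here are the usual ways to write that kind of test in bash. $[ ... ] is an old arithmetic syntax that the Bash manual marks as deprecated, ${HTTP} is ordinary parameter expansion, and (( ... )) and $(( ... )) are the current arithmetic forms. The value 12 and the MAXPROC name are only illustrative.

```bash
#!/bin/bash
# Sketch: comparing a process count in bash. The values are illustrative.
HTTP=12
MAXPROC=10

# 1. test/[ with a quoted plain expansion (no $[ ] needed)
if [ "${HTTP}" -gt "$MAXPROC" ]; then echo "test: too many"; fi

# 2. the same comparison as a bash arithmetic command; no $ needed inside (( ))
if (( HTTP > MAXPROC )); then echo "arith: too many"; fi

# 3. arithmetic expansion yields a number you can store or print
EXCESS=$(( HTTP - MAXPROC ))
echo "excess: $EXCESS"
```

Quoting "${HTTP}" also keeps [ from failing with a syntax error if the count ever comes back empty.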
If you add this as the last line, you should never see its output since you're in an infinite while loop.
echo "At end"
If you do, then that's really weird.
Make your first line look like this and it will display the script line-by-line as it executes to help you see where it's going wrong:
#!/bin/bash -x
Watch out for killall if you are trying to write portable scripts. It doesn't mean the same thing on every system: on Linux it means "kill processes named like this", but on some systems it means "kill every process I have permission to kill".
If you run the latter version as root, one of the things you kill is init. Oops.
