Is there any way other than a counter to trigger time-based alerts in a shell script?

Suppose that ps -ef | grep apache | wc -l gives the output 2, which means 2 processes are running.
The connection on my server fluctuates, so I want to send an alert when the output of ps -ef | grep apache | wc -l has been zero for more than 5 minutes.

First, ps -ef | grep apache can be unreliable because it may count the grep apache process itself as an apache process. To avoid that, use ps -ef | grep '[a]pache'. Better still, use pgrep apache. Also, if we only need to know whether there are zero or more than zero processes, we don't need wc -l.
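To see why the bracket trick works, here is a small sketch that runs grep over two made-up ps -ef lines: one for a real apache process and one for the grep process itself, whose command line contains whatever pattern you give it. The ps lines and the count_matches helper are illustrative assumptions, not real ps output.

```bash
#!/bin/bash
# Illustration only: a made-up ps -ef line for a real apache process.
real_line='www-data 1234 1 0 10:00 ? 00:00:01 /usr/sbin/apache2 -k start'

# count_matches PATTERN (hypothetical helper): count how many simulated ps lines
# match PATTERN, including the line of the "grep PATTERN" process itself.
count_matches() {
    local pattern=$1
    local grep_line="me 5678 4321 0 10:05 pts/0 00:00:00 grep $pattern"
    printf '%s\n' "$real_line" "$grep_line" | grep -c -- "$pattern"
}

count_matches 'apache'    # prints 2: the grep line contains "apache" too
count_matches '[a]pache'  # prints 1: "grep [a]pache" does not contain "apache"
```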
If your needs are simple, here is a simple bash script that checks every minute whether an apache process is running. If five successive checks find no such process, it sends an email to admin@host:
#!/bin/bash
# Alert when no apache process was found in 5 checks in a row (about 5 minutes).
count=0
while sleep 60
do
    if pgrep apache >/dev/null
    then
        count=0
    else
        ((count++))
    fi
    if [ "$count" -ge 5 ]
    then
        echo "Houston, we have a problem: $count" | mail admin@host
        count=0
    fi
done
Since this only checks every minute, it could miss some process that starts and stops in less than a minute. You may need to adjust the timing. As Jonathan Leffler commented, doing this well is hard work. This script is only intended as a quick-and-simple solution.
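If you would rather express the rule directly in time, as the question asks, you can remember when apache was first seen missing and compare that with the clock instead of counting checks. A minimal sketch, assuming bash, pgrep and a working mail command; ALERT_AFTER, alert_due and monitor_apache are hypothetical names, and admin@host is a placeholder address:

```bash
#!/bin/bash
ALERT_AFTER=${ALERT_AFTER:-300}   # seconds apache must be missing before alerting
down_since=""                     # epoch time apache was first seen missing

# alert_due RUNNING NOW: RUNNING is 1 if apache is up, 0 if not; NOW is epoch seconds.
# Returns 0 (send an alert) once apache has been down for ALERT_AFTER seconds,
# then re-arms so the next alert comes ALERT_AFTER seconds later if still down.
alert_due() {
    local running=$1 now=$2
    if [ "$running" -eq 1 ]; then
        down_since=""             # apache is back: forget the outage
        return 1
    fi
    : "${down_since:=$now}"       # remember when the outage started
    if (( now - down_since >= ALERT_AFTER )); then
        down_since=$now
        return 0
    fi
    return 1
}

# monitor_apache: check once a minute and mail when alert_due says so.
monitor_apache() {
    local running
    while sleep 60; do
        if pgrep apache >/dev/null; then running=1; else running=0; fi
        if alert_due "$running" "$(date +%s)"; then
            echo "apache has not been running for ${ALERT_AFTER}s" | mail -s "apache down" admin@host
        fi
    done
}
```

To start monitoring, call monitor_apache at the end of the script. Because alert_due takes the current time as an argument, the decision logic can be tested without waiting five minutes.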

Related

How to check whether the Tomcat server has started up

I want to check whether the Tomcat server has really started up. When Tomcat starts, an entry like "Server startup" appears in catalina.out. Once that entry appears, the script should go ahead.
This is my code:
echo "Waiting for Tomcat"
if [ $(tail -f /home/0511/myapp/logs/catalina.out | grep "Server startup" | wc -l) -eq 1 ]; then
echo "Tomcat started"
fi
#further code...
Output:
Waiting for Tomcat
|
I am sure that after 30-60 seconds, "tail .. | wc -l" would give 1. However, my code above never detects it: nothing happens, and I have to interrupt it with Ctrl+C.
What is wrong, or is there a better way to do this?
Try this:
while true; do
    if tail -n 100 /home/0511/myapp/logs/catalina.out | grep -q "Server startup"; then
        echo "Tomcat started"
        break
    fi
    sleep 1
done
So you repeatedly check the last 100 lines of the log, and as soon as one matches, you leave the loop with a message and the rest of the script continues.
Another (more elegant) solution without a loop:
if tail -f /home/0511/myapp/logs/catalina.out | grep -q "Server startup"; then
    echo "Tomcat started"
fi
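Both versions above can match an old "Server startup" line left over from a previous run, and both wait forever if Tomcat never starts. A sketch that addresses both, assuming bash and a log file that is only appended to (not truncated or rotated) while you wait; wait_for_log and its timeout argument are hypothetical:

```bash
#!/bin/bash
# wait_for_log FILE PATTERN TIMEOUT (hypothetical helper): succeed as soon as a
# line matching PATTERN is appended to FILE; fail after TIMEOUT seconds.
# Only lines written after the call are searched, so an old match is ignored.
wait_for_log() {
    local file=$1 pattern=$2 timeout=$3
    local start_lines deadline
    start_lines=$(wc -l < "$file")
    deadline=$(( $(date +%s) + timeout ))
    while (( $(date +%s) < deadline )); do
        # tail -n +K prints from line K onwards, i.e. only the new lines.
        if tail -n "+$(( start_lines + 1 ))" "$file" | grep -q -- "$pattern"; then
            return 0
        fi
        sleep 1
    done
    return 1
}

# Usage:
# wait_for_log /home/0511/myapp/logs/catalina.out "Server startup" 120 && echo "Tomcat started"
```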
You said you want to check whether Tomcat has really started up. If you check the log with tail and grep, in the worst case you could detect an old startup that ended with a crash.
Once Tomcat has started, it listens on a certain port (e.g. 8080), so you should check whether Tomcat is listening on that port.
If you use a different port, replace 8080 in the following lines with your own port.
To display the Tomcat status, you can use netstat. Example of a line returned by netstat on Windows:
TCP 0.0.0.0:8080 0.0.0.0:0 LISTENING
On Linux, netstat -an prints LISTEN instead of LISTENING, so the patterns below match on LISTEN, which covers both.
To display only whether Tomcat is started or not, you could use:
netstat -an | grep -e ":8080[^0-9].*[0-9].*LISTEN" && echo "Tomcat started" || echo "Tomcat not started"
The grep expression matches ":" followed by port 8080, followed by a non-digit, followed by any characters, followed by a digit, followed by any characters, followed by "LISTEN".
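To check what this expression accepts, here is a sketch that runs it over a few sample netstat lines. The lines are made up for illustration; the pattern ends in LISTEN, which matches Linux's LISTEN as well as Windows' LISTENING. The last sample shows a limitation: an IPv6 wildcard listener (tcp6 ... :::8080) has no digit after the port, so this pattern does not match it.

```bash
#!/bin/bash
pattern=':8080[^0-9].*[0-9].*LISTEN'

# check LINE: print "match" or "no match" for one netstat line.
check() {
    if grep -q -e "$pattern" <<< "$1"; then echo "match"; else echo "no match"; fi
}

check '  TCP    0.0.0.0:8080     0.0.0.0:0      LISTENING'      # match (Windows)
check 'tcp   0   0 0.0.0.0:8080   0.0.0.0:*   LISTEN'           # match (Linux)
check 'tcp   0   0 0.0.0.0:80800  0.0.0.0:*   LISTEN'           # no match: another port
check 'tcp   0   0 10.0.0.5:43210 10.0.0.9:8080 ESTABLISHED'    # no match: not listening
check 'tcp6  0   0 :::8080        :::*        LISTEN'           # no match: no digit after port
```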
To wait for Tomcat, you could use:
echo "Waiting for Tomcat port to be ..."
until netstat -an | grep -e ":8080[^0-9].*[0-9].*LISTEN" > /dev/null; do
    sleep 1
done
echo "Tomcat started"
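netstat is not installed on every current Linux system (ss -ltn is its replacement there). As an alternative, bash itself can try to connect to the port. A sketch, assuming bash with /dev/tcp support (a bash feature, not a real file) and a Tomcat on the local machine; wait_for_port is a hypothetical helper, and unlike the loop above it gives up after a timeout:

```bash
#!/bin/bash
# wait_for_port HOST PORT TIMEOUT (hypothetical helper): succeed once something
# accepts TCP connections on HOST:PORT; fail after TIMEOUT seconds.
wait_for_port() {
    local host=$1 port=$2 timeout=$3 deadline
    deadline=$(( $(date +%s) + timeout ))
    while (( $(date +%s) < deadline )); do
        # The subshell opens a connection on fd 3 and closes it when it exits;
        # the redirection fails (connection refused) while nothing is listening.
        if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
            return 0
        fi
        sleep 1
    done
    return 1
}

# Usage:
# wait_for_port localhost 8080 120 && echo "Tomcat started"
```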
grep doesn't exit as soon as it finds a match; it keeps reading for further matches.
To make grep produce no output and instead exit with status 0 as soon as a match is found, use the -q option:
if tail -f /home/0511/myapp/logs/catalina.out | grep -q "Server startup"; then
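A quick way to see this early exit: yes prints its argument forever, so the pipeline below can only finish because grep -q stops at the first match, after which yes is terminated by SIGPIPE. This is a demonstration only, assuming the standard yes utility.

```bash
#!/bin/bash
# Finishes immediately and prints "matched"; without -q, grep would keep reading forever.
if yes "Server startup" | grep -q "Server startup"; then
    echo "matched"
fi
```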
That said, a log message isn't the most reliable way to check whether a server is actually up and serving requests. Instead, you could, for example, repeatedly poll it with curl.
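A sketch of such a curl poll, assuming curl is installed and that some URL (here http://localhost:8080/, an assumption) returns a success status once the application is up; wait_for_http is a hypothetical helper:

```bash
#!/bin/bash
# wait_for_http URL TIMEOUT (hypothetical helper): poll URL until it returns a
# successful HTTP response; fail after TIMEOUT seconds.
wait_for_http() {
    local url=$1 timeout=$2 deadline
    deadline=$(( $(date +%s) + timeout ))
    while (( $(date +%s) < deadline )); do
        # --fail: treat HTTP errors (4xx/5xx) as failure
        # --max-time 5: do not let a single request hang
        if curl --silent --fail --max-time 5 --output /dev/null "$url"; then
            return 0
        fi
        sleep 2
    done
    return 1
}

# Usage:
# wait_for_http http://localhost:8080/ 120 && echo "Tomcat is serving requests"
```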

If condition based on tail and grep commands in Shell Script

I want to write an if condition in Shell script something like this:
if[ tail -10f logFile | grep -q "RUNNING" ]
The idea is that I have restarted my server and want to perform some action only after the server has started up (RUNNING). So I want to continuously tail the log and check whether the server is in RUNNING mode again.
The issue with the above approach is that it does not exit even after the server is RUNNING; it just hangs. Neither the if nor the else branch is ever executed.
What about this?
while [ "$(tail -n 10 logFile | grep -c RUNNING)" -eq 0 ]; do sleep 1; done
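Note that [ is the test command: it evaluates expressions such as -eq comparisons and does not run a pipeline placed inside it. if, while and until can use a pipeline's exit status directly, and for ... | grep -q that status is 0 when a match was found. A small sketch (server_running is a hypothetical helper; logFile is the question's file):

```bash
#!/bin/bash
# server_running LOGFILE: succeed if one of the last 10 lines says RUNNING.
server_running() {
    tail -n 10 "$1" | grep -q RUNNING
}

# Usage:
# until server_running logFile; do sleep 1; done
# echo "Server is RUNNING"
```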

Running bash script by itself vs within another script yields different variable value

I currently have 2 bash scripts:
1) tomcat.sh
#!/bin/bash
case "$1" in
    'start')
        /home/testuser/start.sh
        ;;
    'status')
        /home/testuser/status.sh
        ;;
esac
2) status.sh
#!/bin/bash
COUNT="$( ps -ef | grep tomcat | wc -l )"
echo "${COUNT}"
if [ "${COUNT}" -eq 2 ]
then
    echo "Tomcat is running."
else
    echo "Tomcat is not running."
fi
When I check the status via these two methods:
./tomcat.sh status: ${COUNT} echoes a value of 4.
status.sh: ${COUNT} echoes a value of 2.
I'm not sure why there is a discrepancy. I'm expecting both values from echo to match since they are essentially executing status.sh. Am I missing something?
EDIT: Added in the actual search values I'm using.
Your tomcat.sh is still running while the ps -ef in status.sh runs. So when you go through tomcat.sh, ps finds at least these:
tomcat.sh
the tomcat you are looking for
grep tomcat (the commands in a pipeline run at the same time, so grep is usually already running when ps takes its snapshot)
Right now I am not sure where the 4th is coming from.
When you run the status script directly, ps does not find tomcat.sh, so you get fewer results. A solution could be to make your grep more specific to your use case, or to use a tool that is more specific to your task, like pgrep (although pgrep java may also give you other, unwanted processes).
Possible solution:
COUNT="$( ps -ef | grep "tomcat" | grep "org.apache.catalina.startup.Bootstrap" | wc -l )"
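A variant of the same idea with pgrep -f, which matches the full command line and never lists itself, so the result no longer depends on which script calls it. This is a sketch assuming procps pgrep and that the Tomcat JVM's command line contains org.apache.catalina.startup.Bootstrap (the main class that catalina.sh starts); PATTERN and tomcat_status are hypothetical names:

```bash
#!/bin/bash
PATTERN=${PATTERN:-org.apache.catalina.startup.Bootstrap}

tomcat_status() {
    local count
    # pgrep -f matches the full command line and never reports itself;
    # the calling tomcat.sh does not match the pattern either.
    count=$(pgrep -f "$PATTERN" | wc -l)
    if [ "$count" -ge 1 ]; then
        echo "Tomcat is running."
    else
        echo "Tomcat is not running."
    fi
}

# Usage in status.sh:
# tomcat_status
```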
Edit: Using a pidfile is of course also a way of doing it. In the question you show something that looks like a startup script. If you write a pidfile when starting and then read and use that pidfile when querying, you can tell whether the service is started. It won't work, though, if someone starts the service another way.
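A sketch of the pidfile approach, assuming bash and that the start script writes the Tomcat PID to PIDFILE (the /tmp/tomcat.pid default is an assumption; Tomcat's catalina.sh writes this file itself when the CATALINA_PID environment variable points to it). kill -0 sends no signal and only checks whether the process exists. is_running and pidfile_status are hypothetical names, and a stale pidfile whose PID has been reused by another process would still report "running".

```bash
#!/bin/bash
PIDFILE=${PIDFILE:-/tmp/tomcat.pid}

# is_running: succeed if the PID stored in PIDFILE belongs to a live process.
is_running() {
    local pid
    [ -r "$PIDFILE" ] || return 1
    pid=$(< "$PIDFILE")
    # kill -0 also fails if we may not signal the process,
    # so run this as the Tomcat user or as root.
    [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null
}

pidfile_status() {
    if is_running; then
        echo "Tomcat is running."
    else
        echo "Tomcat is not running."
    fi
}
```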

How to get the correct number of running background jobs, e.g. $(jobs | wc -l | xargs) returns 1 instead of 0

I have a Jenkins build job that starts processes in the background. I need to write a function that checks whether there are still background processes running. To test it, I came up with this:
#!/bin/bash -e
function waitForUploadFinish() {
    runningJobs=$(jobs | wc -l | xargs)
    echo "Waiting for ${runningJobs} background upload jobs to finish"
    while [ "$(jobs | wc -l | xargs)" -ne 0 ];do
        echo "$(jobs | wc -l | xargs)"
        echo -n "." # no trailing newline
        sleep 1
    done
    echo ""
}
for i in {1..3}
do
    sleep $i &
done
waitForUploadFinish
The problem is that the count never comes down to 0. Even when the last sleep is done, there still seems to be one job running. Why?
mles-MBP:ionic mles$ ./jobs.sh
Waiting for 3 background upload jobs to finish
3
.2
.1
.1
.1
.1
.1
.1
Why I don't want to use wait here
In the Jenkins build job script this snippet is for, I'm starting background upload processes for huge files. They don't run for 3 seconds like the sleeps in this example; they can take up to 30 minutes to complete. If I use wait here, the user would see something like this in the log:
upload huge_file1.ipa &
upload huge_file2.ipa &
upload huge_file3.ipa &
wait
They would wonder why nothing is happening.
Instead I want to implement something like this:
upload huge_file1.ipa &
upload huge_file2.ipa &
upload huge_file3.ipa &
Waiting for 3 background upload jobs to finish
............
Waiting for 2 background upload jobs to finish
.................
Waiting for 1 background upload jobs to finish
.........
Upload done
That's why I need the loop with the current running background jobs.
This fixes it:
function waitForUploadFinish() {
    # jobs -r lists only jobs that are still running
    runningJobs=$(jobs -r | wc -l | tr -d " ")
    echo "Waiting for ${runningJobs} background upload jobs to finish"
    while [ "$(jobs -r | wc -l | tr -d " ")" -ne 0 ]; do
        jobs -r | wc -l | tr -d " "
        echo -n "." # no trailing newline
        sleep 1
    done
    echo ""
}
Note: this only counts the background processes started by this bash script; background processes of the shell that started it are not visible.
As gniourf_gniourf commented: if you only need to wait and don't need the progress output, a simple wait after the sleeps is much simpler.
for i in {1..3}; do
sleep $i &
done
wait
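If you do want the progress output from the question, here is a sketch built on the jobs -r idea above. running_jobs and wait_with_progress are hypothetical names, and it assumes bash, where jobs sees the background jobs started by the current script:

```bash
#!/bin/bash
# running_jobs: number of this shell's background jobs that are still running.
# jobs -r leaves out finished jobs and their "Done" notices.
running_jobs() {
    jobs -r | wc -l | tr -d ' '
}

# wait_with_progress [INTERVAL]: print a status line whenever the number of
# running jobs changes, and a dot every INTERVAL seconds (default 1).
wait_with_progress() {
    local interval=${1:-1} last=-1 now
    while now=$(running_jobs); [ "$now" -ne 0 ]; do
        if [ "$now" -ne "$last" ]; then
            if [ "$last" -ne -1 ]; then
                echo   # end the line of dots before the next status line
            fi
            echo "Waiting for $now background upload jobs to finish"
            last=$now
        fi
        printf '.'
        sleep "$interval"
    done
    echo
    wait   # reap the finished jobs; none of them is running any more
    echo "Upload done"
}

# Usage, as in the question:
# upload huge_file1.ipa &
# upload huge_file2.ipa &
# wait_with_progress
```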
Please consider the comments made by gniourf_gniourf, as your design is flawed to begin with.
However, despite a much simpler and more efficient solution being possible, there is the question of why you are seeing what you are seeing.
I modified the first line of your loop like so:
while [ "$(jobs | tee >(cat >&2) | wc -l | xargs)" -ne 0 ];do
The tee command takes its input and sends it both to standard output and to the file passed as an argument. >(cat >&2) is process substitution: it gives tee a file name (a named pipe or a /dev/fd entry) whose contents are passed to cat, which writes them to standard error. This bypasses the pipeline and lets you see what jobs is printing, while the rest of the pipeline still works normally.
If you do that, you will notice that the "job" that jobs keeps on returning is not a job, but a message stating some other job has finished. I do not know why it keeps on repeating that, but this is the cause of the problem.
You could replace:
while [ "$(jobs | wc -l | xargs)" -ne 0 ];do
with:
while [ "$(jobs -p | grep "^[0-9]" | wc -l | xargs)" -ne 0 ];do
This will cause jobs to echo PIDs, and filter out any line that does not begin with a number, so messages will not be counted.

find and kill process in ksh script (linux) not working

I have been trying to find and kill any stale processes left after the stop in a ksh script on a Linux machine, and it doesn't seem to work. It works from the command line, but not in the script.
Here is the code:
echo "kill any process still running"
ps -ef | grep qpasa |grep -v grep | awk '{print $2}' |xargs kill
And here is the output from the script log:
usage: kill [ -s signal | -p ] [ -a ] pid ...
kill -l [ signal ]
Can you please let me know what I am doing wrong here?
I think the script runs when no qpasa processes are left, so xargs calls kill without any PID. Run kill without arguments and you get the same message.
You could redirect the error to /dev/null (or, with GNU xargs, add -r so that kill is not run at all on empty input), but I would try something else:
ps -ef | grep qpasa | grep -v grep | awk '{print $2}' | while read -r pid; do
    echo "Killing ${pid}"
    kill "${pid}"
    sleep 2
    kill -9 "${pid}" 2>/dev/null
done
The first kill gives qpasa the chance to stop in a controlled way: flush caches and close handles. Give qpasa 2 seconds for that.
If qpasa ignores the signal, kill it the hard way. Of course, the process may already have stopped, so this time we ignore error messages.
If you have many qpasa processes, you want to sleep only once instead of 2 seconds per process.
First loop through all processes with a friendly kill, wait 5 seconds, and then hard-kill the processes you still find. If you write a function kill_qpasa_signal for the looping (using $1 as the kill signal), you can use:
kill_qpasa_signal 15
sleep 5
kill_qpasa_signal 9
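The answer leaves kill_qpasa_signal undefined. One possible version, written as plain POSIX shell so it should also run in ksh; find_qpasa_pids and stop_qpasa are hypothetical helpers, and pgrep -x qpasa assumes the process name is exactly qpasa (the original grep matched qpasa anywhere in the command line):

```bash
# find_qpasa_pids (hypothetical): print the PIDs of qpasa processes, one per line.
find_qpasa_pids() {
    pgrep -x qpasa
}

# kill_qpasa_signal SIGNAL: send SIGNAL to every PID printed by find_qpasa_pids.
# With no matching process the loop body never runs, so kill is never called
# without a PID. Errors are discarded because a process may exit between
# being listed and being signalled.
kill_qpasa_signal() {
    find_qpasa_pids | while read -r pid; do
        kill "-$1" "$pid" 2>/dev/null
    done
}

# stop_qpasa (hypothetical): polite stop first, forceful stop for leftovers.
stop_qpasa() {
    kill_qpasa_signal 15   # SIGTERM: let qpasa flush caches and close handles
    sleep 5
    kill_qpasa_signal 9    # SIGKILL: whatever is still running
}
```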
