In the Consul demo, there are checks for disk utilization and memory utilization:
http://demo.consul.io/ui/#/ams2/nodes/ams2-server-1
How could you write a configuration to do what the demo shows, with a warning at 10% and critical errors at 5%?
Here is what I am trying:
{
  "check": {
    "name": "Disk Util",
    "script": "disk_util=$(df -k | grep '/dev/sda1' | awk '{print $5}' | sed 's/[^0-9]*//g' ) | if [ $disk_util > 90 ] ; then echo 'Disk /dev/sda above 90% full' && exit 1; elif [ $disk_util > 80 ] ; then echo 'Disk /dev/sda above 80%' && exit 3; else exit 0; fi",
    "interval": "2m"
  }
}
Here is the same script, but more human-readable:
disk_util=$(df -k | grep '/dev/sda1' | awk '{print $5}' | sed 's/[^0-9]*//g' ) |
if [ $disk_util > 90 ]
then echo 'Disk /dev/sda above 90% full' && exit 1
elif [ $disk_util > 80 ]
then echo 'Disk /dev/sda above 80%' && exit 3
else exit 0; fi
It seems like the check is working, but it doesn't print out any text. How can I verify this is working, and print output?
The output that you are seeing in the demo is produced by the Nagios plugin check_disk (https://www.monitoring-plugins.org/doc/man/check_disk.html).
The "Output" field gets populated by stdout of the check. Your check runs cleanly and produces no output. So you see nothing.
To add some notes just add a "notes" field in the check definition as outlined in the documentation: https://www.consul.io/docs/agent/checks.html
Your check JSON file would look something like this:
{
  "check": {
    "name": "disks",
    "notes": "Critical 5%, warning 10% free",
    "script": "/path/to/check_disk -w 10% -c 5%",
    "interval": "2m"
  }
}
The exit code for your warning state should be 1; for critical, 2 or higher (see "Check Scripts" at https://www.consul.io/docs/agent/checks.html), so you likely want to swap your exit lines.
Your OK state (disk use < 80%) does not produce any output, which is most likely why you see blank output.
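Putting both fixes together, a minimal corrected sketch of your script (untested, same /dev/sda1 assumption as in the question) might look like this; it uses numeric -gt comparisons, since [ $x > 90 ] is treated as a redirection by the shell, and it drops the stray pipe after the assignment:
#!/bin/sh
# percentage used on /dev/sda1, digits only
disk_util=$(df -k | grep '/dev/sda1' | awk '{print $5}' | sed 's/[^0-9]*//g')
if [ "$disk_util" -gt 90 ]; then
    echo "Disk /dev/sda1 above 90% full"
    exit 2    # critical
elif [ "$disk_util" -gt 80 ]; then
    echo "Disk /dev/sda1 above 80% full"
    exit 1    # warning
else
    echo "Disk /dev/sda1 at ${disk_util}% - OK"
    exit 0
fi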
I second the notion of using the Nagios plugins rather than rolling your own. Many OSes have a nagios-plugins package that is a yum/apt install away.
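For example (package names vary by distro, so treat these as a sketch):
# Debian/Ubuntu
sudo apt-get install monitoring-plugins
# RHEL/CentOS with EPEL enabled
sudo yum install nagios-plugins-disk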
Health checks rely on the exit code of the check. To test whether the health checks are being read by the Consul server, you could write a script that always exits with 1; you should then see the health check as failed. Then replace it with a script that always returns 0, and you should see the health check as passed.
If you want to return text to the UI, add an output field to the JSON.
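For example, a throwaway test check (hypothetical script, not from the docs):
#!/bin/sh
# always fails: the check should show up as critical in the UI
echo "forced failure, verifying Consul runs this check"
exit 2
Change exit 2 to exit 0 and the check should flip to passing.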
It seems Consul reads stdout only, not stderr. I tested with a redirect (2>&1) in the service check configuration file, and that seems to work!
JSON config
{
  "check": {
    "name": "disks",
    "notes": "Critical 5%, warning 10% free",
    "script": "/path/to/check_disk -w 10% -c 5% 2>&1",
    "interval": "2m"
  }
}
I'm trying to check that all instances attached to an AWS ELB are in a state of "InService". For that, I created an AWS CLI command to check the status of the instances. The problem is that the JSON output returns the status of both instances, so it is not trivial to examine the output the way I want.
When I run the command:
aws elb describe-instance-health --load-balancer-name ELB-NAME | jq -r '.[] | .[] | .State'
The output is:
InService
InService
The complete JSON is:
{
  "InstanceStates": [
    {
      "InstanceId": "i-0cc1e6d50ccbXXXXX",
      "State": "InService",
      "ReasonCode": "N/A",
      "Description": "N/A"
    },
    {
      "InstanceId": "i-0fc21ddf457eXXXXX",
      "State": "InService",
      "ReasonCode": "N/A",
      "Description": "N/A"
    }
  ]
}
What I've done so far is create this one-liner shell command:
export STR=$'InService\nInService'
if aws elb describe-instance-health --load-balancer-name ELB-NAME | jq -r '.[] | .[] | .State' | grep -q "$STR"; then echo 'yes'; fi
But I get "yes" as long as "InService" appears anywhere in the first command's output (grep treats each line of a multi-line pattern as a separate pattern to match). Is there a way to get TRUE/YES only if I get "InService" twice in the output, or any other way to determine that this is indeed what I got in return?
Without seeing an informative sample of the JSON it's not clear what the best solution would be, but the following meets the functional requirements as I understand them, without requiring any further post-processing:
jq -r '
def count(stream): reduce stream as $s (0; .+1);
if count(.[][] | select(.State == "InService")) > 1 then "yes" else empty end
'
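For example, wiring the filter into the original one-liner (same ELB-NAME placeholder as above):
status=$(aws elb describe-instance-health --load-balancer-name ELB-NAME | jq -r '
  def count(stream): reduce stream as $s (0; .+1);
  if count(.[][] | select(.State == "InService")) > 1 then "yes" else empty end
')
if [ "$status" = "yes" ]; then echo 'yes'; fi
If you know the exact instance count, plain grep -c also works: check that aws elb describe-instance-health --load-balancer-name ELB-NAME | jq -r '.[][].State' | grep -c InService prints 2.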
I made a script that checks the Asda home delivery slots from the API.
Here it is; I call it get_slots.sh.
You have to start Tor first, or else remove the --socks5-hostname line (you can see the Tor port number on the command line with ps). If you don't use Tor, though, they might cancel your account if they get narky about you polling their website.
Obviously you have to change the vars at the top.
The query parameters and API URL were found with the inspector in Chrome while using their normal JavaScript page for the general public, so nothing top secret.
#!/bin/bash
my_postcode="SW1A1AA" # CHANGEME
account_id=18972357834 # JUST INVENT A NUMBER
order_id=22985263473 # LIKEWISE
ua='user_agent_I_want_to_fake'
my_tor_port=9150
#----------------
#ftype="POPUP"
#ftype="INSTORE_PICKUP"
ftype="DELIVERY"
format="%Y-%m-%dT00:00:00+01:00"
start_date=$(date "+$format")
end_date=$(date -d "+16 days" "+$format")
read -r -d '' data <<EOF
{
  "data": {
    "customer_info": {
      "account_id": "$account_id"
    },
    "end_date": "$end_date",
    "order_info": {
      "line_item_count": 0,
      "order_id": "$order_id",
      "restricted_item_types": [],
      "sub_total_amount": 0,
      "total_quantity": 0,
      "volume": 0,
      "weight": 0
    },
    "reserved_slot_id": "",
    "service_address": {"postcode":"$my_postcode"},
    "service_info": {
      "enable_express": false,
      "fulfillment_type": "$ftype"
    },
    "start_date": "$start_date"
  },
  "requestorigin": "gi"
}
EOF
data=$(echo $data | tr -d ' ')   # flatten and strip spaces so curl sends a compact body
url='https://groceries.asda.com/api/v3/slot/view'
referer='https://groceries.asda.com/checkout/book-slot?origin=/account/orders'
curl -s \
--socks5-hostname localhost:$my_tor_port \
-H "Content-type: application/json; charset=utf-8" \
-H "Referer: $referer" \
-A "$ua" \
-d "$data" \
$url \
| python -m json.tool
Anyway, now I made another script to keep running it and mail me if any slots are available.
More vars you need to change at the top of this one:
#!/bin/sh
me="my#email.address"
my_smtp_server="smtp.myisp.net:25"
#------------------------------------
mailed=0
ftmp=/tmp/slots.$$
while true
do
    date
    f=slots/`date +%Y%m%d/%H/%Y-%m-%d_%H%M%S`.json
    d=`dirname $f`
    [ -d $d ] || mkdir -p $d
    ./get_slots.sh > $f
    if egrep -B1 'status.*"AVAILABLE"' $f > $ftmp
    then
        echo "found"
        if [ $mailed -eq 0 ]
        then
            dates=`perl -nle '/start_time.*2020-(..-..T..):/ && print $1' $ftmp`
            mailx \
                -r "$me" -s "asda on $dates lol" \
                -S smtp="$my_smtp_server" "$me" < $ftmp
            echo "mailed"
            mailed=1
        fi
    fi
    sleep 120
done
So I'm being kind of naughty here, because I need the timestamp for slots with status available to put in the email, and I really can't be bothered to parse the JSON properly, so I just rely on it being in the line before the status.
The pretty-printed JSON puts the fields in alphabetical order and comes out with something like:
"slot_info": {
STUFF
"slot_type": null,
"start_time": "2020-06-10T19:00:00Z",
"status": "AVAILABLE",
"total_discount": 0.0,
"total_premium": 0.0,
MORE STUFF
So yeah, all I do is egrep -B1.
Oh yeah, I also naughtily hard-coded 2020 rather than doing a proper regex for the year, because if this is all still going on after 2020 I might as well just starve anyway, so I don't want to over-engineer it.
Anyway, as you can see, once it has mailed me it still keeps running, because I want to store the JSON files and maybe analyse them later; it just doesn't mail me again after that unless I restart it.
Anyway, my question is: my script only checks every two minutes, and I want it to check more often so I can beat people.
Okay, sorted it: the sleep 120 is 2 minutes. I thought it was 1.2 minutes; sorry, I forgot a minute is 60 seconds, not 100, lol.
Oh, and don't worry, I'm not going to do this every 5 seconds or anything!
Now that I know the sleep is working properly I can change it to 60, which is still no more often than a lot of the people sat there reloading it manually, believe me.
It should be able to process large log files and produce exception message reports.
After the log analysis completes, a report notification should be sent to specific mail IDs.
Also, please suggest which framework is best for processing large files (e.g. Spring Boot/Batch).
I would suggest going with the ELK stack: stream the logs to Elasticsearch and set up alerts in Kibana.
You can use the sendmail client on the system and run a script there to send an alert on any exception.
exception="Exception" # "Error", "HTTP 1.1 \" 500", etc
ignoredException="ValidationException"
# log file to scan
logFileToScan=/var/log/tomcat8/log/application.log
# file where we will keep log of this script
logFilePath=/home/ec2-user/exception.log
# a file where we store till what line the log file has been scanned
# initalize it with 0
countPath=/home/ec2-user/lineCount
# subject with which you want to receive the mail regading Exception
subject="[ALERT] Exception"
# from whom do you want to send the mail regarding Exception
from="abc#abc.com"
# to whom do you want to send the mail
to="xyz#xyz.com"
# number of lines, before the line containing the word to be scanned, to be sent in the mail
linesBefore=1
# number of lines, before the line containing the word to be scanned, to be sent in the mail
linesAfter=4
# start line
fromLine=`cat $countPath`
# current line count in the file
toLine=`wc -l $logFileToScan | awk '{print $1}'`
# logs are rolling, so if fromLine is greater than toLine, reset fromLine to 0
if [ "$fromLine" == "" ]; then
    fromLine=0
    echo `date` fromLine value was empty, set to 0 >> $logFilePath
elif [ $fromLine -gt $toLine ]; then
    echo `date` logfile was rolled, updating fromLine from $fromLine to 0 >> $logFilePath
    fromLine=0
fi
# if fromLine and toLine are equal, no logs have been generated since the last scan
if [ "$fromLine" == "$toLine" ]; then
    echo `date` no logs generated after last scan >> $logFilePath
else
    echo `date` updating linecount to $toLine >> $logFilePath
    echo $toLine > $countPath
    logContent=`tail -n +"$fromLine" $logFileToScan | head -n "$((toLine - fromLine))" | grep -v $ignoredException | grep -A $linesAfter -B $linesBefore $exception`
    # cap the mail body at 2000 characters
    logContent=`echo $logContent | cut -c1-2000`
    if [ "$logContent" == "" ]; then
        echo `date` no exception found >> $logFilePath
    else
        /usr/sbin/sendmail $to <<EOF
Subject: $subject
From: $from

$logContent
EOF
    fi
fi
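To run the scan periodically, a cron entry would do; the script path here is hypothetical:
# run the exception scan every 5 minutes
*/5 * * * * /home/ec2-user/scan-exceptions.sh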
I use GitHub + Travis CI for continuous integration. I have a Maven project with lots of tests. I want to parse the whole console log and find a special word, via XPath or some other method. If this word is present x times, my job is OK; otherwise my job is KO.
How can I parse the console log on Travis CI and count the occurrences of a word, by XPath or any other method?
Given log.txt as input, with lines of interest like these:
Tests run: 10, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 7.957 sec - in TestSuite
Tests run: 10, Failures: 0, Errors: 0, Skipped: 0
Assuming we just want to test the "10", this might do:
n=5   # threshold: the count in $3 must exceed this
awk -F '[ ,]*' '/^Tests run:/ \
{ if ($3>'$n') { print "OK found " $3 ; x=$3 ; exit} } \
END {if (x<'$n') print "Fail."} ' log.txt
Output:
OK found 10
I tried the Travis API:
travis logs
But this is not possible with that command, because I get an infinite loop (the command copies the log into the log, which copies the log into the log, and so on). That API is only good for reading the logs of other builds!
And I found a solution:
in .travis.yml file:
script:
- test/run.sh
in run.sh file:
# fetch this job's own log via the Travis API
curl -s "https://api.travis-ci.org/jobs/${TRAVIS_JOB_ID}/log.txt?deansi=true" > nonaui.log
# pull the expected string from between the <EXPECTED_RESULTS> markers in the log
expectation=`sed -n 's:.*<EXPECTED_RESULTS>\(.*\)</EXPECTED_RESULTS>.*:\1:p' nonaui.log | head -n 1`
# count how many times that string occurs in the log
nb_expectation=`sed -n ":;s/$expectation//p;t" nonaui.log | sed -n '$='`
# 3 = 1 (real) + 2 counters (Excel and CSV)
if [ "$nb_expectation" == "3" ]; then
echo "******** All counter is SUCCESS"
else
echo "******** All counter is FAIL"
exit 255
fi
exit 0
I have created a bash script that collects jstat metrics from my JVM instances.
Here is an example of its output:
demo.server1.sms.jstat.eden 24.34 0
demo.server1.lcms.jstat.eden 54.92 0
demo.server1.lms.jstat.eden 89.49 0
demo.server1.tms.jstat.eden 86.05 0
But when the Sensu client runs this script, it returns:
Could not attach to 8584
Could not attach to 8588
Could not attach to 17141
Could not attach to 8628
demo.server1.sms.jstat.eden 0
demo.server1.lcms.jstat.eden 0
demo.server1.lms.jstat.eden 0
demo.server1.tms.jstat.eden 0
Here is an example check_cron.json:
{
  "checks": {
    "jstat_metrics": {
      "type": "metric",
      "handlers": ["graphite"],
      "command": "/etc/sensu/plugins/jstat-metrics.sh",
      "interval": 5,
      "subscribers": [ "webservers" ]
    }
  }
}
And here is the relevant piece of my bash script:
jvm_list=("sms:$sms" "lcms:$lcms" "lms:$lms" "tms:$tms" "ums:$ums")
for jvm_instance in "${jvm_list[@]}"; do
    project=${jvm_instance%%:*}
    pid=${jvm_instance#*:}
    if [ "$pid" ]; then
        metric=`jstat -gc $pid | tail -n 1`
        output=$output$'\n'"demo.server1.$project.jstat.eden"$'\t'`echo $metric | awk '{ print $3}'`$'\t0'
    fi
done
echo "$output"
I found out that the problem is with jstat, and I tried using the full jstat path, like /usr/lib/jvm/java-1.7.0-openjdk.x86_64/bin/jstat -gc $pid | tail -n 1, but it didn't help.
By the way, if I comment out that row, output like "Could not attach to 8584" disappears!
I'm not a Java or Sensu user, but I can guess what happens.
Most likely, sensu-client runs your script as a user different from the one you use when testing manually, which doesn't have permissions to "attach" (whatever that means) to your jvm instances.
To verify this, you can add an invocation of whoami to your script, run it from sensu-client again, and see which user it runs your script as; if it is different, try running your script as that user.
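For example, a temporary debug line near the top of the script (sent to stderr so it doesn't pollute the metrics output):
# temporary: report which user Sensu executes this script as
echo "running as $(whoami)" >&2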
Yes, you're right: Sensu runs all scripts as the sensu user. To use jstat you have to add sensu to the sudoers.
Just add the file /etc/sudoers.d/sensu.
Example:
Defaults:sensu !requiretty
Defaults:sensu secure_path = /usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
sensu ALL = NOPASSWD: /etc/sensu/plugins/jstat-metrics.sh
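With that in place, the check command has to invoke the plugin through sudo, e.g. in the check definition from the question (assuming the same path as above):
"command": "sudo /etc/sensu/plugins/jstat-metrics.sh",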