Cannot collect jstat metrics using bash as a Sensu plugin - bash

I have created a bash script that collects jstat metrics from my JVM instances!
Here is an example of its output:
demo.server1.sms.jstat.eden 24.34 0
demo.server1.lcms.jstat.eden 54.92 0
demo.server1.lms.jstat.eden 89.49 0
demo.server1.tms.jstat.eden 86.05 0
But when the sensu-client runs this script it returns:
Could not attach to 8584
Could not attach to 8588
Could not attach to 17141
Could not attach to 8628
demo.server1.sms.jstat.eden 0
demo.server1.lcms.jstat.eden 0
demo.server1.lms.jstat.eden 0
demo.server1.tms.jstat.eden 0
Here is my check_cron.json:
{
  "checks": {
    "jstat_metrics": {
      "type": "metric",
      "handlers": ["graphite"],
      "command": "/etc/sensu/plugins/jstat-metrics.sh",
      "interval": 5,
      "subscribers": [ "webservers" ]
    }
  }
}
And here is the relevant piece of my bash script:
jvm_list=("sms:$sms" "lcms:$lcms" "lms:$lms" "tms:$tms" "ums:$ums")
for jvm_instance in "${jvm_list[@]}"; do
    project=${jvm_instance%%:*}
    pid=${jvm_instance#*:}
    if [ "$pid" ]; then
        metric=$(jstat -gc "$pid" | tail -n 1)
        output=$output$'\n'"demo.server1.$project.jstat.eden"$'\t'$(echo $metric | awk '{ print $3 }')$'\t0'
    fi
done
echo "$output"
I found out that the problem is with jstat, and I tried using the full jstat path, e.g. /usr/lib/jvm/java-1.7.0-openjdk.x86_64/bin/jstat -gc $pid | tail -n 1, but it didn't help!
By the way, if I comment out that row, the "Could not attach to 8584" output disappears!

I'm not a Java or Sensu user, but I can guess what is happening.
Most likely, sensu-client runs your script as a different user from the one you use when testing manually, and that user doesn't have permission to "attach" (whatever that means) to your JVM instances.
To verify this, add a whoami invocation to your script, run it from sensu-client again, and see which user the script runs under; if it is different, try running your script manually as that user.
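For example, a minimal diagnostic along those lines (writing to stderr so it doesn't pollute the metric output):
# Temporary diagnostic: report which user is executing this script
echo "running as: $(whoami)" >&2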

Yes, you're right: Sensu runs all scripts as the sensu user. To use jstat you have to add sensu to the sudoers.
Just add a file /etc/sudoers.d/sensu.
Example:
Defaults:sensu !requiretty
Defaults:sensu secure_path = /usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
sensu ALL = NOPASSWD: /etc/sensu/plugins/jstat-metrics.sh
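With that rule in place, the check command has to invoke the plugin through sudo for the sudoers entry to apply; a sketch, assuming the plugin path from the question:
sudo /etc/sensu/plugins/jstat-metrics.sh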

Related

How to make my currently running Autosys job go to FA [Failure] when the command script fails

I have a script which is in Autosys Job : JOB_ABC_S1
command : /ABC/script.sh
script.sh code:
grep -w "ABC" /d/file1.txt
status=$?
if [ $status -eq 0 ]
then
    echo "Passed"
else
    echo "Failed"
    exit 1
fi
My issue is that whether the script passes or fails, the AutoSys job is marked SU SUCCESS.
I don't want it marked as success when the script fails; the job should go to FA when the script fails and SU SUCCESS when it passes.
What should i change in the script to make it happen ?
Job :
insert_job : JOB_ABC_S1
machine : XXXXXXXXXXX
owner : XXXXXXXX
box_name : BOX_ABC_S1
application : XXXX
permission : XXXXXXXXXXX
max_run_alarm : 60
alarm_if_fails : y
send_notification : n
std_out_file : XXXXX
std_err_file : XXXXX
command : sh /ABC/script.sh
At first look all seems to be fine.
However, I would suggest a script modification which you can try out.
By default, Autosys fails a job if the exit code is non-zero, unless specified otherwise.
The job JIL seems to be fine.
Please update your script as below and check two things:
The executed job's EXIT-CODE: it should be either 1 or 2; we are deliberately failing the job in both cases.
The std_out/std_err log files.
Script:
#!/bin/sh
srch_count=$(grep -cw ABC /d/file1.txt)
if [ $srch_count -eq 0 ]; then
echo "Passed"
#exit 0
exit 2
else
echo "Failed"
exit 1
fi
This way we can confirm if the exit code is correctly being captured by Autosys.
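A quick way to sanity-check the exit code outside Autosys (a sketch, run from the same shell the job uses):
sh /ABC/script.sh
echo "script exit code: $?"    # should print 1 or 2 with the diagnostic version above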

How to check the namenode status?

As a developer, how can I check the current state of a given namenode, i.e. whether it is active or standby? I have tried the getServiceState command, but that is only intended for admins with superuser access. Is there any command that can be run from the edge node to get the status of a given namenode?
Finally, I got an answer to this.
As a developer, one cannot execute dfsadmin commands due to the restriction. To check namenode availability I used the if block below in a shell script, which did the trick. It won't tell you explicitly that the namenode is active, but with this check you can easily execute the desired program accordingly.
if hdfs dfs -test -e hdfs://namenodeip/* ; then
    echo exist
else
    echo not exist
fi
I tried your solution but it didn't work. Here's mine, which works perfectly for me (bash script).
until curl http://<namenode_ip>:50070/jmx?qry=Hadoop:service=NameNode,name=NameNodeStatus | grep -q 'active'; do
    printf "Waiting for namenode!"
    sleep 5
done
Explanation:
Running this curl request outputs the namenode's status as JSON (sample below), which has a State field indicating its status. So I'm simply checking for the text 'active' in the curl output. For any other language, you just have to make the same curl request and check its output.
{
  "beans" : [ {
    "name" : "Hadoop:service=NameNode,name=NameNodeStatus",
    "modelerType" : "org.apache.hadoop.hdfs.server.namenode.NameNode",
    "NNRole" : "NameNode",
    "HostAndPort" : "<namenode_ip>:8020",
    "SecurityEnabled" : false,
    "LastHATransitionTime" : 0,
    "State" : "active"
  } ]
}
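If you would rather match the State field itself than grep for 'active' anywhere in the response, here is a minimal sketch in bash (the <namenode_ip> placeholder and the field spacing are taken from the sample above):
state=$(curl -s "http://<namenode_ip>:50070/jmx?qry=Hadoop:service=NameNode,name=NameNodeStatus" | grep -o '"State" : "[^"]*"' | cut -d'"' -f4)
echo "NameNode state: $state"    # prints active or standby
[ "$state" = "active" ]          # exit status usable in scripts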

Consul - Alert if drive is full

In the demo of consul, there are checks for disk utilization and memory utilization.
http://demo.consul.io/ui/#/ams2/nodes/ams2-server-1
How could you write a configuration to do what the demo shows? Warning at 10% and critical errors at 5%?
Here is what I am trying:
{
  "check": {
    "name": "Disk Util",
    "script": "disk_util=$(df -k | grep '/dev/sda1' | awk '{print $5}' | sed 's/[^0-9]*//g' ) | if [ $disk_util > 90 ] ; then echo 'Disk /dev/sda above 90% full' && exit 1; elif [ $disk_util > 80 ] ; then echo 'Disk /dev/sda above 80%' && exit 3; else exit 0; fi",
    "interval": "2m"
  }
}
Here is the same script, but more human-readable:
disk_util=$(df -k | grep '/dev/sda1' | awk '{print $5}' | sed 's/[^0-9]*//g' ) |
if [ $disk_util > 90 ]
then
    echo 'Disk /dev/sda above 90% full' && exit 1
elif [ $disk_util > 80 ]
then
    echo 'Disk /dev/sda above 80%' && exit 3
else
    exit 0
fi
It seems like the check is working, but it doesn't print out any text. How can I verify this is working, and print output?
The output that you are seeing in the demo is produced by the Nagios plugin check_disk (https://www.monitoring-plugins.org/doc/man/check_disk.html).
The "Output" field gets populated from the stdout of the check. Your check runs cleanly and produces no output, so you see nothing.
To add some notes just add a "notes" field in the check definition as outlined in the documentation: https://www.consul.io/docs/agent/checks.html
Your check json file would look something like this:
{
  "check": {
    "name": "disks",
    "notes": "Critical 5%, warning 10% free",
    "script": "/path/to/check_disk -w 10% -c 5%",
    "interval": "2m"
  }
}
The exit code for your warning state should be 1, and for critical, 2 or higher (see "Check Scripts" at https://www.consul.io/docs/agent/checks.html), so you likely want to swap your exit lines.
Your 'OK' state (disk use < 80%) does not produce any output, which is most likely why you see blank output.
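Putting that together with two shell fixes (the assignment is piped into the if, so it runs in a subshell and $disk_util is empty inside the if; and [ $x > 90 ] is a redirection, not a numeric comparison), a corrected sketch of the original script might look like this:
#!/usr/bin/env bash
# Consul convention: exit 0 = passing, 1 = warning, anything else = critical
disk_util=$(df -k | grep '/dev/sda1' | awk '{print $5}' | sed 's/[^0-9]*//g')
if [ "$disk_util" -gt 90 ]; then
    echo "Disk /dev/sda1 above 90% full"
    exit 2
elif [ "$disk_util" -gt 80 ]; then
    echo "Disk /dev/sda1 above 80% full"
    exit 1
else
    echo "Disk /dev/sda1 at ${disk_util}% used - OK"
    exit 0
fi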
I second the notion of using Nagios plugins rather than rolling your own. Many OSes have a nagios-plugins package that is a yum/apt install away.
Health checks rely on the exit code of the check. To test whether the health checks are being read by the Consul server, you could write a script that always exits with 1; you should then see the health check as failed. Then replace it with a script that always returns 0 and you should see the health check as passing.
If you want to return text to the UI, have the check write it to stdout.
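For instance, a trivial test check along those lines (a sketch; flip the exit code to watch the status change in the UI):
#!/usr/bin/env bash
echo "test check output"    # appears as the check's output
exit 1                      # non-zero = failing; change to 0 to see it pass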
It seems Consul analyses stdout only and not stderr. I have tested with a redirect (2>&1) in the service check file configuration. That seems to work!
JSON config:
{
  "check": {
    "name": "disks",
    "notes": "Critical 5%, warning 10% free",
    "script": "/path/to/check_disk -w 10% -c 5% 2>&1",
    "interval": "2m"
  }
}

Expect fails but I don't see why

I have a bash script that gets info from Heroku so that I can pull a copy of my database. That script works fine in Cygwin, but it halts when run from cron, because the shell it uses stops at Heroku's authentication through the Heroku Toolbelt.
Here is my crontab:
SHELL=/usr/bin/bash
5 8-18 * * 1-5 /cygdrive/c/Users/sam/work/push_db.sh >>/cygdrive/c/Users/sam/work/output.txt
I have read the Googles and the man page within cygwin to come up with this addition:
#!/usr/bin/bash
. /home/sam.walton/.profile
echo $SHELL
curl -H "Accept: application/vnd.heroku+json; version=3" -n https://api.heroku.com/
#. $HOME/.bash_profile
echo `heroku.bat pgbackups:capture --expire`
#spawn heroku.bat pgbackups:capture --expire
expect {
    "Email:" { send -- "$($HEROKU_LOGIN)\r"}
    "Password (typing will be hidden):" { send -- "$HEROKU_PW\r" }
    timeout { echo "timed out during login"; exit 1 }
}
sleep 2
echo "first"
curl -o latest.dump -L "$(heroku.bat pgbackups:url | dos2unix)"
Here's the output from output.txt:
/usr/bin/bash
{
  "links":[
    {
      "rel":"schema",
      "href":"https://api.heroku.com/schema"
    }
  ]
}
Enter your Heroku credentials. Email: Password (typing will be hidden): Authentication failed. Enter your Heroku credentials. Email: Password (typing will be hidden): Authentication failed. Enter your Heroku credentials. Email: Password (typing will be hidden): Authentication failed.
As you can see, it appears that expect is not getting the result of the send command, as it appears to be waiting. I've run many experiments with the credentials and the expect statements; all stop here. I've seen a few examples and tried them out, but I'm getting fuzzy-eyed, which is why I'm posting here. What am I not understanding?
Thanks to comments, I'm reminded to explicitly place my env variables in .bashrc:
[[ -s $USERPROFILE/.pik/.pikrc ]] && source "$USERPROFILE/.pik/.pikrc"
export HEROKU_LOGIN=myEmailHere
export HEROKU_PW=myPWhere
My revised script, per @Dinesh's excellent example, is below:
. /home/sam.walton/.bashrc
echo $SHELL
echo $HEROKU_LOGIN
curl -H "Accept: application/vnd.heroku+json; version=3" -n https://api.heroku.com/
expect -d -c "
spawn heroku.bat pgbackups:capture --expire --app gw-inspector
expect {
    "Email:" { send -- "myEmailHere\r"; exp_continue}
    "Password (typing will be hidden):" { send -- "myPWhere\r" }
    timeout { puts "timed out during login"; exit 1 }
}
"
sleep 2
echo "first"
This should work, but the echo of the variable yields nothing, giving me a clue that the variable is not being read, so I am testing with the credentials hardcoded to eliminate that as a factor. But as you can see from my output, not only does the echo yield nothing, there is also no sign of any diagnostics, which makes me wonder whether the script is even reaching the expect invocation, and what the spawn command returned. To restate: the heroku.bat command works outside the expect block, but the results are as above. The output of the commands directly above is:
/usr/bin/bash
{
  "links":[
    {
      "rel":"schema",
      "href":"https://api.heroku.com/schema"
    }
  ]
}
What am I doing wrong, and how can I get diagnostic output?
If you are going to use the expect code inside your bash script, instead of calling it separately, then you should use the -c flag.
From your code, I assume that you have the environment variables HEROKU_LOGIN and HEROKU_PW declared in your bashrc file.
#!/usr/bin/bash
# Your code here
expect -c "
    # HEROKU_LOGIN & HEROKU_PW will be replaced with the variable values by bash.
    spawn <your-executable-process-here>
    expect {
        \"Email:\" { send -- \"$HEROKU_LOGIN\r\"; exp_continue }
        \"Password (typing will be hidden):\" { send -- \"$HEROKU_PW\r\" }
        timeout { puts \"timed out during login\"; exit 1 }
    }
"
# Your further bash code here
You should not use the echo command inside expect code; use puts instead. Spawning the process inside the expect code is more robust than spawning it outside.
Notice the use of double quotes with the expect -c flag. If you use single quotes, bash won't perform any substitution. So if you need bash variable substitution, use double quotes for the expect code passed with -c (and escape any literal double quotes inside it, as above).
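A quick way to see the difference (a minimal sketch; HEROKU_LOGIN here is just an illustrative shell variable):
export HEROKU_LOGIN=me@example.com
expect -c "puts \"$HEROKU_LOGIN\""    # double quotes: bash substitutes, prints me@example.com
expect -c 'puts "$HEROKU_LOGIN"'      # single quotes: Tcl looks for its own HEROKU_LOGIN and errors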
To learn about the usage of the -c flag, have a look here.
If you still have issues, you can debug by appending the -d flag in the following way:
expect -d -c "
    your code here
"

Expect script clause evaluation

In my expect script, my goal is to send a command that shows the properties of the two processors on a motherboard. Please assume the remote login is successful; the problem is in the send clause, where the variables are not evaluated successfully.
I have a procedure and a variable:
set showcpu "show -d properties /SYS/MB/P\r"
I created a while loop that does a "send" with a "cpu" counter that starts at 0 and stays less than 2.
set cpu 0
while { $cpu < 2 } {
    expect {
        -re $prompt { send "${showcpu}${cpu}\r" }
        timeout {
            my_puts "ILOM prompt timeout error-2" [ list $fh1 $fh3 stdout ]
            exit 1
        }
    }
    set cpu [ expr {$cpu + 1} ]
}
The execution result is this:
[BL0/SP]-> show -d properties /SYS/MB/P
show: Invalid target /SYS/MB/P
[BL0/SP]-> 0
Invalid command '0' - type help for a list of commands.
I wanted the script to combine the value $showcpu with $cpu and it should look like this:
show -d properties /SYS/MB/P0 and show -d properties /SYS/MB/P1.
Could someone please educate me on what I need to do to accomplish that?
The variable ${showcpu} itself already contains "\r" (see its definition above), so the carriage return fires the command before ${cpu} is appended; that is exactly why the log shows "show -d properties /SYS/MB/P" executing on its own, followed by a bare "0".
Either define it without "\r":
set showcpu "show -d properties /SYS/MB/P"
or use string trim (http://wiki.tcl.tk/10174):
send "[string trim ${showcpu}]${cpu}\r"
I would recommend trimming the whitespace at the place where the variable is set, not at the places where it is used.
