Ping in shell scripts: some packet loss, but exit code $? is zero. How can I detect it?

Sometimes my DSL router fails in this strange manner:
luis@balanceador:~$ sudo ping 8.8.8.8 -I eth9
[sudo] password for luis:
PING 8.8.8.8 (8.8.8.8) from 192.168.3.100 eth9: 56(84) bytes of data.
64 bytes from 8.8.8.8: icmp_seq=1 ttl=47 time=69.3 ms
ping: sendmsg: Operation not permitted
64 bytes from 8.8.8.8: icmp_seq=3 ttl=47 time=68.0 ms
ping: sendmsg: Operation not permitted
64 bytes from 8.8.8.8: icmp_seq=5 ttl=47 time=68.9 ms
64 bytes from 8.8.8.8: icmp_seq=6 ttl=47 time=67.2 ms
ping: sendmsg: Operation not permitted
64 bytes from 8.8.8.8: icmp_seq=8 ttl=47 time=67.2 ms
^C
--- 8.8.8.8 ping statistics ---
8 packets transmitted, 5 received, 37% packet loss, time 7012ms
rtt min/avg/max/mdev = 67.254/68.183/69.391/0.906 ms
luis@balanceador:~$ echo $?
0
As can be seen, the exit code $? is 0, so a script cannot tell from the exit code alone that some packets were lost.
What is the proper way to detect that there was some packet loss?
Do I need to parse the output with grep, or is there a simpler method?

According to the man page, by default (on Linux), ping exits with code 1 if it does not receive any reply packets at all. If both a packet count (-c) and a deadline (-w, in seconds) are specified, it also exits with code 1 when fewer than count replies have arrived by the deadline. On other errors it exits with code 2.
ping 8.8.8.8 -I eth9 -c 3 -w 3
With these options, ping exits with code 1 if fewer than 3 replies arrive within 3 seconds.
As @mklement0 noted, ping on BSD behaves a bit differently:
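If a script needs the actual loss figure rather than a pass/fail exit code, the summary line can be parsed instead. The following is only a sketch: the function names `ping_loss_percent` and `has_packet_loss` are made up for illustration, and it assumes the summary contains the text "N% packet loss" as in the output shown in the question.

```shell
#!/bin/sh
# ping_loss_percent: read ping output on stdin and print the loss
# percentage from the "... N% packet loss ..." summary line.
ping_loss_percent() {
    sed -n 's/.* \([0-9.]*\)% packet loss.*/\1/p'
}

# has_packet_loss: succeed (exit 0) when the reported loss is above zero.
# Fails when the loss is zero or no summary line was found.
has_packet_loss() {
    loss=$(ping_loss_percent)
    [ -n "$loss" ] && awk -v l="$loss" 'BEGIN { exit !(l > 0) }'
}
```

It could be used as `ping -c 5 8.8.8.8 | has_packet_loss && echo "loss detected"`; the pipeline's exit status is then that of the parser, not of ping.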
The ping utility exits with one of the following values:
0 - at least one response was heard from the specified host.
2 - the transmission was successful but no responses were received.
So, in this case, one workaround is to send the packets one at a time in a loop:
ip=8.8.8.8
count=3
for i in $(seq "$count"); do
    ping "$ip" -I eth9 -c 1
    if [ $? -eq 2 ]; then
        # stop and propagate ping's exit code
        exit 2
    fi
done
Of course, if you need full statistics, count the "2" and "0" exit codes in variables, then print the result or set the exit code after the for loop.
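The counting idea could look roughly like this. It is a sketch, not a drop-in script: `probe_once` and `count_replies` are hypothetical names, the ping flags are copied from the loop above, and any non-zero exit code is counted as a failure rather than only code 2.

```shell
#!/bin/sh
# probe_once HOST: hypothetical helper, sends a single echo request.
# The flags mirror the loop above; adjust them for your platform.
probe_once() { ping "$1" -I eth9 -c 1 >/dev/null 2>&1; }

# count_replies HOST COUNT: prints "ok=N failed=M" and returns
# non-zero if at least one probe failed.
count_replies() {
    host=$1
    count=$2
    ok=0
    failed=0
    i=0
    while [ "$i" -lt "$count" ]; do
        if probe_once "$host"; then
            ok=$((ok + 1))
        else
            failed=$((failed + 1))
        fi
        i=$((i + 1))
    done
    echo "ok=$ok failed=$failed"
    [ "$failed" -eq 0 ]
}
```

A caller could then write `count_replies 8.8.8.8 3 || echo "some probes failed"`.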

Related

How to make script in bash aware that a server is still busy installing/configuring and wait for reboot?

The issue / dilemma
I am currently busy creating a script to kickstart servers (with CentOS 6.x and CentOS 7.x) remotely. So far the script is working, but hangs on one minor thing. Well actually it does not hang, but it does not give detailed information about what is happening. In other words, I am not getting the correct information back in bash about the job being finished correctly.
I have tried various things, however it's hanging with the following message (which is being repeated endlessly):
servername is still installing and configuring packages...
PING 100.125.150.175 (100.125.150.175) 56(84) bytes of data.
64 bytes from 100.125.150.175: icmp_seq=1 ttl=63 time=0.152 ms
64 bytes from 100.125.150.175: icmp_seq=2 ttl=63 time=0.157 ms
64 bytes from 100.125.150.175: icmp_seq=3 ttl=63 time=0.157 ms
64 bytes from 100.125.150.175: icmp_seq=4 ttl=63 time=0.143 ms
64 bytes from 100.125.150.175: icmp_seq=5 ttl=63 time=0.182 ms
--- 100.125.150.175 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 120025ms
rtt min/avg/max/mdev = 0.143/0.158/0.182/0.015 ms
servername is still installing and configuring packages...
PING 100.125.150.175 (100.125.150.175) 56(84) bytes of data.
64 bytes from 100.125.150.175: icmp_seq=1 ttl=63 time=0.153 ms
64 bytes from 100.125.150.175: icmp_seq=2 ttl=63 time=0.132 ms
64 bytes from 100.125.150.175: icmp_seq=3 ttl=63 time=0.142 ms
etc....
So for some reason it does not continue to the next line of code or perform the next action. Since it's only feedback to me (or another user), it's not a major issue. But it would be nice to get this working and to get (detailed) information back about the current progress or what the script/server is actually doing at the moment. Unfortunately, this is not the case for the last piece of code above.
This is the current code snippet I have (yes, it's a mess):
while true;
do
#ping -c3 -i3 $HWNODEIP > /dev/null
#ping -c5 -i30 $HWNODEIP > /dev/null
ping -c5 -i30 $HWNODEIP
if [ $? -eq 1 ] || [ $? -eq 2 ] || [ $? -eq 68 ]
then
echo -e " "
echo -e "Kickstart part II also done. $HOSTNAME will be rebooted one more time."
sleep 5
######return 0
echo -e " "
printf "%s" "Waiting for $HOSTNAME to come back online: "
while ! ping -c 1 -n -w 30 $HWNODEIP &> /dev/null
do
printf "%c" "."
#sleep 10
done
echo -e " "
echo -e "Reboot is done and $HOSTNAME is back online. Performing final check. Please wait..."
sleep 10
echo -e " "
sudo /usr/local/collectHWdata.pl $HWNODEIP
ssh root@$HWNODEIP "while ! test -e /root/kickstart-DONE; do sleep 3; done; echo KICKSTART IS DONE\!"
echo -e " "
exit
else
echo -e " "
echo -e "$HOSTNAME is still installing and configuring packages..."
fi
done
Sidenote: I removed > /dev/null #5 for debugging (not that it helped)
I am guessing I am using things incorrectly, and I am by no means an experienced scripter; I can only do minor stuff, but of course I am doing my best. I have been fooling around with this since last week and still have no result on this part.
What am I trying to achieve?
The server is rebooted after the selected CentOS version, creating partitions and setting up the network. This all works. The above snippet is after that reboot. Now it will install packages I selected, configure various things (like Nagios) and install/compile certain PERL modules. And a few other minor things.
This is done correctly in the background. I wanted the script (the above piece of code) to report that the server is still busy installing things. Since I lack the knowledge to do that, I decided on a different approach: check whether the server is online (in other words, that it's still installing). As long as the server is online, it's obviously still installing/configuring things. After that is done, the server will reboot once more to perform the final 2 commands (as seen in my snippet). However (here is the problem), it never runs those commands, even though the kickstart is completely done.
So I am guessing I am doing something wrong and might even have messed things up (or got confused while doing so). Maybe someone has an idea, a solution or a completely different approach to tackle and fix this problem (or at least I hope so).
Other things I have tried so far? Well, I tried various ping commands and I also tried nc (netcat), but without a good result either. Every single time I hit a brick wall with the last 2 commands, and it keeps pinging instead of showing that the kickstart is done... I think I have spent several hours (since last week) on this already without getting anywhere.
So I am hoping someone can take a look at this and tell me what I am doing wrong and maybe there is a better approach (other than pinging a server) to see if it's still busy. Maybe a (remote) check on yum, perl or a service, so that the script knows it's still busy.
Sorry for the long post, but I know when I provide as much information as possible including code examples and results, this is more "appreciated". So I am hoping I provided adequate information. If not, let me know. I will try to add as much information as I can. As always I am always willing to learn or change my approach.
Thank you already for reading my post!
As noted in the comments under the question:
The server may already have rebooted by the time ping -c5 -i30 $HWNODEIP finishes. The command sends 5 packets (-c flag), waiting 30 seconds between packets (-i interval flag). That is 4 intervals of 30 seconds, so about 120 seconds in total, which matches the "time 120025ms" in the output above. A server can reboot just fine within 2 minutes, especially if it uses an SSD. So try lowering the total time this command takes to complete.
[ $? -eq 68 ] is probably unnecessary. $HWNODEIP is just ip address, and exit code 68 is for domain name not being resolved, which doesn't apply to IP addresses.
The if statement could be simplified to
if ! ping -c5 -i30 "$HWNODEIP"
These are minor suggestions, probably not bulletproof. As confirmed by the OP in the comments, lowering the interval helps. There are other small improvements that could be made (like quoting variables), but those are outside the scope of the question, so I'll leave them for now.
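One way to put the suggestions together is a short polling helper that first waits for the host to stop answering and then waits for it to answer again. This is a sketch under assumptions: `probe` and `wait_for_state` are hypothetical names, `PROBE_DELAY` is an invented tuning variable, and the ping flags assume Linux iputils. Testing the command directly in `if` also sidesteps a pitfall in the question's code, where each `[ $? -eq … ]` test overwrites `$?`, so only the first comparison ever sees ping's own exit code.

```shell
#!/bin/sh
# probe HOST: hypothetical helper, one short ping (Linux iputils flags assumed).
probe() { ping -c 1 -w 2 "$1" >/dev/null 2>&1; }

# wait_for_state up|down HOST [MAX_TRIES]
# Polls until the host reaches the wanted state.
# Returns 0 once it does, 1 if MAX_TRIES probes pass without that happening.
wait_for_state() {
    want=$1
    host=$2
    tries=${3:-60}
    while [ "$tries" -gt 0 ]; do
        if probe "$host"; then state=up; else state=down; fi
        if [ "$state" = "$want" ]; then
            return 0
        fi
        tries=$((tries - 1))
        sleep "${PROBE_DELAY:-2}"   # short delay keeps the total wait small
    done
    return 1
}
```

For the reboot in the question this might be called as `wait_for_state down "$HWNODEIP" && wait_for_state up "$HWNODEIP"`.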

ssh exec a simple command cost a few seconds

I find that executing a simple command over ssh takes more than one second. Is that normal? If not, how can I speed it up?
[root@ops-test-vm-154:~]# time ssh root@10.17.1.155 'echo "hello,world!"'
hello,world!
real 0m1.805s
user 0m0.009s
sys 0m0.005s
there is low latency between vm-154 and vm-155
[root@ops-test-vm-154:~]# ping 10.17.1.155
PING 10.17.1.155 (10.17.1.155) 56(84) bytes of data.
64 bytes from 10.17.1.155: icmp_seq=1 ttl=64 time=0.142 ms
64 bytes from 10.17.1.155: icmp_seq=2 ttl=64 time=0.136 ms
64 bytes from 10.17.1.155: icmp_seq=3 ttl=64 time=0.129 ms
64 bytes from 10.17.1.155: icmp_seq=4 ttl=64 time=0.110 ms
^C
--- 10.17.1.155 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4421ms
rtt min/avg/max/mdev = 0.110/0.128/0.142/0.014 ms
BTW: I need to check a service's status in real time by executing a script on vm-155, so vm-154 runs ssh vm-155 status.sh every second. But even a simple command such as echo helloworld takes more than one second, so this solution is terrible. I hope to speed it up, or maybe find a better solution.
Best Wishes!
This is the /etc/ssh/sshd_config on vm-155. I added UseDNS no and ran service sshd restart, but it still takes more than one second to echo hello,world!
Protocol 2
SyslogFacility AUTHPRIV
PasswordAuthentication yes
ChallengeResponseAuthentication no
GSSAPIAuthentication yes
GSSAPICleanupCredentials yes
UsePAM yes
AcceptEnv LANG LC_CTYPE LC_NUMERIC LC_TIME LC_COLLATE LC_MONETARY LC_MESSAGES
AcceptEnv LC_PAPER LC_NAME LC_ADDRESS LC_TELEPHONE LC_MEASUREMENT
AcceptEnv LC_IDENTIFICATION LC_ALL LANGUAGE
AcceptEnv XMODIFIERS
X11Forwarding yes
UseDNS no
Subsystem sftp /usr/libexec/openssh/sftp-server
One thing that you could try is to run SSH in verbose mode and see at which stage it wastes the most time.
ssh -vvv root@10.17.1.155 'echo "hello,world!"'
Then, based on your findings, adapt your ssh config file to exclude slow cipher suites and other CPU-intensive things. Some tips about that here.
However, you will not be able to achieve close to real-time performance over ssh if you establish a new connection every time. You could put your script/command into a loop and set the sleep value to 1s.
ssh root@10.17.1.155 'while true; do echo "Hello, world!"; sleep 1s; done'
But I would use something that is designed for this kind of application, like SNMP. Here is an example configuration:
https://www.incredigeek.com/home/snmp-and-shell-script/
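A different way to avoid opening a new connection for every call, not mentioned in the answer above, is OpenSSH connection multiplexing: the first call opens a master connection and later calls reuse it. The sketch below assumes OpenSSH on the client; `mux_ssh`, `MUX_PATH` and `SSH_BIN` are hypothetical names, and the 60-second persistence value is arbitrary.

```shell
#!/bin/sh
# Hypothetical control-socket location; ssh itself expands %r, %h and %p.
MUX_PATH="${MUX_PATH:-$HOME/.ssh/mux-%r@%h:%p}"

# mux_ssh ARGS...: run ssh with connection sharing enabled.
# ControlMaster=auto  reuse an existing master, or become one
# ControlPersist=60   keep the master open 60 s after the last session
mux_ssh() {
    "${SSH_BIN:-ssh}" \
        -o ControlMaster=auto \
        -o ControlPath="$MUX_PATH" \
        -o ControlPersist=60 \
        "$@"
}
```

With this, repeated calls such as `mux_ssh root@10.17.1.155 ./status.sh` should skip most of the handshake after the first one, although how much time that saves depends on where the delay actually comes from.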
One source of delays during the SSH connection process is DNS lookups by the server. When a client connects to the server, the server can optionally look up the IP address of the client to get its hostname. Depending on a variety of issues, the query may take anywhere from a fraction of a second to ten seconds or more to complete.
The most widely deployed SSH server is OpenSSH. The OpenSSH sshd server has a setting named UseDNS which controls whether it performs DNS queries on incoming connections or not:
UseDNS
Specifies whether sshd(8) should look up the remote host name, and to check that the resolved host name for the remote IP address maps back to the very same IP address.
You should check that UseDNS is set to "no" on the server which you're connecting to.
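A quick way to check the setting is to scan the configuration text for the keyword. The helper below is only a sketch: `usedns_setting` is a made-up name, it reads configuration lines on stdin, and it ignores `Match` blocks and included files.

```shell
#!/bin/sh
# usedns_setting: read sshd configuration lines on stdin and print the
# first UseDNS value found (lowercased), or "unset" if there is none.
# Keywords are matched case-insensitively; commented-out lines do not match.
usedns_setting() {
    awk 'tolower($1) == "usedns" { print tolower($2); found = 1; exit }
         END { if (!found) print "unset" }'
}
```

For example, `usedns_setting < /etc/ssh/sshd_config` inspects the file, while `sshd -T | usedns_setting` (usually run as root) should report the effective value, including defaults.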

How can I specify which protocol to use (IPv4 or IPv6) when pinging a website (bash)?

I currently have a shell script which simply takes a URL as an argument and then sends a ping request to it as follows:
ping -c 5 $1
I am required to ping the site using IPv4 and IPv6 where possible, and I will then compare the results. I have read the man page of ping and cannot see a flag which specifies which protocol to use. I was expecting it to accept a flag -4 for IPv4 and -6 for IPv6, but this does not seem to be the case.
I came across the DNS lookup utility dig which looks promising but have not managed to implement it in my code. My script must take a URL as an argument and no other arguments. I hope this is clear and thanks for your help.
Use ping and ping6, which are available in most distributions.
/tmp $ dig google.com A google.com AAAA +short
172.217.4.174
2607:f8b0:4007:801::200e
/tmp $ ping -c 2 172.217.4.174
PING 172.217.4.174 (172.217.4.174): 56 data bytes
64 bytes from 172.217.4.174: icmp_seq=0 ttl=53 time=35.619 ms
64 bytes from 172.217.4.174: icmp_seq=1 ttl=53 time=34.220 ms
/tmp $ ping6 -c 2 2607:f8b0:4007:801::200e
PING6(56=40+8+8 bytes) 2602:306:b826:68a0:f40e:abca:efdb:71f --> 2607:f8b0:4007:801::200e
16 bytes from 2607:f8b0:4007:801::200e, icmp_seq=0 hlim=55 time=77.735 ms
16 bytes from 2607:f8b0:4007:801::200e, icmp_seq=1 hlim=55 time=81.518 ms
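A script that takes a single argument could wrap the two commands roughly as follows. This is a sketch with hypothetical names (`host_from_url`, `ping_family`, and the `PING4`/`PING6` override variables); it assumes separate `ping` and `ping6` binaries as in the answer. Some newer ping implementations accept `-4` and `-6` on a single binary instead, in which case the overrides could point there.

```shell
#!/bin/sh
# host_from_url URL: strip an optional scheme and any path, since ping
# expects a host name rather than a full URL.
host_from_url() {
    h=${1#*://}   # drop "scheme://" if present
    h=${h%%/*}    # drop everything from the first "/"
    printf '%s\n' "$h"
}

# ping_family 4|6 HOST [COUNT]: ping over the requested address family.
ping_family() {
    family=$1
    host=$2
    count=${3:-5}
    case $family in
        4) "${PING4:-ping}" -c "$count" "$host" ;;
        6) "${PING6:-ping6}" -c "$count" "$host" ;;
        *) echo "unknown address family: $family" >&2; return 2 ;;
    esac
}
```

The script body could then be `host=$(host_from_url "$1"); ping_family 4 "$host"; ping_family 6 "$host"`, with the two outputs compared afterwards.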

Kill system process in Thread ruby

How can I kill ping (or any other very long-running system process without a timeout; ping is just a simple example) started in a Ruby Thread?
a = Thread.new do
system 'ping localhost'
end
a.kill
a.exit
a.terminate
while true
sleep 5
p a.alive?
end
Output:=>
PING localhost.localdomain (127.0.0.1) 56(84) bytes of data.
64 bytes from localhost.localdomain (127.0.0.1): icmp_req=1 ttl=64 time=0.023 ms
....
true
64 bytes from localhost.localdomain (127.0.0.1): icmp_req=7 ttl=64 time=0.022 ms
.....
true
......
So I need to stop the ping process started in the Thread, but I don't know how to do it.
system does not give you the pid.
Use Process::spawn instead, and use Process::kill to kill the process using the pid returned by Process::spawn.
For example:
pid = Process.spawn('ping localhost')
sleep 3
Process.kill(:TERM, pid)
Process.wait(pid)

Windows 7 ping general failure

I'm trying to understand the behaviour of the ping command, experimenting on a Windows 7 PC.
On the command prompt, I issued the following command:
ping <some hostname> -l 4096
The output I get is
Pinging <some hostname> [xx.xx.xxx.xx] with 4096 bytes of data:
General failure.
General failure.
General failure.
General failure.
Ping statistics for xx.xx.xxx.xx:
Packets: Sent = 4, Received = 0, Lost = 4 (100% loss),
However, ping <same hostname> -l 32 works just fine.
So my question is: why is the server behaving differently for different packet sizes? Is it related to thwart? Or is my local ping program configured by default so as not to send bigger packets?
Note that the -l flag lets you specify the ping request's buffer size.
Your ping packet is probably larger than the local media's MTU, and it's on a network type where fragmentation isn't allowed. Ethernet IPv6 would be one such configuration.
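To make the size limit concrete, the largest echo payload that fits in one unfragmented packet is the MTU minus the IP and ICMP headers. The sketch below assumes an IPv4 header without options (20 bytes), the fixed IPv6 header (40 bytes) and an 8-byte ICMP echo header; `max_icmp_payload` is a made-up name, and the 1500-byte MTU in the example is the common Ethernet value, not something stated in the question.

```shell
#!/bin/sh
# max_icmp_payload MTU [4|6]: largest ping payload that fits in a single
# packet of the given MTU without fragmentation.
max_icmp_payload() {
    mtu=$1
    family=${2:-4}
    case $family in
        4) ip_header=20 ;;   # IPv4 header without options
        6) ip_header=40 ;;   # fixed IPv6 header
        *) echo "unknown address family: $family" >&2; return 2 ;;
    esac
    # 8 bytes for the ICMP / ICMPv6 echo header
    echo $((mtu - ip_header - 8))
}

max_icmp_payload 1500 4
max_icmp_payload 1500 6
```

Under those assumptions a 4096-byte payload is well above either limit, so it can only be sent if fragmentation is possible on that path.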

Resources