How to check whether remote servers can ping each other using a script? - bash

I need to write a script which checks whether 3 servers can ping each other. I run the script on my local Linux host.
Here is what I plan to do:
ssh root@10.238.155.155 "ping -c 1 10.20.77.1"
echo $?
0
In the above example, 10.238.155.155 is one server; the command logs in to this server and pings 10.20.77.1, which is an interface on another server.
Then I check the command's return value, $?; if it is 0, it means the ping is good.
ssh root@10.238.155.155 "ping -c 1 10.20.77.9"
echo $?
1
In this example, 10.20.77.9 does not exist, so we can see that $? is 1.
My script basically repeats this: SSH to each server, ping the other servers, and check $?.
Do you think this is a reliable solution?

With echo $? you are checking the return code of ssh, which is not necessarily what you want (ssh does pass through the remote command's exit status, but it also returns 255 for its own connection errors).
Try capturing the output of the compound ssh command in a variable, then parsing that variable as needed:
myvar=$(ssh root@place.com "ping -c 5 server1.place.com")
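For completeness, here is a minimal sketch of the loop described in the question, assuming key-based (password-less) root SSH to each server and Linux iputils ping on the remote side; the server list and flags are illustrative:

#!/bin/bash
# Illustrative server list: each server sshes in and pings every other address.
servers=(10.238.155.155 10.20.77.1 10.238.155.157)

fail=0
for src in "${servers[@]}"; do
    for dst in "${servers[@]}"; do
        [ "$src" = "$dst" ] && continue
        # BatchMode avoids hanging on a password prompt; ConnectTimeout bounds dead hosts.
        ssh -o BatchMode=yes -o ConnectTimeout=5 "root@$src" "ping -c 1 -W 2 $dst" >/dev/null 2>&1
        rc=$?
        if [ "$rc" -eq 0 ]; then
            echo "OK:   $src -> $dst"
        else
            # rc=255 means ssh itself failed; any other non-zero code came from the remote ping.
            echo "FAIL: $src -> $dst (exit $rc)"
            fail=1
        fi
    done
done
exit $fail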

Related

Check whether ssh is possible inside a shell script?

I need to print a count from a remote server, which is written in '/REMOTE_DIR/DR_count'. But that remote server is not very reliable due to network or OS failures, so I need to print the DR_count value from the local machine if the remote machine is not available. Here is my logic. Please correct me on how to write that if condition the right way. I'm running this script on Solaris 11.3.
#!/bin/sh
if [check whether ssh user@host_name is possible]
then
op="cat /REMOTE_DIR/DR_count"
cmd="ssh user#host_name $op"
drlog=`$cmd`
else
drlog=`cat /LOCAL_DIR/DR_count`
fi
echo $drlog
As I said in my comment, I would simply try to ssh, and use its exit code to see whether it worked:
ssh -o ConnectTimeout=5 user@host_name cat /REMOTE_DIR/DR_count 2>/dev/null || cat /LOCAL_DIR/DR_count
You should check for exit code 255 to detect whether you have a network (or other SSH) error:
#!/bin/bash
#EXIT STATUS
# ssh exits with the exit status of the remote command or with 255 if an error occurred.
cnt=`ssh -o ConnectTimeout=5 root@$host "cat /REMOTE_DIR/DR_count"`
exit_code=$?
if [ $exit_code -eq 255 ]; then
    cnt=`cat /LOCAL_DIR/DR_count`
fi
It also makes sense to check for other (non-0/255) exit codes to catch possible issues on the remote side (such as the file missing on the remote host).
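As a sketch of that, building on the snippet above (falling back to the local copy on any remote-side failure is one possible choice, not something prescribed here):

cnt=`ssh -o ConnectTimeout=5 root@$host "cat /REMOTE_DIR/DR_count"`
exit_code=$?
case $exit_code in
    0) ;;                                  # remote cat succeeded, cnt is valid
    255) cnt=`cat /LOCAL_DIR/DR_count` ;;  # ssh/network failure: use the local copy
    *) echo "remote command failed with code $exit_code (file missing?)" >&2
       cnt=`cat /LOCAL_DIR/DR_count` ;;    # remote-side problem: also fall back
esac
echo $cnt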

Optimistic way to test port before executing sftp

I have a bash script which does very plain sftp to transfer data to the production and UAT servers. See my code below.
if [ `ls -1 ${inputPath}|wc -l` -gt 0 ]; then
sh -x wipprod.sh >> ${sftpProdLog}
sh -x wipdev.sh >> ${sftpDevLog}
sh -x wipdevone.sh >> ${sftpDevoneLog}
fi
Sometimes the UAT server goes down. In those cases the number of hung scripts keeps growing, and if it reaches the user's maximum number of processes, the other scripts are affected as well. So I am thinking that before executing each of the above scripts I should test whether port 22 is reachable on the destination server, and only then run the script.
Is this the right way? If yes, what is the best way to do that? If no, what is the best approach to avoid unnecessary sftp connections when the destination is not available? Thanks in advance.
Use sftp in batch mode together with the ConnectTimeout option explicitly set; sftp will then take care of the up/down detection by itself.
Note that ConnectTimeout should be somewhat higher if your network is slow.
Then put the sftp commands into your wip*.sh backup scripts.
If the UAT host is up:
[localuser@localhost tmp]$ sftp -b - -o ConnectTimeout=1 remoteuser@this_host_is_up <<<"put testfile.xml /tmp/"; echo $?
sftp> put testfile.xml /tmp/
Uploading testfile.xml to /tmp/testfile.xml
0
The file is uploaded and sftp exits with exit code 0.
If the UAT host is down, sftp exits within 1 second with exit code 255.
[localuser@localhost tmp]$ sftp -b - -o ConnectTimeout=1 remoteuser@this_host_is_down <<<"put testfile.xml /tmp/"; echo $?
ssh: connect to host this_host_is_down port 22: Connection timed out
Couldn't read packet: Connection reset by peer
255
It sounds reasonable: if the server is inaccessible you want to report an error immediately rather than block.
The question is why the SFTP command blocks at all if the server is unavailable. If the server is down, I'd expect the port open to fail almost immediately, and you need only detect that the SFTP copy has failed and abort early.
If you want to detect a closed port in bash, you can simply ask bash to connect to it directly, for example:
(echo "" > /dev/tcp/remote-host/22) 2>/dev/null || echo "failed"
This will open the port and immediately close it, and report a failure if the port is closed.
On the other hand, if the server is inaccessible because the port is blocked (by a firewall or something that drops all packets), then it makes sense for your process to hang, and the basic TCP test above will also hang.
Again, this is something that should probably be handled by your SFTP remote copy using a timeout parameter, as suggested in the comments, but a bash script to detect a blocked port is also doable and will probably look something like this:
(
    (echo "" > /dev/tcp/remote-host/22) &
    pid=$!
    timeout=3
    while kill -0 $pid 2>/dev/null; do
        sleep 1
        timeout=$(( $timeout - 1 ))
        [ "$timeout" -le 0 ] && kill $pid && exit 1
    done
    # pick up the exit status of the background connection attempt (e.g. connection refused)
    wait $pid
) || echo "failed"
(I'm going to ignore the ls ...|wc business, other than to say something like find and xargs --no-run-if-empty are generally more robust if you have GNU find, or possibly AIX has an equivalent.)
You can perform a runtime connectivity check: OpenSSH comes with ssh-keyscan to quickly probe an SSH server port and dump the public key(s), but sadly it doesn't provide a usable exit code, which leaves parsing its output as a messy solution.
Instead you can do a basic check with a bash one-liner:
read -t 2 banner < /dev/tcp/127.0.0.1/22
where /dev/tcp/127.0.0.1/22 (or /dev/tcp/hostname/ssh) indicates the host and port to connect to.
This relies on the fact that the SSH server will return an identifying banner terminated with CRLF. Feel free to inspect $banner. If the connection times out, read is interrupted by SIGALRM (exit code 142); a refused connection results in exit code 1.
(Support for /dev/tcp and network redirection is enabled by default since before bash-2.05, though it can be disabled explicitly with --disable-net-redirections or with --enable-minimal-config at build time.)
To prevent such problems, an alternative is to set a timeout: with any of the ssh, scp or sftp commands you can set a connection timeout with the option -o ConnectTimeout=15, or, implicitly, via ~/.ssh/config:
Host 1.2.3.4 myserver1
    ConnectTimeout 15
The commands will return non-zero on timeout (though the three commands may not all return the same exit code on timeout). See this related question: how to make SSH command execution to timeout
Finally, if you have GNU parallel you may use its sem command to limit concurrency to prevent this kind of problem, see https://unix.stackexchange.com/questions/168978/limit-maximum-number-of-concurrent-scp-processes-running-on-a-host .
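Tying this back to the original snippet, one possible way to guard each wip*.sh call is a quick, time-bounded /dev/tcp probe before the transfer. This is only a sketch: the host names are placeholders, and timeout is the GNU coreutils utility:

# Returns 0 if TCP port $2 on host $1 accepts a connection within 5 seconds.
port_open() {
    timeout 5 bash -c "echo > /dev/tcp/$1/$2" 2>/dev/null
}

if [ `ls -1 ${inputPath} | wc -l` -gt 0 ]; then
    port_open prod.example.com 22 && sh -x wipprod.sh   >> ${sftpProdLog}
    port_open uat.example.com  22 && sh -x wipdev.sh    >> ${sftpDevLog}
    port_open uat2.example.com 22 && sh -x wipdevone.sh >> ${sftpDevoneLog}
fi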

Bash Script Quits After Exiting SSH

I'm trying to write a Bash script that logs into 2 different Linux-based power strips (Ubiquiti mPower Pros) and turns 2 different lights off (one on each strip). To do this I log in to the 1st strip, change the appropriate file to 0 (thus turning off the light), and exit, repeating the same process on the next power strip. However, after I exit the first SSH connection, the script stops working. Could someone please suggest a fix? My only idea would be to encase this script in a Python program. Here's my code:
#!/bin/bash
ssh User@192.168.0.100
echo "0" > /proc/power/relay1
exit
# hits the enter key
cat <(echo "") | <command>
ssh User@192.168.0.103
echo "logged in"
echo "0" > /proc/power/relay1
exit
cat <(echo "") | <command>
ssh as an app BLOCKS while it's running; the echo and exit are executed by the local shell, not by the remote machine. So you are doing:
ssh to remote machine
exit remote shell
echo locally
exit locally
and boom, your script is dead. If that echo/exit is supposed to be run on the remote system, then you should be doing:
ssh user@host command
              ^^^^^^^---executed on the remote machine
e.g.
ssh foo@bar 'echo ... ; exit'
The commands you're apparently trying to run through ssh are actually being executed locally. You can just pass the command you want to run to ssh and it will run it remotely (without needing an explicit exit):
ssh User@192.168.0.110 'echo "0" > /proc/power/relay1'
will do that, and similarly for the other ssh command.
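A minimal sketch of the whole corrected script, assuming the two strip addresses from the question and non-interactive (key-based) logins:

#!/bin/bash
# Run the relay-off command on each power strip; ssh returns when the remote command finishes.
for strip in 192.168.0.100 192.168.0.103; do
    ssh "User@$strip" 'echo "0" > /proc/power/relay1' \
        && echo "relay1 turned off on $strip" \
        || echo "failed to reach $strip" >&2
done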

How to write a script to detect the ssh connection to a series of remote machines?

I am working on an HPC system with many nodes. Using interactive qsub, I can log on to one of these nodes. When doing parallel computing, I have to find out how many nodes are currently connectable and configure my program accordingly, because nodes often break down.
For example, the node names are bh001, bh002, bh003, ..., and
ssh bh001
will log on to node bh001.
So how do I write a script to detect the ssh connection to this series of nodes? I want the script to give a list of currently connectable nodes as a txt file.
You could do something like this:
ping -c1 $server &>/dev/null && echo $server
That is, try to send 1 ping to $server, and if successful, print it, otherwise print nothing.
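A minimal sketch that applies this to the whole series and writes the reachable node names to a text file; the node count (64) and file name are assumptions, and note that answering a ping does not strictly guarantee that ssh will work:

#!/bin/bash
# List the nodes that answer a single ping in nodes_up.txt.
for server in $(seq -f 'bh%03g' 1 64); do
    ping -c1 -W1 "$server" &>/dev/null && echo "$server"
done > nodes_up.txt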
I could be wrong, but I have a feeling your system must have a standard way to get the list of nodes that are alive. Look in your manuals. It's an obvious feature, it must exist.
We are using the 'nc' command to check the ssh port on our CentOS/RedHat-based Rocks clusters.
Normally the nc package is available on the DVD or in the default repository.
#!/bin/bash
IP=192.168.56.1
PORT=22
nc -z $IP $PORT &> /dev/null
if [ $? -eq 0 ]; then
    echo "$IP is connected"
    ## Do stuff for success
else
    echo "$IP is not reachable"
    ## Do stuff for fail
fi
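Applied to the node series from the earlier question, the same nc test can generate the list directly (again a sketch; the node range and output file name are assumptions):

#!/bin/bash
# Write every node whose SSH port (22) answers within 2 seconds to connectable_nodes.txt.
for node in $(seq -f 'bh%03g' 1 64); do
    nc -z -w 2 "$node" 22 &> /dev/null && echo "$node"
done > connectable_nodes.txt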

Telnet inside a shell script

How can I run telnet inside a shell script and execute commands on the remote server?
I do not have expect installed on my Solaris machine because of security reasons. I also do not have the Perl Net::Telnet module installed.
So, without using expect or Perl, how can I do it?
I tried the following, but it's not working:
#!/usr/bin/sh
telnet 172.16.69.116 <<!
user
password
ls
exit
!
When I execute it, this is what I am getting:
> cat tel.sh
telnet 172.16.69.116 <<EOF
xxxxxx
xxxxxxxxx
ls
exit
EOF
> tel.sh
Trying 172.16.69.116...
Connected to 172.16.69.116.
Escape character is '^]'.
Connection to 172.16.69.116 closed by foreign host.
>
Some of your commands might be discarded. You can achieve finer control with ordinary script constructs, sending the required commands through a pipe with echo. Group the list of commands to make one "session":
{
    sleep 5
    echo user
    sleep 3
    echo password
    sleep 3
    echo ls
    sleep 5
    echo exit
} | telnet 172.16.65.209
I had the same issue. However, at least in my environment, it turned out that the SSL certificate on the destination server was corrupted in some way, and the server team took care of the issue.
Now what I'm trying to figure out is how to get a script that runs the exact same thing you're doing above, except I want it to dump the exact same scenario above into a file, and then, when it encounters a server to which it actually connects, provide the escape character (^]) and go on to the next server.
