ssh timeout in shell script workaround - shell

I have a script that loops through a list of hosts
/usr/local/bin/ssh -q -o "StrictHostKeyChecking=no" -o "BatchMode=yes" -n ${host} "/usr/bin/uptime" > /dev/null 2>&1
if [ $? -ne 0 ]
then
echo ${host}:FAILED
fi
fi
done
The problem is that when it comes across a host that is offline it hangs for some time.
I am running SSH Tectia Server 4.4.12 on i686-pc-linux-gnu
And the ConnectTimeout option is not available.
Any ideas for a workaround?
Thanks

Perform a task for a given length of time by backgrounding the process with the & operator and then killing its PID:
/usr/local/bin/ssh -q -o "StrictHostKeyChecking=no" -o "BatchMode=yes" -n ${host} "/usr/bin/uptime" & PID="$!"
# Allow 5 seconds to elapse before killing the command
sleep 5
kill "$PID"
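The snippet above always waits the full 5 seconds, even when the host answers immediately. A minimal sketch of a variant that returns as soon as the command finishes is shown below. The helper name run_with_timeout is hypothetical (not part of the original answer), and it assumes a POSIX-style shell.

```shell
#!/bin/sh
# run_with_timeout SECONDS COMMAND [ARGS...]
# Hypothetical helper (not part of the original answer): runs COMMAND and
# sends it SIGTERM if it is still alive after SECONDS.
run_with_timeout() {
    limit=$1
    shift
    "$@" &
    cmd_pid=$!
    # Watchdog: sleep, then kill the command if it is still running.
    # Output is discarded so a lingering sleep never holds a pipe open.
    ( sleep "$limit"; kill "$cmd_pid" ) >/dev/null 2>&1 &
    watchdog_pid=$!
    # Returns the command's status, or 128+15 if the watchdog killed it.
    wait "$cmd_pid"
    rc=$?
    # The command is done (or was killed), so stop the watchdog subshell.
    kill "$watchdog_pid" 2>/dev/null
    wait "$watchdog_pid" 2>/dev/null
    return "$rc"
}
```

Inside the host loop this could be used as: run_with_timeout 5 /usr/local/bin/ssh -q -o "StrictHostKeyChecking=no" -o "BatchMode=yes" -n "${host}" /usr/bin/uptime > /dev/null 2>&1 || echo "${host}:FAILED"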

Related

How can I exit, stop, or kill autossh if the connection times out, or the IP or port does not exist or respond, without using the ssh and sshd config files?

I run autossh in a script for remote port forwarding, and I need the script to exit if the connection times out or the IP or port does not exist or respond, without using the ssh or sshd config files. Is this possible?
I found no answer on the Stack sites or in the autossh man page.
Example 1:
myautossh script
#!/bin/bash
/usr/bin/autossh -NT -o "ExitOnForwardFailure=yes" -R 5555:localhost:443 -l user 1.1.1.1
if [ $? -eq 0 ]; then
echo "SUCCESS" >> errorlog
else
echo "FAIL" >> errorlog
fi
Example 2:
myautossh script
#!/bin/bash
/usr/bin/autossh -f -NT -M 0 -o "ServerAliveInterval=5" -o "ServerAliveCountMax=1" -o "ExitOnForwardFailure=yes" -R 5555:localhost:443 -l user 1.1.1.1 2>> errorlog
if [ $? -eq 0 ]; then
echo "SUCCESS" >> errorlog
else
echo "FAIL" >> errorlog
kill $(ps aux | grep [m]yautossh | awk '{print $2}')
fi
The IP 1.1.1.1 does not exist in my network, so the connection times out, but the script and autossh keep running, as checked with:
ps aux | grep [m]yautossh
or
ps x | grep [a]utossh
I can only terminate the script with Ctrl+C.
I want to run autossh in a script, try to connect to a nonexistent IP or port, and then terminate the autossh process so that my script can continue, without touching the ssh and sshd config files, using only autossh options/commands and -f for backgrounding. Is this possible?
Using timeout with --preserve-status is what you need.
timeout allows you to run a command with a time limit.
On the exit status: by default, timeout returns 124 when the time limit is reached. With --preserve-status it returns the exit status of the managed command instead, even on timeout (for a command ended by the default SIGTERM, that is 128+15 = 143).
The following terminates the command after 2 seconds and returns its exit status. A nonzero status means the command did not succeed, i.e. no successful connection could be established:
#!/bin/bash
timeout --preserve-status 2 /usr/bin/autossh -NT -o "ExitOnForwardFailure=yes" -R 33333:localhost:443 -l user 1.1.1.1
if [ $? -eq 0 ]; then
echo "Connection success"
else
echo "Connection fail"
fi
https://linuxize.com/post/timeout-command-in-linux/
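To make the two exit-status modes concrete, here is a small sketch that assumes GNU coreutils timeout. sleep stands in for a command that hangs, and the variable names are illustrative only.

```shell
#!/bin/sh
# Assumes GNU coreutils timeout; sleep stands in for a command that hangs.

# Default behaviour: 124 means "the time limit was reached".
default_status=0
timeout 1 sleep 5 || default_status=$?

# With --preserve-status, the command's own status is returned instead;
# for a process ended by SIGTERM (signal 15) that is 128 + 15.
preserved_status=0
timeout --preserve-status 1 sleep 5 || preserved_status=$?

echo "default: $default_status, preserved: $preserved_status"
```

If the script needs to tell "timed out" apart from "command failed on its own", plain timeout and a check for 124 is probably the easier of the two to work with.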

Using GNU timeout with SSH -t in a bash script to prevent hanging

I have a script that ssh's to some servers. Sometimes an unexpected problem causes ssh to hang indefinitely. I want to avoid this by killing ssh if it runs too long.
I'm also using a wrapper function for input redirection. I need to force a tty with the -t flag to make a process on the server happy.
function _redirect {
if [ "$DEBUG" -eq 0 ]; then
$* 1> /dev/null 2>&1
else
$*
fi
return $?
exit
}
SSH_CMD="ssh -t -o BatchMode=yes -l robot"
SERVER="192.168.1.2"
ssh_script=$(cat <<EOF
sudo flock -w 60 -n /path/to/lock -c /path/to/some_golang_binary
EOF
)
_redirect timeout 1m $SSH_CMD $SERVER "($ssh_script)"
The result is a timeout with this message printed:
tcsetattr: Interrupted system call
The expected result is either the output of the remote shell command, or a timeout and proper exit code.
When I type
timeout 1m ssh -t -o BatchMode=yes -l robot 192.168.1.2 \
"(sudo flock -w 60 -n /path/to/lock -c /path/to/some_golang_binary)" \
1> /dev/null
I get the expected result.
I suspect these two things:
1) The interaction between GNU timeout and ssh is causing the tcsetattr system call to take a very long time (or hang), then timeout sends a SIGTERM to interrupt it and it prints that message. There is no other output because this call is one of the first things done. I wonder if timeout launches ssh in a child process that cannot have a terminal, then uses its main process to count time and kill its child.
I looked here for the reasons this call can fail.
2) _redirect needs a different one of $@, $*, "$@", "$*" etc. Some bad escaping/param munging breaks the arguments to timeout, which causes this tcsetattr error. Trying various combinations of these has not yet solved the problem.
What fixed this was adding the --foreground flag to timeout.
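As far as I understand it, timeout by default runs the command in a separate background process group, so a command that wants the terminal (such as ssh -t) cannot use it. GNU timeout documents --foreground as allowing the command to read from the TTY and receive TTY signals when timeout is not run directly from a shell prompt, with the caveat that children of the command are then not timed out. The sketch below combines that flag with a wrapper that forwards its arguments as "$@" so quoted arguments stay intact. The DEBUG default of 0 is an assumption, and the ssh line is only shown as a comment.

```shell
#!/bin/sh
# Assumption: DEBUG defaults to 0 (quiet) when it is not set.
DEBUG=${DEBUG:-0}

# Run a command, discarding its output unless DEBUG is non-zero.
# "$@" preserves each argument exactly as passed, unlike unquoted $*.
_redirect() {
    if [ "$DEBUG" -eq 0 ]; then
        "$@" >/dev/null 2>&1
    else
        "$@"
    fi
}

# Usage with the command from the question (not executed here):
# _redirect timeout --foreground 1m ssh -t -o BatchMode=yes -l robot 192.168.1.2 "($ssh_script)"
```

The function needs no explicit return: its exit status is that of the last command run, so a timeout still surfaces as 124.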

Unable to kill remote processes with ssh

I need to kill remote processes with a shell script as follows:
#!/bin/bash
ip="172.24.63.41"
user="mag"
timeout 10s ssh -q $user@$ip exit
if [ $? -eq 124 ]
then
echo "can not connect to $ip, timeout out."
else
echo "connected, executing commands"
scp a.txt $user@$ip://home/mag
ssh -o ConnectTimeout=10 $user@$ip > /dev/null 2>&1 << remoteCmd
touch b.txt
jobPid=`jps -l | grep jobserver | awk '{print $1}'`
if [ ! $jobPid == "" ]; then
kill -9 $jobPid
fi
exit
remoteCmd
echo "commands executed."
fi
After running it, I found that the scp and touch commands had been executed, but the kill had not succeeded and the process was still there. If I run the lines from "jobPid= ..." to "fi" on the remote machine, the process is killed. How can I fix it?
I put a script on the remote machine that finds and kills the process, then ran a script on the local machine that executes the remote script over ssh. The scripts are as follows:
Local script:
#!/bin/bash
ip="172.24.63.41"
user="mag"
timeout 10s ssh -q $user@$ip exit
if [ $? -eq 124 ]
then
echo "can not connect to $ip, timeout out."
else
echo "connected, executing commands"
ssh -q $user@$ip "/home/mag/local.sh"
echo "commands executed."
fi
remote script:
#!/bin/bash
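# Collect the PID(s) of the jobserver JVM as reported by jps
jobPid=$(jps -l | grep jobserver | awk '{print $1}')
# Kill only if at least one PID was found
if [ -n "$jobPid" ]; then
kill -9 $jobPid
fi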
Your script needs root access (which is never a good idea). Alternatively, make sure the program you want to kill is running under your webuser/group.
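Another likely cause in the original one-script version is the unquoted here-document delimiter (<< remoteCmd). With an unquoted delimiter, the local shell expands $jobPid and the backticks before the text is ever sent to ssh, so the remote side probably sees an empty PID. Quoting the delimiter (<< 'remoteCmd') sends the text verbatim. The sketch below shows the difference, using a local sh as a stand-in for the remote shell. The variable names are illustrative.

```shell
#!/bin/sh
marker="local"

# Unquoted delimiter: $marker is expanded by THIS shell before the text
# is handed to the child shell (which plays the role of the remote side).
unquoted=$(sh <<EOF
marker="remote"
echo "$marker"
EOF
)

# Quoted delimiter: the text is passed verbatim, so the child shell
# expands $marker itself, after its own assignment has run.
quoted=$(sh <<'EOF'
marker="remote"
echo "$marker"
EOF
)

echo "unquoted: $unquoted"
echo "quoted: $quoted"
```

Applied to the question, that would mean writing ssh ... << 'remoteCmd' so that jps and the kill test are evaluated on the remote machine.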

nohup doesn't work when used with double-ampersand (&&) instead of semicolon (;)

I have a script that uses ssh to login to a remote machine, cd to a particular directory, and then start a daemon. The original script looks like this:
ssh server "cd /tmp/path ; nohup java server 0</dev/null 1>server_stdout 2>server_stderr &"
This script appears to work fine. However, it is not robust to the case when the user enters the wrong path so the cd fails. Because of the ;, this command will try to run the nohup command even if the cd fails.
The obvious fix doesn't work:
ssh server "cd /tmp/path && nohup java server 0</dev/null 1>server_stdout 2>server_stderr &"
That is, the SSH command does not return until the server is stopped. Putting nohup in front of the cd instead of in front of the java didn't work either.
Can anyone help me fix this? Can you explain why this solution doesn't work? Thanks!
Edit: cbuckley suggests using sh -c, from which I derived:
ssh server "nohup sh -c 'cd /tmp/path && java server 0</dev/null 1>master_stdout 2>master_stderr' 2>/dev/null 1>/dev/null &"
However, now the exit code is always 0 when the cd fails; whereas if I do ssh server cd /failed/path then I get a real exit code. Suggestions?
See Bash's Operator Precedence.
The & applies to the whole statement because it has lower precedence than &&: the list operators && and || bind more tightly than ; and &. You don't need ssh to verify this. Just run this in your shell:
$ sleep 100 && echo yay &
[1] 19934
If the & were only attached to the echo yay, then your shell would sleep for 100 seconds and then report the background job. However, the entire sleep 100 && echo yay is backgrounded and you're given the job notification immediately. Running jobs will show it hanging out:
$ sleep 100 && echo yay &
[1] 20124
$ jobs
[1]+ Running sleep 100 && echo yay &
You can use parentheses to create a subshell around echo yay &, giving you what you'd expect:
sleep 100 && ( echo yay & )
This would be similar to using bash -c to run echo yay &:
sleep 100 && bash -c "echo yay &"
Tossing these into an ssh, we get:
# using parentheses...
$ ssh localhost "cd / && (nohup sleep 100 >/dev/null </dev/null &)"
$ ps -ef | grep sleep
me 20136 1 0 16:48 ? 00:00:00 sleep 100
# and using `bash -c`
$ ssh localhost "cd / && bash -c 'nohup sleep 100 >/dev/null </dev/null &'"
$ ps -ef | grep sleep
me 20145 1 0 16:48 ? 00:00:00 sleep 100
Applying this to your command, we get:
ssh server "cd /tmp/path && (nohup java server 0</dev/null 1>server_stdout 2>server_stderr &)"
or:
ssh server "cd /tmp/path && bash -c 'nohup java server 0</dev/null 1>server_stdout 2>server_stderr &'"
Also, with regard to your comment on the post,
"Right, sh -c always returns 0. E.g., sh -c exit 1 has error code 0"
this is incorrect. Directly from the manpage:
Bash's exit status is the exit status of the last command executed in
the script. If no commands are executed, the exit status is 0.
Indeed:
$ bash -c "true ; exit 1"
$ echo $?
1
$ bash -c "false ; exit 22"
$ echo $?
22
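The example in the quoted comment does return 0, but for a different reason: in sh -c exit 1 only the word exit is the command string, and 1 becomes $0, so the shell runs a bare exit. Quoting the command string gives the expected status. Separately, a command list that ends in & reports 0 as soon as it is backgrounded, which is probably why the nohup sh -c '...' & variant in the question always returned 0. A small sketch of all three cases follows (the variable names are illustrative):

```shell
#!/bin/sh
# Case 1: the command string is just "exit"; the 1 becomes $0.
status_unquoted=0
sh -c exit 1 || status_unquoted=$?

# Case 2: the whole "exit 1" is the command string.
status_quoted=0
sh -c 'exit 1' || status_quoted=$?

# Case 3: a backgrounded command reports 0 regardless of its own result.
status_background=0
sh -c 'false &' || status_background=$?

echo "unquoted=$status_unquoted quoted=$status_quoted background=$status_background"
```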
ssh server "test -d /tmp/path" && ssh server "nohup ... &"
Answer roundup:
Bad: Using sh -c to wrap the entire nohup command doesn't work for my purposes because it doesn't return error codes. (@cbuckley)
Okay: ssh <server> <cmd1> && ssh <server> <cmd2> works but is much slower (@joachim-nilsson)
Good: Create a shell script on <server> that runs the commands in succession and returns the correct error code.
The last is what I ended up using. I'd still be interested in learning why the original use-case doesn't work, if someone who understands shell internals can explain it to me!

Kill a process generated by for loop after certain time

I have some code that tends to hang randomly inside its 'for loop'. I'm looking for a solution that will automatically kill the ssh session's PID if it has existed for 5 seconds. I'm killing the hung processes manually right now, but I want to put this in cron, so automatic PID killing would be awesome.
for host in `cat $WORKDIR/linux_hosts.txt $WORKDIR/aix_hosts.txt`
do
ssh -o LogLevel=QUIET -o ConnectTimeout=2 -t $host "cat /etc/passwd" >> $FILEDIR/$host
done
Thanks for the help!
Run all the ssh processes in the background, then wait 5 seconds. Once sleep returns, use jobs -p to get the process IDs of any background jobs still running, and kill them.
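# Feed the loop via process substitution: a "cat ... | while" pipeline would run the loop
# in a subshell, and its background jobs would be invisible to jobs -p below.
while read -r host; do
ssh -o LogLevel=QUIET -o ConnectTimeout=2 -t "$host" "cat /etc/passwd" >> "$FILEDIR/$host" &
done < <(cat "$WORKDIR"/{linux_hosts.txt,aix_hosts.txt})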
sleep 5
kill $(jobs -p) 2>/dev/null
Use timeout:
for host in `cat $WORKDIR/linux_hosts.txt $WORKDIR/aix_hosts.txt`
do
timeout 5s ssh -o LogLevel=QUIET -o ConnectTimeout=2 -t $host \
"cat /etc/passwd" >> $FILEDIR/$host
done
This will find and kill all ssh processes older than 5 minutes:
cd /proc
kill $(find $(pidof ssh) -maxdepth 0 -mmin +5)
