Using GNU timeout with SSH -t in a bash script to prevent hanging

I have a script that SSHes to some servers. Sometimes an unexpected problem causes ssh to hang indefinitely, and I want to avoid this by killing ssh if it runs too long.
I'm also using a wrapper function for output redirection, and I need to force a tty with the -t flag to make a process on the server happy.
function _redirect {
    if [ "$DEBUG" -eq 0 ]; then
        $* 1> /dev/null 2>&1
    else
        $*
    fi
    return $?
    exit
}
SSH_CMD="ssh -t -o BatchMode=yes -l robot"
SERVER="192.168.1.2"
ssh_script=$(cat <<EOF
sudo flock -w 60 -n /path/to/lock -c /path/to/some_golang_binary
EOF
)
_redirect timeout 1m $SSH_CMD $SERVER "($ssh_script)"
The result is a timeout with this message printed:
tcsetattr: Interrupted system call
The expected result is either the output of the remote shell command, or a timeout and proper exit code.
When I type
timeout 1m ssh -t -o BatchMode=yes -l robot 192.168.1.2 \
    "(sudo flock -w 60 -n /path/to/lock -c /path/to/some_golang_binary)" \
    1> /dev/null
I get the expected result.
I suspect one of two things:
1) The interaction between GNU timeout and ssh causes the tcsetattr system call to take a very long time (or hang); timeout then sends a SIGTERM to interrupt it, and that message is printed. There is no other output because this call is one of the first things ssh does. I wonder if timeout launches ssh in a child process that cannot have a terminal, then uses its main process to count time and kill its child.
I looked here for the reasons this call can fail.
2) _redirect needs a different one of $@, $*, "$@", "$*", etc. Some bad escaping/parameter munging breaks the arguments passed to timeout, which causes this tcsetattr error. Trying various combinations of these has not yet solved the problem.

What fixed this was the --foreground flag to timeout: by default, GNU timeout runs the managed command in a separate background process group, which keeps ssh -t from controlling the tty; --foreground leaves the command in the foreground with access to the terminal.
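With that flag, the call from above becomes (same wrapper and variables; timeout still returns 124 if the limit is hit):

_redirect timeout --foreground 1m $SSH_CMD $SERVER "($ssh_script)"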

Related

Can't terminate command from a different process

I have a command "command1" that runs indefinitely (must be killed with Ctrl+c), and that at random intervals outputs new lines to stdout. My goal is to run it and see if it outputs a certain "target" line within 10 seconds. If the target output is generated, stop immediately with success, otherwise wait for the 10 seconds and fail.
I came up with this:
timeout 10 bash -c '(while read line; do [[ "$line" == "target" ]] && break; done < <(command1))'
It works, but the problem is that when a match is found, although the timeout command completes and returns successfully, command1 will continue to run indefinitely as a background process. I need it to stop as well when "break" is executed. If a match is not found, and the timeout expires, command1 is stopped correctly.
I also tried this:
timeout 10 bash -c '(command1 | while read line; do [[ "$line" == "target" ]] && exit; done)'
Which does not leave any spurious processes running. The problem is that the exit command does not terminate command1 since it is in a separate process, and the timeout always expires even if the target is found before.
I was exploring some alternative options, such as wait -n, but the same problem persists, and I must use bash 4.2, so wait -n isn't even an option.
Any suggestions would be greatly appreciated.
When command1 does not terminate itself, you can kill it manually.
By the way: Instead of while read ... you can use grep.
timeout 10 bash -c 'command1 | (grep -m1 -Fx "target"; pkill -P $PPID command1)'
-P $PPID ensures that only the command1 from this command is killed, and not some other command1 that might run in another shell at the same time.
This assumes that command1 is a single command, and not something like (cmd1; cmd2; ...). For that case, you could simply kill the whole bash process using kill $PPID.
Found what works best for my case:
timeout 10 bash -c 'grep -q -m1 "target" <(command1); pkill -P $!'
All processes terminate gracefully when either the target is found or the timeout expires. If the target is found, the command returns 0; if not, it returns 124.
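For scripting around it, a sketch of branching on that status (command1 is still a placeholder; 124 is GNU timeout's exit code for a hit limit):

if timeout 10 bash -c 'grep -q -m1 "target" <(command1); pkill -P $!'; then
    echo "target seen within 10 seconds"
else
    echo "timed out waiting for target"
fi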
Thank you @Socowi for some very helpful hints that put me on the right track.

ssh timeout in shell script workaround

I have a script that loops through a list of hosts:
for host in ${HOSTS}
do
    /usr/local/bin/ssh -q -o "StrictHostKeyChecking=no" -o "BatchMode=yes" -n ${host} "/usr/bin/uptime" > /dev/null 2>&1
    if [ $? -ne 0 ]
    then
        echo ${host}:FAILED
    fi
done
The problem is that when it comes across a host that is offline it hangs for some time.
I am running SSH Tectia Server 4.4.12 on i686-pc-linux-gnu
And the ConnectTimeout option is not available.
Any ideas for a workaround?
Thanks
Perform the task for a given length of time by using the & operator to background the process, then killing its PID:
/usr/local/bin/ssh -q -o "StrictHostKeyChecking=no" -o "BatchMode=yes" -n ${host} "/usr/bin/uptime" & PID="$!"
# Allow 5 seconds to elapse before killing the command
sleep 5
kill "$PID"

Terminal Application to Keep Web Server Process Alive

Is there an app that can, given a command and options, execute for the lifetime of the process and ping a given URL indefinitely on a specific interval?
If not, could this be done on the terminal as a bash script? I'm almost positive it's doable through terminal, but am not fluent enough to whip it up within a few minutes.
Found this post that has a portion of the solution, minus the ping bits. ping runs indefinitely on Linux until it is actively killed. How would I kill it from bash after, say, two pings?
General Script
As others have suggested, use this in pseudo code:
execute command and save PID
while PID is active, ping and sleep
exit
This results in following script:
#!/bin/bash
# execute command, use '&' at the end to run in background
<command here> &
# store pid
pid=$!
while ps | awk '{ print $1 }' | grep $pid; do
    ping <address here>
    sleep <timeout here in seconds>
done
Note that the stuff inside <> should be replaced with actual content, be it a command or an IP address.
Break from Loop
To answer your second question: that depends on the loop. In the loop above, simply track the loop count using a variable: add ((count++)) inside the loop, then do [[ $count -eq 2 ]] && break. Now the loop will break on the second ping.
Something like this:
...
while ...; do
...
((count++))
[[ $count -eq 2 ]] && break
done
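Filled in against the earlier loop, that might look like this (placeholders as before, using ping -c 1 from the next section so each iteration sends a single ping):

count=0
while ps | awk '{ print $1 }' | grep $pid; do
    ping -c 1 <address here>
    ((count++))
    [[ $count -eq 2 ]] && break
    sleep <timeout here in seconds>
done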
Ping twice
To ping only a few times, use the -c option:
ping -c <count here> <address here>
Example:
ping -c 2 www.google.com
Use man ping for more information.
Better practice
As hek2mgl noted in a comment below, the current solution may not suffice to solve the problem: while it answers the question, the core problem persists. To address it, a cron job is suggested in which a simple wget or curl HTTP request is sent periodically. This results in a fairly simple script containing but one line:
#!/bin/bash
curl <address here> > /dev/null 2>&1
This script can be added as a cron job. Leave a comment if you would like more information on setting up such a scheduled job. Special thanks to hek2mgl for analyzing the problem and suggesting a sound solution.
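For instance, a crontab entry that runs the script every five minutes could look like this (the script path is a hypothetical placeholder):

*/5 * * * * /usr/local/bin/keepalive.sh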
Say you want to start a download with wget and while it is running, ping the url:
wget http://example.com/large_file.tgz &  # put in background
pid=$!
while kill -s 0 $pid  # test if process is running
do
    ping -c 1 127.0.0.1  # ping your address once
    sleep 5              # and sleep for 5 seconds
done
A nice little generic utility for this is Daemonize. Its relevant options:
Usage: daemonize [OPTIONS] path [arg] ...
-c <dir> # Set daemon's working directory to <dir>.
-E var=value # Pass environment setting to daemon. May appear multiple times.
-p <pidfile> # Save PID to <pidfile>.
-u <user> # Run daemon as user <user>. Requires invocation as root.
-l <lockfile> # Single-instance checking using lockfile <lockfile>.
Here's an example of starting/killing in use: flickd
To get more sophisticated, you could turn your ping script into a systemd service, now standard on many recent Linuxes.
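As a sketch, a minimal unit file (all names and paths hypothetical), saved as /etc/systemd/system/pinger.service:

[Unit]
Description=Keep-alive pinger

[Service]
ExecStart=/usr/local/bin/keepalive.sh
Restart=always
RestartSec=60

[Install]
WantedBy=multi-user.target

Enable it with systemctl enable --now pinger.service.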

how to timeout a linux script

The coreutils timeout and the other timeout scripts I searched for apply to a single command,
but I'd like to apply a timeout to a whole Linux script, killing it if it has not finished within a given period. Something like:
cd XXX && CMD && sleep 3 && kill -0 XX
How to do it?
You can pass the spawning of a subshell to timeout, and have the subshell run the code that needs to be timed out:
#!/bin/bash
timeout 5 bash -c "ping google.com -c 2; ping yahoo.com -c 10"
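The same works for an existing script file; a sketch assuming a hypothetical ./myscript.sh:

timeout 5 bash ./myscript.sh
echo $?   # prints 124 if the script was killed by timeout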
If you clarify what you need exactly there may be cleaner ways to achieve this.

Starting a process over ssh using bash and then killing it on sigint

I want to start a couple of jobs on different machines using ssh. If the user then interrupts the main script I want to shut down all the jobs gracefully.
Here is a short example of what I'm trying to do:
#!/bin/bash
trap "aborted" SIGINT SIGTERM
aborted() {
    kill -SIGTERM $bash2_pid
    exit
}
ssh -t remote_machine /foo/bar.sh &
bash2_pid=$!
wait
However, the bar.sh process is still running on the remote machine. If I run the same commands in a terminal window, it shuts down the process on the remote host.
Is there an easy way to make this happen when I run the bash script? Or do I need to make it log on to the remote machine, find the right process and kill it that way?
Edit:
It seems like I have to go with option B, killing the remote script through another ssh connection.
So now I want to know: how do I get the remote PID?
I've tried something along the lines of:
remote_pid=$(ssh remote_machine '{ /foo/bar.sh & } ; echo $!')
This doesn't work since it blocks.
How do I wait for a variable to print and then "release" a subprocess?
It would definitely be preferable to keep your cleanup managed by the ssh that starts the process rather than moving in for the kill with a second ssh session later on.
When ssh is attached to your terminal, it behaves quite well. However, detach it from your terminal and it becomes (as you've noticed) a pain to signal or manage remote processes. You can shut down the link, but not the remote processes.
That leaves you with one option: Use the link as a way for the remote process to get notified that it needs to shut down. The cleanest way to do this is by using blocking I/O. Make the remote read input from ssh and when you want the process to shut down; send it some data so that the remote's reading operation unblocks and it can proceed with the cleanup:
command & read; kill $!
This is what we would want to run on the remote. We invoke our command that we want to run remotely; we read a line of text (blocks until we receive one) and when we're done, signal the command to terminate.
To send the signal from our local script to the remote, all we need to do now is send it a line of text. Unfortunately, Bash does not give you a lot of good options here, at least not if you want to be compatible with bash < 4.0.
With bash 4 we can use co-processes:
coproc ssh user@host 'command & read; kill $!'
trap 'echo >&"${COPROC[1]}"' EXIT
...
Now, when the local script exits (don't trap on INT, TERM, etc. Just EXIT) it sends a new line to the file in the second element of the COPROC array. That file is a pipe which is connected to ssh's stdin, effectively routing our line to ssh. The remote command reads the line, ends the read and kills the command.
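Put together, a minimal end-to-end sketch (assuming a reachable host named remote, with the question's /foo/bar.sh as the remote job):

#!/bin/bash
coproc ssh remote '/foo/bar.sh & read; kill $!'
trap 'echo >&"${COPROC[1]}"' EXIT
# local work goes here; whenever the script exits (normally or via Ctrl+C),
# the trap writes a newline, the remote read unblocks, and bar.sh is killed
sleep 30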
Before bash 4 things get a bit harder since we don't have co-processes. In that case, we need to do the piping ourselves:
mkfifo /tmp/mysshcommand
ssh user@host 'command & read; kill $!' < /tmp/mysshcommand &
trap 'echo > /tmp/mysshcommand; rm /tmp/mysshcommand' EXIT
This should work in pretty much any bash version.
Try this:
ssh -tt host command </dev/null &
When you kill the local ssh process, the remote pty will close and SIGHUP will be sent to the remote process.
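Applied to the script from the question, that would be something like this sketch:

#!/bin/bash
trap 'kill $bash2_pid; exit' SIGINT SIGTERM
ssh -tt remote_machine /foo/bar.sh </dev/null &
bash2_pid=$!
wait
# killing the local ssh closes the remote pty; the remote bar.sh gets SIGHUP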
Referencing the answer by lhunath and https://unix.stackexchange.com/questions/71205/background-process-pipe-input I came up with this script
run.sh:
#!/bin/bash
log="log"
eval "$@" \&
PID=$!
echo "running" "$@" "in PID $PID" > $log
{ (cat <&3 3<&- >/dev/null; kill $PID; echo "killed" >> $log) & } 3<&0
trap "echo EXIT >> $log" EXIT
wait $PID
The difference being that this version kills the process when the connection is closed, but also returns the exit code of the command when it runs to completion.
$ ssh localhost ./run.sh true; echo $?; cat log
0
running true in PID 19247
EXIT
$ ssh localhost ./run.sh false; echo $?; cat log
1
running false in PID 19298
EXIT
$ ssh localhost ./run.sh sleep 99; echo $?; cat log
^C130
running sleep 99 in PID 20499
killed
EXIT
$ ssh localhost ./run.sh sleep 2; echo $?; cat log
0
running sleep 2 in PID 20556
EXIT
For a one-liner:
ssh localhost "sleep 99 & PID=\$!; { (cat <&3 3<&- >/dev/null; kill \$PID) & } 3<&0; wait \$PID"
For convenience:
HUP_KILL="& PID=\$!; { (cat <&3 3<&- >/dev/null; kill \$PID) & } 3<&0; wait \$PID"
ssh localhost "sleep 99 $HUP_KILL"
Note: kill 0 may be preferred to kill $PID depending on the behavior needed with regard to spawned child processes. You can also kill -HUP or kill -INT if you desire.
Update:
A secondary job control channel is better than reading from stdin.
ssh -n -R9002:localhost:8001 -L8001:localhost:9001 localhost ./test.sh sleep 2
Set job control mode and monitor the job control channel:
set -m
trap "kill %1 %2 %3" EXIT
(sleep infinity | netcat -l 127.0.0.1 9001) &   # %1: holds the forwarded control channel open
(netcat -d 127.0.0.1 9002; kill -INT $$) &      # %2: interrupts this script when the channel closes
"$@" &                                          # %3: the actual command
wait %3
Finally, here's another approach and a reference to a bug filed on openssh:
https://bugzilla.mindrot.org/show_bug.cgi?id=396#c14
This is the best way I have found to do this. You want something on the server side that attempts to read stdin and then kills the process group when that fails, but you also want a stdin on the client side that blocks until the server side process is done and will not leave lingering processes like <(sleep infinity) might.
ssh localhost "sleep 99 < <(cat; kill -INT 0)" <&1
It doesn't actually seem to redirect stdout anywhere but it does function as a blocking input and avoids capturing keystrokes.
The solution for bash 3.2:
mkfifo /tmp/mysshcommand
ssh user@host 'command & read; kill $!' < /tmp/mysshcommand &
trap 'echo > /tmp/mysshcommand; rm /tmp/mysshcommand' EXIT
doesn't work. The ssh command is not on the ps list on the "client" machine. Only after I echo something into the pipe will it appear in the process list of the client machine. The process that appears on the "server" machine would just be the command itself, not the read/kill part.
Writing again into the pipe does not terminate the process.
So summarizing: I need to write into the pipe for the command to start up at all, and writing again does not kill the remote command as it should.
You may want to consider mounting the remote file system and run the script from the master box. For instance, if your kernel is compiled with fuse (can check with the following):
/sbin/lsmod | grep -i fuse
You can then mount the remote file system with the following command:
sshfs user@remote_system: mount_point
Now just run your script against the files located under mount_point.
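When you are done, the remote tree can be detached again with fusermount:

fusermount -u mount_point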
