nohup doesn't work when used with double-ampersand (&&) instead of semicolon (;) - bash

I have a script that uses ssh to login to a remote machine, cd to a particular directory, and then start a daemon. The original script looks like this:
ssh server "cd /tmp/path ; nohup java server 0</dev/null 1>server_stdout 2>server_stderr &"
This script appears to work fine. However, it is not robust to the case when the user enters the wrong path so the cd fails. Because of the ;, this command will try to run the nohup command even if the cd fails.
The obvious fix doesn't work:
ssh server "cd /tmp/path && nohup java server 0</dev/null 1>server_stdout 2>server_stderr &"
That is, the ssh command does not return until the server is stopped. Putting nohup in front of the cd instead of in front of java didn't help either.
Can anyone help me fix this? Can you explain why this solution doesn't work? Thanks!
Edit: cbuckley suggests using sh -c, from which I derived:
ssh server "nohup sh -c 'cd /tmp/path && java server 0</dev/null 1>master_stdout 2>master_stderr' 2>/dev/null 1>/dev/null &"
However, now the exit code is always 0 when the cd fails; whereas if I do ssh server cd /failed/path then I get a real exit code. Suggestions?

See Bash's operator precedence.
The & is being attached to the whole statement because it has lower precedence than &&: the && first binds the two commands into a single list, and the & then backgrounds that entire list. You don't need ssh to verify this. Just run this in your shell:
$ sleep 100 && echo yay &
[1] 19934
If the & were only attached to the echo yay, then your shell would sleep for 100 seconds and then report the background job. However, the entire sleep 100 && echo yay is backgrounded and you're given the job notification immediately. Running jobs will show it hanging out:
$ sleep 100 && echo yay &
[1] 20124
$ jobs
[1]+ Running sleep 100 && echo yay &
You can use parentheses to create a subshell around echo yay &, giving you what you'd expect:
sleep 100 && ( echo yay & )
This would be similar to using bash -c to run echo yay &:
sleep 100 && bash -c "echo yay &"
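As a rough check that does not depend on timing (a sketch, assuming a POSIX shell; the variable names are mine), you can start the list with a command that fails and then ask wait for the job's status. If the & applied only to echo yay, false would short-circuit and no job would exist. Instead the whole list runs as one background job, and its status is that of the failed false:

```shell
# The whole AND-OR list `false && echo yay` becomes one background job.
false && echo yay &
bg_pid=$!                      # PID of the subshell running the whole list

bg_status=0
wait "$bg_pid" || bg_status=$? # status of the list: false failed, echo never ran
echo "background list exited with status $bg_status"

# With the & confined to a subshell, && short-circuits in the foreground
# and nothing is started at all.
fg_status=0
false && ( echo yay & ) || fg_status=$?
echo "foreground list exited with status $fg_status"
```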
Tossing these into an ssh, and we get:
# using parentheses...
$ ssh localhost "cd / && (nohup sleep 100 >/dev/null </dev/null &)"
$ ps -ef | grep sleep
me 20136 1 0 16:48 ? 00:00:00 sleep 100
# and using `bash -c`
$ ssh localhost "cd / && bash -c 'nohup sleep 100 >/dev/null </dev/null &'"
$ ps -ef | grep sleep
me 20145 1 0 16:48 ? 00:00:00 sleep 100
Applying this to your command, we get:
ssh server "cd /tmp/path && (nohup java server 0</dev/null 1>server_stdout 2>server_stderr &)"
or:
ssh server "cd /tmp/path && bash -c 'nohup java server 0</dev/null 1>server_stdout 2>server_stderr &'"
Also, with regard to your comment on the post,
"Right, sh -c always returns 0. E.g., sh -c exit 1 has error code 0"
this is incorrect. Directly from the manpage:
Bash's exit status is the exit status of the last command executed in
the script. If no commands are executed, the exit status is 0.
Indeed:
$ bash -c "true ; exit 1"
$ echo $?
1
$ bash -c "false ; exit 22"
$ echo $?
22
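A likely explanation for the 0 seen in that comment (an assumption on my part, since the exact command that was run isn't shown) is quoting. Unquoted, sh -c exit 1 passes only exit as the command string, and the 1 becomes $0, so a bare exit runs and returns 0:

```shell
# Unquoted: the command string is just `exit`; the 1 is assigned to $0.
unquoted=0
sh -c exit 1 || unquoted=$?

# Quoted: the command string is `exit 1`.
quoted=0
sh -c 'exit 1' || quoted=$?

echo "unquoted=$unquoted quoted=$quoted"
```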

ssh server "test -d /tmp/path" && ssh server "nohup ... &"

Answer roundup:
Bad: Using sh -c to wrap the entire nohup command doesn't work for my purposes because it doesn't return error codes. (@cbuckley)
Okay: ssh <server> <cmd1> && ssh <server> <cmd2> works but is much slower (@joachim-nilsson)
Good: Create a shell script on <server> that runs the commands in succession and returns the correct error code.
The last is what I ended up using. I'd still be interested in learning why the original use-case doesn't work, if someone who understands shell internals can explain it to me!
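A minimal sketch of what such a script might look like, assuming a POSIX shell on the server. The function name start_in_dir and the way arguments are passed are hypothetical, not the script actually used here; only the output file names are taken from the question:

```shell
#!/bin/sh
# Hypothetical helper: cd into DIR, then start CMD... detached.
# Returns non-zero (without starting anything) if the cd fails.
start_in_dir() {
    dir=$1
    shift
    cd "$dir" || return 1
    # Only reached when the cd succeeded. All three standard streams are
    # redirected so the job keeps no reference to the ssh session.
    nohup "$@" </dev/null >server_stdout 2>server_stderr &
}

# Example call, commented out because it needs the real server setup:
# start_in_dir /tmp/path java server || exit 1
```

Because the cd and the backgrounded command are separate statements, the cd's failure can be reported through the exit status before anything is put in the background.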

Related

Execute a script through ssh and store its pid in a file on the remote machine [duplicate]

I am not able to store the PID in a file on the remote machine when running a script in the background through ssh.
I need to store the PID of the script process in a file in order to kill it whenever needed. Running the exact same command directly on the remote machine works, so why does it not work through ssh?
What is wrong with the following command:
ssh user@remote_machine "nohup ./script.sh > /dev/null 2>&1 & echo $! > ./pid.log"
Result: The file pid.log is created but empty.
Expected: The file pid.log should contain the PID of the running script.
Use
ssh user@remote_machine 'nohup ./script.sh > /dev/null 2>&1 & echo $! > ./pid.log'
OR
ssh user@remote_machine "nohup ./script.sh > /dev/null 2>&1 & echo \$! > ./pid.log"
Issue:
Your $! was getting expanded locally, before ssh was called at all.
Worse, if a process had been started in the background before the ssh command was run, $! would expand to that process's PID, and the complete ssh command would contain that PID as the argument to echo.
e.g.
$ ls &
[1] 12342 <~~~~ 12342 is the PID of ls
$ <~~~~ Prompt returns immediately because ls was started in the background.
myfile1 myfile2 <~~~~ Output of ls.
[1]+ Done ls
#### At this point, $! contains 12342
$ ssh user@remote "command & echo $! > pidfile"
# before even calling ssh, the shell internally expands it to:
$ ssh user@remote "command & echo 12342 > pidfile"
And it will put the wrong PID in the pidfile.
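The same expansion can be observed without a remote machine. In this sketch, sh -c stands in for ssh (an assumption made so that it runs locally): both receive a single command string that the local shell has already processed.

```shell
# Start a local background job so that $! has a value in this shell.
sleep 0 &
local_pid=$!
wait

# Double quotes: the local shell substitutes $! before sh -c ever runs.
double=$(sh -c "sleep 0 & echo $!")

# Single quotes: $! reaches the inner shell untouched and is expanded there.
single=$(sh -c 'sleep 0 & echo $!')

echo "local=$local_pid double=$double single=$single"
```

The double-quoted form prints the PID of the local sleep, while the single-quoted form prints the PID of the job started by the inner shell.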

Script stuck during read line when script is executed remotely

I want to have one script that starts a service on another server.
I have tested that the script works as expected on the server where the service is going to run.
This is the code that starts the service and monitors the log until the expected startup line appears:
pkill -f "$1"
nohup java -jar -Dspring.profiles.active=$PROFILE $1 &
tail -n 0 -f nohup.out | while read LOGLINE
do
echo $LOGLINE
[[ "${LOGLINE}" == *"$L_LOG_STRING"* ]] && pkill -P $$ tail
done
This works fine as long as I execute that from that machine.
Now I want to call that script from another server:
#!/usr/bin/env bash
DESTINATION_SERVER=$1
ssh root@$DESTINATION_SERVER /bin/bash << EOF
echo "Restarting first service..."
/usr/local/starter.sh -s parameter
echo "Restarting second service..."
/usr/local/starter.sh -s parameter2
EOF
Every time I try that, the script on the remote server gets stuck in the "while read" loop. But as I said, when I execute it locally on the server it works fine, and in my non-simplified script I'm not using any system variable or similar.
Update: I just tried to simplify the code even more with the following lines in the first scenario:
pkill -f "$1"
nohup java -jar -Dspring.profiles.active=$PROFILE $1 &
tail -n 0 -f nohup.out | sed "/$L_LOG_STRING/ q"
I'd say the problem is somehow in the "|" through ssh, but I still can't find why.
It seems that the problem comes from not having an interactive console when you execute the ssh command, so the nohup command behaves strangely.
I could solve it in two ways. Either redirect the output to the file explicitly:
"nohup java -jar -Dspring.profiles.active=test &1 >> nohup.out &"
instead of:
"nohup java -jar -Dspring.profiles.active=test &1&"
Or change the way I access the server via ssh by adding the -tt option (a single -t did not work):
ssh -tt root@$DESTINATION_SERVER /bin/bash << EOF
But this last solution could lead to other problems with some characters, so unless someone suggests another solution, this is the patch I am using to make it work.
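The explicit redirection is consistent with how nohup is specified: it only sends output to nohup.out on its own when standard output is a terminal, which is not the case for an ssh session fed by a heredoc. A small sketch of the redirect-then-inspect pattern, assuming a POSIX shell; the log text and the temporary file are placeholders, and a short sh -c command stands in for the java process:

```shell
log=$(mktemp)

# Redirect explicitly instead of relying on nohup to create nohup.out.
nohup sh -c 'echo "service started"' >>"$log" 2>&1 </dev/null &
wait "$!"

# A watcher reading this file can now find the line it is waiting for.
matches=$(grep -c 'service started' "$log")
echo "matching lines: $matches"
rm -f "$log"
```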

shell forcing demonized program to start without output

I'm trying to run a script that is required to have an exit code of 0. Unfortunately I cannot use an init.d or other startup script to control this, so I must make this work.
Basically, if I understand AWS's docs correctly (Elastic Beanstalk), I need to be able to run the following two commands, exit with a 0, and provide no other output to stdout.
As the root user I need to cd to a particular dir and run these two commands:
pkill -f que
bundle exec que
In my actual script I have:
#!/usr/bin/env bash
su -s /bin/bash -c "cd /some/dir && nohup pkill -f que &>/dev/null &"
sleep 10
su -s /bin/bash -c "cd /some/dir && nohup bundle exec que &"
Which still causes this error to be raised:
returned non-zero exit status 1 (Executor::NonZeroExitStatus)
Any tips for how to silently run those commands correctly?
I'm also looking at these for ideas:
https://blog.eq8.eu/article/aws-elasticbeanstalk-hooks.html
http://www.dannemanne.com/posts/post-deployment_script_on_elastic_beanstalk_restart_delayed_job
But it's still not clear to me how this is supposed to exit successfully.
Perhaps I'm missing something, but wouldn't this be easily solved by using two shell scripts? Put the cd, pkill, and bundle commands in one script (foo.sh), and call it from a second script, something like:
#!/usr/bin/env bash
su -c ./foo.sh > /dev/null 2>&1 < /dev/null
exit 0

bash get exitcode of su script execution

I have a shell script which needs to run as a particular user. So I call that script as below,
su - testuser -c "/root/check_package.sh | tee -a /var/log/check_package.log"
After this, when I check the exit code of the last execution, it always returns 0, even if the script fails.
I also tried the following, which didn't help,
su - testuser -c "/root/check_package.sh | tee -a /var/log/check_package.log && echo $? || echo $?"
Is there a way to get the exit code of whatever command is run through su?
The problem here is not su, but tee: By default, the shell exits with the exit status of the last pipeline component; in your code, that component is not check_package.sh, but instead is tee.
If your /bin/sh is provided by bash (as opposed to ash, dash, or another POSIX-baseline shell), use set -o pipefail to cause the entire pipeline to fail if any component of it does:
su - testuser -c "set -o pipefail; /root/check_package.sh | tee -a /var/log/check_package.log"
Alternately, you can do the tee out-of-band with redirection to a process substitution (though this requires your current user to have permission to write to check_package.log):
su - testuser -c "/root/check_package.sh" > >(tee -a /var/log/check_package.log)
Both su and sudo exit with the exit status of the command they execute (if authentication succeeded):
$ sudo false; echo $?
1
$ su -c false; echo $?
1
Your problem is that the command su runs is a pipeline. The exit status of your pipeline is that of the tee command (which succeeds), but what you really want is that of the first command in the pipeline.
If your shell is bash, you have a couple of options:
set -o pipefail before your pipeline, which will make it return the rightmost failure value of all the commands if any of them fail
Examine the specific member of the PIPESTATUS array variable - this can give you the exit status of the first command whether or not tee succeeds.
Examples:
$ sudo bash -c "false | tee -a /dev/null"; echo $?
0
$ sudo bash -c "set -o pipefail; false | tee -a /dev/null"; echo $?
1
$ sudo bash -c 'false | tee -a /dev/null; exit ${PIPESTATUS[0]}'; echo $?
1
You will get similar results using su -c, if your system shell (in /bin/sh) is Bash. If not, then you'd need to explicitly invoke bash, at which point sudo is clearly simpler.
I was facing a similar issue today. In case the topic is still open, here is my solution; otherwise just ignore it...
I wrote a bash script (let's say my_script.sh) which looks more or less like this:
### FUNCTIONS ###
<all functions listed in the main script which do what I want...>
### MAIN SCRIPT ### calls the functions defined in the section above
main_script() {
log_message "START" 0
check_env
check_data
create_package
tar_package
zip_package
log_message "END" 0
}
main_script |tee -a ${var_log} # executes script and writes info into log file
var_sta=${PIPESTATUS[0]} # captures status of pipeline
exit ${var_sta} # exits with value of status
It works when you call the script directly or via sudo.

bash && operator prevents backgrounding over ssh

After trying to figure out why a Capistrano task (which tried to start a daemon in the background) was hanging, I discovered that using && in bash over ssh prevents a subsequent program from running in the background. I tried it on bash 4.1.5 and 4.2.20.
The following will hang (i.e. wait for sleep to finish) in bash:
ssh localhost "cd /tmp && nohup sleep 10 >/dev/null 2>&1 &"
The following won't:
ssh localhost "cd /tmp ; nohup sleep 10 >/dev/null 2>&1 &"
Neither will this:
cd /tmp && nohup sleep 10 >/dev/null 2>&1 &
Both zsh and dash will execute it in the background in all cases, regardless of && and ssh. Is this normal/expected behavior for bash, or a bug?
One easy solution is to use:
ssh localhost "(cd /tmp && nohup sleep 10) >/dev/null 2>&1 &"
(this also works if you use braces, see second example below).
I did not experiment further but I am reasonably convinced it has to do with open file descriptors hanging around. Perhaps zsh and dash bind the && so that this means what has to be spelled as:
{ cd /tmp && nohup sleep 10; } >/dev/null 2>&1
in bash. Nope, a quick experiment in dash shows that echo foo && echo bar >file only redirects the latter. Still, it has to have something to do with lingering open fds causing ssh to wait for more output; I've run into this a lot in the past.
One more trick, not needed if you use the parentheses or braces for this particular case but might be useful in a more general context, where the set of commands to do with && are more complex. Since bash seems to be hanging on to the file descriptor inappropriately with && but not with ;, you can turn a && b && c into a || exit 1; b || exit 1; c. This works with the test case:
ssh localhost "true || exit 1; echo going on; nohup sleep 10 >/dev/null 2>&1 &"
Replace true with false and the echo of "going on" is omitted.
(You can also set -e, although sometimes that is a bigger hammer than desired.)
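The rewrite from a && b into a || exit 1; b can be checked locally. A sketch, using sh -c as a stand-in for the remote shell (an assumption so that it runs without ssh; the variable names are mine):

```shell
# First command succeeds: execution continues past the guard.
ok_status=0
ok_out=$(sh -c 'true || exit 1; echo going on') || ok_status=$?

# First command fails: the guard exits with 1 and the echo never runs.
fail_status=0
fail_out=$(sh -c 'false || exit 1; echo going on') || fail_status=$?

echo "ok: '$ok_out' ($ok_status)  fail: '$fail_out' ($fail_status)"
```

Unlike the && form, the failing case still yields a non-zero exit status that the caller of ssh can act on, and the final command can be backgrounded on its own.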
This seems to work:
ssh localhost "(exec 0>&- ; exec 1>&-; exec 2>&-; cd /tmp; sleep 20&)"

Resources