Execute a script through ssh and store its pid in a file on the remote machine [duplicate] - bash

I am not able to store any PID in a file on the remote machine when running a script in background through ssh.
I need to store the PID of the script process in a file so that I can kill it whenever needed. The exact same command works when run directly on the remote machine, so why does it not work through ssh?
What is wrong with the following command:
ssh user@remote_machine "nohup ./script.sh > /dev/null 2>&1 & echo $! > ./pid.log"
Result: The file pid.log is created but empty.
Expected: The file pid.log should contain the PID of the running script.

Use
ssh user@remote_machine 'nohup ./script.sh > /dev/null 2>&1 & echo $! > ./pid.log'
OR
ssh user@remote_machine "nohup ./script.sh > /dev/null 2>&1 & echo \$! > ./pid.log"
Issue:
Your $! was getting expanded locally, before calling ssh at all.
Worse, if a process had been started in the background before the ssh command was run, $! would expand to that process's PID, and the complete ssh command would be sent with that PID as the argument to echo.
e.g.
$ ls &
[1] 12342 <~~~~ 12342 is the PID of ls
$ <~~~~ Prompt returns immediately because ls was started in the background.
myfile1 myfile2 <~~~~ Output of ls.
[1]+ Done ls
#### At this point, $! contains 12342
$ ssh user@remote "command & echo $! > pidfile"
# before even calling ssh, the shell internally expands it to:
$ ssh user@remote "command & echo 12342 > pidfile"
And it will put the wrong PID in the pidfile.
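The difference between the two quoting styles can be checked locally, without a remote host. In the sketch below, sh -c stands in for the shell that ssh would start on the remote machine (an assumption made only so the example runs anywhere), and the variable names are illustrative.

```shell
# Start a local background job so that the local shell's $! is set.
sleep 1 &
local_pid=$!

# Double quotes: the local shell expands $! before the inner shell runs,
# so the inner shell just echoes the local job's PID.
double=$(sh -c "sleep 1 >/dev/null 2>&1 & echo $!")

# Single quotes: $! reaches the inner shell untouched and expands there
# to the PID of the inner shell's own background job.
single=$(sh -c 'sleep 1 >/dev/null 2>&1 & echo $!')

echo "local=$local_pid double=$double single=$single"
wait
```

The double-quoted run reports the PID of the unrelated local job, which is exactly the wrong-PID case described above.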

Related

pgrep in bash script not working correctly

Here is the piece of code from shell script that is causing the problem.
LOG_FILE="/home/sample.log"
PID_FILE="/home/sample.pid"
sudo -u user1 trinidad -e production > "$LOG_FILE" 2>&1 & echo $! > "$PID_FILE"
PARENT_PID=`cat "$PID_FILE"`
pgrep -P "$PARENT_PID" > "$PID_FILE"
But the last command does not print anything to PID_FILE. For debugging, I tried echo $PARENT_PID, and it correctly prints a value like 1234.
Also, if I run pgrep -P 1234 in the shell script, it prints the child process correctly; only pgrep -P $PARENT_PID prints nothing.
You are writing the PID into a file and then reading the file back in. That is merely wasteful and does not explain your problem, but I would refactor to
LOG_FILE="/home/sample.log"
PID_FILE="/home/sample.pid"
sudo -u user1 trinidad -e production > "$LOG_FILE" 2>&1 &
PARENT_PID=$!
pgrep -P "$PARENT_PID" > "$PID_FILE"
I'm guessing your actual problem is that the sudo process doesn't spawn any children. The action of pgrep -P is to print processes which are children of the PID you specify; if your process doesn't spawn any children, it won't print any.
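What pgrep -P reports can be seen with two throwaway processes, one that has a child and one that does not. This is only a sketch: it assumes pgrep is installed, and the sleep commands are placeholders for a real workload. The one-second pause is there because a child that has not been forked yet cannot be found either, which may be worth ruling out in the original script.

```shell
# A parent that really has a child: the inner sh forks sleep and waits for it.
sh -c 'sleep 5 & wait' &
parent=$!

# A process without any children.
sleep 5 &
leaf=$!

sleep 1   # give the parent time to fork; asking too early would show nothing

children=$(pgrep -P "$parent" || true)   # PID of the inner sleep
none=$(pgrep -P "$leaf" || true)         # empty: nothing printed, pgrep exits 1

echo "children of $parent: ${children:-none}"
echo "children of $leaf: ${none:-none}"

# Clean up the demo processes.
kill $children "$parent" "$leaf" 2>/dev/null || true
```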

nohup doesn't work when used with double-ampersand (&&) instead of semicolon (;)

I have a script that uses ssh to login to a remote machine, cd to a particular directory, and then start a daemon. The original script looks like this:
ssh server "cd /tmp/path ; nohup java server 0</dev/null 1>server_stdout 2>server_stderr &"
This script appears to work fine. However, it is not robust to the case when the user enters the wrong path so the cd fails. Because of the ;, this command will try to run the nohup command even if the cd fails.
The obvious fix doesn't work:
ssh server "cd /tmp/path && nohup java server 0</dev/null 1>server_stdout 2>server_stderr &"
that is, the SSH command does not return until the server is stopped. Putting nohup in front of the cd instead of in front of the java didn't work.
Can anyone help me fix this? Can you explain why this solution doesn't work? Thanks!
Edit: cbuckley suggests using sh -c, from which I derived:
ssh server "nohup sh -c 'cd /tmp/path && java server 0</dev/null 1>master_stdout 2>master_stderr' 2>/dev/null 1>/dev/null &"
However, now the exit code is always 0 when the cd fails; whereas if I do ssh server cd /failed/path then I get a real exit code. Suggestions?
See Bash's Operator Precedence.
The & is being attached to the whole statement because it has lower precedence than &&: the && list is grouped first, and the trailing & then backgrounds all of it. You don't need ssh to verify this. Just run this in your shell:
$ sleep 100 && echo yay &
[1] 19934
If the & were only attached to the echo yay, then your shell would sleep for 100 seconds and then report the background job. However, the entire sleep 100 && echo yay is backgrounded and you're given the job notification immediately. Running jobs will show it hanging out:
$ sleep 100 && echo yay &
[1] 20124
$ jobs
[1]+ Running sleep 100 && echo yay &
You can use parentheses to create a subshell around echo yay &, giving you what you'd expect:
sleep 100 && ( echo yay & )
This would be similar to using bash -c to run echo yay &:
sleep 100 && bash -c "echo yay &"
Tossing these into an ssh, and we get:
# using parentheses...
$ ssh localhost "cd / && (nohup sleep 100 >/dev/null </dev/null &)"
$ ps -ef | grep sleep
me 20136 1 0 16:48 ? 00:00:00 sleep 100
# and using `bash -c`
$ ssh localhost "cd / && bash -c 'nohup sleep 100 >/dev/null </dev/null &'"
$ ps -ef | grep sleep
me 20145 1 0 16:48 ? 00:00:00 sleep 100
Applying this to your command, and we get
ssh server "cd /tmp/path && (nohup java server 0</dev/null 1>server_stdout 2>server_stderr &)"
or:
ssh server "cd /tmp/path && bash -c 'nohup java server 0</dev/null 1>server_stdout 2>server_stderr &'"
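The precedence point can also be confirmed by timing the two forms locally. The following is a minimal sketch; the two-second sleep and the variable names are arbitrary choices for the demonstration.

```shell
# Form 1: & applies to the whole "sleep && echo" list, so control returns at once.
start=$(date +%s)
sleep 2 && echo yay >/dev/null &
whole=$(( $(date +%s) - start ))

wait   # let the backgrounded list finish before the second measurement

# Form 2: only the subshell's contents are backgrounded; sleep runs in the foreground.
start=$(date +%s)
sleep 2 && ( echo yay >/dev/null & )
grouped=$(( $(date +%s) - start ))

echo "whole list backgrounded: ${whole}s, only echo backgrounded: ${grouped}s"
```

The first measurement should be close to zero, while the second is at least the length of the sleep.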
Also, with regard to your comment on the post,
"Right, sh -c always returns 0. E.g., sh -c exit 1 has error code 0"
this is incorrect. Directly from the manpage:
Bash's exit status is the exit status of the last command executed in
the script. If no commands are executed, the exit status is 0.
Indeed:
$ bash -c "true ; exit 1"
$ echo $?
1
$ bash -c "false ; exit 22"
$ echo $?
22
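A likely source of the confusion is quoting. With sh -c exit 1, the script is only the word exit, and the 1 becomes $0, so the status is 0. The sketch below compares that with the quoted form. The last line also shows that a list ending in & reports 0 straight away, which may explain the always-zero exit code seen with the nohup sh -c ... & variant. The path /nonexistent/path is assumed not to exist.

```shell
# Unquoted: the script is just "exit"; the 1 becomes $0, so the status is 0.
unquoted=0; sh -c exit 1 || unquoted=$?

# Quoted: the script is "exit 1", and that status is returned.
quoted=0; sh -c 'exit 1' || quoted=$?

# A failing cd in front of && propagates its non-zero status as well.
cdfail=0; sh -c 'cd /nonexistent/path 2>/dev/null && true' || cdfail=$?

# A list ending in & reports 0 immediately, whatever the job does later.
background=0; sh -c 'false &' || background=$?

echo "unquoted=$unquoted quoted=$quoted cdfail=$cdfail background=$background"
```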
ssh server "test -d /tmp/path" && ssh server "nohup ... &"
Answer roundup:
Bad: Using sh -c to wrap the entire nohup command doesn't work for my purposes because it doesn't return error codes. (@cbuckley)
Okay: ssh <server> <cmd1> && ssh <server> <cmd2> works but is much slower (@joachim-nilsson)
Good: Create a shell script on <server> that runs the commands in succession and returns the correct error code.
The last is what I ended up using. I'd still be interested in learning why the original use-case doesn't work, if someone who understands shell internals can explain it to me!

Get the PID of a process started with nohup via ssh

I want to start a process using nohup on a remote machine via ssh. The problem is how to get the PID of the process started with nohup, so the "process actually doing something", not some outer shell instance or the like. Also, I want to store stdout and stderr in files, but that is not the issue here...
Locally, it works flawlessly using
nohup sleep 30 > out 2> err < /dev/null & echo $!
It is echoing me the exact PID of the command "sleep 30", which I can also see using "top" or "ps aux|grep sleep".
But I'm having trouble doing it remotely via ssh. I tried something like
ssh remote_machine 'nohup bash -c "( ( sleep 30 ) & )" > out 2> err < /dev/null'
but I cannot figure out where to place the "echo $!" so that it is displayed in my local shell. It is always showing me wrong PIDs, for example the one of the "bash" instance etc.
Has somebody an idea how to solve this?
EDIT:
OK, the "bash -c" might not be needed here. Like Lotharyx pointed out, I get the right PID just fine using
ssh remote 'nohup sleep 30 > out 2> err < /dev/null & echo $!'
but then the problem is that if you substitute "sleep 30" with something that produces output, say, "echo Hello World!", that output does not end up in the file "out", neither on the local nor on remote side. Anybody got an idea why?
EDIT2: My fault! There was just no space left on the other device, that's why the files "out" and "err" stayed empty!
So this is working. In addition, if one wants to call multiple commands in a row, separated by a semicolon (;), one can still use "bash -c", like so:
ssh remote 'nohup bash -c "echo bla;sleep 30;echo blupp" > out 2> err < /dev/null & echo $!'
Then it prints out the PID of the "bash -c" on the local side, which is just fine. (It is impossible to get the PID of the "innermost" or "busy" process, because every program itself can spawn new subprocesses, there is no way to find out...)
I tried the following (the local machine is Debian; the remote machine is CentOS), and it worked exactly as I think you're expecting:
~# ssh someone@somewhere 'nohup sleep 30 > out 2> err < /dev/null & echo $!'
someone@somewhere's password:
14193
~#
On the remote machine, I did ps -e, and saw this line:
14193 ? 00:00:00 sleep
So, clearly, on my local machine, the output is the PID of "sleep" executing on the remote machine.
Why are you adding bash to your command when sending it across an SSH tunnel?
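Putting the pieces of this thread together, the PID printed by echo $! can be captured in a local variable and used later to stop the process. The sketch below replaces ssh remote with a local sh -c so that it runs anywhere; that substitution, the temporary directory and the variable names are assumptions made for the demonstration.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Stand-in for: pid=$(ssh remote 'nohup sleep 30 > out 2> err < /dev/null & echo $!')
pid=$(sh -c 'nohup sleep 30 > out 2> err < /dev/null & echo $!')

# kill -0 sends no signal; it only checks that the process exists.
if kill -0 "$pid" 2>/dev/null; then alive=yes; else alive=no; fi
echo "started PID $pid (alive: $alive)"

# Later, stop it by PID. Over ssh this would be: ssh remote "kill $pid"
# (double quotes on purpose, so that the local variable is expanded).
kill "$pid"

cd / && rm -rf "$workdir"
```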

Terminating SSH session executed by bash script

I have a script I can run locally to remotely start a server:
#!/bin/bash
ssh user@host.com <<EOF
nohup /path/to/run.sh &
EOF
echo 'done'
After running nohup, it hangs. I have to hit ctrl-c to exit the script.
I've tried adding an explicit exit at the end of the here doc and using "-t" argument for ssh. Neither works. How do I make this script exit immediately?
EDIT: The client is OSX 10.6, server is Ubuntu.
I think the problem is that nohup can't redirect output when you come in from ssh: it only redirects to nohup.out when it thinks it's connected to a terminal, and the stdin override you have will prevent that, even with -t.
A workaround might be to redirect the output yourself, then the ssh client can disconnect - it's not waiting for the stream to close. Something like:
nohup /path/to/run.sh > run.log &
(This worked for me in a simple test connecting to an Ubuntu server from an OS X client.)
The problem might be that ...
... ssh is respecting the POSIX standard when not closing the session
if a process is still attached to the tty.
Therefore a solution might be to detach the stdin of the nohup command from the tty:
nohup /path/to/run.sh </dev/null &
See: SSH Hangs On Exit When Using nohup
Yet another approach might be to use ssh -t -t to force pseudo-tty allocation even if stdin isn't a terminal.
man ssh | less -Ip 'multiple -t'
ssh -t -t user@host.com <<EOF
nohup /path/to/run.sh &
EOF
See: BASH spawn subshell for SSH and continue with program flow
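The hang has a local analogue that needs no ssh: anything that reads a command's output has to wait until every process holding that stream has closed it. In the sketch below, a command substitution plays the role of the ssh client, and sleep 3 is a placeholder for the long-running script. This is an analogy under those assumptions, not a trace of what sshd does internally.

```shell
# The background job inherits stdout (the pipe), so the reader waits for it to exit.
start=$(date +%s)
out=$(sh -c 'sleep 3 & echo started')
held=$(( $(date +%s) - start ))

# All three streams redirected: the pipe closes as soon as the inner shell exits.
start=$(date +%s)
out=$(sh -c 'sleep 3 </dev/null >/dev/null 2>&1 & echo started')
freed=$(( $(date +%s) - start ))

echo "stream held open: ${held}s, streams redirected: ${freed}s"
```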
Redirecting the stdin of the remote host from a here document while invoking ssh without an explicit command leads to the message: Pseudo-terminal will not be allocated because stdin is not a terminal.
To avoid this message either use ssh's -T switch to tell the remote host there is no need to allocate a pseudo-terminal or explicitly specify a command (such as /bin/sh) for the remote host to execute the commands provided by the here document.
If an explicit command is given to ssh, the default is to provide no login shell in the form of a pseudo-terminal, i. e. there will be no normal login session when a command is specified (see man ssh).
Without a command specified for ssh, on the other hand, the default is to create a pseudo-tty for an interactive login session on the remote host.
- ssh user@host.com <<EOF
+ ssh -T user@host.com <<EOF
+ ssh user@host.com /bin/bash <<EOF
As a rule, ssh -t or even ssh -t -t should only be used if there are commands that expect stdin / stdout to be a terminal (such as top or vim) or if it is necessary to kill the remote shell and its children when the ssh client command finishes execution (see: ssh command unexpectedly continues on other system after ssh terminates).
As far as I can tell, the only way to combine an ssh command that does not allocate a pseudo-tty and a nohup command that writes to nohup.out on the remote host is to let the nohup command execute in a pseudo-terminal not created by the ssh mechanism. This can be done with the script command, for example, and will avoid the tcgetattr: Inappropriate ioctl for device message.
#!/bin/bash
ssh localhost /bin/sh <<EOF
#0<&- script -q /dev/null nohup sleep 10 1>&- &
#0<&- script -q -c "nohup sh -c 'date; sleep 10 1>&- &'" /dev/null # Linux
0<&- script -q /dev/null nohup sh -c 'date; sleep 10 1>&- &' # FreeBSD, Mac OS X
cat nohup.out
exit 0
EOF
echo 'done'
exit 0
You need to add an exit 0 at the end.

Getting ssh to execute a command in the background on target machine

This is a follow-on question to the How do you use ssh in a shell script? question. If I want to execute a command on the remote machine that runs in the background on that machine, how do I get the ssh command to return? When I try to just include the ampersand (&) at the end of the command it just hangs. The exact form of the command looks like this:
ssh user@target "cd /some/directory; program-to-execute &"
Any ideas? One thing to note is that logins to the target machine always produce a text banner and I have SSH keys set up so no password is required.
I had this problem in a program I wrote a year ago, and it turns out the answer is rather complicated. You'll need to use nohup as well as output redirection, as explained in the Wikipedia article on nohup, copied here for your convenience.
Nohuping backgrounded jobs is for
example useful when logged in via SSH,
since backgrounded jobs can cause the
shell to hang on logout due to a race
condition [2]. This problem can also
be overcome by redirecting all three
I/O streams:
nohup myprogram > foo.out 2> foo.err < /dev/null &
This has been the cleanest way to do it for me:-
ssh -n -f user@host "sh -c 'cd /whereever; nohup ./whatever > /dev/null 2>&1 &'"
The only thing running after this is the actual command on the remote machine
Redirect fd's
Output needs to be redirected with &>/dev/null, which redirects both stderr and stdout to /dev/null and is bash shorthand for >/dev/null 2>/dev/null or >/dev/null 2>&1. The &> form is not part of POSIX sh, so if the remote sh is not bash, use the longer form.
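When the portable >/dev/null 2>&1 form is written out, the order of the two redirections matters. The following sketch uses a hypothetical noisy function that writes one line to each stream and captures whatever still gets through.

```shell
# Hypothetical helper: writes one line to stdout and one to stderr.
noisy() { echo out; echo err >&2; }

# stdout goes to /dev/null first, then stderr is pointed at the same place.
silenced=$(noisy >/dev/null 2>&1)

# Wrong order: stderr is duplicated onto the current stdout (the capture pipe)
# before stdout is redirected, so "err" still gets through.
leaked=$(noisy 2>&1 >/dev/null)

echo "silenced='$silenced' leaked='$leaked'"
```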
Parentheses
The best way is to use sh -c '( ( command ) & )' where command is anything.
ssh askapache 'sh -c "( ( nohup chown -R ask:ask /www/askapache.com &>/dev/null ) & )"'
Nohup Shell
You can also use nohup directly to launch the shell:
ssh askapache 'nohup sh -c "( ( chown -R ask:ask /www/askapache.com &>/dev/null ) & )"'
Nice Launch
Another trick is to use nice to launch the command/shell:
ssh askapache 'nice -n 19 sh -c "( ( nohup chown -R ask:ask /www/askapache.com &>/dev/null ) & )"'
If you don't/can't keep the connection open you could use screen, if you have the rights to install it.
user@localhost $ screen -t remote-command
user@localhost $ ssh user@target # now inside of a screen session
user@remotehost $ cd /some/directory; program-to-execute &
To detach the screen session: ctrl-a d
To list screen sessions:
screen -ls
To reattach a session:
screen -d -r remote-command
Note that screen can also create multiple shells within each session. A similar effect can be achieved with tmux.
user@localhost $ tmux
user@localhost $ ssh user@target # now inside of a tmux session
user@remotehost $ cd /some/directory; program-to-execute &
To detach the tmux session: ctrl-b d
To list tmux sessions:
tmux list-sessions
To reattach a session:
tmux attach <session number>
The default tmux control key, 'ctrl-b', is somewhat difficult to use but there are several example tmux configs that ship with tmux that you can try.
I just wanted to show a working example that you can cut and paste:
ssh REMOTE "sh -c \"(nohup sleep 30; touch nohup-exit) > /dev/null &\""
You can do this without nohup:
ssh user@host 'myprogram >out.log 2>err.log &'
Quickest and easiest way is to use the 'at' command:
ssh user@target "at now -f /home/foo.sh"
I think you'll have to combine a couple of these answers to get what you want. If you use nohup in conjunction with the semicolon, and wrap the whole thing in quotes, then you get:
ssh user@target "cd /some/directory; nohup myprogram > foo.out 2> foo.err < /dev/null"
which seems to work for me. Note that nohup only makes the command immune to hangup signals; it does not put it in the background, so without a trailing & the ssh call returns only when myprogram exits. Also, if you don't need to read any of the output of the command, you can use
ssh user@target "cd /some/directory; nohup myprogram > /dev/null 2>&1"
to redirect all output to /dev/null.
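What nohup does on its own, with and without a trailing &, can be checked locally by timing it. This is a small sketch in which sleep 2 is a placeholder for the real program and the variable names are illustrative.

```shell
# Without &: nohup runs sleep in the foreground and returns only when it exits.
start=$(date +%s)
nohup sleep 2 </dev/null >/dev/null 2>&1
foreground=$(( $(date +%s) - start ))

# With &: the shell continues immediately while sleep keeps running.
start=$(date +%s)
nohup sleep 2 </dev/null >/dev/null 2>&1 &
background=$(( $(date +%s) - start ))
wait

echo "without &: ${foreground}s, with &: ${background}s"
```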
This worked for me many times:
ssh -x remoteServer "cd yourRemoteDir; ./yourRemoteScript.sh </dev/null >/dev/null 2>&1 & "
You can do it like this...
sudo /home/script.sh -opt1 > /tmp/script.out &
It appeared quite convenient for me to have a remote tmux session using the tmux new -d <shell cmd> syntax like this:
ssh someone@elsewhere 'tmux new -d sleep 600'
This will launch a new session on the elsewhere host, and the ssh command on the local machine will return to the shell almost instantly. You can then ssh to the remote host and tmux attach to that session. Note that no local tmux is involved here, only a remote one!
Also, if you want your session to persist after the job is done, simply add a shell launcher after your command, but don't forget to enclose it in quotes:
ssh someone@elsewhere 'tmux new -d "~/myscript.sh; bash"'
Actually, whenever I need to run a command on a remote machine that's complicated, I like to put the command in a script on the destination machine, and just run that script using ssh.
For example:
# simple_script.sh (located on remote server)
#!/bin/bash
cat /var/log/messages | grep <some value> | awk -F " " '{print $8}'
And then I just run this command on the source machine:
ssh user@ip "/path/to/simple_script.sh"
If you run the remote command without allocating a tty, redirecting stdout/stderr is enough; nohup is not necessary.
ssh user@host 'background command &>/dev/null &'
If you use -t to allocate a tty so that an interactive command can run along with the background command, and the background command is the last command, like this:
ssh -t user@host 'bash -c "interactive command; nohup background command &>/dev/null &"'
it's possible that the background command doesn't actually start. There's a race here:
1. bash exits after nohup starts. Because bash is the session leader, its exit results in a HUP signal being sent to the nohup process.
2. nohup ignores the HUP signal.
If 1 completes before 2, the nohup process will exit and won't start the background command at all. We need to wait for nohup to start the background command. A simple workaround is to just add a sleep:
ssh -t user@host 'bash -c "interactive command; nohup background command &>/dev/null & sleep 1"'
The question was asked and answered years ago, I don't know if openssh behavior changed since then. I was testing on:
OpenSSH_8.6p1, OpenSSL 1.1.1g FIPS 21 Apr 2020
I was trying to do the same thing, but with the added complexity that I was trying to do it from Java. So on one machine running java, I was trying to run a script on another machine, in the background (with nohup).
From the command line, here is what worked: (you may not need the "-i keyFile" if you don't need it to ssh to the host)
ssh -i keyFile user@host bash -c "\"nohup ./script arg1 arg2 > output.txt 2>&1 &\""
Note that to my command line, there is one argument after the "-c", which is all in quotes. But for it to work on the other end, it still needs the quotes, so I had to put escaped quotes within it.
From java, here is what worked:
ProcessBuilder b = new ProcessBuilder("ssh", "-i", "keyFile", "user@host", "bash", "-c",
"\"nohup ./script arg1 arg2 > output.txt 2>&1 &\"");
Process process = b.start();
// then read from process.getInputStream() and close it.
It took a bit of trial & error to get this working, but it seems to work well now.
YOUR-COMMAND &> YOUR-LOG.log &
This should run the command and assign it a process ID. You can simply tail -f YOUR-LOG.log to see results written to it as they happen. You can log out at any time and the process will carry on.
If you are using zsh, use program-to-execute &!. This is a zsh-specific shortcut that both backgrounds and disowns the process, so that exiting the shell leaves it running.
A follow-on to @cmcginty's concise working example which also shows how to alternatively wrap the outer command in double quotes. This is how the template would look if invoked from within a PowerShell script (which can only interpolate variables from within double-quotes and ignores any variable expansion when wrapped in single quotes):
ssh user@server "sh -c `"($cmd) &>/dev/null </dev/null &`""
Inner double-quotes are escaped with back-tick instead of backslash. This allows $cmd to be composed by the PowerShell script, e.g. for deployment scripts and automation and the like. $cmd can even contain a multi-line heredoc if composed with unix LF.
First follow this procedure:
Log in on A as user a and generate a pair of authentication keys. Do not enter a passphrase:
a@A:~> ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/a/.ssh/id_rsa):
Created directory '/home/a/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/a/.ssh/id_rsa.
Your public key has been saved in /home/a/.ssh/id_rsa.pub.
The key fingerprint is:
3e:4f:05:79:3a:9f:96:7c:3b:ad:e9:58:37:bc:37:e4 a@A
Now use ssh to create a directory ~/.ssh as user b on B. (The directory may already exist, which is fine):
a@A:~> ssh b@B mkdir -p .ssh
b@B's password:
Finally append a's new public key to b@B:.ssh/authorized_keys and enter b's password one last time:
a@A:~> cat .ssh/id_rsa.pub | ssh b@B 'cat >> .ssh/authorized_keys'
b@B's password:
From now on you can log into B as b from A as a without password:
a@A:~> ssh b@B
then this will work without entering a password
ssh b@B "cd /some/directory; program-to-execute &"
I think this is what you need:
First, install sshpass on your machine.
Then you can write your own script:
while read pass port user ip; do
sshpass -p "$pass" ssh -p "$port" "$user@$ip" <<ENDSSH1
COMMAND 1
.
.
.
COMMAND n
ENDSSH1
done <<____HERE
PASS PORT USER IP
. . . .
. . . .
. . . .
PASS PORT USER IP
____HERE
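Before pointing such a loop at real machines, it can help to dry-run it and print what would be executed for each row. The sketch below does only that: the rows are hypothetical placeholders (documentation addresses, made-up users), and it prints the password length rather than the password itself.

```shell
# Dry run: build the list of ssh invocations without connecting anywhere.
plan=$(
while read -r pass port user ip; do
    [ -n "$pass" ] || continue    # skip blank lines
    echo "would run: ssh -p $port $user@$ip (password: ${#pass} chars)"
done <<'HOSTS'
secret1 22 alice 192.0.2.10
secret2 2222 bob 192.0.2.11
HOSTS
)

echo "$plan"
```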
