I'm running ruby in an Alpine docker container (it's a sidekiq worker, if that matters). At a certain point, my application receives instructions to shell out to a subcommand. I need to be able to stream STDOUT rather than have it buffer. This is why I'm using PTY instead of system() or another similar answer. I'm executing the following line of code:
stdout, stdin, pid = PTY.spawn(my_cmd)
When I connect to the docker container and run ps auxf, I see this:
root 7 0.0 0.4 187492 72668 ? Sl 01:38 0:00 ruby_program
root 12378 0.0 0.0 1508 252 pts/4 Ss+ 01:38 0:00 \_ sh -c my_cmd
root 12380 0.0 0.0 15936 6544 pts/4 S+ 01:38 0:00 \_ my_cmd
Note how the child process of ruby is "sh -c my_cmd", which itself then has a child "my_cmd" process.
"my_cmd" can take a significant amount of time to run. It is designed so that sending a signal USR1 to the process causes it to save its state so it can be resumed later and abort cleanly.
The problem is that the pid returned from "PTY.spawn()" is the pid of the "sh -c my_cmd" process, not the "my_cmd" process. So when I execute:
Process.kill('USR1', pid)
it sends USR1 to sh, not to my_cmd, so it doesn't behave as it should.
Is there any way to get the pid that corresponds to the command I actually specified? I'm open to ideas outside the realm of PTY, but it needs to satisfy the following constraints:
1) I need to be able to stream output from both STDOUT and STDERR as they are written, without waiting for them to be flushed (since STDOUT and STDERR get mixed together into a single stream in PTY, I'm redirecting STDERR to a file and using inotify to get updates).
2) I need to be able to send USR1 to the process to be able to pause it.
I gave up on a clean solution. I finally just executed
pgrep -P #{pid}
to get the child pid, and then I could send USR1 to that process. Feels hacky, but when ruby gives you lemons...
You should send your arguments as arrays. So instead of
stdout, stdin, pid = PTY.spawn("my_cmd arg1 arg2")
You should use
stdout, stdin, pid = PTY.spawn("my_cmd", "arg1", "arg2")
etc.
Also see:
Process.spawn child does not receive TERM
https://zendesk.engineering/running-a-child-process-in-ruby-properly-febd0a2b6ec8
I have a situation where one xterm runs another xterm which runs a process.
I would like to set some kind of watchdog and kill all those windows in case of a stuck process.
I'm using Ruby to do this. When the first xterm is opened I get it's PID, and later when a SIGTERM is sent to it, only the first xterm window dies, but not the second.
It can be easily reproduced in irb:
irb(main):002:0> cmd = "xterm -e xterm -e sleep 1000"
=> "xterm -e xterm -e sleep 1000"
irb(main):003:0> pid = Process.spawn(cmd)
=> 669
irb(main):004:0> Process.kill(15, 669)
=> 1
This leaves the second xterm window open. How can all the process chain be killed?
Thanks
Using either ps xf or pstree, you can dynamically see running processes hierarchy.
I've used strace to track which was the behaviour of those programs. It turned that they are checking the contents of /proc/$pid/stat and /proc/$pid/status, which give them information of who the process's parent is.
Probably, you will have to look for the processes whose Parent PID is equal to the PID you got (or that process's PPID).
I have a question about UNIX job control in RHEL6
Basically, I am trying to implement passenger debug log rotation using logrotate. I am following the instructions here:
https://github.com/phusion/passenger/wiki/Phusion-Passenger-and-logrotation
I've got everything setup correctly (I think). My problem is this; when I spawn the background job using
nohup pipetool $HOME/passenger.log < $HOME/passenger.pipe &
And then log out and back in, if I inspect the process table, for example by using 'ps aux' if I check the pid of the process it appears as with the command 'bash'. I have tried changing the first line of the command to "#!/usr/bin/ruby". Here is an example of this:
[root#server]# nohup pipetool /var/log/nginx/passenger-debug.log < /var/pipe/passenger.pipe &
[1] 63767
[root#server]# exit
exit
[me#server]$ sudo su
[sudo] password for me:
[root#server]# ps aux | grep 63767
root 63767 0.0 0.0 108144 2392 pts/0 S 15:26 0:00 bash
root 63887 0.0 0.0 103236 856 pts/0 S+ 15:26 0:00 grep 63767
[root#server]#
When this occurs the line in the supplied logrotate file ( killall -HUP pipetool ) fails because the 'pipetool' is not matched. Again, I've tried changing the first line to #!/usr/bin/ruby. This had no impact. So, my question is basically; is there any good way to have the actual command appear in the process table instead of just 'bash' when spawned using job control? I am using bash as the shell when I invoke the pipetool. I appreciate you taking the time to help me.
This should work for you: edit pipetool to set the global variable $PROGRAM_NAME:
$PROGRAM_NAME = 'pipetool'
The script should then show up as pipetool in the process list.
My question is very similar to this one except that my background process was started from a script. I could be doing something wrong but when I try this simple example:
#!/bin/bash
set -mb # enable job control and notification
sleep 5 &
I never receive notification when the sleep background command finishes. However, if I execute the same directly in the terminal,
$ set -mb
$ sleep 5 &
$
[1]+ Done sleep 5
I see the output that I expect.
I'm using bash on cygwin. I'm guessing that it might have something to do with where the output is directed, but trying various output redirection, I'm not getting any closer.
EDIT: So I have more information about the why (thx to jkramer), but still looking for the how. How do I get "push" notification that a background process started from a script has terminated? Saving a PID to a file and polling is not what I'm looking for.
if you run it as source file . then it can notify. e.g.
cat foo.sh
#!/bin/bash
set -mb # enable job control and notification
sleep 5 &
. foo.sh
[1]+ Done sleep 5
The job control of your shell only affects processes controlled by your terminal, that is tasks started directly from the shell.
When the parent process (your script) dies, the init process automatically becomes the parent of the child process (your sleep command), effectively killing all output. Try this example:
[jkramer/sgi5k:~]# cat foo.sh
#!/bin/bash
sleep 20 &
echo $!
[jkramer/sgi5k:~]# bash foo.sh
19638
[jkramer/sgi5k:~]# ps -ef | grep 19638
jkramer 19638 1 0 23:08 pts/3 00:00:00 sleep 20
jkramer 19643 19500 0 23:08 pts/3 00:00:00 grep 19638
As you can see, the parent process of sleep after the script terminates is 1, the init process. If you need to get notified about the termination of your child process, you can save the content of the $! variable (PID of the last created process) in a file or somewhere and use the `wait´ command to wait for its termination.
I have a process that is already running for a long time and don't want to end it.
How do I put it under nohup (that is, how do I cause it to continue running even if I close the terminal?)
Using the Job Control of bash to send the process into the background:
Ctrl+Z to stop (pause) the program and get back to the shell.
bg to run it in the background.
disown -h [job-spec] where [job-spec] is the job number (like %1 for the first running job; find about your number with the jobs command) so that the job isn't killed when the terminal closes.
Suppose for some reason Ctrl+Z is also not working, go to another terminal, find the process id (using ps) and run:
kill -SIGSTOP PID
kill -SIGCONT PID
SIGSTOP will suspend the process and SIGCONT will resume the process, in background. So now, closing both your terminals won't stop your process.
The command to separate a running job from the shell ( = makes it nohup) is disown and a basic shell-command.
From bash-manpage (man bash):
disown [-ar] [-h] [jobspec ...]
Without options, each jobspec is removed from the table of active jobs. If the -h option is given, each jobspec is not
removed from the table, but is marked so that SIGHUP is not sent to the job if the shell receives a SIGHUP. If no jobspec is
present, and neither the -a nor the -r option is supplied, the current job is used. If no jobspec is supplied, the -a option
means to remove or mark all jobs; the -r option without a jobspec argument restricts operation to running jobs. The return
value is 0 unless a jobspec does not specify a valid job.
That means, that a simple
disown -a
will remove all jobs from the job-table and makes them nohup
These are good answers above, I just wanted to add a clarification:
You can't disown a pid or process, you disown a job, and that is an important distinction.
A job is something that is a notion of a process that is attached to a shell, therefore you have to throw the job into the background (not suspend it) and then disown it.
Issue:
% jobs
[1] running java
[2] suspended vi
% disown %1
See http://www.quantprinciple.com/invest/index.php/docs/tipsandtricks/unix/jobcontrol/
for a more detailed discussion of Unix Job Control.
Unfortunately disown is specific to bash and not available in all shells.
Certain flavours of Unix (e.g. AIX and Solaris) have an option on the nohup command itself which can be applied to a running process:
nohup -p pid
See http://en.wikipedia.org/wiki/Nohup
Node's answer is really great, but it left open the question how can get stdout and stderr redirected. I found a solution on Unix & Linux, but it is also not complete. I would like to merge these two solutions. Here it is:
For my test I made a small bash script called loop.sh, which prints the pid of itself with a minute sleep in an infinite loop.
$./loop.sh
Now get the PID of this process somehow. Usually ps -C loop.sh is good enough, but it is printed in my case.
Now we can switch to another terminal (or press ^Z and in the same terminal). Now gdb should be attached to this process.
$ gdb -p <PID>
This stops the script (if running). Its state can be checked by ps -f <PID>, where the STAT field is 'T+' (or in case of ^Z 'T'), which means (man ps(1))
T Stopped, either by a job control signal or because it is being traced
+ is in the foreground process group
(gdb) call close(1)
$1 = 0
Close(1) returns zero on success.
(gdb) call open("loop.out", 01102, 0600)
$6 = 1
Open(1) returns the new file descriptor if successful.
This open is equal with open(path, O_TRUNC|O_CREAT|O_RDWR, S_IRUSR|S_IWUSR).
Instead of O_RDWR O_WRONLY could be applied, but /usr/sbin/lsof says 'u' for all std* file handlers (FD column), which is O_RDWR.
I checked the values in /usr/include/bits/fcntl.h header file.
The output file could be opened with O_APPEND, as nohup would do, but this is not suggested by man open(2), because of possible NFS problems.
If we get -1 as a return value, then call perror("") prints the error message. If we need the errno, use p errno gdb comand.
Now we can check the newly redirected file. /usr/sbin/lsof -p <PID> prints:
loop.sh <PID> truey 1u REG 0,26 0 15008411 /home/truey/loop.out
If we want, we can redirect stderr to another file, if we want to using call close(2) and call open(...) again using a different file name.
Now the attached bash has to be released and we can quit gdb:
(gdb) detach
Detaching from program: /bin/bash, process <PID>
(gdb) q
If the script was stopped by gdb from an other terminal it continues to run. We can switch back to loop.sh's terminal. Now it does not write anything to the screen, but running and writing into the file. We have to put it into the background. So press ^Z.
^Z
[1]+ Stopped ./loop.sh
(Now we are in the same state as if ^Z was pressed at the beginning.)
Now we can check the state of the job:
$ ps -f 24522
UID PID PPID C STIME TTY STAT TIME CMD
<UID> <PID><PPID> 0 11:16 pts/36 S 0:00 /bin/bash ./loop.sh
$ jobs
[1]+ Stopped ./loop.sh
So process should be running in the background and detached from the terminal. The number in the jobs command's output in square brackets identifies the job inside bash. We can use in the following built in bash commands applying a '%' sign before the job number :
$ bg %1
[1]+ ./loop.sh &
$ disown -h %1
$ ps -f <PID>
UID PID PPID C STIME TTY STAT TIME CMD
<UID> <PID><PPID> 0 11:16 pts/36 S 0:00 /bin/bash ./loop.sh
And now we can quit from the calling bash. The process continues running in the background. If we quit its PPID become 1 (init(1) process) and the control terminal become unknown.
$ ps -f <PID>
UID PID PPID C STIME TTY STAT TIME CMD
<UID> <PID> 1 0 11:16 ? S 0:00 /bin/bash ./loop.sh
$ /usr/bin/lsof -p <PID>
...
loop.sh <PID> truey 0u CHR 136,36 38 /dev/pts/36 (deleted)
loop.sh <PID> truey 1u REG 0,26 1127 15008411 /home/truey/loop.out
loop.sh <PID> truey 2u CHR 136,36 38 /dev/pts/36 (deleted)
COMMENT
The gdb stuff can be automatized creating a file (e.g. loop.gdb) containing the commands and run gdb -q -x loop.gdb -p <PID>. My loop.gdb looks like this:
call close(1)
call open("loop.out", 01102, 0600)
# call close(2)
# call open("loop.err", 01102, 0600)
detach
quit
Or one can use the following one liner instead:
gdb -q -ex 'call close(1)' -ex 'call open("loop.out", 01102, 0600)' -ex detach -ex quit -p <PID>
I hope this is a fairly complete description of the solution.
Simple and easiest steps
Ctrl + Z ----------> Suspends the process
bg --------------> Resumes and runs background
disown %1 -------------> required only if you need to detach from the terminal
To send running process to nohup (http://en.wikipedia.org/wiki/Nohup)
nohup -p pid , it did not worked for me
Then I tried the following commands and it worked very fine
Run some SOMECOMMAND,
say /usr/bin/python /vol/scripts/python_scripts/retention_all_properties.py 1.
Ctrl+Z to stop (pause) the program and get back to the shell.
bg to run it in the background.
disown -h so that the process isn't killed when the terminal closes.
Type exit to get out of the shell because now you're good to go as the operation will run in the background in its own process, so it's not tied to a shell.
This process is the equivalent of running nohup SOMECOMMAND.
ctrl + z - this will pause the job (not going to cancel!)
bg - this will put the job in background and return in running process
disown -a - this will cut all the attachment with job (so you can close the terminal and it will still run)
These simple steps will allow you to close the terminal while keeping process running.
It wont put on nohup (based on my understanding of your question, you don't need it here).
On my AIX system, I tried
nohup -p processid>
This worked well. It continued to run my process even after closing terminal windows. We have ksh as default shell so the bg and disown commands didn't work.