starting a new process group from bash script - bash

I basically want to run a script (which calls more scripts) in a new process group so that I can send a signal to all the processes started by the script.
In Linux, I found out setsid helps me in doing that, but this is not available on FreeBSD.
Syntax for setsid (provided by util-linux-ng).
setsid /path/to/myscript
I have since learnt, however, that a session and a process group are not the same thing. Still, starting a new session also solves my problem.

Sessions and groups are not the same thing. Let's make things clear:
A session consists of one or more process groups, and can have a controlling terminal. When the session has a controlling terminal, the session has, at any moment, exactly one foreground process group and one or more background process groups. In such a scenario, all terminal-generated signals and input are seen by every process in the foreground process group.
Also, when a session has a controlling terminal, the shell process is usually the session leader, dictating which process group is the foreground process group (implicitly making the other groups background process groups). Processes in a group are usually put there by a linear pipeline. For example, ls -l | grep a | sort will typically create a new process group where ls, grep and sort live.
Shells that support job control (which also requires support by the kernel and the terminal driver), as in the case of bash, create a new process group for each command invoked -- and if you invoke it to run in the background (with the & notation), that process group is not given the control of the terminal, and the shell makes it a background process group (and the foreground process group remains the shell).
So, as you can see, you almost certainly don't want to create a session in this case. A typical situation where you'd want to create a session is if you were daemonizing a process, but other than that, there is usually not much use in creating a new session.
As mentioned, you can run the script as a background job; this will create a new process group. Since a child created by fork() inherits the process group ID of its parent, every process executed by the script will be in the same group. For example, consider this simple script:
#!/bin/bash
ps -o pid,ppid,pgid,comm | grep ".*"
This prints something like:
PID PPID PGID COMMAND
11888 11885 11888 bash
12343 11888 12343 execute.sh
12344 12343 12343 ps
12345 12343 12343 grep
As you can see, execute.sh, ps and grep are all in the same process group (the value in PGID).
So all you want is:
/path/to/myscript &
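Then you can check the process group ID of myscript with ps -o pid,ppid,pgid,comm | grep myscript. Note that this relies on job control, which bash enables by default only in interactive shells; inside a script, run set -m first. To send a signal to the whole group, pass the negated PGID to kill, for example kill -TERM -- -12343 (the PGID is the PID of the leader of the group). A signal sent to a group this way is delivered to every process in that group, whereas a signal sent to the leader's plain PID reaches only the leader.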
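A minimal sketch of the idea for use inside a script, where bash leaves job control off unless asked: set -m makes the next background job the leader of its own process group, and a negative PID argument to kill addresses the whole group. The sh -c '...' command is only a hypothetical stand-in for /path/to/myscript, and the variable names are made up for illustration.

```shell
#!/usr/bin/env bash
# Job control is off in scripts; turn it on so the next background job
# becomes the leader of its own process group (PGID == its PID).
set -m
# Hypothetical stand-in for /path/to/myscript: a shell that starts a child.
sh -c 'sleep 30 & wait' &
pgid=$!
set +m

sleep 1                      # give the stand-in time to start its child
kill -s TERM -- "-$pgid"     # negative PID: signal every process in the group

status=0
wait "$pgid" || status=$?    # 143 = 128 + 15 (SIGTERM)
echo "group leader exited with status $status"
```

The `--` keeps kill from reading the negative number as an option. Without set -m, the background job would stay in the script's own process group and the group-wide kill would have nothing to address.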

On FreeBSD you can try the script command, which runs the given command on a new pseudo-terminal and, in doing so, calls setsid internally.
stty -echo -onlcr # avoid added \r in output
script -q /dev/null /path/to/myscript
stty echo onlcr
# sync # ... if terminal prompt does not return

This is not exactly an answer, but an alternative approach based on names.
You can give all of the processes a common part in their names. For example, all of the following processes share the my_proc_group_29387172 part:
-rwxrwxr-x. my_proc_group_29387172_microservice_1
-rwxrwxr-x. my_proc_group_29387172_microservice_2
-rwxrwxr-x. my_proc_group_29387172_data_dumper
Spawn all of them (as many as you want):
ADDR=1 ./my_proc_group_29387172_microservice_1
ADDR=2 ./my_proc_group_29387172_microservice_1
ADDR=3 ./my_proc_group_29387172_microservice_2
./my_proc_group_29387172_data_dumper
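When you want to kill all of the processes, you can use the pkill command (pattern kill) or killall with the --regexp parameter. (On Linux, pkill matches the process name, which is truncated to 15 characters, so a pattern this long may need pkill -f to match against the full command line instead.)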
pkill my_proc_group_29387172
Benefit :) - you can start as many processes as you want at any time (or any day) from any script.
Drawback :( - you can kill innocent processes if their names happen to match your pattern.

Related

What does percent sign % do in "kill %vmtouch"?

I came across this shell script
bash# while true; do
vmtouch -m 10000000000 -l *head* & sleep 10m
kill %vmtouch
done
and wonder how the kill %vmtouch portion works.
I normally pass a pid to kill a process, but how does %vmtouch resolve to a pid?
I tried to run portions of the script separately, but I got the error
-bash: kill: %vmtouch: no such job
%something is not a general shell script feature, but syntax used by the kill, fg and bg builtin commands to identify jobs. It searches the list of the shell's active jobs for the given string, and then signals that.
Here's man bash searching for /jobspec:
The character % introduces a job specification (jobspec).
Job number n may be referred to as %n. A job may also be referred to using a prefix of the name used to start it, or using a substring that appears in its command line. [...]
So if you do:
sleep 30 &
cat &
You can use things like %sleep or %sl to conveniently refer to the first one without having to find or remember its pid or job number.
You should look at the Job control section of the man bash page. The character % introduces a job specification (jobspec). Ideally when you have started this background job, you should have seen an entry in the terminal
[1] 25647
where 25647 is just an example number. The line above means that the background job was assigned job number 1 and that its process ID (for a pipeline, the process ID of the last process) is 25647.
%vmtouch is a valid job spec (it matches a job whose command starts with vmtouch), but it only resolves in the shell that started the job, and only while the job is still active. Running the kill line on its own therefore fails with "no such job". You can also refer to the job by its number; here that is %1, so the kill command could be written as below, which for a single-process job has the same effect as writing kill 25647
vmtouch -m 10000000000 -l *head* & sleep 10m
kill %1
That said, instead of relying on job spec IDs, you can use the process ID of the background job, which is stored in the special shell variable $!, like this:
vmtouch -m whatever -l *head* & vmtouch_pid=$!
sleep 10m
kill "$vmtouch_pid"
See Job Control Basics from the GNU bash man page.
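Both ways of addressing a background job can be tried side by side in a throwaway script. This is only a sketch: sleep 30 stands in for the vmtouch command, the variable names are hypothetical, and set -m is there so the script behaves like the interactive shell the answers assume.

```shell
#!/usr/bin/env bash
set -m                       # job control as in an interactive shell (off by default in scripts)

sleep 30 &                   # becomes job 1
first_pid=$!
kill %sl                     # job spec: a prefix of the command that started the job
first_status=0
wait "$first_pid" || first_status=$?

sleep 30 &
second_pid=$!                # $! is the PID of the most recent background job
kill "$second_pid"
second_status=0
wait "$second_pid" || second_status=$?

# Both jobs were ended by SIGTERM, so both statuses are 128 + 15.
echo "$first_status $second_status"
```

The $! form keeps working when the kill is issued from a different function or much later in the script, whereas a job spec depends on the job table of the shell that started the job.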

How to run a shell script with the terminal closed, and stop the script at any time

What I usually do is pause my script, run it in the background and then disown it like
./script
^Z
bg
disown
However, I would like to be able to cancel my script at any time. If I have a script that runs indefinitely, I would like to be able to cancel it after a few hours or a day or whenever I feel like cancelling it.
Since you are having a bit of trouble following along, let's see if we can keep it simple for you. (this presumes you can write to /tmp, change as required). Let's start your script in the background and create a PID file containing the PID of its process.
$ ./script & echo $! > /tmp/scriptPID
You can check the contents of /tmp/scriptPID
$ cat /tmp/scriptPID
######
Where ###### is the PID number of the running ./script process. You can further confirm with pidof -x script (the -x option makes pidof also match shells running a script of that name), which will return the same ######. You can use ps aux | grep script to view the number as well.
When you are ready to kill the ./script process, you simply pass the number (e.g. ######) to kill. You can do that directly with:
$ kill $(</tmp/scriptPID)
(or with the other methods listed in my comment)
You can add rm /tmp/scriptPID to remove the pid file after killing the process.
Look things over and let me know if you have any further questions.
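Put together, the PID-file approach might look like the sketch below. The sleep 300 command is a stand-in for ./script, and the PID file is created with mktemp rather than at /tmp/scriptPID so the example does not clobber anything; both are assumptions made for illustration.

```shell
#!/usr/bin/env bash
pidfile=$(mktemp)            # hypothetical PID file (the answer uses /tmp/scriptPID)

sleep 300 &                  # stand-in for ./script
echo $! > "$pidfile"

# ... later, whenever you decide to stop it ...
pid=$(cat "$pidfile")
if kill -0 "$pid" 2>/dev/null; then   # signal 0 only checks that the PID exists
    kill "$pid"
fi
rm -f "$pidfile"

status=0
wait "$pid" || status=$?     # wait works here only because sleep is our own child
echo "stopped PID $pid with status $status"
```

Because PIDs are eventually reused, a stale PID file can point at an unrelated process; removing the file as soon as the process is stopped keeps that window small.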

Why does running a background task over ssh fail if a pseudo-tty is allocated?

I've recently run into some slightly odd behaviour when running commands over ssh. I would be interested to hear any explanations for the behaviour below.
Running ssh localhost 'touch foobar &' creates a file called foobar as expected:
[bob@server ~]$ ssh localhost 'touch foobar &'
[bob@server ~]$ ls foobar
foobar
However running the same command but with the -t option to force pseudo-tty allocation fails to create foobar:
[bob@server ~]$ ssh -t localhost 'touch foobar &'
Connection to localhost closed.
[bob@server ~]$ echo $?
0
[bob@server ~]$ ls foobar
ls: cannot access foobar: No such file or directory
My current theory is that because the touch process is being backgrounded the pseudo-tty is allocated and unallocated before the process has a chance to run. Certainly adding one second sleep allows touch to run as expected:
[bob@pidora ~]$ ssh -t localhost 'touch foobar & sleep 1'
Connection to localhost closed.
[bob@pidora ~]$ ls foobar
foobar
If anyone has a definitive explanation I would be very interested to hear it. Thanks.
Oh, that's a good one.
This is related to how process groups work, how bash behaves when invoked as a non-interactive shell with -c, and the effect of & in input commands.
The answer assumes you're familiar with how job control works in UNIX; if you're not, here's a high level view: every process belongs to a process group (the processes in the same group are often put there as part of a command pipeline, e.g. cat file | sort | grep 'word' would place the processes running cat(1), sort(1) and grep(1) in the same process group). bash is a process like any other, and it also belongs to a process group. Process groups are part of a session (a session is composed of one or more process groups). In a session, there is at most one foreground process group, and possibly many background process groups. The foreground process group has control of the terminal (if there is a controlling terminal attached to the session); the session leader (bash) moves process groups between background and foreground with tcsetpgrp(3). A signal sent to a process group is delivered to every process in that group.
If the concept of process groups and job control is completely new to you, I think you'll need to read up on that to fully understand this answer. A great resource to learn this is Chapter 9 of Advanced Programming in the UNIX Environment (3rd edition).
That being said, let's see what is happening here. We have to fit together every piece of the puzzle.
In both cases, the ssh remote side invokes bash(1) with -c. The -c flag causes bash(1) to run as a non-interactive shell. From the manpage:
An interactive shell is one started without non-option arguments and
without the -c option whose standard input and error are both
connected to terminals (as determined by isatty(3)), or one started
with the -i option. PS1 is set and $- includes i if bash is
interactive, allowing a shell script or a startup file to test this
state.
Also, it is important to know that job control is disabled when bash is started in non-interactive mode. This means that bash will not create a separate process group to run the command: with job control disabled, there is no need to move the command between foreground and background, so it might as well remain in the same process group as bash. This happens whether or not you forced PTY allocation on ssh with -t.
However, the use of & has the side effect of causing the shell not to wait for command termination (even if job control is disabled). From the manpage:
If a command is terminated by the control operator &, the shell
executes the command in the background in a subshell. The shell does
not wait for the command to finish, and the return status is 0.
Commands separated by a ; are executed sequentially; the shell waits
for each command to terminate in turn. The return status is the exit
status of the last command executed.
So, in both cases, bash will not wait for command execution, and touch(1) will be executed in the same process group as bash(1).
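The claim that bash -c runs without job control can be checked directly. The sketch below inspects $-, which lists the active shell options; the letter m (monitor mode) is present only when job control is on. The variable names are hypothetical.

```shell
#!/usr/bin/env bash
# Ask a non-interactive bash whether "m" (job control) appears in $-.
default_state=$(bash -c 'case $- in *m*) echo on ;; *) echo off ;; esac')
# Same check after explicitly enabling job control with set -m.
forced_state=$(bash -c 'set -m; case $- in *m*) echo on ;; *) echo off ;; esac')

echo "bash -c: job control is $default_state"
echo "bash -c with set -m: job control is $forced_state"
```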
Now, consider what happens when a session leader exits. Quoting from setpgid(2) manpage:
If a session has a controlling terminal, and the CLOCAL flag for that
terminal is not set, and a terminal hangup occurs, then the session
leader is sent a SIGHUP. If the session leader exits, then a SIGHUP
signal will also be sent to each process in the foreground process
group of the controlling terminal.
(Emphasis mine)
When you don't use -t
When you don't use -t, there is no PTY allocation on the remote side, so bash is not a session leader, and in fact no new session is created. Because sshd is running as a daemon, the bash process that is forked + exec()'d will not have a controlling terminal. As such, even though the shell terminates very quickly (probably before touch(1)), there is no SIGHUP sent to the process group, because bash wasn't a session leader (and there is no controlling terminal). So everything works.
When you use -t
-t forces PTY allocation, which means that the ssh remote side will call setsid(2), allocate a pseudo-terminal + fork a new process with forkpty(3), connect the PTY master device input and output to the socket endpoints that lead to your machine, and finally execute bash(1). forkpty(3) opens the PTY slave side in the forked process that will become bash; since there's no controlling terminal for the current session, and a terminal device is being opened, the PTY device becomes the controlling terminal for the session and bash becomes the session leader.
Then the same thing happens again: touch(1) is executed in the same process group, etc., yadda yadda. The point is, this time, there is a session leader and a controlling terminal. So, since bash does not bother waiting because of the &, when it exits, SIGHUP is delivered to the process group and touch(1) dies prematurely.
About nohup
nohup(1) doesn't work here because there is still a race condition. If bash(1) terminates before nohup(1) has had the chance to set up the necessary signal handling and file redirection, it will have no effect (which is probably what happens).
A possible fix
Forcefully re-enabling job control fixes it. In bash, you do that with set -m. This works:
ssh -t localhost 'set -m ; touch foobar &'
Or force bash to wait for touch(1) to complete:
ssh -t localhost 'touch foobar & wait `pgrep touch`'
The answer of @Filipe Gonçalves is great, but it gets one thing wrong. I do not have enough reputation to comment there, so I will correct and expand on it here:
When you don't use -t,
@Filipe says:
When you don't use -t, there is no PTY allocation on the remote side, so bash is not a session leader, and in fact no new session is created. ...
Actually, bash is a session leader, and a new session is created.
Let us test this:
# run sleep background process first, then call ps directly:
[root@90fb1c3f30ce ~]# ssh localhost 'sleep 66 & ps -o pid,ppid,pgid,sess,tpgid,tty,args'
PID PPID PGID SESS TPGID TT COMMAND
184074 67 184074 184074 -1 ? sshd: root@notty
184076 184074 184076 184076 -1 ? bash -c sleep 66 & ps -o pid,ppid,pgid,sess,tpgid,tty,args
184081 184076 184076 184076 -1 ? sleep 66
184082 184076 184076 184076 -1 ? ps -o pid,ppid,pgid,sess,tpgid,tty,args
Notice ^^^^^ ^^^^^
We can see that the bash, sleep and ps processes share the same PGID and SESS, both equal to the PID of the bash process (184076), while the parent sshd process has a different PGID and SESS. So the bash process is the leader of a new session, and bash, sleep and ps belong to a process group separate from that of sshd.
In addition, the ssh command does not return right away; it waits about 66 seconds. The reason is explained here: Getting ssh to execute a command in the background on target machine
While the ssh command is waiting, we can open another session and run:
[root@90fb1c3f30ce ~]# ps -eo pid,ppid,pgid,sess,tpgid,tty,args
PID PPID PGID SESS TPGID TT COMMAND
# unrelated lines removed #
184074 67 184074 184074 -1 ? sshd: root@notty
184081 1 184076 184076 -1 ? sleep 66
Notice ^^^^^ ^^^^^
[root@90fb1c3f30ce ~]# ps -e | grep 184076
[root@90fb1c3f30ce ~]#
We can see that the bash process (PID 184076) is already gone, but the PGID and SESS of the background sleep process are unchanged. That is fine; see APUE section 9.4:
Each process group can have a process group leader. The leader is identified by its process group ID being equal to its process ID.
It is possible for a process group leader to create a process group, create processes in the group, and then terminate. The process group still exists, as long as at least one process is in the group, regardless of whether the group leader terminates.
So, why doesn't this sleep process die?
When you don't use -t, there is no PTY allocation on the remote side, so the process group on the remote side is not a foreground process group (without a terminal, foreground and background have no meaning). As such, even though the shell terminates very quickly, no SIGHUP is sent to its process group, because the process group is not a foreground process group. (The SIGHUP signal is sent to each process in the foreground process group of the controlling terminal.)
The key is decoupling the stdin/stdout/stderr streams of the child process from the originating bash/ssh session; then pseudo-tty allocation (ssh -t) is no longer required to allow the child to survive the termination of the ssh connection. See here for a complete answer...
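A sketch of what decoupling the streams can look like, shown locally so it can be run without a remote host. The sh -c '...' command is a hypothetical stand-in for the real job, and the ssh line in the comment uses placeholder host and path names.

```shell
#!/usr/bin/env bash
# Over ssh the same pattern would be, with placeholder names:
#   ssh remotehost 'nohup /path/to/script.sh </dev/null >nohup.out 2>&1 &'
out=$(mktemp)

# stdin comes from /dev/null and stdout/stderr go to a file, so the job
# holds no reference to the terminal or to the ssh connection.
nohup sh -c 'sleep 1; echo finished' </dev/null >"$out" 2>&1 &
job=$!

wait "$job"                  # only for the demo; normally you would just log out
result=$(cat "$out")
echo "job wrote: $result"
```

With all three streams redirected, nothing keeps the ssh channel open, so the connection can close immediately while the job carries on.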

PID of the process that created the file

Can I get the pid of touch when it's creating a file? I've tried touch ID$! & but it doesn't display the pid correctly. It takes the pid of the command before touch. Any advice?
I suppose you could write a small C or Perl program that calls fork() and then uses one of the exec*() functions to invoke touch from the child process. The parent process would receive the child's PID as the result of the fork call.
You say in a comment that you want to insert the PID into the name of the file. I can't think of a way to invoke touch with its own PID as part of its command-line argument; you won't know the PID soon enough to do that. I suppose you could rename the file after touching it.
But the PID of the touch process isn't particularly meaningful. The process will have terminated before you can make any use of it.
If you just want a (more or less) unique number as part of the file name, I can't think of any good reason that it has to be the PID of the touch process. You could just do something like:
NEW_PID=$(sh -c 'echo $$')
touch foo-$NEW_PID.txt
which gives you the PID of a short-lived shell process.
Note that PIDs are not unique; since there are only finitely many possible PIDs, they're re-used after a while. (I've been able to force a PID to be reused in less than a minute by forking multiple processes very quickly.)
This is touch rewritten in perl with the pid of the creating process as part of the filename
perl -e 'open(F,">".$$."myfile")||die $!'
If you really need that pid, it's a multi-step process:
f=$(mktemp)
touch "$f" &
wait $!
mv "$f" "./ID$!"
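One further possibility, offered as a sketch rather than as something the answers above propose: exec replaces a process image without changing its PID, so a helper shell can expand $$ into the file name and then exec touch, which then runs under exactly that PID. The temporary directory and the variable names are hypothetical.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)         # scratch directory so the demo leaves no stray files

# The inner sh expands $$ to its own PID, then exec turns that same
# process into touch, so touch's PID is the number in the file name.
sh -c 'exec touch "$1/ID$$"' sh "$workdir"

created=$(ls "$workdir")
echo "created $created"
```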

Why do unix background processes sometimes die when I exit my shell?

I wanted to know why I am seeing different behaviour from background processes in the bash shell.
Case 1: Logged in to a Unix server using PuTTY (SSH)
By default it uses the csh shell
I changed to the bash shell
Typed sleep 2000 &
Pressed Enter
It gave me the job number. Then I killed my session by clicking the x in the PuTTY window.
I then opened another session and looked up the process: it had died.
Case 2: Logged in to a Unix server using PuTTY (SSH)
By default it uses the csh shell
I changed to the bash shell
vi mysleep.sh
Put sleep 2000 & in it and saved mysleep.sh
./mysleep.sh
The difference here is that instead of executing the sleep command directly, I am storing the sleep command in a file and executing the file.
Then I killed my session by clicking the x in the PuTTY window.
I then opened another session and looked up the process: it was still there.
I am not sure why this is happening. I thought I needed to use disown in bash to keep a process running after logging out.
One difference I see is in the parent process ID: in the second case, the parent process ID of sleep 2000 becomes 1. It looks like as soon as the process for mysleep.sh died, the kernel reassigned the parent to process 1.
The difference here is indeed the intervening process.
When you close the terminal window, a HUP signal (related to "nohup" as an0nymo0usc0ward mentioned) is sent to the processes running in it. The default action on receiving HUP is to die - from the signal(3) manpage,
No    Name      Default Action       Description
1     SIGHUP    terminate process    terminal line hangup
In your first example, the sleep process directly receives this HUP signal and dies because it isn't set to do anything else. (Some processes catch HUP and use it to perform some action, e.g. reread some configuration files)
In the second example, the shell process running your shell script has already died, so the sleep process never gets the signal. In UNIX, every process must have a parent process due to the internals of how the wait(2) family of calls works and indeed processes in general. So when the parent process dies, the kernel hands the orphaned child to init (pid 1, as you note) as a foster child.
Orphan process (on wikipedia) has some more information available about it, also see Zombie process for some additional technical details.
Already running process?
^z
bg
disown %<jobid>
New process/script (on local machine's console)?
nohup script.sh &
New process/script (on remote machine's console)?
Depending on your need,
there are two options [ there will be more ;-) ]
ssh remotehost 'nohup /path/to/script.sh </dev/null > nohup.out 2>&1 &'
OR
use 'screen'
Try "nohup cmd args..."
Steven's answer is correct, but I'd like to highlight the tricky part here again:
=> Using a bash script that just executes sleep in the background
The effect of this is that the "script" exits almost immediately (since it's done all its commands). However, it did create a child process (sleep) during its lifetime. The effect of this is that:
The "script" cannot be the parent anymore, and sleep is orphaned to init (which shows nicely in a pstree)
The bash shell where you started the script from has no underlying jobs anymore
Note that all of this happens when you execute the script, and has nothing to do with any ssh logout or PuTTY closing.
When you then finally close your PuTTY session, bash receives a SIGHUP, but doesn't forward it to any other process (since there are no jobs left).
In the other case, bash still had a job left, to which it then sent the SIGHUP, causing it to end (as you noticed).
Hope this helps
