Using child process id in arguments to child, especially log file - bash

We can access the process id of the current SHELL process with $$, like so:
$ echo $$
9777
But, this is the pid of the current process, the shell - not a child process.
And, we can reference the process id of the last, backgrounded child process, like so:
$ date &
[1] 10765
Thu Aug 14 10:30:04 CDT 2014
[1]+ Done date
$ echo $!
10765
(Note: I rearranged the above output to make it more readable. The prompt may appear in the middle of the output text.)
I don't think there is a way to directly pass a child's process id to itself. So, what is the simplest way to embed a child's process id in its arguments, especially in a log file name?
This is my best approach, at the moment:
$ eval "date >& /tmp/log &"; wait; mv /tmp/log /tmp/log.$!
[1] 10884
[1]+ Done date &>/tmp/log
$ eval "date >& /tmp/log &"; wait; mv /tmp/log /tmp/log.$!
[1] 10891
[1]+ Done date &>/tmp/log
$ ls /tmp/log*
/tmp/log.10884 /tmp/log.10891
Is there a more elegant way to achieve the same effect? Is there another magic shell variable that is interpreted as the child process id during the evaluation of the child's input arguments? I don't see how, without some serious internal shell magic.
Thanks!

Summon a subshell and use exec so that the child process inherits the process id of the calling subshell:
( exec date &>"/tmp/log.$BASHPID" )
On shells not supporting $BASHPID, you can just summon a general shell:
/bin/bash -c 'exec date &>"/tmp/log.$$"'
Or
/bin/sh -c 'exec date >"/tmp/log.$$" 2>&1'
See exec from Wikipedia.
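If the parent shell also needs the PID (say, to wait on the job), the same trick composes with &: because exec reuses the subshell's process, the $! seen by the parent is the very PID that named the log file. A minimal sketch, reusing the date example from above:
( exec date &>"/tmp/log.$BASHPID" ) &
pid=$!               # PID of the backgrounded subshell == PID of date
wait "$pid"
cat "/tmp/log.$pid"  # the log file name and $! agree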

You can't pass a child's PID as an argument to the child because the argument list is constructed before the child is created.
However, you can cheat. If the child process you want to create is a simple utility (as opposed to a bash pipeline or sequence of bash commands), you can use an explicit bash child process and pass its PID to the command, or use it as the name of a log file:
bash -c 'date >&/tmp/log.$$' &
This relies on the fact that when bash is invoked with -c cmd to execute a single simple command, it uses exec to replace itself with that command, so the PID does not change. Hence $$ (which in this case is interpreted by the child bash process, since the string is single-quoted) is the PID of the command process.
If the only point is to give a unique name to the log file, then it doesn't really matter whether the PID is that of the command or the child bash process, and then you could pass a more complex argument to bash -c.
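For instance, here is a sketch with a small pipeline inside bash -c; $$ is now the PID of the wrapper bash rather than of any pipeline command, which is enough to make the file name unique:
bash -c 'date | tr a-z A-Z >"/tmp/log.$$"' &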

Version 4 of bash introduced the BASHPID variable, which stores the process ID of the current (sub)shell, rather than the PID of the original shell that $$ expands to.

Launch process from Bash script in the background, then bring it to foreground

The following is a simplified version of some code I have:
#!/bin/bash
myfile=file.txt
interactive_command > "$myfile" &
pid=$!
# Use tail to wait for the file to be populated
while read -r line; do
    first_output_line=$line
    break # we only need the first line
done < <(tail -f "$myfile")
rm "$myfile"
# do stuff with $first_output_line and $pid
# ...
# bring `interactive_command` to foreground?
I want to bring interactive_command to the foreground after its first line of output has been stored to a variable, so that a user can interact with it via calling this script.
However, it seems that using fg %1 does not work in the context of a script, and I cannot use fg with the PID. Is there a way that I can do this?
(Also, is there a more elegant way of capturing the first line of output, without writing to a temp file?)
Job control using fg and bg is only available in interactive shells (i.e. when typing commands in a terminal). Shell scripts normally run in non-interactive shells (the same reason aliases don't work in shell scripts by default).
Since you already have the PID stored in a variable, foregrounding the process is the same as waiting on it (see Job Control Builtins). For example you could just do
wait "$pid"
What you have is also a basic version of the coproc bash built-in, which captures the output of a command run in the background. It exposes two file descriptors stored in an array, through which you can read the command's stdout and feed its stdin:
coproc fdPair { interactive_command; }
The syntax is coproc <array-name> <compound-command>. The array is populated with the file descriptor ids by the built-in; if no name is given explicitly, the COPROC variable is used. Note that a name may only be supplied with a compound command (hence the braces); placed before a simple command, it would be parsed as the command name itself. So your requirement can be written as
coproc fdPair { interactive_command; }
IFS= read -r -u "${fdPair[0]}" firstLine
printf '%s\n' "$firstLine"
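Here is a runnable sketch of the same pattern, with bc standing in for interactive_command (an assumption; any line-oriented program behaves alike, and -q is GNU bc's quiet flag). fdPair_PID is the variable bash fills in with the coprocess PID:
coproc fdPair { bc -q; }                  # braces: a name requires a compound command
printf '1+2\n' >&"${fdPair[1]}"           # feed the coprocess's stdin
IFS= read -r firstLine <&"${fdPair[0]}"   # read its first output line
echo "got: $firstLine (pid $fdPair_PID)"  # got: 3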

pass GLOBIGNORE to a bash invocation

The bash manual page states
If the shell is started with the effective user (group) id not equal to
the real user (group) id, [...] the SHELLOPTS, BASHOPTS, CDPATH, and
GLOBIGNORE variables if they appear in the environment, are ignored
So normally this happens.
> export GLOBIGNORE='*t*'
> echo *
afile
> bash -i
>> # look, the variable is passed through
>> $ echo $GLOBIGNORE
*t*
>> # but to no effect
>> $ echo *
afile anotherfile athirdfile
I do not think it would make much sense to fake the real user id to enable passing GLOBIGNORE and a number of other unwanted side-effects.
Is it possible to make the subshell respect an exported GLOBIGNORE?
Some other shell hacks may come to the rescue. All these solutions require at least modifying the shell invocation, but they make the subshell start up ready-prepared.
As shell startup differs between interactive and non-interactive shells, two strategies are needed.
Interactive
When starting an interactive session, bash normally sources the default ~/.bashrc file. There is a switch to change where bash looks for this file. This can be exploited without loss, as long as the replacement file sources the original one.
> echo 'GLOBIGNORE=*t*' > rc
> echo 'source ~/.bashrc' >> rc
> bash --rcfile rc -i
>> echo *
Non-Interactive, Modifiable Command String
As Cyrus already pointed out, one could simply augment the command with the assignment so that it happens inside the subshell to begin with.
> bash -c 'GLOBIGNORE="*t*" ; echo *'
Fully Automated
If modification of the passed commands should be avoided, another special variable can be employed. It is called BASH_ENV and denotes a script to source when starting up a non-interactive session. With this, a strategy similar to --rcfile arises.
> echo 'GLOBIGNORE=*t*' > rc
> BASH_ENV=rc bash -c "echo *"
Or, to be even more sleazy and avoid the temporary file rc, we can force the script in through a pipe. /dev/stdin must be spelled out, since the value - is not treated as standard input.
> echo 'GLOBIGNORE=*t*' | BASH_ENV=/dev/stdin bash -c "echo *"
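A variant that avoids both the temporary file and the stdin trick is to point BASH_ENV at a process substitution. This relies on bash performing process substitution in assignment prefixes, so treat it as a bash-only sketch and test it on your version:
> BASH_ENV=<(echo 'GLOBIGNORE=*t*') bash -c "echo *"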

Escaping last bg job pid ($!) on mac osx shell

I am trying to run a background job and get its PID from the bash command line, such as this:
$ cat &
$ echo $!
These two commands work perfectly, but if I try to inline them into one line I run into problems with bash history expansion conflicting with $!:
$ (cat &); echo $!;
-bash: !: event not found
I have tried various types of quoting around the exclamation mark, but the most I could get was for echo to display the literal string "$!".
Any help will be appreciated!
You are putting the first command in the background with the & delimiter. You can easily capture the PID of the backgrounded process by assigning $! to a variable for later use, e.g. to kill the process (or you can write it to a tmp file):
cat & savpid=$!
# ... do stuff ...
kill "$savpid"   # or kill %1 if no other jobs are running
You also have the option to check the status of your backgrounded process with the jobs command, and to bring it to the foreground with fg. Let us know if you have any further questions.
In order to accomplish what you are attempting, you must redirect stderr and stdout before backgrounding the process:
cat 2>&1 & echo $!
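Note also that $! is per-shell state: in (cat &); echo $!, the PID is recorded inside the ( ) subshell and is gone by the time the parent's echo runs. Grouping with { } stays in the current shell; a minimal sketch:
{ cat & }; echo $!   # { } runs in the current shell, so $! is visible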

When does command substitution spawn more subshells than the same commands in isolation?

Yesterday it was suggested to me that using command substitution in bash causes an unnecessary subshell to be spawned. The advice was specific to this use case:
# Extra subshell spawned
foo=$(command; echo $?)
# No extra subshell
command
foo=$?
As best I can figure this appears to be correct for this use case. However, a quick search trying to verify this leads to reams of confusing and contradictory advice. It seems popular wisdom says ALL usage of command substitution will spawn a subshell. For example:
The command substitution expands to the output of commands. These commands are executed in a subshell, and their stdout data is what the substitution syntax expands to. (source)
This seems simple enough, unless you keep digging, in which case you'll start finding suggestions that this is not the case.
Command substitution does not necessarily invoke a subshell, and in most cases won't. The only thing it guarantees is out-of-order evaluation: it simply evaluates the expressions inside the substitution first, then evaluates the surrounding statement using the results of the substitution. (source)
This seems reasonable, but is it true? This answer to a subshell related question tipped me off that man bash has this to note:
Each command in a pipeline is executed as a separate process (i.e., in a subshell).
This brings me to the main question. What, exactly, will cause command substitution to spawn a subshell that would not have been spawned anyway to execute the same commands in isolation?
Please consider the following cases and explain which ones incur the overhead of an extra subshell:
# Case #1
command1
var=$(command1)
# Case #2
command1 | command2
var=$(command1 | command2)
# Case #3
command1 | command2 ; var=$?
var=$(command1 | command2 ; echo $?)
Do each of these pairs incur the same number of subshells to execute? Is there a difference in POSIX vs. bash implementations? Are there other cases where using command substitution would spawn a subshell where running the same set of commands in isolation would not?
Update and caveat:
This answer has a troubled past in that I confidently claimed things that turned out not to be true. I believe it has value in its current form, but please help me eliminate other inaccuracies (or convince me that it should be deleted altogether).
I've substantially revised - and mostly gutted - this answer after #kojiro pointed out that my testing methods were flawed (I originally used ps to look for child processes, but that's too slow to always detect them); a new testing method is described below.
I originally claimed that not all bash subshells run in their own child process, but that turns out not to be true.
As #kojiro states in his answer, some shells - other than bash - DO sometimes avoid creation of child processes for subshells, so, generally speaking in the world of shells, one should not assume that a subshell implies a child process.
As for the OP's cases in bash (assumes that command{n} instances are simple commands):
# Case #1
command1 # NO subshell
var=$(command1) # 1 subshell (command substitution)
# Case #2
command1 | command2 # 2 subshells (1 for each pipeline segment)
var=$(command1 | command2) # 3 subshells: + 1 for command subst.
# Case #3
command1 | command2 ; var=$? # 2 subshells (due to the pipeline)
var=$(command1 | command2 ; echo $?) # 3 subshells: + 1 for command subst.;
                                     # the extra command doesn't add one
It looks like using command substitution ($(...)) always adds an extra subshell in bash - as does enclosing any command in (...).
I believe, but am not certain, that these results are correct; here's how I tested (bash 3.2.51 on OS X 10.9.1) - please tell me if this approach is flawed:
Made sure only 2 interactive bash shells were running: one to run the commands, the other to monitor.
In the 2nd shell I monitored the fork() calls in the 1st with sudo dtruss -t fork -f -p {pidOfShell1} (the -f is necessary to also trace fork() calls "transitively", i.e. to include those created by subshells themselves).
Used only the builtin : (no-op) in the test commands (to avoid muddling the picture with additional fork() calls for external executables); specifically:
:
$(:)
: | :
$(: | :)
: | :; :
$(: | :; :)
Only counted those dtruss output lines that contained a non-zero PID (as each child process also reports the fork() call that created it, but with PID 0).
Subtracted 1 from the resulting number, as running even just a builtin from an interactive shell apparently involves at least 1 fork().
Finally, assumed that the resulting count represents the number of subshells created.
Below is what I still believe to be correct from my original post: when bash creates subshells.
bash creates subshells in the following situations:
for an expression surrounded by parentheses ( (...) )
except directly inside [[ ... ]], where parentheses are only used for logical grouping.
for every segment of a pipeline (|), including the first one
Note that every subshell involved is a clone of the original shell in terms of content; process-wise, subshells may be forked from other subshells before commands are executed.
Thus, modifications of subshells in earlier pipeline segments do not affect later ones.
(By design, commands in a pipeline are launched simultaneously - sequencing only happens through their connected stdin/stdout pipes.)
bash 4.2+ has the shell option lastpipe (OFF by default), which causes the last pipeline segment NOT to run in a subshell (see the sketch after this list).
for command substitution ($(...))
for process substitution (<(...))
typically creates 2 subshells; in the case of a simple command, #konsolebox came up with a technique to only create 1: prepend the simple command with exec (<(exec ...)).
background execution (&)
Combining these constructs will result in more than one subshell.
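As an aside on lastpipe, here is a minimal sketch; note the option only takes effect when job control is off, as is the case in scripts:
#!/bin/bash
shopt -s lastpipe           # bash 4.2+
echo hello | read -r word   # read now runs in the current shell...
echo "$word"                # ...so the variable survives: prints hello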
In Bash, a subshell always executes in a new process space. You can verify this fairly trivially in Bash 4, which has the $BASHPID and $$ shell variables:
$$ Expands to the process ID of the shell. In a () subshell, it expands to the process ID of the current shell, not the subshell.
BASHPID Expands to the process id of the current bash process. This differs from $$ under certain circumstances, such as subshells that do not require bash to be re-initialized
in practice:
$ type echo
echo is a shell builtin
$ echo $$-$BASHPID
4671-4671
$ ( echo $$-$BASHPID )
4671-4929
$ echo $( echo $$-$BASHPID )
4671-4930
$ echo $$-$BASHPID | { read; echo $REPLY:$$-$BASHPID; }
4671-5086:4671-5087
$ var=$(echo $$-$BASHPID ); echo $var
4671-5006
About the only case where the shell can elide an extra subshell is when you pipe to an explicit subshell:
$ echo $$-$BASHPID | ( read; echo $REPLY:$$-$BASHPID; )
4671-5118:4671-5119
Here, the subshell implied by the pipe is explicitly applied, but not duplicated.
This varies from some other shells, which try very hard to avoid forking. Therefore, while I find the argument made in js-shell-parse misleading, it is true that not all shells always fork for all subshells.

How to change shell to dash from bash

I want to execute some scripts with the dash shell instead of the standard default bash. This is an example (test.sh):
#!/bin/dash
echo $SHELL
echo $0
This execution gives me
/bin/bash
./test.sh
as output. I was expecting '/bin/dash' as output.
If this is wrong, can someone let me know how I can actually run scripts under dash?
Thanks
The SHELL environment variable picks up its value from /etc/passwd. (It denotes the path to the user's preferred command language interpreter.)
This value does not change when you switch shells in your session or your script.
You can validate that you are running dash by adding the command
ps | grep $$
The $$ variable contains the PID of the running shell process.
This one would show the exact command.
ps o command --no-header --pid "$$"
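A more portable equivalent, assuming a POSIX-style ps (works on Linux and macOS):
ps -p "$$" -o comm=   # prints "dash" when the script is run by dash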
