Do subshells started by () read startup files? - bash

Parentheses are used in the shell to group commands, executing them in a subshell so that they don't affect the parent shell's environment.
Now, I wonder whether these spawned subshells read init files, like any other shell.
From direct experience I'd say they don't.
But I can't find any place where that is stated.
Also, is this different for different types of shells?

In general, the behaviour of shells is not particularly well-defined since the only applicable standard was essentially the result of reverse-engineering the behaviour of various commonly-used shells. Nonetheless, there is an expectation that shells will converge to the standard, albeit with extensions.
Having said that, here's what Posix says about (...):
(compound-list)
Execute compound-list in a subshell environment.
And a subshell environment:
A subshell environment shall be created as a duplicate of the shell environment, except that signal traps that are not being ignored shall be set to the default action. Changes made to the subshell environment shall not affect the shell environment. Command substitution, commands that are grouped with parentheses, and asynchronous lists shall be executed in a subshell environment. Additionally, each command of a multi-command pipeline is in a subshell environment; as an extension, however, any or all commands in a pipeline may be executed in the current environment. All other commands shall be executed in the current shell environment.
The take-away here is that the subshell environment is a "duplicate of the shell environment", and not a new shell; the only difference is the specific exception for signal traps. So it is pretty clearly not expected that the subshell will undergo reinitialization, such as rereading startup files.
Posix only provides one requirement for start-up files, which is documented in Section 4 in the description of the sh utility:
ENV
This variable, when and only when an interactive shell is invoked, shall be subjected to parameter expansion by the shell, and the resulting value shall be used as a pathname of a file containing shell commands to execute in the current environment.
Most shells implement a richer set of start-up files with specific names, so that the ENV variable may not be necessary. So the fact that Posix states "when and only when an interactive shell is invoked" is only indicative, but I think it is a good indication.
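A quick sketch to see both points at once (assuming a POSIX sh; /tmp/testenv is just a scratch path made up for this test):
echo 'echo startup file read' > /tmp/testenv
ENV=/tmp/testenv sh -i   # a newly invoked interactive shell reads $ENV once: prints "startup file read"
( : )                    # a subshell inside that shell: prints nothing, $ENV is not re-read
exit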

When a subshell is started, it is just a child process resulting from a fork(), so it inherits everything from its parent and doesn't need to read the config files, whose contents it already knows.
Conversely, when a shell is exec()-uted, it loses everything except its PID, open file descriptors, and exported environment variables, so it has to read the config files again.
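A minimal demonstration of the difference (FOO is just an illustrative variable name):
FOO=bar                             # set, but not exported
( echo "subshell: $FOO" )           # fork() only: prints "subshell: bar"
bash -c 'echo "new shell: $FOO"'    # fork() + exec(): prints "new shell: ", FOO is gone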

How to predictably run shell script in unknown user environment?

Summary
How can I guarantee that my shell scripts will do what I expect, regardless of the environment?
(Let's assume that people have alias'd and function'd everything they can, but that they haven't touched any system binaries, e.g. /bin/ls)
Explanation
I am distributing shell scripts as part of an app. These shell scripts are executed in the user's environment - this cannot be changed.
This means users may have aliases for anything and functions redefining "standard" behavior. There have already been a few cases where normal shell keywords have been redefined (e.g. local), causing unexpected side effects and crashes.
The only tokens that cannot be defined as functions are as follows:
Bash:
! [[ ]] case coproc do done elif else esac fi for function if in select then time until while { }
ZSH:
! [[ case coproc do done elif else end esac fi for foreach function if nocorrect repeat select then time until while { }
I am aware that:
You can escape a word to skip alias lookup
You can use builtin to always run a builtin
You can use command to always run a command
However, builtin and command can be redefined, so \builtin <command> may not always do what I expect.
Aliases are not expanded in bash scripts (unless you explicitly request this), and functions are usually not inherited by child processes. The caller of your script just has to avoid sourcing it. Potential problems remain with environment variables and inherited file handles.
It is difficult to make a script completely self-contained. For instance, I have seen cases where even standard programs (ls, cat, ...) are stored in different locations, which means that if you set up your own PATH and don't know anything about the target platform, you have to apply some heuristics (searching a list of "commonly known directories") and hope that your search is correct.
A more reliable way would be to require the user of the script to provide a certain minimal configuration (typically containing a basic definition of PATH) and pass this configuration as a parameter to your script.
There is one problem pointed out in the comment by Renaud Pacalet: bash allows functions to be exported (using export -f), so in bash you would have to find out which functions exist and explicitly remove their definitions (similarly to what you would do with environment variables). However, I see that you have tagged your question with bash and zsh, and if you don't mind which scripting language you use, writing the script in zsh would perhaps be better, because zsh does not have exported functions.
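A minimal sketch of that cleanup in bash: the attribute field printed by declare -F contains an x for functions exported with export -f, so those can be singled out and removed.
while read -r _ attrs name; do
    # drop any function the caller exported into our environment
    [[ $attrs == *x* ]] && unset -f "$name"
done < <(declare -F)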
One point to keep in mind is that every shell, bash and zsh alike, processes certain files on startup, before the commands in your script have any chance to run. For instance, no matter how you start your zsh, it will always process /etc/zshenv; if your script at some point invokes a zsh child script, that child will again run /etc/zshenv.
Of course, those startup files could set up functions, and in zsh, aliases are (AFAIK) even expanded inside scripts. The strategy would therefore be to initially loop over your environment variables, the currently defined functions, and the currently defined aliases (in zsh), and remove those definitions, as in the sketch below. Then you set up your own definitions (functions, variables).
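A minimal zsh prologue along those lines (a sketch; errors are silenced in case nothing is defined):
unalias -m '*' 2>/dev/null      # remove every alias inherited from startup files
unfunction -m '*' 2>/dev/null   # remove every shell function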

Create a custom `ls` but only for manual use

I'm thinking of writing my own ls command. Mostly as a learning experience, but I also think I can make it a bit more useful (for me) than the default.
I'm worried though that if I alias ls, this will also interfere with any bash/sh scripts that use ls and rely on its output.
Is there a way to override ls, but only when it's not used in scripts (or pipes)?
You're worried that aliasing your version of ls will interfere with other processes.
Let's have a look at the POSIX standard.
From the man page of alias:
Historical versions of the KornShell have allowed aliases to be
exported to scripts that are invoked by the same shell. This is
triggered by the alias −x flag; it is allowed by this volume of
POSIX.1‐2008 only when an explicit extension such as −x is used. The
standard developers considered that aliases were of use primarily to
interactive users and that they should normally not affect shell
scripts called by those users; functions are available to such
scripts.
So, what does "normally" mean for bash? For instance, which version of ls would be used inside a shell script?
From the man page of bash:
Aliases are not expanded when the shell is not interactive, unless the
expand_aliases shell option is set using shopt (see the description of
shopt under SHELL BUILTIN COMMANDS below).
That means, you don't have to worry about shell scripts - they will use the unaliased version of ls.
But what about pipes? Again, we can combine those two man pages for great good:
From the man page of bash:
Each command in a pipeline is executed as a separate process (i.e., in
a subshell).
From the man page of alias:
An alias definition shall affect the current shell execution
environment and the execution environments of the subshells of the
current shell. When used as specified by this volume of POSIX.1‐2008,
the alias definition shall not affect the parent process of the
current shell nor any utility environment invoked by the shell.
That is, while your alias will not be used inside shell scripts, it will be used in pipes.
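A quick interactive check (myls and some-script are hypothetical names):
alias ls='myls'    # your custom version
ls | head -n 1     # the alias is used here, in the pipeline
./some-script      # a plain ls inside the script stays unaliased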

Script runs when executed but fails when sourced

Original Title: Indirect parameter substitution breaks when the script is sourced (zsh)
zsh 5.7.1 (x86_64-apple-darwin19.0)
GNU bash, version 4.4.20(1)-release (x86_64-pc-linux-gnu)
I’m developing a shell script on a Mac and I’m trying to keep it portable between bash & zsh, so array indexing is a consideration. I know that I can set KSH_ARRAYS to get indexing to start at 0, but I decided to query the OS for the shell that’s in use and set the start index accordingly, which led to the issue described below.
It made sense (to me anyway!) to use indirect expansion, which is what led to the problem. Consider the script indirect.sh:
#! /bin/bash

declare -r ARRAY_START_BASH=0
declare -r ARRAY_START_ZSH=1
declare -r SHELL_BASH=0
declare -r SHELL_ZSH=1

# Indirect expansion is used to reference the values of the variables declared
# in this case statement e.g. ${!ARRAY_START}
case $(basename $SHELL) in
    "bash" )
        declare -r SHELL_ID=SHELL_BASH
        declare -r ARRAY_START=ARRAY_START_BASH
        ;;
    "zsh" )
        declare -r SHELL_ID=SHELL_ZSH
        declare -r ARRAY_START=ARRAY_START_ZSH
        ;;
    * )
        return 1
        ;;
esac

echo "Shell ID: ${!SHELL_ID} Index arrays from: ${!ARRAY_START}"
It works fine when run from the command line while in the same directory:
<my home> ~ % echo "$(./indirect.sh)"
Shell ID: 1 Index arrays from: 1
Problems arise when I source the script:
<my home> ~ % echo "$(. ~/indirect.sh)"
/Users/<me>/indirect.sh:28: bad substitution
I don’t understand why sourcing the script changes the behavior of the parameter expansion.
Is this expected behavior? If so, I’d be grateful if someone could explain it and hopefully, offer a work around.
The problem described in the original post has nothing to do with indirect expansion. The difference in behavior is a result of different shells being invoked depending on whether the script is “executed” or “sourced”. These differences reveal the basic flaw in the design choice that underpins the script: deriving the shell from the $SHELL variable. If the shell defined in $SHELL does not match the shebang, the script will fail either when sourced or when executed. An explanation follows.
Indirect expansion doesn’t offer value in the given scenario because the values could just as easily be assigned directly. They’ll have to be assigned that way regardless, given the different syntax used for indirect expansion between shells. In fact, other syntax differences between shells make the entire premise of detecting the shell moot! However, putting that aside, the difference in behavior is a result of different shells being invoked based on whether the script is “executed” or “sourced”. The behavior of sourcing is well documented with numerous explanations on the web, but for context here’s how it works:
Executing a Script
Use the "./" syntax to execute a script. When run this way, the script executes in a sub-shell. Any changes the script makes to its shell are applied to the sub-shell, not the shell in which the script was launched, so those changes are lost when the sub-shell in which it executed is destroyed. For example, if the script changes the working directory, it does so in the sub-shell; the working directory of the main shell that launched the script is unchanged when the script terminates. If you want to make changes to the shell in which the script was launched, it must be sourced.
Sourcing a Script
Use the "source" syntax to source a script. When run this way, the script essentially becomes an argument to the source command, which handles invoking the appropriate execution. Some shells (e.g. ksh) use a single period "." instead of "source".
When a script is executed with the "./" syntax, the shebang at the top of the file is used to determine which shell to use. When a script is sourced, the shebang is ignored and the shell in which the script is launched is used instead. Also note that the period that appears in the "./" syntax used to execute a script is not related to the period that's occasionally used as an alias for the source command.
The script in the post uses bash in the shebang statement, so it works when executed because it’s run using bash. When it’s sourced from zsh, it encounters the incorrect indirect expansion syntax:
"${!A_VAR}"
The correct syntax is:
"${(P)A_VAR}"
However, correcting the syntax won’t help because it will then fail when executed. The shebang will invoke bash and the syntax will be wrong again. That renders indirection useless for accessing a variable designed to indicate the shell in use. More importantly, a design based on querying an environment variable for the shell is flawed due to differences in the shell that’s ultimately used depending on whether the script is executed or sourced.
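If detecting the shell is still wanted, a more robust sketch tests the version variables each shell sets about itself (ZSH_VERSION in zsh, BASH_VERSION in bash); this gives the same answer whether the file is sourced or executed:
if [ -n "${ZSH_VERSION:-}" ]; then
    ARRAY_START=1    # zsh indexes arrays from 1 by default
elif [ -n "${BASH_VERSION:-}" ]; then
    ARRAY_START=0    # bash indexes arrays from 0
fi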
To add to your answer (what I'm going to say is too long for a comment), I cannot think of any application where your script could be useful if not sourced. Actually, I came across the need for such a script myself on exactly one occasion:
Since I use not only zsh but also sometimes bash as my interactive shell, I have written my .zshrc and .bashrc to set up everything (including defining variables and shell functions for interactive use). In order to save work, I try to put code which works under both bash and zsh into a single file (say: .commonrc), and my .zshrc and .bashrc have inside them a
source .commonrc
While many things are so different in bash and zsh that I can't put them into .commonrc, some can, provided I do some tweaking. One reason for headaches is obviously the different indexing of arrays, which you are seemingly trying to solve. I have a similar feature. However, I don't need a case construct for this. Instead, my .bashrc looks like this (using your naming of the variables):
...
declare -r ARRAY_START=0
source .commonrc
...
and my .zshrc looks like this:
...
declare -r ARRAY_START=1
source .commonrc
...
Since it never happens that .bashrc is run from zsh or vice versa, I don't need to query what kind of shell I have.

Are shell functions exported to subprocesses, or not?

According to man bash, shell functions are only exported to subprocesses if they are explicitly exported by using export or declare -x. Also, parentheses and backticks (including $(...)) run in subprocesses. So then why does this work?
#!/bin/bash
function x { echo x; }
x
(x)
echo `x`
echo $(x)
bash -c x
I would expect to see "x" followed by 4 errors. In fact, I see 4 x's, followed by one error. How is this explained?
The bash man page states
When a simple command other than a builtin or shell function is to be executed, it is invoked in a separate execution
environment that consists of the following.
...
shell variables and functions marked for export, along with variables exported for the command, passed in the environment
and also
Command substitution, commands grouped with parentheses, and asynchronous commands are invoked in a subshell
environment that is a duplicate of the shell environment, except that traps caught by the shell are reset to
the values that the shell inherited from its parent at invocation. Builtin commands that are invoked as part
of a pipeline are also executed in a subshell environment. Changes made to the subshell environment cannot
affect the shell's execution environment.
(emphasis mine)
Only the last command bash -c x is executed in a separate execution environment.
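Consequently, explicitly marking the function for export makes even that last line print x (a bash-specific mechanism):
function x { echo x; }
export -f x    # place the function definition into the environment
bash -c x      # the exec'd child now finds x and prints "x"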
Thanks to axiac for posting the manual link. Usually I've found what I need in the man page, but here it's a little deeper. So I learned:
There is a thing called a "shell environment", which extends the usual process environment (environment variables, open files, etc) with defined functions (among other things, like shell variables).
Command substitution, commands grouped with parentheses (and asynchronous commands) inherit a copy of the whole shell environment, except for traps.
The manual is a little confusing, as it also uses the phrases "execution environment" and "subshell environment" alongside "shell environment", but they all seem to refer to the same concept.
Thanks everyone!

When are bash variables exported to subshells and/or accessible by scripts?

I'm confused over whether bash variables are exported to subshells and when they are accessible by scripts. My experience so far led me to believe that bash variables are automatically available to subshells. E.g.:
> FOO=bar
> echo $FOO
bar
> (echo $FOO)
bar
The above appears to demonstrate that bash variables are accessible in subshells.
Given this script:
#! /usr/bin/bash
# c.sh

func()
{
    echo before
    echo ${FOO}
    echo after
}

func
I understand that calling the script in the current shell context gives it access to the current shell's variables:
> . ./c.sh
before
bar
after
If I were to call the script without the "dot space" prefix...
> ./c.sh
before
after
...isn't it the case that the script is called in a subshell? If so, and it's also true that the current shell's variables are available to subshells (as I inferred from the first code block), why is $FOO not available to c.sh when run this way?
Similarly, why is $FOO also unavailable when c.sh is run within parentheses - which I understood to mean running the expression in a subshell:
> (./c.sh)
before
after
(If this doesn't muddy this post with too many questions: if "./c.sh" and "(./c.sh)" both run the script in a subshell of the current shell, what's the difference between the two ways of calling?)
(...) runs ... in a separate environment, something most easily achieved (and implemented in bash, dash, and most other POSIX-y shells) using a subshell -- which is to say, a child created by fork()ing the old shell, but not calling any execv-family function. Thus, the entire in-memory state of the parent is duplicated, including non-exported shell variables. And for a subshell, this is precisely what you typically want: just a copy of the parent shell's process image, not replaced with a new executable image and thus keeping all its state in place.
Consider (. shell-library.bash; function-from-that-library "$preexisting_non_exported_variable") as an example: because of the parens it fork()s a subshell, but it then sources the contents of shell-library.bash directly inside that shell, without replacing the shell interpreter created by that fork() with a separate executable. This means that function-from-that-library can see non-exported functions and variables from the parent shell (which it couldn't if it were execve()'d), and it is a bit faster to start up (since it doesn't need to link, load, and otherwise initialize a new shell interpreter, as happens during execve()). At the same time, changes it makes to in-memory state, shell configuration, and process attributes like the working directory won't modify the parent interpreter that called it, so the parent shell is protected from having configuration changes made by the library that could modify its later operation.
./other-script, by contrast, runs other-script as a completely separate executable; it does not retain non-exported variables after the child shell (which is not a subshell!) has been invoked. This works as follows:
The shell calls fork() to create a child. At this point in time, the child still has even non-exported variable state copied.
The child honors any redirections (if it was ./other-script >>log.out, the child would open("log.out", O_APPEND) and then dup2() the descriptor over to file descriptor 1, overwriting stdout).
The child calls execv("./other-script", {"./other-script", NULL}), instructing the operating system to replace it with a new instance of other-script. After this call succeeds, the process running under the child's PID is an entirely new program, and only exported variables survive.
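Applied to the c.sh script from the question:
FOO=bar       # not exported: lost at the execv() step
./c.sh        # prints an empty line between "before" and "after"
export FOO    # marked for export: it survives execv()
./c.sh        # now prints "bar" between "before" and "after"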