Are shell functions exported to subprocesses, or not? - bash

According to man bash, shell functions are only exported to subprocesses if they are explicitly exported with export -f (or declare -fx). Also, parentheses and backticks (including $(...)) run in subprocesses. So then why does this work?
#!/bin/bash
function x { echo x; }
x
(x)
echo `x`
echo $(x)
bash -c x
I would expect to see "x" followed by four errors. In fact, I see four x's followed by one error. How is this explained?

The bash man page states
When a simple command **other than a builtin or shell function** is to be executed, it is invoked in a separate execution
environment that consists of the following.
...
shell variables and functions marked for export, along with variables exported for the command, passed in the environment
and also
Command substitution, commands grouped with parentheses, and asynchronous commands are invoked in a subshell
environment **that is a duplicate of the shell environment**, except that traps caught by the shell are reset to
the values that the shell inherited from its parent at invocation. Builtin commands that are invoked as part
of a pipeline are also executed in a subshell environment. Changes made to the subshell environment cannot
affect the shell's execution environment.
(emphasis mine)
Only the last command bash -c x is executed in a separate execution environment.
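As a quick check (a minimal sketch, not part of the original question), explicitly exporting the function makes even the separate execution environment case work:
#!/bin/bash
function x { echo x; }
export -f x   # mark the function for export to child processes
bash -c x     # now prints "x" instead of failing with an error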

Thanks to axiac for posting the manual link. Usually I've found what I need in the man page, but here it's a little deeper. So I learned:
There is a thing called a "shell environment", which extends the usual process environment (environment variables, open files, etc) with defined functions (among other things, like shell variables).
Command substitution, commands grouped with parentheses (and asynchronous commands) inherit a copy of the whole shell environment, except for traps.
The manual is a little confusing, as it also uses the phrases "execution environment" and "subshell environment", but they all seem to refer to the same concept.
Thanks everyone!

Related

Script runs when executed but fails when sourced

Original Title: Indirect parameter substitution breaks when the script is sourced (zsh)
zsh 5.7.1 (x86_64-apple-darwin19.0)
GNU bash, version 4.4.20(1)-release (x86_64-pc-linux-gnu)
I’m developing a shell script on a Mac and I’m trying to keep it portable between bash & zsh, so array indexing is a consideration. I know that I can set KSH_ARRAYS to get indexing to start at 0, but I decided to query the OS for the shell that’s in use and set the start index accordingly, which led to the issue described below.
It made sense (to me anyway!) to use indirect expansion, which is what led to the problem. Consider the script indirect.sh:
#! /bin/bash
declare -r ARRAY_START_BASH=0
declare -r ARRAY_START_ZSH=1
declare -r SHELL_BASH=0
declare -r SHELL_ZSH=1
# Indirect expansion is used to reference the values of the variables declared
# in this case statement e.g. ${!ARRAY_START}
case $(basename $SHELL) in
"bash" )
declare -r SHELL_ID=SHELL_BASH
declare -r ARRAY_START=ARRAY_START_BASH
;;
"zsh" )
declare -r SHELL_ID=SHELL_ZSH
declare -r ARRAY_START=ARRAY_START_ZSH
;;
* )
return 1
;;
esac
echo "Shell ID: ${!SHELL_ID} Index arrays from: ${!ARRAY_START}"
It works fine when run from the command line while in the same directory:
<my home> ~ % echo "$(./indirect.sh)"
Shell ID: 1 Index arrays from: 1
Problems arise when I source the script:
<my home> ~ % echo "$(. ~/indirect.sh)"
/Users/<me>/indirect.sh:28: bad substitution
I don’t understand why sourcing the script changes the behavior of the parameter expansion.
Is this expected behavior? If so, I’d be grateful if someone could explain it and hopefully, offer a work around.
The problem described in the original post has nothing to do with indirect expansion. The difference in behavior is a result of different shells being invoked depending on whether the script is "executed" or "sourced". That difference reveals the basic flaw underpinning the script's design: deriving the shell from the $SHELL variable. If the shell defined in $SHELL does not match the shebang, the script will fail either when sourced or when executed. An explanation follows.
Indirect expansion doesn't offer value in the given scenario, because the values could just as easily be assigned directly. They'll have to be assigned that way regardless, given that the two shells use different syntax for indirect expansion. In fact, other syntax differences between the shells make the entire premise of detecting the shell moot! Putting that aside, the behavior of sourcing is well documented, with numerous explanations on the web, but for context here's how it works:
Executing a Script
Use the “./“ syntax to execute a script.
When run this way, the script executes in a sub-shell. Any changes the
script makes to its shell are applied to that sub-shell, not the shell
in which the script was launched, so those changes are lost when the
script terminates and the sub-shell is destroyed.
For example, if the script changes the working directory, it
does so in the sub-shell. The working directory of the main shell that
launched the script is unchanged when the script terminates. If you
want to make changes to the shell in which the script was launched, it
must be sourced.
Sourcing a Script
Use the "source" syntax to source a
script. When run this way, the script essentially becomes an argument
for the source command, which runs it in the current shell. Some
shells (e.g. ksh) use a single period "." instead of "source".
When a script is executed with the "./" syntax, the shebang at the top of the file is used to determine which shell to use. When a script is sourced, the shebang is ignored and the shell from which the script is launched is used instead. Also note that the period in the "./" execution syntax is not related to the period that's occasionally used as an alias for the source command.
The script in the post uses bash in the shebang statement, so it works when executed because it’s run using bash. When it’s sourced from zsh, it encounters the incorrect indirect expansion syntax:
"${!A_VAR}"
The correct syntax is:
"${(P)A_VAR}"
However, correcting the syntax won’t help because it will then fail when executed. The shebang will invoke bash and the syntax will be wrong again. That renders indirection useless for accessing a variable designed to indicate the shell in use. More importantly, a design based on querying an environment variable for the shell is flawed due to differences in the shell that’s ultimately used depending on whether the script is executed or sourced.
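If runtime detection is still wanted, a common alternative (a sketch, not taken from the original answer) is to test the shell-specific version variables, which behave the same whether the script is executed or sourced:
# Works in both bash and zsh, executed or sourced:
if [ -n "${ZSH_VERSION:-}" ]; then
    ARRAY_START=1    # zsh indexes arrays from 1 by default
elif [ -n "${BASH_VERSION:-}" ]; then
    ARRAY_START=0    # bash indexes arrays from 0
fi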
To add to your answer (what I'm going to say is too long for a comment): I cannot think of any application where your script would be useful if not sourced. Actually, I came across the need for such a script myself on exactly one occasion:
Since I use not only zsh but sometimes also bash as my interactive shell, I have written my .zshrc and .bashrc to set up everything (including defining variables and shell functions for interactive use). In order to save work,
I try to put code which works under both bash and zsh into a single file (say: .commonrc), and my .zshrc and .bashrc have inside them a
source .commonrc
While many things are so different in bash and zsh that I can't put them into .commonrc, some can be, provided I do some tweaking. One source of headache is obviously the different indexing of arrays, which you are seemingly trying to solve. So I have a similar feature too. However, I don't need a case construct for it. Instead, my .bashrc looks like this (using your naming of the variables):
...
declare -r ARRAY_START=0
source .commonrc
...
and my .zshrc looks like this:
...
declare -r ARRAY_START=1
source .commonrc
...
Since .bashrc is never run from zsh and vice versa, I don't need to query what kind of shell I have.

Access local shell variables in vim

In vim I can access my bash environment variables such as $PWD and $PATH. I would like to know how to access my temporary shell variables in vim too.
For example, suppose I was in my terminal and define a variable foo="bar". Then I enter vim and try to access this variable with the following command :!echo $foo, but it does not recognize this variable. From my understanding, vim starts a new shell each time a bash command is invoked and then closes it immediately after. Is there a way to use the same shell in vim that my local variable foo was defined in?
No, you can't interact with the parent shell from a subprocess it spawned (without that shell's active participation, which isn't reasonably/practically available in the scenario at hand) -- but you can export your variables to make them accessible to new shells started in child processes.
Running
set -a
...will make any variable defined going forward be automatically exported to the environment, even without an explicit export command.
Since (unlike the C system() function) vim's system() honors the SHELL environment variable, if SHELL=/bin/bash (or :set shell=/bin/bash has been run in vim), you can also invoke exported functions from vim. That is, if you define the function and export it as follows:
foo() { echo "bar"; }
export -f foo
...then you can invoke it with !foo from inside vim.
Even then, however, this is running in a new, transient shell instance, not the original parent process.
Explanation
Environment variables and shell variables are two entirely different concepts, but as we manipulate them in a similar way in bash, it's easy to get confused.
Whenever a process is created (by fork), it may include an environment, given by its parent at fork time. The child process may then access and modify its content. How this is done as a user depends on the program:
In vim, you can access an environment variable like this: :echo $foo
In bash, you can access it like this: $ echo "$foo"
In most programming languages, you can access it with a syntax coherent with the rest of the language, such as ENV['foo'] in Ruby
On the other hand, a program may allocate memory for any internal use; notably, it will quite often define and use variables. Once again, this depends on the program:
In vim, you would use the :let command to assign an internal variable
In bash, you would assign a variable with $ foo='bar', and then read it with $ echo "$foo"
In most programming languages, you have a variation of the foo='bar' syntax, sometimes with type declarations, etc.
As you can see, bash uses the same syntax to read an environment variable and one of its own private variables, which can lead to some confusion.
When you execute vim from your bash shell, the environment is copied over from the parent process (bash) to the child (vim), but the private memory of bash (including the variables you may have defined) is not.
Thus, accessing them from the child process would require some inter-process communication mechanism between parent and child. While technically doable, this option is not implemented in either bash or vim.
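A quick sketch of the difference, as seen from a terminal:
foo='bar'                          # private shell variable
bash -c 'echo "${foo:-unset}"'     # child process: prints "unset"
export foo                         # copy it into the environment
bash -c 'echo "${foo:-unset}"'     # child process: prints "bar"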
Solution
In order for your variable to be accessible from vim (or any forked process, for that matter), you need it to be present in the environment of your vim process.
Several options to do that (each illustrated in the sketch after the list):
$ export foo='bar': This will mark your variable for export to the environment of subsequently executed commands. That's what you want in most cases.
$ foo='bar' vim: This adds your variable to the environment of this vim command. Very useful for troubleshooting, or for one-liners.
$ set -a: As you can see in the bash manpage, this marks every subsequent definition for export to the environment of subsequent commands. It's essentially equivalent to prefixing every subsequent definition with export.
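As a sketch, each option applied to the vim case from the question (after any of these, :!echo $foo inside vim prints bar):
export foo='bar'; vim     # option 1: foo visible to all later children
foo='bar' vim             # option 2: foo visible to this vim only
set -a; foo='bar'; vim    # option 3: everything defined after set -a is exported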
To go further
The question uses the :!echo $foo syntax to display the value of foo, which is yet another use case. The :! here is a vim command that lets you execute a shell command from within vim.
However, vim cannot execute anything in the parent shell (the one you executed the vim command in), so it creates a new bash shell in a child process, executes echo in it, and displays the result.
In the current case, the result is mostly the same, but it could easily be misleading in other situations, so it's important to understand what is happening here.
There is another vim syntax, using expand, that allows one to look up variables: :echo expand("$foo")
It works entirely differently, however.
If no internal variable named foo exists, vim will invoke a shell to look it up (similarly to what ! would do).
This option is way slower than an environment lookup, and is not recommended for most use cases.
If you want to use a value from your shell in the :substitute command, there's actually a way to do it.
I don't know if it solves your need, but here we go.
Let's say we want to substitute Mydir with your PWD:
:s/Mydir/\=expand($PWD)/g

Why can you set environment variables in Bash functions but not in the script itself

Why does this work:
# a.sh
setEnv() {
    export TEST_A='Set'
}
when this doesn't:
# b.sh
export TEST_B='Set'
Ex:
> source a.sh
> setEnv
> env | grep TEST_A
TEST_A=Set
> b.sh
> env | grep TEST_B
I understand why running the script doesn't work and what to do to make it work (source b.sh etc.), but I'm curious as to why the function works.
This is on OS X if that matters.
You need to understand the difference between sourcing and executing a script.
Sourcing runs the script in the parent shell in which it is invoked; all the environment variables are retained until the parent shell terminates (the terminal is closed, or the variables are reset or unset), whereas
executing forks a new shell from the parent shell, and the variables, including your export-ed ones, are retained only in that sub-shell's environment and are destroyed when the script terminates.
In other words, in the first case no separate child environment is allocated: the variables are simply added to the parent's environment (imagine an extra memory cell, maintained by the parent), which is held as long as the session is open. Executing a script, by a simple analogy, is like calling a function whose variables are stored on the stack and lose scope at the end of the function call; likewise, the forked shell's environment loses scope when it terminates.
So it comes down to this: even if you have a function that exports your variable, if you don't source it into the current shell and just plainly execute it, the variable is not retained; i.e. with
# a.sh
setEnv() {
    export TEST_A='Set'
}
and if you run it in the shell as
bash a.sh # as opposed to: source a.sh
env | grep TEST_A
# empty
Executing a function does not, in and of itself, start a new process like b.sh does.
From the man page (emphasis on the last sentence):
FUNCTIONS
A shell function, defined as described above under SHELL GRAMMAR,
stores a series of commands for later execution. When the name of a
shell function is used as a simple command name, the list of commands
associated with that function name is executed. **Functions are executed
in the context of the current shell; no new process is created to
interpret them (contrast this with the execution of a shell script).**
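A quick way to observe this (a sketch using bash's BASHPID, which, unlike $$, reports the PID of the current subshell):
f() { echo "function: $BASHPID"; }
echo "shell:    $BASHPID"        # the current shell's PID
f                                # same PID: no new process for the function
( echo "subshell: $BASHPID" )    # different PID: ( ) forks a subshell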
I understand why running the script doesn't work and what to do to make it work (source b.sh etc)
So you already understand the fact that executing b.sh directly -- in a child process, whose changes to the environment fundamentally won't be visible to the current process (shell) -- will not define TEST_B in the current (shell) process, so we can take this scenario out of the picture.
I'm curious why the function works.
When you source a script, you execute it in the context of the current shell - loosely speaking, it is as if you had typed the contents of the script directly at the prompt: any changes to the environment, including shell-specific elements such as shell variables, aliases, functions, become visible to the current shell.
Therefore, after executing source a.sh, function setEnv is now available in the current shell, and invoking it executes export TEST_A='Set', which defines environment variable TEST_A in the current shell (and subsequently created child processes would see it).
Perhaps your misconception is around what chepner's helpful answer addresses: in POSIX-like shells, functions run in the current shell - in contrast with scripts (when run without source), for which a child process is created.
This is on OS X if that matters.
Not in this case, because only functionality built into bash itself is used.

Access variable declared inside Makefile command

I'm trying to access a variable declared by previous command (inside a Makefile).
Here's the Makefile:
all:
	./script1.sh
	./script2.sh
Here's the script declaring the variable I want to access, script1.sh:
#!/usr/bin/env bash
myVar=1234
Here's the script trying to access the variable previously defined, script2.sh:
#!/usr/bin/env bash
echo $myVar
Unfortunately, when I run make, myVar isn't accessible. Is there another way around this? Thanks.
Make will run each shell command in its own shell. And when the shell exits, its environment is lost.
If you want variables from one script to be available in the next, there are constructs which will do this. For example:
all:
	( . ./script1.sh; ./script2.sh )
This causes Make to launch a single shell to handle both scripts.
Note also that you will need to export the variable in order for it to be visible in the second script; unexported variables are available only to the shell that defines them, and not to the child processes it launches.
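For instance, a sketch of the adjusted script1.sh (the export is the only change needed):
#!/usr/bin/env bash
export myVar=1234    # exported, so child processes such as script2.sh can see it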
UPDATE (per Kusalananda's comment):
If you want your shell commands to populate MAKE variables instead of merely environment variables, you may have options that depend on the version of Make that you are running. For example, in BSD make and GNU make, you can use "variable assignment modifiers" including (from the BSD make man page):
!= Expand the value and pass it to the shell for execution and
assign the result to the variable. Any newlines in the result
are replaced with spaces.
Thus, with BSD make and GNU make, you could do this:
$ cat Makefile
foo!= . ./script1.sh; ./script2.sh
all:
#echo "foo=${foo}"
$
$ cat script1.sh
export test=bar
$
$ cat script2.sh
#!/usr/bin/env bash
echo "$test"
$
$ make
foo=bar
$
Note that script1.sh does not include any shebang because it's being sourced, and is therefore running in the calling shell, whatever that is. That makes the shebang line merely a comment. If you're on a system where the default shell is POSIX but not bash (like Ubuntu, Solaris, FreeBSD, etc), this should still work because POSIX shells should all understand the concept of exporting variables.
The two separate invocations of the scripts create two separate environments. The first script sets a variable in its environment and exits (the environment is lost). The second script does not have that variable in its environment, so it outputs an empty string.
You cannot pass environment variables between environments, other than from a parent shell's environment to that of its child shell (not the other way around). The variables passed over into the child shell are only those that the parent shell has export-ed. So, if the first script invoked the second script, the value would be output (if it was export-ed in the first script).
In a shell, you would source the first file to set the variables therein in the current environment (and then export them!). However, in Makefiles it's a bit trickier since there's no convenient source command.
Instead you may want to read this StackOverflow question.
EDIT in light of @ghoti's answer: @ghoti has a good solution, but I'll leave my answer in here as it explains a bit more verbosely what environment variables are and what we can and cannot do with them with regards to passing them between environments.

Do subshells started by () read startup files?

Parentheses are used in shell to group commands, executing them in a subshell, so that they don't affect the parent shell environment.
Now, I wonder whether these spawned subshells read init files, like any other shell.
From direct experience I'd say they don't.
But I don't find any place where that is stated.
Also, is this different for different types of shells?
In general, the behaviour of shells is not particularly well-defined since the only applicable standard was essentially the result of reverse-engineering the behaviour of various commonly-used shells. Nonetheless, there is an expectation that shells will converge to the standard, albeit with extensions.
Having said that, here's what Posix says about (...):
(compound-list)
Execute compound-list in a subshell environment.
And a subshell environment:
A subshell environment shall be created as a duplicate of the shell environment, except that signal traps that are not being ignored shall be set to the default action. Changes made to the subshell environment shall not affect the shell environment. Command substitution, commands that are grouped with parentheses, and asynchronous lists shall be executed in a subshell environment. Additionally, each command of a multi-command pipeline is in a subshell environment; as an extension, however, any or all commands in a pipeline may be executed in the current environment. All other commands shall be executed in the current shell environment.
The take-away here is that the subshell environment is a "duplicate of the shell environment", and not a new shell; the only difference is the specific exception for signal traps. So it is pretty clearly not expected that the subshell will undergo reinitialization, such as rereading startup files.
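This is easy to observe (a sketch): unexported state survives in a ( ) subshell, but not in a freshly started shell:
unexported='still here'                # not exported, so not in the environment
( echo "${unexported:-gone}" )         # subshell (a duplicate): prints "still here"
bash -c 'echo "${unexported:-gone}"'   # brand-new shell: prints "gone"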
Posix only provides one requirement for start-up files, which is documented in Section 4 in the description of the sh utility:
ENV
This variable, when and only when an interactive shell is invoked, shall be subjected to parameter expansion by the shell, and the resulting value shall be used as a pathname of a file containing shell commands to execute in the current environment.
Most shells implement a richer set of start-up files with specific names, so that the ENV variable may not be necessary. So the fact that Posix states "when and only when an interactive shell is invoked" is only indicative, but I think it is a good indication.
When a subshell is started, it is just a child resulting from a fork(), so it inherits everything from its parent and doesn't need to read the config files, whose contents it already knows.
Conversely, when a shell is exec()-uted, it loses everything except its PID and redirections, so it has to read the config files again.
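A sketch of the contrast from an interactive shell:
echo $$        # note the current PID
( echo $$ )    # subshell: same PID reported, nothing re-read
exec bash      # replaces this shell in place (same PID); the new
               # interactive bash reads its startup files again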
