Does a Shell function run in a sub-shell? - bash

I'm trying to get around what looks like a limitation: it seems you cannot pass an open db2 connection to a sub-shell.
My code organization is as follows:
Driver script (in my_driver.sh)
# foo.sh defines baz() bar(), which use a db2 connection
# Also the "$param_file" is set in foo.sh!
source foo.sh
db2 "connect to $dbName USER $dbUser using $dbPass"
function doit
{
cat $param_file | while read params
do
baz $params
bar $params
done
}
doit
I've simplified my code, but the above is enough to give the idea. I start it with:
my_driver.sh
Now, my real issue is that the db2 connection is not available in the sub-shell:
I tried:
. my_driver.sh
Does not help
If I do it manually from the command line:
source foo.sh
And I set $params manually:
baz $params
bar $params
Then it does work! So it seems that doit or something else acts as if bar and baz are executed from a sub-shell.
Figuring out how to pass the open db2 connection to a sub-shell would be best; I'd be elated if that's possible.
Otherwise, it seems to me that these shell functions run in a sub-shell. Is there a way around that?

The shell does not create a subshell to run a function.
Of course, it does create subshells for many other purposes, not all of which might be obvious. For example, it creates subshells in the implementation of |.
db2 requires that all db2 commands have the same parent as the db2 command which established the connection. You could log the PID using something like:
echo "Execute db2 from PID $$" >> /dev/stderr
db2 ...
(as long as the db2 command isn't executed inside a pipe or shell parentheses).
One possible problem in the code shown (which would have quite a different symptom) is the use of the non-standard syntax
function f
to define a function. A standard shell expects
f()
Bash understands both, but if you don't have a shebang line, or you execute the script file with the sh command, you will end up using the system's default shell, which might not be bash.
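As a minimal side-by-side sketch, reusing the question's function names (the bodies here are just placeholders):
# Portable (POSIX) form: understood by sh, dash, ksh, zsh and bash
baz() {
    echo "baz called with: $*"
}
# bash/ksh extension: fine under bash, but a strict POSIX sh may reject it
function bar {
    echo "bar called with: $*"
}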

Found a solution, but I can't yet fully explain the problem...
If you change doit as follows, it works:
function doit
{
while read params
do
baz $params
bar $params
done < $param_file
}
Only, I'm not sure why, or how I can prove it...
If I stick in debug code:
echo debug check with PID=$$ PPID=$PPID and SHLVL=$SHLVL
I get back the same results with or without the |. I do understand that cat $param_file | while read params creates a subshell; however, my debug statements always show the same PID and PPID...
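If I add $BASHPID to the debug line (assuming bash 4 or newer, which exposes the subshell's real PID there), the subshell does become visible, while $$ stays the same:
echo "outside the pipe: PID=$$ BASHPID=$BASHPID"
cat $param_file | while read params
do
    echo "inside the pipe: PID=$$ BASHPID=$BASHPID"   # BASHPID differs here, $$ does not
done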
So my problem is solved, but I'm still missing part of the explanation.
I also wonder whether this question would be better suited to the unix.stackexchange community.

A shell function in shells such as sh (e.g. dash) or Bash may be considered a labelled command group, or a named "code block", which can be called multiple times by its name. A command group surrounded by {} does not create a subshell or "fork" a new process; it executes in the same process and environment.
Some might find it loosely similar to goto, with function names playing the role of labels, as in other programming languages including C, BASIC, or assembler. However, the statements differ considerably (e.g. a function returns, whereas goto does not), and the Go To Statement may be Considered Harmful.
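A quick way to see that functions and {} groups share the calling shell's environment, while (...) does not (a throwaway example; the variable name is arbitrary):
x=1
f() { x=2; }      # the function body runs in the current shell
f
echo $x           # prints 2
( x=3 )           # (...) runs in a subshell
echo $x           # still prints 2: the subshell's change is lost
{ x=4; }          # {...} is a group in the same shell
echo $x           # prints 4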
Shell Functions
Shell functions are a way to group commands for later
execution using a single name for the group. They are executed just
like a "regular" command. When the name of a shell function is used as
a simple command name, the list of commands associated with that
function name is executed. Shell functions are executed in the current
shell context; no new process is created to interpret them.
Functions are declared using this syntax:
fname () compound-command [ redirections ]
or
function fname [()] compound-command [ redirections ]
This defines a shell function named fname. The reserved word function is optional. If the function reserved word is supplied, the parentheses are optional.
Source: https://www.gnu.org/savannah-checkouts/gnu/bash/manual/bash.html or man bash.
Grouping Commands Together
Commands may be grouped by writing either
(list)
or
{ list; }
The first of these executes the commands in a subshell. Builtin commands grouped into a (list) will not affect the current shell. The
second form does not fork another shell so is slightly more efficient.
Grouping commands together this way allows you to redirect their
output as though they were one program:
{ printf " hello " ; printf " world\n" ; } > greeting
Note that "}" must follow a control operator (here, ";") so that it is recognized as a reserved word and not as another command
argument.
Functions
The syntax of a function definition is
name () command
A function definition is an executable statement; when executed it installs a function named name and returns an exit status of zero. The command is normally a list enclosed between "{" and "}".
Source: https://linux.die.net/man/1/dash or man sh.
Transfers control unconditionally.
Used when it is otherwise impossible to transfer control to the desired location using other statements... The goto statement transfers control to the location specified by label. The goto statement must be in the same function as the label it is referring, it may appear before or after the label.
Source: https://en.cppreference.com/w/cpp/language/goto
Goto
... It performs a one-way transfer of control to another line of code; in
contrast a function call normally returns control. The jumped-to
locations are usually identified using labels, though some languages
use line numbers. At the machine code level, a goto is a form of
branch or jump statement, in some cases combined with a stack
adjustment. Many languages support the goto statement, and many do not...
Source: https://en.wikipedia.org/wiki/Goto
Related:
https://mywiki.wooledge.org/BashProgramming#Functions
https://uomresearchit.github.io/shell-programming-course/04-subshells_and_functions/index.html (Subshells and Functions...)
Is there a "goto" statement in bash?
What's the difference between "call" and "invoke"?
https://en.wikipedia.org/wiki/Call_stack
https://mywiki.wooledge.org/BashPitfalls

Related

Script runs when executed but fails when sourced

Original Title: Indirect parameter substitution breaks when the script is sourced (zsh)
zsh 5.7.1 (x86_64-apple-darwin19.0)
GNU bash, version 4.4.20(1)-release (x86_64-pc-linux-gnu)
I’m developing a shell script on a Mac and I’m trying to keep it portable between bash & zsh, so array indexing is a consideration. I know that I can set KSH_ARRAYS to get indexing to start at 0, but I decided to query the OS for the shell that’s in use and set the start index accordingly, which led to the issue described below.
It made sense (to me anyway!) to use indirect expansion, which is what led to the problem. Consider the script indirect.sh:
#! /bin/bash
declare -r ARRAY_START_BASH=0
declare -r ARRAY_START_ZSH=1
declare -r SHELL_BASH=0
declare -r SHELL_ZSH=1
# Indirect expansion is used to reference the values of the variables declared
# in this case statement e.g. ${!ARRAY_START}
case $(basename $SHELL) in
"bash" )
declare -r SHELL_ID=SHELL_BASH
declare -r ARRAY_START=ARRAY_START_BASH
;;
"zsh" )
declare -r SHELL_ID=SHELL_ZSH
declare -r ARRAY_START=ARRAY_START_ZSH
;;
* )
return 1
;;
esac
echo "Shell ID: ${!SHELL_ID} Index arrays from: ${!ARRAY_START}"
It works fine when run from the command line while in the same directory:
<my home> ~ % echo "$(./indirect.sh)"
Shell ID: 1 Index arrays from: 1
Problems arise when I source the script:
<my home> ~ % echo "$(. ~/indirect.sh)"
/Users/<me>/indirect.sh:28: bad substitution
I don’t understand why sourcing the script changes the behavior of the parameter expansion.
Is this expected behavior? If so, I’d be grateful if someone could explain it and hopefully, offer a work around.
The problem described in the original post has nothing to do with indirect expansion. The difference in behavior is a result of different shells being invoked depending on whether the script is “executed” or “sourced”. These differences reveal the basic flaw in deriving the shell from the $SHELL variable that underpins the script's design. If the shell defined in $SHELL does not match the shebang, the script will fail either when sourced or executed. An explanation follows.
Indirect expansion doesn't offer value in the given scenario, because the values could just as easily be assigned directly. They'll have to be assigned that way regardless, given the different indirect-expansion syntax used by each shell. In fact, other syntax differences between shells make the entire premise of detecting the shell moot! However, putting that aside, the difference in behavior is a result of different shells being invoked depending on whether the script is “executed” or “sourced”. The behavior of sourcing is well documented, with numerous explanations on the web, but for context here's how it works:
Executing a Script
Use the "./" syntax to execute a script.
When run this way, the script executes in a sub-shell. Any changes the script makes to its shell are applied to the sub-shell, not the shell in which the script was launched, so those changes are lost when the script finishes, because the sub-shell in which it executed is destroyed as well. For example, if the script changes the working directory, it does so in the sub-shell. The working directory of the main shell that launched the script is unchanged when the script terminates. If you want to make changes to the shell in which the script was launched, it must be sourced.
Sourcing a Script
Use the "source" syntax to source a script. When run this way, the script essentially becomes an argument for the source command, which handles invoking the appropriate execution. Some shells (e.g. ksh) use a single period "." instead of "source".
When a script is executed with the "./" syntax, the shebang at the top of the file is used to determine which shell to use. When a script is sourced, the shebang is ignored and the shell in which the script is launched is used instead. Also note that the period in the "./" syntax used to execute a script is not related to the period that's occasionally used as an alias for the source command.
The script in the post uses bash in the shebang statement, so it works when executed because it’s run using bash. When it’s sourced from zsh, it encounters the incorrect indirect expansion syntax:
"${!A_VAR}"
The correct syntax is:
"${(P)A_VAR}"
However, correcting the syntax won’t help because it will then fail when executed. The shebang will invoke bash and the syntax will be wrong again. That renders indirection useless for accessing a variable designed to indicate the shell in use. More importantly, a design based on querying an environment variable for the shell is flawed due to differences in the shell that’s ultimately used depending on whether the script is executed or sourced.
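One way to watch this happen (whichshell.sh is a made-up name, and the ps invocation is assumed to behave the same on macOS and Linux):
#!/bin/bash
# whichshell.sh (hypothetical): print the name of the shell interpreting this file
echo "running under: $(ps -p $$ -o comm=)"
Executed as ./whichshell.sh it reports bash, because the shebang is honored; sourced from an interactive zsh with . ./whichshell.sh it reports zsh, because the shebang is ignored.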
To add to your answer (what I'm going to say is too long for a comment): I cannot think of any application where your script would be useful if not sourced. Actually, I came across the need for such a script myself on exactly one occasion:
Since I sometimes use bash as my interactive shell in addition to zsh, I have written my .zshrc and .bashrc to set up everything (including defining variables and shell functions for interactive use). In order to save work,
I try to put code which works under both bash and zsh into a single file (say .commonrc), and my .zshrc and .bashrc both contain a
source .commonrc
While many things are so different between bash and zsh that I can't put them into .commonrc, some can go there, provided I do some tweaking. One source of headaches is obviously the different indexing of arrays, which you are seemingly trying to solve, so I have a similar feature. However, I don't need a case construct for this. Instead, my .bashrc looks like this (using your variable names):
...
declare -r ARRAY_START=0
source .commonrc
...
and my .zshrc looks like this:
...
declare -r ARRAY_START=1
source .commonrc
...
Since .bashrc never runs under zsh and vice versa, I don't need to query which shell I'm in.
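For illustration, the shared file might then look something like this (the colors array and the echo are invented; only ARRAY_START comes from the post):
# .commonrc -- expects ARRAY_START to have been set by .bashrc or .zshrc
colors=(red green blue)
echo "first color: ${colors[$ARRAY_START]}"   # element 0 in bash, element 1 in zsh -- same item either way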

Purpose of #!/bin/false in bash script

While working on a project written in bash by my former colleague, I noticed that all the .sh files containing nothing but function definitions start with #!/bin/false, which, as I understand it, is a safety mechanism to prevent execution of include-only files.
Example:
my_foo.sh
#!/bin/false
function foo(){
echo foontastic
}
my_script.sh
#!/bin/bash
./my_foo.sh # does nothing
foo # error, no command named "foo"
. ./my_foo.sh
foo # prints "foontastic"
However, when I don't use #!/bin/false, the effects of both proper and improper use are exactly the same:
Example:
my_bar.sh
function bar(){
echo barvelous
}
my_script.sh
#!/bin/bash
./my_bar.sh # spawns a subshell, defines bar and exits, effectively doing nothing
bar # error, no command named "bar"
. ./my_bar.sh
bar # prints "barvelous"
Since using those scripts properly, by including them with source, works as expected in both cases, and executing them in both cases does nothing from the perspective of the parent shell and generates no error message about invalid use, what exactly is the purpose of #!/bin/false in those scripts?
In general, let’s consider a file testcode with bash code in it
#!/bin/bash
if [ "$0" = "${BASH_SOURCE[0]}" ]; then
echo "You are executing ${BASH_SOURCE[0]}"
else
echo "You are sourcing ${BASH_SOURCE[0]}"
fi
you can do three different things with it:
$ ./testcode
You are executing ./testcode
This works if testcode has the right permissions and the right shebang. With a shebang of #!/bin/false, this outputs nothing and returns a code of 1 (false).
$ bash ./testcode
You are executing ./testcode
This completely disregards the shebang (which can even be missing) and only requires read permission, not execute permission. This is the way to call bash scripts from a CMD command line in Windows (if you have bash.exe in your PATH...), since the shebang mechanism doesn't work there.
$ . ./testcode
You are sourcing ./testcode
This also completely disregards the shebang, as above, but it is a completely different matter, because sourcing a script means having the current shell execute it, while executing a script means invoking a new shell to execute it. For instance, if you put an exit command in a sourced script, you exit from the current shell, which is rarely what you want.
Therefore, sourcing is often used to load function definitions or constants, in a way somewhat resembling the import statement of other programming languages, and various programmers develop different habits to differentiate between scripts meant to be executed and include files meant to be sourced. I usually don't use any extension for the former (others use .sh), but I use an extension of .shinc for the latter. Your former colleague used a shebang of #!/bin/false, and one can only ask them why they preferred this to a zillion other possibilities. One reason that comes to my mind is that you can use file to tell these files apart:
$ file testcode testcode2
testcode: Bourne-Again shell script, ASCII text executable
testcode2: a /bin/false script, ASCII text executable
Of course, if these include files contain only function definitions, it’s harmless to execute them, so I don’t think your colleague did it to prevent execution.
Another habit of mine, inspired by the Python world, is to place some regression tests at the end of my .shinc files (at least while developing)
... function definitions here ...
[ "$0" != "${BASH_SOURCE[0]}" ] && return
... regression tests here ...
Since return generates an error in executed scripts but is OK in sourced scripts, a more cryptic way to get the same result is
... function definitions here ...
return 2>/dev/null || :
... regression tests here ...
From the point of view of the parent shell, the difference between using #!/bin/false or not is in the return code.
/bin/false always returns a failing exit code (1 in my case, though I'm not sure whether that exact value is standard).
Try this:
./my_foo.sh   # does nothing
echo $?       # shows "1", i.e. failure
./my_bar.sh   # does nothing
echo $?       # shows "0", i.e. everything went right
So, using #!/bin/false not only documents the fact that the script is not intended to be executed, but also produces an error return code when doing so.

Bash - Special characters on command line [duplicate]

I'm looking for a way (other than ".", '.', \.) to use bash (or any other Linux shell) while preventing it from parsing parts of the command line. The problem seems to be unsolvable:
How to interpret special characters in command line argument in C?
In theory, a simple switch would suffice (e.g. -x ... telling the shell that the
string ... won't be interpreted), but apparently none exists. I wonder whether there is a workaround, hack, or idea for solving this problem. The original problem is a script|alias for a program taking YouTube URLs (which may contain special characters such as &) as arguments. This problem is even more difficult: expanding "$1" while preventing the shell from interpreting the expanded string; essentially, expanding "$1" without interpreting its result.
Use a here-document:
myprogramm <<'EOF'
https://www.youtube.com/watch?v=oT3mCybbhf0
EOF
If you wrap the starting EOF in single quotes, bash won't interpret any special chars in the here-doc.
Short answer: you can't do it, because the shell parses the command line (and interprets things like "&") before it even gets to the point of deciding your script/alias/whatever is what will be run, let alone the point where your script has any control at all. By the time your script has any influence in the process, it's far too late.
Within a script, though, it's easy to avoid most problems: wrap all variable references in double-quotes. For example, rather than curl -o $outputfile $url you should use curl -o "$outputfile" "$url". This will prevent the shell from applying any parsing to the contents of the variable(s) before they're passed to the command (/other script/whatever).
But when you run the script, you'll always have to quote or escape anything passed on the command line.
Your spec still isn't very clear. As far as I know the problem is you want to completely reinvent how the shell handles arguments. So… you'll have to write your own shell. The basics aren't even that difficult. Here's pseudo-code:
while true:
print prompt
read input
command = (first input)
args = (argparse (rest input))
child_pid = fork()
if child_pid == 0: // We are inside child process
exec(command, args) // See variety of `exec` family functions in posix
else: // We are inside parent process and child_pid is actual child pid
wait(child_pid) // See variety of `wait` family functions in posix
Your question basically boils down to how that "argparse" function is implemented. If it's just an identity function, then you get no expansion at all. Is that what you want?

Using set -e / set +e in bash with functions

I've been using a simple bash preamble like this in my scripts:
#!/bin/bash
set -e
In conjunction with modularity / using functions this has bitten me today.
So, say I have a function somewhere like
foo() {
#edit: some error happens that makes me want to exit the function and signal that to the caller
return 2
}
Ideally I'd like to be able to use multiple small files, include their functions in other files and then call these functions like
set +e
foo
rc=$?
set -e
This works for exactly two layers of routines. But if foo is also calling subroutines like that, the last setting before the return will be set -e, which will make the script exit on the return; I cannot override this in the calling function. So, what I had to do is:
foo() {
#calling bar() in a shielded way like above
#..
set +e
return 2
}
I find this very counterintuitive (and also not what I want: what if in some contexts I'd like to use the function without shielding against failures, while in other contexts I want to handle the cleanup?). What's the best way to handle this? Btw, I'm doing this on OS X; I haven't tested whether this behaviour is different on Linux.
Shell functions don't really have "return values", just exit codes.
You could add && : to the call; this makes the command "tested", so the shell won't exit:
foo() {
echo 'x'
return 42
}
out=$(foo && :)
echo $out
The : is the "null command" (i.e. it doesn't do anything). In this case it doesn't even get executed, since it only runs if foo returns 0 (which it doesn't).
This outputs:
x
It's arguably a bit ugly, but then again, all of shell scripting is arguably a bit ugly ;-)
Quoting sh(1) from FreeBSD, which explains this better than bash's man page:
-e errexit
Exit immediately if any untested command fails in non-interactive
mode. The exit status of a command is considered to be explicitly
tested if the command is part of the list used to control an if,
elif, while, or until; if the command is the left hand operand of
an “&&” or “||” operator; or if the command is a pipeline preceded
by the ! operator. If a shell function is executed and its exit
status is explicitly tested, all commands of the function are considered to be tested as well.
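Based on that last rule, another option (a sketch, not from the quoted manual) is to make the function call itself "tested" with ||, which also captures the status without toggling set -e:
#!/bin/bash
set -e

foo() {
    return 2
}

rc=0
foo || rc=$?              # foo is the left-hand operand of ||, so errexit does not trigger
echo "foo returned $rc"   # prints: foo returned 2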

Bash function fails when it's part of a pipe

This is an attempt to simplify a problem I'm having.
I define a function which sets a variable and this works in this scenario:
$ function myfunc { res="ABC" ; }
$ res="XYZ"
$ myfunc
$ echo $res
ABC
So res has been changed by the call to myfunc. But:
$ res="XYZ"
$ myfunc | echo
$ echo $res
XYZ
So when myfunc is part of a pipe the value doesn't change.
How can I make myfunc work the way I desire even when a pipe is involved?
(In the real script "myfunc" does something more elaborate of course and the other side of the pipe has a zenity progress dialogue rather than a pointless echo)
Thanks
This isn't possible on Unix. To understand this better, you need to know what a variable is. Bash keeps two internal tables with all defined variables. One is for variables local to the current shell. You create those with a plain assignment, name=value (or with declare). These are local to the process; they are not inherited when a new process is created.
To export a variable to new child processes, you must export it with export name. That tells bash "I want children to see the value of this variable". It's a security feature.
When you invoke a function in bash, it's executed within the context of the current shell, so it can access and modify all the variables.
But a pipe is a list of processes which are connected with I/O pipes. That means your function is executed in a shell and only the output of this shell is visible to echo.
Even exporting the variable in myfunc wouldn't work, because export only affects processes started by the shell in which you did the export, and echo was started by the same shell that started myfunc:
bash
+-- myfunc
+-- echo
that is, echo is not a child of myfunc.
Workarounds:
Write the variable into a file (see the sketch after this list)
Use a more complex output format like XML or several lines where the first line of output is always the variable and the real output comes in the next line.
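A minimal sketch of the file-based workaround (the temp-file path is made up):
myfunc() {
    res="ABC"
    printf '%s\n' "$res" > /tmp/myfunc.res   # persist the value outside the pipeline's subshell
}

res="XYZ"
myfunc | echo                # myfunc still runs in a subshell because of the pipe
res=$(cat /tmp/myfunc.res)   # read the value back in the parent shell
echo $res                    # ABC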
As #Aaron said, the problem is caused by the function running in a subshell. But there is a way to avoid this in bash, using process substitution instead of a pipe:
myfunc > >(echo)
This does much the same thing a pipe would, but myfunc runs in the main shell rather than a subprocess. Note that this is a bash-only feature, and you must use #!/bin/bash as your shebang for this to work.
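To tie it back to the question's example (a rough sketch; the zenity end of the pipe is left out):
myfunc() { res="ABC" ; }
res="XYZ"
myfunc > >(echo)   # process substitution: myfunc itself runs in the current shell
echo $res          # prints ABC, unlike the pipe version which still showed XYZ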

Resources