Here's a shell script:
globvar=0
function myfunc {
let globvar=globvar+1
echo "myfunc: $globvar"
}
myfunc
echo "something" | myfunc
echo "Global: $globvar"
When called, it prints out the following:
$ sh zzz.sh
myfunc: 1
myfunc: 2
Global: 1
$ bash zzz.sh
myfunc: 1
myfunc: 2
Global: 1
$ zsh zzz.sh
myfunc: 1
myfunc: 2
Global: 2
The question is: why does this happen, and which behavior is correct?
P.S. I have a strange feeling that the function behind the pipe is called in a forked shell... So, is there a simple workaround?
P.P.S. This function is a simple test wrapper. It runs a test application and analyzes its output, then increments the $PASSED or $FAILED variable. At the end you get the number of passed/failed tests in global variables. The usage looks like:
test-util << EOF | myfunc
input for test #1
EOF
test-util << EOF | myfunc
input for test #2
EOF
echo "Passed: $PASSED, failed: $FAILED"
Korn shell gives the same results as zsh, by the way.
Please see BashFAQ/024. Pipes create subshells in Bash and variables are lost when subshells exit.
Based on your example, I would restructure it something like this:
globvar=0
function myfunc {
echo $(($1 + 1))
}
globvar=$(myfunc "$globvar")
globvar=$(echo "something" | myfunc "$globvar")
Piping something into myfunc in sh or bash causes a new shell to spawn. You can confirm this by adding a long sleep in myfunc. While it's sleeping, call ps and you'll see a subprocess. When the function returns, that subshell exits without changing the value in the parent process.
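For instance, something like this (just a sketch to illustrate the test; the exact ps output depends on your system):
myfunc() {
    sleep 30                   # long enough to inspect the process table
    let globvar=globvar+1
}
echo "something" | myfunc &    # run the pipeline in the background
sleep 1                        # give it a moment to start
ps -f                          # while myfunc sleeps, an extra shell process is visible
wait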
If you really need that value to be changed, you'll need to return a value from the function and check $PIPESTATUS after, I guess, like this:
globvar=0
function myfunc {
let globvar=globvar+1
echo "myfunc: $globvar"
return $globvar
}
myfunc
echo "something" | myfunc
globvar=${PIPESTATUS[1]}
echo "Global: $globvar"
The problem is 'which end of a pipeline using built-ins is executed by the original process?'
In zsh, it looks like the last command in the pipeline is executed by the main shell script when the command is a function or built-in.
In Bash (and sh is likely to be a link to Bash if you're on Linux), either both commands are run in a sub-shell, or the first command is run by the main process and the others are run in sub-shells.
Clearly, when the function is run in a sub-shell, it does not affect the variable in the parent shell (only the global in the sub-shell).
Consider adding an extra test:
echo Something | { myfunc; echo $globvar; }
echo $globvar
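As for the workaround asked about in the P.S.: two options I know of (a sketch, not tested against your exact setup) are to keep myfunc out of the right-hand side of a pipe with process substitution, or, in Bash 4.2+, to tell Bash to run the last element of a pipeline in the current shell:
# Option 1 (bash, zsh, ksh93): process substitution keeps myfunc in the current shell
myfunc < <(echo "something")
echo "Global: $globvar"        # now reflects the increment

# Option 2 (bash >= 4.2, scripts only, since it requires job control to be off)
shopt -s lastpipe
echo "something" | myfunc
echo "Global: $globvar"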
Related
I'm trying to build a Jenkins Pipeline for which a parameter is
optional:
parameters {
string(
name:'foo',
defaultValue:'',
description:'foo is foo'
)
}
My purpose is calling a shell script and providing foo as argument:
stages {
stage('something') {
sh "some-script.sh '${params.foo}'"
}
}
The shell script will do the Right Thing™ if the provided value is the empty
string.
Unfortunately I can't just get an empty string. If the user does not provide
a value for foo, Jenkins will set it to null, and I will get null
(as a string) inside my command.
I found this related question but the only answer is not really helpful.
Any suggestion?
OP here. I realized a wrapper script can be helpful; I ironically called it junkins-cmd, and I call it like this:
stages {
stage('something') {
sh "junkins-cmd some-script.sh '${params.foo}'"
}
}
Code:
#!/bin/bash
helpme() {
cat <<EOF
Usage: $0 <command> [parameters to command]
This command is a wrapper for jenkins pipeline. It tries to overcome jenkins
idiotic behaviour when calling programs without polluting the remaining part
of the toolkit.
The given command is executed with the fixed version of the given
parameters. Current fixes:
- 'null' is replaced with ''
EOF
} >&2
trap helpme EXIT
command="${1:?Missing command}"; shift
trap - EXIT
typeset -a params
for p in "$@"; do
# Jenkins pipeline uses 'null' when the parameter is undefined.
[[ "$p" = 'null' ]] && p=''
params+=("$p")
done
exec $command "${params[@]}"
Beware: params+=("$p") does not seem to be portable among shells; hence this ugly script runs under #!/bin/bash.
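A lighter alternative, if you can edit the called script itself, is to normalize the value there; a minimal sketch (assuming some-script.sh takes foo as its first argument):
#!/bin/bash
# Inside some-script.sh: treat the literal string 'null' coming from Jenkins as empty
foo="${1-}"
[ "$foo" = "null" ] && foo=""
echo "foo is: '$foo'"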
A recent vulnerability, CVE-2014-6271, in how Bash interprets environment variables was disclosed. The exploit relies on Bash parsing some environment variable declarations as function definitions, but then continuing to execute code following the definition:
$ x='() { echo i do nothing; }; echo vulnerable' bash -c ':'
vulnerable
But I don't get it. There's nothing I've been able to find in the Bash manual about interpreting environment variables as functions at all (except for inheriting functions, which is different). Indeed, a properly named function definition is just treated as a value:
$ x='y() { :; }' bash -c 'echo $x'
y() { :; }
But a corrupt one prints nothing:
$ x='() { :; }' bash -c 'echo $x'
$ # Nothing but newline
The corrupt function is unnamed, and so I can't just call it. Is this vulnerability a pure implementation bug, or is there an intended feature here, that I just can't see?
Update
Per Barmar's comment, I hypothesized the name of the function was the parameter name:
$ n='() { echo wat; }' bash -c 'n'
wat
Which I could swear I tried before, but I guess I didn't try hard enough. It's repeatable now. Here's a little more testing:
$ env n='() { echo wat; }; echo vuln' bash -c 'n'
vuln
wat
$ env n='() { echo wat; }; echo $1' bash -c 'n 2' 3 -- 4
wat
…so apparently the args are not set at the time the exploit executes.
Anyway, the basic answer to my question is, yes, this is how Bash implements inherited functions.
This seems like an implementation bug.
Apparently, the way exported functions work in bash is that they use specially-formatted environment variables. If you export a function:
f() { ... }
it defines an environment variable like:
f='() { ... }'
What's probably happening is that when the new shell sees an environment variable whose value begins with (), it prepends the variable name and executes the resulting string. The bug is that this includes executing anything after the function definition as well.
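A rough shell-level equivalent of what the vulnerable import did (the real code is C calling parse_and_execute(), so this is only an illustration):
name='x'
value='() { echo i do nothing; }; echo vulnerable'
# Prepend the variable name to the value and feed the whole string to the parser:
eval "${name}${value}"    # defines x() AND runs the trailing 'echo vulnerable'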
The fix described is apparently to parse the result to see if it's a valid function definition. If not, it prints the warning about the invalid function definition attempt.
This article confirms my explanation of the cause of the bug. It also goes into a little more detail about how the fix resolves it: not only do they parse the values more carefully, but variables that are used to pass exported functions follow a special naming convention. This naming convention is different from that used for the environment variables created for CGI scripts, so an HTTP client should never be able to get its foot into this door.
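On a patched Bash you can see that naming convention for yourself; the exported entry looks roughly like this (the exact mangling may differ between versions):
$ foo() { echo 'hello world'; }
$ export -f foo
$ env | grep -A1 foo
BASH_FUNC_foo%%=() {  echo 'hello world'
}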
The following:
x='() { echo I do nothing; }; echo vulnerable' bash -c 'typeset -f'
prints
vulnerable
x ()
{
echo I do nothing
}
declare -fx x
It seems that Bash, after having parsed the x=..., recognized it as a function, exported it (note the declare -fx x), and allowed execution of the command that followed the declaration:
echo vulnerable
x='() { x; }; echo vulnerable' bash -c 'typeset -f'
prints:
vulnerable
x ()
{
x
}
and running x:
x='() { x; }; echo Vulnerable' bash -c 'x'
prints
Vulnerable
Segmentation fault: 11
It segfaults due to the infinite recursive calls.
It doesn't override an already-defined function:
$ x() { echo Something; }
$ declare -fx x
$ x='() { x; }; echo Vulnerable' bash -c 'typeset -f'
prints:
x ()
{
echo Something
}
declare -fx x
i.e., x remains the previously (correctly) defined function.
For the Bash 4.3.25(1)-release the vulnerability is closed, so
x='() { echo I do nothing; }; echo Vulnerable' bash -c ':'
prints
bash: warning: x: ignoring function definition attempt
bash: error importing function definition for `x'
but, strangely (at least to me),
x='() { x; };' bash -c 'typeset -f'
STILL PRINTS
x ()
{
x
}
declare -fx x
and the
x='() { x; };' bash -c 'x'
segmentation faults too, so it STILL accepts the strange function definition...
I think it's worth looking at the Bash code itself. The patch gives a bit of insight as to the problem. In particular,
*** ../bash-4.3-patched/variables.c 2014-05-15 08:26:50.000000000 -0400
--- variables.c 2014-09-14 14:23:35.000000000 -0400
***************
*** 359,369 ****
strcpy (temp_string + char_index + 1, string);
! if (posixly_correct == 0 || legal_identifier (name))
! parse_and_execute (temp_string, name, SEVAL_NONINT|SEVAL_NOHIST);
!
! /* Ancient backwards compatibility. Old versions of bash exported
! functions like name()=() {...} */
! if (name[char_index - 1] == ')' && name[char_index - 2] == '(')
! name[char_index - 2] = '\0';
if (temp_var = find_function (name))
--- 364,372 ----
strcpy (temp_string + char_index + 1, string);
! /* Don't import function names that are invalid identifiers from the
! environment, though we still allow them to be defined as shell
! variables. */
! if (legal_identifier (name))
! parse_and_execute (temp_string, name, SEVAL_NONINT|SEVAL_NOHIST|SEVAL_FUNCDEF|SEVAL_ONECMD);
if (temp_var = find_function (name))
When Bash exports a function, it shows up as an environment variable, for example:
$ foo() { echo 'hello world'; }
$ export -f foo
$ cat /proc/self/environ | tr '\0' '\n' | grep -A1 foo
foo=() { echo 'hello world'
}
When a new Bash process finds a function defined this way in its environment, it evaluates the code in the variable using parse_and_execute(). For normal, non-malicious code, executing it simply defines the function in Bash and moves on. However, because it's passed to a generic execution function, Bash will correctly parse and execute additional code defined in that variable after the function definition.
You can see that in the new code, a flag called SEVAL_ONECMD has been added that tells Bash to only evaluate the first command (that is, the function definition), and SEVAL_FUNCDEF to only allow function definitions.
Regarding your question about documentation: a look at the command-line documentation for the env command (below) shows that env is working as documented.
There are, optionally, four parts:
An optional hyphen as a synonym for -i (for backward compatibility, I assume).
Zero or more NAME=VALUE pairs. These are the variable assignment(s), which could include function definitions.
Note that no semicolon (;) is required between or following the assignments.
The last argument(s) can be a single command followed by its argument(s). It will run with whatever permissions have been granted to the login being used. Security is controlled by restricting permissions on the login user and by setting permissions on user-accessible executables such that users other than the executable's owner can only read and execute the program, not alter it.
[ spot#LX03:~ ] env --help
Usage: env [OPTION]... [-] [NAME=VALUE]... [COMMAND [ARG]...]
Set each NAME to VALUE in the environment and run COMMAND.
-i, --ignore-environment start with an empty environment
-u, --unset=NAME remove variable from the environment
--help display this help and exit
--version output version information and exit
A mere - implies -i. If no COMMAND, print the resulting environment.
Report env bugs to bug-coreutils@gnu.org
GNU coreutils home page: <http://www.gnu.org/software/coreutils/>
General help using GNU software: <http://www.gnu.org/gethelp/>
Report env translation bugs to <http://translationproject.org/team/>
I have a script a.sh which has:
a() {
echo "123"
}
echo "dont"
Then I have another script b.sh which has:
b() {
echo "345"
}
All I want to do is use a in b, but when I source it I don't want it to print whatever is in a() or the echo "dont".
I just want to source it for now.
So I did source a.sh in b.sh.
But it doesn't work.
The reason for sourcing is so that I can call any of the functions whenever I want.
If I do . ./a.sh in b.sh, it prints everything in a.sh.
One approach which will work on any POSIX-compliant shell is this:
# put function definitions at the top
a() {
echo "123"
}
# divide them with a conditional return
[ -n "$FUNCTIONS_ONLY" ] && return
# put direct code to execute below
echo "dont"
...and in your other script:
FUNCTIONS_ONLY=1 . other.sh
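Applied to your files, that would look something like this (a sketch using your names):
# a.sh
a() {
    echo "123"
}
[ -n "$FUNCTIONS_ONLY" ] && return
echo "dont"

# b.sh
FUNCTIONS_ONLY=1 . ./a.sh
b() {
    echo "345"
}
a        # prints 123; the "dont" line was skipped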
Make a library of common functions in a file called functionLib.sh like this:
#!/bin/sh
a(){
echo Inside a, with $1
}
b(){
echo Inside b, with $1
}
Then in script1, do this:
#!/bin/sh
. functionLib.sh # Source in the functions
a 42 # Use one
b 37 # Use another
and in another script, script2 re-use the functions:
#!/bin/sh
. functionLib.sh # Source in the functions
a 23 # Re-use one
b 24 # Re-use another
I have adopted a style in my shell scripts that allows me to design every script as a potential library, making it behave differently when it is sourced (with . .../path/script) and when it is executed directly. You can compare this to the python if __name__ == '__main__': trick.
I have not found a method that is portable across all Bourne shell descendants without explicitly referring to the script's name, but this is what I use:
a() {
echo a
}
b() {
echo b
}
(program=xyzzy
set -u -e
case $0 in
*${program}) : ;;
*) exit;;
esac
# main
a
b
)
The rules for this method are strictly:
Start a section with nothing but functions.
No variable assignments or any other activity.
Then, at the very end, create a subshell ( ... )
The first action inside the subshell tests whether
it's being sourced. If so, exit from the subshell.
If not, run a command.
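If you only care about Bash rather than every Bourne descendant, a common shortcut for the same "am I being sourced?" test is to compare BASH_SOURCE with $0, for example:
a() {
    echo a
}
b() {
    echo b
}
# Run the main section only when executed directly, not when sourced
if [[ "${BASH_SOURCE[0]}" == "$0" ]]; then
    a
    b
fi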
I am working with a bash script and I want to execute a function to print a return value:
function fun1(){
return 34
}
function fun2(){
local res=$(fun1)
echo $res
}
When I execute fun2, it does not print "34". Why is this the case?
Although Bash has a return statement, the only thing you can specify with it is the function's own exit status (a value between 0 and 255, 0 meaning "success"). So return is not what you want.
You might want to convert your return statement to an echo statement - that way your function output can be captured using $() command substitution, which seems to be exactly what you want.
Here is an example:
function fun1(){
echo 34
}
function fun2(){
local res=$(fun1)
echo $res
}
Another way to get the return value (if you just want to return an integer 0-255) is $?.
function fun1(){
return 34
}
function fun2(){
fun1
local res=$?
echo $res
}
Also, note that you can use the return value in Boolean logic - fun1 || fun2 will only run fun2 if fun1 returns a non-0 value. The default return value is the exit value of the last statement executed within the function.
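For example, a small sketch of using the status directly:
is_even() {
    (( $1 % 2 == 0 ))          # the arithmetic test's status becomes the function's status
}
is_even 4 && echo "4 is even"
is_even 7 || echo "7 is odd"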
Functions in Bash are not functions like in other languages; they're actually commands. So functions are used as if they were binaries or scripts fetched from your path. From the perspective of your program logic, there shouldn't really be any difference.
Shell commands are connected by pipes (aka streams), not by fundamental or user-defined data types as in "real" programming languages. There is no such thing as a return value for a command, perhaps mostly because there's no real way to declare it. It could appear on the man page or in the --help output of the command, but both are only human-readable and hence are written to the wind.
When a command wants to get input it reads it from its input stream, or the argument list. In both cases text strings have to be parsed.
When a command wants to return something, it has to echo it to its output stream. Another often-practiced way is to store the return value in a dedicated global variable. Writing to the output stream is clearer and more flexible, because it can also carry binary data. For example, you can return a BLOB easily:
encrypt() {
gpg -c -o- "$1" # Encrypt the file named in $1 to standard output (asks for a passphrase)
}
encrypt public.dat > private.dat # Write the function result to a file
As others have written in this thread, the caller can also use command substitution $() to capture the output.
In parallel, the function "returns" the exit code of gpg (GnuPG). Think of the exit code as a bonus that other languages don't have, or, depending on your temperament, as a "Schmutzeffekt" of shell functions. This status is, by convention, 0 on success or an integer in the range 1-255 for something else. To make this clear: return (like exit) can only take a value from 0-255, and values other than 0 are not necessarily errors, as is often asserted.
When you don't provide an explicit value with return, the status is taken from the last command in a Bash statement/function/command and so forth. So there is always a status, and return is just an easy way to provide it.
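A small illustration of that implicit status (a sketch; assumes /etc/passwd exists):
has_root_entry() {
    grep -q '^root:' /etc/passwd   # no explicit return: grep's status becomes the function's status
}
if has_root_entry; then
    echo "found a root entry"
fi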
$(...) captures the text sent to standard output by the command contained within. return does not output to standard output. $? contains the result code of the last command.
fun1 (){
return 34
}
fun2 (){
fun1
local res=$?
echo $res
}
The problem with other answers is that they either use a global, which can be overwritten when several functions are in a call chain; or echo, which means your function cannot output diagnostic information (you will forget your function does this, and the "result", i.e. the return value, will contain more information than your caller expects, leading to weird bugs); or eval, which is way too heavy and hacky.
The proper way to do this is to put the top level stuff in a function and use a local with Bash's dynamic scoping rule. Example:
func1()
{
ret_val=hi
}
func2()
{
ret_val=bye
}
func3()
{
local ret_val=nothing
echo $ret_val
func1
echo $ret_val
func2
echo $ret_val
}
func3
This outputs
nothing
hi
bye
Dynamic scoping means that ret_val points to a different object, depending on the caller! This is different from lexical scoping, which is what most programming languages use. This is actually a documented feature, just easy to miss, and not very well explained. Here is the documentation for it (emphasis is mine):
Variables local to the function may be declared with the local
builtin. These variables are visible only to the function and the
commands it invokes.
For someone with a C, C++, Python, Java, C#, or JavaScript background, this is probably the biggest hurdle: functions in Bash are not functions, they are commands, and behave as such: they can output to stdout/stderr, they can pipe in/out, and they can return an exit code. Basically, there isn't any difference between defining a command in a script and creating an executable that can be called from the command line.
So instead of writing your script like this:
Top-level code
Bunch of functions
More top-level code
write it like this:
# Define your main, containing all top-level code
main()
Bunch of functions
# Call main
main
where main() declares ret_val as local and all other functions return values via ret_val.
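Putting that together, a minimal sketch of the structure (get_hostname is just a made-up example function):
get_hostname() {
    ret_val=$(uname -n)        # assigns the caller's (dynamically scoped) ret_val
}

main() {
    local ret_val=""           # local to main, but visible to every function main calls
    get_hostname
    echo "host: $ret_val"
}

main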
See also the Unix & Linux question Scope of Local Variables in Shell Functions.
Another, perhaps even better solution depending on situation, is the one posted by ya.teck which uses local -n.
Another way to achieve this is name references (requires Bash 4.3+).
function example {
local -n VAR=$1
VAR=foo
}
example RESULT
echo $RESULT
The return statement sets the exit code of the function, much the same as exit will do for the entire script.
The exit code for the last command is always available in the $? variable.
function fun1(){
return 34
}
function fun2(){
local res=$(fun1)
echo $? # <-- Always echoes 0 since the 'local' command passes.
res=$(fun1)
echo $? #<-- Outputs 34
}
As an add-on to others' excellent posts, here's an article summarizing these techniques:
set a global variable
set a global variable, whose name you passed to the function
set the return code (and pick it up with $?)
'echo' some data (and pick it up with MYVAR=$(myfunction) )
Returning Values from Bash Functions
I like to do the following if running in a script where the function is defined:
POINTER= # Used for function return values
my_function() {
# Do stuff
POINTER="my_function_return"
}
my_other_function() {
# Do stuff
POINTER="my_other_function_return"
}
my_function
RESULT="$POINTER"
my_other_function
RESULT="$POINTER"
I like this, because I can then include echo statements in my functions if I want
my_function() {
echo "-> my_function()"
# Do stuff
POINTER="my_function_return"
echo "<- my_function. $POINTER"
}
The simplest way I can think of is to use echo in the method body like so
get_greeting() {
echo "Hello there, $1!"
}
STRING_VAR=$(get_greeting "General Kenobi")
echo $STRING_VAR
# Outputs: Hello there, General Kenobi!
Instead of calling var=$(func) with the whole function output, you can create a function that modifies its input arguments with eval:
var1="is there"
var2="anybody"
function modify_args() {
echo "Modifying first argument"
eval $1="out"
echo "Modifying second argument"
eval $2="there?"
}
modify_args var1 var2
# Prints "Modifying first argument" and "Modifying second argument"
# Sets var1 = out
# Sets var2 = there?
This might be useful in case you need to:
Print to stdout/stderr within the function scope (without returning it)
Return (set) multiple variables.
In Git Bash on Windows, I'm using arrays for multiple return values.
Bash code:
#!/bin/bash
## A 6-element array used for returning
## values from functions:
declare -a RET_ARR
RET_ARR[0]="A"
RET_ARR[1]="B"
RET_ARR[2]="C"
RET_ARR[3]="D"
RET_ARR[4]="E"
RET_ARR[5]="F"
function FN_MULTIPLE_RETURN_VALUES(){
## Give the positional arguments/inputs
## $1 and $2 some sensible names:
local out_dex_1="$1" ## Output index
local out_dex_2="$2" ## Output index
## Echo for debugging:
echo "Running: FN_MULTIPLE_RETURN_VALUES"
## Here: Calculate output values:
local op_var_1="Hello"
local op_var_2="World"
## Set the return values:
RET_ARR[ $out_dex_1 ]=$op_var_1
RET_ARR[ $out_dex_2 ]=$op_var_2
}
echo "FN_MULTIPLE_RETURN_VALUES EXAMPLES:"
echo "-------------------------------------------"
fn="FN_MULTIPLE_RETURN_VALUES"
out_dex_a=0
out_dex_b=1
eval $fn $out_dex_a $out_dex_b ## <-- Call function
a=${RET_ARR[0]} && echo "RET_ARR[0]: $a "
b=${RET_ARR[1]} && echo "RET_ARR[1]: $b "
echo
## ---------------------------------------------- ##
c="2"
d="3"
FN_MULTIPLE_RETURN_VALUES $c $d ## <--Call function
c_res=${RET_ARR[2]} && echo "RET_ARR[2]: $c_res "
d_res=${RET_ARR[3]} && echo "RET_ARR[3]: $d_res "
echo
## ---------------------------------------------- ##
FN_MULTIPLE_RETURN_VALUES 4 5 ## <--- Call function
e=${RET_ARR[4]} && echo "RET_ARR[4]: $e "
f=${RET_ARR[5]} && echo "RET_ARR[5]: $f "
echo
##----------------------------------------------##
read -p "Press Enter To Exit:"
Expected output:
FN_MULTIPLE_RETURN_VALUES EXAMPLES:
-------------------------------------------
Running: FN_MULTIPLE_RETURN_VALUES
RET_ARR[0]: Hello
RET_ARR[1]: World
Running: FN_MULTIPLE_RETURN_VALUES
RET_ARR[2]: Hello
RET_ARR[3]: World
Running: FN_MULTIPLE_RETURN_VALUES
RET_ARR[4]: Hello
RET_ARR[5]: World
Press Enter To Exit:
In the bash manual section 3.7.4, it states
The environment for any simple command or function may be augmented
temporarily by prefixing it with parameter assignments, as described
in Shell Parameters [section 3.4].
And a trivial example of this is
MYVAR=MYVALUE mycommand
I have read section 3.4 and I still can't figure out how to specify multiple parameter assignments. Note that the statement in 3.7.4 is definitely plural, implying that it is possible.
The following does not seem to work:
MYVAR1=abc MYVAR2=xyz mycommand
I am using bash version 4.1, 23 December 2009.
It should work. That is acceptable syntax. Here's an example:
$ cat a
#!/bin/sh
echo $MYVAR1
echo $MYVAR2
$ ./a
$ MYVAR1=abc MYVAR2=xyz ./a
abc
xyz
$
UPDATE: Your updated example given in your answer will work if you precede the simple command with the variables as required:
mycommand () { echo MYVAR1=[$MYVAR1]; echo MYVAR2=[$MYVAR2]; }
for f in ~/*.txt ; do MYVAR1=abc MYVAR2=xyz mycommand; done
Oops, my example was over-simplified. This works fine:
mycommand () { echo MYVAR1=[$MYVAR1]; echo MYVAR2=[$MYVAR2]; }
MYVAR1=abc MYVAR2=xyz mycommand
but this does not:
mycommand () { echo MYVAR1=[$MYVAR1]; echo MYVAR2=[$MYVAR2]; }
MYVAR1=abc MYVAR2=xyz for f in ~/*.txt; do mycommand; done
and the key difference is this phrase:
for any simple command or function
A for loop is a compound command, not "a simple command or function", according to section 3.2.
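One way around that, sketched below, is to wrap the compound command in a function, since a function call is again something the prefix-assignment syntax applies to:
mycommand () { echo MYVAR1=[$MYVAR1]; echo MYVAR2=[$MYVAR2]; }
run_loop () { for f in ~/*.txt; do mycommand; done; }
# A function call counts as "a simple command or function", so this works:
MYVAR1=abc MYVAR2=xyz run_loop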
I've never seen that method used.
You could always just pass in regular parameters to the script.
Updating for new example:
This works in both situations:
> mycommand () { echo MYVAR1=[$1]; echo MYVAR2=[$2]; }
> mycommand a b
MYVAR1=[a]
MYVAR2=[b]
> for f in ~/file*.txt; do mycommand a b; done
MYVAR1=[a]
MYVAR2=[b]
MYVAR1=[a]
MYVAR2=[b]