shell $RANDOM seed not honored in pipelines - bash

This is a strange behavior I can't explain. I want to use the shell to generate a predictable random number sequence, so I seed $RANDOM. Here is a test program.
RANDOM=15
echo $RANDOM
This works fine, giving the same number every time I run it. But if I add a pipe to this program, it gives a different result every time. Try the following simplified program.
RANDOM=15
echo $RANDOM | cat
I have found 2 fixes that make the output predictable again, but I still can't explain why they work.
Fix 1
RANDOM=15
x=$RANDOM
echo $x | cat
Fix 2
(RANDOM=15
echo $RANDOM) | cat
I tried on Linux and Mac. The behavior is consistent. Can somebody explain?

Pipelines, as in echo $RANDOM | cat, create subshells -- separate processes forked from the parent but not replaced with a different executable image via an exec()-family call. You're observing a difference in behavior between the shell in which RANDOM is explicitly set and the subshells forked from it.
Your workarounds either move the evaluation of $RANDOM out of a subshell into the parent (first case), or move the explicit seed set into the subshell (second case).
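To make the subshell visible, compare $BASHPID on each side of a pipe; unlike $$, which keeps reporting the parent's PID inside subshells, $BASHPID reports the current process (a minimal sketch, bash-specific):
echo "parent:   $BASHPID"          # the shell's own PID
echo "pipeline: $BASHPID" | cat    # a different PID: the forked subshell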

Thank you Charles Duffy for pointing in the right direction (subshells). I found the answer in the bash source code, in the file variables.c: $RANDOM is a "dynamic variable", meaning a function is called to produce its value, and that function re-seeds the random generator the first time $RANDOM is evaluated in a subshell.
// from bash-4.3/variables.c
int
get_random_number ()
{
  int rv, pid;

  /* Reset for command and process substitution. */
  pid = getpid ();
  if (subshell_environment && seeded_subshell != pid)
    {
      seedrand ();    // <<<<==== re-seed!
      seeded_subshell = pid;
    }

  do
    rv = brand ();
  while (rv == last_random_value);
  return rv;
}
The seed is a static variable, so each shell process has its own copy, and re-seeding in the subshell has no effect on the parent. Here is another test case showing that a $RANDOM reference in a subshell has nothing to do with the sequence in the parent shell.
RANDOM=15
echo $RANDOM $RANDOM   # the first two numbers of the seed-15 sequence
RANDOM=15              # re-seed the parent shell
echo $RANDOM | cat     # evaluated in a subshell, which re-seeds itself
echo $RANDOM
The last line prints the first number of the seed-15 sequence -- the same value as the first number printed above -- so the $RANDOM reference in the subshell never advanced the parent's sequence.
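Note the seeded_subshell != pid guard in the source above: the re-seed happens at most once per subshell, so further $RANDOM references inside the same subshell continue that subshell's own sequence rather than re-seeding again -- for example (sketch):
RANDOM=15
( echo $RANDOM $RANDOM )   # unpredictable, but both values come from the
                           # single re-seed done on the first reference
echo $RANDOM               # the parent still prints the seed-15 sequence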

Related

bash when is a variable read during a function or loop

In a bash script, take the extreme example below, where the call/start of myFn() happens 5 minutes before the echo of $inVar (a nameref to $myvar). During this window between the function start and the use of $myvar, the variable is updated.
myvar=$(...   # some json
alpha=hello )
myFn() {
    local -n inVar=$1
    # wait for 5 mins .... :)
    echo $inVar
}
myFn "myvar"
if [ -z "$var" ]
then
    # wait 5 mins
    # but life goes on and this is non-blocking,
    # so other parts of the script are running
    echo $myvar
fi
myvar=$(...   # after 2 mins, update alpha
alpha=world
)
As $myvar is passed to myFn(), when is $myvar actually read:
- at myFn call time (when the function is called/starts),
- at the reference-copy time, inVar=$1, or
- when the echo $inVar occurs?
And is this the same for other constructs such as while, if, etc.?
You're setting inVar as a nameref, so the value is not known until the variable is expanded at the echo statement.
HOWEVER
In your scenario, myFn is "non-blocking", meaning you launch it in the background. In this case, the subshell gets a copy of the current value of myvar -- if myvar gets updated subsequently, that update happens in the current shell, not in the background subshell.
To demonstrate:
$ bash -c '
fn() { local -n y=$1; sleep 2; echo "in background function, y=$y"; }
x=5
fn x &
x=10
wait
'
in background function, y=5
TL;DR: namerefs and background processes don't mix well.
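For contrast, in a plain foreground call the nameref really is resolved only when it is expanded, so it sees the latest value -- a minimal sketch:
fn() { local -n y=$1; echo "in foreground function, y=$y"; }
x=5
x=10
fn x    # no subshell: the nameref resolves now and prints y=10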

Tcsh Script Last Exit Code ($?) value is resetting

I am running the following script using tcsh. In my while loop, I'm running a C++ program that I created, which returns a different exit code depending on certain things. As long as it returns an exit code of 0, I want the script to increment the counter and run the program again.
#!/bin/tcsh
echo "Starting the script."
set counter = 0
while ($? == 0)
    @ counter++
    ./auto $counter
end
I have verified that my program is definitely returning with exit code = 1 after a certain point. However, the condition in the while loop keeps evaluating to true for some reason, and the loop keeps running.
I found that if I stick the following line at the end of my loop and then replace the condition check in the while loop with this new variable, it works fine.
while ($return_code == 0)
    @ counter++
    ./auto $counter
    set return_code = $?
end
Why is it that I can't just use $? directly? Is another operation underneath the hood performed in between running my custom program and checking the loop condition that's causing $? to change value?
That is peculiar.
I've altered your example to something that I think illustrates the issue more clearly. (Note that $? is an alias for $status.)
#!/bin/tcsh -f
foreach i (1 2 3)
    false
    # echo false status=$status
end
echo Done status=$status
The output is
Done status=0
If I uncomment the echo command in the loop, the output is:
false status=1
false status=1
false status=1
Done status=0
(Of course the echo in the loop would break the logic anyway, because the echo command completes successfully and sets $status to zero.)
I think what's happening is that the end that terminates the loop is executed as a statement, and it sets $status ($?) to 0.
I see the same behavior with both tcsh and bsd-csh.
Saving the value of $status in another variable immediately after the command is a good workaround -- and arguably just a better way of doing it, since $status is extremely fragile, and will almost literally be clobbered if you look at it.
Note that I've added a -f option to the #! line. This prevents tcsh from sourcing your init file(s) (.cshrc or .tcshrc) and is considered good practice. (That's not the case for sh/bash/ksh/zsh, which assign a completely different meaning to -f.)
A digression: I used tcsh regularly for many years, both as my interactive login shell and for scripting. I would not have anticipated that end would set $status. This is not the first time I've had to find out how tcsh or csh behaves by trial and error and been surprised by the result. It is one of the reasons I switched to bash for interactive and scripting use. I won't tell you to do the same, but you might want to read Tom Christiansen's classic "csh.whynot".
Slightly shorter/simpler explanation:
Recall that in tcsh/csh EACH command (including shell builtins) returns a status. Therefore $? (an alias for $status) is updated by 'if' statements, 'foreach' loops, assignments, ...
From a practical point of view, it is better to limit direct use of $status to an if statement immediately after the command execution:
do-something
if ( $status == 0 ) then
    ...
endif
In all other cases, capture the status in a variable, and use only that variable:
do-something
set something_status = $status
if ( $something_status == 0 ) then
    ...
endif
To expand on the fragility of $status: even the condition test of an if statement modifies it, so the repeated test on $status below will never hit '$status == 5', even when do-something returns a status of 5:
do-something
if ( $status == 2 ) then
    echo FOO
else if ( $status == 5 ) then
    echo BAR
endif
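Captured once, the chain compares the right value throughout -- a minimal sketch of the corrected version:
do-something
set st = $status          # capture before any test can clobber it
if ( $st == 2 ) then
    echo FOO
else if ( $st == 5 ) then
    echo BAR
endif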

Consistent syntax for obtaining output of a command efficiently in bash?

Bash has the command substitution syntax $(f), which captures the STDOUT of a command f. If the command is an executable, this is fine -- the creation of a new process is necessary anyway. But if the command is a shell function, this syntax adds an overhead of about 25 ms per subshell on my system. That is enough to add up to noticeable delays when used in inner loops, especially in interactive contexts such as command completions or $PS1.
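The overhead is easy to measure with bash's time keyword (a quick benchmark sketch; the function body and loop count are arbitrary):
f() { :; }
time for i in {1..1000}; do out=$(f); done   # one forked subshell per iteration
time for i in {1..1000}; do f; done          # the same loop without forks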
A common optimization is to return values via global variables instead [1], but it comes at a cost to readability: the intent becomes less clear, and output capturing suddenly is inconsistent between shell functions and executables. I am adding a comparison of the options and their weaknesses below.
In order to get a consistent, reliable syntax, I was wondering whether bash has any feature that can capture shell-function and executable output alike, while avoiding subshells for shell functions.
Ideally, a solution would also contain a more efficient alternative to executing multiple commands in a subshell, which allows more cleanly isolating concerns, e.g.
person=$(
    db_handler=$(database_connect)   # avoids leaking the variable
    query $db_handler lastname       # outside its required
    echo ", "                        # scope.
    query $db_handler firstname
    database_close $db_handler
)
Such a construct allows the reader of the code to ignore everything inside $(), if the details of how $person is formatted aren't interesting to them.
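Under the global-variable convention from [1], a subshell-free approximation of the same structure might look as follows (a sketch: database_connect, query, and database_close are the question's hypothetical functions, here assumed to return via $R; { ...; } groups commands without forking):
{
    database_connect;              db_handler=$R
    query "$db_handler" lastname;  person="$R, "
    query "$db_handler" firstname; person+=$R
    database_close "$db_handler"
    unset db_handler    # manual cleanup replaces the subshell's scoping
}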
Comparison of Options
1. With command substitution
person="$(get lastname), $(get firstname)"
Slow, but readable and consistent: It doesn't matter to the reader at first
glance whether get is a shell function or an executable.
2. With the same global variable for all functions
get lastname
person="$R, "
get firstname
person+="$R"
Obscures what $person is supposed to contain. Alternatively,
get lastname
local lastname="$R"
get firstname
local firstname="$R"
person="$lastname, $firstname"
but that's very verbose.
3. With a different global variable for each function
get_lastname
get_firstname
person="$lastname $firstname"
More readable assignment, but:
- if some function is invoked twice, we're back to (2);
- the side effect of setting the variable is not obvious;
- it is easy to use the wrong variable by accident.
4. With a global variable whose name is passed as an argument
get LN lastname
get FN firstname
person="$LN, $FN"
More readable, allows multiple return values easily.
Still inconsistent with capturing output from executables.
Note: Assignment to dynamic variable names should be done with declare rather than eval:
$VARNAME="$LOCALVALUE"                      # doesn't work.
declare -g "$VARNAME=$LOCALVALUE"           # will work.
eval "$VARNAME='$LOCALVALUE'"               # doesn't work for *arbitrary* values.
eval "$VARNAME=$(printf %q "$LOCALVALUE")"  # doesn't avoid a subshell after all.
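For what it's worth, bash's printf -v also assigns to a dynamically named variable without a subshell (note that inside a function it writes to an existing local of that name if one exists, otherwise to the global scope):
printf -v "$VARNAME" '%s' "$LOCALVALUE"   # no subshell, arbitrary values OK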
[1] http://rus.har.mn/blog/2010-07-05/subshells/
If you want it to be efficient, the shell functions can't return their results via stdout. If they did, there'd be no way to get a result except by running the function in a subshell and capturing the output via an internal pipe, and these operations are kind of expensive (a few ms on a modern system).
When I was focusing on shell scripts and needed to max their performance, I used a convention where a function foo would return its result via a variable foo. You can do this even in a POSIX shell, and it has the nice property that it won't overwrite your locals, because if foo is a function, you've already kind of reserved the name.
Then I had this bx_r getter function that runs a shell function and saves its output into a variable whose name is given by the first argument, or prints the output to stdout if the first argument is not a legal variable name (without a trailing newline if that argument is exactly the empty word, i.e., '').
I've modified it so it can be used uniformly with either commands or functions.
You can't use the type builtin to differentiate between the two here, because type returns its result via stdout => you'd need to capture that result, and that would impose the forking penalty again.
So what I do when I'm about to run function foo is I check if there's a corresponding variable foo (this can catch a local variable but you'll avoid the chances of this if you limit yourself to properly namespaced shell function names). If there is, I assume that's where function foo returns its result, otherwise I run it in a $(), capturing its stdout.
Here's the code with some testing code:
bx_varlike_eh()
{
    case $1 in
        ([!A-Za-z_0-9]*) false;;
        (*) true;;
    esac
}

bx_r() #{{{ Varname=$1; shift; invoke "$@" and save its result to $Varname if a legal varname, or print it
{
    # `bx_r '' some_command` prints without a newline
    # `bx_r - some_command` (or any non-variable-character-containing word
    # instead of -) prints with a newline
    local bx_r__varname="$1"; shift 1
    local bx_r
    if ! bx_varlike_eh "$1" || eval "[ \"\${$1+set}\" != set ]"; then
        # https://unix.stackexchange.com/a/465715/23692
        bx_r=$( "$@" ) || return  # $1 not varlike or unset => must be a regular command, so capture
    else
        # if $1 is a variable name, assume $1 is a function that saves its output there
        "$@" || return
        eval "bx_r=\$$1"          # put it in bx_r
    fi
    case "$bx_r__varname" in
        ('') printf '%s' "$bx_r";;
        ([!A-Za-z_0-9]*) printf '%s\n' "$bx_r";;
        (*) eval "$bx_r__varname=\$bx_r";;
    esac
} #}}}
#TEST
for sh in sh bash; do
    time $sh -c '
        . ./bx_r.sh
        bx_getnext=; bx_getnext() { bx_getnext=$((bx_getnext+1)); }
        bx_r - bx_getnext
        bx_r - bx_getnext
        i=0; while [ $i -lt 10000 ]; do
            bx_r ans bx_getnext
        i=$((i+1)); done; echo ans=$ans
    '
    echo ====
    $sh -c '
        . ./bx_r.sh
        bx_r - date
        bx_r - /bin/date
        bx_r ans /bin/date
        echo ans=$ans
    '
    echo ====
    time $sh -c '
        . ./bx_r.sh
        bx_echoget() { echo 42; }
        i=0; while [ $i -lt 10000 ]; do
            ans=$(bx_echoget)
        i=$((i+1)); done; echo ans=$ans
    '
done
exit
exit
#MY TEST OUTPUT
1
2
ans=10002
0.14user 0.00system 0:00.14elapsed 99%CPU (0avgtext+0avgdata 1644maxresident)k
0inputs+0outputs (0major+76minor)pagefaults 0swaps
====
Thu Sep 5 17:12:01 CEST 2019
Thu Sep 5 17:12:01 CEST 2019
ans=Thu Sep 5 17:12:01 CEST 2019
====
ans=42
1.95user 1.14system 0:02.81elapsed 110%CPU (0avgtext+0avgdata 1656maxresident)k
0inputs+1256outputs (0major+350075minor)pagefaults 0swaps
1
2
ans=10002
0.92user 0.03system 0:00.96elapsed 99%CPU (0avgtext+0avgdata 3284maxresident)k
0inputs+0outputs (0major+159minor)pagefaults 0swaps
====
Thu Sep 5 17:12:05 CEST 2019
Thu Sep 5 17:12:05 CEST 2019
ans=Thu Sep 5 17:12:05 CEST 2019
====
ans=42
5.20user 2.40system 0:06.96elapsed 109%CPU (0avgtext+0avgdata 3220maxresident)k
0inputs+1248outputs (0major+949297minor)pagefaults 0swaps
As you can see, you get a uniform call syntax with this, while speeding up the execution of small shell functions by up to about 14 times by eliminating the need for captures ($()).
Use a bash nameref.
With bash 4.3 or newer you can use variable namerefs:
get() {
    declare -n _get__res
    _get__res="$1"   # the first assignment to an unset nameref sets its target
    case "$2" in
        firstname) _get__res="Kamil";;
        lastname)  _get__res="Cuk";;
    esac
}
get LN lastname
get FN firstname
person="$LN, $FN"
Namerefs can still clash with variables from the outer scope. Use long names for the namerefs, like here, where I used an underscore, the function name, two underscores, and then the variable name.

Function behaviour on shell(ksh) script

Here are 2 different versions of a program:
Program:
#!/usr/bin/ksh
printmsg() {
    i=1
    print "hello function :)";
}
i=0;
echo I printed `printmsg`;
printmsg
echo $i
Output:
# ksh e
I printed hello function :)
hello function :)
1
and
Program:
#!/usr/bin/ksh
printmsg() {
    i=1
    print "hello function :)";
}
i=0;
echo I printed `printmsg`;
echo $i
Output:
# ksh e
I printed hello function :)
0
The only difference between the two programs is that printmsg is invoked twice in the first program but only once in the second.
My doubt arises here. To quote:
Be warned: Functions act almost just like external scripts... except
that by default, all variables are SHARED between the same ksh
process! If you change a variable name inside a function.... that
variable's value will still be changed after you have left the
function!!
But we can clearly see in the 2nd program's output that the value of i remains unchanged. Yet we are sure the function is called, since the echo statement receives the output of the function and prints it. So why do the two programs give different output?
When you use backticks (or $(...)), you execute the command in a subshell. That is, a new shell process is started (which inherits from the current one) and then exits.
Edit: I checked your link; if you read the very last section at the bottom of it, you'll see this explained.
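A minimal sketch that makes the difference visible (same printmsg as above, only the invocation changes):
#!/usr/bin/ksh
printmsg() { i=1; print "hello function :)"; }
i=0
msg=`printmsg`   # backticks: runs in a subshell, so its i=1 is lost
print $i         # prints 0
printmsg         # plain call: runs in the current shell, i is shared
print $i         # prints 1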

Increment a global variable in Bash

Here's a shell script:
globvar=0
function myfunc {
let globvar=globvar+1
echo "myfunc: $globvar"
}
myfunc
echo "something" | myfunc
echo "Global: $globvar"
When called, it prints out the following:
$ sh zzz.sh
myfunc: 1
myfunc: 2
Global: 1
$ bash zzz.sh
myfunc: 1
myfunc: 2
Global: 1
$ zsh zzz.sh
myfunc: 1
myfunc: 2
Global: 2
The question is: why does this happen, and which behavior is correct?
P.S. I have a strange feeling that the function behind the pipe is called in a forked shell... So, can there be a simple workaround?
P.P.S. This function is a simple test wrapper. It runs a test application and analyzes its output, then increments the $PASSED or $FAILED variables. Finally, you get the number of passed/failed tests in global variables. The usage is like:
test-util << EOF | myfunc
input for test #1
EOF
test-util << EOF | myfunc
input for test #2
EOF
echo "Passed: $PASSED, failed: $FAILED"
Korn shell gives the same results as zsh, by the way.
Please see BashFAQ/024. Pipes create subshells in Bash and variables are lost when subshells exit.
Based on your example, I would restructure it something like this:
globvar=0
function myfunc {
    echo $(($1 + 1))
}
globvar=$(myfunc "$globvar")
globvar=$(echo "something" | myfunc "$globvar")
Piping something into myfunc in sh or bash causes a new shell to spawn. You can confirm this by adding a long sleep in myfunc: while it's sleeping, call ps and you'll see a subprocess. When the function returns, that subshell exits without changing the value in the parent process.
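Or, skipping the sleep/ps dance, compare PIDs directly (a sketch; $BASHPID is bash-specific and, unlike $$, reflects the subshell):
myfunc() { echo "running in PID $BASHPID (script is $$)"; }
myfunc             # same PID as the script
echo x | myfunc    # a different PID: the pipeline's subshell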
If you really need that value to be changed, you'll need to return a value from the function and check $PIPESTATUS afterwards, I guess, like this (keeping in mind that a return status can only carry values 0-255):
globvar=0

function myfunc {
    let globvar=globvar+1
    echo "myfunc: $globvar"
    return $globvar
}

myfunc
echo "something" | myfunc
globvar=${PIPESTATUS[1]}
echo "Global: $globvar"
The problem is: which end of a pipeline using built-ins is executed by the original process?
In zsh, it looks like the last command in the pipeline is executed by the main shell script when the command is a function or built-in.
In Bash (and sh is likely to be a link to Bash if you're on Linux), then either both commands are run in a sub-shell or the first command is run by the main process and the others are run by sub-shells.
Clearly, when the function is run in a sub-shell, it does not affect the variable in the parent shell (only the global in the sub-shell).
Consider adding an extra test:
echo Something | { myfunc; echo $globvar; }   # both run in the pipeline's subshell
echo $globvar                                 # back in the parent: unchanged
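For completeness: bash 4.2 added the lastpipe option, which runs the final element of a pipeline in the current shell (it takes effect when job control is off, i.e. in normal scripts) and therefore solves the original problem directly -- a sketch:
#!/bin/bash
shopt -s lastpipe   # bash >= 4.2, effective when job control is off
globvar=0
myfunc() { let globvar=globvar+1; echo "myfunc: $globvar"; }
echo "something" | myfunc
echo "Global: $globvar"   # now prints: Global: 1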
