Slow script execution after mocking system "date" command - bash

Hellow fellow Stack users,
I am using a function "setup_date" to change date command for my custom execution. This is to mock it and test some bash scripts, their execution has to use always the same date in order to compare results.
So this approach worked very good with ssh or sftp command mocking. But this time, just after "date" command substitution, the script's execution becomes very slow! What is the reason for that? Is "date" command called frequently by linux system for internal uses?
Regards,
#
#replace a command with previousy defined mock one
#
mock_cmd() {
local command="${1:-}"
local override="${2:-}"
# Remove target function if one is already set
unset ${command}
# Create a wrapper function called "${command}"
eval "${command}() { ${override} \${#}; }"
}
#mock the date command
#1- date formatting
#There has to be a variable: dateFile!
date_mock_SP() {
date "${1}" -r ${dateFile}
}
#
#1- date ex: 201203101513
#2- dateFile path
setup_date() {
touch -t "${1}" ${2}/dateFile
export dateFile=${2}/dateFile
}
EXECUTION :
mock_cmd "date" "date_mock_SP"
setup_date "201203101513" ${pwd}/in
Date=$(date +"%y%j")
echo $Date
Date=$(date +"%y%j")
echo $Date
exit 1

mock_cmd is brittle and more complicated than you need. You are already defining the function date_mock_SP; just name it date, and the function will override the command. Inside the function, use command date to avoid infinite recursion.
date () { command date "$1" -r "$dateFile"; }
setup_date "201203101513" "$pwd/in" # uses the function, not the executable, date
unset -f date

Related

Consistent syntax for obtaining output of a command efficiently in bash?

Bash has the command substitution syntax $(f), which allows to capture
the STDOUT of a command f. If the command is an executable, this is fine
– the creation of a new process is necessary anyway. But if the command is
a shell-function, using this syntax creates an overhead of about 25ms for
each subshell on my system. This is enough to add up to noticable delays
when used in inner loops, especially in interactive contexts such as
command completions or $PS1.
A common optimization is to use global variables instead
[1] for returning values,
but it comes at a cost to readability: The intent becomes less clear, and
output capturing suddenly is inconsistent between shell functions and
executables. I am adding a comparison of options and their weaknesses below.
In order to get a consistent, reliable syntax, I was wondering if bash has
any feature that allows to capture shell-function and executable output
alike, while avoiding subshells for shell-functions.
Ideally, a solution would also contain a more efficient alternative to executing multiple commands in a subshell, which allows more cleanly isolating concerns, e.g.
person=$(
db_handler=$(database_connect) # avoids leaking the variable
query $db_handler lastname # outside it's required
echo ", " # scope.
query $db_handler firstname
database_close $db_handler
)
Such a construct allows the reader of the code to ignore everything inside $(), if the details of how $person is formatted aren't interesting to them.
Comparison of Options
1. With command substitution
person="$(get lastname), $(get firstname)"
Slow, but readable and consistent: It doesn't matter to the reader at first
glance whether get is a shell function or an executable.
2. With same global variable for all functions
get lastname
person="$R, "
get firstname
person+="$R"
Obscures what $person is supposed to contain. Alternatively,
get lastname
local lastname="$R"
get firstname
local firstname="$R"
person="$lastname, $firstname"
but that's very verbose.
3. With different global variable for each function
get_lastname
get_firstname
person="$lastname $firstname"
More readable assignment, but
If some function is invoked twice, we're back to (2).
The side-effect of setting the variable is not obvious.
It is easy to use the wrong variable by accident.
4. With global variable, whose name is passed as argument
get LN lastname
get FN firstname
person="$LN, $FN"
More readable, allows multiple return values easily.
Still inconsistent with capturing output from executables.
Note: Assignment to dynamic variable names should be done with declare
rather than eval:
$VARNAME="$LOCALVALUE" # doesn't work.
declare -g "$VARNAME=$LOCALVALUE" # will work.
eval "$VARNAME='$LOCALVALUE'" # doesn't work for *arbitrary* values.
eval "$VARNAME=$(printf %q "$LOCALVALUE")"
# doesn't avoid a subshell afterall.
[1] http://rus.har.mn/blog/2010-07-05/subshells/
If you want it to be efficient the shell functions can't return their result via stdout. If they did, there'd be no way to get it but by running the function in a subshell and capturing the output via an internal pipe, and these operations are kind of expensive (a few ms on a modern system).
When I was focusing on shell scripts and I needed to max their performance I used a convention where function foo would return its result via a variable foo. This you can do even in a POSIX shell and it has the nice property that it won't overwrite your locals because if foo is a function, you've already kind of reserved the name.
Then I had this bx_r getter function that runs a shell function and saves its output into either a variable whose name is given by the first argument or it outputs the output to stdout if the first argument is a word that's an illegal variable name (without a newline if the word is exactly an empty word, i.e., '').
I've modified it so it can be used uniformly with either commands or functions.
You can't use the type builtin to differentiate between the two here because
type returns its result via stdout => you'd need to capture that result and that would impose the forking penalty again.
So what I do when I'm about to run function foo is I check if there's a corresponding variable foo (this can catch a local variable but you'll avoid the chances of this if you limit yourself to properly namespaced shell function names). If there is, I assume that's where function foo returns its result, otherwise I run it in a $(), capturing its stdout.
Here's the code with some testing code:
bx_varlike_eh()
{
case $1 in
([!A-Za-z_0-9]*) false;;
(*) true;;
esac
}
bx_r() #{{{ Varname=$1; shift; Invoke $# and save it to $Varname if a legal varname or print it
{
# `bx_r '' some_command` prints without a newline
# `bx_r - some_command` (or any non-variable-character-containing word instead of -)
# prints with a newline
local bx_r__varname="$1"; shift 1
local bx_r
if ! bx_varlike_eh "$1" || eval "[ \"\${$1+set}\" != set ]"; then
#https://unix.stackexchange.com/a/465715/23692
bx_r=$( "$#" ) || return #$1 not varlike or unset => must be a regular command, so capture
else
#if $1 is a variable name, assume $1 is a function that saves its output there
"$#" || return
eval "bx_r=\$$1" #put it in bx_r
fi
case "$bx_r__varname" in
('') printf '%s' "$bx_r";;
([!A-Za-z_0-9]*) printf '%s\n' "$bx_r";;
(*) eval "$bx_r__varname=\$bx_r";;
esac
} #}}}
#TEST
for sh in sh bash; do
time $sh -c '
. ./bx_r.sh
bx_getnext=; bx_getnext() { bx_getnext=$((bx_getnext+1)); }
bx_r - bx_getnext
bx_r - bx_getnext
i=0; while [ $i -lt 10000 ]; do
bx_r ans bx_getnext
i=$((i+1)); done; echo ans=$ans
'
echo ====
$sh -c '
. ./bx_r.sh
bx_r - date
bx_r - /bin/date
bx_r ans /bin/date
echo ans=$ans
'
echo ====
time $sh -c '
. ./bx_r.sh
bx_echoget() { echo 42; }
i=0; while [ $i -lt 10000 ]; do
ans=$(bx_echoget)
i=$((i+1)); done; echo ans=$ans
'
done
exit
#MY TEST OUTPUT
1
2
ans=10002
0.14user 0.00system 0:00.14elapsed 99%CPU (0avgtext+0avgdata 1644maxresident)k
0inputs+0outputs (0major+76minor)pagefaults 0swaps
====
Thu Sep 5 17:12:01 CEST 2019
Thu Sep 5 17:12:01 CEST 2019
ans=Thu Sep 5 17:12:01 CEST 2019
====
ans=42
1.95user 1.14system 0:02.81elapsed 110%CPU (0avgtext+0avgdata 1656maxresident)k
0inputs+1256outputs (0major+350075minor)pagefaults 0swaps
1
2
ans=10002
0.92user 0.03system 0:00.96elapsed 99%CPU (0avgtext+0avgdata 3284maxresident)k
0inputs+0outputs (0major+159minor)pagefaults 0swaps
====
Thu Sep 5 17:12:05 CEST 2019
Thu Sep 5 17:12:05 CEST 2019
ans=Thu Sep 5 17:12:05 CEST 2019
====
ans=42
5.20user 2.40system 0:06.96elapsed 109%CPU (0avgtext+0avgdata 3220maxresident)k
0inputs+1248outputs (0major+949297minor)pagefaults 0swaps
As you can see, you can get uniform call syntax with this, while speeding up
the execution of small shell functions by up to about 14 times due to eliminating the need for captures ($()).
Use a bash nameref.
With bash v4 you can use variable namerefs:
get() {
declare -n _get__res
_get_res="$1"
case "$2" in
firstname) _get_res="Kamil"; ;;
lastname) _get_res="Cuk"; ;;
esac
}
get LN lastname
get FN firstname
person="$LN, $FN"
Namerefs can still clash with variables from outer scope. Use long names for the namerefs, like here I used underscore, function name, two underscores and then variable name.

Get bash function path from name

A hitchhiker, waned by the time a function is taking to complete, wishes to find where a function is located, so that he can observe the function for himself by editting the file location. He does not wish to print the function body to the shell, simply get the path of the script file containing the function. Our hitchhiker only knows the name of his function, which is answer_life.
Imagine he has a function within a file universal-questions.sh, defined like this, the path of which is not known to our hitchhiker:
function answer_life() {
sleep $(date --date='7500000 years' +%s)
echo "42"
}
Another script, called hitchhiker-helper-scripts.sh, is defined below. It has the function above source'd within it (the hitchhiker doesn't understand source either, I guess. Just play ball.):
source "/usr/bin/universal-questions.sh"
function find_life_answer_script() {
# Print the path of the script containing `answer_life`
somecommand "answer_life" # Should output the path of the script containing the function.
}
So this, my intrepid scripter, is where you come in. Can you replace the comment with code in find_life_answer_script that allows our hitchhiker to find where the function is located?
In bash operating in extended debug mode, declare -F will give you the function name, line number, and path (as sourced):
function find_life_answer_script() {
( shopt -s extdebug; declare -F answer_life )
}
Like:
$ find_life_answer_script
answer_life 3 ./universal-questions.sh
Running a sub-shell lets you set extdebug mode without affecting any prior settings.
Your hitchhiker can also try to find the answer this way:
script=$(readlink -f "$0")
sources=$(grep -oP 'source\s+\K[\w\/\.]+' $script)
for s in "${sources[#]}"
do
matches=$(grep 'function\s+answer_life' $s)
if [ -n "${matches[0]}" ]; then
echo "$s: Nothing is here ("
else
echo "$s: Congrats! Here is your answer!"
fi
done
This is for case if debug mode will be unavailable on some planet )

In Bash, it is okay for a variable and a function to have the same name?

I have the following code in my ~/.bashrc:
date=$(which date)
date() {
if [[ $1 == -R || $1 == --rfc-822 ]]; then
# Output RFC-822 compliant date string.
# e.g. Wed, 16 Dec 2009 15:18:11 +0100
$date | sed "s/[^ ][^ ]*$/$($date +%z)/"
else
$date "$#"
fi
}
This works fine, as far as I can tell. Is there a reason to avoid having a variable and a function with the same name?
It's alright apart from being confusing. Besides, they are not the same:
$ date=/bin/ls
$ type date
date is hashed (/bin/date)
$ type $date
/bin/ls is /bin/ls
$ moo=foo
$ type $moo
-bash: type: foo: not found
$ function date() { true; }
$ type date
date is a function
date ()
{
true*emphasized text*
}
$ which true
/bin/true
$ type true
true is a shell builtin
Whenever you type a command, bash looks in three different places to find that command. The priority is as follows:
shell builtins (help)
shell aliases (help alias)
shell functions (help function)
hashed binaries files from $PATH ('leftmost' folders scanned first)
Variables are prefixed with a dollar sign, which makes them different from all of the above. To compare to your example: $date and date are not the same thing. So It's not really possible to have the same name for a variable and a function because they have different "namespaces".
You may find this somewhat confusing, but many scripts define "method variables" at the top of the file. e.g.
SED=/bin/sed
AWK=/usr/bin/awk
GREP/usr/local/gnu/bin/grep
The common thing to do is type the variable names in capitals. This is useful for two purposes (apart from being less confusing):
There is no $PATH
Checking that all "dependencies" are runnable
You can't really check like this:
if [ "`which binary`" ]; then echo it\'s ok to continue.. ;fi
Because which will give you an error if binary has not yet been hashed (found in a path folder).
Since you always have to use $ to dereference a variable in Bash, you're free to use any name you like.
Beware of overriding a global, though.
See also:
http://tldp.org/LDP/Bash-Beginners-Guide/html/sect_03_02.html
An alternative to using a variable: use bash's command keyword (see the manual or run help command from a prompt):
date() {
case $1 in
-R|--rfc-2822) command date ... ;;
*) command date "$#" ;;
esac
}

Create variable from string/nameonly parameter to extract data in bash?

I want to save the variable name and its contents easily from my script.
Currently :-
LOGFILE=/root/log.txt
TEST=/file/path
echo "TEST : ${TEST}" >> ${LOGFILE}
Desired :-
LOGFILE=/root/log.txt
function save()
{
echo "$1 : $1" >> ${LOGFILE}
}
TEST=/file/path
save TEST
Obviously the above save function just saves TEST : TEST
Want I want it to save is TEST : /file/path
Can this be done? How? Many thanks in advance!
You want to use Variable Indirection. Also, don't use the function keyword, it is not POSIX and also not necessary as long as you have () at the end of your function name.
LOGFILE=/root/log.txt
save()
{
echo "$1 : ${!1}" >> ${LOGFILE}
}
TEST=/file/path
save TEST
Proof of Concept
$ TEST=foo; save(){ echo "$1 : ${!1}"; }; save TEST
TEST : foo
Yes, using indirect expansion:
echo "$1 : ${!1}"
Quoting from Bash reference manual:
The basic form of parameter expansion is ${parameter} [...] If the first character of parameter is an exclamation point (!), a level of variable indirection is introduced. Bash uses the value of the variable formed from the rest of parameter as the name of the variable; this variable is then expanded and that value is used in the rest of the substitution, rather than the value of parameter itself. This is known as indirect expansion
Consider using the printenv function. It does exactly what it says on the tin, prints your environment. It can also take parameters
$ printenv
SSH_AGENT_PID=2068
TERM=xterm
SHELL=/bin/bash
LANG=en_US.UTF-8
HISTCONTROL=ignoreboth
...etc
You could do printenv and then grep for any vars you know you have defined and be done in two lines, such as:
$printenv | grep "VARNAME1\|VARNAME2"
VARNAME1=foo
VARNAME2=bar

Simple timer to measure seconds an operation took to complete

I run my own script to dump databases into files on a nightly basis.
I wanted to count time (in seconds) it takes to dump each database, so I was trying to write some functions to help me achieve it, but I'm running into problems.
I am no expert in scripting in bash, so if I'm doing it plain wrong, just say so and ideally suggest alternative, please.
Here's the script:
#!/bin/bash
declare -i time_start
function get_timestamp {
declare -i time_curr=`date -j -f "%a %b %d %T %Z %Y" "\`date\`" "+%s"`
echo "get_timestamp:" $time_curr
return $time_curr
}
function timer_start {
get_timestamp
time_start=$?
echo "timer_start:" $time_start
}
function timer_stop {
get_timestamp
declare -i time_curr=$?
echo "timer_stop:" $time_curr
declare -i time_diff=$time_curr-$time_start
return $time_diff
}
timer_start
sleep 3
timer_stop
echo $?
The code should really be quite self-explanatory. echo commands are only for debugging.
I expect the output to be something like this:
$ bash timer.sh
get_timestamp: 1285945972
timer_start: 1285945972
get_timestamp: 1285945975
timer_stop: 1285945975
3
Now this is not the case unfortunately. What I get is:
$ bash timer.sh
get_timestamp: 1285945972
timer_start: 116
get_timestamp: 1285945975
timer_stop: 119
3
As you can see, the value that local var time_curr gets from the command is a valid timestamp, but returning this value causes it to be changed to an integer between 0 and 255.
Can someone please explain to me why this is happening?
PS. This obviously is just my timer test script without any other logic.
UPDATE
Just to be perfectly clear, I want this to be part of a bash script very similar to this one, where I want to measure each loop cycle.
Unless of course I can do it with time, then please suggest a solution.
You don't need to do all this. Just run time <yourscript> in the shell.
$? is used to hold the exit status of a command and can only hold a value between 0 and 255. If you pass an exit code outside this range (say, in a C program calling exit(-1)), the shell will still receive a value in that range and set $? accordingly.
As a workaround, you could just set a different value in your bash function:
function get_timestamp {
declare -i time_curr=`date -j -f "%a %b %d %T %Z %Y" "\`date\`" "+%s"`
echo "get_timestamp:" $time_curr
get_timestamp_return_value=$time_curr
}
function timer_start {
get_timestamp
#time_start=$?
time_start=$get_timestamp_return_value
echo "timer_start:" $time_start
}
...
I believe you should be able to use the existing "time" function.
After Update to the question:
This was the bit of script from your link which was doing a for loop.
# dump each database in turn
for db in $databases; do
echo $db
$MYSQLDUMP --force --opt --user=$USER --password=$PASSWORD
--databases $db > "$OUTPUTDIR/$db.bak"
done
You could extract the inner portion of the loop into a new script (call it dump_one_db.sh)
and do this inside the loop:
# dump each database in turn
for db in $databases; do
time dump_one_db.sh $db
done
Make sure to write the output of the time against the db name into some file.
This is happening because return codes need to be between 0-255. You can't return an arbitrary number. If you continue to refuse to use the builtin time function and roll your own, change your functions to echo their stamp and use a process expansion ($()) to grab the value.

Resources