Bash and Test-Driven Development - bash

When writing more than a trivial script in bash, I often wonder how to make the code testable.
It is typically hard to write tests for bash code, because bash is low on functions that take a value and return a value, and high on functions that check or set some aspect of the environment, modify the file system, invoke a program, and so on: functions that depend on the environment or have side effects. As a result, the setup and test code become much more complicated than the code they test.
For example, consider a simple function to test:
function add_to_file() {
    local f=$1
    cat >> $f
    sort -u $f -o $f
}
Test code for this function might consist of:
add_to_file.before:
foo
bar
baz
add_to_file.after:
bar
baz
foo
qux
And test code:
function test_add_to_file() {
    cp add_to_file.{before,tmp}
    add_to_file add_to_file.tmp
    cmp add_to_file.{tmp,after} && echo pass || echo fail
    rm add_to_file.tmp
}
Here 5 lines of code are tested by 6 lines of test code and 7 lines of data.
Now consider a slightly more complicated case:
function distribute() {
    local file=$1 ; shift
    local hosts=( "$@" )
    for host in "${hosts[@]}" ; do
        rsync -ae ssh $file $host:$file
    done
}
I can't even say how to start writing a test for that...
So, is there a good way to do TDD in bash scripts, or should I give up and put my efforts elsewhere?

So here is what I learned:
There are some testing frameworks written in bash and for bash,
however...
It is not so much that Bash is unsuitable for TDD (although some
other languages come to mind that are a better fit); rather, the
typical tasks that Bash is used for (installation, system
configuration) are hard to write tests for, and the test setup in
particular is hard.
The poor data-structure support in Bash makes it hard to separate
logic from side effects, and indeed there is typically little logic
in Bash scripts. That makes it hard to break scripts into
testable chunks. There are some functions that can be tested, but
that is the exception, not the rule.
Functions are a good thing (tm), but they can only go so far.
Nested functions can be even better, but they are also limited.
At the end of the day, with major effort some coverage can be
obtained, but it will test the less interesting part of the code,
and the bulk of the testing will remain good (or bad) old manual
testing.
Meta: I decided to answer (and accept) my own question, because I was unable to choose between Sinan Ünür's (voted up) and mouviciel's (voted up) answers, which were equally useful and insightful. I also want to note Stefano Borini's answer, which did not impress me initially but which I learned to appreciate over time. His "design patterns or best practices for shell scripts" answer (voted up), referenced in this thread, was also useful.

If you are writing code at the same time as the tests, try to make it high on functions that use nothing but their own parameters and don't modify the environment. That is, if your function might just as well run in a subshell, it will be easy to test: it takes some arguments and outputs something to stdout, or to a file, or maybe it does something on the system, but the caller does not feel the side effects.
Yes, you will end up with a big chain of functions passing down some WORKING_DIR variable that might as well be global, but this is a minor inconvenience compared to the task of tracking what each function reads and modifies. Enabling unit tests comes as a free bonus, too.
Try to minimize the cases where you need output. A little subshell abuse will go a long way toward keeping things nicely separated (at the expense of performance).
Instead of a linear structure, where functions are called, some environment is set, and then other functions are called, all pretty much on one level, try to go for a deep call tree with a minimum of data going back. Returning stuff in bash is inconvenient if you adopt a self-imposed abstinence from global vars...
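As a minimal sketch of that style (the function and test below are made up for illustration), a function that reads only its arguments and stdin and writes only to stdout can be checked with a one-line comparison:
join_sorted() {
    # Reads lines on stdin, prints them sorted and de-duplicated.
    # No globals, no files touched; it might as well run in a subshell.
    sort -u
}
test_join_sorted() {
    local got expected
    got=$(printf 'foo\nbar\nfoo\n' | join_sorted)
    expected=$(printf 'bar\nfoo\n')
    [[ "$got" == "$expected" ]] && echo pass || echo fail
}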

From an implementation point of view, I suggest shUnit2 or bats.
From a practical point of view, I suggest not to give up. I use TDD on bash scripts and I confirm that it is worth the effort.
Of course, I end up with about twice as many lines of test code as of production code, but with complex scripts the effort spent on testing is a good investment. This is true in particular when your client changes their mind near the end of the project and modifies some requirements. Having a regression test suite is a big help when changing complex bash code.
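For instance, a minimal bats test for the add_to_file function from the question could look like the sketch below (the file names are hypothetical, and it assumes the function lives in add_to_file.sh next to the test file):
#!/usr/bin/env bats
# add_to_file.bats -- hypothetical test file
setup() {
    # Load the function under test; BATS_TEST_DIRNAME is provided by bats.
    source "${BATS_TEST_DIRNAME}/add_to_file.sh"
}
@test "add_to_file appends stdin and keeps the file sorted and unique" {
    tmp="$(mktemp)"
    printf 'foo\nbar\nbaz\n' > "$tmp"
    echo qux | add_to_file "$tmp"
    run cat "$tmp"
    [ "$output" = "$(printf 'bar\nbaz\nfoo\nqux')" ]
    rm -f "$tmp"
}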

If you code a bash program large enough to require TDD, you are using the wrong language.
I suggest you read my previous post on best practices in bash programming; you will probably find something useful to make your bash program testable, but my statement above stands.
Design patterns or best practices for shell scripts

Writing what Meszaros calls consumer tests is hard in any language. Another approach is to verify the behavior of commands such as rsync manually, then write unit tests to prove specific functionality without hitting the network. In this slightly modified example, $run is used to print the side effects if the script is run with the keyword "test":
function distribute {
    local file=$1 ; shift
    for host in "$@" ; do
        $run rsync -ae ssh $file $host:$file
    done
}

if [[ $1 == "test" ]]; then
    run="echo"
else
    distribute schedule.txt $*
    exit 0
fi

#
# Built-in self-tests
#
output=$(mktemp)
expected=$(mktemp)
set -e
trap "rm $output $expected" EXIT
distribute schedule.txt login1 login2 > $output
cat << EOF > $expected
rsync -ae ssh schedule.txt login1:schedule.txt
rsync -ae ssh schedule.txt login2:schedule.txt
EOF
diff $output $expected
echo -n '.'
echo; echo "PASS"

You might want to take a look at cucumber/aruba. It did quite a nice job for me.
Additionally, you can stub just about everything you want by doing something like this:
#
# code.sh
#
some_function_calling_some_external_binary()
{
    if ! external_binary action_1; then
        : # ...
    fi
    if ! external_binary action_2; then
        : # ...
    fi
}

#
# test.sh
#
# now for the test, simply stub your external binary:
external_binary()
{
    if [ "$1" = "action_1" ]; then
        : # stub action_1
    elif [ "$1" = "action_2" ]; then
        : # stub action_2
    else
        command external_binary "$@"   # fall through to the real binary
    fi
}
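To tie the two snippets together (using the same hypothetical file names as above), a test driver can source the production code first and then the stub; since shell functions take precedence over commands found on $PATH, the stubbed external_binary is what the code under test ends up calling:
#
# run_test.sh -- hypothetical driver for the snippets above
#
source ./code.sh    # defines some_function_calling_some_external_binary
source ./test.sh    # redefines external_binary as a stub
if some_function_calling_some_external_binary; then
    echo "PASS"
else
    echo "FAIL"
fi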

The Advanced Bash-Scripting Guide has an example of an assert function, but here is a simpler and more flexible one: just eval $* to test any condition.
assert() {
    if ! eval $* ; then
        echo
        echo "===== Assertion failed: \"$*\" ====="
        echo "File \"$0\", line:$LINENO line:${BASH_LINENO[*]}"
        echo line:$(caller 0)
        exit 99
    fi
}
# e.g. USAGE:
assert [[ $r == 42 ]]
assert "((r==42))"
BASH_LINENO and the caller builtin are specific to the bash shell.

Take a look at the Outthentic framework. It is designed to create scenarios that run any Bash code and then analyze the stdout using a formal DSL; it's pretty easy to build a TDD/black-box test suite on top of this tool.

Related

Conditional to move forward depending on what shell is used

I'm trying to write a .functions dotfile, with the purpose of loading it (source $HOME/.functions) in my bash, zsh and fish configuration files. I already did this successfully with another one (.aliases). However, now I am facing a problem derived from fish not being POSIX-compliant.
The thing is that aliases share syntax among the three shells, but when it comes to functions, fish has its own syntax (function my_func; #code; end instead of function my_func { #code; }). As an example, consider:
Fish:
function say_hello
echo "hello";
end
Bash/Zsh:
say_hello() {
echo "hello";
}
This prevents me from just writing them in the file as-is, so I was thinking of writing a conditional such as if [ "$0" = "bash" ] || [ "$0" = "zsh" ]; then #functions_POSIX; else #functions_fish; fi. However, this conditional syntax is also not available in fish!
That's where I'm stuck right now. I would rather not have separate files for each shell.
Thank you in advance.
The only workable answer, in my opinion, is to separate the definitions.
Even if you figure out some way to hack around the fact that fish checks the syntax for the entire file (so wherever you put a bash function definition it will give a syntax error without executing anything), this won't yield a readable file that's nice to edit. You'll just be fighting the hacks.
And function definitions can't be shared anyway, as it's not just a simple search-and-replace of fi to end - the semantics are different, e.g. a command substitution will only be split on newlines, the various special variables ($#) work in different ways, arrays are entirely different, etc...
That means making it a single file isn't workable or helpful. So make your functions scripts instead (if they don't modify the shell's environment), or make a wrapper around a script that does the environment changing, or just write them twice.
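As a minimal sketch of the script approach (the path and the function body are just examples), a plain executable on PATH works identically from bash, zsh and fish, as long as it doesn't need to change the calling shell's environment:
#!/usr/bin/env bash
# ~/bin/say_hello -- an ordinary script instead of a per-shell function
echo "hello"
Make it executable (chmod +x ~/bin/say_hello) and make sure ~/bin is on PATH in each shell's config; all three shells can then call say_hello without any shell-specific function syntax.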

How to Use `bats-mock` to Assert Against Calls to a Mocked Script in Bash Testing

I'm trying to use bats to test some critical shell scripts in a project I'm working on. I'd like to be able to mock scripts out in order to assert that a script calls another script with the correct arguments in a given situation. The bats-mock library seems like it should do the trick, but it hasn't been documented at all.
I've tried looking at the bats-mock code and several test helper scripts other people have created (like this one), but unfortunately I'm not comfortable enough with bash to be able to deduce how to correctly use the bats-mock library.
How can I use the bats-mock library to mock out a script and assert against calls to the mock?
Brief suggestion:
There is a newer, more actively developed bats-mock that uses a slightly different approach and could be worth exploring: https://github.com/grayhemp/bats-mock
I'll be back later with.....MORE.
Back with more:
The main difference between them is which 'test double' style they implement. A brief explanation of the styles comes from Martin Fowler, quoting a book that covers many testing strategies, in his article mocksArentStubs (Mocks Aren't Stubs):
Meszaros uses the term Test Double as the generic term for any kind of
pretend object used in place of a real object for testing purposes.
The name comes from the notion of a Stunt Double in movies. (One of
his aims was to avoid using any name that was already widely used.)
Meszaros then defined four particular kinds of double:
Dummy objects are passed around but never actually used. Usually they are just used to fill parameter lists.
Fake objects actually have working implementations, but usually take some shortcut which makes them not suitable for production (an in
memory database is a good example).
Stubs provide canned answers to calls made during the test, usually not responding at all to anything outside what's programmed in for the
test.
Spies are stubs that also record some information based on how they were called. One form of this might be an email service that records
how many messages it was sent.
Mocks are what we are talking about here: objects pre-programmed with expectations which form a specification of the calls they are
expected to receive.
Of these kinds of doubles, only mocks insist upon behavior
verification. The other doubles can, and usually do, use state
verification. Mocks actually do behave like other doubles during the
exercise phase, as they need to make the SUT believe it's talking with
its real collaborators - but mocks differ in the setup and the
verification phases.
JasonKarns's version appears to be primarily designed around stubs: it returns dummy data for N calls to a script or binary, keeps an internal count of the calls to the stub, and returns an error code if the calls don't match the N lines of fake data.
Grayhemp's version allows you to create a spy object, along with the output it should produce, its return code, and any "side effects" that should be triggered when the mock is run (like writing a PID in the example below). You can then run a script or command that calls the command the mock is shadowing, and see how many times it was called, what the return code of the script was, and what environment was present when the mocked command was called. Overall it seems to make it easier to assert what a script/binary was called with and how many times it was called.
BTW, if you too wasted hours wondering what in the $)*##$ the code for get_timestamp would look like in the jasonkarns example, or what ${_DATE_ARGS} is, here's my best guess, based on the examples in this other answer for getting time since the epoch in milliseconds, https://serverfault.com/a/588705/266525:
You can copy and paste this into a Bash/POSIX shell to see that the first output matches what the first stub data line would be giving to get_timestamp, and the second output matches the output that the first assert in the bats-mock example expects.
get_timestamp () {
    # This should really be named "get timestamp in milliseconds".
    # In truth it wouldn't accept input, i.e. the ${1} below,
    # but it is easier to show and test how it works with a fixed date (which is why we want to stub!)
    GIVEN_DATE="$1"
    # Pass in a human-readable date and get back the epoch `%s` (seconds since 1-1-1970) and %N nanoseconds
    # date +%s.%N -d'Mon Apr 18 03:19:58.184561556 CDT 2016'
    EPOCH_NANO=$(date +%s.%N -d"$GIVEN_DATE")
    echo "This reflects the data the date stub would return: $EPOCH_NANO"
    # Accepts input in seconds.nanoseconds, i.e. %s.%N, and
    # sets the output format to milliseconds,
    # by combining the epoch `%s` (seconds since 1-1-1970) and the
    # first 3 digits of the nanoseconds with %3N
    _DATE_ARGS='+%s%3N -d'
    echo $(date ${_DATE_ARGS}"@${EPOCH_NANO}")
}
get_timestamp 'Mon Apr 18 03:19:58.184561556 CDT 2016' # The quotes make it a *single* argument $1 to the function
Here is the example from the jasonkarns/bats-mock docs. Note that what appears to the left of the : is the set of incoming arguments required for the stub to match; if you called date with different arguments it might pass through and hit the real thing, but I haven't tested this, since I already spent WAY too much time figuring out the original function in order to give a better comparison to the other bats-mock implementation.
# In bats you can declare globals outside your tests if you want them to apply
# to all tests in a file, or in a `fixture` or `vars` file and `load` or `source` it
declare -g _DATE_ARGS='+%s.%N -d'

# The interesting thing about the order of the mocked call returns is that they are actually moving backwards in time,
# very interesting behavior, and possibly it needs another test that throws a really big exception if this is encountered in the real world
# Original example below
@test "get_timestamp" {
  stub date \
      "${_DATE_ARGS} : echo 1460967598.184561556" \
      "${_DATE_ARGS} : echo 1460967598.084561556" \
      "${_DATE_ARGS} : echo 1460967598.004561556" \
      "${_DATE_ARGS} : echo 1460967598.000561556" \
      "${_DATE_ARGS} : echo 1460967598.000061556"

  run get_timestamp
  assert_success
  assert_output 1460967598184

  run get_timestamp
  assert_success
  assert_output 1460967598084

  run get_timestamp
  assert_success
  assert_output 1460967598004

  run get_timestamp
  assert_success
  assert_output 1460967598000

  run get_timestamp
  assert_success
  assert_output 1460967598000

  unstub date
}
Here is the example from grayhemp/bats-mock's README; note the cool mock_set_* and mock_get_* helpers.
#test "postgres.sh starts Postgres" {
mock="$(mock_create)"
mock_set_side_effect "${mock}" "echo $$ > /tmp/postgres_started"
# Assuming postgres.sh expects the `_POSTGRES` variable to define a
# path to the `postgres` executable
_POSTGRES="${mock}" run postgres.sh
[[ "${status}" -eq 0 ]]
[[ "$(mock_get_call_num ${mock})" -eq 1 ]]
[[ "$(mock_get_call_user ${mock})" = 'postgres' ]]
[[ "$(mock_get_call_args ${mock})" =~ -D\ /var/lib/postgresql ]]
[[ "$(mock_get_call_env ${mock} PGPORT)" -eq 5432 ]]
[[ "$(cat /tmp/postgres_started)" -eq "$$" ]]
}
To get very similar behavior to the jasonkarns version, you need to inject the stub (i.e. a symlink to ${mock}) into the PATH yourself before calling the function. If you do this in the setup() method it happens for every test, which might not be what you want, and you'll also want to make sure you remove the symlink in teardown(). Otherwise you can do the stubbing within the test and clean up right at the end of it (similar to the stub/unstub of the jasonkarns version). If you do this often, you'll want to turn it into a test helper (basically reimplementing jasonkarns/bats-mock's stub inside grayhemp/bats-mock) and keep the helper with your tests so you can load or source it and reuse the functions in many tests. Or you could submit a PR to grayhemp/bats-mock to include the stubbing functionality (the race for DigitalOcean Hacktoberfest fame and infamy is on, and don't forget there is swag involved too!).
#test "get_timestamp" {
mocked_command="date"
mock="$(mock_create)"
mock_path="${mock%/*}" # Parameter expansion to get the folder portion of the temp mock's path
mock_file="${mock##*/}" # Parameter expansion to get the filename portion of the temp mock's path
ln -sf "${mock_path}/${mock_file}" "${mock_path}/${mocked_command}"
PATH="${mock_path}:$PATH" # Putting the stub at the beginning of the PATH so it gets picked up first
mock_set_output "${mock}" "1460967598.184561556" 1
mock_set_output "${mock}" "1460967598.084561556" 2
mock_set_output "${mock}" "1460967598.004561556" 3
mock_set_output "${mock}" "1460967598.000561556" 4
mock_set_output "${mock}" "1460967598.000061556" 5
mock_set_status "${mock}" 1 6
run get_timestamp
[[ "${status}" -eq 0 ]]
run get_timestamp
run get_timestamp
run get_timestamp
run get_timestamp
[[ "${status}" -eq 0 ]]
# Status is just of the previous invocation of `run`, so you can test every time or just once
# note that calling the mock more times than you set the output for does NOT change the exit status...
# unless you override it with `mock_set_status "${mock}" 1 6`
# Last bits are the exit code/status and index of call to return the status for
# This is a test to assert that mocked_command stub is in the path and points the right place
[[ "$(readlink -e $(which date))" == "$(readlink -e ${mock})" ]]
# This is a direct call to the stubbed command to show that it returns the `mock_set_status` defined code and shows up in the call_num
run ${mocked_command}
[[ "$status" -eq 1 ]]
[[ "$(mock_get_call_num ${mock})" -eq 6 ]]
# Check if your function exported something to the environment, the example get_timestamp function above does NOT
# [[ "$(mock_get_call_env ${mock} _DATE_ARGS 1)" -eq '~%s%3N' ]]
# Use the below line if you actually want to see all the arguments the function used to call the `date` 'stub'
# echo "# call_args: " $(mock_get_call_args ${mock} 1) >&3
# The actual args don't have the \ but the regex operator =~ treats + specially if it isn't escaped
date_args="\+%s%3N"
[[ "$(mock_get_call_args ${mock} 1)" =~ $date_args ]]
# Cleanup our stub and fixup the PATH
unlink "${mock_path}/${mocked_command}"
PATH="${PATH/${mock_path}:/}"
}
If anybody needs more clarification or wants to have a working repository let me know and I can push my code up.

How to find or make a Bash utility script library? [closed]

Is there any commonly used (or unjustly uncommonly used) utility "library" of bash functions? Something like Apache commons-lang for Java. Bash is so ubiquitous that it seems oddly neglected in the area of extension libraries.
If not, how would I make one?
Libraries for bash are out there, but not common. One of the reasons that bash libraries are scarce is the limitations of its functions. I believe these limitations are best explained on Greg's Bash Wiki:
Functions. Bash's "functions" have several issues:
Code reusability: Bash functions don't return anything; they only produce output streams. Every reasonable method of capturing that stream and either assigning it to a variable or passing it as an argument requires a SubShell, which breaks all assignments to outer scopes. (See also BashFAQ/084 for tricks to retrieve results from a function.) Thus, libraries of reusable functions are not feasible, as you can't ask a function to store its results in a variable whose name is passed as an argument (except by performing eval backflips).
Scope: Bash has a simple system of local scope which roughly resembles "dynamic scope" (e.g. Javascript, elisp). Functions see the locals of their callers (like Python's "nonlocal" keyword), but can't access a caller's positional parameters (except through BASH_ARGV if extdebug is enabled). Reusable functions can't be guaranteed free of namespace collisions unless you resort to weird naming rules to make conflicts sufficiently unlikely. This is particularly a problem if implementing functions that expect to be acting upon variable names from frame n-3 which may have been overwritten by your reusable function at n-2. Ksh93 can use the more common lexical scope rules by declaring functions with the "function name { ... }" syntax (Bash can't, but supports this syntax anyway).
Closures: In Bash, functions themselves are always global (have "file scope"), so no closures. Function definitions may be nested, but these are not closures, though they look very much the same. Functions are not "passable" (first-class), and there are no anonymous functions (lambdas). In fact, nothing is "passable", especially not arrays. Bash uses strictly call-by-value semantics (magic alias hack excepted).
There are many more complications involving: subshells; exported functions; "function collapsing" (functions that define or redefine other functions or themselves); traps (and their inheritance); and the way functions interact with stdio. Don't bite the newbie for not understanding all this. Shell functions are totally f***ed.
Source: http://mywiki.wooledge.org/BashWeaknesses
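As an illustration of the "store the result in a variable whose name is passed as an argument" workaround that the quote alludes to (the function and variable names below are made up), bash 4.3+ offers declare -n namerefs, which avoid both the subshell and the eval backflips:
to_upper() {
    # $1 names the caller's variable to write the result into.
    declare -n _out=$1   # nameref (bash 4.3+); _out now aliases the caller's variable
    _out=${2^^}          # ${var^^} uppercases the second argument
}
to_upper result "hello"
echo "$result"           # prints HELLO
The usual caveat applies: if the caller's variable happens to be named _out, the nameref collides with the local name, which is why reusable library functions tend to pick obscure internal names.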
One example of a shell "library" is /etc/rc.d/functions on Red Hat-based systems. This file contains functions commonly used in SysV init scripts.
I see some good info and bad info here. Let me share what I know since bash is the primary language I use at work (and we build libraries..).
Google has a decent write up on bash scripts in general that I thought was a good read: https://google.github.io/styleguide/shell.xml.
Let me start by saying you should not think of a bash library as you do libraries in other languages.
There are certain practices that must be enforced to keep a library in bash simple, organized, and most importantly, reusable.
There is no concept of returning anything from a bash function except for strings that it prints and the function's exit status (0-255).
There are expected limitations here and a learning curve especially if you're accustomed to functions of higher-level languages.
It can be weird at first, and if you find yourself in a situation where strings just aren't cutting it, you'll want to leverage an external tool such as jq.
If jq (or something like it) is available, you can start having your functions print formatted output to be parsed & utilized as you would an object, array, etc.
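A small sketch of that pattern (it assumes jq is installed; the function and field names are made up): have the function print JSON on stdout and let the caller pull out the fields it needs.
host_info() {
    # Prints a JSON object describing a host; callers parse it with jq.
    printf '{"hostname":"%s","port":%d}\n' "db01.example.com" 5432
}
port=$(host_info | jq -r '.port')
echo "connecting on port $port"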
Function Declarations
There are two ways to declare a function in bash.
One operates within your current shell; we'll call it Fx0.
The other spawns a subshell to operate in; we'll call it Fx1.
Here are examples of how they're declared:
Fx0(){ echo "Hello from $FUNCNAME"; }
Fx1()( echo "Hello from $FUNCNAME" )
These two functions appear to perform the same operation, and indeed they do.
However, there is a key difference:
Fx1 cannot perform any action that alters the current shell.
That means modifying variables, changing shell options, and declaring other functions.
The latter is what can be exploited to prevent namespacing issues that can easily creep up on you.
# Fx1 cannot change the variable from a subshell
Fx0(){ Fx=0; }
Fx1()( Fx=1 )
Fx=foo; Fx0; echo $Fx
# 0
Fx=foo; Fx1; echo $Fx
# foo
That being said, the only time you should use an "Fx0" kind of function is when you want to redeclare something in the current shell.
Otherwise, always use "Fx1" functions, because they are safer and you don't have to worry about the naming of any functions declared within them.
As you can see below, the innocent function is overwritten inside of Fx1; however, it remains unscathed after the execution of Fx1.
innocent_function()(
    echo ":)"
)

Fx1()(
    innocent_function()( true )
    innocent_function
)

Fx1               # prints nothing, just returns true
innocent_function
# :)
This would have (likely) unintended consequences if you had used curly braces.
Examples of useful "Fx0" type functions would be specifically for changing the current shell, like so:
use_strict(){
    set -eEu -o pipefail
}

enable_debug(){
    set -Tx
}

disable_debug(){
    set +Tx
}
Regarding Declarations
The use of global variables, or at least those expected to have a value, is bad practice all the way around.
As you're building a library in bash, you don't ever want a function to rely on an external variable already being set.
Anything the function needs should be supplied to it via the positional parameters.
This is the main problem I see in libraries other folks try to build in bash.
Even if I find something cool, I can't use it because I don't know the names of the variables I need to have set ahead of time.
It leads to digging through all of the code and ultimately just picking out the useful pieces for myself.
By far, the best functions to create for a library are extremely small and don't utilize named variables at all, even locally.
Take the following for example:
serviceClient()(
    showUsage()(
        echo "This should be a help page"
    ) >&2

    isValidArg()(
        test "$(type -t "$1")" = "function"
    )

    isRunning()(
        nc -zw1 "$(getHostname)" "$(getPortNumber)"
    ) &>/dev/null

    getHostname()(
        echo localhost
    )

    getPortNumber()(
        echo 80
    )

    getStatus()(
        if isRunning
        then echo OK
        else echo DOWN
        fi
    )

    getErrorCount()(
        grep -c "ERROR" /var/log/apache2/error.log
    )

    printDetails()(
        echo "Service status: $(getStatus)"
        echo "Errors logged: $(getErrorCount)"
    )

    if isValidArg "$1"
    then "$1"
    else showUsage
    fi
)
Typically, what you would see near the top is local hostname=localhost and local port_number=80 which is fine, but it is not necessary.
It is my opinion that these things should be functional-ized as you're building to prevent future pain when all of a sudden some logic needs to be introduced for getting a value, like: if isHttps; then echo 443; else echo 80; fi.
You don't want that kind of logic placed in your main function or else you'll quickly make it ugly and unmanageable.
Now, serviceClient has internal functions that get declared upon invocation which adds an unnoticeable amount of overhead to each run.
The benefit is now you can have service2Client with functions (or external functions) that are named the same as what serviceClient has with absolutely no conflicts.
Another important thing to keep in mind is that redirections can be applied to an entire function when declaring it; see isRunning or showUsage above.
This gets as close to object orientation as I think you should bother with in bash.
. serviceClient.sh
serviceClient
# This should be a help page
if serviceClient isRunning
then serviceClient printDetails
fi
# Service status: OK
# Errors logged: 0
I hope this helps my fellow bash hackers out there.
Here's a list of "worthy of your time" bash libraries that I found after spending an hour or so googling.
https://github.com/mietek/bashmenot/
bashmenot is a library that is used by Halcyon and Haskell on Heroku. The above link points to a complete list of available functions with examples -- impressive quality, quantity and documentation.
http://marcomaggi.github.io/docs/mbfl.html
MBFL offers a set of modules implementing common operations and a script template. Pretty mature project and still active on github
https://github.com/javier-lopez/learn/blob/master/sh/lib
You need to look at the code for a brief description and examples. It has a few years of development in its back.
https://github.com/martinburger/bash-common-helpers
This has the fewest and most basic functions. For documentation you also have to look at the code.
Variables declared inside a function but without the local keyword are global.
It's good practice to declare variables only needed inside a function with local to avoid conflicts with other functions and globally (see foo() below).
Bash function libraries always need to be 'sourced'. I prefer using the 'source' synonym instead of the more common dot (.) so I can spot it more easily during debugging.
The following technique works in at least bash 3.00.16 and 4.1.5...
#!/bin/bash
#
# TECHNIQUES
#
source ./TECHNIQUES.source
echo
echo "Send user prompts inside a function to stderr..."
foo() {
    echo " Function foo()..." >&2           # send user prompts to stderr
    echo " Echoing 'this is my data'..." >&2 # send user prompts to stderr
    echo "this is my data"                   # this will not be displayed yet
}
#
fnRESULT=$(foo) # prints: Function foo()...
echo " foo() returned '$fnRESULT'" # prints: foo() returned 'this is my data'
echo
echo "Passing global and local variables..."
#
GLOBALVAR="Reusing result of foo() which is '$fnRESULT'"
echo " Outside function: GLOBALVAR=$GLOBALVAR"
#
function fn()
{
    local LOCALVAR="declared inside fn() with 'local' keyword is only visible in fn()"
    GLOBALinFN="declared inside fn() without 'local' keyword is visible globally"
    echo
    echo " Inside function fn()..."
    echo " GLOBALVAR=$GLOBALVAR"
    echo " LOCALVAR=$LOCALVAR"
    echo " GLOBALinFN=$GLOBALinFN"
}
# call fn()...
fn
# call fnX()...
fnX
echo
echo " Outside function..."
echo " GLOBALVAR=$GLOBALVAR"
echo
echo " LOCALVAR=$LOCALVAR"
echo " GLOBALinFN=$GLOBALinFN"
echo
echo " LOCALVARx=$LOCALVARx"
echo " GLOBALinFNx=$GLOBALinFNx"
echo
The sourced function library is represented by...
#!/bin/bash
#
# TECHNIQUES.source
#
function fnX()
{
    local LOCALVARx="declared inside fnX() with 'local' keyword is only visible in fnX()"
    GLOBALinFNx="declared inside fnX() without 'local' keyword is visible globally"
    echo
    echo " Inside function fnX()..."
    echo " GLOBALVAR=$GLOBALVAR"
    echo " LOCALVARx=$LOCALVARx"
    echo " GLOBALinFNx=$GLOBALinFNx"
}
Running TECHNIQUES produces the following output...
Send user prompts inside a function to stderr...
Function foo()...
Echoing 'this is my data'...
foo() returned 'this is my data'
Passing global and local variables...
Outside function: GLOBALVAR=Reusing result of foo() which is 'this is my data'
Inside function fn()...
GLOBALVAR=Reusing result of foo() which is 'this is my data'
LOCALVAR=declared inside fn() with 'local' keyword is only visible in fn()
GLOBALinFN=declared inside fn() without 'local' keyword is visible globally
Inside function fnX()...
GLOBALVAR=Reusing result of foo() which is 'this is my data'
LOCALVARx=declared inside fnX() with 'local' keyword is only visible in fnX()
GLOBALinFNx=declared inside fnX() without 'local' keyword is visible globally
Outside function...
GLOBALVAR=Reusing result of foo() which is 'this is my data'
LOCALVAR=
GLOBALinFN=declared inside fn() without 'local' keyword is visible globally
LOCALVARx=
GLOBALinFNx=declared inside fnX() without 'local' keyword is visible globally
I found a good but old article here that gave a comprehensive list of utility libraries:
http://dberkholz.com/2011/04/07/bash-shell-scripting-libraries/
I can tell you that the lack of available function libraries has nothing to do with Bash's limitations, but rather with how Bash is used. Bash is a quick-and-dirty language made for automation, not development, so the need for a library is rare. There is also a fine line between a function that needs to be shared and converting that function into a full-fledged script to be called. That is from a coding perspective; what gets loaded into a shell is another matter, but that normally comes down to personal taste, not need. So... again, a lack of shared libraries.
Here are a few functions I use regularly
In my .bashrc
cd () {
    local pwd="${PWD}/";  # we need a slash at the end so we can check for it, too
    if [[ "$1" == "-e" ]]; then
        shift
        # start from the end
        [[ "$2" ]] && builtin cd "${pwd%/$1/*}/${2:-$1}/${pwd##*/$1/}" || builtin cd "$@"
    else
        # start from the beginning
        if [[ "$2" ]]; then
            builtin cd "${pwd/$1/$2}"
            pwd
        else
            builtin cd "$@"
        fi
    fi
}
And a version of a log()/err() exists in a function library at work for coders-- mainly so we all use the same style.
log() {
    echo -e "$(date +%m.%d_%H:%M) $@" | tee -a $OUTPUT_LOG
}
err() {
    echo -e "$(date +%m.%d_%H:%M) $@" | tee -a $OUTPUT_LOG
}
As you can see, the utilities we use here are not that exciting to share. I have another library with tricks to work around bash limitations, which I think is the best use for such libraries, and I recommend creating your own.

How to parametrize verbosity of debug output (BASH)?

While writing a script, I will use a command's output in varying ways and to different degrees in order to troubleshoot the task at hand. For example, in this snippet, which reads an application's icon resource and reports whether or not it has the typical .icns extension...
icns=`defaults read /$application/Contents/Info CFBundleIconFile`
if ! [[ $icns =~ ^(.*)(.icns)$ ]]; then
    echo -e $icns "is NOT OK YOU IDIOT! **** You need to add .icns to "$icns"."
else
    echo -e $icns "\t Homey, it's cool. That shits got its .icns, proper."
fi
Inevitably, as each bug is squashed and the stdout starts relating more to the actual function than to the debugging process, this feedback is usually either commented out, silenced, or deleted, for obvious reasons.
However, if one wanted to provide a simple option, either hardcoded or passed as a parameter, to optionally show some, all, or none of "this kind" of message at runtime, what is the best way to provide that functionality? I am looking to basically duplicate the functionality of set -x, but instead of a line-by-line rundown, it would only print the notifications that I had specifically architected.
It seems excessive to replace each and every echo with an if that checks a debug=1|0 flag, yet I've been unable to find a concise explanation of how to implement a getopts/getopt scheme (I can never remember which one is the built-in) in my own scripts. This little expression seemed promising, but there is very little documentation on 2>$1 out there (although I'm sure it is key to this puzzle):
[ $DBG ] && DEBUG="" || DEBUG='</dev/null'

check_errs() {
    # Parameter 1 is the return code; parameter 2 is text to display on failure.
    if [ "${1}" -ne "0" ]; then
        echo "ERROR # ${1} : ${2}"
    else
        echo "SUCCESS"
    fi
}
Any concise and reusable tricks of this trade would be welcome, and if I'm totally missing the boat, or if the answer is a snake that would have bitten me by now, I apologize.
One easy trick is to simply replace your "logging" echo command with a variable, i.e.
TRACE=:
if test "$1" = "-v"; then
    TRACE=echo
    shift
fi

$TRACE "You passed the -v option"
You can have any number of these for different types of messages if you wish.
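Since the question asks about getopts specifically, here is a rough sketch of the same idea driven by a -v flag (the option letter, the variable names, and the debug() helper are arbitrary choices, not a standard API):
#!/usr/bin/env bash
# Parse -v (which may be repeated) to raise the verbosity level.
verbosity=0
while getopts "v" opt; do
    case "$opt" in
        v) ((verbosity++)) ;;
        *) echo "usage: $0 [-v]..." >&2; exit 2 ;;
    esac
done
shift $((OPTIND - 1))

# Print a message only if the current verbosity is at least the given level.
debug() {
    local level=$1; shift
    (( verbosity >= level )) && echo "DEBUG: $*" >&2
}

debug 1 "shown with -v or more"
debug 2 "shown only with -vv"
Sending the debug output to stderr keeps it separate from whatever the script writes to stdout for other programs to consume.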
You may also check out a common open-source trace library with support for bash:
http://sourceforge.net/projects/utalm/
https://github.com/ArnoCan/utalm
WKR
Arno-Can Uestuensoez

How to deal with NFS latency in shell scripts

I'm writing shell scripts where, quite regularly, some stuff is written
to a file, after which an application is executed that reads that file. I find that the network latency differs vastly across our company, so a simple sleep 2, for example, will not be robust enough.
I tried to write a (configurable) timeout loop like this:
waitLoop()
{
    local timeout=$1
    local test="$2"

    if ! $test
    then
        local counter=0
        while ! $test && [ $counter -lt $timeout ]
        do
            sleep 1
            ((counter++))
        done

        if ! $test
        then
            exit 1
        fi
    fi
}
This works for test="[ -e $somefilename ]". However, testing existence is not enough, I sometimes need to test whether a certain string was written to the file. I tried
test="grep -sq \"^sometext$\" $somefilename", but this did not work. Can someone tell me why?
Are there other, less verbose options to perform such a test?
You can set your test variable this way:
test=$(grep -sq "^sometext$" $somefilename)
The reason your grep isn't working is that quotes are really hard to pass in arguments. You'll need to use eval:
if ! eval $test
I'd say the way to check for a string in a text file is grep.
What's your exact problem with it?
Also you might adjust your NFS mount parameters, to get rid of the root problem. A sync might also help. See NFS docs.
If you're wanting to use waitLoop in an "if", you might want to change the "exit" to a "return", so the rest of the script can handle the error situation (there's not even a message to the user about what failed before the script dies otherwise).
The other issue is using "$test" to hold a command means you don't get shell expansion when actually executing, just evaluating. So if you say test="grep \"foo\" \"bar baz\"", rather than looking for the three letter string foo in the file with the seven character name bar baz, it'll look for the five char string "foo" in the nine char file "bar baz".
So you can either decide you don't need the shell magic, and set test='grep -sq ^sometext$ somefilename', or you can get the shell to handle the quoting explicitly with something like:
if /bin/sh -c "$test"
then
...
Try using the file modification time to detect when it is written without opening it. Something like
old_mtime=`stat --format="%Z" file`
# Write to file.
new_mtime=$old_mtime
while [[ "$old_mtime" -eq "$new_mtime" ]]; do
sleep 2;
new_mtime=`stat --format="%Z" file`
done
This won't work, however, if multiple processes try to access the file at the same time.
I just had the exact same problem. I used a similar approach to the timeout wait that you include in your OP; however, I also included a file-size check, and I reset the timeout timer if the file had increased in size since it was last checked. The files I'm writing can be a few gigs, so they take a while to write across NFS.
This may be overkill for your particular case, but I also had my writing process calculate a hash of the file after it was done writing. I used md5, but something like crc32 would work too. This hash was broadcast from the writer to the (multiple) readers, and each reader waits until a) the file size stops increasing and b) the (freshly computed) hash of the file matches the hash sent by the writer.
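A rough sketch of that reader-side wait (the polling interval, the GNU stat/md5sum tools, and the way the expected hash arrives are all assumptions):
# Wait until the file stops growing and its md5 matches the one published by the writer.
wait_for_complete_file() {
    local file=$1 expected_md5=$2
    local prev_size=-1 size
    while :; do
        size=$(stat --format=%s "$file" 2>/dev/null || echo -1)
        if [[ "$size" -ge 0 && "$size" -eq "$prev_size" ]] &&
           [[ "$(md5sum "$file" | awk '{print $1}')" == "$expected_md5" ]]; then
            return 0
        fi
        prev_size=$size
        sleep 5
    done
}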
We have a similar issue, but for different reasons. We are reading a file which is sent to an SFTP server. The machine running the script is not the SFTP server.
What I have done is set it up in cron (although a loop with a sleep would work too) to do a cksum of the file. When the old cksum matches the current cksum (the file has not changed for the determined amount of time), we know that the writes are complete, and we transfer the file.
Just to be extra safe, we never overwrite a local file before making a backup, and we only transfer at all when the remote file has two cksums in a row that match, and that cksum does not match the local file.
If you need code examples, I am sure I can dig them up.
The shell was splitting your predicate into words. Grab it all with $@ as in the code below:
#! /bin/bash

waitFor()
{
    local tries=$1
    shift
    local predicate="$@"

    while [ $tries -ge 1 ]; do
        (( tries-- ))
        if $predicate >/dev/null 2>&1; then
            return
        else
            [ $tries -gt 0 ] && sleep 1
        fi
    done

    exit 1
}

pred='[ -e /etc/passwd ]'
waitFor 5 $pred
echo "$pred satisfied"

rm -f /tmp/baz
(sleep 2; echo blahblah >> /tmp/baz) &
(sleep 4; echo hasfoo >> /tmp/baz) &

pred='grep ^hasfoo /tmp/baz'
waitFor 5 $pred
echo "$pred satisfied"
Output:
$ ./waitngo
[ -e /etc/passwd ] satisfied
grep ^hasfoo /tmp/baz satisfied
Too bad the typescript isn't as interesting as watching it in real time.
Ok...this is a bit whacky...
If you have control over the file, you might be able to create a 'named pipe' here.
That way (depending on how the writing program works) you can monitor the file in a synchronized fashion.
At its simplest:
Create the named pipe:
mkfifo file.txt
Set up the sync'd receiver:
while :
do
    process.sh < file.txt
done
Create a test sender:
echo "Hello There" > file.txt
The 'process.sh' is where your logic goes: it will block until the sender has written its output. In theory the writer program won't need modifying....
WARNING: if the receiver is not running for some reason, you may end up blocking the sender!
Not sure it fits your requirement here, but might be worth looking into.
Or, to avoid the synchronization, try 'lsof'?
http://en.wikipedia.org/wiki/Lsof
Assuming that you only want to read from the file when nothing else is writing to it (i.e., the writing process has finished), you could check whether nothing else has a file handle to it.
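A rough sketch of that lsof idea (it assumes lsof is installed and that the writer keeps the file open for the whole write; the file name is just an example):
file=/shared/incoming/data.txt
# lsof exits non-zero when no process has the file open.
while lsof "$file" >/dev/null 2>&1; do
    sleep 1
done
echo "no process holds $file open; safe to read"
Note that lsof only sees processes on the machine it runs on, so over NFS this check only helps if it runs on the same host as the writer.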
