Why does function ls { ls; } hang? - bash

The following shell function definition hangs in a bash console (RHEL/Ubuntu); in Cygwin it just quits the terminal when it is invoked.
$ function ls { ls; }
$ ls
Any reason for this behavior?

Your defined ls command is recursively calling itself rather than the previous ls command.
If you want to call the actual ls from your redefined one, you can simply use which to get the full path name, such as redefining ls to give you the long format:
function ls { $(which ls) -l; }
That's effectively the same as:
function ls { /bin/ls -l; }
which won't give you the problems your solution has with recursion.
Another option is to use
function ls { command ls -l; }
command will suppress shell function lookup and only allow for built-ins or programs on the path.
Builtins (like cd) are handled slightly differently from programs since they aren't actually located on the file system. In that case, you can use builtin, rather than which, to call the built-in version.
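For instance, here is a minimal sketch of wrapping a builtin this way (the wrapper below is illustrative, not part of the original question):
# list the contents of the new directory after every successful cd
function cd { builtin cd "$@" && ls; }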
If you want to define a function in terms of something that may already be a function, that's a bit trickier. You can use declare -f to get the current definition, then manipulate that to create a new definition.
An example of this (though contrived) follows. Let's say you declare a function to show all text files:
pax> showtxt()
...> {
...> ls *.txt
...> }
and you now want to give it a pretty heading. Using declare -f showtxt, you can see its definition:
pax> declare -f showtxt
showtxt ()
{
ls *.txt
}
Running that may result in the following output:
pax> showtxt
passwords.txt p0rnsites.txt results.txt
Now say you wanted to change it to give it a heading. You can capture the output of declare -f and modify it to make a script which will redefine the function thus:
pax> declare -f showtxt | awk '$1=="ls"{print "echo Text files:"}{print}' >tmp.sh
pax> cat tmp.sh
showtxt ()
{
echo Text files:
ls *.txt
}
You can see that you now have a modified function definition which, when run, will replace the function:
pax> . ./tmp.sh
pax> declare -f showtxt
showtxt ()
{
echo Text files:;
ls *.txt
}
And, when you run the new function, its behaviour has changed:
pax> showtxt
Text files:
passwords.txt p0rnsites.txt results.txt
Now, that contrived example isn't that useful since you probably could have typed it in yourself. Where this comes in handy is when the original function is more complex or the changes you want to make to it are many and varied.

You named your function ls. That definition now shadows the external ls command, so when the function body calls ls it calls itself, recursing forever...
The best idea is to use unique names for your functions; e.g., this works fine:
function myls { ls; }

Sometimes I have a one-liner that I am repeating many times for a particular task, but will likely never use again in the exact same form. It includes a file name that I am pasting in from a directory listing. Somewhere in between that and creating a bash script, I thought maybe I could just create a one-liner function at the command line like:
numresults(){ ls "$1"/RealignerTargetCreator | wc -l }
I've tried a few things like using eval, using numresults=function..., but haven't stumbled on the right syntax, and haven't found anything on the web so far. (Everything coming up is just tutorials on bash functions).
Quoting my answer for a similar question on Ask Ubuntu:
Functions in bash are essentially named compound commands (or code
blocks). From man bash:
Compound Commands
A compound command is one of the following:
...
{ list; }
list is simply executed in the current shell environment. list
must be terminated with a newline or semicolon. This is known
as a group command.
...
Shell Function Definitions
A shell function is an object that is called like a simple command and
executes a compound command with a new set of positional parameters.
... [C]ommand is usually a list of commands between { and }, but
may be any command listed under Compound Commands above.
There's no reason given, it's just the syntax.
Try with a semicolon after wc -l:
numresults(){ ls "$1"/RealignerTargetCreator | wc -l; }
Don't use ls | wc -l as it may give you wrong results if file names have newlines in them. You can use this function instead:
numresults() { find "$1" -mindepth 1 -printf '.' | wc -c; }
You can also count files without find. Using arrays,
numresults () { local files=( "$1"/* ); echo "${#files[@]}"; }
or using positional parameters
numresults () { set -- "$1"/*; echo "$#"; }
To match hidden files as well,
numresults () { local files=( "$1"/* "$1"/.* ); echo $(("${#files[@]}" - 2)); }
numresults () { set -- "$1"/* "$1"/.*; echo $(("$#" - 2)); }
(Subtracting 2 from the result compensates for the . and .. entries.)
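If you'd rather avoid the subtraction, here is a sketch using bash's dotglob and nullglob options, with a subshell body so the shopt changes don't leak into the calling shell (this variant is my addition, not from the original answer):
numresults () (
shopt -s dotglob nullglob # match hidden names too; expand to nothing if the directory is empty
files=( "$1"/* )
echo "${#files[@]}"
)
With dotglob set, * matches hidden files but never . and .., so no correction is needed.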
You can get a
bash: syntax error near unexpected token `('
error if you already have an alias with the same name as the function you're trying to define.
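A quick way out of that situation, as a sketch:
unalias numresults 2>/dev/null # drop any conflicting alias first
numresults() { find "$1" -mindepth 1 -printf '.' | wc -c; }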
Maybe the easiest way is to echo what you want to get back:
function myfunc()
{
local myresult='some value'
echo "$myresult"
}
result=$(myfunc) # or result=`myfunc`
echo $result
Anyway, here you can find a good how-to for more advanced purposes.
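One such more advanced technique, sketched here assuming bash 4.3+ (the names myfunc2 and outref are illustrative, not from the original answer): a nameref lets the function write directly into a variable chosen by the caller, avoiding the command substitution entirely:
myfunc2() {
local -n outref=$1 # nameref: outref becomes an alias for the caller's variable
outref='some value'
}
myfunc2 result2
echo "$result2" # prints: some value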

Modify a shell variable inside awk block of code

Is there any way to modify a shell variable inside an awk block of code?
--------- [shell_awk.sh]---------------------
#!/bin/bash
shell_variable_1=<value A>
shell_variable_2=<value B>
shell_variable_3=<value C>
awk 'function A(X)
{ return X+1 }
{ a=A('$shell_variable_1')
b=A('$shell_variable_2')
c=A('$shell_variable_3')
shell_variable_1=a
shell_variable_2=b
shell_variable_3=c
}' FILE.TXT
--------- [shell_awk.sh]---------------------
This is a very simple example; the real script loads a file and makes some changes using functions. I need to keep each value from before the change in a specific variable, so that I can then record the before and after values in MySQL.
The after value is received from parameters ($1, $2 and so on).
I already know how to get the before value from the file.
Everything works well except setting the shell variables from the awk variables. Outside the awk block of code they are easy to set, but inside, is it possible?
No program -- in awk, shell, or any other language -- can directly modify a parent process's memory. That includes variables. However, of course, your awk can write contents to stdout, and the parent shell can read that content and modify its own variables itself.
Here's an example of awk that writes key/value pairs out to be read by bash. It's not perfect -- read the caveats below.
#!/bin/bash
shell_variable_1=10
shell_variable_2=20
shell_variable_3=30
run_awk() {
awk -v shell_variable_1="$shell_variable_1" \
-v shell_variable_2="$shell_variable_2" \
-v shell_variable_3="$shell_variable_3" '
function A(X) { return X+1 }
{ a=A(shell_variable_1)
b=A(shell_variable_2)
c=A(shell_variable_3) }
END {
print "shell_variable_1=" a
print "shell_variable_2=" b
print "shell_variable_3=" c
}' <<<""
}
while IFS="=" read -r key value; do
printf -v "$key" '%s' "$value"
done < <(run_awk)
for var in shell_variable_{1,2,3}; do
printf 'New value for %s is %s\n' "$var" "${!var}"
done
Advantages
Doesn't use eval. Content such as $(rm -rf ~) in the output from awk won't be executed by your shell.
Disadvantages
Can't handle variable contents with newlines. (You could fix this by NUL-delimiting the output from your awk script and adding -d '' to the read command; see the sketch after this list.)
A hostile awk script could modify PATH, LD_LIBRARY_PATH, or other security-sensitive variables. (You could fix this by reading variables into an associative array, rather than the global namespace, or by enforcing a prefix on their names).
The code above uses several ksh extensions also available in bash; however, it will not run with POSIX sh. Thus, be sure not to run this via sh scriptname (which only guarantees POSIX functionality).
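The NUL-delimited variant mentioned in the disadvantages above might look like this sketch; it assumes the awk side prints each record as key=value terminated by a NUL byte (e.g. with gawk: printf "%s=%s%c", k, v, 0), and run_awk_nul is that hypothetical variant of run_awk:
while IFS="=" read -r -d '' key value; do # -d '' makes read stop at NUL, not newline
printf -v "$key" '%s' "$value"
done < <(run_awk_nul)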

Exporting the full environment to GNU Parallel

I find it somewhat annoying that I cannot use aliases in GNU Parallel:
alias gi="grep -i"
parallel gi bar ::: foo
/bin/bash: gi: command not found
I had somewhat come to terms with that it is just the way it is. But reading Accessing Associative Arrays in GNU Parallel I am starting to think: Does it really have to be this way?
Is it possible to make a bash function that collects all of the environment into a function, exports that function, and calls GNU Parallel, which will then import the environment in the spawned shell using that function?
So I am not talking about a specialized solution for the gi-alias, but a bash function that will take all aliases/functions/variables (without me having to name them explicitly), package those into a function, that can be activated by GNU Parallel.
Something similar to:
env_parallel() {
# [... gather all environment/all aliases/all functions into parallel_environment() ...]
foreach alias in all aliases {
append alias definition to definition of parallel_environment()
}
foreach variable in all variables (including assoc arrays) {
append variable definition to definition of parallel_environment()
# Code somewhat similar to https://stackoverflow.com/questions/24977782/accessing-associative-arrays-in-gnu-parallel
}
foreach function in all functions {
append function definition to definition of parallel_environment()
}
# make parallel_environment visible to GNU Parallel
export -f parallel_environment
# Running parallel_environment will now create an environment with
# all variables/all aliases/all functions set in current state
# (with the exception of the function parallel_environment of course)
# Inside GNU parallel:
# if set parallel_environment(): prepend it to the command to run
`which parallel` "$@"
}
# Set an example alias
alias fb="echo fubar"
# Set an example variable
BAZ=quux
# Make an example function
myfunc() {
echo $BAZ
}
# This will record the current environment including the 3 examples
# put it into parallel_environment
# run parallel_environment (to set the environment)
# use the 3 examples
env_parallel parallel_environment\; fb bar {}\; myfunc ::: foo
# It should give the same output as running:
fb bar foo; myfunc
# Outputs:
# fubar bar foo
# quux
Progress: This seems to be close to what I want activated:
env_parallel() {
export parallel_environment='() {
'"$(echo "shopt -s expand_aliases"; alias;typeset -p | grep -vFf <(readonly);typeset -f)"'
}'
`which parallel` "$@"
}
VAR=foo
myfunc() {
echo $VAR $1
}
alias myf=myfunc
env_parallel parallel_environment';
' myfunc ::: bar # Works (but gives errors)
env_parallel parallel_environment';
' myf ::: bar # Works, but requires the \n after ;
So now I am down to 1 issue:
weed out all the variables that cannot be assigned a value (e.g. BASH_ARGC)
How do I list those?
GNU Parallel 20140822 implements this. To activate it you will need to run this once (e.g. in .bashrc):
env_parallel() {
export parallel_bash_environment='() {
'"$(echo "shopt -s expand_aliases 2>/dev/null"; alias;typeset -p | grep -vFf <(readonly; echo GROUPS; echo FUNCNAME; echo DIRSTACK; echo _; echo PIPESTATUS; echo USERNAME) | grep -v BASH_;typeset -f)"'
}'
# Run as: env_parallel ...
`which parallel` "$@"
unset parallel_bash_environment
}
And call GNU Parallel as:
env_parallel ...
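With that in place, the alias from the top of the question should now work; a usage sketch:
alias gi="grep -i"
env_parallel gi bar ::: foo # the spawned shell imports the alias and finds it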
That should put the myth to rest that it is impossible to export aliases: all you need is a little Behändigkeit (dexterity). (Thanks a lot to @rici for the inspiration.)
In principle, it should be possible. But, as usual, there are a lot of details.
First, it is quite possible in bash for a name to be simultaneously a function, a variable (scalar or array) and an alias. Also, the function and the variable can be exported independently.
So there would be a certain ambiguity in env_parallel foo ... in the case that foo has more than one definition. Possibly the best solution would be to detect the situation and report an error, using a syntax like:
env_parallel -a foo -f bar
in order to be more specific, if necessary.
A simpler possibility is to just export the ambiguity, which is what I do below.
So the basic logic to the importer used in env_parallel might be something like this, leaving out lots of error checking and other niceties:
# Helper functions for clarity. In practice, since they are all short,
# I'd probably in-line all of these by hand to reduce name pollution.
get_alias_() { alias "$1" 2>/dev/null; }
get_func_() { declare -f "$1" 2>/dev/null; }
get_var_() { [[ -v "$1" ]] && declare -p "$1" | sed '1s/--\?/-g/'; }
make_importer() {
local name_
export $1='() {
'"$(for name_ in "${#:2}"; do
local got_=()
get_alias_ "$name_" && got_+=(alias)
get_func_ "$name_" && got_+=(function)
get_var_ "$name_" && got_+=(variable)
if [[ -z $got_ ]]; then
echo "Not found: $name_" >>/dev/stderr
elif (( ${#got_[@]} > 1 )); then
printf >>/dev/stderr \
"Ambiguous: %s is%s\n" \
$name_ "$(printf " %s" "${got_[#]}")"
fi
done)"'
}'
}
In practice, there's no real point defining the function in the local environment if the only purpose is to transmit it to a remote shell. It would be sufficient to print the export command. And, while it is convenient to package the import into a function, as in Accessing Associative Arrays in GNU Parallel,
it's not strictly necessary. It does make it a lot easier to pass the definitions through utilities like Gnu parallel, xargs or find, which is what I typically use this hack for. But depending on how you expect to use the definitions, you might be able to simplify the above by simply prepending the list of definitions to the given command. (If you do that, you won't need to fiddle the global flag with the sed in get_var_.)
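As a rough sketch of that prepending idea (greet is a throwaway illustrative function, not from the answer): ship the definition's source text inline with the command handed to the child shell, so no exporting is needed:
greet() { echo "hello $1"; }
bash -c "$(declare -f greet); greet world" # prints: hello world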
Finding out what is in the environment
In case it is useful, here is how to get a list of all aliases, functions and variables:
Functions
declare -F | cut -d' ' -f3
Aliases (Note 1)
alias | awk '/^alias /{print substr($2,1,index($2,"=")-1)}'
Variables (Note 1)
declare -p | awk '$1=="declare"{o=(index($3, "="));print o?substr($3,1,o-1):$3}'
In the awk program, you could check the variable's type by looking at $2, which is usually -- but could be -A for an associative array, -a for an array with integer keys, -i for an integer, -x for exported, and -r for readonly. (More than one option may apply; -aix, for instance, is an exported integer array.)
Note 1
The alias and declare -p commands produce "reusable" output, which could be eval'ed or piped into another bash, so the values are quoted. Unfortunately, the quoting is just good enough for eval; it's not good enough to avoid confusion. It is possible to define, for example:
x='
declare -a FAKE
'
in which case the output of declare -p will include:
declare -- x='
declare -a FAKE
'
Consequently, the lists of aliases and variables need to be treated as "possible names": all names will be included, but it might be that everything included is not a name. Mostly that means being sure to ignore errors:
for a in "${_aliases[@]}"; do
if
defn=$(alias $a 2>>/dev/null)
then
# do something with $defn
fi
done
As is often the case, the solution is to use a function, not an alias. You must first export the function (since parallel and bash are both developed by GNU, parallel knows how to deal with functions as exported by bash).
gi () {
grep -i "$#"
}
export -f gi
parallel gi bar ::: foo

How can you override function redirection in bash?

I recently discovered some bash code that used the little-known (well, little known to me anyway) feature of function redirection, such as the greatly simplified:
function xyzzy () {
echo hello
} >/dev/null
When you call the function with a simple xyzzy, it automatically applies the redirections attached to the function regardless of what you've done when calling it.
What I'd like to know is if there's any way to override this behaviour in the call to the function itself, to see the message being generated. I'm reluctant to change the file containing all the functions since (1) it's large, (2) it changes regularly, and (3) it's heavily protected by the group that supports it.
I've tried:
xyzzy >&1
to try to override it but the output still doesn't show up (possibly because >&1 may be considered a no-op).
In other words, given the script:
function xyzzy () {
echo hello
} >/tmp/junk
rm -f /tmp/junk
echo ================
echo Standard output
echo ----------------
xyzzy # something else here
echo ================
echo Function capture
echo ----------------
cat /tmp/junk
echo ================
it currently outputs:
================
Standard output
----------------
================
Function capture
----------------
hello
================
What can I change the xyzzy call to, so as to get hello printed in the standard output section rather than the function capture section?
And this needs to be without trying to read the file /tmp/junk after it's created since the actual redirections may be to /dev/null so they won't be in a file.
The only thing I can think of would be to parse the output of declare -f function_name and remove the redirection.
This is perhaps the easiest approach. Note that you need to tailor the awk script to the specific function layout and it doesn't modify the body of the function at all. That means you can only turn off redirection at the top level. You could modify whole call trees of functions to turn off redirection but that would require a bash parser capable of recognising and changing function calls within the body.
The following script shows how to do it with your sample function. All the awk command does is create a new function my_xyzzy which mirrors the xyzzy function except for the final line, effectively turning it into:
function my_xyzzy () {
echo hello
}
And the complete script as per the specifications:
function xyzzy () {
echo hello
} >/tmp/qqqq
declare -f xyzzy | awk '
NR==1 {print "my_xyzzy ()"}
NR==2 {prev=$0}
NR>2 {print prev;prev=$0}
END {print "}"}' >$$.bash
. $$.bash
rm -f $$.bash
rm -f /tmp/qqqq
echo ================
echo Standard output
echo ----------------
my_xyzzy
echo ================
echo Function capture
echo ----------------
cat /tmp/qqqq
echo ================
The output of that is:
================
Standard output
----------------
hello
================
Function capture
----------------
cat: /tmp/qqqq: No such file or directory
================
I don't think Bash function redirections can be overridden dynamically in the call to the function itself although a temporarily altered shell context can be made use of by combining Bash aliases and functions (see Magic Aliases: A Layering Loophole in the Bourne Shell).
Non-dynamically, it is the last redirection expression, i.e. the rightmost one, that overrides the previous ones if the redirection expressions refer to the same file descriptor.
# example
ls -ld / no_such_file 1>/dev/null 1>/dev/tty 1>&2 1>redirtest.txt
cat redirtest.txt
Therefore, glenn jackman's suggestion to use declare -f function_name seems the way to add a final stdout redirection expression to override the previous ones.
xyzzy() { echo 'Hello, world!'; } 1>/dev/null
#func="$(declare -f xyzzy) 1>&2"
func="$(declare -f xyzzy) 1>/dev/tty"
eval "$func"
xyzzy
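A variation on the same idea, as a sketch: run the eval inside a subshell so the original xyzzy, redirection and all, stays untouched in the parent shell:
( eval "$(declare -f xyzzy) 1>/dev/tty"; xyzzy ) # prints Hello, world! to the terminal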

read stdin in function in bash script

I have some set of bash functions which output some information:
find-modelname-in-epson-ppds
find-modelname-in-samsung-ppds
find-modelname-in-hp-ppds
etc ...
I've been writing functions which read output and filter it:
function filter-epson {
find-modelname-in-epson-ppds | sed <bla-blah-blah>
}
function filter-hp {
find-modelname-in-hp-ppds | sed <the same bla-blah-blah>
}
etc ...
But the I thought that it would be better do something like this:
function filter-general {
(somehow get input) | sed <bla-blah-blah>
}
and then call in another high-level functions:
function high-level-func {
# outputs filtered information
find-modelname-in-hp/epson/...-ppds | filter-general
}
How can I achieve that with the best bash practices?
If the question is How do I pass stdin to a bash function?, then the answer is:
Shellscript functions take stdin the ordinary way, as if they were commands or programs. :)
input.txt:
HELLO WORLD
HELLO BOB
NO MATCH
test.sh:
#!/bin/sh
myfunction() {
grep HELLO
}
cat input.txt | myfunction
Output:
hobbes@metalbaby:~/scratch$ ./test.sh
HELLO WORLD
HELLO BOB
Note that command line arguments are ALSO handled in the ordinary way, like this:
test2.sh:
#!/bin/sh
myfunction() {
grep "$1"
}
cat input.txt | myfunction BOB
Output:
hobbes@metalbaby:~/scratch/$ ./test2.sh
HELLO BOB
To be painfully explicit that I'm piping from stdin, I sometimes write
cat - | ...
A very simple means to get stdin into a variable is to use read. By default, it reads from file descriptor 0, i.e. stdin, i.e. /dev/stdin.
Example Function:
input(){ local in; read in; echo you said $in; }
Example implementation:
echo "Hello World" | input
Result:
you said Hello World
Additional info
You don't need to declare a variable as being local, of course. I just included that for the sake of good form. Plain old read in does what you need.
So you understand how read works: by default it reads data from the given file descriptor (or implicit stdin) and blocks until it encounters a newline. Much of the time, you'll find that a newline will implicitly be attached to your input, even if you weren't aware of it. If you have a function that seems to hang with this mechanism, just keep this detail in mind (there are other ways of using read to deal with that; see the sketch below).
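For example, one escape hatch for that situation is read's timeout option, sketched here (bash's read -t; this variant is my addition, not from the original answer):
input() {
local in
if read -t 2 in; then # wait at most 2 seconds for a line on stdin
echo "you said $in"
else
echo "no input arrived"
fi
}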
More robust solutions
Adding on to the basic example, here's a variation that lets you pass the input via stdin OR an argument:
input()
{
local in=$1; if [ -z "$in" ]; then read in; fi
echo you said $in
}
With that tweak, you could ALSO call the function like:
input "Hello World"
How about handling an stdin option plus other arguments? Many standard *nix utilities, especially those which typically work with stdin/stdout, adhere to the common practice of treating a dash - to mean "default", which contextually means either stdin or stdout, so you can follow the convention and treat an argument specified as - to mean "stdin":
input()
{
local a=$1; if [ "$a" == "-" ]; then read a; fi
local b=$2
echo you said $a $b
}
Call this like:
input "Hello" "World"
or
echo "Hello" | input - "World"
Going even further, there is actually no reason to only limit stdin to being an option for only the first argument! You might create a super flexible function that could use it for any of them...
input()
{
local a=$1; if [ "$a" == "-" ]; then read a; fi
local b=$2; if [ "$b" == "-" ]; then read b; fi
echo you said $a $b
}
Why would you want that? Because you could formulate, and pipe in, whatever argument you might need...
myFunc | input "Hello" -
In this case, I pipe in the 2nd argument using the results of myFunc, rather than only having that option for the first.
Call sed directly. That's it.
function filter-general {
sed <bla-blah-blah>
}
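A usage sketch, with a hypothetical sed expression standing in for the elided <bla-blah-blah>:
filter-general() { sed -n 's/^Model: *//p'; } # hypothetical filter body
find-modelname-in-hp-ppds | filter-general # the function's sed reads the pipe's stdin directly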
