Why are bash variables 'different'? - bash

Is there some reason why bash 'variables' are different from variables in other 'normal' programming languages?
Is it because they are set by the output of previous programs, or because they have to be set by some kind of literal text, i.e. by the output of some program or something writing text through standard input/output or the console, or such like?
I am at a loss for the right vocabulary, but perhaps someone who understands what I am trying to say can supply the right words, or point me to some docs where I can understand bash variable concepts better.

In most languages, variables can contain different kinds of values. For example, in Python a variable can be a number that you can do arithmetic on (a-1), an array or string you can slice (a[3:]), or a custom, nested object (person.name.first_name).
In bash, you can't do any of this directly. If I understood you right, you asked why this is.
There are two reasons why you can't really do the same in bash.
One: environment variables are (conventionally) simple key=value strings, and the original sh was a pretty thin wrapper on top of the Unix process model. Bash works the same, for technical and compatibility reasons. Since all variables are (based on) strings, you can't really have rich, nested types.
This also means that you can't set a variable in a subshell/subscript you call. The variable won't be set in the parent script, because that's not how environment variables work.
Two: the original sh didn't separate code and data, since that makes it easier to work with interactively. sh treated all non-special characters as literal: find / -name foo was considered four literal strings, a command and three arguments.
Bash can't just decide that find / -name now means "the value of the variable find divided by the negated value of variable name", since that would mean everyone's find commands would start breaking. This is why you can't have the simple dereferencing syntax other languages do.
Even $name-1 can't be used to subtract, because it could just as easily be intended as part of $name-1-12-2012.tar.gz, a filename with a timestamp.
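To make the contrast concrete, here is a short sketch of how you have to ask bash explicitly for each interpretation of a string-valued variable (the variable names are made up for the example):

#!/usr/bin/env bash
name="backup"

# Everything is a string; plain juxtaposition is just more text.
echo "$name-1"            # prints: backup-1 (a filename-like string, not subtraction)

# Arithmetic has to be requested explicitly with $(( ... )).
count=5
echo "$(( count - 1 ))"   # prints: 4

# "Slicing" is done with parameter expansion, not index syntax.
echo "${name:0:4}"        # prints: back

# Arrays exist, but elements are still strings and need ${arr[i]} syntax.
files=(a.txt b.txt c.txt)
echo "${files[1]}"        # prints: b.txt

Each of these forms is an explicit opt-in, which is exactly the point above: bash cannot reinterpret bare text without breaking existing commands.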

I would say it has to do with Bash functions. Bash functions cannot return a value, only a status code.
So with Bash you can have a function
foo ()
{
    grep bar baz
}
But if you try to "save" the return value of the function
quux=$?
It is merely saving the exit status, not any value. Contrast this with a language such as JavaScript, where functions can actually return values.
function foo()
{
    return document.getElementById("dog").getAttribute("cat");
}
and save like this
quux = foo();
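The usual bash workaround is to have the function write its "result" to standard output and capture it with command substitution; a minimal sketch:

foo ()
{
    # "Return" data by printing it; the exit status is a separate channel.
    echo "some value"
    return 0
}

quux=$(foo)     # captures whatever foo printed
status=$?       # captures foo's exit status (0 here)
echo "$quux"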

Related

How does $RANDOM work in Unix shells? Looks like a variable but it actually assumes different values each time it's called

I recently used the $RANDOM variable and I was truly curious about the under-the-hood implementation of it: the syntax says it's a variable but the behavior says it's like a function as it returns a different value each time it's called.
This is not "in Unix shells"; this is a Bash-specific feature.
It's not hard to guess what's going on under the hood; the shell special-cases this variable so that each attempt to read it instead fetches two bytes from a (pseudo-) random number generator.
To see the definition, look at get_random in variables.c (currently around line 1363).
"about the under-the-hood implementation of it"
There are some special "dynamic variables" with special semantics - $RANDOM, $SECONDS, $LINENO, etc. When bash gets the value of such a variable, it executes a special function.
RANDOM "variable" is setup here bash/variables.c and get_random() just sets the value of the variable, taking random from a simple generator implementation in bash/random.c.

Write a bash function that mimics the 'return' builtin

I'm trying to write a bash function that would do the equivalent of the return builtin; it could be used like this:
f() {
    echo a
    my_return 15
    echo c
    return 17
}
It would have to behave exactly as the return builtin is expected to work.
The context is that I'm digging into the depths of bash for fun/experiments, mainly to see if we can implement some higher-language constructs and notions such as exceptions, continuations or similar concepts in bash.
I actually managed to implement the breaking of the instruction flow by using a DEBUG trap that returns a value of 2 (as explained in the extdebug part of the bash manual, this simulates a return in the current function, but doesn't set its return value). The only problem is that this return value of 2 is then passed as the return value of the whole function, which removes the possibility of setting it to an arbitrary value.
Using the RETURN trap did not seem to work either. It seems we can't set the return value of the function in any way. I have already achieved some pretty impressive flow-control feats thanks to the DEBUG/RETURN traps, and the goal doesn't seem so far away anymore.
So would you know of a way to implement my_return in pure bash, with the current implementation? If not, what minimal modification would you suggest to the bash implementation to allow this? (I'm thinking of patches, maybe to the run_{debug,return}_trap functions in trap.c, allowing them to set the return value of the returning function, or maybe adding a custom bash builtin, as it seems to be quite pluggable here.)
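For reference, a minimal sketch of the DEBUG-trap mechanism the question refers to (this only reproduces the flow-breaking behaviour the asker already has; it does not solve the problem of choosing the final return value, and the helper names are invented):

#!/usr/bin/env bash
shopt -s extdebug      # also makes shell functions inherit the DEBUG trap

my_return_requested=0

my_return () {
    # We could remember the desired value ($1), but there is no way to
    # make it become the caller's exit status -- that is the whole problem.
    my_return_requested=1
}

debug_hook () {
    if (( my_return_requested )); then
        my_return_requested=0
        return 2    # with extdebug, status 2 simulates 'return' in the caller
    fi
    return 0        # status 0: let the next command run normally
}
trap debug_hook DEBUG

f() {
    echo a
    my_return 15
    echo c          # skipped: the DEBUG trap aborts f before this runs
    return 17
}

f
echo "f returned $?"   # prints 2 (the trap's status), not 15 -- exactly the limitation described above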

How to programmatically unset global bash variables/manage Bash global scope?

Context:
We have several pieces of our infrastructure that are managed by large sets of Bash scripts that interact with each other via the source command, including (and frequently chaining includes of) files that are written to comply with a standard Bash template we use. I know that this is a situation that should probably never have been allowed, but it's what we have.
The template basically looks like this:
set_params() {
    #for parameters in a file that need to be accessed by other methods
    #in that file and have the same value from initialization of that
    #file to its conclusion:
    global_param1=value

    #left blank for variables that are going to be used by other methods
    #in the file, but don't have a static value assigned immediately:
    global_param2=
}

main_internals() {
    #user-created code goes here.
}

main() {
    set_params
    #generic setup stuff/traps go here
    main_internals arg arg arg
    #generic teardown stuff goes here
}
Using this structure, we have files include other files via the source command and then call the included files' main methods, which wraps and modularizes most operations well enough.
Problem:
Some of the thorniest problems with this infrastructure arise when a new module is added to the codebase that uses a global variable name that is already used, unrelatedly, somewhere else in the same sourced chain/set of files. I.e. if file1.sh has a variable called myfile which it uses for certain things, then sources file2.sh, and then does some more stuff with myfile, and the person writing file2.sh doesn't know that (in many cases they can't be expected to; there are a lot of files chained together), they might put a non-local variable called myfile in file2.sh, changing the value of the variable of the same name in file1.sh.
Question:
Assuming that global variable name conflicts will arise, and that making everything local can't completely solve the problem, is there some way to programmatically unset all variables that have been set in the global scope during the execution of a particular function or the invocations below it? Is there a way to unset them without unsetting other variables with the same names that are held by files that source the script in question?
The answer might very well be "no", but after looking around and not finding much other than "keep track of variable names and unset anything after you're done using it" (which will inevitably lead to a costly mistake), I figured I'd ask.
Phrased another way: is there a way to make/hack something that works like a third scope in Bash? Something between "local to a function" and "visible to everything running in this file and any files sourced by this one"?
The following is untested.
You can save a lot of your variables like this:
unset __var __vars
saveIFS=$IFS
IFS=$'\n'
__vars=($(declare -p))
IFS=$saveIFS
or save them based on a common prefix by changing the next to last line above to:
__vars=($(declare -p "${!foo@}"))
Then you can unset the ones you need to:
unset foo bar baz
or unset them based on a common prefix:
unset "${!foo#}"
To restore the variables:
for __var in "${__vars[@]}"
do
    $__var
done
Beware that:
- variables with embedded newlines will do the wrong thing
- values with whitespace will do the wrong thing
- if the matching prefix parameter expansion returns an empty result, the declare -p command will return all variables.
Another, more selective technique: if you know specifically which variables are used in the current function, you can selectively save and restore them:
# save
for var in foo bar baz
do
    names+=($var)
    values+=("${!var}")
done
# restore
for index in "${!names[@]}"
do
    declare "${names[index]}"="${values[index]}"
done
Use variable names (instead of "var", "index", "names" and "values") that are unlikely to collide with others. Use export instead of declare inside functions, since declare forces variables to be local; but then the variables will be exported, which may or may not have undesirable consequences.
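Wrapped into a pair of helper functions, the selective technique might look like this (a sketch only; the __saved_* names and the use of declare -g, which needs bash 4.2+, are my own assumptions, not part of the answer above):

# Save the current values of the named variables (assumes they are set).
save_vars () {
    __saved_names=()
    __saved_values=()
    local __v
    for __v in "$@"
    do
        __saved_names+=("$__v")
        __saved_values+=("${!__v}")
    done
}

# Restore what save_vars recorded, overwriting any changes made since.
restore_vars () {
    local __i
    for __i in "${!__saved_names[@]}"
    do
        # declare -g writes to the global scope even from inside a function
        declare -g "${__saved_names[__i]}"="${__saved_values[__i]}"
    done
}

# Usage sketch:
# save_vars myfile workdir
# source file2.sh && file2_main
# restore_vars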
Recommendation: replace the mess, use fewer globals or use a different language.
Otherwise, experiment with what I've outlined above and see if you can make any of it work with the code you have.

Any reason NOT to always use keyword arguments?

Before jumping into python, I had started with some Objective-C / Cocoa books. As I recall, most functions required keyword arguments to be explicitly stated. Until recently I forgot all about this, and just used positional arguments in Python. But lately, I've run into a few bugs which resulted from improper positions - sneaky little things they were.
Got me thinking - generally speaking, unless there is a circumstance that specifically requires non-keyword arguments - is there any good reason NOT to use keyword arguments? Is it considered bad style to always use them, even for simple functions?
I feel like as most of my 50-line programs have been scaling to 500 or more lines regularly, if I just get accustomed to always using keyword arguments, the code will be more easily readable and maintainable as it grows. Any reason this might not be so?
UPDATE:
The general impression I am getting is that it's a style preference, with many good arguments that they should generally not be used for very simple arguments, but are otherwise consistent with good style. Before accepting I just want to clarify though - are there any specific non-style problems that arise from this method - for instance, significant performance hits?
There isn't any reason not to use keyword arguments apart from the clarity and readability of the code. The choice of whether to use keywords should be based on whether the keyword adds additional useful information when reading the code or not.
I follow the following general rule:
If it is hard to infer the function (meaning) of the argument from the function name – pass it by keyword (e.g. I wouldn't want to have text.splitlines(True) in my code).
If it is hard to infer the order of the arguments, for example if you have too many arguments, or when you have independent optional arguments – pass it by keyword (e.g. funkyplot(x, y, None, None, None, None, None, None, 'red') doesn't look particularly nice).
Never pass the first few arguments by keyword if the purpose of the argument is obvious. You see, sin(2*pi) is better than sin(value=2*pi); the same is true for plot(x, y, z).
In most cases, stable mandatory arguments would be positional, and optional arguments would be keyword.
There's also a possible difference in performance, because in most implementations keyword arguments are slightly slower, but worrying about this would generally be premature optimisation, and the difference wouldn't be significant, so I don't think it's crucial for the decision.
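A small illustration of those rules (the function and argument names here are invented for the example, not from any real library):

def draw_series(x, y, color="black", linewidth=1.0, fill_below=False):
    """Pretend to plot y against x with some optional presentation tweaks."""
    print("plotting {0} points in {1}, width {2}, fill={3}".format(
        len(x), color, linewidth, fill_below))

xs, ys = [0, 1, 2], [0, 1, 4]

# Obvious, stable, mandatory arguments: positional reads best.
draw_series(xs, ys)

# Independent optional arguments: keywords make the call self-describing.
draw_series(xs, ys, color="red", fill_below=True)

# Without keywords, the same call is much harder to read.
draw_series(xs, ys, "red", 1.0, True)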
UPDATE: Non-stylistic concerns
Keyword arguments can do everything that positional arguments can, and if you're defining a new API there are no technical disadvantages apart from possible performance issues. However, you might run into small issues if you're combining your code with existing elements.
Consider the following:
If you make your function take keyword arguments, that becomes part of your interface.
You can't replace your function with another that has a similar signature but a different keyword for the same argument.
You might want to use a decorator or another utility on your function that assumes that your function takes a positional argument. Unbound methods are an example of such a utility, because they always pass the first argument positionally, so cls.method(self=cls_instance) doesn't work even if there is an argument called self in the definition.
None of these would be a real issue if you design your API well and document the use of keyword arguments, especially if you're not designing something that should be interchangeable with something that already exists.
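For example, here is a sketch of the interface point: once callers pass an argument by keyword, the parameter name itself becomes part of your API and cannot be changed freely (the function below is invented for illustration):

def area(width, height):
    return width * height

# Callers start relying on the parameter names...
print(area(width=3, height=4))

# ...so a later "drop-in" replacement with different parameter names breaks them:
def area(w, h):
    return w * h

try:
    print(area(width=3, height=4))
except TypeError as exc:
    print("call broke: {0}".format(exc))   # unexpected keyword argument 'width'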
If your consideration is to improve readability of function calls, why not simply declare functions as normal, e.g.
def test(x, y):
print "x:", x
print "y:", y
And simply call functions by declaring the names explicitly, like so:
test(y=4, x=1)
Which obviously gives you the output:
x: 1
y: 4
or this exercise would be pointless.
This avoids having arguments be optional and needing default values (unless you want them to be, in which case just go ahead with the keyword arguments! :) and gives you all the versatility and improved readability of named arguments that are not limited by order.
Well, there are a few reasons why I would not do that.
If all your arguments are keyword arguments, it increases noise in the code and it might remove clarity about which arguments are required and which ones are optional.
Also, if I have to use your code, I might want to kill you!! (Just kidding.) But having to type the name of all the parameters every time... not so fun.
Just to offer a different argument, I think there are some cases in which named parameters might improve readability. For example, imagine a function that creates a user in your system:
create_user("George", "Martin", "g.m#example.com", "payments#example.com", "1", "Radius Circle")
From that call it is not at all clear what these values might mean, even though they are all required; with named parameters, however, it is always obvious:
create_user(
    first_name="George",
    last_name="Martin",
    contact_email="g.m@example.com",
    billing_email="payments@example.com",
    street_number="1",
    street_name="Radius Circle")
I remember reading a very good explanation of "options" in UNIX programs: "Options are meant to be optional, a program should be able to run without any options at all".
The same principle could be applied to keyword arguments in Python.
These kinds of arguments should allow a user to "customize" the function call, but a function should be able to be called without any keyword-value argument pairs at all.
Sometimes, things should be simple because they are simple.
If you force yourself to use keyword arguments on every function call, your code will soon become unreadable.
When Python's built-in compile() and __import__() functions gained keyword argument support, the same argument was made in favor of clarity. There appears to be no significant performance hit, if any.
Now, if you make your functions only accept keyword arguments (as opposed to passing the positional parameters using keywords when calling them, which is allowed), then yes, it'd be annoying.
I don't see the purpose of using keyword arguments when the meaning of the arguments is obvious.
Keyword args are good when you have long parameter lists with no well-defined order (where you can't easily come up with a clear scheme to remember them); however, there are many situations where using them is overkill or makes the program less clear.
First, sometimes it is much easier to remember the order of the arguments than the names of the keyword arguments, and specifying the names could make the call less clear. Take randint from scipy.random with the following docstring:
randint(low, high=None, size=None)
Return random integers x such that low <= x < high.
If high is None, then 0 <= x < low.
When wanting to generate a random int from [0,10), it's clearer to write randint(10) than randint(low=10), in my view. If you need to generate an array with 100 numbers in [0,10), you can probably remember the argument order and write randint(0, 10, 100). However, you may not remember the variable names (e.g., is the first parameter low, lower, start, min, or minimum?), and once you have to look up the parameter names, you might as well not use them (as you just looked up the proper order).
Also consider variadic functions (ones that take a variable number of parameters which are themselves anonymous). E.g., you may want to write something like:
def square_sum(*params):
    sq_sum = 0
    for p in params:
        sq_sum += p*p
    return sq_sum
that can be applied to a bunch of bare parameters (square_sum(1, 2, 3, 4, 5) # gives 55). Sure, you could have written the function to take a single named iterable, def square_sum(params):, and called it like square_sum([1,2,3,4,5]), but that may be less intuitive, especially when there's no potential confusion about the argument name or its contents.
A mistake I often make is forgetting that positional arguments have to be specified before any keyword arguments when calling a function. If testing is a function, then:
testing(arg = 20, 56)
gives a SyntaxError message; something like:
SyntaxError: non-keyword arg after keyword arg
It is easy to fix, of course; it's just annoying. So in the case of few-line programs like the ones you mention, I would probably just go with positional arguments after giving nice, descriptive names to the parameters of the function. I don't know if what I mention is that big of a problem though.
One downside I could see is that you'd have to think of a sensible default value for everything, and in many cases there might not be any sensible default value (including None). Then you would feel obliged to write a whole lot of error handling code for the cases where a kwarg that logically should be a positional arg was left unspecified.
Imagine writing stuff like this every time..
def logarithm(x=None):
    if x is None:
        raise TypeError("You can't do log(None), sorry!")

Django Python: Eval syntax for multiple fields created at runtime

My eval syntax isn't right. Namely, for each category, I'd like to output a ModelChoiceField named <category>_tasks, i.e. if category were 'fun', then a radio select field
'fun_tasks' would be output.
categories = Category.objects.all()
for category in categories:
    eval(category)_tasks = form.ModelChoiceField(
        queryset=Task.objects.filter(method__category=category),
        widget=RadioSelect
    )
“eval is evil.”
OK, it has its uses, but 90% of eval usage (in any language) is misconceived, so if you find yourself writing an eval you should stop and examine what you're doing with extreme suspicion.
eval(category)_tasks = x
If you are doing an assignment, that's a statement rather than an expression, so you'd have to use exec rather than eval:
exec category+'_tasks= x'
However exec is just as evil as eval!
You can write a variable in Python without having to parse/evaluate Python code:
locals()[category+'_tasks']= x
or, if you want to write a global variable instead of one in the current scope, replace locals() with globals().
Although this is better than eval/exec, it is still rather code-smelly. You rarely actually want completely dynamically-named variables; a lookup is usually much cleaner:
catlookup= {}
catlookup[category]= x
although without more context it's difficult to say what's best for your case.
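In the Django case specifically, the "lookup" already exists: a form's fields are stored in a dict, so the dynamic fields can be added there instead of inventing dynamically named variables. A hedged sketch (the form class name and import path are assumptions; Category and Task are the question's models):

from django import forms

from myapp.models import Category, Task   # import path assumed

class TaskChoiceForm(forms.Form):
    def __init__(self, *args, **kwargs):
        super(TaskChoiceForm, self).__init__(*args, **kwargs)
        # One radio-select field per category, e.g. self.fields['fun_tasks'],
        # kept in the form's fields dict rather than in a named variable.
        for category in Category.objects.all():
            self.fields['%s_tasks' % category] = forms.ModelChoiceField(
                queryset=Task.objects.filter(method__category=category),
                widget=forms.RadioSelect,
            )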
