How to programmatically unset global bash variables/manage Bash global scope? - bash

Context:
We have several pieces of our infrastructure that are managed by large sets of Bash scripts that interact with each other via the source command, including (and frequently chaining includes) of files that are created compliant with a standard Bash template we use. I know that this is a situation that should probably never have been allowed, but it's what we have.
The template basically looks like this:
set_params() {
#for parameters in a file that need to be accessed by other methods
#in that file and have the same value from initialization of that
#file to its conclusion:
global_param1=value
#left blank for variables that are going to be used by other methods
#in the file, but don't have a static value assigned immediately:
global_param2=
}
main_internals() {
#user-created code goes here.
}
main() {
set_params
#generic setup stuff/traps go here
main_internals arg arg arg
#generic teardown stuff goes here
}
Using this structure, we have files include other files via the source command and then call the included files' main methods, which wraps and modularizes most operations well enough.
Problem:
Some of the thorniest problems with this infrastructure arise when a new module is added to the codebase that uses a global variable name that is used somewhere else, unrelatedly, in the same sourced chain/set of files. For example, if file1.sh has a variable called myfile which it uses for certain things, then sources file2.sh, and then does some more stuff with myfile, and the person writing file2.sh doesn't know that (in many cases they can't be expected to; there are a lot of files chained together), they might put a non-local variable called myfile in file2.sh, changing the value of the variable with the same name in file1.sh.
Question:
Assuming that global variable name conflicts will arise, and that making everything local can't completely solve the problem, is there some way to programmatically unset all variables that have been set in the global scope during the execution of a particular function or invocations below that? Is there a way to unset them without unsetting other variables with the same names that are held by files that source the script in question?
The answer might very well be "no", but after looking around and not finding much other than "keep track of variable names and unset anything after you're done using it" (which will inevitably lead to a costly mistake), I figured I'd ask.
Phrased another way: is there a way to make/hack something that works like a third scope in Bash? Something between "local to a function" and "visible to everything running in this file and any files sourced by this one"?

The following is untested.
You can save a lot of your variables like this:
unset __var __vars
saveIFS=$IFS
IFS=$'\n'
__vars=($(declare -p))
IFS=$saveIFS
or save them based on a common prefix by changing the next to last line above to:
__vars=($(declare -p "${!foo@}"))
Then you can unset the ones you need to:
unset foo bar baz
or unset them based on a common prefix:
unset "${!foo@}"
To restore the variables:
for __var in "${__vars[@]}"
do
$__var
done
Beware that
variables with embedded newlines will do the wrong thing
values with whitespace will do the wrong thing
if the matching prefix parameter expansion returns an empty result, the declare -p command will return all variables.
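As a rough illustration of the save/unset/restore cycle above, here is a minimal sketch that keeps the declare -p output as a single string and restores it with eval instead of splitting it into an array. The foo_ prefix and the __saved name are assumptions made up for this example. It only behaves this way at the top level of a script: inside a function, the evaluated declare statements would create local variables.

```shell
#!/usr/bin/env bash
# Hypothetical variables sharing the prefix "foo_".
foo_path="/tmp/my file.txt"   # value with whitespace
foo_count=3

# Save the definitions of every variable whose name starts with foo_.
# Caveat from the text: if nothing matches the prefix, declare -p
# receives no arguments and dumps every variable instead.
__saved=$(declare -p "${!foo_@}")

# Unset everything with that prefix.
unset "${!foo_@}"
echo "after unset: ${foo_path-<unset>}"

# Restore by evaluating the saved declare statements.
eval "$__saved"
echo "restored: $foo_path ($foo_count)"
```

Because declare -p prints shell-quoted definitions, evaluating them as a whole avoids the word-splitting problems listed above, at the cost of using eval.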
Another technique that's more selective might be that you know specifically which variables are used in the current function so you can selectively save and restore them:
# save
for var in foo bar baz
do
names+=($var)
values+=("${!var}")
done
# restore
for index in "${!names[@]}"
do
declare "${names[index]}"="${values[index]}"
done
Instead of "var", "index", "names" and "values", use variable names that are unlikely to collide with others. Inside functions, use export instead of declare, since declare makes variables local there; the variables will then be exported, however, which may or may not have undesirable consequences. In Bash 4.2 and later, declare -g assigns in the global scope from inside a function without exporting.
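The selective save/restore idea could be wrapped in two helper functions, roughly as follows. This is only a sketch: save_vars, restore_vars and the __sv_ prefix are hypothetical names chosen for the example, and it assumes Bash 4.2 or later for declare -g.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; the __sv_ prefix is only there to make
# collisions with real script variables less likely.
__sv_names=()
__sv_values=()

# Remember the current value of each named global variable.
save_vars() {
    local __sv_n
    for __sv_n in "$@"; do
        __sv_names+=("$__sv_n")
        __sv_values+=("${!__sv_n}")   # indirect expansion: value of the named variable
    done
}

# Write the remembered values back into the global scope.
restore_vars() {
    local __sv_i
    for __sv_i in "${!__sv_names[@]}"; do
        # declare -g (Bash 4.2+) assigns globally even inside a function.
        declare -g "${__sv_names[__sv_i]}=${__sv_values[__sv_i]}"
    done
    __sv_names=()
    __sv_values=()
}

myfile=/tmp/outer.txt
save_vars myfile
myfile=/tmp/clobbered.txt   # stands in for a sourced module overwriting it
restore_vars
echo "$myfile"              # prints /tmp/outer.txt
```

One limitation of this sketch: a variable that was unset when saved comes back as set but empty, and arrays are not handled.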
Recommendation: replace the mess, use fewer globals or use a different language.
Otherwise, experiment with what I've outlined above and see if you can make any of it work with the code you have.

Related

Local declaration of (built-in) Lua functions to reduce overhead

It is often said that one should re-declare (certain) Lua functions locally, as this reduces the overhead.
But what is the exact rule / principle behind this? How do I know for which functions this should be done and for which it is superfluous? Or should it be done for EVERY function, even your own?
Unfortunately I can't figure it out from the Lua manual.
The principle is that every time you write table.insert for example, the Lua interpreter looks up the "insert" entry in the table called table. Actually, it means _ENV.table.insert - _ENV is where the "global variables" are in Lua 5.2+. Lua 5.1 has something similar but it's not called _ENV. The interpreter looks up the string "table" in _ENV and then looks up the string "insert" in that table. Two table lookups every time you call table.insert, before the function actually gets called.
But if you put it in a local variable then the interpreter gets the function directly from the local variable, which is faster. It still has to look it up, to fill in the local variable.
It is superfluous if you only call the function once within the scope of the local variable, but that is pretty rare. There is no reason to do it for functions which are already declared as local. It also makes the code harder to read, so typically you won't do it except when it actually matters (in code that runs a lot of times).
My favorite tool for speeding things up in Lua is to put everything usable for a table into the __index field of its metatable.
A common example of this is the string datatype.
Its metatable's __index holds all the string functions, so they are available as methods.
Therefore you can do things like this directly on a string...
print(('istaqsinaayok'):upper():reverse())
-- Output: KOYAANISQATSI
The logic above...
The lookup for a method on a string fails directly, and therefore the __index metamethod is consulted for that method.
I like to implement the same behaviour for the number datatype...
-- do debug.setmetatable() only once for all further defined/used numbers
math.pi = debug.setmetatable(math.pi, {__index = math})
-- From now numbers are objects ;-)
-- Lets output Pi but not using Pi this time
print((180):rad()) -- Pi calcing with method rad()
-- Output: 3.1415926535898
The logic: if the key does not exist, look it up in __index.
That is only one step behind a local
...imho.
Another Example, that works with this method...
-- koysenv.lua
_G = setmetatable(_G,
{ -- Metamethods
__index = {}, -- Table constructor
__name = 'Global Environment'
})
-- Reference whats in _G into __index
for key, value in pairs(_G) do
getmetatable(_G)['__index'][key] = value
end
-- Remove all whats in __index now from _G
for key, value in pairs(getmetatable(_G)['__index']) do
_G[key] = nil
end
return _G
When loaded as the last require, it moves everything in _G into the freshly created __index table of the metatable.
After that, _G looks totally empty ;-P
...but the environment keeps working as if nothing had happened.
To add to what @user253751 already said:
Code Quality
Lua is a very flexible language. Other languages require you to import the parts of the standard library you use; Lua doesn't. Lua usually provides one global environment, which should not be polluted. If you play with the environment _ENV (setfenv/getfenv on Lua 5.1 / LuaJIT), you'll still want to be able to access the Lua libraries. For that purpose you may want to localize them before changing the environment; you can then use your "clean" environment for your module / API table / class / whatever. Another option here is to use metatables; metatable chains may quickly get hairy, though, and are likely to harm performance, as a failed table lookup is required each time to trigger indexing metamethods. Localizing otherwise global variables can thus be seen as a way of importing them; to give a minimal and rough example:
local print = print -- localize ("import") everything we need first
_ENV = {} -- set environment to clean table for module
function hello() -- this writes to _ENV instead of _G
print("Hello World!")
end
hello() -- inside the environment, all variables set here are accessible
return _ENV -- "export" the API table
Performance
Very minor nitpick: Local variables aren't strictly always faster. In very extreme cases (i.e. lots of upvalues), indexing a table (which doesn't need an upvalue if it's the environment, the string metatable or the like) may actually be faster.
I imagine that localizing variables is required for many optimizations of optimizing compilers such as LuaJIT to be applicable though; otherwise Lua makes very few guarantees. A global like print might as well be overwritten somewhere in a deep code path, so the indexing operation has to be repeated every time; for a local, on the other hand, the interpreter has far more guarantees regarding its scope. It is thus able to detect, for instance, constants that are only written to once, on initialization; for globals very little code analysis is possible.

How does $RANDOM work in Unix shells? Looks like a variable but it actually assumes different values each time it's called

I recently used the $RANDOM variable and I was truly curious about the under-the-hood implementation of it: the syntax says it's a variable but the behavior says it's like a function as it returns a different value each time it's called.
This is not "in Unix shells"; this is a Bash-specific feature.
It's not hard to guess what's going on under the hood; the shell special-cases this variable so that each attempt to read it instead fetches two bytes from a (pseudo-) random number generator.
To see the definition, look at get_random in variables.c (currently around line 1363).
about the under-the-hood implementation of it
There are some "dynamic variables" with special semantics, such as $RANDOM, $SECONDS and $LINENO. When bash gets the value of such a variable, it executes a special function.
The RANDOM "variable" is set up in bash/variables.c, and get_random() just sets the value of the variable, taking a random number from a simple generator implementation in bash/random.c.
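The behaviour described above can be observed from a script. The sketch below reads $RANDOM twice and then reseeds the generator by assigning to RANDOM, which the Bash manual documents as seeding it. The variable names a, b, first and second are just placeholders for the example.

```shell
#!/usr/bin/env bash
# Each read of $RANDOM calls into the generator, so two reads
# normally give two different values in the range 0..32767.
a=$RANDOM
b=$RANDOM
echo "two reads: $a $b"

# Assigning to RANDOM seeds the generator, so the same seed
# reproduces the same next value.
RANDOM=42; first=$RANDOM
RANDOM=42; second=$RANDOM
if [ "$first" = "$second" ]; then
    echo "same seed, same value"
fi
```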

Perl, Alias sub to variable

I'm currently doing micro-optimization of a Perl program and would like to optimize some getters.
I have a package with this getter-structure:
package Test;
our $ABC;
sub GetABC { $ABC } # exported sub...
Calling GetABC() creates a lot of sub-related overhead. Accessing the variable directly via $Test::ABC is insanely faster.
Is there a way to alias the getter to the variable to gain the same performance boost as if I accessed the variable directly? The inlining hint with "()" doesn't seem to work...
There is no way to turn a variable into an accessor sub, or to replace a sub with a variable access. You will have to live with the overhead.
Non-solutions:
Using a () prototype does not turn calls into your sub to constant accesses because that prototype merely makes a sub potentially eligible for inlining. Since the body of the sub is not itself constant, this sub cannot be a constant.
The overhead is per-call as perl has to do considerable bookkeeping for each call. Therefore, rewriting that accessor in XS won't help much.
Creating a constant won't help because the constant will be a copy, not an alias of your variable.
But looking at the constant.pm source code seems to open up an interesting solution. Note that this is a hack, and may not work in all versions of Perl: When we assign a scalar ref to a symbol table entry directly where that entry does not yet contain a typeglob, then an inlineable sub springs into place:
package Foo;
use strict;
use warnings;
use feature 'say';
my $x = 7;
BEGIN { $Foo::{GetX} = \$x } # don't try this at home
say GetX; #=> 7
$x = 3;
say GetX; #=> 3
This currently works on most of my installed perl versions (5.14, 5.22, 5.24, 5.26). However, my 5.22-multi and 5.26-multi die with “Modification of a read-only value attempted”. This is not a problem for the constant module since it makes the reference target readonly first and (more importantly) never modifies that variable.
So not only doesn't this work reliably, this will also completely mess up constant folding.
If the function call overhead is indeed unbearable (e.g. takes a double-digit percentage of your processing time), then doing the inlining yourself in the source code is going to be your best bet. Even if you have a lot of call locations, you can probably create a simple script that fixes the easy cases for you: select all files that import your module and only have a single package declaration. Within such files, replace calls to GetABC (with or without parens) to fully qualified variable accesses. Hopefully that token is not mentioned within any strings. Afterwards, you can manually inspect the few remaining occurrences of these calls.

Why are bash variables 'different'?

Is there some reason why bash 'variables' are different from variables in other 'normal' programming languages?
Is it due to the fact that they are set by the output of previous programs or have to be set by some kind of literal text, ie they have to be set by the output of some program or something outputting text through standard input/output or the console or such like?
I am at a loss for the right vocabulary, but can anyone who understands what I am trying to say perhaps use the right words or point me to some docs where I can understand bash variable concepts better?
In most languages, variables can contain different kinds of values. For example, in Python a variable can be a number that you can do arithmetic on (a-1), an array or string you can split (a[3:]), or a custom, nested object (person.name.first_name).
In bash, you can't do any of this directly. If I understood you right, you asked why this is.
There are two reasons why you can't really do the same in bash.
One: environment variables are (conventionally) simple key=value strings, and the original sh was a pretty thin wrapper on top of the Unix process model. Bash works the same, for technical and compatibility reasons. Since all variables are (based on) strings, you can't really have rich, nested types.
This also means that you can't set a variable in a subshell/subscript you call. The variable won't be set in the parent script, because that's not how environment variables work.
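A small sketch of that last point, assuming a throwaway variable named count: assignments made in a subshell or in a child process never reach the parent, while an exported variable is copied down into the child.

```shell
#!/usr/bin/env bash
count=1

( count=2 )                          # subshell: the assignment is lost on exit
echo "after subshell: $count"        # prints 1

bash -c 'count=3'                    # separate child process
echo "after child: $count"           # prints 1

export count                         # children now receive a copy
bash -c 'echo "child sees: $count"'  # prints 1
```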
Two: Original sh didn't separate code and data, since this makes it easier to work with interactively. Sh treated all non-special characters as literal. I.e. find / -name foo was considered four literal strings: a command and three arguments.
Bash can't just decide that find / -name now means "the value of the variable find divided by the negated value of variable name", since that would mean everyone's find commands would start breaking. This is why you can't have the simple dereferencing syntax other languages do.
Even $name-1 can't be used to subtract, because it could just as easily be intended as part of $name-1-12-2012.tar.gz, a filename with a timestamp.
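To make the distinction concrete, here is a short sketch with made-up variables n and name. Plain expansion pastes text together, and arithmetic has to be requested explicitly with $(( )).

```shell
#!/usr/bin/env bash
n=5
literal="$n-1"          # plain expansion: the text "5-1"
computed=$((n - 1))     # arithmetic expansion: 4
echo "$literal"
echo "$computed"

name=backup
echo "$name-1-12-2012.tar.gz"   # prints backup-1-12-2012.tar.gz
```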
I would say it has to do with Bash functions. Bash functions cannot return a value, only a status code.
So with Bash you can have a function
foo ()
{
grep bar baz
}
But if you try to "save" the return value of the function
foo
quux=$?
it merely saves the exit status, not any value. Contrast this with a language such as JavaScript, where functions can actually return values:
function foo()
{
return document.getElementById("dog").getAttribute("cat");
}
and save like this
quux = foo();
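For comparison, the usual way to get a value out of a Bash function is to have it print the value and capture that output with command substitution. The following is a sketch with a hypothetical function foo; the exit status remains available separately in $?.

```shell
#!/usr/bin/env bash
# Hypothetical function: "returns" data on stdout and a status code via return.
foo() {
    printf '%s\n' "some value"
    return 3
}

result=$(foo)   # captures what foo printed
status=$?       # exit status of the command substitution, i.e. of foo
echo "result=$result status=$status"   # prints result=some value status=3
```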

How do I use a predefined variable/constant in my Cucumber testing Scenario

I have defined a variable as userid in env.rb.
userid='1234'
In my Cucumber scenario, I wish to confirm that my response contains the correct userid. However, I do not wish to hard-code it in my scenario or step definition. Is it possible to do so?
I would place an additional file, let's say test_constants.rb, in the features/ dir. There, I would define a module like this:
module TestConstants
def self.user_id
1234
end
end
This way, you have it separated from test configuration and code. You would just have to require the file from env.rb.
Variable scope in Ruby is controlled by sigils to some degree. Variables starting with $ are global, variables with @ are instance variables, @@ means class variables, and names starting with a capital letter are constants. Make the variable global and it will be available everywhere, i.e.
$userid='1234'
