Local declaration of (built-in) Lua functions to reduce overhead - performance

It is often said that one should re-declare (certain) Lua functions locally, as this reduces the overhead.
But what is the exact rule / principle behind this? How do I know for which functions this should be done and for which it is superfluous? Or should it be done for EVERY function, even your own?
Unfortunately I can't figure it out from the Lua manual.

The principle is that every time you write table.insert for example, the Lua interpreter looks up the "insert" entry in the table called table. Actually, it means _ENV.table.insert - _ENV is where the "global variables" are in Lua 5.2+. Lua 5.1 has something similar but it's not called _ENV. The interpreter looks up the string "table" in _ENV and then looks up the string "insert" in that table. Two table lookups every time you call table.insert, before the function actually gets called.
But if you put it in a local variable, the interpreter gets the function directly from the local variable, which is faster. It still has to do the lookup once, to fill in the local variable.
It is superfluous if you only call the function once within the scope of the local variable, but that is pretty rare. There is no reason to do it for functions which are already declared as local. It also makes the code harder to read, so typically you won't do it except when it actually matters (in code that runs a lot of times).
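To make that concrete, here is a minimal sketch of the pattern for a hot loop. The names insert, build and squares are made up for illustration; only table.insert is a real library function.

```lua
-- Do the two table lookups (_ENV.table, then table.insert) once, up front.
local insert = table.insert

-- Hypothetical helper: builds a list of the first n squares.
local function build(n)
  local t = {}
  for i = 1, n do
    -- `insert` is an upvalue of build here, so no global or table
    -- lookup happens inside the loop.
    insert(t, i * i)
  end
  return t
end

local squares = build(5)
print(#squares, squares[5])  --> 5   25
```

Whether this is measurable depends on how often the loop runs, so it is worth profiling before applying it everywhere.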

My favorite tool for speeding things up in Lua is to put everything usable for a value into the __index field of its metatable.
A common example of this is the string datatype.
Its metatable has an __index field that holds all the string functions, so they can be called as methods.
Therefore you can do things like this directly on a string...
print(('istaqsinaayok'):upper():reverse())
-- Output: KOYAANISQATSI
The logic above...
A string is not a table, so indexing it cannot find the method in the value itself; Lua goes straight to the __index metamethod and looks the method up there.
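A quick way to see this for yourself, assuming a stock interpreter where nobody has replaced the string metatable (the variable name mt is just for illustration):

```lua
-- All strings share one metatable; fetch it through any string value.
local mt = getmetatable("")

-- In a stock interpreter its __index field is the string library table.
print(mt.__index == string)                    --> true

-- So the method call is just sugar for calling the library function.
print(("abc"):upper() == string.upper("abc"))  --> true
```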
I like to implement the same behaviour for the number datatype...
-- Call debug.setmetatable() only once; the metatable is shared by all numbers
-- (it returns its first argument on Lua 5.2+; on 5.1 it returns a boolean, so drop the assignment there)
math.pi = debug.setmetatable(math.pi, {__index = math})
-- From now on, numbers are objects ;-)
-- Let's output Pi without using math.pi this time
print((180):rad()) -- computing Pi with the method rad()
-- Output: 3.1415926535898
The logic: if a key does not exist, look it up in __index.
In my opinion, that is only one step behind a local.
Another example that works with this method...
-- koysenv.lua
_G = setmetatable(_G,
  { -- Metamethods
    __index = {}, -- Table constructor
    __name = 'Global Environment'
  })
-- Reference everything that is in _G from __index
for key, value in pairs(_G) do
  getmetatable(_G)['__index'][key] = value
end
-- Remove everything that is now in __index from _G
for key, value in pairs(getmetatable(_G)['__index']) do
  _G[key] = nil
end
return _G
When it is run as the last require, it moves everything in _G into the __index table of a freshly created metatable.
After that, _G looks totally empty ;-P
...but the environment keeps working as if nothing had happened.

To add to what @user253751 already said:
Code Quality
Lua is a very flexible language. Other languages require you to import the parts of the standard library you use; Lua doesn't. Lua usually provides one global environment, which should not be polluted. If you play with the environment _ENV (setfenv/getfenv on Lua 5.1 / LuaJIT), you'll still want to be able to access the Lua libraries. For that purpose you may want to localize them before changing the environment; you can then use your "clean" environment for your module / API table / class / whatever. Another option here is to use metatables; metatable chains may quickly get hairy, though, and are likely to harm performance, as a failed table lookup is required each time to trigger indexing metamethods. Localizing otherwise global variables can thus be seen as a way of importing them; to give a minimal & rough example:
local print = print -- localize ("import") everything we need first
_ENV = {} -- set environment to clean table for module
function hello() -- this writes to _ENV instead of _G
    print("Hello World!")
end
hello() -- inside the environment, all variables set here are accessible
return _ENV -- "export" the API table
Performance
Very minor nitpick: Local variables aren't strictly always faster. In very extreme cases (i.e. lots of upvalues), indexing a table (which doesn't need an upvalue if it's the environment, the string metatable or the like) may actually be faster.
I imagine that localizing variables is required for many optimizations of optimizing compilers such as LuaJIT to be applicable, though; otherwise Lua makes very few guarantees. A global like print might as well be overwritten somewhere in a deep code path, so the indexing operation has to be repeated every time; for a local, on the other hand, the interpreter has far more guarantees regarding its scope. It is thus able to detect constants that are only written to once, on initialization for instance; for globals, very little code analysis is possible.
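A rough sketch of why a global can never be assumed stable while a local can. The names original_print and calls are invented for this example, and the replacement function only counts calls:

```lua
local original_print = print   -- the local captures the current function value

local calls = 0
print = function(...)          -- somewhere else, the global gets overwritten
  calls = calls + 1
end

print("this goes to the replacement")         -- global lookup finds the new function
original_print("this still prints normally")  -- the local is unaffected

print = original_print         -- restore the global
```

After this runs, calls is 1: only the call through the global name reached the replacement.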

Related

Is there a way to use a dynamic function name in Elixir from string interpolation like in Ruby?

I want to be able to construct a function call from a string in elixir. Is this possible? The equivalent ruby method call would be:
"uppercase".send("u#{:pcase}")
Although the answer by @fhdhsni is perfectly correct, I'd add some nitpicking clarification.
The exact equivalent of Ruby's Kernel#send is impossible in Elixir, because Kernel#send allows calling private methods on the receiver. In Elixir, private functions do not ever exist in the compiled code.
If you meant Kernel#public_send, it can be achieved with Kernel.apply/3, as mentioned by @fhdhsni. The only correction: since the atom table is not garbage collected, and one surely wants to call a function that actually exists, it should be done with String.to_existing_atom/1.
apply(
String,
String.to_existing_atom("u#{:pcase}"),
["uppercase"]
)
Also, one might use macros during the compilation stage to generate respective clauses when the list of functions to call is predictable (when it’s not, the code already smells.)
defmodule Helper do
Enum.each(~w|upcase|a, fn fname ->
def unquote(fname)(param),
do: String.unquote(fname)(param)
# or
# defdelegate unquote(fname)(param), to: String
end)
end
Helper.upcase("uppercase")
#⇒ "UPPERCASE"
In Elixir module and function names are atoms. You can use apply to call them dynamically.
apply(String, String.to_atom("u#{:pcase}"), ["uppercase"]) # "UPPERCASE"
Depending on your use case it might not be a good idea to create atoms dynamically (since the atom table is not garbage collected).

Perl, Alias sub to variable

I'm currently doing micro-optimization of a Perl program and would like to optimize some getters.
I have a package with this getter-structure:
package Test;
our $ABC;
sub GetABC { $ABC } # exported sub...
Calling GetABC() creates a lot of sub-related overhead. Accessing the variable directly via $Test::ABC is vastly faster.
Is there a way to alias the getter to the variable to gain the same performance boost as if I accessed the variable directly? The inlining hint with "()" doesn't seem to work...
There is no way to turn a variable into an accessor sub, or to replace a sub with a variable access. You will have to live with the overhead.
Non-solutions:
Using a () prototype does not turn calls into your sub to constant accesses because that prototype merely makes a sub potentially eligible for inlining. Since the body of the sub is not itself constant, this sub cannot be a constant.
The overhead is per-call as perl has to do considerable bookkeeping for each call. Therefore, rewriting that accessor in XS won't help much.
Creating a constant won't help because the constant will be a copy, not an alias of your variable.
But looking at the constant.pm source code seems to open up an interesting solution. Note that this is a hack, and may not work in all versions of Perl: when we assign a scalar ref directly to a symbol table entry that does not yet contain a typeglob, an inlineable sub springs into place:
package Foo;
use strict;
use warnings;
use feature 'say';
my $x = 7;
BEGIN { $Foo::{GetX} = \$x } # don't try this at home
say GetX; #=> 7
$x = 3;
say GetX; #=> 3
This currently works on most of my installed perl versions (5.14, 5.22, 5.24, 5.26). However, my 5.22-multi and 5.26-multi die with “Modification of a read-only value attempted”. This is not a problem for the constant module since it makes the reference target readonly first and (more importantly) never modifies that variable.
So not only doesn't this work reliably, this will also completely mess up constant folding.
If the function call overhead is indeed unbearable (e.g. takes a double-digit percentage of your processing time), then doing the inlining yourself in the source code is going to be your best bet. Even if you have a lot of call locations, you can probably create a simple script that fixes the easy cases for you: select all files that import your module and only have a single package declaration. Within such files, replace calls to GetABC (with or without parens) to fully qualified variable accesses. Hopefully that token is not mentioned within any strings. Afterwards, you can manually inspect the few remaining occurrences of these calls.

excel performance using Range

I am new to VBA and I'm now working on a project where speed is absolutely everything. So as I'm writing the code, I noticed a lot of the cells in the sheet are named ranges and are referenced in the functions explicitly like this:
function a()
if range("x") > range("y") then
end if
... (just imagine a lot of named ranges)
end function
My question is, should I modify these functions so that the values in these named ranges are passed in as parameters like this:
'I can pass in the correct cells when I call the function
function a(x as Integer, y as Integer)
if x > y then
end if
...
end function
Will that speed things up a little bit? These functions are called almost constantly (except when the process is put to sleep on purpose) to communicate with a RTD server.
VBA is much slower at making "connections" to your worksheet than it is at dealing with its own variables. If your function refers to the same cell (or range) more than once then it would be advantageous to load those into memory before VBA interacts with them. For example if range("x")>range("y") is the only time in the function that either x or y are referred to then it won't matter. If you have if range("x")>range("a") and if range("x")>range("b") and so on then you'd be much better off starting your function with
varX=range("x")
varY=range("y")
and then working with the VBA variables.
It might seem that parameterizing the function, as your second example shows, accomplishes my recommendation. This may or may not be the case, because Excel might just treat those parameters as references to the worksheet and not as values (I'm not sure). Just to be safe, you should explicitly define new variables at the beginning of your function and then only refer to those variables in the rest of your function.
To sum up the above wall of text, your goal should be to minimize the number of times VBA "connects" to the worksheet.

Caching of data in Mathematica

There is a very time-consuming operation which generates a dataset in my package. I would like to save this dataset and let the package rebuild it only when I manually delete the cached file. Here is my approach as part of the package:
myDataset = Module[{fname, data},
fname = "cached-data.mx";
If[FileExistsQ[fname],
Get[fname],
data = Evaluate[timeConsumingOperation[]];
Put[data, fname];
data]
];
timeConsumingOperation[]:=Module[{},
(* lot of work here *)
{"data"}
];
However, instead of writing the long data set to the file, the Put command only writes one line: "timeConsumingOperation[]", even if I wrap it with Evaluate as above. (To be honest, this behaviour is not consistent: sometimes the dataset is written, sometimes not.)
How do you cache your data?
Another caching technique I use very often, especially when you might not want to insert the precomputed form in e.g. a package, is to memoize the expensive evaluation(s), such that it is computed on first use but then cached for subsequent evaluations. This is readily accomplished with SetDelayed and Set in concert:
f[arg1_, arg2_] := f[arg1, arg2] = someExpensiveThing[arg1, arg2]
Note that SetDelayed (:=) and Set (=) group to the right, so the implied order of evaluation is the following, but you don't actually need the parens:
f[arg1_, arg2_] := ( f[arg1, arg2] = someExpensiveThing[arg1, arg2])
Thus, the first time you evaluate f[1,2], the evaluation-delayed RHS is evaluated, causing the resulting value to be computed and stored as a DownValue of f for the arguments 1, 2 with Set.
@rcollyer is also right in that you don't need to use empty brackets if you have no arguments; you could just as easily write:
g := g = someExpensiveThing[...]
There's no harm in using them, though.
In the past, whenever I've had trouble with things evaluating it is usually when I have not correctly matched the pattern required by the function. For instance,
f[x_Integers]:= x
which won't match anything. Instead, I meant
f[x_Integer]:=x
In your case, though, you have no pattern to match: timeConsumingOperation[].
Your problem is more likely related to when timeConsumingOperation is defined relative to myDataset. In the code you've posted above, timeConsumingOperation is defined after myDataset. So, on the first run (or immediately after you've cleared the global variables) you would get exactly the result you're describing, because timeConsumingOperation is not defined when the code for myDataset is run.
Now, SetDelayed (:=) automatically causes the variable to be recalculated whenever it is used, and since you do not require any parameters to be passed, the square brackets are not necessary. The important point here is that timeConsumingOperation can be declared, as written, prior to myDataset because SetDelayed will cause it not to be executed until it is used.
All told, your caching methodology looks exactly how I would go about it.

Using function arguments as local variables

Something like this (yes, this doesn't deal with some edge cases - that's not the point):
int CountDigits(int num) {
int count = 1;
while (num >= 10) {
count++;
num /= 10;
}
return count;
}
What's your opinion about this? That is, using function arguments as local variables.
Both are placed on the stack, and pretty much identical performance wise, I'm wondering about the best-practices aspects of this.
I feel like an idiot when I add an additional and quite redundant line to that function consisting of int numCopy = num, however it does bug me.
What do you think? Should this be avoided?
As a general rule, I wouldn't use a function parameter as a local processing variable, i.e. I treat function parameters as read-only.
In my mind, intuitively understandable code is paramount for maintainability, and modifying a function parameter to use as a local processing variable tends to run counter to that goal. I have come to expect that a parameter will have the same value in the middle and bottom of a method as it does at the top. Plus, an aptly named local processing variable may improve understandability.
Still, as @Stewart says, this rule is more or less important depending on the length and complexity of the function. For short simple functions like the one you show, simply using the parameter itself may be easier to understand than introducing a new local variable (very subjective).
Nevertheless, if I were to write something as simple as countDigits(), I'd tend to use a remainingBalance local processing variable in lieu of modifying the num parameter as part of local processing - just seems clearer to me.
Sometimes, I will modify a parameter at the beginning of a method to normalize it:
void saveName(String name) {
name = (name != null ? name.trim() : "");
...
}
I rationalize that this is okay because:
a. it is easy to see at the top of the method,
b. the parameter maintains its original conceptual intent, and
c. the parameter is stable for the rest of the method
Then again, half the time, I'm just as apt to use a local variable anyway, just to get a couple of extra finals in there (okay, that's a bad reason, but I like final):
void saveName(final String name) {
final String normalizedName = (name != null ? name.trim() : "");
...
}
If, 99% of the time, the code leaves function parameters unmodified (i.e. mutating parameters are unintuitive or unexpected for this code base), then, during that other 1% of the time, dropping a quick comment about a mutating parameter at the top of a long/complex function could be a big boon to understandability:
int CountDigits(int num) {
// num is consumed
int count = 1;
while (num >= 10) {
count++;
num /= 10;
}
return count;
}
P.S. :-)
parameters vs arguments
http://en.wikipedia.org/wiki/Parameter_(computer_science)#Parameters_and_arguments
These two terms are sometimes loosely used interchangeably; in particular, "argument" is sometimes used in place of "parameter". Nevertheless, there is a difference. Properly, parameters appear in procedure definitions; arguments appear in procedure calls.
So,
int foo(int bar)
bar is a parameter.
int x = 5
int y = foo(x)
The value of x is the argument for the bar parameter.
It always feels a little funny to me when I do this, but that's not really a good reason to avoid it.
One reason you might potentially want to avoid it is for debugging purposes. Being able to tell the difference between "scratchpad" variables and the input to the function can be very useful when you're halfway through debugging.
I can't say it's something that comes up very often in my experience - and often you can find that it's worth introducing another variable just for the sake of having a different name, but if the code which is otherwise cleanest ends up changing the value of the variable, then so be it.
One situation where this can come up and be entirely reasonable is where you've got some value meaning "use the default" (typically a null reference in a language like Java or C#). In that case I think it's entirely reasonable to modify the value of the parameter to the "real" default value. This is particularly useful in C# 4 where you can have optional parameters, but the default value has to be a constant:
For example:
public static void WriteText(string file, string text, Encoding encoding = null)
{
// Null means "use the default" which we would document to be UTF-8
encoding = encoding ?? Encoding.UTF8;
// Rest of code here
}
About C and C++:
My opinion is that using the parameter as a local variable of the function is fine because it is a local variable already. Why then not use it as such?
I feel silly too when copying the parameter into a new local variable just to have a modifiable variable to work with.
But I think this is pretty much a personal opinion. Do it as you like. If you feel silly copying the parameter just because of this, it indicates that it doesn't suit you, and then you shouldn't do it.
If I don't need a copy of the original value, I don't declare a new variable.
IMO I don't think mutating the parameter values is a bad practice in general,
it depends on how you're going to use it in your code.
My team coding standard recommends against this because it can get out of hand. To my mind for a function like the one you show, it doesn't hurt because everyone can see what is going on. The problem is that with time functions get longer, and they get bug fixes in them. As soon as a function is more than one screen full of code, this starts to get confusing which is why our coding standard bans it.
The compiler ought to be able to get rid of the redundant variable quite easily, so it has no efficiency impact. It is probably just between you and your code reviewer whether this is OK or not.
I would generally not change the parameter value within the function. If at some point later in the function you need to refer to the original value, you still have it. In your simple case, there is no problem, but if you add more code later, you may refer to 'num' without realizing it has been changed.
The code needs to be as self sufficient as possible. What I mean by that is you now have a dependency on what is being passed in as part of your algorithm. If another member of your team decides to change this to a pass by reference then you might have big problems.
The best practice is definitely to copy the inbound parameters if you expect them to be immutable.
I typically don't modify function parameters, unless they're pointers, in which case I might alter the value that's pointed to.
I think the best practice here varies by language. For example, in Perl the arguments are usually copied into lexical variables first, and you can localize a package variable or even part of a data structure to a local scope, so that changing it in that scope will not have any effect outside of it:
sub my_function
{
    my ($arg1, $arg2) = @_; # copy the arguments into lexical variables
    $arg1++; # $arg1 is a copy, so this change is not visible outside this scope
    local $arg2->{key1}; # only the key1 portion of the hashref referenced by $arg2 is localized
    $arg2->{key1}->{key2} = 'foo'; # this change is not visible outside the function
}
Occasionally I have been bitten by forgetting to localize a data structure that was passed by reference to a function, that I changed inside the function. Conversely, I have also returned a data structure as a function result that was shared among multiple systems and the caller then proceeded to change the data by mistake, affecting these other systems in a difficult-to-trace problem usually called action at a distance. The best thing to do here would be to make a clone of the data before returning it*, or make it read-only**.
* In Perl, see the function dclone() in the core Storable module.
** In Perl, see lock_hash() or lock_hashref() in the core Hash::Util module.
