Lua optimize memory - performance

I want to optimize my code. I have three options and don't know which is better for memory in Lua:
1)
local Test = {}
Test.var1 = function ()
    -- Code
end
Test.var2 = function ()
    -- Code
end
2) Or
function var1()
    -- Code
end
function var2()
    -- Code
end
3) Or maybe
local var1 = function ()
    -- Code
end
local var2 = function ()
    -- Code
end

Quoting from Lua Programming Gems, the two maxims of program optimization:
Rule #1: Don't do it.
Rule #2: Don't do it yet. (for experts only)
Back to your examples: the second piece of code is a little bit worse, since access to globals is slower. But the performance difference is hardly noticeable.
Between the other two it depends on your needs: the first one uses one extra table compared to the third, but the namespace is cleaner.

None of them will really affect memory, barring the use of a table in #1 (some 40 bytes plus a little more per entry).
If it's performance you want, then option #3 is by far the best, assuming you can access those functions from the local scope where they are defined.

If it's about memory usage more than processing, and you're using object-oriented programming where you instantiate multiple instances of Test as shown above, you have a fourth option with metatables.
TestMt = {}
TestMt.func1 = function(self, ...)
    ...
end
TestMt.func2 = function(self, ...)
    ...
end
TestMt.func3 = function(self, ...)
    ...
end

function new_test()
    local t = {}
    t.data = ...
    setmetatable(t, {__index = TestMt})
    return t
end

foo = new_test()
foo:func1()
foo:func2()
foo:func3()
If you're doing object-oriented kind of programming, metatables can lead to a massive saving in memory (I once accidentally used over 1 gigabyte for numerous mathematical vectors, only to reduce it down to 40 megabytes by using the metatable).
If it's not about objects and tables that get instantiated many times, and just about organizing your globally-accessible functions, then worrying about memory here is ridiculous. It's like putting the entirety of your Lua code into one file in order to reduce file system overhead. The savings are so negligible that you would need an extraordinary use case, backed by meticulous measurements, to even concern yourself with them.
If it's about processing, then you can get some small improvements by keeping your global functions out of nested tables, and by favoring locals when possible.
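To make the source of that saving concrete, here is a minimal sketch (a hypothetical Point example, not taken from the code above) contrasting per-instance methods with a shared metatable:
-- Per-instance methods: every object carries its own closure for each
-- method, so N objects mean N copies of every function.
local function new_point_heavy(x, y)
    local p = {x = x, y = y}
    p.length = function(self) return math.sqrt(self.x^2 + self.y^2) end
    return p
end

-- Shared metatable: the methods live in one table; each object stores
-- only its own data fields plus a reference to the shared metatable.
local PointMt = {}
PointMt.__index = PointMt
function PointMt:length() return math.sqrt(self.x^2 + self.y^2) end

local function new_point_light(x, y)
    return setmetatable({x = x, y = y}, PointMt)
end

print(new_point_heavy(3, 4):length()) -- both calls print the same length
print(new_point_light(3, 4):length())
With many thousands of instances, only the second form keeps the per-object cost down to the data fields, which is where the gigabyte-to-megabytes difference described above comes from.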

Related

Local declaration of (built-in) Lua functions to reduce overhead

It is often said that one should re-declare (certain) Lua functions locally, as this reduces the overhead.
But what is the exact rule / principle behind this? How do I know for which functions this should be done and for which it is superfluous? Or should it be done for EVERY function, even your own?
Unfortunately I can't figure it out from the Lua manual.
The principle is that every time you write table.insert, for example, the Lua interpreter looks up the "insert" entry in the table called table. Actually, it means _ENV.table.insert - _ENV is where the "global variables" live in Lua 5.2+ (Lua 5.1 has something similar, but it's not called _ENV). The interpreter looks up the string "table" in _ENV and then looks up the string "insert" in that table: two table lookups every time you call table.insert, before the function actually gets called.
But if you put it in a local variable, the interpreter gets the function directly from the local variable, which is faster. It still has to look it up once, to fill in the local variable.
It is superfluous if you only call the function once within the scope of the local variable, but that is pretty rare. There is no reason to do it for functions which are already declared as local. It also makes the code harder to read, so typically you won't do it except when it actually matters (in code that runs a lot of times).
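A minimal sketch of what that localization looks like in practice (the loop and names here are just an illustration, not from the question):
local insert = table.insert -- the two lookups (_ENV.table, then .insert) happen once, here

local t = {}
for i = 1, 1000000 do
    insert(t, i) -- the hot loop now uses a direct local access
end
The same pattern applies to any global you call repeatedly, including your own functions, as long as they are not already locals.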
My favorite tool for speeding things up in Lua is to place all the usable stuff for a table in the __index field of its metatable.
A common example of this is the string datatype: it has all the string functions available as methods through the __index field of its metatable.
Therefore you can do things like this directly on a string...
print(('istaqsinaayok'):upper():reverse())
-- Output: KOYAANISQATSI
The logic above: the method lookup on the string value itself fails, and therefore the __index metamethod is consulted for that method.
I like to implement the same behaviour for the number datatype...
-- do debug.setmetatable() only once; it applies to all numbers used from now on
math.pi = debug.setmetatable(math.pi, {__index = math})
-- From now on numbers are objects ;-)
-- Let's output pi, but without using math.pi this time
print((180):rad()) -- computes pi via the rad() method
-- Output: 3.1415926535898
The logic "if the key does not exist, then look it up in __index" is only one step behind a local, imho.
Another example that works with this method...
-- koysenv.lua
_G = setmetatable(_G,
    { -- metatable
        __index = {}, -- empty table that will receive the globals
        __name = 'Global Environment'
    })
-- Copy references to everything in _G into __index
for key, value in pairs(_G) do
    getmetatable(_G)['__index'][key] = value
end
-- Remove everything that is now in __index from _G
for key, value in pairs(getmetatable(_G)['__index']) do
    _G[key] = nil
end
return _G
When required as the last module, it moves everything in _G into the freshly created __index table of the metatable.
After that _G looks totally empty ;-P
...but the environment keeps working as if nothing had happened.
To add to what @user253751 already said:
Code Quality
Lua is a very flexible language. Other languages require you to import the parts of the standard library you use; Lua doesn't. Lua usually provides one global environment, which is not to be polluted. If you play with the environment _ENV (setfenv/getfenv on Lua 5.1 / LuaJIT), you'll want to still be able to access the Lua libraries. For that purpose you may want to localize them before changing the environment; you can then use your "clean" environment for your module / API table / class / whatever. Another option here is to use metatables; metatable chains may quickly get hairy though, and are likely to harm performance, as a failed table lookup is required each time to trigger the indexing metamethods. Localizing otherwise global variables can thus be seen as a way of importing them; to give a minimal and rough example:
local print = print -- localize ("import") everything we need first
_ENV = {} -- set environment to a clean table for the module
function hello() -- this writes to _ENV instead of _G
    print("Hello World!")
end
hello() -- inside the environment, all variables set here are accessible
return _ENV -- "export" the API table
Performance
Very minor nitpick: local variables aren't strictly always faster. In very extreme cases (e.g. lots of upvalues), indexing a table (which doesn't need an upvalue if it's the environment, the string metatable or the like) may actually be faster.
I imagine that localizing variables is required for many optimizations of optimizing compilers such as LuaJIT to be applicable, though; otherwise Lua can guarantee very little about the code. A global like print might as well be overwritten somewhere in a deep code path, so the indexing operation has to be repeated every time; for a local, on the other hand, the interpreter has far more guarantees regarding its scope. It is thus able to detect constants that are only written once, on initialization for instance; for globals very little code analysis is possible.

Should we pass a variable that can be acquired in O(1) into nested functions, or recall that function within every nested function?

The example below explains the question best. Which is standard: (A) or (B)?
A) Passing Variable in Each Nested Function:
def bar(X, y, n_rows, n_cols):
    # Do stuff....
    return stuff

def foo(X, y, n_rows, n_cols):
    stuff = bar(X, y, n_rows, n_cols)
    # Do stuff...
    return stuff
B) Recalling O(1) Function within each Nested Function:
def bar(X, y):
    n_rows = get_number_of_rows(X)  # X.shape[0]
    n_cols = get_number_of_cols(X)  # X.shape[1]
    # Do stuff....
    return stuff

def foo(X, y):
    n_rows = get_number_of_rows(X)  # X.shape[0]
    n_cols = get_number_of_cols(X)  # X.shape[1]
    stuff = bar(X, y)
    # Do stuff...
    return stuff
It seems kind of verbose to pass the number of rows and columns into nested functions, but on the other hand it seems inefficient to keep setting variables. I am not even too sure if the number of rows and columns can be acquired in constant time O(1). I wrote this in Python (using R in reality), but my goal was to frame this question generically enough that it could be applied to any language.
Helpful Links:
Are there guidelines on how many parameters a function should accept? (Suggests making this an object to avoid multiple parameters)
Questions:
Which is better for readability?
Which is better for performance?
Should we code in a way that this situation never arises?
Could the answer for (1) and (2) be different for different languages?
P.S. Need a little help with the naming of this question – less verbose more generic.
I think that B is preferred because it avoids some problems when the number or names of the parameters change. Option A has the data clumps code smell, and B is its solution, called preserve whole object (see the sketch after this answer).
Both approaches also show the tramp data smell, which can be solved by an OO redesign that makes the passed variables members of a class, or by introducing a global or context variable (but this is quite rare).
As long as there are no unwanted side effects and no large objects copied by value, the performance difference can be neglected. And I would advise you to adhere to the following rule when developing: don't do premature optimization.
The word "never" does not apply in software design. It all depends on the context.
I think so, and the answer can differ not only between languages but also depending on the problem being solved within a single language, the tools used, libraries, frameworks, etc.
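As a rough sketch of that "preserve whole object" / parameter-object idea, here in Lua and with hypothetical names: bundle the clumped values into one table so that adding or renaming a field no longer touches every signature.
-- Hypothetical dataset object bundling the clumped parameters.
local function make_dataset(X, y)
    return { X = X, y = y, n_rows = #X, n_cols = #X[1] }
end

local function bar(ds)
    -- do stuff with ds.X, ds.y, ds.n_rows, ds.n_cols ...
    return ds.n_rows * ds.n_cols
end

local function foo(ds)
    local stuff = bar(ds) -- only the whole object travels down the call chain
    -- do more stuff ...
    return stuff
end

print(foo(make_dataset({{1, 2}, {3, 4}}, {0, 1}))) -- 4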

Updating vector to avoid excess memory use

I have a function that returns a vector. Since I call this function many times, I want it to update a vector I provide to it rather than create a new vector. This is to reduce memory allocation and so increase speed.
The original code essentially looks like:
function!(prob1,pi,prob0)
    prob1 = pi'*prob0
    return prob1
end
Of course this creates a new prob1 vector each time. I've attempted to amend this in two different ways:
function!(prob1,pi,prob0)
    for i in 1:length(prob1)
        prob1[i] = pi[:,i]'*prob0
    end
    return prob1
end

# OR

function!(prob1,pi,prob0)
    for i in 1:length(prob1)
        prob1[i] = dot(pi[:,i],prob0)
    end
    return prob1
end
However, both run slower than the original code although they do use less memory. Any suggestions for improving performance time would be great.
You actually don't need to define a function; there already is one (albeit undocumented): At_mul_B!(prob1, pi, prob0) should give you what you want.

try catch or type conversion performance in julia - (Julia 73 seconds, Python 0.5 seconds)

I have been playing with Julia because it seems syntactically similar to Python (which I like) but claims to be faster. However, I tried making a script similar to something I have in Python for testing where the numerical values are within a text file, which uses this function:
function isFloat(s)
    try:
        float64(s)
        return true
    catch:
        return false
    end
end
For some reason, this takes a great deal of time for a text file with a reasonable number of rows of text (~500,000).
Why would this be? Is there a better way to do this? What general feature of the language can I understand from this to apply to other languages?
Here are the two exact scripts I ran, with the times for reference:
python: ~0.5 seconds
def is_number(s):
    try:
        np.float64(s)
        return True
    except ValueError:
        return False

start = time.time()
file_data = open('SMW100.asc').readlines()
file_data = map(lambda line: line.rstrip('\n').replace(',',' ').split(), file_data)
bools = [(all(map(is_number, x)), x) for x in file_data]
print time.time() - start
julia: ~73.5 seconds
start = time()

function isFloat(s)
    try:
        float64(s)
        return true
    catch:
        return false
    end
end

x = map(x-> split(replace(x, ",", " ")), open(readlines, "SMW100.asc"))
u = [(all(map(isFloat, i)), i) for i in x]
print(start - time())
Note also that you can use the float64_isvalid function in the standard library to (a) check whether a string is a valid floating-point value and (b) return the value.
Note also that the colons (:) after try and catch in your isFloat code are wrong in Julia (this is a Pythonism).
A much faster version of your code should be:
const isFloat2_out = [1.0]
isFloat2(s::String) = float64_isvalid(s, isFloat2_out)

function foo(L)
    x = split(L, ",")
    (all(isFloat2, x), x)
end

u = map(foo, open(readlines, "SMW100.asc"))
On my machine, for a sample file with 100,000 rows and 10 columns of data, 50% of which are valid numbers, your Python code takes 4.21 seconds and my Julia code takes 2.45 seconds.
This is an interesting performance problem that might be worth submitting to julia-users to get more focused feedback than SO will probably provide. At a first glance, I think you're hitting problems because (1) try/catch is just slightly slow to begin with and (2) you're using try/catch in a context where there's a very considerable amount of type uncertainty, because of lots of function calls that don't return stable types. As a result, the Julia interpreter spends its time trying to figure out the types of objects rather than doing your computation. It's a bit hard to tell exactly where the big bottlenecks are because you're doing a lot of things that are not very idiomatic in Julia. Also, you seem to be doing your computations in the global scope, where Julia's compiler can't perform many meaningful optimizations due to additional type uncertainty.
Python is oddly ambiguous on the subject of whether using exceptions for control flow is good or bad. See Python using exceptions for control flow considered bad?. But even in Python, the consensus is that user code shouldn't use exceptions for control flow (although for some reason generators are allowed to do this). So basically, the simple answer is that you should not be doing that – exceptions are for exceptional situations, not for control flow. That is why almost zero effort has been put into making Julia's try/catch construct faster – you shouldn't be using it like that in the first place. Of course, we will probably get around to making it faster at some point.
That said, the onus is on us as the designers of Julia's standard library to make sure that we provide APIs that never force you to use exceptions for control flow. In this case, you need a function that allows you to try to parse something as a floating-point value and indicate whether that was possible, not by throwing an exception but by returning normal values. We don't provide such an API, so this is ultimately a shortcoming of Julia's standard library as it exists right now. I've opened an issue to discuss this API design question: https://github.com/JuliaLang/julia/issues/5704. We'll see how it pans out.

improving performance of matlab code with anonymous-function bottlenecks

I'm running into serious performance issues with anonymous functions in MATLAB 2011a, where the overhead introduced by an anonymous container function is far greater than the time taken by the enclosed function itself.
I've read a couple of related questions in which users have helpfully explained that this is a problem that others experience, showing that I could increase performance dramatically by doing away with the anonymous containers. Unfortunately, my code is structured in such a way that I'm not sure how to do that without breaking a lot of things.
So, are there workarounds to improve performance of anonymous functions without doing away with them entirely, or design patterns that would allow me to do away with them without bloating my code and spending a lot of time refactoring?
Some details that might help:
Below is the collection of anonymous functions, which are stored as a class property. Using an int array which is in turn used by a switch statement could replace the array in principle, but the content of GPs is subject to change -- there are other functions with the same argument structure as traingp that could be used there -- and GPs' contents may in some cases be determined at runtime.
m3.GPs = {@(X,ytrain,xStar,noisevar,params) traingp(X,ytrain,xStar,noisevar,1,params,[1 0]');
          @(X,ytrain,xStar,noisevar,params) traingp(X,ytrain,xStar,noisevar,1,params,[-1 1]');
          @(X,ytrain,xStar,noisevar,params) traingp(X,ytrain,xStar,noisevar,2,params,0);
          @(X,ytrain,xStar,noisevar,params) traingp(X,ytrain,xStar,noisevar,3,params,0);
          @(X,ytrain,xStar,noisevar,params) traingp(X,ytrain,xStar,noisevar,4,params,[0 0 0]')};
Later, elements of GPs are called by a member function of the class, like so:
GPt = GPs{t(j)}(xj,yj,gridX(xi),thetaT(1),thetaT(2:end));
According to the profiler, the self-time for the anonymous wrapper takes 95% of the total time (1.7 seconds for 44 calls!), versus 5% for the contained function. I'm using a similar approach elsewhere, where the anonymous wrapper's cost is even greater, proportionally speaking.
Does anyone have any thoughts on how to reduce the overhead of the anonymous calls, or, absent that, how to replace the anonymous functions while retaining the flexibility they provide (and without introducing a bunch of additional bookkeeping and argument passing)?
Thanks!
It all comes down to how much pain you are willing to endure to improve performance. Here's one trick that avoids anonymous functions; I don't know how it will profile for you. You can put these "tiny" functions at the end of class files, I believe (I know you can put them at the end of regular function files).
function [output] = GP1(X,ytrain,xStar,noisevar,params)
    output = traingp(X,ytrain,xStar,noisevar,1,params,[1 0]');
end
...
m3.GPs = {@GP1, @GP2, ...};
Perhaps a function "factory" would help:
>> factory = @(a,b,c) @(x,y,z) a*x+b*y+c*z;
>> f1 = factory(1,2,3);
>> f2 = factory(0,1,2);
>> f1(1,2,3)
ans =
14
>> f1(4,5,6)
ans =
32
>> f2(1,2,3)
ans =
8
>> f2(4,5,6)
ans =
17
Here, factory is a function that returns a new function with different arguments bound. Another example could be:
factory = @(a,b,c) @(x,y,z) some_function(x,y,z,a,b,c)
which returns a function of x,y,z with a,b,c specified.
