For loops run SUPER slow in Julia when outside a function - for-loop

There is a very peculiar slow down in Julia. When running, for example, a for loop by calling a function
function TestFunc(num)
for i=1:num
end
end
It is MUCH faster than when I just run a for loop for the exact same num ...
for i=1:num
end
The slow down isn't marginal either, it is magnitudes slower, the following image shows me running it.
For Loop Code
In some of my other code, the opposite actually happens but I just feel like I am missing something fundamental about the way Julia runs. How do I keep my code optimal and why do these differences exist?

Anything you can write outside a function, you can write inside a function. So just like in C, you can write
function main()
print("Hello World\n")
end
main()
So just pretend it is a C program and write your stuff inside the main() function.
Why is it so slow outside a function, it is because any variable inside a function is protected from being changed by another thread or task. So for a for loop in the global scope must check its variables for its type everytime it is access by the for loop, just in case it was change by another thread or task. All these checking is slowing it down FOR SAFETY.
The first Performance Law of Julia is
Global is slow
The performance tips in the Julia Documentation says
A global variable might have its value, and therefore its type, change at any point. This makes it difficult for the compiler to optimize code using global variables. Variables should be local, or passed as arguments to functions, whenever possible.
Any code that is performance critical or being benchmarked should be inside a function.

Related

"defer" for whole package in GoLang [duplicate]

I know you can define functions called init in any package, and these function will be executed before main. I use this to open my log file and my DB connection.
Is there a way to define code that will be executed when the program ends, either because it reaches the end of the main function or because it was interrupted ? The only way I can think of is by manually calling a deffered terminate function on each package used by main, but that's quite verbose and error prone.
The C atexit functionality was considered by the Go developers and the idea of adopting it was rejected.
From one of the related thread at golang-nuts:
Russ Cox:
Atexit may make sense in single-threaded, short-lived
programs, but I am skeptical that it has a place in a
long-running multi-threaded server.
I've seen many C++ programs that hang on exit because
they're running global destructors that don't really need to
run, and those destructors are cleaning up and freeing
memory that would be reclaimed by the operating system
anyway, if only the program could get to the exit system call.
Compared to all that pain, needing to call Flush when you're
one with a buffer seems entirely reasonable and is
necessary anyway for correct execution of long-running
programs.
Even ignoring that problem, atexit introduces even more
threads of control, and you have to answer questions like
do all the other goroutines stop before the atexit handlers
run? If not, how do they avoid interfering? If so, what if
one holds a lock that the handler needs? And on and on.
I'm not at all inclined to add Atexit.
Ian Lance Taylor:
The only fully reliable mechanism is a wrapper program that invokes the
real program and does the cleanup when the real program completes. That
is true in any language, not just Go.
In my somewhat unformed opinion, os.AtExit is not a great idea. It is
an unstructured facility that causes stuff to happen at program exit
time in an unpredictable order. It leads to weird scenarios like
programs that take a long time just to exit, an operation that should be
very fast. It also leads to weird functions like the C function _exit,
which more or less means exit-but-don't-run-atexit-functions.
That said, I think a special exit function corresponding to the init
function is an interesting idea. It would have the structure that
os.AtExit lacks (namely, exit functions are run in reverse order of when
init functions are run).
But exit functions won't help you if your program gets killed by the
kernel, or crashes because you call some C code that gets a segmentation
violation.
In general, I agree with jnml's answer. Should you however still want to do it, you could use defer in the main() function, like this: http://play.golang.org/p/aUdFXHtFOM.

First call to Julia is slow

This questions deals with first load performance of Julia
I am running a Julia program from command line. The program is intended to be an API where user of the API doesn't have to initialize internal objects.
module Internal
type X
c::T1
d::T2
.
..
end
function do_something(a::X, arg:Any)
#some function
end
export do_something
end
API.jl
using Internal
const i_of_X = X()
function wrapper_do_something(args::Any)
do_something(i_of_X, args)
end
Now, this API.jl is exposed to third party user so that they don't have to bother about instantiating the internal objects. However, API.jl is not a module and hence cannot be precompiled. As there are many functions in API.jl, the first load takes a "very" long time.
Is there a way to improve the performance? I tried wrapping API.jl in a module too but I don't know if wrapping const initialized variables in a module is the way to go. I also get segmentation fault on doing so (some of the const are database connections and database collections along with other complex objects).
I am using v0.5 on OSX
[EDIT]
I did wrap API.jl in a module but there is no performance improvement.
I digged deeper and a big performance hit comes from the first call to linear regression function (GLM module based OLS lm(y~X, df)). The df has only 2 columns and 3 rows so it's not the run time issues but compilation slowness.
The other big hit comes from calling a highly overloaded function. The overloaded function fetches data from the database and can accept variety of input formats.
Is there a way to speed these up? Is there a way to fully precompile the julia program?
For a little more background, API based program is called once via command-line and any persistent first compilation advantages are lost as command-line closes the Julia process.
$julia run_api_based_main_func.jl
One hacky way to use the compilation benefits is to somehow copy/paste the code in already active julia process. Is this doable/recommended?(I am desperate to make it fast. Waiting 15-20s for a 2s analysis doesn't seem right)
It is OK to wrap const values in a module. They can be exported, as need be.
As Fengyang said, wrapping independent components of a larger design in modules is helpful and will help in this situation. When there is a great deal going on inside a module, the precompile time that accompanies each initial function call can add up. There is a way to avoid that -- precompile the contents before using the module:
__precompile__(true)
module ModuleName
# ...
end # module ModuleName
Please Note (from the online help):
__precompile__() should not be used in a module unless all of its dependencies are also using __precompile__(). Failure to do so can
result in a runtime error when loading the module.

Lua newbie: Alternate (Knuth) pseudo-random function performance?

I'm developing a roguelike in Lua for iOS and OSX. Pretty new to Lua and discovered to my dismay how nonrandom math.random is on my platform. I already had already setup my calls for random numbers set up through a function:
function rollD(max)
return math.random(max)
end
So I found a fantastic answer in response to this post which turns out I think will solve my problem (it's pretty critical for a roguelike that the game be different each time) But in order to make the following tweaked function:
function rollD(max)
return srandom(seedobj,1,max)
end
work, I had to make:
local seedobj = { seed = -232343 }
from Donati's Knuth adaptation not be local anymore, and then actually modified it to use (os.time()*-1). This actually works perfectly so far and my (very rudimentary) roguelike is rolling up random bad guys and dungeons just like I want it to. But I worry when things work right...
With a high number of calls to srandom (probably upwards of a thousand calls per level) am I going to take some kind of performance hit by having seedobj be global? I would like to think that, because it's nested in the table, that seed is a reference and that I'm worrying for nothing. But otherwise: is there a way I should modify this function so I can call it more efficiently?
Accessing a global variable in Lua is like accessing a table field. If seedobj is global the following code:
function rollD(max)
return srandom(seedobj,1,max)
end
in Lua 5.2 is equivalent to:
function rollD(max)
return srandom(_ENV.seedobj,1,max)
end
or in Lua 5.1 is (roughly) equivalent to
function rollD(max)
return srandom(_G.seedobj,1,max)
end
Where _ENV is the variable holding the current environment table and _G is the variable holding the global table.
Therefore whenever you call rollD you incur a small performance penalty for that indirect access, compared to a local variable. In general this penalty is significant or not depending on the complexity of the other operations performed when you call rollD.
In your specific case that penalty is unlikely to be noticeable, since the srandom implementation already performs much more intensive computations (among which some table accesses as well).

What's the most efficient way to ignore code in lua?

I have a chunk of lua code that I'd like to be able to (selectively) ignore. I don't have the option of not reading it in and sometimes I'd like it to be processed, sometimes not, so I can't just comment it out (that is, there's a whole bunch of blocks of code and I either have the option of reading none of them or reading all of them). I came up with two ways to implement this (there may well be more - I'm very much a beginner): either enclose the code in a function and then call or not call the function (and once I'm sure I'm passed the point where I would call the function, I can set it to nil to free up the memory) or enclose the code in an if ... end block. The former has slight advantages in that there are several of these blocks and using the former method makes it easier for one block to load another even if the main program didn't request it, but the latter seems the more efficient. However, not knowing much, I don't know if the efficiency saving is worth it.
So how much more efficient is:
if false then
-- a few hundred lines
end
than
throwaway = function ()
-- a few hundred lines
end
throwaway = nil -- to ensure that both methods leave me in the same state after garbage collection
?
If it depends a lot on the lua implementation, how big would the "few hundred lines" need to be to reliably spot the difference, and what sort of stuff should it include to best test (the main use of the blocks is to define a load of possibly useful functions)?
Lua's not smart enough to dump the code for the function, so you're not going to save any memory.
In terms of speed, you're talking about a different of nanoseconds which happens once per program execution. It's harming your efficiency to worry about this, which has virtually no relevance to actual performance. Write the code that you feel expresses your intent most clearly, without trying to be clever. If you run into performance issues, it's going to be a million miles away from this decision.
If you want to save memory, which is understandable on a mobile platform, you could put your conditional code in it's own module and never load it at all of not needed (if your framework supports it; e.g. MOAI does, Corona doesn't).
If there is really a lot of unused code, you can define it as a collection of Strings and loadstring() it when needed. Storing functions as strings will reduce the initial compile time, however of most functions the string representation probably takes up more memory than it's compiled form and what you save when compiling is probably not significant before a few thousand lines... Just saying.
If you put this code in a table, you could compile it transparently through a metatable for minimal performance impact on repeated calls.
Example code
local code_uncompiled = {
f = [=[
local x, y = ...;
return x+y;
]=]
}
code = setmetatable({}, {
__index = function(self, k)
self[k] = assert(loadstring(code_uncompiled[k]));
return self[k];
end
});
local ff = code.f; -- code of x gets compiled here
ff = code.f; -- no compilation here
for i=1, 1000 do
print( ff(2*i, -i) ); -- no compilation here either
print( code.f(2*i, -i) ); -- no compile either, but table access (slower)
end
The beauty of it is that this compiles as needed and you don't really have to waste another thought on it, it's just like storing a function in a table and allows for a lot of flexibility.
Another advantage of this solution is that when the amount of dynamically loaded code gets out of hand, you could transparently change it to load code from external files on demand through the __index function of the metatable. Also, you can mix compiled and uncompiled code by populating the "code" table with "real" functions.
Try the one that makes the code more legible to you first. If it runs fast enough on your target machine, use that.
If it doesn't run fast enough, try the other one.
lua can ignore multiple lines by:
function dostuff()
blabla
faaaaa
--[[
ignore this
and this
maybe this
this as well
]]--
end

Condition, Block, Module - which way is the most memory and computationally efficient?

There are always several ways to do the same thing in Mathematica. For example, when adapting WReach's solution for my recent problem I used Condition:
ClearAll[ff];
SetAttributes[ff, HoldAllComplete];
ff[expr_] /; (Unset[done]; True) :=
Internal`WithLocalSettings[Null, done = f[expr],
AbortProtect[If[! ValueQ[done], Print["Interrupt!"]]; Unset[done]]]
However, we can do the same thing with Block:
ClearAll[ff];
SetAttributes[ff, HoldAllComplete];
ff[expr_] :=
Block[{done},
Internal`WithLocalSettings[Null, done = f[expr],
AbortProtect[If[! ValueQ[done], Print["Interrupt!"]]]]]
Or with Module:
ClearAll[ff];
SetAttributes[ff, HoldAllComplete];
ff[expr_] :=
Module[{done},
Internal`WithLocalSettings[Null, done = f[expr],
AbortProtect[If[! ValueQ[done], Print["Interrupt!"]]]]]
Probably there are several other ways to do the same. Which way is the most efficient from the point of view of memory and CPU use (f may return very large arrays of data - but may return very small)?
Both Module and Block are quite efficient, so the overhead induced by them is only noticable when the body of a function whose variables you localize does very little. There are two major reasons for the overhead: scoping construct overhead (scoping constructs must analyze the code they enclose to resolve possible name conflicts and bind variables - this takes place for both Module and Block), and the overhead of creation and destruction of new symbols in a symbol table (only for Module). For this reason, Block is somewhat faster. To see how much faster, you can do a simple experiment:
In[14]:=
Clear[f,fm,fb,fmp];
f[x_]:=x;
fm[x_]:=Module[{xl = x},xl];
fb[x_]:=Block[{xl = x},xl];
Module[{xl},fmp[x_]:= xl=x]
We defined here 4 functions, with the simplest body possible - just return the argument, possibly assigned to a local variable. We can expect the effect to be most pronounced here, since the body does very little.
In[19]:= f/#Range[100000];//Timing
Out[19]= {0.063,Null}
In[20]:= fm/#Range[100000];//Timing
Out[20]= {0.343,Null}
In[21]:= fb/#Range[100000];//Timing
Out[21]= {0.172,Null}
In[22]:= fmp/#Range[100000];//Timing
Out[22]= {0.109,Null}
From these timings, we see that Block is about twice faster than Module, but that the version that uses persistent variable created by Module in the last function only once, is about twice more efficient than Block, and almost as fast as a simple function invokation (because persistent variable is only created once, and there is no scoping overhead when applying the function).
For real functions, and most of the time, the overhead of either Module or Block should not matter, so I'd use whatever is safer (usually, Module). If it does matter, one option is to use persistent local variables created by Module only once. If even this overhead is significant, I'd reconsider the design - since then obviously your function does too little.There are cases when Block is more beneficial, for example when you want to be sure that all the memory used by local variables will be automatically released (this is particularly relevant for local variables with DownValues, since they are not always garbage - collected when created by Module). Another reason to use Block is when you expect a possibility of interrupts such as exceptions or aborts, and want the local variables to automatically be reset (which Block does). By using Block, however, you risk name collisions, since it binds variables dynamically rather than lexically.
So, to summarize: in most cases, my suggestion is this: if you feel that your function has serious memory or run-time inefficiency, look elsewhere - it is very rare for scoping constructs to be the major bottleneck. Exceptions would include not garbage-collected Module variables with accumulated data, very light-weight functions used very frequently, and functions which operate on very efficient low-level structures such as packed arrays and sparse arrays, where symbolic scoping overhead may be comparable to the time it takes a function to process its data, since the body is very efficient and uses fast functions that by-pass the main evaluator.
EDIT
By combining Block and Module in the fashion suggested here:
Module[{xl}, fmbp[x_] := Block[{xl = x}, xl]]
you can have the best of both worlds: a function as fast as Block - scoped one and as safe as the one that uses Module.

Resources