Does recursive function parameter order affect performance?

Consider a frequently called recursive function that has some parameters which vary a lot between calls and some which don't, the latter representing a kind of context. For example, a tree traversal might look like this:
private void Visit(Node node, List<Node> results)
{
    if (node == null) return; // base case: fell off the bottom of the tree
    if (IsMatch(node)) {
        results.Add(node);
    }
    Visit(node.Left, results);
    Visit(node.Right, results);
}
...
Visit(root, new List<Node>());
Obviously, the results collection is created once and the same reference is used throughout all the traversal calls.
The question is, does it matter for performance whether the function is declared as Visit(Node, List<Node>) or Visit(List<Node>, Node)? Is there a convention for the arguments order?
The vague idea is that the fixed parameters might not need to be pushed onto and popped off the stack constantly, improving performance, but I'm not sure how feasible that would be.
I'm primarily interested in C# or Java, but would like to hear about any language for which the order matters.
Note: sometimes I happen to have three or four parameters overall. I realize that it is possible to create a class holding the context, or an anonymous function closing over the context variables, but the question is about plain recursion.

This is going into the deepest conjecture territory for me, since there is an explosive number of variables that could affect the outcome, ranging from the calling conventions of the platform to the characteristics of the compiler.
The question is, does it matter for performance whether the function
is declared as Visit(Node, List<Node>) or Visit(List<Node>, Node)? Is
there a convention for the arguments order?
In this kind of simple example, probably the most concise answer I can give is, "no unless proven otherwise" -- "innocent until proven guilty" sort of thing. It's highly unlikely.
But there are interesting scenarios that crop up in a more complex example like this:
API_EXPORT void f1(small a, small b, small c, small d, big e);
... vs:
API_EXPORT void f2(big a, small b, small c, small d, small e);
In this case, f1 might actually be more efficient than f2 in some cases.
That's because some calling conventions, like Win x64, allow the first four arguments to be passed directly through registers without stack spills, provided they fit into a register and make up the first four parameters of the function.
In this case, with f1, the first four small arguments might fit into registers but the fifth one (regardless of size) needs to be spilled to the stack.
With f2, the first argument might be too big to fit into a register, so it needs to be spilled to the stack along with the fifth argument. We could then end up with twice as many variables spilled to the stack and take a small performance hit (I've never actually measured a case like this, however).
This is a rare scenario even when it does occur, and it was critical for me to put that API_EXPORT specifier there: unless the function is exported, the compiler (or linker, in some cases) can do all kinds of magic, such as inlining the function, and obliterate the need to pass arguments in the precise way defined by the ABI.
In this kind of case, it might help a little bit to have parameters sorted in ascending order from smallest types that fit in registers to biggest types that don't.
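To make that concrete, here is a hedged sketch (hypothetical sizes, assuming a Win x64-style convention; the register assignments in the comments are illustrative, not guaranteed by any particular compiler):

// A "big" type: far too large for a general-purpose register.
struct Matrix4x4 { double m[16]; };

#define API_EXPORT __declspec(dllexport) // MSVC spelling, for illustration

// f1: the four small arguments can ride in RCX/RDX/R8/R9; only the
// matrix goes through memory (by copy or hidden pointer, per the ABI).
API_EXPORT void f1(int a, int b, int c, int d, Matrix4x4 e);

// f2: the big argument occupies the first parameter slot, so depending
// on the ABI it may be copied to memory and passed via a hidden pointer,
// and the trailing small argument may go to the stack as well.
API_EXPORT void f2(Matrix4x4 a, int b, int c, int d, int e);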
But even then, is List<Node> a big type or a small type? Can it fit into a general-purpose register, for example? Languages like Java and C# treat every variable of a user-defined type as a reference to garbage-collected memory, which may boil down to a simple memory address that does fit into a register. I don't know exactly what goes on under the hood, but I would suspect the reference itself fits into a register and is effectively a small type. A big type would be something like a 4x4 matrix being passed by value (as a deep copy).
Anyway, this is all case-by-case and drawing upon a lot of conjecture, but these are potential factors that might sway things one way or another in some rare scenario.
Nevertheless, for the example you cited, I'd suggest a strong "no unless proven otherwise -- highly improbable".
I realize that it is possible to create a class holding a context, or
an anonymous function closing the context variables [...].
Often languages that provide classes generate machine instructions as though there is one extra implicit parameter in a function. For example:
class Foo
{
public:
    void f();
};
... might translate to something analogous to:
void f(Foo* this);
It's worth keeping that in mind when it comes to thinking about calling conventions, aliasing, and how parameters are passed through registers and stack.
... but the question is about plain recursion
Recursion probably wouldn't affect this so much, except that there might come a point where the compiler ceases to inline recursive functions, e.g. after a few levels, at which point we inevitably pay the full cost required by the ABI. But otherwise it's mostly the same: a function call is a function call whether it's recursive or not, and the same concerns tend to apply to both. The potentially relevant factors here have much more to do with the compiler and platform ABI than with whether a function is being called recursively or non-recursively.

Related

R: Passing a data frame by reference

R has pass-by-value semantics, which minimizes accidental side effects (a good thing). However, when code is organized into many functions/methods for reusability/readability/maintainability, and when that code needs to push large data structures (e.g. big data frames) through a series of transformations/operations, the pass-by-value semantics leads to a lot of copying of data and much heap thrashing (a bad thing). For example, a data frame that takes 50Mb on the heap and is passed as a function parameter will be copied at a minimum as many times as the call stack is deep, so the heap footprint at the bottom of a call stack of depth N will be N*50Mb. If the functions return a transformed/modified data frame from deep in the call chain, the copying goes up by another N.
The SO question What is the best way to avoid passing a data frame around? touches this topic, but is phrased in a way that avoids directly asking the pass-by-reference question, and the winning answer basically says, "yes, pass-by-value is how R works". That's not actually 100% accurate. R environments enable pass-by-reference semantics, and OO frameworks such as proto use this capability extensively. For example, when a proto object is passed as a function argument, its "magic wrapper" is passed by value, but to the R developer the semantics are pass-by-reference.
It seems that passing a big data frame by reference would be a common problem and I'm wondering how others have approached it and whether there are any libraries that enable this. In my searching I have not discovered one.
If nothing is available, my approach would be to create a proto object that wraps a data frame. I would appreciate pointers about the syntactic sugar that should be added to this object to make it useful, e.g., overloading the $ and [[ operators, as well as any gotchas I should look out for. I'm not an R expert.
Bonus points for a type-agnostic pass-by-reference solution that integrates nicely with R, though my needs are exclusively with data frames.
The premise of the question is (partly) incorrect. R works as pass-by-promise, and repeated copying in the manner you outline occurs only when further assignments and alterations to the data frame are made as the promise is passed on. So the number of copies will not be N*size where N is the stack depth, but rather N*size where N is the number of levels at which assignments are made. You are correct, however, that environments can be useful. I see from following the link that you have already found the 'proto' package. There is also the relatively recent introduction of "reference classes", sometimes referred to as "R5"; S3 was the original class system of S, copied in R, and S4 is the more recent class system that largely supports BioConductor package development.
Here is a link to an example by Steve Lianoglou (in a thread discussing the merits of reference classes) of embedding an environment inside an S4 object to avoid the copying costs:
https://stat.ethz.ch/pipermail/r-help/2011-September/289987.html
Matthew Dowle's 'data.table' package creates a new class of data object whose access semantics for "[" differ from those of regular R data.frames, and which really does work as pass-by-reference. It has superior speed of access and processing. It can also fall back on data.frame semantics, since such objects now inherit the 'data.frame' class.
You may also want to investigate Hesterberg's dataframe package.

How many lines should a function have at most?

Is there a good coding technique that specifies how many lines a function should have?
No. Lines of code is a pretty bad metric for just about anything. The exception is perhaps functions that have thousands and thousands of lines - you can be pretty sure those aren't well written.
There are however, good coding techniques that usually result in fewer lines of code per function. Things like DRY (Don't Repeat Yourself) and the Unix-philosophy ("Write programs that do one thing and do it well. Write programs to work together. Write programs to handle text streams, because that is a universal interface." from Wikipedia). In this case replace "programs" with "functions".
I don't think it matters. Who is to say that once a function's length passes a certain number of lines it breaks a rule?
In general, just write clean functions that are easy to use and reuse.
A function should have a well-defined purpose. That is, try to create functions which do a single thing, either by doing the thing itself or by delegating the work to a number of other functions.
Most functional compilers are excellent at inlining. Thus there is no inherent price to pay for breaking up your code: the compiler usually does a good job of deciding whether a function call should really be one or whether it can just inline the code right away.
The size of the function is less relevant, though most functions in FP tend to be small, precise and to the point.
There is McCabe's cyclomatic complexity metric, which you can read about in this Wikipedia article.
The metric measures how many tests and loops are present in a routine. A rule of thumb might be that under 10 is a manageable amount of complexity, while over 11 becomes more fault-prone.
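As a rough illustration of how the decision points add up (exact counting rules vary between tools, so treat the tally as a sketch):

// Cyclomatic complexity is roughly the number of decision points plus one.
// Each if, loop, ternary, and && / || adds a branch.
int classify(int x, int y)
{
    int score = 0;
    if (x > 0) score += 1;               // +1
    if (y > 0 && x != y) score += 2;     // +2 (the if and the &&)
    for (int i = 0; i < x; ++i)          // +1
        score += (i % 2 != 0) ? 1 : 0;   // +1 (ternary)
    return score;                        // total: 1 + 5 = 6
}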
I have seen horrendous code that had a Complexity metric above 50. (It was error-prone and difficult to understand or change.) Re-writing it and breaking it down into subroutines reduced the complexity to 8.
Note that the complexity metric is usually proportional to the lines of code, but it gives you a measure of complexity rather than of raw length.
When working in Forth (or playing in Factor) I tend to continually refactor until each function is a single line! In fact, if you browse through the Factor libraries you'll see that the majority of words are one-liners and almost nothing is more than a few lines. In a language with inner functions and virtually zero cost for calls (that is, threaded code implicitly having no stack frames [only a return-pointer stack], or with aggressive inlining) there is no good reason not to refactor until each function is tiny.
In my experience, a function with a lot of lines of code (more than a few pages) is a nightmare to maintain and test. But having said that, I don't think there is a hard-and-fast rule for this.
I came across some VB.NET code at my previous company that had one function of 13 pages, but my record is some VB6 code I have just picked up that is approximately 40 pages! Imagine trying to work out which If statement an Else belongs to when they are pages apart on the screen.
The main argument against having functions that are "too long" is that subdividing the function into smaller functions that only do small parts of the entire job improves readability (by giving those small parts actual names, and helping the reader wrap his mind around smaller pieces of behavior, especially when line 1532 can change the value of a variable on line 45).
In a functional programming language, this point is moot:
You can subdivide a function into smaller functions that are defined within the larger function's body, thus not reducing the length of the original function.
Functions are expected to be pure, so there's no actual risk of line X changing the value read on line Y: the value of the line Y variable can be traced back up the definition list quite easily, even in loops, conditionals or recursive functions.
So, I suspect the answer would be "no one really cares".
I think a long function is a red flag and deserves more scrutiny. If I came across a function that was more than a page or two long during a code review I would look for ways to break it down into smaller functions.
There are exceptions though. A long function that consists of mostly simple assignment statements, say for initialization, is probably best left intact.
My (admittedly crude) guideline is a screenful of code. I have seen code with functions going on for pages. This is emetic, to be charitable. Functions should have a single, focused purpose. If you are trying to do something complex, have a "captain" function call helpers.
Good modularization makes friends and influences people.
IMHO, the goal should be to minimize the amount of code that a programmer would have to analyze simultaneously to make sense of a program. In general, excessively-long methods will make code harder to digest because programmers will have to look at much of their code at once.
On the other hand, subdividing methods into smaller pieces will only be helpful if those smaller pieces can be analyzed separately from the code which calls them. Splitting a method into sub-methods which would only be meaningful in the context where they are called is apt to impair rather than improve legibility. Even if the method would have been over 250 lines before splitting, breaking it into ten pieces which don't make sense in isolation would simply increase the simultaneous-analysis requirement from 250 lines to 300+ (depending upon how many lines are added for method headers, the code that calls them, etc.). When deciding whether a method should be subdivided, it's far more important to consider whether the pieces make sense in isolation than to consider whether the method is "too long". A 20-line routine might benefit from being split into two ten-line routines and a two-line routine that calls them, while a 250-line routine might benefit from being left exactly as it is.
Another point which needs to be considered, btw, is that in some cases the required behavior of a program may not be a good fit with the control structures available in the language it's written in. Most applications have large "don't-care" aspects of their behavior, and it's generally possible to assign behavior that will fit nicely with a language's available control structures, but sometimes behavioral requirements may be impossible to meet without awkward code. In some such cases, confining the awkwardness to a single method which is bloated, but which is structured around the behavioral requirements, may be better than scattering it among many smaller methods which have no clear relationship to the overall behavior.

Inlining Algorithm

Does anyone know of any papers discussing inlining algorithms? And, closely related, the relationship of the parent-child graph to the call graph.
Background: I have a compiler written in OCaml which aggressively inlines functions; primarily as a result of this and some other optimisations, it generates faster code for my programming language than most others in many circumstances (including even C).
Problem #1: The algorithm has trouble with recursion. My rule here is to inline only children into parents, to prevent infinite recursion, but this precludes sibling functions from being inlined into each other even once.
Problem #2: I do not know of a simple way to optimise inlining operations. My algorithm is imperative, with a mutable representation of function bodies, because it does not seem even remotely possible to make an efficient functional inlining algorithm. If the call graph is a tree, it is clear that bottom-up inlining is optimal.
Technical information: Inlining consists of a number of inlining steps. The problem is the ordering of the steps.
Each step works as follows:

1. Make a copy of the function to be inlined and beta-reduce it by replacing both type parameters and value parameters with the arguments.
2. Replace each return statement with an assignment to a new variable followed by a jump to the end of the function body.
3. Replace the original call to the function with this body.
4. However, we're not finished: we must also clone all the children of the function, beta-reducing them as well, and reparent the clones to the calling function.
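A minimal sketch of what one such step might look like over a toy three-address IR (every name here is a hypothetical stand-in; the real compiler is OCaml and works on a much richer representation):

#include <map>
#include <string>
#include <vector>

struct Instr {
    enum Kind { Assign, Call, Return, Jump, Label } kind;
    std::string dst;                 // assignment target / call result
    std::string fn;                  // callee name, for Call
    std::vector<std::string> args;   // operands
};

struct Function {
    std::string name;
    std::vector<std::string> params;
    std::vector<Instr> body;
};

// Replace a parameter name with its argument (the beta-reduction part).
static std::string subst(const std::map<std::string, std::string>& env,
                         const std::string& v) {
    auto it = env.find(v);
    return it == env.end() ? v : it->second;
}

// Inline one call: splice a substituted copy of the callee's body in
// place of the call, turning `return e` into `result = e; jump end`.
// Assumes the call's arity matches the callee's parameter list.
std::vector<Instr> inlineCall(const Instr& call, const Function& callee,
                              const std::string& resultVar,
                              const std::string& endLabel) {
    std::map<std::string, std::string> env;
    for (size_t i = 0; i < callee.params.size(); ++i)
        env[callee.params[i]] = call.args[i];

    std::vector<Instr> out;
    for (Instr ins : callee.body) {              // each instruction is copied
        ins.dst = subst(env, ins.dst);
        for (auto& a : ins.args) a = subst(env, a);
        if (ins.kind == Instr::Return) {
            out.push_back({Instr::Assign, resultVar, "", ins.args});
            out.push_back({Instr::Jump, "", "", {endLabel}});
        } else {
            out.push_back(ins);
        }
    }
    out.push_back({Instr::Label, "", "", {endLabel}});  // jump target
    return out;
}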
The cloning operation makes it extremely hard to inline recursive functions. The usual trick of keeping a list of what is already in progress and just checking to see if we're already processing this call does not work in naive form because the recursive call is now moved into the beta-reduced code being stuffed into the calling function, and the recursion target may have changed to a cloned child. However that child, in calling the parent, is still calling the original parent which calls its child, and now the unrolling of the recursion will not stop. As mentioned I broke this regress by only allowing inlining a recursive call to a child, preventing sibling recursions being inlined.
The cost of inlining is further complicated by the need to garbage collect unused functions. Since inlining is potentially exponential, this is essential. If all the calls to a function are inlined, we should get rid of the function if it has not been inlined into yet, otherwise we'll waste time inlining into a function which is no longer used. Actually keeping track of who calls what is extremely difficult, because when inlining we're not working with an actual function representation, but an "unravelled" one: for example, the list of instructions is being processed sequentially and a new list built up, and at any one point in time there may not be a coherent instruction list.
In his ML compiler, Stephen Weeks chose to use a number of small optimisations applied repeatedly, since this made the optimisations easy to write and easy to control, but unfortunately this misses a lot of optimisation opportunities compared to a recursive algorithm.
Problem #3: when is it safe to inline a function call?
To explain this problem generically: in a lazy functional language, arguments are wrapped in closures and then we can inline an application; this is the standard model for Haskell. However, it also explains why Haskell is so slow. The closures are not required if the argument is known: the parameter can then be replaced directly with its argument wherever it occurs (this is normal-order beta-reduction).
However, if the argument's evaluation is known to terminate, eager evaluation can be used instead: the parameter is assigned the value of the expression once, and then reused. A hybrid of these two techniques is to use a closure but cache the result inside the closure object. Still, GHC hasn't succeeded in producing very efficient code: it is clearly very difficult, especially with separate compilation.
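That hybrid strategy is essentially call-by-need. A minimal sketch (hypothetical names, not GHC's actual machinery):

#include <functional>
#include <optional>
#include <utility>

// Wrap the argument in a closure, but cache the result after the first
// evaluation, so the expression runs at most once.
template <typename T>
class Lazy {
    std::function<T()> thunk;   // the unevaluated argument expression
    std::optional<T> cached;    // memoised result
public:
    explicit Lazy(std::function<T()> f) : thunk(std::move(f)) {}
    const T& get() {
        if (!cached) cached = thunk();   // evaluate on first use only
        return *cached;
    }
};

// Usage: Lazy<int> arg([] { return 2 + 2; });
// arg.get() forces the thunk once; later calls reuse the cached value.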
In Felix, I took the opposite approach. Instead of demanding correctness and gradually improving efficiency by proving that optimisations preserve semantics, I mandate that the optimisation defines the semantics. This guarantees correct operation of the optimiser at the expense of uncertainty about how certain code will behave. The idea is to provide ways for the programmer to force the optimiser to conform to the intended semantics if the default optimisation strategy is too aggressive.
For example, the default parameter passing mode allows the compiler to chose whether to wrap the argument in a closure, replace the parameter with the argument, or assign the argument to the parameter. If the programmer wants to force a closure, they can just pass in a closure. If the programmer wants to force eager evaluation, they mark the parameter var.
The complexity here is much greater than a functional programming language: Felix is a procedural language with variables and pointers. It also has Haskell style typeclasses. This makes the inlining routine extremely complex, for example, type-class instances replace abstract functions whenever possible (due to type specialisation when calling a polymorphic function, it may be possible to find an instance whilst inlining, so now we have a new function we can inline).
Just to be clear, I have to add some more notes.
Inlining and several other optimisations -- such as user-defined term reductions, typeclass instantiations, linear data-flow checks for variable elimination, and tail-rec optimisation -- are all done at once on a given function.
The ordering problem isn't the order in which to apply the different optimisations; the problem is the order of the functions.
I use a brain-dead algorithm to detect recursion: I build up a list of everything used directly by each function, take the transitive closure, and then check whether the function is in the result. Note that the usage set is built up many times during optimisation, and this is a serious bottleneck.
Unfortunately, whether a function is recursive or not can change. A recursive function might become non-recursive after tail-rec optimisation. But there is a much harder case: instantiating a typeclass "virtual" function can make something that appeared to be non-recursive recursive.
As to sibling calls, the problem is that given f and g, where f calls g and g calls f, I actually want to inline f into g, and g into f .. once. My rule for stopping the infinite regress is to allow inlining f into g, when they're mutually recursive, only if f is a child of g, which excludes inlining siblings.
Basically I want to "flatten out" all code "as much as possible".
I realize you probably already know all this, but it seems important to still write it in full, at least for further reference.
In the functional community, there is some literature, mostly from the GHC people. Note that they consider inlining a transformation in the source language, while you seem to work at a lower level. Working in the source language -- or an intermediate language of reasonably similar semantics -- is, I believe, a big help for simplicity and correctness.
GHC Wiki : Inlining (contains a bibliography)
Secrets of the Glasgow Haskell inliner
On the question of the ordering of compiler passes, this is quite arcane. Still in a Haskell setting, there is the Compilation by Transformation in a Non-strict Functional Language PhD thesis, which discusses the ordering of different compiler passes (and also inlining).
There is also the quite recent paper on Compilation by Equality Saturation, which proposes a novel approach to ordering optimisation passes. I'm not sure it has yet demonstrated applicability at a large scale, but it's certainly an interesting direction to explore.
As for the recursion case, you could use Tarjan's algorithm on your call graph to detect clusters of circular dependencies (strongly connected components) and exclude them from inlining. Sibling calls that aren't mutually recursive are unaffected.
http://en.wikipedia.org/wiki/Tarjan%27s_strongly_connected_components_algorithm
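A hedged sketch of that idea (hypothetical representation: functions are nodes 0..n-1, and calls[u] lists u's callees; any SCC with more than one node, or a node that lists itself as a callee, is a recursive cluster to exclude from inlining):

#include <algorithm>
#include <vector>

struct Tarjan {
    const std::vector<std::vector<int>>& calls;  // adjacency: u -> callees
    std::vector<int> index, low;
    std::vector<bool> onStack;
    std::vector<int> stack;
    std::vector<std::vector<int>> sccs;          // result: one vector per SCC
    int counter = 0;

    explicit Tarjan(const std::vector<std::vector<int>>& g)
        : calls(g), index(g.size(), -1), low(g.size(), 0),
          onStack(g.size(), false) {
        for (int v = 0; v < (int)g.size(); ++v)
            if (index[v] == -1) strongConnect(v);
    }

    void strongConnect(int v) {
        index[v] = low[v] = counter++;
        stack.push_back(v);
        onStack[v] = true;
        for (int w : calls[v]) {
            if (index[w] == -1) {                // tree edge: recurse
                strongConnect(w);
                low[v] = std::min(low[v], low[w]);
            } else if (onStack[w]) {             // back edge into current SCC
                low[v] = std::min(low[v], index[w]);
            }
        }
        if (low[v] == index[v]) {                // v is the root of an SCC
            std::vector<int> scc;
            int w;
            do {
                w = stack.back(); stack.pop_back();
                onStack[w] = false;
                scc.push_back(w);
            } while (w != v);
            sccs.push_back(std::move(scc));
        }
    }
};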

Game loop performance and component approach

I have an idea for organising a game loop, but I have some doubts about performance. Maybe there are better ways of doing things.
Consider you have an array of game components. They all are called to do some stuff at every game loop iteration. For example:
GameData data; // shared
app.registerComponent("AI", ComponentAI(data));
app.registerComponent("Logic", ComponentGameLogic(data));
app.registerComponent("2d", Component2d(data));
app.registerComponent("Menu", ComponentMenu(data))->setActive(false);
//...
while (ok)
{
    //...
    app.runAllComponents();
    //...
}
Benefits:
good component-based application, no dependencies, good modularity
we can activate/deactivate, register/unregister components dynamically
some components can be transparently removed or replaced and the system will keep working as if nothing had happened (change 2d to 3d); good for team work: every programmer creates his/her own components and does not need the other components to compile the code
Doubts:
inner loop in the game loop with virtual calls to Component::run()
I would like Component::run() to return a bool and to check this value: if it returns false, the component must be deactivated. So the inner loop becomes more expensive.
Well, how good is this solution? Have you used it in real projects?
Some C++ programmers have way too many fears about the overhead of virtual functions. The overhead of the virtual call is usually negligible compared to whatever the function does. A boolean check is not very expensive either.
Do whatever results in the easiest-to-maintain code. Optimize later only if you need to do so. If you do find you need to optimize, eliminating virtual calls will probably not be the optimization you need.
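For what it's worth, the bool check amounts to one branch per component. A minimal sketch of the loop in question (all names hypothetical):

#include <memory>
#include <vector>

struct Component {
    virtual ~Component() = default;
    virtual bool run() = 0;   // false => deactivate this component
};

void runAllComponents(std::vector<std::unique_ptr<Component>>& active)
{
    for (auto it = active.begin(); it != active.end(); ) {
        if ((*it)->run())
            ++it;                     // still active: keep it
        else
            it = active.erase(it);    // deactivated: drop it from the loop
    }
}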
In most "real" games, there are pretty strict requirements for interdependencies between components, and ordering does matter.
This may or may not affect you, but it's often important to have physics take effect before (or after) user-interaction processing, depending on your scenario, etc. In such situations, you may need some extra processing to get the ordering correct.
Also, since you're most likely going to have some form of scene graph or spatial partitioning, you'll want to make sure your "components" can take advantage of that, as well. This probably means that, given your current description, you'd be walking your tree too many times. Again, though, this could be worked around via design decisions. That being said, some components may only be interested in certain portions of the spatial partition, and again, you'd want to design appropriately.
I used a similar approach in a modular synthesized audio file generator.
I seem to recall noticing that, after programming 100 different modules, there was an impact on performance as new modules were coded in.
On the whole, though, I felt it was a good approach.
Maybe I'm oldschool, but I really don't see the value in generic components because I don't see them being swapped out at runtime.
struct GameObject
{
    Ai* ai;
    Transform* transform;
    Renderable* renderable;
    Collision* collision;
    Health* health;
};
This works for everything from the player to enemies to skyboxes and triggers; just leave the "components" that you don't need in your given object NULL. You want to put all of the AIs into a list? Then just do that at construction time. With polymorphism you can bolt all sorts of different behaviors in there (e.g. the player's "AI" is translating the controller input), and beyond this there's no need for a generic base class for everything. What would it do, anyway?
Your "update everything" would have to explicitly call out each of the lists, but that doesn't change the amount of typing you have to do, it just moves it. Instead of obfuscatorily setting up the set of sets that need global operations, you're explicitly enumerating the sets that need the operations at the time the operations are done.
IMHO, it's not that virtual calls are slow. It's that a game entity's "components" are not homogeneous. They all do vastly different things, so it makes sense to treat them differently. Indeed, there is no overlap between them, so again I ask: what's the point of a base class if you can't use a pointer to that base class in any meaningful way without casting it to something else?

Why is determining if a function is pure difficult?

I was at the StackOverflow Dev Days convention yesterday, and one of the speakers was talking about Python. He showed a Memoize function, and I asked if there was any way to keep it from being used on a non-pure function. He said no, that's basically impossible, and if someone could figure out a way to do it it would make a great PhD thesis.
That sort of confused me, because it doesn't seem all that difficult for a compiler/interpreter to solve recursively. In pseudocode:
function isPure(functionMetadata): boolean;
begin
    result = true;
    for each variable in functionMetadata.variablesModified
        result = result and variable.isLocalToThisFunction;
    for each dependency in functionMetadata.functionsCalled
        result = result and isPure(dependency);
end;
That's the basic idea. Obviously you'd need some sort of check to prevent infinite recursion on mutually-dependent functions, but that's not too difficult to set up.
Higher-order functions that take function pointers might be problematic, since they can't be verified statically, but my original question presupposes that the compiler has some sort of language constraint to designate that only a pure function pointer can be passed to a certain parameter. If one existed, that could be used to satisfy the condition.
Obviously this would be easier in a compiled language than an interpreted one, since all this number-crunching would be done before the program is executed and so wouldn't slow anything down, but I don't really see any fundamental problems that would make it impossible to evaluate.
Does anyone with a bit more knowledge in this area know what I'm missing?
You also need to annotate every system call, every FFI, ...
And furthermore the tiniest 'leak' tends to leak into the whole code base.
It is not a theoretically intractable problem, but in practice it is very, very difficult to do in a fashion that doesn't make the whole system feel brittle.
As an aside, I don't think this makes a good PhD thesis; Haskell effectively already has (a version of) this, with the IO monad.
And I am sure lots of people continue to look at this "in practice". (Wild speculation: in 20 years we may have this.)
It is particularly hard in Python. Since anObject.aFunc can be changed arbitrarily at runtime, you cannot determine at compile time which function anObject.aFunc() will call, or even whether it will be a function at all.
In addition to the other excellent answers here: Your pseudocode looks only at whether a function modifies variables. But that's not really what "pure" means. "Pure" typically means something closer to "referentially transparent." In other words, the output is completely dependent on the input. So something as simple as reading the current time and making that a factor in the result (or reading from input, or reading the state of the machine, or...) makes the function non-pure without modifying any variables.
Also, you could write a "pure" function that did modify variables.
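To make both halves of that concrete, a minimal sketch:

#include <chrono>

// Impure without modifying any variable: the result depends on the
// machine's clock, not just on the inputs.
long long stampedSum(long long a, long long b) {
    using namespace std::chrono;
    auto now = steady_clock::now().time_since_epoch();
    return a + b + duration_cast<seconds>(now).count();
}

// Pure despite mutating a variable: the same input always gives the
// same output, and the mutation is purely local.
long long sumTo(long long n) {
    long long total = 0;
    for (long long i = 1; i <= n; ++i) total += i;
    return total;
}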
Here's the first thing that popped into my mind when I read your question.
Class Hierarchies
Determining whether a variable is modified includes digging through every single method called on that variable to determine whether it mutates anything. This is ... somewhat straightforward for a sealed type with non-virtual methods.
But consider virtual methods. You must find every single derived type and verify that every single override of that method does not mutate state. Determining this is simply not possible in any language/framework which allows dynamic code generation or is simply dynamic (and if it's possible, it's extremely difficult). The reason is that the set of derived types is not fixed: a new one can be generated at runtime.
Take C# as an example. There is nothing stopping me from generating a derived class at runtime which overrides that virtual method and modifies state. A static verifier would not be able to detect this type of modification, and hence could not determine whether the method was pure or not.
I think the main problem would be doing it efficiently.
The D language has pure functions, but you have to specify them yourself so the compiler knows to check them. I think that if you specify them manually, the problem becomes easier.
Deciding whether a given function is pure is, in general, reducible to deciding whether a given program will halt - and it is well known that the Halting Problem is undecidable, not merely expensive to solve.
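A sketch of the standard reduction argument (mystery is a hypothetical placeholder for any function whose termination we'd like to decide; assume it is side-effect-free apart from possibly not returning):

#include <cstdio>

void mystery();   // does it ever return? that's the Halting Problem

void f() {
    mystery();                  // if this never returns...
    std::puts("side effect!");  // ...this impure statement never runs
}
// f is pure exactly when mystery fails to halt, so a perfect purity
// checker would double as a halting-problem oracle.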
Note that the complexity depends on the language, too. For the more dynamic languages, it's possible to redefine anything at any time. For example, in Tcl
proc myproc {a b} {
    if { $a > $b } {
        return $a
    } else {
        return $b
    }
}
Every single piece of that could be modified at any time. For example:
the "if" command could be rewritten to use and update global variables
the "return" command, along the same lines, could do the same thing
there could be an execution trace on the if command such that, when "if" is used, the return command is redefined based on the inputs to the if command
Admittedly, Tcl is an extreme case; one of the most dynamic languages there is. That being said, it highlights the problem that it can be difficult to determine the purity of a function even once you've entered it.
