In any language, is it better to write code inline in the main function, even if it's a bit long, when you're only going to use it a few times, rather than splitting it into separate functions? I heard that in Python this makes the code faster. Is that also true in JavaScript or in other languages?
Example:
function main(){
  blahblahblah1();
  blahblahblah2();
  blahblahblah3();
  blahblahblah1();
  blahblahblah2();
  blahblahblah3();
}
setInterval(main,1);
Is it better to group the code (blahblahblah) to make it look like this:
function blahblahblah(){
  blahblahblah1();
  blahblahblah2();
  blahblahblah3();
}
function main(){
  blahblahblah();
}
setInterval(main,1);
The performance impact of this decision is too small to measure in any language I'm aware of - the interpreter/compiler optimizes this away. The only cases where this may matter are if you're writing software for extremely constrained devices like embedded systems (but you wouldn't be doing that in Python or JavaScript), or software with extreme performance requirements like graphics engines for video games (but again, you wouldn't be using Python or JavaScript for that).
Instead of optimizing for performance, I'd encourage you to optimize for readability, testability and bug resistance. Your second example is not equivalent to the first (it only executes the functions once, the first example executes them twice). Assuming you meant to have them equivalent, the second example is a little more readable, because the logic of executing the three functions is abstracted into a higher-level function.
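For reference, an equivalent grouped version would just call the helper twice (the same snippet as above, with one extra call):
function blahblahblah(){
  blahblahblah1();
  blahblahblah2();
  blahblahblah3();
}
function main(){
  blahblahblah();
  blahblahblah();
}
setInterval(main,1);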
If you use a piece of logic only once in your code, you don't need to create a function - the compiler/interpreter will optimize the code anyway. If you plan to reuse the logic, then create a function - it makes the code smaller and easier for you to maintain. Even if it's minimally slower because of the extra function lookup and call, those performance effects are minimal...
Related
I'm going to test the performance of some math functions like pow, exp, log, etc. Is there any reliable test data for that?
Since those functions are highly optimized in existing modern system libraries like glibm or in OpenJDK, generic random inputs may lead to quick convergence or trigger some short path.
Call them in a loop to test throughput or latency, depending on whether the input to the next call depends on the output of the previous or not. Probably with data from a small to medium sized array of random values for the throughput test.
You want your compiler to produce an asm loop that does minimal work beyond the function call, so use appropriate techniques for whatever language and compiler you choose. (Idiomatic way of performance evaluation?)
You might disassemble or single-step through their execution to look for data-dependent branching to figure out which ranges of inputs might be faster or slower. (Or for open-source math libraries like glibc, the commented source could be good to look at.)
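To make the throughput-versus-latency distinction concrete, here is a minimal sketch in JavaScript (the language used elsewhere on this page), using Node's perf_hooks timer; the same loop structure applies in C or Java against glibm or OpenJDK's Math:
const { performance } = require('perf_hooks');
const N = 1000000;
// small array of random inputs for the throughput test, as suggested above
const inputs = Float64Array.from({ length: 4096 }, () => Math.random() * 100 + 1);

// Throughput: each call is independent of the previous result.
let t0 = performance.now();
let sum = 0;
for (let i = 0; i < N; i++) {
  sum += Math.log(inputs[i & 4095]);
}
const throughputMs = performance.now() - t0;

// Latency: each input depends on the previous output, so the calls form a serial chain.
t0 = performance.now();
let x = 1.5;
for (let i = 0; i < N; i++) {
  x = Math.exp(Math.log(x));
}
const latencyMs = performance.now() - t0;

console.log({ throughputMs, latencyMs, sum, x }); // use the results so the loops aren't optimized away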
I'm working on a large performance-critical project that is very branch heavy. In the process of designing algorithms for this product, my employer often reminds me to write code that is more "human logical", or written in a manner that more closely aligns with the way we logically think.
While this makes sense to me from a few different perspectives (e.g. ease of understanding/remembering, code maintenance, etc.), I'm also wondering whether this approach could also ever be expected to lead to a more optimized compiled output.
Could this be the case due to the fact that compilers are written by humans, and optimizers are often designed to recognize familiar code blocks?
I would love to hear some thoughts on why this could or could not be the case.
Consider two different kinds of code, library code and application code.
Library code (like a string class library) is likely to own the program counter a lot of the time, like this:
while (some test) {
  // massage some data, while seldom calling sub-functions
}
That kind of code will benefit from compiler optimization.
(So to answer your question, people write benchmark functions like this, and the compiler-writers use those as test cases.)
On the other hand, application code tends to look like this:
if (some test) {
  // do a bunch of things, including many function calls
} else if (some other test) {
  // do a bunch of things, including many function calls
} else {
  // do a bunch of things, including many function calls
}
In this case, the time you save by branch prediction or cycle-shaving might be 1 time unit, say, while the "do a bunch of things" might spend from 10^2 to 10^8 time units, with or without I/O. Saving 1 unit out of 10^2 is at best a 1% win; out of 10^8 it is vanishingly small.
So the benefit of compiler optimization of this code tends to be completely lost in the noise.
That's not to say it can't be optimized.
It's just that the compiler can't do it - it's your job.
If you want to make the latter kind of code run fast, the best way is to find out which lines of code are on the call stack a high percentage of the time and, if possible, find a way to avoid doing them.
(Here's an example of a 43x speedup.)
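That kind of hunting is normally done with a sampling profiler (or by the stack-inspection approach the answer describes). As a crude stand-in, here is a JavaScript sketch that just accumulates wall-clock time per wrapped function; the wrapped names are hypothetical:
const { performance } = require('perf_hooks');
const totals = {};

// Wrap a function so its total wall-clock time is accumulated under `name`.
function timed(name, fn) {
  return function (...args) {
    const t0 = performance.now();
    try {
      return fn.apply(this, args);
    } finally {
      totals[name] = (totals[name] || 0) + (performance.now() - t0);
    }
  };
}

// Hypothetical usage: wrap the suspects, run the workload, then inspect `totals`
// to see which calls dominate before deciding what work can be avoided.
// const loadData = timed('loadData', loadDataImpl);
// const render = timed('render', renderImpl);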
What is "human logical" probably varies from human to human.
For instance, if I am a newbie performing tasks according to written instructions, I will (usually) learn some tasks by heart over time, whereas for others I will keep returning to the instructions, simply because those tasks are not performed often enough, are too boring, or both. Others in the same situation may or may not function similarly, and it is not certain that the tasks they learn will be the ones I learn.
For programming it works similarly. Some may construct a loop in one manner and perform a test inside it for the sake of readability while I might do the test outside for performance reasons. What is more wrong and what is more right?
There is a widespread belief that compilers will optimize anything. This is true, but as I've written (drastically) in another post, GIGO (Garbage In = Garbage Out) applies. Compilers don't operate in a vacuum: given a set of rules, they'll perform safe optimizations on source code to the extent of their authors' imagination and competence in code optimization. Bloated source code will become optimized bloated machine code. In the same manner, lean and mean source code will become optimized lean and mean machine code. In critical places it is possible to feed the compiler source code that it "feels" (YES! they do have personalities) absolutely comfortable optimizing, and the resulting machine code will fly.
We've all experienced poorly performing software. If we're lucky we've experienced software that performs incredibly well. One developer can learn to write a piece of code that performs well in the same amount of time that another writes code that performs poorly.
In a recent conversation with a fellow programmer, I asserted that "if you're writing the same code more than once, it's probably a good idea to refactor that functionality such that it can be called once from each of those places."
My fellow programmer buddy instead insisted that the performance impact of making these function calls was not acceptable.
Now, I'm not looking for validation of who was right. I'm simply curious to know if there are situations or patterns where I should consider the performance impact of a function call before refactoring.
"My fellow programmer buddy instead insisted that the performance impact of making these function calls was not acceptable."
...to which the proper answer is "Prove it."
The old saw about premature optimization applies here. Anyone who isn't familiar with it needs to be educated before they do any more harm.
IMHO, if you don't have the attitude that you'd rather spend a couple hours writing a routine that can be used for both than 10 seconds cutting and pasting code, you don't deserve to call yourself a coder.
Don't even consider the effect of calling overhead if the code isn't in a loop that's being called millions of times, in an area where the user is likely to notice the difference. Once you've met those conditions, go ahead and profile to see if your worries are justified.
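If you do reach that point, a micro-benchmark along these lines is enough to see whether the call overhead is even visible. This is a JavaScript sketch; the loop count is arbitrary, and a JIT may well inline the trivial call entirely:
const { performance } = require('perf_hooks');

function addOne(x) { return x + 1; } // trivial helper, so the call itself is most of the cost

const N = 10000000;

let t0 = performance.now();
let a = 0;
for (let i = 0; i < N; i++) a = a + 1; // the work written inline
const inlineMs = performance.now() - t0;

t0 = performance.now();
let b = 0;
for (let i = 0; i < N; i++) b = addOne(b); // the same work behind a function call
const callMs = performance.now() - t0;

console.log({ inlineMs, callMs, a, b }); // print the results so neither loop is dead-code eliminated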
Modern compilers of languages such as Java will inline certain function calls anyway. My opinion is that the design matters far more than the few instructions spent on a function call. The only situation I can think of where this would matter is writing really fine-tuned code in assembler.
You need to ask yourself several questions:
Cost of time spent on optimizing code vs cost of throwing more hardware at it.
How does this impact maintainability?
How does going in either direction impact your deadline?
Does this really call for optimization when many modern compilers will do it for you anyway? Do not try to outsmart the compiler.
And of course, which will help you sleep better at night? :)
My bet is that there was a time in which the performance cost of a call to an external method or function WAS something to be concerned with, in the same way that the lengths of variable names and such all needed to be evaluated with respect to performance implications.
With the monumental increases in processor speed and memory resources in the last two decades, I propose that these concerns are no longer as pertinent as they once were.
We have been able use long variable names without concern for some time, and the cost of a call to external code is probably negligible in most cases.
There might be exceptions. If you place a function call within a large loop, you may see some impact, depending upon the number of iterations.
I propose that in most cases you will find that refactoring code into discrete function calls will have a negligible impact. There might be occasions in which there IS an impact; however, proper TESTING of a refactoring will reveal this. In that minority of cases, your friend might be correct. For most of the rest of the time, I propose that your friend is clinging a little too closely to practices which pre-date most modern processors and storage media.
You care about function call overhead the same time you care about any other overhead: when your performance profiling tool indicates that it's a problem.
For the C/C++ family:
The 'cost' of the call is not important. If it needs to be fast, you just have to make sure the compiler is able to inline it. That means:
the body must be visible to the compiler;
the body is small enough to be considered an inline candidate;
the method does not require dynamic dispatch.
There are a few ways to break this default ability, for example:
a huge instruction count already at the call site - even with early inlining, the compiler may pop a trivial function out of line (even though that could generate more instructions/slower execution); early inlining is the compiler's ability to inline a function early on, when it sees that the call costs more than the inlined body;
recursion.
The inline keyword is more or less useless in this era, as far as its original intent goes. However, many compilers offer a means to restore that meaning with a compiler-specific directive. Using this directive (correctly) helps considerably; learning how to use it correctly takes time. If in doubt, omit the directive and leave it up to the compiler.
Assuming you are using a modern compiler, there is no excuse to avoid the function, unless you're also willing to go down to assembly for this particular program.
As it stands, and if performance is crucial, you really have two choices:
1) Learn to write well-organized programs for speed. Downside: longer compile times.
2) Maintain a poorly written program.
I prefer 1, any day.
(Yes, I have spent a lot of time writing performance-critical programs.)
My question has Bash and PowerShell scripts in mind, but I suppose it applies to other languages as well.
It is my understanding that the purpose of a function is to perform the same (or a very similar) task multiple times. This decreases the amount of code in the script and it also makes it easier to maintain.
With that in mind, if you discover that your script only calls a function one time then there's no reason for that function to exist as a function. Instead, you should take the function's code and place it in the location where that function is being called.
Having said all that, here's my question:
If I have a complicated script, should I move each section of code into its own function even though each function will only be called once? This would greatly increase the script's readability because its logic (the functions) would all be at the top of the script and the flow of execution would be at the bottom of the script. Since 50 lines of code would be represented by just 1 line, it would be much easier to understand what the script is doing.
Do other people do this? Are there disadvantages to this approach?
Having functions also increases readability. So a bash script might look better and be easier to follow if it reads:
main() {
  getParams
  startTask
  doSomethingElse
  finishTask
}
# implement the individual functions below, then invoke main at the very end
main "$@"
Even if the function implementations are simple, it reads better.
Code readability is indeed a major concern, usually (nowadays) more important than sheer amount of code or performance. Not to mention that inlining function calls may not necessarily have noticeable performance benefits (very language specific).
So lots of developers (I venture to say that the better of the breed :-) create small functions/methods like you describe, to partition their code into logically cohesive parts.
A function does a well-defined task. If you have a mega function that does 5 different things, it strongly suggests it should be calling 5 smaller functions.
It is my understanding that the purpose of a function is to perform the same (or a very similar) task multiple times.
Well, it is my understanding that a function is a discrete entity that performs a specific, well defined task.
With that in mind, if you discover that your script calls a given function AT LEAST ONCE, then it's doing its job.
Focus on being able to read and easily understand your code.
Having clear, readable code is definitely more of a payoff than being afraid of function call overhead. That's just premature optimisation.
Plus, the goal of a function is to accomplish a particular task. A task can be a sub-task, there's nothing wrong with that!
Read this book
http://www.amazon.com/Clean-Code-Handbook-Software-Craftsmanship/dp/0132350882
Here are some quotes from the book.
"Small!
The first rule of functions is that they should be small. The second rule of functions is that
they should be smaller than that."
"FUNCTIONS SHOULD DO ONE THING. THEY SHOULD DO IT WELL. THEY SHOULD DO IT ONLY."
As far as my knowledge is concerned, a function represents a sequence of steps that become part of a larger program.
Coming to your question, I strongly agree that functions improve readability and re-usability. But at the same time, breaking everything into pieces might not be a good practice.
Finally, I want to give one statement : "Anything In Excess Is Not Beneficial!"
Is there a good coding technique that specifies how many lines a function should have ?
No. Lines of code is a pretty bad metric for just about anything. The exception is perhaps functions that have thousands and thousands of lines - you can be pretty sure those aren't well written.
There are however, good coding techniques that usually result in fewer lines of code per function. Things like DRY (Don't Repeat Yourself) and the Unix-philosophy ("Write programs that do one thing and do it well. Write programs to work together. Write programs to handle text streams, because that is a universal interface." from Wikipedia). In this case replace "programs" with "functions".
I don't think it matters; who is to say that once a function's length passes a certain number of lines it breaks a rule?
In general, just write clean functions that are easy to use and reuse.
A function should have a well-defined purpose. That is, try to create functions which do a single thing, either by doing the thing itself or by delegating work to a number of other functions.
Most functional compilers are excellent at inlining. Thus there is no inherent price to pay for breaking up your code: The compiler usually does a good job at deciding if a function call should really be one or if it can just inline the code right away.
The size of the function is less relevant though most functions in FP tend to be small, precise and to the point.
There is a McCabe metric of Cyclomatic Complexity which you might read about at this Wikipedia article.
The metric measures how many tests and loops are present in a routine. A rule of thumb might be that under 10 is a manageable amount of complexity while over 11 becomes more fault prone.
I have seen horrendous code that had a Complexity metric above 50. (It was error-prone and difficult to understand or change.) Re-writing it and breaking it down into subroutines reduced the complexity to 8.
Note the Complexity metric is usually proportional to the lines of code. It would provide you a measure on complexity rather than lines of code.
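To make the metric concrete, here is a small hypothetical JavaScript function and its count (decision points plus one):
function classify(n) {
  if (n < 0) return 'negative'; // decision 1
  if (n === 0) return 'zero'; // decision 2
  for (let i = 2; i * i <= n; i++) { // decision 3 (loop condition)
    if (n % i === 0) return 'composite'; // decision 4
  }
  return 'prime or one';
}
// 4 decision points + 1 = cyclomatic complexity of 5, comfortably under the rule-of-thumb limit of 10.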
When working in Forth (or playing in Factor) I tend to continually refactor until each function is a single line! In fact, if you browse through the Factor libraries you'll see that the majority of words are one-liners and almost nothing is more than a few lines. In a language with inner functions and virtually zero cost for calls (that is, threaded code implicitly having no stack frames [only a return pointer stack], or with aggressive inlining) there is no good reason not to refactor until each function is tiny.
From my experience a function with a lot of lines of code (more than a few pages) is a nightmare to maintain and test. But having said that I don't think there is a hard and fast rule for this.
I came across some VB.NET code at my previous company that had one function of 13 pages, but my record is some VB6 code I have just picked up that is approximately 40 pages! Imagine trying to work out which If statement an Else belongs to when they are pages apart on the screen.
The main argument against having functions that are "too long" is that subdividing the function into smaller functions that only do small parts of the entire job improves readability (by giving those small parts actual names, and helping the reader wrap his mind around smaller pieces of behavior, especially when line 1532 can change the value of a variable on line 45).
In a functional programming language, this point is moot:
You can subdivide a function into smaller functions that are defined within the larger function's body, which does not actually reduce the length of the original function.
Functions are expected to be pure, so there's no actual risk of line X changing the value read on line Y: the value of the line Y variable can be traced back up the definition list quite easily, even in loops, conditionals or recursive functions.
So, I suspect the answer would be "no one really cares".
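For illustration, here is roughly what that looks like, sketched in JavaScript rather than a functional language (a hypothetical example; in a functional language the inner definitions would also be guaranteed pure):
function totalWithTax(items, rate) {
  // smaller named pieces defined inside the larger function's body
  const subtotal = xs => xs.reduce((sum, item) => sum + item.price, 0);
  const tax = amount => amount * rate;

  const base = subtotal(items);
  return base + tax(base); // each piece can be read and checked in isolation
}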
I think a long function is a red flag and deserves more scrutiny. If I came across a function that was more than a page or two long during a code review I would look for ways to break it down into smaller functions.
There are exceptions though. A long function that consists of mostly simple assignment statements, say for initialization, is probably best left intact.
My (admittedly crude) guideline is a screenful of code. I have seen code with functions going on for pages. This is emetic, to be charitable. Functions should have a single, focused purpose. If you are trying to do something complex, have a "captain" function call helpers.
Good modularization makes friends and influences people.
IMHO, the goal should be to minimize the amount of code that a programmer would have to analyze simultaneously to make sense of a program. In general, excessively-long methods will make code harder to digest because programmers will have to look at much of their code at once.
On the other hand, subdividing methods into smaller pieces will only be helpful if those smaller pieces can be analyzed separately from the code which calls them. Splitting a method into sub-methods which would only be meaningful in the context where they are called is apt to impair rather than improve legibility. Even if before splitting the method would have been over 250 lines, breaking it into ten pieces which don't make sense in isolation would simply increase the simultaneous-analysis requirement from 250 lines to 300+ (depending upon how many lines are added for method headers, the code that calls them, etc.) When deciding whether a method should be subdivided, it's far more important to consider whether the pieces make sense in isolation, than to consider whether the method is "too long". Some 20-lines routine might benefit from being split into two ten-line routines and a two-line routine that calls them, but some 250-line routines might benefit from being left exactly as they are.
Another point which needs to be considered, btw, is that in some cases the required behavior of a program may not be a good fit with the control structures available in the language it's written in. Most applications have large "don't-care" aspects of their behavior, and it's generally possible to assign behavior that will fit nicely with a language's available control structures, but sometimes behavioral requirements may be impossible to meet without awkward code. In some such cases, confining the awkwardness to a single method which is bloated, but which is structured around the behavioral requirements, may be better than scattering it among many smaller methods which have no clear relationship to the overall behavior.