Is iteration slower than linear code? Which one is preferable? - coding-style

I have had a question in my mind for many days now: when writing code in Ruby, is linear code faster and preferable compared to an iteration?
Let me give an example. Here is the same functionality written in two different ways:
Way 1:
['dog', 'cat', 'tiger'].each do |pet_name|
  puts "I have many pets, one of them is #{pet_name}."
end
Way 2:
puts "I have many pets, one of them is dog."
puts "I have many pets, one of them is cat."
puts "I have many pets, one of them is tiger."
So, I want to know which one is better and preferable. In my view, I think the 2nd one will take less time and memory, but I want to confirm.

Sequential logic is faster (see the benchmark below the fold), but it almost never matters. The clearer and more maintainable code should always be selected. Only a demonstrated need should cause one to cease optimizing for the programmer and begin optimizing for the machine. By demonstrated, I mean measured--you ran it and found it to be too slow.
The second example violates the DRY (Don't Repeat Yourself) principle and is a minor maintenance problem.
require 'benchmark'

LOOPS = 100000
Benchmark.bm(10) do |x|
  x.report('iteration') do
    LOOPS.times do
      ['dog', 'cat', 'tiger'].each do |pet_name|
        "I have many pets, one of them is #{pet_name}."
      end
    end
  end
  x.report('sequence') do
    LOOPS.times do
      "I have many pets, one of them is dog."
      "I have many pets, one of them is cat."
      "I have many pets, one of them is tiger."
    end
  end
end
# => user system total real
# => iteration 0.200000 0.000000 0.200000 ( 0.202054)
# => sequence 0.010000 0.000000 0.010000 ( 0.012195)

In both cases, the time spent running the actual Ruby code is going to be completely dominated by the time it takes to print the text to the screen. Remember: console output is slow. Really slow. Painfully slow.
Since in both cases the code prints the same amount of text (and, in fact, the same text) to the screen, any minuscule performance differences that might or might not exist are going to be lost in the noise.
I think 2nd one will take less time and memory.
Don't think. Look.
Here's a crazy idea: if you want to know which one runs faster, run them and see which one runs faster!
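In that spirit, here is a minimal sketch that benchmarks the two snippets exactly as written, puts included; the absolute numbers will depend on your machine and terminal, but the output cost should dwarf everything else:

require 'benchmark'

LOOPS = 10_000

Benchmark.bm(10) do |x|
  x.report('iteration') do
    LOOPS.times do
      ['dog', 'cat', 'tiger'].each do |pet_name|
        puts "I have many pets, one of them is #{pet_name}."
      end
    end
  end

  x.report('sequence') do
    LOOPS.times do
      puts "I have many pets, one of them is dog."
      puts "I have many pets, one of them is cat."
      puts "I have many pets, one of them is tiger."
    end
  end
end

Run it once on the console and once with stdout redirected to a file (or the null device) to see how much of the time is the terminal itself.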

There is always a cost for calling a function, creating an array or setting up a loop. However, this is what programming languages were built for, so to answer your question: yes, the second piece of code would be faster, by nanoseconds maybe. But the first is more general; you never know when you will buy a new pet. It's also more useful: maybe someone will give you a list of their pets and you will want to talk about them? Generally speaking, the second version is unnoticeably faster, but the first is better and preferable.
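To make the generality point concrete, here is a small sketch (the brag_about method is just an illustrative name, not anything from the question):

# The iteration version generalizes to any list of pets.
def brag_about(pet_names)
  pet_names.each do |pet_name|
    puts "I have many pets, one of them is #{pet_name}."
  end
end

brag_about(['dog', 'cat', 'tiger'])
brag_about(['dog', 'cat', 'tiger', 'parrot'])  # a new pet needs no code change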

I realize that your example is very simplified and unlikely to happen in the real world, but taking it literally:
The first example will create intermediate objects (the strings and the array), so you could say that it will in fact take more memory. However, these objects will be garbage-collected later, so you'll get your memory back. (This would not be the case if you defined an array of symbols, as symbols were not garbage-collected at the time; Ruby 2.2 later made dynamically created symbols collectable.)
The second example is also faster, because it doesn't need to fetch the objects from the array during each iteration. But the difference is obviously unnoticeable and should not be taken into account. What should be taken into account here is readability.
If you were a real performance freak, you would probably also define your methods without parentheses around the arguments, because this results in a slightly smaller parse tree being created by the Ruby interpreter.
# slower
def meth(arg)
end
# faster
def meth arg
end
But considering it a valid reason would be silly of course.
Edit: if you're looking for a good Ruby style guide check this: https://github.com/bbatsov/ruby-style-guide

If one of those two options were clearly better and therefore preferable, the language wouldn't provide both options. As always, it depends on the circumstances. Questions you should be asking to decide include
which solution is more readable? (For habitual Ruby programmers, the first one is.) The more readable, the better.
which solution is faster? (You can only decide this by measuring. Take care to use a realistic example - the time difference for only three animals may not even be measurable reliably.) The faster, the better.
how much duplication is introduced by "unrolling" the more idiomatic version? (In this case, not very much - the puts and part of the string literal.) The less duplication you would introduce, the better.
As you see, the answers contradict each other, so you have to understand the context of the decision you're making and weigh the factors correctly to find out which is best overall.

Strictly speaking, yes, there is an overhead involved in iteration. There will be in any language you use (although some use compiler tricks to reduce it). The second version of your code will run a negligible amount quicker, because the first pays for the iteration and for building the array you've defined. The string interpolation will probably add another tiny slice of cost as well.
That being said, it's such a tiny difference you'd have to be very picky to notice, or even care about any of this. It's negligible. Try benchmarking it yourself, you won't notice a significant difference.
I'm not prepared to write 10,000 lines like that for some method, are you? I find that iteration looks a lot cleaner, especially for non-trivial code, and is often preferable in terms of readability and clean code. Not to mention it's more DRY.

Related

What is the maximum length a method or function should be?

A friend from work was arguing with me that it is OK for some methods to be 200 lines long and still be easy to understand.
And I was arguing that if you can split big methods, you should.
As a rule of thumb, no method should be, for example, bigger than 30 lines.
I've searched a lot of places (on Google, of course) for a reasonable explanation in favor of each approach, but couldn't find anything beyond "the code will be more maintainable" or "divide and conquer", with no exact explanation of why.
Could anyone explain to me the concrete reasons as for why big methods are bad (or not)?
A method, function, or any piece of code is good if someone other than the author can understand it. If your methods start getting long, there is a good chance that some re-factoring or an OOP design principle can be employed to separate it into smaller chunks that have less total responsibility.
(e.g. the Single Responsibility Principle)
Writing code is more art than science (sorry, Comp. Sci. peeps... it's true). I don't know of any compiler that will care if your code is long-winded, but if you need to change something later on it will be a royal pain... trust me.
Regardless of what you do, there will always be another programmer with a theory or pattern to apply, but in the end you need to write code that works. No end user will ever compliment you on how elegant your code is, or on how wonderfully you used a design pattern... all they care about is whether it works!
So, here is my advice:
Write code with lots of comments
Spend time reading about OOP patterns
Download and inspect others code often
If you have a method of 200 lines, and you give this code to someone else, that person might want to change it, but before he can do that he will have to go through 200 lines before he understands what to edit. It's just not efficient because it's not easily readable.
In practice, a 200 line method can always be split up in multiple methods (divide and conquer as you call it). The advantage of this, is that these methods can be re-used, and your method becomes more readable. Now if you need to edit that piece of code, you only need to edit it once, instead of having to copy your code twice.
That said, I think a maximum of 30 lines for a method is quite appropriate. But as a rule of thumb you should always ask yourself whether parts of your method could be re-used, split up for readability purposes, or whether you can use other methods to make your code smaller and therefore more readable. By having parameters in your methods you make them more flexible, which is also a good thing, because you can re-use them in other methods.
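As a rough sketch of that kind of split (the report example and method names here are invented, not from the question):

require 'date'

# Before: one method that filters, totals and formats in a single block.
def monthly_report(orders)
  recent = orders.select { |o| o[:placed_on] > (Date.today << 1) }
  total  = recent.sum { |o| o[:amount] }
  "#{recent.size} orders, #{total} total"
end

# After: each step has a name and can be tested or re-used on its own.
def recent_orders(orders)
  orders.select { |o| o[:placed_on] > (Date.today << 1) }  # << 1 means "one month ago"
end

def order_total(orders)
  orders.sum { |o| o[:amount] }
end

def monthly_report(orders)
  recent = recent_orders(orders)
  "#{recent.size} orders, #{order_total(recent)} total"
end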

Imperative vs Functional Programming in Ruby

I am reading this article about how to program in Ruby in Functional Style.
https://code.google.com/p/tokland/wiki/RubyFunctionalProgramming
One of the examples that took my attention is the following:
# No (mutable):
output = []
output << 1
output << 2 if i_have_to_add_two
output << 3
# Yes (immutable):
output = [1, (2 if i_have_to_add_two), 3].compact
While the "mutable" option is less safe because we are changing the value of the array, the immutable one seems less efficient because it calls .compact. That means it has to iterate over the array to return a new one without the nil values.
In that case, which option is preferable? And in general, how do you choose between immutability (functional style) and performance (in the cases where the imperative solution is faster)?
You're not wrong. It is often the case that a purely functional solution will be slower than a destructive one. Immutable values generally mean a lot more allocation has to go on unless the language is very well optimized for them (which Ruby isn't).
However, it doesn't often matter. Worrying about the performance of specific operations is not a great use of your time 99% of the time. Shaving off a microsecond from a piece of code that runs 100 times a second is simply not a win.
The best approach is usually to do whatever makes your code cleanest. Very often this means taking advantage of the functional features of the language — for example, map and select instead of map! and keep_if. Then, if you need to speed things up, you have a nice, clean codebase that you can make changes to without fear that your changes will make one piece of code stomp over another piece's data.
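A minimal sketch of that last point, using the question's pets; the non-destructive forms return new arrays, while the in-place forms rewrite the receiver:

pets = ['dog', 'cat', 'tiger']

shouted  = pets.map { |p| p.upcase }         # new array, pets untouched
long_ish = pets.select { |p| p.length > 3 }  # same: no mutation

pets.map!(&:upcase)               # destructive: rewrites pets in place
pets.keep_if { |p| p.length > 3 } # destructive: filters pets in place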

How many lines should a function have at most?

Is there a good coding technique that specifies how many lines a function should have?
No. Lines of code is a pretty bad metric for just about anything. The exception is perhaps functions that have thousands and thousands of lines - you can be pretty sure those aren't well written.
There are however, good coding techniques that usually result in fewer lines of code per function. Things like DRY (Don't Repeat Yourself) and the Unix-philosophy ("Write programs that do one thing and do it well. Write programs to work together. Write programs to handle text streams, because that is a universal interface." from Wikipedia). In this case replace "programs" with "functions".
I don't think it matters; who is to say that once a function's length passes a certain number of lines it breaks a rule?
In general, just write clean functions that are easy to use and reuse.
A function should have a well-defined purpose. That is, try to create functions which do a single thing, either by doing the thing itself or by delegating the work to a number of other functions.
Most functional compilers are excellent at inlining. Thus there is no inherent price to pay for breaking up your code: The compiler usually does a good job at deciding if a function call should really be one or if it can just inline the code right away.
The size of the function is less relevant though most functions in FP tend to be small, precise and to the point.
There is a McCabe metric of Cyclomatic Complexity which you might read about at this Wikipedia article.
The metric measures how many tests and loops are present in a routine. A rule of thumb might be that under 10 is a manageable amount of complexity while over 11 becomes more fault prone.
I have seen horrendous code that had a Complexity metric above 50. (It was error-prone and difficult to understand or change.) Re-writing it and breaking it down into subroutines reduced the complexity to 8.
Note the Complexity metric is usually proportional to the lines of code. It would provide you a measure on complexity rather than lines of code.
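As a rough sketch of what breaking code into subroutines does to the metric (the shipping example is invented; every condition adds a decision point, and moving branches into their own methods keeps any single method's count low):

# All the decisions concentrated in one method:
def shipping_cost(order)
  if order[:express]
    order[:weight] > 10 ? 30 : 20
  elsif order[:international]
    order[:weight] > 10 ? 50 : 40
  else
    order[:weight] > 10 ? 10 : 5
  end
end

# The same decisions spread across small methods, each trivially simple:
def express_cost(weight)
  weight > 10 ? 30 : 20
end

def international_cost(weight)
  weight > 10 ? 50 : 40
end

def standard_cost(weight)
  weight > 10 ? 10 : 5
end

def shipping_cost(order)
  return express_cost(order[:weight])       if order[:express]
  return international_cost(order[:weight]) if order[:international]
  standard_cost(order[:weight])
end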
When working in Forth (or playing in Factor) I tend to continually refactor until each function is a single line! In fact, if you browse through the Factor libraries you'll see that the majority of words are one-liners and almost nothing is more than a few lines. In a language with inner functions and virtually zero cost for calls (that is, threaded code implicitly having no stack frames [only a return pointer stack], or with aggressive inlining) there is no good reason not to refactor until each function is tiny.
From my experience a function with a lot of lines of code (more than a few pages) is a nightmare to maintain and test. But having said that I don't think there is a hard and fast rule for this.
I came across some VB.NET code at my previous company that had one function of 13 pages, but my record is some VB6 code I have just picked up that is approximately 40 pages! Imagine trying to work out which If statement an Else belongs to when they are pages apart on the screen.
The main argument against having functions that are "too long" is that subdividing the function into smaller functions that only do small parts of the entire job improves readability (by giving those small parts actual names, and helping the reader wrap his mind around smaller pieces of behavior, especially when line 1532 can change the value of a variable on line 45).
In a functional programming language, this point is moot:
You can subdivide a function into smaller functions that are defined within the larger function's body, which does not reduce the length of the original function.
Functions are expected to be pure, so there's no actual risk of line X changing the value read on line Y: the value of the line Y variable can be traced back up the definition list quite easily, even in loops, conditionals or recursive functions.
So, I suspect the answer would be "no one really cares".
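Ruby isn't a functional language, but the first point can be sketched with local lambdas: the helpers live inside the method body, so the method's textual length doesn't shrink, yet each step still gets a name.

def describe_pets(pet_names)
  sentence = ->(name) { "I have many pets, one of them is #{name}." }
  shout    = ->(text) { text.upcase }

  pet_names.map { |name| shout.call(sentence.call(name)) }
end

describe_pets(['dog', 'cat'])
# => ["I HAVE MANY PETS, ONE OF THEM IS DOG.", "I HAVE MANY PETS, ONE OF THEM IS CAT."]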
I think a long function is a red flag and deserves more scrutiny. If I came across a function that was more than a page or two long during a code review I would look for ways to break it down into smaller functions.
There are exceptions though. A long function that consists of mostly simple assignment statements, say for initialization, is probably best left intact.
My (admittedly crude) guideline is a screenful of code. I have seen code with functions going on for pages. This is emetic, to be charitable. Functions should have a single, focused purpose. If you are trying to do something complex, have a "captain" function call helpers.
Good modularization makes friends and influences people.
IMHO, the goal should be to minimize the amount of code that a programmer would have to analyze simultaneously to make sense of a program. In general, excessively-long methods will make code harder to digest because programmers will have to look at much of their code at once.
On the other hand, subdividing methods into smaller pieces will only be helpful if those smaller pieces can be analyzed separately from the code which calls them. Splitting a method into sub-methods which would only be meaningful in the context where they are called is apt to impair rather than improve legibility. Even if before splitting the method would have been over 250 lines, breaking it into ten pieces which don't make sense in isolation would simply increase the simultaneous-analysis requirement from 250 lines to 300+ (depending upon how many lines are added for method headers, the code that calls them, etc.) When deciding whether a method should be subdivided, it's far more important to consider whether the pieces make sense in isolation, than to consider whether the method is "too long". Some 20-lines routine might benefit from being split into two ten-line routines and a two-line routine that calls them, but some 250-line routines might benefit from being left exactly as they are.
Another point which needs to be considered, btw, is that in some cases the required behavior of a program may not be a good fit with the control structures available in the language it's written in. Most applications have large "don't-care" aspects of their behavior, and it's generally possible to assign behavior that will fit nicely with a language's available control structures, but sometimes behavioral requirements may be impossible to meet without awkward code. In some such cases, confining the awkwardness to a single method which is bloated, but which is structured around the behavioral requirements, may be better than scattering it among many smaller methods which have no clear relationship to the overall behavior.

Writing shorter code/algorithms, is more efficient (performance)?

After coming across the code golf trivia around the site, it is obvious people try to find ways to write code and algorithms as short as they possibly can in terms of characters, lines and total size, even if that means writing something like:
//Code by: job
//Topic: Code Golf - Collatz Conjecture
n=input()
while n>1:n=(n/2,n*3+1)[n%2];print n
So as a beginner I start to wonder whether size actually matters. :D
It is obviously a very subjective question, highly dependent on the actual code being used, but what is the rule of thumb in the real world?
In the case that size won't matter, how come we don't focus more on performance rather than size?
I hope this does not become a flame war. Good code has many attributes, including:
Solving the use-case properly.
Readability.
Maintainability.
Performance.
Testability.
Low memory signature.
Good user interface.
Reusability.
The brevity of code is not that important in 21st century programming. It used to be more important when memory was really scarce. Please see this question, including my answer, for books referencing the attributes above.
A lot of good answers already about what's important versus what's not. In real life, (almost) nobody writes code like code golf, with shortened identifiers, minimal whitespace, and the fewest possible statements.
That said, "more code" does correlate with more bugs and complexity, and "less code" tends to correlate with better readability and performance. So all other things being equal, it's useful to strive for shorter code, but only in the sense of "these simple 30 lines of code do the same as that 100 complex lines of code".
Writing "code golf" solutions are often to do with showing how "clever" you are in getting the job done in the most succinct way even at the expense of readability. Quite often, however, more verbose code including, for example, memoization of function results, can be faster. Code size can matter for performance, smaller blocks of code can fit in the L1 CPU cache but this is an extreme case of optimization and a faster algorithm will most always be better. "Code Golf" code is not like production code - always write for clarity & readability of the solution rather than terseness if anyone, including yourself, ever intend to read that code again.
Whitespace has no effect on performance. So code like that is just silly (or perhaps the golf score was based on the character count?). The number of lines also has no effect, although the number of statements can have an effect. (Exception: Python, where whitespace is significant.)
The effect is complex, however. It's not at all uncommon to discover that you have to add statements to a function in order to improve its performance.
Still, without knowing anything else, bet that more statements mean a larger object file and a slower program. But more lines don't do anything other than make code more readable, up to a point (after which adding more lines makes it less readable ;)
I don't believe that Code Golf has any practical significance. In practice, readable code is what counts. Which is in itself a conflicting requirement: readable code should be concise, but still easy to understand.
However, I would like to answer your question yet differently. Usually, there are fast and simple algorithms. However, if the speed is top priority, things can get complex real fast (and the resulting code will be longer). I don't believe that simplicity equals speed.
There are many aspects to performance. Performance can, for example, be measured by memory footprint, speed of execution, bandwidth consumption, framerate, maintainability, supportability and so on. Performance usually means spending as little as possible of the most scarce resource.
When applied to networking, brevity IS performance. If your webserver serves a little javascript snippet on every page, it doesn't exactly hurt to keep the variable names short. Pull up www.google.com and view source!
Sometimes DRY does not help performance. One example: Microsoft found that they don't want to loop through an array unless it has more than 3 elements.
String.Format has signatures for one, two and three arguments, and then one for an array.
There are many ways of trading one aspect for another. This is usually called caching.
You can for example trade memory footprint for speed of execution. For example by doing lookup instead of execution. It is just a matter of replacing () with [] in most popular languages. If you plan it so that the spaceship in your game can only go in a fixed number of directions, you can save on trigonometric function calls.
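For instance, the spaceship example might look something like this (a sketch; the 16-direction limit and table names are assumptions for illustration):

# Trade memory for speed: precompute sines for the fixed set of directions,
# then index with [] instead of calling Math.sin() in the game loop.
DIRECTIONS = 16
SIN_TABLE = Array.new(DIRECTIONS) { |i| Math.sin(2 * Math::PI * i / DIRECTIONS) }

def sin_for(direction_index)
  SIN_TABLE[direction_index]
end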
Or you can use a proxy server with a cache for looking up things over a network. DNS servers do this all the time.
Finally, if development team availability is the most scarce resource, clarity of code is the best bet for maintainability performance, even if it doesn't run quite as fast or is quite as interesting or "elegant" in code.
Absolutely not. Code size and performance (however you measure it) are only very loosely connected. To make matters worse, what's a neat trick on one chip/compiler/OS may very well be the worst thing you can do on another architecture.
It's counter-intuitive, but a clear, well-written, as-simple-as-possible implementation is often far more efficient than a devious bag of tricks. Today's optimizing compilers like clear, uncomplicated code just as much as humans do, and complex trickery can cause them to abandon their best optimization strategies.
Writing fewer lines of code tends to be better for a bunch of reasons. For example, the less code you have, the less chance for bugs. See for example Paul Graham's essay, "Succinctness is Power"
Notwithstanding that, the level reached by Code Golf is usually far beyond what makes sense. In Code Golf, people are trying to write code that is as short as possible, even if they know that it's less readable.
Efficiency is a much harder thing to decide. I'm guessing that less code is usually more efficient, but there are many cases where this isn't true.
So to answer the real question, why do we even have Code Golf competitions which aim at a low character count, if that's not a very important thing?
Two reasons:
Making code as short as possible means you have to be both clever, and know a language pretty well to find all kinds of tricks. This makes it a fun riddle.
Also, it's the easiest measure to use for a code competition. Efficiency, for example, is very hard to measure, especially using many different languages, especially since some solutions are more efficient in some cases, but less in others (big input vs small). Readability: that's a very personal thing, which often leads to heated debates.
In short, I don't think there is any way of doing Code Golf style competitions without using "shortness of code" as the criterion.
This is from "10 Commandments for Java Developers"
Keep in mind: "Less is more" is not always better. Code efficiency is a great thing, but in many situations writing fewer lines of code does not improve the efficiency of that code.
This is (probably) true for all programming languages (though in assembly it could be different).
It makes a difference if you're talking about little academic-style algorithms or real software, which can be thousands of lines of code. I'm talking about the latter.
Here's an example where a reasonably well-written program was sped up by a factor of 43x, and its code size was reduced by 4x.
"Code golf" is just squeezing code, like cramming undergraduates into a phone booth. I'm talking about reducing code by rewriting it in a form that is declarative, like a domain-specific-language (DSL). Since it is declarative, it maps more directly onto its requirements, so it is not puffed up with code that exists only for implementation's sake. That link shows an example of doing that.
This link shows a way of reducing size of UI code in a similar way.
Good performance is achieved by avoiding doing things that don't really have to be done. Of course, when you write code, you're not intentionally making it do unnecessary work, but if you do aggressive performance tuning as in that example, you'd be amazed at what you can remove.
The point of code golf is to optimise for one thing (source length), at the potential expense of everything else (performance, comprehensibility, robustness). If you accidentally improve performance that's a fluke - if you could shave a character off by doubling the runtime, then you would.
You ask "how come then we don't focus more on performance rather than size", but the question is based on a false premise that programmers focus more on code size than on performance. They don't, "code golf" is a minority interest. It's challenging and fun, but it's not important. Look at the number of questions tagged "code-golf" against the number tagged "performance".
As other people point out, making code shorter often means making it simpler to understand, by removing duplication and opportunities for obscure errors. That's usually more important than running speed. But code golf is a completely different thing, where you remove whitespace, comments, descriptive names, etc. The purpose isn't to make the code more comprehensible.

When should I break a function?

It's prudent to break a long function into a chief function and helper functions.
I know that outside the module only the chief function will be called, but its length may prove to be intimidating.
Textbooks put a limit on the number of lines, but I feel that this is too rigid.
P.S. I am programming in Python and need to process incoming messages. The function returns a tuple containing the message, but converted to Python's internal data types.
So you can see somewhat independent code for each message type.
Duplicate Question
When is a function too long?
I think you need to go about this from the other end of the problem. Think bottom-up. Identify small units of work, as small as possible, and start composing your code that way. You will only run into spaghetti-code issues when you code top-down and don't keep a structured approach.
If you already have spaghetti code and need to refactor, you pretty much have to start over. It is probably more work to break up existing spaghetti code than to rewrite it, and the result may not be as good.
I don't think there should be a hard number for the lines of code in a method either, but well written code does not have methods with more than 5 to 10 lines in the lower layers, and 20 to 30 lines in the business logic. To give you some kind of metric.
I'm not a big fan of breaking a function into multiple functions unnecessarily. It's not a hard and fast thing - if there are things that seem like distinct logical units, then by all means, break those out and think about them separately. But don't just break things out for the sake of some guideline like "one page per function" or "N lines per function".
One good rule of thumb is that if it doesn't fit on a single screen it is worth thinking about splitting it up. But only if it makes sense to split it up, some long functions are perfectly readable and it doesn't make any sense to slavishly split them into multiple functions just for the sake of it.
Never write a function that, when printed on fanfold paper, is taller than you are.
I like the rule of thumb that you should break out the subfunction if you can think of a good domain-relevant name for it.
When someone can understand the top-level function without necessarily having to look up the definition of the sub-function, you've likely made a net gain. (But when you break it down too far, your names start referring to your implementation artifacts rather than the domain)
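A small sketch of that naming test (the checkout domain here is made up): if cart_total and loyalty_discount read naturally at the call site, the split has probably paid off.

def checkout(cart)
  total = cart_total(cart)
  total - loyalty_discount(total)
end

def cart_total(cart)
  cart.sum { |item| item[:price] * item[:quantity] }
end

def loyalty_discount(total)
  total > 100 ? total * 0.05 : 0
end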
I was recently discussing this with a friend. He suggested refactoring to separate concerns and I must say I have to agree. That is, one function should do one thing, if it does more than one thing, split it up. If not, let it be together, it makes no sense to split up a function, only to have it obfuscate the meaning. After all, a function is a block of code that does one thing!
The limit in terms of number of lines is often impractical because it doesn't account well for readability. It's better to try to separate out groups of lines of code that have just a few inputs and just a few outputs and make that a separate function. It's not always possible; in that case it's often wise to just leave the code as it is and not refactor for the sake of refactoring.
Well, since I am coding in Python, I have the liberty to write functions inside functions, unlike in C, C++ or Java. This, I feel, is a better choice.
It's not specified, but the line count should be as low as possible. You may follow the Rule of 30; I follow this in my PHP scripts when needed.
Rule of 30:
“Rule of 30” in Refactoring in Large Software Projects by Martin Lippert and Stephen Roock:
Methods should not have more than an average of 30 code lines.
A class should contain an average of less than 30 methods.
A package/library shouldn’t contain more than 30 classes.
Subsystems should avoid more than 30 packages.
A system with more than 30 subsystems may create problems.
If an element consists of more than 30 subelements, it is highly probable that there is a serious problem.
Personally, I break a function if it either saves total lines or total processing time.
If I only run the helper once per chief function, I don't bother.
The point is that in principle it's better to have specialized functions. But where one sets the limit depends very much on:
1) The "usual" programming style in certain languages. (One can observe that object-oriented languages tend toward shorter procedures than, say, C or the like.)
2) Your way of programming. Every hard limit must be questioned, IMHO. Overall there will probably be some "natural" distribution across programs.
3) What one should keep in mind is that a function should do a certain task. Take, for example, some function for parsing: it is usually much longer than a function that just sets some field in a structure. Or consider how an event loop in the Windows API may look. All of which suggests that there may be good reasons for long methods...
If there is independent code (in your case specifics for each message type) those areas should be broken out.
Size matters not. Judge me by my size do you? - Yoda
Your main concerns are readability, simplicity and maintainability. A good indicator is if you need to write comments to explain a section of a function then that section is a good candidate for a separate function.
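A quick sketch of that indicator (invented example): the explanatory comment becomes a method name.

# Before: a comment marks off a section inside one function.
def process(order)
  # validate the order
  raise ArgumentError, 'empty order' if order[:items].nil? || order[:items].empty?

  order[:items].sum { |i| i[:price] }
end

# After: the commented section is its own, named method.
def validate!(order)
  raise ArgumentError, 'empty order' if order[:items].nil? || order[:items].empty?
end

def process(order)
  validate!(order)
  order[:items].sum { |i| i[:price] }
end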
There are many reasons to break a long function into its constituent pieces. The most important are:
readability
maintainability
code clarity/intent
Some functions simply cannot be broken into smaller pieces without negatively impacting the listed goals, so there is no hard-and-fast rule.
If you didn't write it and it's already in production: NEVER!!! If you break it up, you're likely to break it; it's that simple.
If you are writing it and you're not sure, the on-screen rule applies, as others have said.
