Difference between two types of iterations - coding-style

This is a core question
please don't say with regard to syntax or semantics,
the question is that what is the actual difference between
WHILE loop and FOR loop, everything written in for loop can be done
with while loop then why two loops?
This is asked in a seminar at the university of Cambridge.
so i think we have to explain in terms of performance overheads and WC complexity.
I think we have to go in terms of Floyd-Hoare logic

As far as performance overheads go, that will depend on the compiler and language you're using.
The major difference between them is that for loops are easier to read when iterating over a collection of something, and while loops are easier to read when a specific condition or boolean flag is being evaluated.

The only difference between the two looping constructs is the ability to do pre-loop initialization and post-loop changes in the header of the for loop. There is no performance difference between
while (condition) {
...
}
and
for ( ; condition ; ) {
...
}
constructs. The alternative is added for convenience and readability, there are no other implications.

The difference is that a for() loop condenses a number of steps into a single line. Ultimately, all answers will boil down to syntax or semantics in this way, so there can't be an answer that really pleases you.
A while() loop only states the condition under which the loop will terminate, while the for() loop can stipulate a variable to measure by as well as a counter in addition to this.
(Actually, for() loops are very versatile and a lot can happen in their declarations to make them pretty fancy, but the above statement sums up almost all of the for() loops you will ever see, even in a production environment.)

One occasionally relevant difference between the two is that a while loop can have a more complex condition imposed on it than a for loop
The FOR loop is best exployed when looping through a collection of predictable size
This I guess would be considered syntactical, but I consider it to refute the concept that the for loop can do anything the while loop can, though the reverse is indeed correct

There is obviously no need for two loops, you need only one. And indeed, many textbooks consider a language (usually named Imp) which desugars for to while with an initialization statement before the while.
That being said, if you try working out the difference in the loop invariants, and associated rules in Hoare logic for these two loops, they differ just a bit, because of the init block. Since this looks like homework, I'll let you work out the details.
So the answer is: "It makes things just a little easier, on the syntax and proof side, but it's merely cosmetic, for all intents an purposes you really only need one..."
So far none of the other answers point out what I think the really important part of the distinction is for your question: which is probably that they change things a little bit with the Hoare connectives...
Also, note that speculating on performance based on Hoare logic is silly. If you care about the semantics, your professor probably doesn't give a damn or want to hear about performance :-)

everything can be done with if-goto statement. so why any loops at all? it's a syntax sugar to make developer's life easier and to allow writing better structured (smaller and more readable) code. it has nothing to do with performance

Related

Why should a function have only one exit-point? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 6 years ago.
Improve this question
I've always heard about a single exit-point function as a bad way to code because you lose readability and efficiency. I've never heard anybody argue the other side.
I thought this had something to do with CS but this question was shot down at cstheory stackexchange.
There are different schools of thought, and it largely comes down to personal preference.
One is that it is less confusing if there is only a single exit point - you have a single path through the method and you know where to look for the exit. On the minus side if you use indentation to represent nesting, your code ends up massively indented to the right, and it becomes very difficult to follow all the nested scopes.
Another is that you can check preconditions and exit early at the start of a method, so that you know in the body of the method that certain conditions are true, without the entire body of the method being indented 5 miles off to the right. This usually minimises the number of scopes you have to worry about, which makes code much easier to follow.
A third is that you can exit anywhere you please. This used to be more confusing in the old days, but now that we have syntax-colouring editors and compilers that detect unreachable code, it's a lot easier to deal with.
I'm squarely in the middle camp. Enforcing a single exit point is a pointless or even counterproductive restriction IMHO, while exiting at random all over a method can sometimes lead to messy difficult to follow logic, where it becomes difficult to see if a given bit of code will or won't be executed. But "gating" your method makes it possible to significantly simplify the body of the method.
My general recommendation is that return statements should, when practical, either be located before the first code that has any side-effects, or after the last code that has any side-effects. I would consider something like:
if (!argument) // Check if non-null
return ERR_NULL_ARGUMENT;
... process non-null argument
if (ok)
return 0;
else
return ERR_NOT_OK;
clearer than:
int return_value;
if (argument) // Non-null
{
.. process non-null argument
.. set result appropriately
}
else
result = ERR_NULL_ARGUMENT;
return result;
If a certain condition should prevent a function from doing anything, I prefer to early-return out of the function at a spot above the point where the function would do anything. Once the function has undertaken actions with side-effects, though, I prefer to return from the bottom, to make clear that all side-effects must be dealt with.
With most anything, it comes down to the needs of the deliverable. In "the old days", spaghetti code with multiple return points invited memory leaks, since coders that preferred that method typically did not clean up well. There were also issues with some compilers "losing" the reference to the return variable as the stack was popped during the return, in the case of returning from a nested scope. The more general problem was one of re-entrant code, which attempts to have the calling state of a function be exactly the same as its return state. Mutators of oop violated this and the concept was shelved.
There are deliverables, most notably kernels, which need the speed that multiple exit points provide. These environments normally have their own memory and process management, so the risk of a leak is minimized.
Personally, I like to have a single point of exit, since I often use it to insert a breakpoint on the return statement and perform a code inspect of how the code determined that solution. I could just go to the entrance and step through, which I do with extensively nested and recursive solutions. As a code reviewer, multiple returns in a function requires a much deeper analysis - so if you're doing it to speed up the implementation, you're robbing Peter to save Paul. More time will be required in code reviews, invalidating the presumption of efficient implementation.
-- 2 cents
Please see this doc for more details: NISTIR 5459
Single entry and exit point was original concept of structured programming vs step by step Spaghetti Coding. There is a belief that multiple exit-point functions require more code since you have to do proper clean up of memory spaces allocated for variables. Consider a scenario where function allocates variables (resources) and getting out of the function early and without proper clean up would result in resource leaks. In addition, constructing clean-up before every exit would create a lot of redundant code.
I used to be an advocate of single-exit style. My reasoning came mostly from pain...
Single-exit is easier to debug.
Given the techniques and tools we have today, this is a far less reasonable position to take as unit tests and logging can make single-exit unnecessary. That said, when you need to watch code execute in a debugger, it was much harder to understand and work with code containing multiple exit points.
This became especially true when you needed to interject assignments in order to examine state (replaced with watch expressions in modern debuggers). It was also too easy to alter the control flow in ways that hid the problem or broke the execution altogether.
Single-exit methods were easier to step through in the debugger, and easier to tease apart without breaking the logic.
In my view, the advice to exit a function (or other control structure) at only one point often is oversold. Two reasons typically are given to exit at only one point:
Single-exit code is supposedly easier to read and debug. (I admit that I don't think much of this reason, but it is given. What is substantially easier to read and debug is single-entry code.)
Single-exit code links and returns more cleanly.
The second reason is subtle and has some merit, especially if the function returns a large data structure. However, I wouldn't worry about it too much, except ...
If a student, you want to earn top marks in your class. Do what the instructor prefers. He probably has a good reason from his perspective; so, at the very least, you'll learn his perspective. This has value in itself.
Good luck.
The answer is very context dependent. If you are making a GUI and have a function which initialises API's and opens windows at the start of your main it will be full of calls which may throw errors, each of which would cause the instance of the program to close. If you used nested IF statements and indent your code could quickly become very skewed to the right. Returning on an error at each stage might be better and actually more readable while being just as easy to debug with a few flags in the code.
If, however, you are testing different conditions and returning different values depending on the results in your method it may be much better practice to have a single exit point. I used to work on image processing scripts in MATLAB which could get very large. Multiple exit points could make the code extremely hard to follow. Switch statements were much more appropriate.
The best thing to do would be to learn as you go. If you are writing code for something try finding other people's code and seeing how they implement it. Decide which bits you like and which bits you don't.
If you feel like you need multiple exit points in a function, the function is too large and is doing too much.
I would recommend reading the chapter about functions in Robert C. Martin's book, Clean Code.
Essentially, you should try to write functions with 4 lines of code or less.
Some notes from Mike Long’s Blog:
The first rule of functions: they should be small
The second rule of functions: they should be smaller than that
Blocks within if statements, while statements, for loops, etc should be one line long
…and that line of code will usually be a function call
There should be no more than one or maybe two levels of indentation
Functions should do one thing
Function statements should all be at the same level of abstraction
A function should have no more than 3 arguments
Output arguments are a code smell
Passing a boolean flag into a function is truly awful. You are by definition doing two --things in the function.
Side effects are lies.

Code structure: should I use lots of functions to increase readability?

My question has Bash and PowerShell scripts in mind, but I suppose it applies to other languages as well.
It is my understanding that the purpose of a function is to perform the same (or a very similar) task multiple times. This decreases the amount of code in the script and it also makes it easier to maintain.
With that in mind, if you discover that your script only calls a function one time then there's no reason for that function to exist as a function. Instead, you should take the function's code and place it in the location where that function is being called.
Having said all that, here's my question:
If I have a complicated script, should I move each section of code into its own function even though each function will only be called once? This would greatly increase the script's readability because its logic (the functions) would all be at the top of the script and the flow of execution would be at the bottom of the script. Since 50 lines of code would be represented by just 1 line, it would be much easier to understand what the script is doing.
Do other people do this? Are there disadvantages to this approach?
Having functions also increases readability. So a bash script might look better and be easier to follow if it reads:
getParams()
startTask()
doSomethingElse()
finishTask()
# implement functions below
even if the function implementations are simple, it reads better.
Code readability is indeed a major concern, usually (nowadays) more important than sheer amount of code or performance. Not to mention that inlining function calls may not necessarily have noticeable performance benefits (very language specific).
So lots of developers (I venture to say that the better of the breed :-) create small functions/methods like you describe, to partition their code into logically cohesive parts.
A function does a well-defined task. If you have a mega function that does 5 different things, it strongly suggests it should be calling 5 smaller functions.
It is my understanding that the purpose of a function is to perform the same (or a very similar) task multiple times.
Well, it is my understanding that a function is a discrete entity that performs a specific, well defined task.
With that in mind, if you discover that your script calls a given function AT LEAST ONCE, then it's doing its job.
Focus on being able to read and easily understand your code.
Having clear, readable code is definitely more a payoff than being afraid of function calls overhead. That's just a premature optimisation.
Plus, the goal of a function is to accomplish a particular task. A task can be a sub-task, there's nothing wrong with that!
Read this book
http://www.amazon.com/Clean-Code-Handbook-Software-Craftsmanship/dp/0132350882
Here are some quotes from the book.
"Small!
The first rule of functions is that they should be small. The second rule of functions is that
they should be smaller than that."
"FUNCTIONS SHOULD DO ONE THING. THEY SHOULD DO IT WELL.THEY SHOULD DO IT ONLY."
As far as my knowledge is concerned, a function represents a Sequence of steps which become a part of larger program.
Coming to your question, I strongly agree that function(s) improve readability and re-usability. But at the same time breaking every thing into pieces might not be a good practice.
Finally, I want to give one statement : "Anything In Excess Is Not Beneficial!"

How many lines should a function have at most?

Is there a good coding technique that specifies how many lines a function should have ?
No. Lines of code is a pretty bad metric for just about anything. The exception is perhaps functions that have thousands and thousands of lines - you can be pretty sure those aren't well written.
There are however, good coding techniques that usually result in fewer lines of code per function. Things like DRY (Don't Repeat Yourself) and the Unix-philosophy ("Write programs that do one thing and do it well. Write programs to work together. Write programs to handle text streams, because that is a universal interface." from Wikipedia). In this case replace "programs" with "functions".
I don't think it matters, who is to say that once a functions lengths passes a certain number of lines it breaks a rule.
In general just code clean functions easy to use and reuse.
A function should have a well defined purpose. That is, try to create functions which does a single thing, either by doing the thing itself or by delegating work to a number of other functions.
Most functional compilers are excellent at inlining. Thus there is no inherent price to pay for breaking up your code: The compiler usually does a good job at deciding if a function call should really be one or if it can just inline the code right away.
The size of the function is less relevant though most functions in FP tend to be small, precise and to the point.
There is a McCabe metric of Cyclomatic Complexity which you might read about at this Wikipedia article.
The metric measures how many tests and loops are present in a routine. A rule of thumb might be that under 10 is a manageable amount of complexity while over 11 becomes more fault prone.
I have seen horrendous code that had a Complexity metric above 50. (It was error-prone and difficult to understand or change.) Re-writing it and breaking it down into subroutines reduced the complexity to 8.
Note the Complexity metric is usually proportional to the lines of code. It would provide you a measure on complexity rather than lines of code.
When working in Forth (or playing in Factor) I tend to continually refactor until each function is a single line! In fact, if you browse through the Factor libraries you'll see that the majority of words are one-liners and almost nothing is more than a few lines. In a language with inner-functions and virtually zero cost for calls (that is, threaded code implicitly having no stack frames [only return pointer stack], or with aggressive inlining) there is no good reason not to refractor until each function is tiny.
From my experience a function with a lot of lines of code (more than a few pages) is a nightmare to maintain and test. But having said that I don't think there is a hard and fast rule for this.
I came across some VB.NET code at my previous company that one function of 13 pages, but my record is some VB6 code I have just picked up that is approx 40 pages! Imagine trying to work out which If statement an Else belongs to when they are pages apart on the screen.
The main argument against having functions that are "too long" is that subdividing the function into smaller functions that only do small parts of the entire job improves readability (by giving those small parts actual names, and helping the reader wrap his mind around smaller pieces of behavior, especially when line 1532 can change the value of a variable on line 45).
In a functional programming language, this point is moot:
You can subdivide a function into smaller functions that are defined within the larger function's body, and thus not reducing the length of the original function.
Functions are expected to be pure, so there's no actual risk of line X changing the value read on line Y : the value of the line Y variable can be traced back up the definition list quite easily, even in loops, conditionals or recursive functions.
So, I suspect the answer would be "no one really cares".
I think a long function is a red flag and deserves more scrutiny. If I came across a function that was more than a page or two long during a code review I would look for ways to break it down into smaller functions.
There are exceptions though. A long function that consists of mostly simple assignment statements, say for initialization, is probably best left intact.
My (admittedly crude) guideline is a screenful of code. I have seen code with functions going on for pages. This is emetic, to be charitable. Functions should have a single, focused purpose. If you area trying to do something complex, have a "captain" function call helpers.
Good modularization makes friends and influences people.
IMHO, the goal should be to minimize the amount of code that a programmer would have to analyze simultaneously to make sense of a program. In general, excessively-long methods will make code harder to digest because programmers will have to look at much of their code at once.
On the other hand, subdividing methods into smaller pieces will only be helpful if those smaller pieces can be analyzed separately from the code which calls them. Splitting a method into sub-methods which would only be meaningful in the context where they are called is apt to impair rather than improve legibility. Even if before splitting the method would have been over 250 lines, breaking it into ten pieces which don't make sense in isolation would simply increase the simultaneous-analysis requirement from 250 lines to 300+ (depending upon how many lines are added for method headers, the code that calls them, etc.) When deciding whether a method should be subdivided, it's far more important to consider whether the pieces make sense in isolation, than to consider whether the method is "too long". Some 20-lines routine might benefit from being split into two ten-line routines and a two-line routine that calls them, but some 250-line routines might benefit from being left exactly as they are.
Another point which needs to be considered, btw, is that in some cases the required behavior of a program may not be a good fit with the control structures available in the language it's written in. Most applications have large "don't-care" aspects of their behavior, and it's generally possible to assign behavior that will fit nicely with a language's available control structures, but sometimes behavioral requirements may be impossible to meet without awkward code. In some such cases, confining the awkwardness to a single method which is bloated, but which is structured around the behavioral requirements, may be better than scattering it among many smaller methods which have no clear relationship to the overall behavior.

Are there any cases where LINQ's .Where() will be faster than O(N)?

Think the title describes my thoughts pretty well :)
I've seen a lot of people lately that swear to LINQ, and while I also believe it's awesome, I also think you shouldn't be confused about the fact that on most (all?) IEnumerable types, it's performance is not that great. Am I wrong in thinking this? Especially queries where you nest Where()'s on large datasets?
Sorry if this question is a bit vague, I just want to confirm my thoughts in that you should be "cautious" when using LINQ.
[EDIT] Just to clarify - I'm thinking in terms of Linq to Objects here :)
It depends on the provider. For Linq to Objects, it's going to be O(n), but for Linq to SQL or Entities it might ultimately use indices to beat that. For Objects, if you need the functionality of Where, you're probably going to need O(n) anyway. Linq will almost certainly have a bigger constant, largely due to the function calls.
It depends on how you are using it and to what you compare.
I've seen many implementations using foreaches which would have been much faster with linq. Eg. because they forget to break or because they return too many items. The trick is that the lambda expressions are executed when the item is actually used. When you have First at the end, it could end up it just one single call.
So when you chain Wheres, if an item does not pass the first condition, it will also not be tested for the second condition. it's similar to the && operator, which does not evaluate the condition on the right side if the first is not met.
You could say it's always O(N). But N is not the number of items in the source, but the minimal number of items required to create the target set. That's a pretty good optimization IMHO.
Here's a project that promises to introduce indexing for LINQ2Objects. This should deliver better asymptotic behavior: http://i4o.codeplex.com/

When should I break a function?

Its prudent to break a long function into a chief function and helper functions.
I know that the outside the module only chief function will be called, but its long length may prove to be intimidating.
Textbooks put a limit on the number of lines, but I feel that this is too rigid.
P.S. I am programming in Python and need to process incoming, messages. The function returns a tuple containing the message but in Python's internal data types.
So you can see somewhat independent code for each message type.
Duplicate Question
When is a function too long?
I think you need to go about this from the other end of the problem. Think bottom-up. Identify small units of work, as small as possible, and start composing your code that way. You will only run into spaghetti-code issues when you code top-down and don't keep a structured approach.
If you already have spaghetti code and need to refactor, you pretty much have to start over. It is probably more work to break up existing spaghetti code than to rewrite it, and the result may not be as good.
I don't think there should be a hard number for the lines of code in a method either, but well written code does not have methods with more than 5 to 10 lines in the lower layers, and 20 to 30 lines in the business logic. To give you some kind of metric.
I'm not a big fan of breaking a function into multiple functions unnecessarily. It's not a hard and fast thing - if there are things that seem like distinct logical units, then by all means, break those out and think about them separately. But don't just break things out for the sake of some guideline like "one page per function" or "N lines per function".
One good rule of thumb is that if it doesn't fit on a single screen it is worth thinking about splitting it up. But only if it makes sense to split it up, some long functions are perfectly readable and it doesn't make any sense to slavishly split them into multiple functions just for the sake of it.
Never write a function that, when printed on fanfold paper, is taller than you are.
I like the rule of thumb that you should break out the subfunction if you can think of a good domain-relevant name for it.
When someone can understand the top-level function without necessarily having to look up the definition of the sub-function, you've likely made a net gain. (But when you break it down too far, your names start referring to your implementation artifacts rather than the domain)
I was recently discussing this with a friend. He suggested refactoring to separate concerns and I must say I have to agree. That is, one function should do one thing, if it does more than one thing, split it up. If not, let it be together, it makes no sense to split up a function, only to have it obfuscate the meaning. After all, a function is a block of code that does one thing!
The limit in term of number of lines is often impractical becuase it doesn't account for readability well. It's better to try to seperate groups of lines of code that have just a few inputs and just a few outputs and make this a separate functon. It's not always possible - then it's often wise to just leave the code as it is and not to refactor for the sake of refactoring.
Well since I am coding in Python so I have the liberty to write functions inside functions, unlike C, C++ or Java. This i feel is a better choice.
It's not specified. But line should be as low as possible. But you may follow the Role of 30. I follow this in my PHP scripts when needed.
Rule of 30:
“Rule of 30” in Refactoring in Large Software Projects by Martin Lippert and Stephen Roock:
Methods should not have more than an average of 30 code lines.
A class should contain an average of less than 30 methods.
A package/library shouldn’t contain more than 30 classes.
Subsystems should avoid more than 30 packages.
A system more than 30 subsystems may create problem.
If an element consists of more than 30 subelements, it is highly probable that there is a serious problem.
personally I break a function if it either saves total lines or total processing time.
if I only run the helper once per chief function I don't bother
The point is that in principal it's better to have specialiced functions. But where one sets the limit depends very much on
1) the "usual" programming style in certain languages. (one can observe that, object-oriented langauges tend to shorter procedureds than let's say C or the like
2) it depends on your way of programming. Every hard limit must be questioned. IMHO. Overall there will probably some "natural" distribution of programs
3) I think what one should keep on one's mind is that a function should do a certain task take for example some function for parsing it is usually much longer than a function just settin some field in a structure. Or getting back just consider how a event loop in the Windows API may look. So that all suggests that there may be good reasons for long methods...
If there is independent code (in your case specifics for each message type) those areas should be broken out.
Size matters not. Judge me by my size do you? - Yoda
Your main concerns are readability, simplicity and maintainability. A good indicator is if you need to write comments to explain a section of a function then that section is a good candidate for a separate function.
There are many reasons to break a long function into its constituent pieces. Most important is:
readability
maintainability
code clarity/intent
Some functions simple cannot be broken into smaller pieces without negatively impacting the listed goals, so there is no hard-and-fast rule.
If you didn't write it and it's already in production: NEVER!!! If you break it up, you're likely to break it, it's that simple.
If you are writing it and you're not sure, the on screen rule apples as others have said.

Resources