Why should a function have only one exit-point? [closed] - coding-style

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 6 years ago.
Improve this question
I've always heard about a single exit-point function as a bad way to code because you lose readability and efficiency. I've never heard anybody argue the other side.
I thought this had something to do with CS but this question was shot down at cstheory stackexchange.

There are different schools of thought, and it largely comes down to personal preference.
One is that it is less confusing if there is only a single exit point - you have a single path through the method and you know where to look for the exit. On the minus side if you use indentation to represent nesting, your code ends up massively indented to the right, and it becomes very difficult to follow all the nested scopes.
Another is that you can check preconditions and exit early at the start of a method, so that you know in the body of the method that certain conditions are true, without the entire body of the method being indented 5 miles off to the right. This usually minimises the number of scopes you have to worry about, which makes code much easier to follow.
A third is that you can exit anywhere you please. This used to be more confusing in the old days, but now that we have syntax-colouring editors and compilers that detect unreachable code, it's a lot easier to deal with.
I'm squarely in the middle camp. Enforcing a single exit point is a pointless or even counterproductive restriction IMHO, while exiting at random all over a method can sometimes lead to messy difficult to follow logic, where it becomes difficult to see if a given bit of code will or won't be executed. But "gating" your method makes it possible to significantly simplify the body of the method.

My general recommendation is that return statements should, when practical, either be located before the first code that has any side-effects, or after the last code that has any side-effects. I would consider something like:
if (!argument) // Check if non-null
return ERR_NULL_ARGUMENT;
... process non-null argument
if (ok)
return 0;
else
return ERR_NOT_OK;
clearer than:
int return_value;
if (argument) // Non-null
{
.. process non-null argument
.. set result appropriately
}
else
result = ERR_NULL_ARGUMENT;
return result;
If a certain condition should prevent a function from doing anything, I prefer to early-return out of the function at a spot above the point where the function would do anything. Once the function has undertaken actions with side-effects, though, I prefer to return from the bottom, to make clear that all side-effects must be dealt with.

With most anything, it comes down to the needs of the deliverable. In "the old days", spaghetti code with multiple return points invited memory leaks, since coders that preferred that method typically did not clean up well. There were also issues with some compilers "losing" the reference to the return variable as the stack was popped during the return, in the case of returning from a nested scope. The more general problem was one of re-entrant code, which attempts to have the calling state of a function be exactly the same as its return state. Mutators of oop violated this and the concept was shelved.
There are deliverables, most notably kernels, which need the speed that multiple exit points provide. These environments normally have their own memory and process management, so the risk of a leak is minimized.
Personally, I like to have a single point of exit, since I often use it to insert a breakpoint on the return statement and perform a code inspect of how the code determined that solution. I could just go to the entrance and step through, which I do with extensively nested and recursive solutions. As a code reviewer, multiple returns in a function requires a much deeper analysis - so if you're doing it to speed up the implementation, you're robbing Peter to save Paul. More time will be required in code reviews, invalidating the presumption of efficient implementation.
-- 2 cents
Please see this doc for more details: NISTIR 5459

Single entry and exit point was original concept of structured programming vs step by step Spaghetti Coding. There is a belief that multiple exit-point functions require more code since you have to do proper clean up of memory spaces allocated for variables. Consider a scenario where function allocates variables (resources) and getting out of the function early and without proper clean up would result in resource leaks. In addition, constructing clean-up before every exit would create a lot of redundant code.

I used to be an advocate of single-exit style. My reasoning came mostly from pain...
Single-exit is easier to debug.
Given the techniques and tools we have today, this is a far less reasonable position to take as unit tests and logging can make single-exit unnecessary. That said, when you need to watch code execute in a debugger, it was much harder to understand and work with code containing multiple exit points.
This became especially true when you needed to interject assignments in order to examine state (replaced with watch expressions in modern debuggers). It was also too easy to alter the control flow in ways that hid the problem or broke the execution altogether.
Single-exit methods were easier to step through in the debugger, and easier to tease apart without breaking the logic.

In my view, the advice to exit a function (or other control structure) at only one point often is oversold. Two reasons typically are given to exit at only one point:
Single-exit code is supposedly easier to read and debug. (I admit that I don't think much of this reason, but it is given. What is substantially easier to read and debug is single-entry code.)
Single-exit code links and returns more cleanly.
The second reason is subtle and has some merit, especially if the function returns a large data structure. However, I wouldn't worry about it too much, except ...
If a student, you want to earn top marks in your class. Do what the instructor prefers. He probably has a good reason from his perspective; so, at the very least, you'll learn his perspective. This has value in itself.
Good luck.

The answer is very context dependent. If you are making a GUI and have a function which initialises API's and opens windows at the start of your main it will be full of calls which may throw errors, each of which would cause the instance of the program to close. If you used nested IF statements and indent your code could quickly become very skewed to the right. Returning on an error at each stage might be better and actually more readable while being just as easy to debug with a few flags in the code.
If, however, you are testing different conditions and returning different values depending on the results in your method it may be much better practice to have a single exit point. I used to work on image processing scripts in MATLAB which could get very large. Multiple exit points could make the code extremely hard to follow. Switch statements were much more appropriate.
The best thing to do would be to learn as you go. If you are writing code for something try finding other people's code and seeing how they implement it. Decide which bits you like and which bits you don't.

If you feel like you need multiple exit points in a function, the function is too large and is doing too much.
I would recommend reading the chapter about functions in Robert C. Martin's book, Clean Code.
Essentially, you should try to write functions with 4 lines of code or less.
Some notes from Mike Long’s Blog:
The first rule of functions: they should be small
The second rule of functions: they should be smaller than that
Blocks within if statements, while statements, for loops, etc should be one line long
…and that line of code will usually be a function call
There should be no more than one or maybe two levels of indentation
Functions should do one thing
Function statements should all be at the same level of abstraction
A function should have no more than 3 arguments
Output arguments are a code smell
Passing a boolean flag into a function is truly awful. You are by definition doing two --things in the function.
Side effects are lies.

Related

Code structure: should I use lots of functions to increase readability?

My question has Bash and PowerShell scripts in mind, but I suppose it applies to other languages as well.
It is my understanding that the purpose of a function is to perform the same (or a very similar) task multiple times. This decreases the amount of code in the script and it also makes it easier to maintain.
With that in mind, if you discover that your script only calls a function one time then there's no reason for that function to exist as a function. Instead, you should take the function's code and place it in the location where that function is being called.
Having said all that, here's my question:
If I have a complicated script, should I move each section of code into its own function even though each function will only be called once? This would greatly increase the script's readability because its logic (the functions) would all be at the top of the script and the flow of execution would be at the bottom of the script. Since 50 lines of code would be represented by just 1 line, it would be much easier to understand what the script is doing.
Do other people do this? Are there disadvantages to this approach?
Having functions also increases readability. So a bash script might look better and be easier to follow if it reads:
getParams()
startTask()
doSomethingElse()
finishTask()
# implement functions below
even if the function implementations are simple, it reads better.
Code readability is indeed a major concern, usually (nowadays) more important than sheer amount of code or performance. Not to mention that inlining function calls may not necessarily have noticeable performance benefits (very language specific).
So lots of developers (I venture to say that the better of the breed :-) create small functions/methods like you describe, to partition their code into logically cohesive parts.
A function does a well-defined task. If you have a mega function that does 5 different things, it strongly suggests it should be calling 5 smaller functions.
It is my understanding that the purpose of a function is to perform the same (or a very similar) task multiple times.
Well, it is my understanding that a function is a discrete entity that performs a specific, well defined task.
With that in mind, if you discover that your script calls a given function AT LEAST ONCE, then it's doing its job.
Focus on being able to read and easily understand your code.
Having clear, readable code is definitely more a payoff than being afraid of function calls overhead. That's just a premature optimisation.
Plus, the goal of a function is to accomplish a particular task. A task can be a sub-task, there's nothing wrong with that!
Read this book
http://www.amazon.com/Clean-Code-Handbook-Software-Craftsmanship/dp/0132350882
Here are some quotes from the book.
"Small!
The first rule of functions is that they should be small. The second rule of functions is that
they should be smaller than that."
"FUNCTIONS SHOULD DO ONE THING. THEY SHOULD DO IT WELL.THEY SHOULD DO IT ONLY."
As far as my knowledge is concerned, a function represents a Sequence of steps which become a part of larger program.
Coming to your question, I strongly agree that function(s) improve readability and re-usability. But at the same time breaking every thing into pieces might not be a good practice.
Finally, I want to give one statement : "Anything In Excess Is Not Beneficial!"

How many lines should a function have at most?

Is there a good coding technique that specifies how many lines a function should have ?
No. Lines of code is a pretty bad metric for just about anything. The exception is perhaps functions that have thousands and thousands of lines - you can be pretty sure those aren't well written.
There are however, good coding techniques that usually result in fewer lines of code per function. Things like DRY (Don't Repeat Yourself) and the Unix-philosophy ("Write programs that do one thing and do it well. Write programs to work together. Write programs to handle text streams, because that is a universal interface." from Wikipedia). In this case replace "programs" with "functions".
I don't think it matters, who is to say that once a functions lengths passes a certain number of lines it breaks a rule.
In general just code clean functions easy to use and reuse.
A function should have a well defined purpose. That is, try to create functions which does a single thing, either by doing the thing itself or by delegating work to a number of other functions.
Most functional compilers are excellent at inlining. Thus there is no inherent price to pay for breaking up your code: The compiler usually does a good job at deciding if a function call should really be one or if it can just inline the code right away.
The size of the function is less relevant though most functions in FP tend to be small, precise and to the point.
There is a McCabe metric of Cyclomatic Complexity which you might read about at this Wikipedia article.
The metric measures how many tests and loops are present in a routine. A rule of thumb might be that under 10 is a manageable amount of complexity while over 11 becomes more fault prone.
I have seen horrendous code that had a Complexity metric above 50. (It was error-prone and difficult to understand or change.) Re-writing it and breaking it down into subroutines reduced the complexity to 8.
Note the Complexity metric is usually proportional to the lines of code. It would provide you a measure on complexity rather than lines of code.
When working in Forth (or playing in Factor) I tend to continually refactor until each function is a single line! In fact, if you browse through the Factor libraries you'll see that the majority of words are one-liners and almost nothing is more than a few lines. In a language with inner-functions and virtually zero cost for calls (that is, threaded code implicitly having no stack frames [only return pointer stack], or with aggressive inlining) there is no good reason not to refractor until each function is tiny.
From my experience a function with a lot of lines of code (more than a few pages) is a nightmare to maintain and test. But having said that I don't think there is a hard and fast rule for this.
I came across some VB.NET code at my previous company that one function of 13 pages, but my record is some VB6 code I have just picked up that is approx 40 pages! Imagine trying to work out which If statement an Else belongs to when they are pages apart on the screen.
The main argument against having functions that are "too long" is that subdividing the function into smaller functions that only do small parts of the entire job improves readability (by giving those small parts actual names, and helping the reader wrap his mind around smaller pieces of behavior, especially when line 1532 can change the value of a variable on line 45).
In a functional programming language, this point is moot:
You can subdivide a function into smaller functions that are defined within the larger function's body, and thus not reducing the length of the original function.
Functions are expected to be pure, so there's no actual risk of line X changing the value read on line Y : the value of the line Y variable can be traced back up the definition list quite easily, even in loops, conditionals or recursive functions.
So, I suspect the answer would be "no one really cares".
I think a long function is a red flag and deserves more scrutiny. If I came across a function that was more than a page or two long during a code review I would look for ways to break it down into smaller functions.
There are exceptions though. A long function that consists of mostly simple assignment statements, say for initialization, is probably best left intact.
My (admittedly crude) guideline is a screenful of code. I have seen code with functions going on for pages. This is emetic, to be charitable. Functions should have a single, focused purpose. If you area trying to do something complex, have a "captain" function call helpers.
Good modularization makes friends and influences people.
IMHO, the goal should be to minimize the amount of code that a programmer would have to analyze simultaneously to make sense of a program. In general, excessively-long methods will make code harder to digest because programmers will have to look at much of their code at once.
On the other hand, subdividing methods into smaller pieces will only be helpful if those smaller pieces can be analyzed separately from the code which calls them. Splitting a method into sub-methods which would only be meaningful in the context where they are called is apt to impair rather than improve legibility. Even if before splitting the method would have been over 250 lines, breaking it into ten pieces which don't make sense in isolation would simply increase the simultaneous-analysis requirement from 250 lines to 300+ (depending upon how many lines are added for method headers, the code that calls them, etc.) When deciding whether a method should be subdivided, it's far more important to consider whether the pieces make sense in isolation, than to consider whether the method is "too long". Some 20-lines routine might benefit from being split into two ten-line routines and a two-line routine that calls them, but some 250-line routines might benefit from being left exactly as they are.
Another point which needs to be considered, btw, is that in some cases the required behavior of a program may not be a good fit with the control structures available in the language it's written in. Most applications have large "don't-care" aspects of their behavior, and it's generally possible to assign behavior that will fit nicely with a language's available control structures, but sometimes behavioral requirements may be impossible to meet without awkward code. In some such cases, confining the awkwardness to a single method which is bloated, but which is structured around the behavioral requirements, may be better than scattering it among many smaller methods which have no clear relationship to the overall behavior.

Why is determining if a function is pure difficult?

I was at the StackOverflow Dev Days convention yesterday, and one of the speakers was talking about Python. He showed a Memoize function, and I asked if there was any way to keep it from being used on a non-pure function. He said no, that's basically impossible, and if someone could figure out a way to do it it would make a great PhD thesis.
That sort of confused me, because it doesn't seem all that difficult for a compiler/interpreter to solve recursively. In pseudocode:
function isPure(functionMetadata): boolean;
begin
result = true;
for each variable in functionMetadata.variablesModified
result = result and variable.isLocalToThisFunction;
for each dependency in functionMetadata.functionsCalled
result = result and isPure(dependency);
end;
That's the basic idea. Obviously you'd need some sort of check to prevent infinite recursion on mutually-dependent functions, but that's not too difficult to set up.
Higher-order functions that take function pointers might be problematic, since they can't be verified statically, but my original question presupposes that the compiler has some sort of language constraint to designate that only a pure function pointer can be passed to a certain parameter. If one existed, that could be used to satisfy the condition.
Obviously this would be easier in a compiled language than an interpreted one, since all this number-crunching would be done before the program is executed and so not slow anything down, but I don't really see any fundamental problems that would make it impossible to evaluate.
Does anyone with a bit more knowledge in this area know what I'm missing?
You also need to annotate every system call, every FFI, ...
And furthermore the tiniest 'leak' tends to leak into the whole code base.
It is not a theoretically intractable problem, but in practice it is very very difficult to do in a fashion that the whole system does not feel brittle.
As an aside, I don't think this makes a good PhD thesis; Haskell effectively already has (a version of) this, with the IO monad.
And I am sure lots of people continue to look at this 'in practice'. (wild speculation) In 20 years we may have this.
It is particularly hard in Python. Since anObject.aFunc can be changed arbitrarily at runtime, you cannot determine at compile time which function will anObject.aFunc() call or even if it will be a function at all.
In addition to the other excellent answers here: Your pseudocode looks only at whether a function modifies variables. But that's not really what "pure" means. "Pure" typically means something closer to "referentially transparent." In other words, the output is completely dependent on the input. So something as simple as reading the current time and making that a factor in the result (or reading from input, or reading the state of the machine, or...) makes the function non-pure without modifying any variables.
Also, you could write a "pure" function that did modify variables.
Here's the first thing that popped into my mind when I read your question.
Class Hierarchies
Determining if a variable is modified includes the act of digging through every single method which is called on the variable to determine if it's mutating. This is ... somewhat straight forward for a sealed type with a non-virtual method.
But consider virtual methods. You must find every single derived type and verify that every single override of that method does not mutate state. Determining this is simply not possible in any language / framework which allows for dynamic code generation or is simply dynamic (if it's possible, it's extremely difficult). The reason why is that the set of derived types is not fixed because a new one can be generated at runtime.
Take C# as an example. There is nothing stopping me from generating a derived class at runtime which overrides that virtual method and modifies state. A static verified would not be able to detect this type of modification and hence could not validate the method was pure or not.
I think the main problem would be doing it efficiently.
D-language has pure functions but you have to specify them yourself, so the compiler would know to check them. I think if you manually specify them then it would be easier to do.
Deciding whether a given function is pure, in general, is reducible to deciding whether any given program will halt - and it is well known that the Halting Problem is the kind of problem that cannot be solved efficiently.
Note that the complexity depends on the language, too. For the more dynamic languages, it's possible to redefine anything at any time. For example, in Tcl
proc myproc {a b} {
if { $a > $b } {
return $a
} else {
return $b
}
}
Every single piece of that could be modified at any time. For example:
the "if" command could be rewritten to use and update global variables
the "return" command, along the same lines, could do the same thing
the could be an execution trace on the if command that, when "if" is used, the return command is redefined based on the inputs to the if command
Admittedly, Tcl is an extreme case; one of the most dynamic languages there is. That being said, it highlights the problem that it can be difficult to determine the purity of a function even once you've entered it.

Can this kernel function be more readable? (Ideas needed for an academic research!)

Following my previous question regarding the rationale behind extremely long functions, I would like to present a specific question regarding a piece of code I am studying for my research. It's a function from the Linux Kernel which is quite long (412 lines) and complicated (an MCC index of 133). Basically, it's a long and nested switch statement
Frankly, I can't think of any way to improve this mess. A dispatch table seems both huge and inefficient, and any subroutine call would require an inconceivable number of arguments in order to cover a large-enough segment of code.
Do you think of any way this function can be rewritten in a more readable way, without losing efficiency? If not, does the code seem readable to you?
Needless to say, any answer that will appear in my research will be given full credit - both here and in the submitted paper.
Link to the function in an online source browser
I don't think that function is a mess. I've had to write such a mess before.
That function is the translation into code of a table from a microprocessor manufacturer. It's very low-level stuff, copying the appropriate hardware registers for the particular interrupt or error reason. In this kind of code, you often can't touch registers which have not been filled in by the hardware - that can cause bus errors. This prevents the use of code that is more general (like copying all registers).
I did see what appeared to be some code duplication. However, at this level (operating at interrupt level), speed is more important. I wouldn't use Extract Method on the common code unless I knew that the extracted method would be inlined.
BTW, while you're in there (the kernel), be sure to capture the change history of this code. I have a suspicion that you'll find there have not been very many changes in here, since it's tied to hardware. The nature of the changes over time of this sort of code will be quite different from the nature of the changes experienced by most user-mode code.
This is the sort of thing that will change, for instance, when a new consolidated IO chip is implemented. In that case, the change is likely to be copy and paste and change the new copy, rather than to modify the existing code to accommodate the changed registers.
Utterly horrible, IMHO. The obvious first-order fix is to make each case in the switch a call to a function. And before anyone starts mumbling about efficiency, let me just say one word - "inlining".
Edit: Is this code part of the Linux FPU emulator by any chance? If so this is very old code that was a hack to get linux to work on Intel chips like the 386 which didn't have an FPU. If it is, it's probably not a suitable study for academics, except for historians!
There's a kind of regularity here, I suspect that for a domain expert this actually feels very coherent.
Also having the variations in close proximty allows immediate visual inspection.
I don't see a need to refactor this code.
I'd start by defining constants for the various classes. Coming into this code cold, it's a mystery what the switching is for; if the switching was against named constants, I'd have a starting point.
Update: You can get rid of about 70 lines where the cases return MAJOR_0C_EXCP; simply let them fall through to the end of the routine. Since this is kernel code I'll mention that there might be some performance issues with that, particularly if the case order has already been optimized, but it would at least reduce the amount of code you need to deal with.
I don't know much about kernels or about how re-factoring them might work.
The main thing that comes to my mind is taking that switch statement and breaking each sub step in to a separate function with a name that describes what the section is doing. Basically, more descriptive names.
But, I don't think this optimizes the function any more. It just breaks it in to smaller functions of which might be helpful... I don't know.
That is my 2 cents.

When should I break a function?

Its prudent to break a long function into a chief function and helper functions.
I know that the outside the module only chief function will be called, but its long length may prove to be intimidating.
Textbooks put a limit on the number of lines, but I feel that this is too rigid.
P.S. I am programming in Python and need to process incoming, messages. The function returns a tuple containing the message but in Python's internal data types.
So you can see somewhat independent code for each message type.
Duplicate Question
When is a function too long?
I think you need to go about this from the other end of the problem. Think bottom-up. Identify small units of work, as small as possible, and start composing your code that way. You will only run into spaghetti-code issues when you code top-down and don't keep a structured approach.
If you already have spaghetti code and need to refactor, you pretty much have to start over. It is probably more work to break up existing spaghetti code than to rewrite it, and the result may not be as good.
I don't think there should be a hard number for the lines of code in a method either, but well written code does not have methods with more than 5 to 10 lines in the lower layers, and 20 to 30 lines in the business logic. To give you some kind of metric.
I'm not a big fan of breaking a function into multiple functions unnecessarily. It's not a hard and fast thing - if there are things that seem like distinct logical units, then by all means, break those out and think about them separately. But don't just break things out for the sake of some guideline like "one page per function" or "N lines per function".
One good rule of thumb is that if it doesn't fit on a single screen it is worth thinking about splitting it up. But only if it makes sense to split it up, some long functions are perfectly readable and it doesn't make any sense to slavishly split them into multiple functions just for the sake of it.
Never write a function that, when printed on fanfold paper, is taller than you are.
I like the rule of thumb that you should break out the subfunction if you can think of a good domain-relevant name for it.
When someone can understand the top-level function without necessarily having to look up the definition of the sub-function, you've likely made a net gain. (But when you break it down too far, your names start referring to your implementation artifacts rather than the domain)
I was recently discussing this with a friend. He suggested refactoring to separate concerns and I must say I have to agree. That is, one function should do one thing, if it does more than one thing, split it up. If not, let it be together, it makes no sense to split up a function, only to have it obfuscate the meaning. After all, a function is a block of code that does one thing!
The limit in term of number of lines is often impractical becuase it doesn't account for readability well. It's better to try to seperate groups of lines of code that have just a few inputs and just a few outputs and make this a separate functon. It's not always possible - then it's often wise to just leave the code as it is and not to refactor for the sake of refactoring.
Well since I am coding in Python so I have the liberty to write functions inside functions, unlike C, C++ or Java. This i feel is a better choice.
It's not specified. But line should be as low as possible. But you may follow the Role of 30. I follow this in my PHP scripts when needed.
Rule of 30:
“Rule of 30” in Refactoring in Large Software Projects by Martin Lippert and Stephen Roock:
Methods should not have more than an average of 30 code lines.
A class should contain an average of less than 30 methods.
A package/library shouldn’t contain more than 30 classes.
Subsystems should avoid more than 30 packages.
A system more than 30 subsystems may create problem.
If an element consists of more than 30 subelements, it is highly probable that there is a serious problem.
personally I break a function if it either saves total lines or total processing time.
if I only run the helper once per chief function I don't bother
The point is that in principal it's better to have specialiced functions. But where one sets the limit depends very much on
1) the "usual" programming style in certain languages. (one can observe that, object-oriented langauges tend to shorter procedureds than let's say C or the like
2) it depends on your way of programming. Every hard limit must be questioned. IMHO. Overall there will probably some "natural" distribution of programs
3) I think what one should keep on one's mind is that a function should do a certain task take for example some function for parsing it is usually much longer than a function just settin some field in a structure. Or getting back just consider how a event loop in the Windows API may look. So that all suggests that there may be good reasons for long methods...
If there is independent code (in your case specifics for each message type) those areas should be broken out.
Size matters not. Judge me by my size do you? - Yoda
Your main concerns are readability, simplicity and maintainability. A good indicator is if you need to write comments to explain a section of a function then that section is a good candidate for a separate function.
There are many reasons to break a long function into its constituent pieces. Most important is:
readability
maintainability
code clarity/intent
Some functions simple cannot be broken into smaller pieces without negatively impacting the listed goals, so there is no hard-and-fast rule.
If you didn't write it and it's already in production: NEVER!!! If you break it up, you're likely to break it, it's that simple.
If you are writing it and you're not sure, the on screen rule apples as others have said.

Resources