How to keep only one return statement in a function? - refactoring

Despite the discussion here, Should a function have only one return statement?, are there simple tips or methods for keeping only one return statement? Or how do you refactor multiple return statements into a single return statement?

First of all, it is not clear why you want to do this: as the linked thread already mentions, multiple return statements were added to modern languages deliberately, to give the programmer more flexibility.
The Wikipedia article says a lot about it:
Some people make sure each function has a single entry, single exit
(SESE). These people argue that the use of a return statement violates
structured programming: it is an unstructured exit from the function,
resulting in multiple exit points, rather than the single exit point
required by structured programming. It has thus been argued[5] that
one should eschew the use of the explicit return statement except at
the textual end of a subroutine, considering that, when it is used to
"return early", it may suffer from the same sort of problems that
arise for the GOTO statement. Conversely, it can be argued that using
the return statement is worthwhile when the alternative is more
convoluted code, such as deeper nesting, harming readability.
Other people say that one or more "guard clauses" -- conditional
"early exit" return statements near the beginning of a function --
often make a function easier to read than the alternative.[6][7][8][9]
The most common problem in early exit is that cleanup or final
statements are not executed – for example, allocated memory is not
unallocated, or open files are not closed, causing leaks. These must
be done at each return site, which is brittle and can easily result in
bugs. For instance, in later development, a return statement could be
overlooked by a developer, and an action which should be performed at
the end of a subroutine (e.g., a trace statement) might not be
performed in all cases. Languages without a return statement, such as
standard Pascal don't have this problem. Some languages, such as C++
and Python, employ concepts which allow actions to be performed
automatically upon return (or exception throw) which mitigates some of
these issues – these are often known as "try/finally" or similar.
Ironically, functionality like these "finally" clauses can be
implemented by a goto to the single return point of the subroutine. An
alternative solution is to use the normal stack unwinding (variable
deallocation) at function exit to unallocate resources, such as via
destructors on local variables, or similar mechanisms such as Python's
"with" statement.
But even so, if you want single-entry/single-exit behaviour, the best approach is to get the invalid cases out of the way first, either simply exiting or raising exceptions as appropriate, put a blank line in there, then add the "real" body of the method.
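For example, here is a minimal sketch in C of both shapes: a guard-clause version with several returns, and a single-entry/single-exit version that funnels everything through one result variable. The function and type names (parse_widget, widget_t) are invented purely for illustration.

#include <stddef.h>

typedef struct { int id; } widget_t;    /* hypothetical type, for illustration only */

/* Guard-clause style: invalid cases exit early, then the "real" body follows. */
int parse_widget_guarded(const char *input, widget_t *out)
{
    if (input == NULL || out == NULL)
        return -1;                       /* invalid argument */
    if (input[0] == '\0')
        return -2;                       /* nothing to parse */

    out->id = (int)input[0];             /* the "real" body */
    return 0;
}

/* Single-exit style: one result variable, one return at the textual end. */
int parse_widget_single_exit(const char *input, widget_t *out)
{
    int result = 0;

    if (input == NULL || out == NULL) {
        result = -1;                     /* invalid argument */
    } else if (input[0] == '\0') {
        result = -2;                     /* nothing to parse */
    } else {
        out->id = (int)input[0];         /* the "real" body */
    }

    return result;                       /* the only exit point */
}

Either version works; converting to single-exit mostly amounts to replacing every early return with an assignment to the result variable and letting control fall through to the end.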
Also check Where did the notion of “one return only” come from?

Related

How is Ruby's throw-catch implemented?

In Ruby you can throw :label so long as you've wrapped everything in a catch(:label) do block.
I want to add this to a custom lispy language but I'm not sure how it's implemented under the hood. Any pointers?
This is an example of a non-local exit. If you are using your host language's (in this case C) call stack for function calls in your target language (so e.g. a function call in your lisp equates to a function call in C), then the easiest way is to use your host language's form of non-local exit. In C that means setjmp/longjmp.
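As a rough illustration (not tied to any particular lisp implementation), here is what that looks like in C with setjmp/longjmp; a real interpreter would keep a stack of jmp_bufs, one per active catch, keyed by the label.

#include <setjmp.h>
#include <stdio.h>

static jmp_buf catch_point;    /* one active catch; a real interpreter would keep a stack */

static void interpret_throw(void)
{
    /* ... somewhere deep inside evaluating the target language ... */
    longjmp(catch_point, 1);   /* the equivalent of Ruby's throw :label */
}

int main(void)
{
    if (setjmp(catch_point) == 0) {
        /* the equivalent of catch(:label) do ... end: first pass through */
        interpret_throw();
        puts("not reached");
    } else {
        /* longjmp lands here, unwinding all intervening C frames */
        puts("caught non-local exit");
    }
    return 0;
}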
If, however, you are maintaining your target language's call stack separately then you have many options for how to do this. One really simple way would be to have each lexical-scope exit yield two values; the actual value returned, and an exception state, if any. Then you can check for the exception at runtime and propagate this value up. This has the downside of incurring extra cost to function calls when no condition is signaled, but may be sufficient for a toy language.
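A sketch of that two-values-per-exit idea, again in C and with all names (eval_result, eval_body, eval_call) invented: every evaluation returns a value plus an exception flag, and each caller checks the flag and propagates it before doing its own work.

#include <stdio.h>

typedef struct {
    int value;     /* stand-in for a real tagged lisp value */
    int thrown;    /* nonzero if a condition is being signalled */
} eval_result;

static eval_result eval_body(void)
{
    /* pretend something in here signalled a condition */
    eval_result r = { 0, 1 };
    return r;
}

static eval_result eval_call(void)
{
    eval_result r = eval_body();
    if (r.thrown)
        return r;                  /* propagate the exception state unchanged */
    r.value = r.value + 1;         /* normal post-call work happens only on the non-thrown path */
    return r;
}

int main(void)
{
    eval_result r = eval_call();
    printf("thrown=%d value=%d\n", r.thrown, r.value);
    return 0;
}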
The book "Lisp In Small Pieces" covers about a half-dozen ways of handling this, if you're interested.

Is checking Perl function arguments worth it?

There's a lot of buzz about MooseX::Method::Signatures and even before that, modules such as Params::Validate that are designed to type check every argument to methods or functions. I'm considering using the former for all my future Perl code, both personal and at my place of work. But I'm not sure if it's worth the effort.
I'm thinking of all the Perl code I've seen (and written) before that performs no such checking. I very rarely see a module do this:
my ($a, $b) = @_;
defined $a or croak '$a must be defined!';
!ref $a or croak '$a must be a scalar!';
...
@_ == 2 or croak "Too many arguments!";
Perhaps because it's simply too much work without some kind of helper module, but perhaps because in practice we don't send excess arguments to functions, and we don't send arrayrefs to methods that expect scalars - or if we do, we have use warnings; and we quickly hear about it - a duck typing approach.
So is Perl type checking worth the performance hit, or are its strengths predominantly shown in compiled, strongly typed languages such as C or Java?
I'm interested in answers from anyone who has experience writing Perl that uses these modules and has seen benefits (or not) from their use; if your company/project has any policies relating to type checking; and any problems with type checking and performance.
UPDATE: I read an interesting article on the subject recently, called Strong Testing vs. Strong Typing. Ignoring the slight Python bias, it essentially states that type checking can be suffocating in some instances, and even if your program passes the type checks, it's no guarantee of correctness - proper tests are the only way to be sure.
If it's important for you to check that an argument is exactly what you need, it's worth it. Performance only matters when you already have correct functioning. It doesn't matter how fast you can get a wrong answer or a core dump. :)
Now, that sounds like a stupid thing to say, but consider some cases where it isn't. Do I really care what's in @_ here?
sub looks_like_a_number { $_[0] !~ /\D/ }
sub is_a_dog { eval { $_[0]->DOES( 'Dog' ) } }
In those two examples, if the argument isn't what you expect, you are still going to get the right answer because the invalid arguments won't pass the tests. Some people see that as ugly, and I can see their point, but I also think the alternative is ugly. Who wins?
However, there are going to be times when you need guard conditions because your situation isn't so simple. The next thing you have to pass your data to might expect it to be within certain ranges or of certain types, and might not fail elegantly.
When I think about guard conditions, I think through what could happen if the inputs are bad and how much I care about the failure. I have to judge that by the demands of each situation. I know that sucks as an answer, but I tend to like it better than a bondage-and-discipline approach where you have to go through all the mess even when it doesn't matter.
I dread Params::Validate because its code is often longer than my subroutine. The Moose stuff is very attractive, but you have to realize that it's a way for you to declare what you want and you still get what you could build by hand (you just don't have to see it or do it). The biggest thing I hate about Perl is the lack of optional method signatures, and that's one of the most attractive features in Perl 6 as well as Moose.
I basically concur with brian. How much you need to worry about your method's inputs depends heavily on how much you are concerned that a) someone will input bad data, and b) bad data will corrupt the purpose of the method. I would also add that there is a difference between external and internal methods. You need to be more diligent about public methods because you're making a promise to consumers of your class; conversely you can be less diligent about internal methods as you have greater (theoretical) control over the code that accesses it, and have only yourself to blame if things go wrong.
MooseX::Method::Signatures is an elegant solution to adding a simple declarative way to explain the parameters of a method. Method::Signatures::Simple and Params::Validate are nice but lack one of the features I find most appealing about Moose: the Type system. I have used MooseX::Declare and by extension MooseX::Method::Signatures for several projects and I find that the bar to writing the extra checks is so minimal it's almost seductive.
Yes, it's worth it - defensive programming is one of those things that is always worth it.
The counterargument I've seen presented to this is that checking parameters on every single function call is redundant and a waste of CPU time. This argument's supporters favor a model in which all incoming data is rigorously checked when it first enters the system, but internal methods have no parameter checks because they should only be called by code which will pass them data which has already passed the checks at the system's border, so it is assumed to still be valid.
In theory, I really like the sound of that, but I can also see how easily it can fall like a house of cards if someone uses the system (or the system needs to grow to allow use) in a way that was unforeseen when the initial validation border was established. All it takes is one external call to an internal function and all bets are off.
In practice, I'm using Moose at the moment and Moose doesn't really give you the option to bypass validation at the attribute level, plus MooseX::Declare handles and validates method parameters with less fuss than unrolling @_ by hand, so it's pretty much a moot point.
I want to mention two points here.
The first is tests, the second is the performance question.
1) Tests
You mentioned that tests can do a lot and that tests are the only way to be sure that your code is correct. In general I would say this is absolutely correct. But tests themselves only solve one problem.
If you write a module you have two problems, or let's say two different people that use your module: you as a developer, and a user that uses your module. Tests help with the first, making sure your module is correct and does the right thing, but they don't help the user who just uses your module.
For the latter, I have one example. I had written a module using Moose and some other stuff, and my code always ended in a segmentation fault. I began to debug my code and search for the problem. I spent around 4 hours finding the error. In the end the problem was that I had used Moose with the Array trait, called the "map" function, and didn't provide a subroutine reference, just a string or something else.
Sure, this was an absolutely stupid error of mine, but I spent a long time debugging it. In the end, a check that the argument is a subref would have cost the developer 10 seconds of time, and would have saved me, and probably others, a lot more time.
I also know of other examples. I had written a REST client for an interface, completely OOP with Moose. You always got objects back and you could change their attributes, but of course it didn't call the REST API for every change you made. Instead you change your values and at the end you call an update() method that transfers the data and changes the values.
Then I had a user who wrote:
$obj->update({ foo => 'bar' })
Sure, he got an error back that update() did not work. But of course it didn't work, because the update() method doesn't accept a hashref. It only does a synchronisation of the current state of the object with the online service. The correct code would be:
$obj->foo('bar');
$obj->update();
The first call works because I never did any checking of the arguments, and I don't throw an error if someone gives more arguments than I expect. The method just starts like:
sub update {
    my ( $self ) = @_;
    ...
}
Sure, all my tests pass 100% fine. But handling these errors that are not errors costs me time too. And it probably costs the user a lot more time.
So in the end: yes, tests are the only correct way to ensure that your code works correctly. But that doesn't mean that type checking is meaningless. Type checking is there to help all the non-developers (of your module) use your module correctly, and it saves you and others time finding dumb errors.
2) Performance
The short version: you don't care about performance until you care.
That means: until your module actually runs too slowly, performance is always fast enough and you don't need to care about it. If your module really does run too slowly, you need further investigation. But for that investigation you should use a profiler like Devel::NYTProf to see what is slow.
And I would say that in 99% of cases the slowness is not because you do type checking; it is more likely your algorithm: you do a lot of computation, call functions too often, etc. It often helps to take a completely different approach, use a better algorithm, do caching or something else, and then the performance hit is not your type checking. But even if the checking is the performance hit, just remove it where it matters.
There is no reason to remove the type checking where performance doesn't matter. Do you think type checking matters in a case like the one above, where I had written a REST client? 99% of the performance cost there is the number of requests that go to the web service, or the time for each request. Not using type checking or MooseX::Declare etc. would speed up absolutely nothing.
And even if you do see performance disadvantages, sometimes that is acceptable, because the speed doesn't matter or because something else gives you greater value. DBIx::Class is slower than pure SQL with DBI, but DBIx::Class gives you a lot in exchange.
Params::Validate works great, but of course checking args slows things down. Tests are mandatory (at least in the code I write).
Yes it's absolutely worth it, because it will help during development, maintenance, debugging, etc.
If a developer accidentally sends the wrong parameters to a method, a useful error message will be generated, instead of the error being propagated down to somewhere else.
I'm using Moose extensively for a fairly large OO project I'm working on. Moose's strict type checking has saved my bacon on a few occasions. Most importantly it has helped avoid situations where "undef" values are incorrectly passed to a method. In just those instances alone it saved me hours of debugging time.
The performance hit is definitely there, but it can be managed. Two hours of using NYTProf helped me find a few Moose attributes that I was hitting too hard; I refactored my code and got a 4x performance improvement.
Use type checking. Defensive coding is worth it.
Patrick.
Sometimes. I generally do it whenever I'm passing options via hash or hashref. In these cases it's very easy to misremember or misspell an option name, and checking with Params::Check can save a lot of troubleshooting time.
For example:
sub revise {
my ($file, $options) = @_;
my $tmpl = {
test_mode => { allow => [0,1], 'default' => 0 },
verbosity => { allow => qr/^\d+$/, 'default' => 1 },
force_update => { allow => [0,1], 'default' => 0 },
required_fields => { 'default' => [] },
create_backup => { allow => [0,1], 'default' => 1 },
};
my $args = check($tmpl, $options, 1)
or croak "Could not parse arguments: " . Params::Check::last_error();
...
}
Prior to adding these checks, I'd forget whether the names used underscores or hyphens, pass require_backup instead of create_backup, etc. And this is for code I wrote myself--if other people are going to use it, you should definitely do some sort of idiot-proofing. Params::Check makes it fairly easy to do type checking, allowed value checking, default values, required options, storing option values to other variables and more.

Return from a subroutine

I want to write a subroutine for working out what to do and then returning.
Before you jump on the "A subroutine that returns is a function LOL!" bandwagon, I want the return to be executed as if it were in the body of the function calling the subroutine, as though I had a preprocessor do the substitution, because otherwise this codebase is going to get unwieldy really fast, and returning the return value of a function seems kludgy.
Will VB (sorry, I can't be more specific about which version - I'm writing formulas for an embedded system, and our API docs are "it runs vb") let me do that, or fall in a heap?
I want the return to be executed as if it were in the body of the function calling the subroutine, as though I had a preprocessor do the substitution, because otherwise this codebase is going to get unwieldy really fast, and returning the return value of a function seems kludgy.
It's not. Tail calls are a common practice and work just fine.
They are much preferable to having a function that can never be called unless you want to return its value.
It sounds like you are asking whether C/C++-style macros can be implemented in VB; the answer is no. You could possibly fake it, though, by generating VBScript and substituting the right things in.
Lambdas and delegates in VB.Net are not really the same thing as what you are asking for - if my interpretation is correct.

What is the most obfuscated code you've had to fix?

Most programmers will have had the experience of debugging/fixing someone else's code. Sometimes that "someone else's code" is so obfuscated that it's hard enough just trying to understand what it's doing.
What's the worst (most obfuscated) code you've had to debug/fix?
If you didn't throw it away and recode it from scratch, well why didn't you?
PHP OSCommerce: enough said, it is obfuscated code...
A Java class:
- only static methods that manipulate the DOM
- 8000 LOC
- long chains of methods that return null on "error": a.b().c().d().e()
- very long methods (400/500 LOC each)
- nested if, for, while, like:
if (...) {
    for (...) {
        if (...) {
            if (...) {
                while (...) {
                    if (...) {
- cut-and-paste oriented programming
- no exceptions; all exceptions are caught and "handled" using printStackTrace()
- no unit tests
- no documentation
I was tempted to throw it away and recode... but after 3 days of hard debugging,
I added the magic if :-)
Spaghetti code PHP CMS system.
by default, programmers think someone else's code is obfuscated.
The worst I've probably had to do was work out what the variables i1, i2, j, k, t were in a simple method, and they were not counters in 'for' loops.
In all other circumstances I guess the problem domain was difficult, which made the code look difficult.
I found this line in our codebase today and thought it was a nice example of sneaky obfuscation:
if (MULTICLICK_ENABLED.equals(propService.getProperty(PropertyNames.MULTICLICK_ENABLED))) {} else {
return false;
}
Just making sure I read the whole line. NO SKIMREADING.
When working on a GWT project, I would reach parts of GWT-compiled obfuscated JS code which wasn't mine.
Now good luck debugging real obfuscated code.
I can't remember the full code, but a single part of it remains burned into my memory as something I spent hours trying to understand:
do{
$tmp = shift unless shift;
$tmp;
}while($tmp);
I couldn't understand it at first, it looked so useless. Then I printed out @_ for a list of arguments: a series of alternating booleans and function names. The code was used in conjunction with a library-detection module that changed behaviour if a function was broken, but the code was so badly documented, and made of things like this that made no sense without a complete understanding of the full code, that I gave up and rewrote the whole thing.
UPDATE from DVK:
And, lest someone claim this was because Perl is unreadable, as opposed to the coder being a golf master instead of a good software developer, here's the same code in a slightly less obfuscated form (really correct code wouldn't even HAVE alternating sub names and booleans in the first place :)
# This subroutine takes a list of alternating true/false flags
# and subroutine names, and executes the named subroutines for which the flag is true.
# I am also weird, otherwise I'd simply have passed a list of subroutines to execute :)
my @flags_and_sub_names_list = @_;
while ( @flags_and_sub_names_list ) {
    my $flag = shift @flags_and_sub_names_list;
    my $subName = shift @flags_and_sub_names_list;
    next unless $flag && $subName;
    &{ $subName }; # Call the named subroutine
}
I've had a case of a 300-line function performing input sanitization which missed a certain corner case. It parsed certain situations manually using IndexOf and Substring, plus a lot of inlined variables and constants (it looks like the original coder didn't know anything about good practices), and no comments were provided. Throwing it away wasn't feasible due to time constraints and the fact that I didn't have the specification; rewriting it would have meant understanding the original anyway, and once I understood it, fixing it was just quicker. I also added lots of comments, so whoever comes after me won't feel the same pain taking a look at it...
The Perl statement:
select((select(s),$|=1)[0])
which, at the suggestion of the original author (Randal Schwartz himself, who said he disliked it but nothing else was available at the time), was replaced with something a little more understandable:
IO::Handle->autoflush
Beyond that one-liner, some of the Java JDBC libraries from IBM are obfuscated, and all variables and functions are either combinations of the letters 'l' and '1' or single/double characters - very hard to track anything down until you get them all renamed. I needed to do this to track down why they worked fine in IBM's JRE but not Sun's.
If you're talking about high-level-language code: I was once updating a project written by a Chinese developer, and all the comments were in Chinese (stored in ANSI), which made some code fragments a horror to understand. If you're talking about low-level code, there were MANY of them (obfuscated, mutated, VM-ed...).
I once had to reverse engineer a Java 1.1 framework that:
- extended event-driven SAX parser classes for every class, even those that didn't parse XML (the overridden methods were simply invoked ad hoc by other code)
- threw custom runtime exceptions in lieu of method invocations wherever possible; as a result, most of the business logic landed in a nested series of catch blocks
If I had to guess, it was probably someone's "smart" idea that method invocations were expensive in Java 1.1, so throwing exceptions for non-exceptional flow control was somehow considered an optimization.
Went through about three bottles of eye drops.
I once found a time bomb that had been intentionally obfuscated.
When I had finally decoded what it was doing I mentioned it to the manager who said they knew about the time bomb but had left it in place because it was so ineffective and was interwoven with other code.
The time bomb was (presumably) supposed to go off after a certain date.
Instead, it had a bug in it so it only activated if someone was working after lunchtime on Dec 31st.
It had taken three years for that circumstance to occur since the guy who wrote the time bomb left the company.

Should I make sure arguments aren't null before using them in a function?

The title may not really explain what I'm trying to get at; I couldn't really think of a better way to describe what I mean.
I was wondering if it is good practice to check the arguments that a function accepts for nulls or empty before using them. I have this function which just wraps some hash creation like so.
Public Shared Function GenerateHash(ByVal FilePath As IO.FileInfo) As String
If (FilePath Is Nothing) Then
Throw New ArgumentNullException("FilePath")
End If
Dim _sha As New Security.Cryptography.MD5CryptoServiceProvider
Dim _Hash = Convert.ToBase64String(_sha.ComputeHash(New IO.FileStream(FilePath.FullName, IO.FileMode.Open, IO.FileAccess.Read)))
Return _Hash
End Function
As you can see, it just takes an IO.FileInfo as an argument, and at the start of the function I am checking to make sure that it is not Nothing.
I'm wondering: is this good practice, or should I just let it get through to the actual hasher and let that throw the exception because the argument is null?
Thanks.
In general, I'd suggest it's good practice to validate all of the arguments to public functions/methods before using them, and fail early rather than after executing half of the function. In this case, you're right to throw the exception.
Depending on what your method is doing, failing early could be important. If your method was altering instance data on your class, you don't want it to alter half of the data, then encounter the null and throw an exception, as your object's data might then be in an intermediate and possibly invalid state.
If you're using an OO language then I'd suggest it's essential to validate the arguments to public methods, but less important with private and protected methods. My rationale here is that you don't know what the inputs to a public method will be - any other code could create an instance of your class and call its public methods, and pass in unexpected/invalid data. Private methods, however, are called from inside the class, and the class should already have validated any data being passed around internally.
One of my favourite techniques in C++ was to DEBUG_ASSERT on NULL pointers. This was drilled into me by senior programmers (along with const correctness) and is one of the things I was most strict on during code reviews. We never dereferenced a pointer without first asserting it wasn't null.
A debug assert is only active for debug targets (it gets stripped in release) so you don't have the extra overhead in production of testing thousands of ifs. Generally it would either throw an exception or trigger a hardware breakpoint. We even had systems that would throw up a debug console with the file/line info and an option to ignore the assert (once or indefinitely for the session). That was such a great debugging and QA tool (we'd get screenshots with the assert on the tester's screen and information on whether the program continued if ignored).
I suggest asserting all invariants in your code including unexpected nulls. If performance of the if's becomes a concern find a way to conditionally compile and keep them active in debug targets. Like source control, this is a technique that has saved my ass more often than it has caused me grief (the most important litmus test of any development technique).
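For what it's worth, a minimal sketch of that conditionally compiled assert pattern in C looks something like this (the macro name DEBUG_ASSERT and the reporting format are made up; a real project would hook in its own breakpoint/console machinery, and a C++ version would look much the same):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#ifdef NDEBUG
#define DEBUG_ASSERT(cond) ((void)0)    /* stripped from release builds */
#else
#define DEBUG_ASSERT(cond) \
    do { \
        if (!(cond)) { \
            fprintf(stderr, "ASSERT failed: %s (%s:%d)\n", #cond, __FILE__, __LINE__); \
            abort();    /* or trigger a breakpoint / debug console here */ \
        } \
    } while (0)
#endif

static size_t name_length(const char *name)
{
    DEBUG_ASSERT(name != NULL);    /* never dereference without asserting first */
    return strlen(name);
}

int main(void)
{
    printf("%zu\n", name_length("example"));
    return 0;
}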
Yes, it's good practice to validate all arguments at the beginning of a method and throw appropriate exceptions like ArgumentException, ArgumentNullException, or ArgumentOutOfRangeException.
If the method is private such that only you the programmer could pass invalid arguments, then you may choose to assert each argument is valid (Debug.Assert) instead of throw.
If NULL is an unacceptable input, throw an exception yourself, like you did in your sample, so that the message is helpful.
Another method of handling NULL inputs is just to respond with a NULL in turn. It depends on the type of function - in the example above I would keep the exception.
If it's for an externally facing API, then I would say you want to check every parameter, as the input cannot be trusted.
However, if it is only going to be used internally, then the input should be trustworthy, and you can save yourself a bunch of code that doesn't add value to the software.
You should check all arguments against the set of assumptions that you make in that function about their values.
As in your example, if a null argument to your function doesn't make any sense, and you're assuming that anyone using your function will know this, then being passed a null argument indicates some sort of error and warrants some sort of action (e.g. throwing an exception). And if you use asserts (as James Fassett got in and said before me ;-) ), they cost you nothing in a release version. (They cost you almost nothing in a debug version either.)
The same thing applies to any other assumption.
And it's going to be easier to trace the error if you generate it than if you leave it to some standard library routine to throw the exception. You will be able to provide much more useful contextual information.
It's outside the bounds of this question, but you do need to expose the assumptions that your function makes - for example, through the comment header to your function.
According to The Pragmatic Programmer by Andrew Hunt and David Thomas, it is the responsibility of the caller to make sure it gives valid input. So, you must now choose whether you consider a null input to be valid. Unless it makes specific sense to consider null to be a valid input (e.g. it is probably a good idea to consider null to be a legal input if you're testing for equality), I would consider it invalid. That way your program, when it hits incorrect input, will fail sooner. If your program is going to encounter an error condition, you want it to happen as soon as possible. In the event your function does inadvertently get passed a null, you should consider it to be a bug, and react accordingly (i.e. instead of throwing an exception, you should consider making use of an assertion that kills the program, until you are releasing the program).
Classic design by contract: If input is right, output will be right. If input is wrong, there is a bug. (if input is right but output is wrong, there is a bug. That's a gimme.)
I'll add a couple of elaborations (in bold) to the excellent design-by-contract advice offered by Brian earlier...
The principles of "design by contract" require that you define what is acceptable for the caller to pass in (the valid domain of input values) and then, for any valid input, what the method/provider will do.
For an internal method, you can define NULLs as outside the domain of valid input parameters. In this case, you would immediately assert that the input parameter value is NOT NULL. The key insight in this contract specification is that any call passing in a NULL value IS A CALLER'S BUG and the error thrown by the assert statement is the proper behavior.
Now, while that is very well defined and parsimonious, if you're exposing the method to external/public callers, you should ask yourself: is that the contract I/we really want?
Probably not. In a public interface, you'd probably accept the NULL (as technically within the domain of inputs the method accepts), but then gracefully decline to process it, with a return message. (More work, to meet the naturally more complex customer-facing requirement.)
In either case, what you're after is a protocol that handles all of the cases from both the perspective of the caller and the provider, not lots of scattershot tests that can make it difficult to assess the completeness or lack of completeness of the contractual condition coverage.
Most of the time, letting it just throw the exception is pretty reasonable as long as you are sure the exception won't be ignored.
If you can add something to it, however, it doesn't hurt to wrap the exception with one that is more accurate and rethrow it. Decoding "NullPointerException" is going to take a bit longer than "IllegalArgumentException("FilePath MUST be supplied")" (Or whatever).
Lately I've been working on a platform where you have to run an obfuscator before you test. Every stack trace looks like monkeys typing random crap, so I got in the habit of checking my arguments all the time.
I'd love to see a "nullable" or "nonull" modifier on variables and arguments so the compiler can check for you.
If you're writing a public API, do your caller the favor of helping them find their bugs quickly, and check for valid inputs.
If you're writing an API where the caller might be untrusted (or the caller of the caller), check for valid inputs, because it's good security.
If your APIs are only reachable by trusted callers, like "internal" in C#, then don't feel like you have to write all that extra code. It won't be useful to anyone.
