Flatten conditional as a refactoring

Consider:
if (something) {
// Code...
}
With CodeRush installed it recommended doing:
if (!something) {
return;
}
// Code...
Could someone explain how this is better? Surely there is no benefit whatsoever.

Isolated, as you've presented it, there is no benefit. But mark4o is right: it's less nesting, which becomes very clear if you look at even, say, a 4-level nesting:
public void foo() {
    if (a)
        if (b)
            if (c)
                if (d)
                    doSomething();
}
versus
public void foo() {
    if (!a)
        return;
    if (!b)
        return;
    if (!c)
        return;
    if (!d)
        return;
    doSomething();
}
Early returns like this improve readability.

In some cases, it's cleaner to validate all of your inputs at the beginning of a method and just bail out if anything is not correct. You can have a series of single-level if checks that check successively more and more specific things until you're confident that your inputs are good. The rest of the method will then be much easier to write, and will tend to have fewer nested conditionals.
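A minimal sketch of that validate-first style, in Python. The function name `describe_order` and the shape of the `order` dictionary are made up for illustration and do not come from the question:

```python
def describe_order(order):
    """Return a summary line, or None when the input is not usable."""
    # Guard clauses: each check is one level deep and bails out early.
    if order is None:
        return None
    if "items" not in order:
        return None
    if not order["items"]:
        return None

    # From here on the input is known to be good, so no nesting is needed.
    total = sum(item["price"] * item["qty"] for item in order["items"])
    return f"{len(order['items'])} item(s), total {total}"
```

Each check is more specific than the one before it, and the main logic stays at the top indentation level of the function.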

One less level of nesting.

This is a conventional refactoring meant for maintainability. See:
http://www.refactoring.com/catalog/replaceNestedConditionalWithGuardClauses.html
With one condition, it's not a big improvement. But it follows the "fail fast" principle, and you really start to notice the benefit when you have lots of conditions. If you grew up on "structured programming", which typically recommends functions have single exit points, it may seem unnatural, but if you've ever tried to debug code that has three levels or more of nested conditionals, you'll start to appreciate it.

It can be used to make the code more readable (by way of less nesting). See here for a good example, and here for a good discussion of the merits.
That sort of pattern is commonly used to replace:
void SomeMethod()
{
    if (condition_1)
    {
        if (condition_2)
        {
            if (condition_3)
            {
                // code
            }
        }
    }
}
With:
void SomeMethod()
{
    if (!condition_1) { return; }
    if (!condition_2) { return; }
    if (!condition_3) { return; }
    // code
}
Which is much easier on the eyes.

I don't think CodeRush is recommending it --- rather just offering it as an option.

IMO, it depends on if something or !something is the exceptional case. If there is a significant amount of code if something happens, then using the !something conditional makes more sense for legibility and potential nesting reduction.

Well, look at it this way (I'll use php as an example):
You fill a form and go to this page: validate.php
example 1:
<?php
if (valid_data($_POST['username'])) {
    if (valid_data($_POST['password'])) {
        login();
    } else {
        die();
    }
} else {
    die();
}
?>
vs
<?php
if (!valid_data($_POST['username'])) {
    die();
}
if (!valid_data($_POST['password'])) {
    die();
}
login();
?>
Which one is better and easier to maintain? Remember this is just validating two things. Imagine this for a register page or something else.

I remember very clearly losing marks on a piece of college work because I had gone with the
if (!something) {
return;
}
// Code...
format. My lecturer pontificated that it was bad practice to have more than one exit point in a function. I thought that was nuts and 20+ years of computer programming later, I still do.
To be fair, he lived in an era where the lingua franca was C and functions were often pages long and full of nested conditionals making it difficult to track what was going on.
Then and now, however, simplicity is king: Keeping functions small and commenting them well is the best way to make things readable and maintainable.

Related

Is it ever okay to not use an ELSE statement if you have a return or throw inside the IF statement? [closed]

I often write code such as the following
bool myFunct (...)
{
if (something)
{
return false;
}
// .... more code ....
}
The alternative is
bool myFunct (...)
{
    if (something)
    {
        return false;
    }
    else
    {
        // .... more code ....
    }
}
Of course, that else block is unnecessary, because the early return means that reaching the else statement in the first place is the equivalent of being inside it. Then there's the fact that, to make the compiler happy, I often have to change the structure of the 2nd implementation to
bool myFunct (...)
{
    bool retval = true;
    if (something)
    {
        retval = false;
    }
    else
    {
        // .... more code ....
    }
    return retval;
}
which is extra code and looks stupid. My question is, what do the governing authorities and priests say about this type of situation?
Not only is it OK, it is even encouraged in Spartan Programming. According to Spartan Programming, shorter and simpler code is better, and you achieve it (among other ways) by fast terminations and avoiding else statements when possible.
Under minimizing use of control:
(2) Simplifying conditionals with early return.
(4) Simplifying logic of iteration with early exits (via return, continue and break statements).
P.S. It seems Jeff Atwood also likes the spartan programming way.
Of course. You're basically writing a guard condition that will stop you trying to perform unnecessary logic. It's fine.
Like many things, it's personal preference, but I prefer to see code written like your second example for simple cases, and the third example for more complex cases. That doesn't mean any are wrong, though.
There's nothing technically wrong with having an else clause for an if clause that does not terminate naturally (e.g., has a return or throw statement), it's just useless.
Several style guidelines argue against it (and several IDE and analysis tools may produce warnings to support these guidelines), but ultimately, it's just a preference.
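To make the "useless but harmless" point concrete, here is a small Python sketch. Both function names are hypothetical, and the two functions are meant to behave identically:

```python
def sign_with_else(n):
    if n < 0:
        return "negative"
    else:
        return "non-negative"


def sign_without_else(n):
    if n < 0:
        return "negative"
    # Reaching this line already implies n >= 0, so no else is needed.
    return "non-negative"
```

The only difference is the extra indentation level in the first version; which one reads better is the preference the answers here are debating.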
The 2nd example looks fine to me because, if the code in the first statement is updated like below, it'll prevent unexpected behavior :
bool myFunct (...)
{
    if (something)
    {
        // ... stuff
        if (something_else) return false;
    }
    else
    {
        // .... more code ....
    }
}

SonarQube - Nested If Depth

I'm getting this violation in SonarQube: Nested If Depth
if (some condition){
some code;
if (some condition) {
some code;
}
}
and also here:
for (some condition) {
if (some condition) {
some code;
}
}
how can I reduce the depth?
An answer has already been accepted, but it doesn't answer the actual question.
So for completeness' sake I want to add my 2 cents.
For reference see This question here
The example you list looks like it is dealing with guard conditions. Things like "only run this method if ...." or "only perform this loop iteration if ...."
In these cases, if you have 3 or 4 groups of guards you might end up indenting very deeply, making the code harder to read.
Anyway, the way to make this code more readable is to return early.
Instead of
if (some condition) {
    // some code
    if (some other condition) {
        // some more code
    }
}
You can write
if (!some condition) {
    return;
}
// some code
if (!some other condition) {
    return;
}
// some more code
You now only ever have 1 level of nesting, and it is clear that you do not run this method unless 'some condition' has been met.
The same goes for the loop, using continue:
for (some condition) {
    if (some other condition) {
        // some code;
    }
}
becomes
for (some condition) {
    if (!some other condition) {
        continue;
    }
    // some code
}
What you state here is that unless 'some other condition' is met, you skip this iteration of the loop.
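The same transformation can be sketched as runnable Python. The conditions used here (even numbers greater than 10) are arbitrary placeholders chosen only so the two versions can be compared:

```python
def nested_version(numbers):
    result = []
    for n in numbers:
        if n % 2 == 0:
            if n > 10:
                result.append(n * n)
    return result


def flat_version(numbers):
    result = []
    for n in numbers:
        # Guard conditions: skip this iteration as soon as one check fails.
        if n % 2 != 0:
            continue
        if n <= 10:
            continue
        result.append(n * n)
    return result
```

Both functions return the same list for the same input; the flat version never nests deeper than one `if` inside the loop.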
The real question is why the max depth is set to 1. It's overkill.
This kind of rule is meant to keep your code readable. More than 2 nested blocks can make the code unreadable, but 1-2 will always be readable.
If you decide to keep the max depth set to 1, you need to refactor your code and put every 2nd condition check inside a separate method. No offense, but unless you have a very specific and good reason to do it, it looks a bit stupid.

php mail form security: most reliable way to spot new lines

I am trying to build a secure php contact form to allow users (and hopefully not spammers) to send mail.
I am looking at ways of detecting new lines in the from: field, in which users will submit their email address, and in the subject: field.
I have 2 alternatives of the same function to detect new lines and I would like your opinion about which one would be the most reliable (meaning working in the most cases):
function containingnewlines1($stringtotest) {
if (preg_match("/(%0A|%0D|\\n+|\\r+)/i", $stringtotest) != 0) {
echo "Newline found. Suspected injection attempt";
exit;
}
}
function containingnewlines2($stringtotest) {
if (preg_match("/^\R$/", $stringtotest) != 0) {
echo "Newline found. Suspected injection attempt";
exit;
}
}
Thank you in advance for your opinions!
Cheers
The vastly more pertinent question is "Which one is more reliable?". The efficiency of either approach is irrelevant because neither approach should take more than a few milliseconds to execute. Trying to decide between the two based on a matter of milliseconds is a micro-optimization.
Furthermore, what do you mean by efficiency? Do you mean which one is faster? Which one consumes the least memory? Efficiency is an ill-defined term, you need to be more specific.
If you absolutely must make a decision based on performance/efficiency requirements then I'd recommend constructing a benchmark and finding out for yourself which one is the closest fit to your requirements, because at the end of the day only you can answer that question.
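As a rough illustration of what such a benchmark could look like, here is a sketch in Python rather than PHP, using the standard `timeit` module. The two checker functions and the `benchmark` helper are hypothetical stand-ins, not ports of the functions in the question:

```python
import re
import timeit

NEWLINE_RE = re.compile(r"[\r\n]")


def has_newline_regex(text):
    return NEWLINE_RE.search(text) is not None


def has_newline_scan(text):
    return "\r" in text or "\n" in text


def benchmark(func, sample, loops=10_000):
    """Return the total seconds taken by `loops` calls of func(sample)."""
    return timeit.timeit(lambda: func(sample), number=loops)


if __name__ == "__main__":
    sample = "user@example.com" * 50 + "\n"
    for func in (has_newline_regex, has_newline_scan):
        print(func.__name__, benchmark(func, sample))
```

Timing each candidate separately, on input that resembles real form data, gives numbers that are easier to compare than one loop that calls every candidate in turn.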
I added 2 more functions myself and ran a benchmark of 100000 loops:
function containingnewlines3($stringtotest) {
return (strpbrk($stringtotest,"\r\n") !== FALSE);
}
function containingnewlines4($stringtotest) {
    // A lone "\n" or a lone "\r" is enough to count as a newline.
    return (strpos($stringtotest,"\n") !== FALSE || strpos($stringtotest,"\r") !== FALSE);
}
$start = microtime(TRUE);
for($x=0;$x<100000;$x++) {
containingnewlines1($html); // 0.272623 ms
containingnewlines2($html); // 0.244299 ms
containingnewlines3($html); // 0.377767 ms
containingnewlines4($html); // 0.142282 ms
}
echo (microtime(TRUE) - $start);
Actually, I decided to use the first function, as it covers 2 more cases (%0A and %0D) and as it also includes all the newline character variations used by different OSes (\n, \r\n, etc.).
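For comparison, the idea behind the first function can be sketched in Python as follows. The name `contains_newline` is hypothetical, and this version returns a boolean instead of printing a message and exiting:

```python
import re

# Raw CR/LF plus their URL-encoded spellings, matched case-insensitively.
_NEWLINE_RE = re.compile(r"%0a|%0d|\r|\n", re.IGNORECASE)


def contains_newline(value):
    """Return True if value holds a raw or URL-encoded line break."""
    return _NEWLINE_RE.search(value) is not None
```

Note that an anchored pattern such as `^\R$` can only match when the whole string is a line break, so it would miss a newline embedded in a longer value such as an email address followed by an injected header.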

Loop vs closure, readable vs concise?

In my answer to my own question here I posted some code and Dave Newton was kind enough to provide me with a gist and show me the error in my not-so-Groovy ways. <-- Groovy pun
I took his advice and revamped my code to be Groovier. Since then the link I am making (which Dave represents with the replaceWith variable) has changed. Now the closure representation of what I want to do would look like this:
int i = 1
errorList = errorLinksFile.readLines().grep { it.contains "href" }.collect { line ->
def replaceWith = "<a href=\"${rooturl}${build.url}parsed_console/log_content.html#ERROR${i++}\">"
line.replaceAll(pattern, replaceWith).minus("</font>")
}
And the for loop representation of what I want to do would look like this:
def errorList = []
def i = 1
for(line in errorLinksFile.getText().split("\n")){
if(!line.contains("href")){
continue
}
errorList.add(line.replaceAll(pattern, "<a href=\"${rooturl}${build.url}parsed_console/log_content.html#ERROR${i++}\">").minus("</font>"))
}
The closure version is definitely more concise, but I'm worried if I always go the "Groovier" route the code might be harder for other programmers to understand than a simple for loop. So when is Groovier better and when should I opt for code that is likely to be understood by all programmers?
I believe that a development team should strive to be the best, and coding to the least knowledgeable/experienced developer does not support this. It is important, though, that more than one person on the team knows how to read the code that is developed. So if you're the only one that can read it, teach someone else. If you're worried about someone new to the team being able to read it, I feel that both versions would be equally hard to read, since that person would lack domain knowledge. What I would do, though, is break it up a little bit:
def originalMethod() {
    //Do whatever happens before the given code
    errorList = getModifiedErrorsFromFile(errorLinksFile)
}
def getModifiedErrorsFromFile(errorLinksFile) {
    int i = 1
    getHrefsFromFile(errorLinksFile).collect { line ->
        def replaceWith = getReplacementTextForLine(i)
        i++
        line.replaceAll(pattern, replaceWith).minus("</font>")
    }
}
def getHrefsFromFile(errorLinksFile) {
    errorLinksFile.readLines().grep { it.contains "href" }
}
def getReplacementTextForLine(i) {
    "<a href=\"${rooturl}${build.url}parsed_console/log_content.html#ERROR${i}\">"
}
This way if the next person doesn't immediately understand what is going on they should be able to infer what is going on based on the method names. If that doesn't work adding tests would help the next person understand what is going on.
My 2 cents. Good topic though!!!
Idiomatic groovy is good, people will learn the common idioms quickly. "Clever" groovy, in my opinion, is more likely to be just confusing.
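The same trade-off shows up outside Groovy. As a simplified Python sketch (the function names and the `base_url` parameter are assumptions, and the regex replacement from the question is reduced to building the anchor tag), a loop and a comprehension can produce the same result:

```python
def links_with_loop(lines, base_url):
    result = []
    counter = 1
    for line in lines:
        if "href" not in line:
            continue
        result.append(f'<a href="{base_url}#ERROR{counter}">')
        counter += 1
    return result


def links_with_comprehension(lines, base_url):
    href_lines = [line for line in lines if "href" in line]
    # enumerate replaces the hand-maintained counter of the loop version.
    return [
        f'<a href="{base_url}#ERROR{i}">'
        for i, _ in enumerate(href_lines, start=1)
    ]
```

The comprehension version is shorter and has no mutable counter, while the loop version spells out every step; which one a team finds clearer depends on how familiar its members are with the idiom.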

How much information hiding is necessary when doing code refactoring?

How much information hiding is necessary? I have boilerplate code before I delete a record, it looks like this:
public override void OrderProcessing_Delete(Dictionary<string, object> pkColumns)
{
    var c = Connect();
    using (var cmd = new NpgsqlCommand("SELECT COUNT(*) FROM orders WHERE order_id = :_order_id", c)
        { Parameters = { {"_order_id", pkColumns["order_id"]} } } )
    {
        var count = (long)cmd.ExecuteScalar();
        // deletion's boilerplate code...
        if (count == 0) throw new RecordNotFoundException();
        else if (count > 1) throw new DatabaseStructureChangedException();
        // ...boiler plate code
    }
    // deleting of table(s) goes here...
}
NOTE: boilerplate code is code-generated, including the "using (var cmd = new NpgsqlCommand( ... )"
But I'm seriously thinking of refactoring the boilerplate code; I wanted more succinct code. This is how I envision refactoring it (made nicer with an extension method, though that's not the sole reason ;)):
using (var cmd = new NpgsqlCommand("SELECT COUNT(*) FROM orders WHERE order_id = :_order_id", c)
{ Parameters = { {"_order_id", pkColumns["order_id"]} } } )
{
cmd.VerifyDeletion(); // [EDIT: was ExecuteWithVerification before]
}
I wanted the ExecuteScalar call and the boilerplate code to go inside the extension method.
For my code above, does it warrant code refactoring / information hiding? Does my refactored operation look too opaque?
I would say that your refactor is extremely good, if your new single line of code replaces a handful of lines of code in many places in your program. Especially since the functionality is going to be the same in all of those places.
The programmer coming after you and looking at your code will simply look at the definition of the extension method to find out what it does, and now he knows that this code is defined in one place, so there is no possibility of it differing from place to place.
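A rough Python analogue of pulling the check into one place might look like this. The exception classes and `verify_deletion` are hypothetical names modelled on the question's C# code, and the function takes the already-fetched row count instead of a database command:

```python
class RecordNotFoundError(Exception):
    """Raised when no row matches the key that is about to be deleted."""


class DatabaseStructureChangedError(Exception):
    """Raised when a key that should be unique matches several rows."""


def verify_deletion(count):
    """Raise unless exactly one row matches; return None otherwise."""
    if count == 0:
        raise RecordNotFoundError("no matching record")
    if count > 1:
        raise DatabaseStructureChangedError("key is no longer unique")
```

Every caller then shares the same two checks, so changing the rule means editing this one function.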
Try it if you must, but my feeling is it's not about succinctness but whether or not you want to enforce the behavior every time or most of the time. And by extension, if the verify-condition changes that it would likely change across the board.
Basically, reducing a small chunk of boiler-plate code doesn't necessarily make things more succinct; it's just one more bit of abstractness the developer has to wade through and understand.
As a developer, I'd have no idea what "ExecuteWithVerify" means. What exactly are we verifying? I'd have to look it up and remember it. But with the boiler-plate code, I can look at the code and understand exactly what's going on.
And by NOT reducing it to a separate method I can also tune the boiler-plate code for cases where exceptions need to be thrown for differing conditions.
It's not information-hiding when you extract or refactor your code. It's only information-hiding when you start restricting access to your extension definition after refactoring.
The "new" operator within a class (except in the constructor) should be avoided at all costs. This is what you need to refactor here.

Resources