Proper formatting for if statements inside loops - syntax

I'm curious which of the following is considered better form. My examples are snippets, but in more complex programs I've written, I can see arguments for and have used either.
Option 1: Skip ahead if a condition is met, otherwise execute lots of code:
for i in range(5):
if i == 3:
continue
print i
# and do lots of other stuff
The argument for this: clearly skips certain condition, doesn't result in overly nested code.
Option 2: Execute lots of code if condition is not met:
for i in range(5):
if i is not 3:
print i
# and do lots of other stuff
The argument for this: more verbose and doesn't use unnecessary continue.
(There seems to be a few questions related to this, but mostly about using functions or long blocks of code. I'm curious about the == vs != part.)

I think that, generally, Option 1 is better practice. In Option 2, 'lots of other stuff' is at a second level on indentation. Generally, the less indentation, the better.

Related

In Bash, Why `then` is needed in the conditional constructs?

The conditional construct of if command looks like this:
if TEST-COMMANDS; then
CONSEQUENT-COMMANDS;
[elif MORE-TEST-COMMANDS; then
MORE-CONSEQUENTS;]
[else ALTERNATE-CONSEQUENTS;]
fi
And the loop construct of while command looks like this:
while TEST-COMMANDS; do CONSEQUENT-COMMANDS; done
I was wondering why then is needed in if command but not in while command? Why couldn't it be ommited?
do in the while syntax serves a similar purpose to then in the if syntax. They both signify the start of the body of the statement - differentiating it from the condition part of the statement.
The if conditional statement is a compound statement in the shell. The if & then sections of the statement are executed as two parts, the then section is only invoked if the if section ends with an exit status of 0. Both sections may contain multiple statements; therefore, a semi-colon alone is insufficient to separate these sections.
Like #shibley is saying in his answer, the do and then words are used to indicate the beginning of the block of actions to perform.
I have done some research and could not find the historical reasons, so I am going to guess the logical ones. It might be too subjective, so do not hesitate to comment your impressions below.
The bash syntax is quite "symmetrical": Whenever you have an case you finish it with esac. Also, it was designed in a very human way, so it is easily understandable.
That said, if you are in a while loop, it means that you are going to do something while a condition is true. Then, when it is not true anymore you are done.
However, in an if condition, you are saying that if something happens, then something needs to be executed.
In short: do and then are human-readable ways to indicate the same, that is, the beginning of a block of commands to be performed upon a while or if condition.

Good style for splitting lengthy expressions over lines

If the following is not the best style, what is for the equivalent expression?
if (some_really_long_expression__________ && \
some_other_really_long_expression)
The line continuation feels ugly. But I'm having a hard time finding a better alternative.
The parser doesn't need the backslashes in cases where the continuation is unambiguous. For example, using Ruby 2.0:
if true &&
true &&
true
puts true
end
#=> true
The following are some more-or-less random thoughts about the question of line length from someone who just plays with Ruby. Nor have I had any training as a software engineer, so consider yourself forewarned.
I find the problem of long lines is often more the number of characters than the number of operations. The former can be reduced by (drum-roll) shortening variable names and method names. The question, of course, is whether the application of a verbosity filter (aka babbling, prattling or jabbering filter) will make the code harder to comprehend. How often have you seen something fairly close to the following (without \)?
total_cuteness_rating = cats_dogs_and_pigs.map {|animal| \
cuteness_calculation(animal)}.reduce {|cuteness_accumulator, \
cuteness_per_animal| cuteness_accumulator + cuteness_per_animal}
Compare that with:
tot_cuteness = pets.map {|a| cuteness(a)}.reduce(&:+)
Firstly, I see no benefit of long names for local variables within a block (and rarely for local variables in a method). Here, isn't it perfectly obvious what a refers to in the calculation of tot_cuteness? How good a memory do you need to remember what a is when it is confined to a single line of code?
Secondly, whenever possible use the short form for enumerables followed by a block (e.g, reduce(&:+)). This allows us to comprehend what's going on in microseconds, here as soon as our eyes latch onto the +. Same, for .to_i, _s or _f. True, reduce {|tot, e| tot + e} isn't much longer, but we're forcing the reader's brain to decode two variables as well as the operator, when + is really all it needs.
Another way to shorten lines is to avoid long chains of operations. That comes at a cost, however. As far as I'm concerned, the longer the chain, the better. It reduces the need for temporary variables, reduces the number of lines of code and--possibly of greatest importance--allows us to read across a line, as most humans are accustomed, rather than down the page. The above line of code reads, "To calculate total cuteness, calculate each pet's cuteness rating, then sum those ratings". How could it be more clear?
When chains are particularly long, they can be written over multiple lines without using the line-continuaton character \:
array.each {|e| blah, blah, ..., blah
.map {|a| blah, blah, ..., blah
.reduce {|i| blah, blah, ..., blah }
}
}
That's no less clear than separate statements. I think this is frequently done in Rails.
What about the use of abbreviations? Which of the following names is most clear?
number_of_dogs
number_dogs
nbr_dogs
n_dogs
I would argue the first three are equally clear, and the last no less clear if the writer consistently prefixes variable names with n_ when that means "number of". Same for tot_, and so on. Enough.
One approach is to encapsulate those expressions inside meaningful methods. And you might be able to break it into multiple methods that you can later reuse.
Other then that is hard to suggest anything with the little information you gave. You might be able to get rid of the if statement using command objects or something like that but I can't tell if it makes sense on your code because you didn't show it.
Ismael answer works really well in Ruby (there may be other languages too) for 2 reasons:
Ruby has very low overhead to creating methods due to lack of type
definition
It allows you to decouple such logic for reuse or future adaptability and testing
Another option I'll toss out is create logic equations and store the result in a variable e.g.
# this are short logic equations testing x but you can apply same for longer expressions
number_gt_5 = x > 5
number_lt_20 = x < 20
number_eq_11 = x == 11
if (number_gt_5 && number_lt_20 && !number_eq_11)
# do some stuff
end

Pythonesque blocks and postfix expressions

In JavaScript,
f = function(x) {
return x + 1;
}
(5)
seems at a glance as though it should assign f the successor function, but actually assigns the value 6, because the lambda expression followed by parentheses is interpreted by the parser as a postfix expression, specifically a function call. Fortunately this is easy to fix:
f = function(x) {
return x + 1;
};
(5)
behaves as expected.
If Python allowed a block in a lambda expression, there would be a similar problem:
f = lambda(x):
return x + 1
(5)
but this time we can't solve it the same way because there are no semicolons. In practice Python avoids the problem by not allowing multiline lambda expressions, but I'm working on a language with indentation-based syntax where I do want multiline lambda and other expressions, so I'm trying to figure out how to avoid having a block parse as the start of a postfix expression. Thus far I'm thinking maybe each level of the recursive descent parser should have a parameter along the lines of 'we have already eaten a block in this statement so don't do postfix'.
Are there any existing languages that encounter this problem, and how do they solve it if so?
Python has semicolons. This is perfectly valid (though ugly and not recommended) Python code: f = lambda(x): x + 1; (5).
There are many other problems with multi-line lambdas in otherwise standard Python syntax though. It is completely incompatible with how Python handles indentation (whitespace in general, actually) inside expressions - it doesn't, and that's the complete opposite of what you want. You should read the numerous python-ideas thread about multi-line lambdas. It's somewhere between very hard to impossible.
If you want arbitrarily complex compound statements inside lambdas you can't use the existing rules for multi-line expressions even if you made all statements expressions. You'd have to change the indentation handling (see the language reference for how it works right now) so that expressions can also contain blocks. This is hard to do without breaking perfectly fine Python code, and will certainly result in a language many Python programmers will consider worse in several regards: Harder to understand, more complex to implement, permits some stupid errors, etc.
Most languages don't solve this exact problem at all. Most candidates (Scala, Ruby, Lisps, and variants of these three) have explicit end-of-block tokens. I know of two languages that have the same problem, one of which (Haskell) has been mentioned by another answer. Coffeescript also uses indentation without end-of-block tokens. It parses the transliteration of your example correctly. However, I could not find any specification of how or why it does this (and I won't dig through the parser source code). Both differ significantly from Python in syntax as well as design philosophy, so their solution is of little (if any) use for Python.
In Haskell, there is an implicit semicolon whenever you start a line with the same indentation as a previous one, assuming the parser is in a layout-sensitive mode.
More specifically, after a token is encountered that signals the start of a (layout-sensitive) block, the indentation level of the first token of the first block item is remembered. Each line that is indented more continues the current block item; each line that is indented the same starts a new block item, and the first line that is indented less implies the closure of the block.
How your last example would be treated depends on whether the f = is a block item in some block or not. If it is, then there will be an implicit semicolon between the lambda expression and the (5), since the latter is indented the same as the former. If it is not, then the (5) will be treated as continuing whatever block item the f = is a part of, making it an argument to the lamda function.
The details are a bit messier than this; look at the Haskell 2010 report.

style opinion re. empty If block

I'm trying to curb some of the bad habits of a self-proclaimed "senior programmer." He insists on writing If blocks like this:
if (expression) {}
else {
statements
}
Or as he usually writes it in classic ASP VBScript:
If expression Then
Else
statements
End If
The expression could be something as easily negated as:
if (x == 0) {}
else {
statements
}
Other than clarity of coding style, what other reasons can I provide for my opinion that the following is preferred?
if (x != 0) {
statements
}
Or even the more general case (again in VBScript):
If Not expression Then
statements
End If
Reasons that come to my mind for supporting your opinion (which I agree with BTW) are:
Easier to read (which implies easier to understand)
Easier to maintain (because of point #1)
Consistent with 'established' coding styles in most major programming languages
I have NEVER come across the coding-style/form that your co-worker insists on using.
I've tried it both ways. McConnell in Code Complete says one should always include both the then and the else to demonstrate that one has thought about both conditions, even if the operation is nothing (NOP). It looks like your friend is doing this.
I've found this practice to add no value in the field because unit testing handles this or it is unnecessary. YMMV, of course.
If you really want to burn his bacon, calculate how much time he's spending writing the empty statements, multiply by 1.5 (for testing) and then multiply that number by his hourly rate. Send him a bill for the amount.
As an aside, I'd move the close curly bracket to the else line:
if (expression) {
} else {
statements
}
The reason being that it is tempting to (or easy to accidentally) add some statement outside the block.
For this reason, I abhor single-line (bare) statements, of the form
if (expression)
statement
Because it can get fugly (and buggy) really fast
if (expression)
statement1
statement2
statement2 will always run, even though it might look like it should be subject to expression. Getting in the habit of always using brackets will kill this stumbling point dead.

Why use short-circuit code?

Related Questions: Benefits of using short-circuit evaluation, Why would a language NOT use Short-circuit evaluation?, Can someone explain this line of code please? (Logic & Assignment operators)
There are questions about the benefits of a language using short-circuit code, but I'm wondering what are the benefits for a programmer? Is it just that it can make code a little more concise? Or are there performance reasons?
I'm not asking about situations where two entities need to be evaluated anyway, for example:
if($user->auth() AND $model->valid()){
$model->save();
}
To me the reasoning there is clear - since both need to be true, you can skip the more costly model validation if the user can't save the data.
This also has a (to me) obvious purpose:
if(is_string($userid) AND strlen($userid) > 10){
//do something
};
Because it wouldn't be wise to call strlen() with a non-string value.
What I'm wondering about is the use of short-circuit code when it doesn't effect any other statements. For example, from the Zend Application default index page:
defined('APPLICATION_PATH')
|| define('APPLICATION_PATH', realpath(dirname(__FILE__) . '/../application'));
This could have been:
if(!defined('APPLICATION_PATH')){
define('APPLICATION_PATH', realpath(dirname(__FILE__) . '/../application'));
}
Or even as a single statement:
if(!defined('APPLICATION_PATH'))
define('APPLICATION_PATH', realpath(dirname(__FILE__) . '/../application'));
So why use the short-circuit code? Just for the 'coolness' factor of using logic operators in place of control structures? To consolidate nested if statements? Because it's faster?
For programmers, the benefit of a less verbose syntax over another more verbose syntax can be:
less to type, therefore higher coding efficiency
less to read, therefore better maintainability.
Now I'm only talking about when the less verbose syntax is not tricky or clever in any way, just the same recognized way of doing, but in fewer characters.
It's often when you see specific constructs in one language that you wish the language you use could have, but didn't even necessarily realize it before. Some examples off the top of my head:
anonymous inner classes in Java instead of passing a pointer to a function (way more lines of code).
in Ruby, the ||= operator, to evaluate an expression and assign to it if it evaluates to false or is null. Sure, you can achieve the same thing by 3 lines of code, but why?
and many more...
Use it to confuse people!
I don't know PHP and I've never seen short-circuiting used outside an if or while condition in the C family of languages, but in Perl it's very idiomatic to say:
open my $filehandle, '<', 'filename' or die "Couldn't open file: $!";
One advantage of having it all in one statement is the variable declaration. Otherwise you'd have to say:
my $filehandle;
unless (open $filehandle, '<', 'filename') {
die "Couldn't open file: $!";
}
Hard to claim the second one is cleaner in that case. And it'd be wordier still in a language that doesn't have unless
I think your example is for the coolness factor. There's no reason to write code like that.
EDIT: I have no problem with doing it for idiomatic reasons. If everyone else who uses a language uses short-circuit evaluation to make statement-like entities that everyone understands, then you should too. However, my experience is that code of that sort is rarely written in C-family languages; proper form is just to use the "if" statement as normal, which separates the conditional (which presumably has no side effects) from the function call that the conditional controls (which presumably has many side effects).
Short circuit operators can be useful in two important circumstances which haven't yet been mentioned:
Case 1. Suppose you had a pointer which may or may not be NULL and you wanted to check that it wasn't NULL, and that the thing it pointed to wasn't 0. However, you must not dereference the pointer if it's NULL. Without short-circuit operators, you would have to do this:
if (a != NULL) {
if (*a != 0) {
⋮
}
}
However, short-circuit operators allow you to write this more compactly:
if (a != NULL && *a != 0) {
⋮
}
in the certain knowledge that *a will not be evaluated if a is NULL.
Case 2. If you want to set a variable to a non-false value returned from one of a series of functions, you can simply do:
my $file = $user_filename ||
find_file_in_user_path() ||
find_file_in_system_path() ||
$default_filename;
This sets the value of $file to $user_filename if it's present, or the result of find_file_in_user_path(), if it's true, or … so on. This is seen perhaps more often in Perl than C, but I have seen it in C.
There are other uses, including the rather contrived examples which you cite above. But they are a useful tool, and one which I have missed when programming in less complex languages.
Related to what Dan said, I'd think it all depends on the conventions of each programming language. I can't see any difference, so do whatever is idiomatic in each programming language. One thing that could make a difference that comes to mind is if you had to do a series of checks, in that case the short-circuiting style would be much clearer than the alternative if style.
What if you had a expensive to call (performance wise) function that returned a boolean on the right hand side that you only wanted called if another condition was true (or false)? In this case Short circuiting saves you many CPU cycles. It does make the code more concise because of fewer nested if statements. So, for all the reasons you listed at the end of your question.
The truth is actually performance. Short circuiting is used in compilers to eliminate dead code saving on file size and execution speed. At run-time short-circuiting does not execute the remaining clause in the logical expression if their outcome does not affect the answer, speeding up the evaluation of the formula. I am struggling to remember an example. e.g
a AND b AND c
There are two terms in this formula evaluated left to right.
if a AND b evaluates to FALSE then the next expression AND c can either be FALSE AND TRUE or FALSE AND FALSE. Both evaluate to FALSE no matter what the value of c is. Therefore the compiler does not include AND c in the compiled format hence short-circuiting the code.
To answer the question there are special cases when the compiler cannot determine whether the logical expression has a constant output and hence would not short-circuit the code.
Think of it this way, if you have a statement like
if( A AND B )
chances are if A returns FALSE you'll only ever want to evaluate B in rare special cases. For this reason NOT using short ciruit evaluation is confusing.
Short circuit evaluation also makes your code more readable by preventing another bracketed indentation and brackets have a tendency to add up.

Resources