Ruby case/when vs if/elsif - ruby

The case/when statements remind me of try/catch statements in Python, which are fairly expensive operations. Is this similar with the Ruby case/when statements? What advantages do they have, other than perhaps being more concise, to if/elsif Ruby statements? When would I use one over the other?

The case expression is not at all like a try/catch block. The Ruby equivalents to try and catch are begin and rescue.
In general, the case expression is used when you want to test one value for several conditions. For example:
case x
when String
"You passed a string but X is supposed to be a number. What were you thinking?"
when 0
"X is zero"
when 1..5
"X is between 1 and 5"
else
"X isn't a number we're interested in"
end
The case expression is orthogonal to the switch statement that exists in many other languages (e.g. C, Java, JavaScript), though Python doesn't include any such thing. The main difference with case is that it is an expression rather than a statement (so it yields a value) and it uses the === operator for equality, which allows us to express interesting things like "Is this value a String? Is it 0? Is it in the range 1..5?"

Ruby's begin/rescue/end is more similar to Python's try/catch (assuming Python's try/catch is similar to Javascript, Java, etc.). In both of the above the code runs, catches errors and continues.
case/when is like C's switch and ignoring the === operator that bjhaid mentions operates very much like if/elseif/end. Which you use is up to you, but there are some advantages to using case when the number of conditionals gets long. No one likes /if/elsif/elsif/elsif/elsif/elsif/end :-)
Ruby has some other magical things involving that === operator that can make case nice, but I'll leave that to the documentation which explains it better than I can.

Related

Overriding Ruby's & and | methods doesn't require . operator? [duplicate]

I'm wondering why calls to operator methods don't require a dot? Or rather, why can't normal methods be called without a dot?
Example
class Foo
def +(object)
puts "this will work"
end
def plus(object)
puts "this won't"
end
end
f = Foo.new
f + "anything" # "this will work"
f plus "anything" # NoMethodError: undefined method `plus' for main:Object
The answer to this question, as to pretty much every language design question is: "Just because". Language design is a series of mostly subjective trade-offs. And for most of those subjective trade-offs, the only correct answer to the question why something is the way it is, is simply "because Matz said so".
There are certainly other choices:
Lisp doesn't have operators at all. +, -, ::, >, = and so on are simply normal legal function names (variable names, actually), just like foo or bar?
(plus 1 2)
(+ 1 2)
Smalltalk almost doesn't have operators. The only special casing Smalltalk has is that methods which consist only of operator characters do not have to end with a colon. In particular, since there are no operators, all method calls have the same precedence and are evaluated strictly left-to-right: 2 + 3 * 4 is 20, not 14.
1 plus: 2
1 + 2
Scala almost doesn't have operators. Just like Lisp and Smalltalk, *, -, #::: and so on are simply legal method names. (Actually, they are also legal class, trait, type and field names.) Any method can be called either with or without a dot. If you use the form without the dot and the method takes only a single argument, then you can leave off the brackets as well. Scala does have precedence, though, although it is not user-definable; it is simply determined by the first character of the name. As an added twist, operator method names that end with a colon are inverted or right-associative, i.e. a :: b is equivalent to b.::(a) and not a.::(b).
1.plus(2)
1 plus(2)
1 plus 2
1.+(2)
1 +(2)
1 + 2
In Haskell, any function whose name consists of operator symbols is considered an operator. Any function can be treated as an operator by enclosing it in backticks and any operator can be treated as a function by enclosing it in brackets. In addition, the programmer can freely define associativity, fixity and precedence for user-defined operators.
plus 1 2
1 `plus` 2
(+) 1 2
1 + 2
There is no particular reason why Ruby couldn't support user-defined operators in a style similar to Scala. There is a reason why Ruby can't support arbitrary methods in operator position, simply because
foo plus bar
is already legal, and thus this would be a backwards-incompatible change.
Another thing to consider is that Ruby wasn't actually fully designed in advance. It was designed through its implementation. Which means that in a lot of places, the implementation is leaking through. For example, there is absolutely no logical reason why
puts(!true)
is legal but
puts(not true)
isn't. The only reason why this is so, is because Matz used an LALR(1) parser to parse a non-LALR(1) language. If he had designed the language first, he would have never picked an LALR(1) parser in the first place, and the expression would be legal.
The Refinement feature currently being discussed on ruby-core is another example. The way it is currently specified, will make it impossible to optimize method calls and inline methods, even if the program in question doesn't actually use Refinements at all. With just a simple tweak, it can be just as expressive and powerful, and ensure that the pessimization cost is only incurred for scopes that actually use Refinements. Apparently, the sole reason why it was specified this way, is that a) it was easier to prototype this way, and b) YARV doesn't have an optimizer, so nobody even bothered to think about the implications (well, nobody except Charles Oliver Nutter).
So, for basically any question you have about Ruby's design, the answer will almost always be either "because Matz said so" or "because in 1993 it was easier to implement that way".
The implementation doesn't have the additional complexity that would be needed to allow generic definition of new operators.
Instead, Ruby has a Yacc parser that uses a statically defined grammar. You get the built-in operators and that's it. Symbols occur in a fixed set of sentences in the grammar. As you have noted, the operators can be overloaded, which is more than most languages offer.
Certainly it's not because Matz was lazy.
Ruby actually has a fiendishly complex grammar that is roughly at the limit of what can be accomplished in Yacc. To get more complex would require using a less portable compiler generator or it would have required writing the parser by hand in C, and doing that would have limited future implementation portability in its own way as well as not providing the world with the Yacc input. That would be a problem because Ruby's Yacc source code is the only Ruby grammar documentation and is therefore "the standard".
Because Ruby has "syntax sugar" that allows for a variety of convenient syntax for preset situations. For example:
class Foo
def bar=( o ); end
end
# This is actually calling the bar= method with a parameter, not assigning a value
Foo.new.bar = 42
Here's a list of the operator expressions that may be implemented as methods in Ruby.
Because Ruby's syntax was designed to look roughly like popular OO languages, and those use the dot operator to call methods. The language it borrowed its object model from, Smalltalk, didn't use dots for messages, and in fact had a fairly "weird" syntax that many people found off-putting. Ruby has been called "Smalltalk with an Algol syntax," where Algol is the language that gave us the conventions you're talking about here. (Of course, there are actually more differences than just the Algol syntax.)
Missing braces was some "advantage" for ruby 1.8, but with ruby 1.9 you can't even write method_0 method_1 some param it will be rejected, so the language goes rather to the strict version instead of freeforms.

Ruby evaluates the default value in fetch even when key is found

h = {a: "foo"}
h.fetch(:a, h.fetch(:b))
yields key not found: :b
It seems strange that Ruby evaluates the default value even when the key is found? Is there a way around this?
Edit: Apparently this behavior falls under the paradigm of lazy vs eager evaluation. Nearly all imperative languages use eager evaluation, while many functional languages use lazy evaluation. However some languages, such as Python (which was before last week the only language I knew), have lazy evaluation capabilities for some operations.
It seems strange that Ruby evaluates the default value even when the key is found? Is there a way around this?
The overwhelming majority of mainstream programming languages is strict, i.e. arguments are fully evaluated before being passed. The only exception is Haskell, and calling it mainstream is a bit of a stretch.
So, no, it is not really "strange", it is how (almost) every language works, and it is also how every single other method in Ruby works. Every method in Ruby always fully evaluates all its arguments. That is, why, for example defined? and alias cannot be methods but must be builtin language constructs.
However, there is a way to delay evaluation, so to speak, using blocks: the content of a block is only evaluated each time it is called. Thankfully, Hash#fetch does take a block, so you can just use
h.fetch(:a) { h.fetch(:b) }
If you want the computation to only run if the key is not found, you can use the alternate form of fetch which accepts a block:
h.fetch(a) { h.fetch b }
I didn't actually know this was the case but I had a hunch and tried it and it worked. The reason I thought to try this was that something similar can be done in other methods such as gsub, i.e.
"123".gsub(/[0-9]/) { |i| i.to_i + 1 } == "234"
You could use the || operator instead, since it's lazy evaluated by design.
For example the following code where Nope is not defined does not throw an error.
{ a: "foo" }[:a] || Nope
However the fetch version will throw an error.
{ a: "foo" }.fetch(:a, Nope)
Personally I prefer how fetch is designed because it wont hide a bug in the default value.
However, in cases where I would rather not evaluate a default statement, then I would definitely reach for the || operator before doing something in a block or a lambda/proc.

Pythonesque blocks and postfix expressions

In JavaScript,
f = function(x) {
return x + 1;
}
(5)
seems at a glance as though it should assign f the successor function, but actually assigns the value 6, because the lambda expression followed by parentheses is interpreted by the parser as a postfix expression, specifically a function call. Fortunately this is easy to fix:
f = function(x) {
return x + 1;
};
(5)
behaves as expected.
If Python allowed a block in a lambda expression, there would be a similar problem:
f = lambda(x):
return x + 1
(5)
but this time we can't solve it the same way because there are no semicolons. In practice Python avoids the problem by not allowing multiline lambda expressions, but I'm working on a language with indentation-based syntax where I do want multiline lambda and other expressions, so I'm trying to figure out how to avoid having a block parse as the start of a postfix expression. Thus far I'm thinking maybe each level of the recursive descent parser should have a parameter along the lines of 'we have already eaten a block in this statement so don't do postfix'.
Are there any existing languages that encounter this problem, and how do they solve it if so?
Python has semicolons. This is perfectly valid (though ugly and not recommended) Python code: f = lambda(x): x + 1; (5).
There are many other problems with multi-line lambdas in otherwise standard Python syntax though. It is completely incompatible with how Python handles indentation (whitespace in general, actually) inside expressions - it doesn't, and that's the complete opposite of what you want. You should read the numerous python-ideas thread about multi-line lambdas. It's somewhere between very hard to impossible.
If you want arbitrarily complex compound statements inside lambdas you can't use the existing rules for multi-line expressions even if you made all statements expressions. You'd have to change the indentation handling (see the language reference for how it works right now) so that expressions can also contain blocks. This is hard to do without breaking perfectly fine Python code, and will certainly result in a language many Python programmers will consider worse in several regards: Harder to understand, more complex to implement, permits some stupid errors, etc.
Most languages don't solve this exact problem at all. Most candidates (Scala, Ruby, Lisps, and variants of these three) have explicit end-of-block tokens. I know of two languages that have the same problem, one of which (Haskell) has been mentioned by another answer. Coffeescript also uses indentation without end-of-block tokens. It parses the transliteration of your example correctly. However, I could not find any specification of how or why it does this (and I won't dig through the parser source code). Both differ significantly from Python in syntax as well as design philosophy, so their solution is of little (if any) use for Python.
In Haskell, there is an implicit semicolon whenever you start a line with the same indentation as a previous one, assuming the parser is in a layout-sensitive mode.
More specifically, after a token is encountered that signals the start of a (layout-sensitive) block, the indentation level of the first token of the first block item is remembered. Each line that is indented more continues the current block item; each line that is indented the same starts a new block item, and the first line that is indented less implies the closure of the block.
How your last example would be treated depends on whether the f = is a block item in some block or not. If it is, then there will be an implicit semicolon between the lambda expression and the (5), since the latter is indented the same as the former. If it is not, then the (5) will be treated as continuing whatever block item the f = is a part of, making it an argument to the lamda function.
The details are a bit messier than this; look at the Haskell 2010 report.

style opinion re. empty If block

I'm trying to curb some of the bad habits of a self-proclaimed "senior programmer." He insists on writing If blocks like this:
if (expression) {}
else {
statements
}
Or as he usually writes it in classic ASP VBScript:
If expression Then
Else
statements
End If
The expression could be something as easily negated as:
if (x == 0) {}
else {
statements
}
Other than clarity of coding style, what other reasons can I provide for my opinion that the following is preferred?
if (x != 0) {
statements
}
Or even the more general case (again in VBScript):
If Not expression Then
statements
End If
Reasons that come to my mind for supporting your opinion (which I agree with BTW) are:
Easier to read (which implies easier to understand)
Easier to maintain (because of point #1)
Consistent with 'established' coding styles in most major programming languages
I have NEVER come across the coding-style/form that your co-worker insists on using.
I've tried it both ways. McConnell in Code Complete says one should always include both the then and the else to demonstrate that one has thought about both conditions, even if the operation is nothing (NOP). It looks like your friend is doing this.
I've found this practice to add no value in the field because unit testing handles this or it is unnecessary. YMMV, of course.
If you really want to burn his bacon, calculate how much time he's spending writing the empty statements, multiply by 1.5 (for testing) and then multiply that number by his hourly rate. Send him a bill for the amount.
As an aside, I'd move the close curly bracket to the else line:
if (expression) {
} else {
statements
}
The reason being that it is tempting to (or easy to accidentally) add some statement outside the block.
For this reason, I abhor single-line (bare) statements, of the form
if (expression)
statement
Because it can get fugly (and buggy) really fast
if (expression)
statement1
statement2
statement2 will always run, even though it might look like it should be subject to expression. Getting in the habit of always using brackets will kill this stumbling point dead.

Why use short-circuit code?

Related Questions: Benefits of using short-circuit evaluation, Why would a language NOT use Short-circuit evaluation?, Can someone explain this line of code please? (Logic & Assignment operators)
There are questions about the benefits of a language using short-circuit code, but I'm wondering what are the benefits for a programmer? Is it just that it can make code a little more concise? Or are there performance reasons?
I'm not asking about situations where two entities need to be evaluated anyway, for example:
if($user->auth() AND $model->valid()){
$model->save();
}
To me the reasoning there is clear - since both need to be true, you can skip the more costly model validation if the user can't save the data.
This also has a (to me) obvious purpose:
if(is_string($userid) AND strlen($userid) > 10){
//do something
};
Because it wouldn't be wise to call strlen() with a non-string value.
What I'm wondering about is the use of short-circuit code when it doesn't effect any other statements. For example, from the Zend Application default index page:
defined('APPLICATION_PATH')
|| define('APPLICATION_PATH', realpath(dirname(__FILE__) . '/../application'));
This could have been:
if(!defined('APPLICATION_PATH')){
define('APPLICATION_PATH', realpath(dirname(__FILE__) . '/../application'));
}
Or even as a single statement:
if(!defined('APPLICATION_PATH'))
define('APPLICATION_PATH', realpath(dirname(__FILE__) . '/../application'));
So why use the short-circuit code? Just for the 'coolness' factor of using logic operators in place of control structures? To consolidate nested if statements? Because it's faster?
For programmers, the benefit of a less verbose syntax over another more verbose syntax can be:
less to type, therefore higher coding efficiency
less to read, therefore better maintainability.
Now I'm only talking about when the less verbose syntax is not tricky or clever in any way, just the same recognized way of doing, but in fewer characters.
It's often when you see specific constructs in one language that you wish the language you use could have, but didn't even necessarily realize it before. Some examples off the top of my head:
anonymous inner classes in Java instead of passing a pointer to a function (way more lines of code).
in Ruby, the ||= operator, to evaluate an expression and assign to it if it evaluates to false or is null. Sure, you can achieve the same thing by 3 lines of code, but why?
and many more...
Use it to confuse people!
I don't know PHP and I've never seen short-circuiting used outside an if or while condition in the C family of languages, but in Perl it's very idiomatic to say:
open my $filehandle, '<', 'filename' or die "Couldn't open file: $!";
One advantage of having it all in one statement is the variable declaration. Otherwise you'd have to say:
my $filehandle;
unless (open $filehandle, '<', 'filename') {
die "Couldn't open file: $!";
}
Hard to claim the second one is cleaner in that case. And it'd be wordier still in a language that doesn't have unless
I think your example is for the coolness factor. There's no reason to write code like that.
EDIT: I have no problem with doing it for idiomatic reasons. If everyone else who uses a language uses short-circuit evaluation to make statement-like entities that everyone understands, then you should too. However, my experience is that code of that sort is rarely written in C-family languages; proper form is just to use the "if" statement as normal, which separates the conditional (which presumably has no side effects) from the function call that the conditional controls (which presumably has many side effects).
Short circuit operators can be useful in two important circumstances which haven't yet been mentioned:
Case 1. Suppose you had a pointer which may or may not be NULL and you wanted to check that it wasn't NULL, and that the thing it pointed to wasn't 0. However, you must not dereference the pointer if it's NULL. Without short-circuit operators, you would have to do this:
if (a != NULL) {
if (*a != 0) {
⋮
}
}
However, short-circuit operators allow you to write this more compactly:
if (a != NULL && *a != 0) {
⋮
}
in the certain knowledge that *a will not be evaluated if a is NULL.
Case 2. If you want to set a variable to a non-false value returned from one of a series of functions, you can simply do:
my $file = $user_filename ||
find_file_in_user_path() ||
find_file_in_system_path() ||
$default_filename;
This sets the value of $file to $user_filename if it's present, or the result of find_file_in_user_path(), if it's true, or … so on. This is seen perhaps more often in Perl than C, but I have seen it in C.
There are other uses, including the rather contrived examples which you cite above. But they are a useful tool, and one which I have missed when programming in less complex languages.
Related to what Dan said, I'd think it all depends on the conventions of each programming language. I can't see any difference, so do whatever is idiomatic in each programming language. One thing that could make a difference that comes to mind is if you had to do a series of checks, in that case the short-circuiting style would be much clearer than the alternative if style.
What if you had a expensive to call (performance wise) function that returned a boolean on the right hand side that you only wanted called if another condition was true (or false)? In this case Short circuiting saves you many CPU cycles. It does make the code more concise because of fewer nested if statements. So, for all the reasons you listed at the end of your question.
The truth is actually performance. Short circuiting is used in compilers to eliminate dead code saving on file size and execution speed. At run-time short-circuiting does not execute the remaining clause in the logical expression if their outcome does not affect the answer, speeding up the evaluation of the formula. I am struggling to remember an example. e.g
a AND b AND c
There are two terms in this formula evaluated left to right.
if a AND b evaluates to FALSE then the next expression AND c can either be FALSE AND TRUE or FALSE AND FALSE. Both evaluate to FALSE no matter what the value of c is. Therefore the compiler does not include AND c in the compiled format hence short-circuiting the code.
To answer the question there are special cases when the compiler cannot determine whether the logical expression has a constant output and hence would not short-circuit the code.
Think of it this way, if you have a statement like
if( A AND B )
chances are if A returns FALSE you'll only ever want to evaluate B in rare special cases. For this reason NOT using short ciruit evaluation is confusing.
Short circuit evaluation also makes your code more readable by preventing another bracketed indentation and brackets have a tendency to add up.

Resources