Destructive method and bang method in Ruby

David A. Black said in his book:
Dangerous can mean whatever the person writing the method wants it to mean. In the case of the built-in classes, it usually means this method, unlike its non-bang equivalent,
permanently modifies its receiver. It doesn’t always, though: exit! is a dangerous alternative to exit, in the sense that it doesn’t run any finalizers on the way out of the program.
The danger in sub! (a method that substitutes a replacement string for a matched pattern in a string) is partly that it changes its receiver and partly that it returns nil if no change has taken place—unlike sub, which always returns a copy of the original string with the replacement (or no replacement) made.
I understand all of the above, but I couldn't understand what he was trying to say in the passage below.
Furthermore, don’t assume a direct correlation between bang methods and destructive methods. They often coincide, but they’re not the same thing.
On what basis can we classify a method as destructive or as dangerous?

Destructive methods are those that change the value of an attribute of the object they're called on. So what he says can be restated as:
Don't assume that method! will change a value of an attribute. This is often the case, but not a rule.
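A short sketch may make the distinction concrete. It uses only core String and Array methods and shows a method that is both bang and destructive, a destructive method with no bang, and the nil return value mentioned in the quoted passage:

```ruby
# Bang and destructive coincide: String#upcase! changes its receiver.
name = "ruby".dup        # dup gives a string that is safe to mutate
name.upcase!
p name                   # prints "RUBY"

# Destructive without a bang: Array#push also changes its receiver.
list = [1, 2]
list.push(3)
p list                   # prints [1, 2, 3]

# The nil surprise from the quoted passage.
p "hello".sub("z", "y")        # prints "hello"
p "hello".dup.sub!("z", "y")   # prints nil
```

The opposite case, a bang method that changes no object at all, is exit!, which is not called here because it would end the script.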

Related

Why are bang methods dangerous in Ruby?

I've been re-learning Ruby lately, and this page says that usually a bang method is dangerous, but it doesn't say why. Why are bang methods dangerous?
There are two widespread meanings of "dangerous" in standard library and common gems:
Method mutates the receiver, as opposed to returning a copy of the receiver. Example: Array#map!
Method will raise an exception if its primary function can't be performed. Example: ActiveRecord::Base#save!, ActiveRecord::Base#create!. If, say, an object can't be saved (because it's not valid or whatever), save! will raise an error, while save will return false.
I usually add a third meaning to it in my code:
Method will immediately persist data in the database, instead of just changing some attributes and hoping that later someone will save the object. Example: hypothetical Article#approve!
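The second meaning can be sketched without Rails. The Note class below is hypothetical and only imitates the save / save! contract described above (false versus an exception); it is not ActiveRecord and writes to no database:

```ruby
# Hypothetical stand-in for a persisted record; not ActiveRecord.
class Note
  class InvalidError < StandardError; end

  attr_reader :title, :saved

  def initialize(title)
    @title = title
    @saved = false
  end

  def valid?
    !title.to_s.strip.empty?
  end

  # Dangerous version: raises when the record cannot be saved.
  def save!
    raise InvalidError, "title can't be blank" unless valid?
    @saved = true # a real implementation would write to storage here
  end

  # Safe version: reports failure through its return value.
  def save
    save!
  rescue InvalidError
    false
  end
end

p Note.new("Bang methods").save   # prints true
p Note.new("").save               # prints false

begin
  Note.new("").save!
rescue Note::InvalidError => e
  puts "save! raised: #{e.message}"
end
```

The caller of save has to check the return value; the caller of save! has to be ready for an exception. Neither version is "dangerous" in the mutation sense.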
The page you refer to includes this:
Normally for the built-in classes, dangerous usually (although not always) means this method, unlike its non-bang equivalent, permanently modifies its receiver.
The convention goes like this:
Firstly, you create a bang method only if you have a non-bang alternative with the same name.
Secondly - yes, it means that this version is "more dangerous". This is a very vague term as you said yourself.
In a lot of the standard library it will modify an object in place, instead of creating a new one. Sometimes it will return nil instead of the object if the call didn't require any modification.
In Rails, bang methods usually raise exceptions as opposed to returning false or nil.
Why are bang methods dangerous?
Because that's the naming convention: if there are two methods which do the same thing, then you name them both the same name but the more surprising or more dangerous one gets the bang.
For example, Process::exit and Process::exit! both exit the currently running Ruby process, but the bang version will skip running all exit handlers that may be installed, and so, for example, skip any cleanup that you might have scheduled for when your app exits.
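The difference is easy to observe from outside the exiting process. The sketch below starts a child Ruby interpreter for each case; the helper name run_child is made up for this example, and it assumes the current interpreter can be launched again via RbConfig.ruby:

```ruby
require "open3"
require "rbconfig"

# Runs a tiny script in a child Ruby process and returns what it printed.
# The script registers an exit handler and then leaves via the given call.
def run_child(exit_call)
  script = "at_exit { puts 'cleanup ran' }; #{exit_call}"
  output, _status = Open3.capture2(RbConfig.ruby, "-e", script)
  output
end

puts "exit:  #{run_child('exit').inspect}"
puts "exit!: #{run_child('exit!').inspect}"
```

With exit the handler's message should appear in the captured output; with exit! the output should be empty because the handler never runs.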
No, it is not dangerous. A bang simply means that the method modifies the object itself and that you should be careful.

What determines if a bang method on a mutable class returns `nil`?

Usually, bang methods on a mutable class such as String, Array, or Hash return nil when no modification is made. But some Array bang methods (collect!, map!, reverse!, rotate!, shuffle!, sort!, sort_by!) and one Hash bang method (merge!) never return nil. What is the rationale behind this? What makes these methods different from the others? I don't see why knowing whether an array was sorted by sort! is not useful while knowing whether an array was made unique by uniq! is useful.
TL;DR
[B]ang methods on mutable class[es] return nil when no modification is made...[b]ut some Array bang methods...never return nil.
If there is an "official" rationale (along the lines of an official specification), I'm currently unaware of it. I personally suspect it's simply because some objects use nil as a return value to indicate errors (e.g. index out of range) rather than raising an exception, while others always return a valid object. I very much doubt there's an overriding philosophy there, although there appears to be a general consensus about when to use bang methods when you dig deep enough.
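Whatever the rationale, the practical consequence is that the return value of a bang method should not be chained unless it is known to be the receiver. A small sketch using only core Array and Hash methods:

```ruby
# uniq! reports "nothing changed" with nil.
p [1, 2, 3].uniq!       # prints nil
p [1, 1, 2].uniq!       # prints [1, 2]

# sort! and merge! always return the receiver, changed or not.
sorted = [1, 2, 3]
p sorted.sort!.equal?(sorted)          # prints true
options = { a: 1 }
p options.merge!({}).equal?(options)   # prints true

# Chaining pitfall: ids.uniq!.sort! would raise NoMethodError here,
# because uniq! returns nil for an array without duplicates.
ids = [3, 1, 2]
ids.uniq!
ids.sort!
p ids                   # prints [1, 2, 3]
```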
Bang Methods Aren't Inherently About Mutation
As one example, consider issue #5009, which requests:
[P]lease use bang methods (those that end with !) consistently in the API.
One useful response says:
[That bang is destructive] is a common misconception about the use of bang methods in Ruby. Bang does not indicate that a method mutates its receiver, merely that it should be used with caution.
Some Community Consensus on Bang Methods
There is definitely some consensus among Rubyists about when to use bang methods. The Ruby Style Guide currently offers the following guidelines:
The names of potentially dangerous methods...should end with an exclamation mark if there exists a safe version of that dangerous method.
Define the non-bang (safe) method in terms of the bang (dangerous) one if possible.
These guidelines seem consistent with the general idea that bang methods are about the caller being careful in choosing the method or handling the return value, rather than the bang acting as an indicator of what will be returned.
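The second guideline can be sketched as follows. Playlist is a hypothetical class invented for this example; the point is only that the safe method copies the receiver and then delegates to the bang method:

```ruby
# Hypothetical class used only to illustrate the guideline.
class Playlist
  attr_reader :tracks

  def initialize(tracks)
    @tracks = tracks
  end

  # dup copies the object but shares @tracks, so give the copy its own array.
  def initialize_copy(source)
    super
    @tracks = source.tracks.dup
  end

  # Dangerous version: reorders this playlist in place and returns self.
  def reverse!
    @tracks.reverse!
    self
  end

  # Safe version, written in terms of the bang one: work on a copy.
  def reverse
    dup.reverse!
  end
end

original = Playlist.new(%w[a b c])
copy = original.reverse
p copy.tracks       # prints ["c", "b", "a"]
p original.tracks   # prints ["a", "b", "c"]
```

This only works because reverse! here always returns self. A bang method that returns nil when nothing changed could not be reused this way without extra handling.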

Why do some Ruby methods need a bang and others don't to be a destructive method?

For example, array.pop doesn't need a bang to permanently alter the array. Why is this so and what was the reasoning behind developing these certain Ruby methods without this conformity?
Bang methods are most commonly used to distinguish between a dangerous and a safe version of the same method. Here are some example cases that one might want to distinguish with a bang/no-bang combination:
mutator methods - one version changes the object, the other one returns a copy and leaves the original object unchanged
when encountering an error, one version throws an exception while the other one only writes an error message to the log or does nothing
However, the convention is to leave the bang off if there is only one version that makes sense. For example, popping an array without actually changing it makes no sense. In this case, it would end up being a different operation: Array#last. A lot of methods change the object they are called on, for example setters. We don't need to write these with a bang either, because it's clear that they change the object.
Lastly, there are a few exceptions to this, where some developers might use a bang method without implementing a bang-less counterpart. In these cases, the bang is simply used as a way to make the method calls stand out visually. For example:
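The pop / last pair mentioned above, using only core Array methods:

```ruby
stack = [1, 2, 3]

p stack.last   # prints 3
p stack        # prints [1, 2, 3]

p stack.pop    # prints 3
p stack        # prints [1, 2]
```

Both calls return the last element; only pop removes it, and it has no non-destructive twin that would justify a bang.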
the method does something dangerous or destructive
the method does something unexpected
the method has a significant performance impact
The bang is used to distinguish between a dangerous and less dangerous version of the same method. There is only one pop method, so there is nothing to distinguish.
Note: the name of the method has absolutely nothing whatsoever to do with what it does. Whether a method is destructive or not depends on what code it executes, not what name it has.
A suffix of ! means that a method is a dangerous version of another method. For example, save! is the dangerous version of save. Dangerous could mean editing in place, doing something with more strict errors, etc. It is not required to use the ! suffix on a method that is dangerous, but doesn't need a safer counterpart. Additionally, this is just a naming convention, so Ruby does not restrict what you can and can't do if a method does or doesn't end with !.
There is a common misconception that every method that edits something in place should end with !. This is not true: ! is only needed when a safer method of the same name already exists, and it does not necessarily mean that the dangerous method edits in place. For example, in Rails, ActiveRecord::Base#save! is a version of ActiveRecord::Base#save that raises an exception when the record cannot be saved, instead of returning false.
The meaning of bang in Ruby is "caution". It means you should use the method with caution, nothing more. I cannot find the reference anymore, but people of authority said explicitly that bang ≠ destructive method. Bang is just a semantic element associated with caution. It is up to the programmer to weigh in everything and decide when to use bang.
For example, in my simulation gem, I use #step method to obtain the step size.
simulation.step #=> 0.42
and step! method to actually perform the simulation step.
simulation.step! #=> takes the simulation to the next time step
But as for the #reset method, I decided that the word "reset" is explicit enough and that it is not necessary to use a bang to warn the user that the simulation state will be destroyed:
simulation.reset #=> resets the simulation back to the initial state
P.S.: Now I remember, once upon a time, Matz said half jokingly that he regrets introducing methods with bang into Ruby at all, because bang is semantically so ambiguous.
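A minimal sketch of how such an interface might look. The Simulation class below is hypothetical and is not the gem's actual code; only the method names step, step! and reset come from the description above:

```ruby
# Hypothetical Simulation class; the real gem's internals are not shown here.
class Simulation
  attr_reader :time, :step

  def initialize(step:)
    @step = step
    @time = 0.0
  end

  # Bang: advances the simulation state by one step.
  def step!
    @time += @step
    self
  end

  # No bang: the name alone says that the state is discarded.
  def reset
    @time = 0.0
    self
  end
end

simulation = Simulation.new(step: 0.5)
p simulation.step               # prints 0.5
p simulation.step!.step!.time   # prints 1.0
p simulation.reset.time         # prints 0.0
```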

What is special about boolean?

In Ruby, there is a convention to have a method name end with a question mark to indicate that its return value is boolean. Why is boolean considered so special? Is there anything convenient if you know that a method's return value is particularly boolean? After all, in Ruby, you can insert all kinds of value returning (getter) methods into a conditional without caring whether it is boolean or not.
I think it is a waste to use the question mark just for indicating a boolean value. There should be more useful uses. I have plenty of use cases where I want to have a pair of getter and setter methods, where the setter method should return self so that I can use it in a method chain. And naming them something like get_foo and set_foo looks cumbersome. Rather than following the convention, I am tempted to name a pair of getter and setter methods like this:
def foo?; @foo end
def foo(v); @foo = v; self end
where the value of @foo is not (necessarily) boolean. (Besides potential criticism that breaking the convention will confuse other programmers), is there something wrong with doing that?
There is nothing special at all, it's just a convention. A question can be answered with "yes" or "no", but also with something else, such as someone's name.
Returning a boolean from a method whose name ends with a question mark makes that behavior explicit.
If you make the answer be "yes" or "no", it's easy for the reader of your code to identify the behavior of your method without even looking at the implementation. On the other hand, if you make it return any other type, it is more difficult for the reader to understand your code without reading your class and method definition.
With a boolean there are only two possible answers. If the return value is not boolean it can be anything, which would not help at all. You would still need to look at the method implementation. You should always look further to understand some piece of code, but using this convention makes it simpler.
There is a convention to use question mark in method names to indicate that a method is a predicate. AFAIK, this predicate is not required (by the convention) to return a boolean value, thanks to simple rules for truthy/falsey values.
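Core Ruby itself has predicates that return something other than true or false. A short sketch using Numeric#nonzero? and Float#infinite?:

```ruby
p 5.nonzero?                    # prints 5
p 0.nonzero?                    # prints nil

p Float::INFINITY.infinite?     # prints 1
p (-Float::INFINITY).infinite?  # prints -1
p 1.0.infinite?                 # prints nil

# Truthiness is all a conditional needs.
puts "balance is not zero" if 5.nonzero?
```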
Besides potential criticism that breaking the convention will confuse other programmers, is there something wrong with doing that?
Confusing and surprising fellow programmers is bad. Ruby couldn't care less. It's just a convention. And conventions exist for a reason.
You can put anything in a flow control construct, but semantically booleans are appropriate. "If" in real human language typically takes a boolean, and the same is true of the construct in many programming languages. Ruby likes to make things convenient and assigns a "truthiness" value to everything in the language, which affects how it behaves in a boolean context.
In other words, booleans are the only things that are almost exclusively used for flow control, so the convention is to make them look "right" for flow-control constructs. It's their native environment.
(Besides potential criticism that breaking the convention will confuse other programmers), is there something wrong with doing that?
In the same sense that there is nothing wrong with naming all your variables after 1920s comedians, no, there's nothing wrong with that. But also in the same sense as naming all your variables after 1920s comedians, it isn't a very good idea. Nowhere in any language that I know of -- human or computer -- does the question mark mean "get." So the semantics of your code are off with that convention.
This question and the answers boil down to "POLS" AKA "Principle of Least Surprise".
A method name can be a random choice of letters and numbers separated by underscores, with '!', '?' and '=' sprinkled through them, if we chose to do so. They could be randomly created by the code at run time, and, as long as the rest of the code used the same arrangement of characters, the program would run and Ruby would be happy.
We humans, the programmers, determine the name of the methods used, to represent something, a characteristic or an action. Trying to use randomly named methods would lead to madness, or at least a very hard to maintain program. So, instead, we try to use sensible names for things. Sometimes they're verbs or adjectives, sometimes they're more descriptive because the method does several things.
As part of that naming, sometimes we want to provide additional hints about the behavior of the method. By convention in Ruby, we use "!" to warn the coder that the method changes something or is destructive. "=" indicates the method takes a parameter and assigns it to the receiver/object. It's a setter method and in many other languages it'd be idiomatic to use "set_flag..." or "set_value..." as the name. It's just a convention in that language, and followed by developers in the language.
We use "?" in Ruby to ask a question about an object, whether it is, or isn't, true about that object. We could say "is_true?" or "true?" and indicate we are testing whether something is true about it. If it's true, or false, it's a Boolean response so we return a true/false value.

Why does to_a and to_ary behave differently in subclasses of Array?

If you have a subclass X of Array, then calling X#to_a returns an Array object, while calling X#to_ary returns an X object.
I understand that to_a means "I can be changed into an array" and that to_ary means "I behave like an array", but I don't understand why the former implements a change of class while the latter doesn't.
Also, isn't returning a subclass of Array sufficient for to_a, under the Liskov Substitution Principle?
Is "because that's the way it's defined to be" sufficient?
to_a
Returns self. If called on a subclass of Array, converts the receiver to an Array object.
to_ary
Returns self.
Probably not, so here we go into the rabbit hole.
Beyond the fact that the documentation definitively states that this is the way it is, the reasoning is perhaps only truly answerable by Matz, et al.
Digging around though it would seem that to_ary is used when implicit type conversions occur. Its use for implicit conversions seems to be echoed in this feature request as well. In other words, if an object responds to to_ary, then it should be treated as an Array, and it is used in this way internally. Thus to_a would be for when you (explicitly) want an Array and not some subclass.
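Two places where the interpreter performs this implicit conversion are multiple assignment and Array#+. The Pair class below is hypothetical and defines only to_ary, to show that the method is called without being named by the caller:

```ruby
# Hypothetical class that opts in to implicit array conversion.
class Pair
  def initialize(left, right)
    @left = left
    @right = right
  end

  def to_ary
    [@left, @right]
  end
end

pair = Pair.new(1, 2)

# Multiple assignment calls to_ary on the right-hand side.
first, second = pair
p first    # prints 1
p second   # prints 2

# Array#+ converts its argument with to_ary.
p([0] + pair)   # prints [0, 1, 2]
```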
Yes, returning a subclass would still satisfy LSP (assuming the subclass does not decide to radically change the behavior of Array such that it wouldn't be), but the principle only states that a subclass may be substituted for its base class, not that it needs to be. I'm not really sure that matters here anyway, though: by calling to_a you're explicitly asking for a different object (in line with the reasoning about implicit conversions above), and thus saying that you don't want a substitute object type.
As a general rule, the implicit conversions are automatically called by the interpreter, and they are intended to only convert things that are very much like the type that was required but not found.
The explicit conversions, however, can be called on disparate types as long as there is some way to get from point a to point b, even if that involves some sort of polar route or a detour.
So you simply have more freedom for a leap with to_a, but I agree that it seems like X should be good enough.
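The behavior described in the question can be checked directly. X is the empty Array subclass from the question:

```ruby
class X < Array; end

x = X.new([1, 2, 3])

p x.to_a.class         # prints Array
p x.to_ary.class       # prints X
p x.to_ary.equal?(x)   # prints true
p x.to_a == x          # prints true
```

So to_ary hands back the very same object, while to_a builds a plain Array with equal contents.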
