Ruby: how much error checking is appropriate?

I'm implementing a few things in Ruby and I was wondering how much error checking is appropriate (or, more precisely, how much error checking should be done by convention)?
For example, I'm implementing a method which swaps two elements in an array. The method is very simple:
def swap(a, b)
  @array[a], @array[b] = @array[b], @array[a]
end
It's really simple, but is it ruby-ish to check whether the given indexes are valid, or is that an unnecessary overhead (bearing in mind I do not intend for the method to work with wrap-around values like -1)?

I can't help you with negative indexes, but you can use
@array.fetch(a)
to raise an exception if a is an invalid index.
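For illustration, a minimal sketch of the difference (assuming the elements live in @array):
@array = [10, 20, 30]
@array[5]        #=> nil (an out-of-range index is silently accepted)
@array.fetch(5)  #=> raises IndexError: index 5 outside of array bounds: -3...3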
I ought to use fetch when I regard an invalid index as a "Can't happen" case, but sometimes I think only about the "happy path" scenario.

It depends on the behavior that you're looking for. The Array#[] method that you're calling will do that check for you and return nil if you're using a nonexistent index, so I don't see a need to duplicate the error checking if you want that standard behavior. If you want something else, you'll need to implement the behavior you want.
However, this method will work with an index of -1, so if you want to disallow that, you will need to add a check for it.
Basically, I think a good rule is: Check for conditions in which your method will behave incorrectly, and implement the behavior you want. The out-of-bounds index condition will be caught by the array and handled a certain way -- if that handling is correct, you don't need to do anything. An index that does not match some custom expectation of the method will not be caught at all, so you definitely need to check for it.
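For example, a minimal sketch of a swap that enforces both checks (assuming the elements live in @array; the error messages are illustrative):
def swap(a, b)
  raise IndexError, "negative indexes are not allowed" if a < 0 || b < 0
  @array.fetch(a)  # raises IndexError if a is out of range
  @array.fetch(b)  # raises IndexError if b is out of range
  @array[a], @array[b] = @array[b], @array[a]
end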

Related

Why does delete return the deleted element instead of the new array?

In Ruby, Array#delete(obj) will search for and remove the specified object from the array. However, maybe I'm missing something here, but I find the return value --- the obj itself --- quite strange and even a little bit useless.
My humble opinion is that, for consistency with methods like sort/sort! and map/map!, there should be two methods, e.g. delete/delete!, where
ary.delete(obj) -> new array, with obj removed
ary.delete!(obj) -> ary (after removing obj from ary)
For several reasons. First, the current delete is non-pure, and it should warn the programmer about that just like many other methods in Array do (in fact the entire delete_??? family has this issue; they are quite dangerous methods!). Second, returning the obj is much less chainable than returning the new array. For example, if delete worked like the version I described above, I could do multiple deletions in one statement, or do something else after the deletion:
ary = [1,2,2,2,3,3,3,4]
ary.delete(2).delete(3) #=> [1,4], equivalent to "ary - [2,3]"
ary.delete(2).map{|x| x**2} #=> [1,9,9,9,16]
which is elegant and easy to read.
So I guess my question is: is this a deliberate design out of some reason, or is it just a heritage of the language?
If you already know that delete is always dangerous, there is no need to add a bang ! to further signal that it is dangerous. That is why it does not have one. Other methods like map may or may not be dangerous; that is why they have versions with and without the bang.
As for why it returns the extracted element, it provides access to information that would be cumbersome to obtain if it were not designed like that. The original array after modification can easily be referred to by accessing the receiver, but the extracted element is not otherwise easily accessible.
Perhaps you might be comparing this to methods that add elements, like push or unshift. These methods add elements irrespective of what elements the receiver array has, so the returned element would always be the same as the argument passed, which you already know; returning the added element would not be helpful. Therefore, the modified array is returned, which is more useful. For delete, whether the element is extracted depends on whether the receiver array has it, and you don't know that in advance, so it is useful to have it as a return value.
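A small sketch of how that return value can be put to use (the values are illustrative):
ary = [1, 2, 3]
ary.delete(2)   #=> 2          (the element was present; ary is now [1, 3])
ary.delete(99)  #=> nil        (nothing to remove; ary is unchanged)
ary.push(4)     #=> [1, 3, 4]  (push returns the modified array instead)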
For anyone who might be asking the same question, I think I understand it a little bit more now so I might as well share my approach to this question.
So the short answer is that Ruby is not a language originally designed for functional programming, nor does it make purity of methods a priority.
On the other hand, for the particular applications described in my question, we do have alternatives. The - method can be used as a pure alternative to delete in most situations; for example, the code in my question can be implemented like this:
ary = [1,2,2,2,3,3,3,4]
ary.-([2]).-([3]) #=> [1,4], or simply ary.-([2,3])
ary.-([2]).map{|x| x**2} #=> [1,9,9,9,16]
and you can happily get all the benefits of the purity of -. For delete_if, I guess in most situations reject (i.e. select with the condition negated) could serve as a pure, if not-so-great, substitute.
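For instance, a minimal sketch of that substitution:
ary = [1, 2, 2, 3]
ary.reject { |x| x == 2 }     #=> [1, 3]  (ary is unchanged)
ary.delete_if { |x| x == 2 }  #=> [1, 3]  (ary is mutated in place)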
As for why the delete family was designed like this, I think it's more of a difference in point of view. They are meant more as shorthands for commonly needed non-pure procedures than as counterparts to the functional-flavored select, map, etc.
I’ve wondered some of these same things myself. What I’ve largely concluded is that the method simply has a misleading name that carries with it false expectations. Those false expectations are what trigger our curiosity as to why the method works like it does. Bottom line—I think it’s a super useful method that we wouldn’t be questioning if it had a name like “swipe_at” or “steal_at”.
Anyway, another alternative we have is values_at(*args), which is functionally the opposite of delete_at in that you specify what you want to keep and get back a new array of just those elements (as opposed to specifying what you want to remove and getting back the removed item).
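A quick sketch of the contrast:
ary = %w[a b c d]
ary.values_at(0, 2)  #=> ["a", "c"]  (ary is unchanged)
ary.delete_at(1)     #=> "b"         (ary is now ["a", "c", "d"])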

Why do some Ruby methods need a bang and others don't to be a destructive method?

For example, array.pop doesn't need a bang to permanently alter the array. Why is this so, and what was the reasoning behind designing certain Ruby methods without this conformity?
Bang methods are most commonly used to distinguish between a dangerous and a safe version of the same method. Here are some example cases that one might want to distinguish with a bang/no-bang combination:
mutator methods - one version changes the object, the other one returns a copy and leaves the original object unchanged
when encountering an error, one version throws an exception while the other one only writes an error message to the log or does nothing
However, the convention is to leave the bang off if only one version makes sense. For example, popping an array without actually changing it makes no sense; in that case, it would end up being a different operation: Array#last. A lot of methods change the object they are called on, for example setters. We don't need to write these with a bang either, because it's clear that they change the object.
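A minimal sketch of both cases:
a = [3, 1, 2]
a.sort   #=> [1, 2, 3]  (a is unchanged)
a.sort!  #=> [1, 2, 3]  (a is now sorted in place)
a.pop    #=> 3          (always mutates; there is no pop!)
a.last   #=> 2          (the non-mutating counterpart is simply a different method)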
Lastly, there are a few exceptions to this, where some developers use a bang method without implementing a bang-less counterpart. In these cases, the bang is simply a way of making the method call stand out visually. For example:
the method does something dangerous or destructive
the method does something unexpected
the method has a significant performance impact
The bang is used to distinguish between a dangerous and less dangerous version of the same method. There is only one pop method, so there is nothing to distinguish.
Note: the name of the method has absolutely nothing whatsoever to do with what it does. Whether a method is destructive or not depends on what code it executes, not what name it has.
A suffix of ! means that a method is a dangerous version of another method. For example, save! is the dangerous version of save. Dangerous could mean editing in place, applying stricter error handling, etc. It is not required to use the ! suffix on a dangerous method that has no need for a safer counterpart. Additionally, this is just a naming convention, so Ruby does not restrict what you can and can't do depending on whether a method ends with !.
There is a common misconception that every method that edits something in place should end with !. This is not true; ! is only needed when there is a more dangerous version of a method that already exists, and this does not necessarily mean that the dangerous method edits in place. For example, in Rails, ActiveRecord::Base#save! is a version of ActiveRecord::Base#save that raises an exception when validation fails instead of returning false.
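A small sketch of that pairing (assuming a hypothetical Rails User model with a presence validation on name):
user = User.new(name: nil)
user.save   #=> false (validation failed; no exception is raised)
user.save!  # raises ActiveRecord::RecordInvalid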
The meaning of bang in Ruby is "caution". It means you should use the method with caution, nothing more. I cannot find the reference anymore, but people of authority have said explicitly that bang ≠ destructive method. Bang is just a semantic element associated with caution. It is up to the programmer to weigh everything and decide when to use a bang.
For example, in my simulation gem, I use the #step method to obtain the step size.
simulation.step #=> 0.42
and the #step! method to actually perform the simulation step.
simulation.step! #=> takes the simulation to the next time step
But as for the #reset method, I decided that the word "reset" is explicit enough, and it is not necessary to use a bang to warn the user that the simulation state will be destroyed:
simulation.reset #=> resets the simulation back to the initial state
P.S.: Now I remember, once upon a time, Matz said half jokingly that he regrets introducing methods with bang into Ruby at all, because bang is semantically so ambiguous.

Complex RSpec Argument Testing

I have a method I want to test that is being called with a particular object. However, identifying this object is somewhat complex because I'm not interested in a particular instance of the object, but one that conforms to certain conditions.
For example, my code might be like:
some_complex_object = ComplexObject.generate(params)
my_function(some_complex_object)
And in my test I want to check that
test_complex_object = ComplexObject.generate(test_params)
subject.should_receive(:my_function).with(test_complex_object)
But I definitely know that ComplexObject#== will return false when comparing some_complex_object and test_complex_object because that is desired behavior (elsewhere I rely on standard == behavior and so don't want to overload it for ComplexObject).
I would argue that needing to write a test such as this is itself a problem, and I would prefer to restructure the code so that I don't need it. Unfortunately, that is a much larger task that would require rewriting a lot of existing code, so it's a longer-term fix that I want to get to but can't do right now within time constraints.
Is there a way with RSpec to do a more complex comparison between arguments in a test? Ideally I'd like something where I could use a block, so I can write arbitrary comparison code.
See https://www.relishapp.com/rspec/rspec-mocks/v/2-7/docs/argument-matchers for information on how you can provide a block to do arbitrary analysis of the arguments passed to a method, as in:
expect(subject).to receive(:my_function) do |test_complex_object|
  # code setting expectations on test_complex_object
end
You can also define a custom matcher, which will let you evaluate objects to see if they satisfy the condition, as described in https://www.relishapp.com/rspec/rspec-expectations/v/2-3/docs/custom-matchers/define-matcher
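A minimal sketch of that approach (the matcher name and the params reader on ComplexObject are illustrative assumptions, not part of the question's code):
RSpec::Matchers.define :a_complex_object_built_from do |expected_params|
  match do |actual|
    actual.is_a?(ComplexObject) && actual.params == expected_params
  end
end

expect(subject).to receive(:my_function).with(a_complex_object_built_from(test_params))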

Validate arguments in Ruby?

I wonder if one should validate that the arguments passed to a method are of a certain class.
eg.
def type(hash = {}, array = [])
  # validate before
  raise "first argument needs to be a hash" unless hash.class == Hash
  raise "second argument needs to be an array" unless array.class == Array
  # actual code
end
Is it smart to do this, or is it just cumbersome and a waste of time to validate all passed-in arguments?
Are there circumstances when you would like to have this extra security and circumstances when you won't bother?
Share your experiences!
I wouldn't recommend this specific approach, as you fail to accommodate classes that provide hash or array semantics but are not that class. If you need this kind of validation, you're better off using respond_to? with a method name. Arrays implement the method :[], for what it's worth.
OpenStruct has hash semantics and attribute-accessor method semantics, but won't return true for the condition hash.class==Hash. It'll work just like a hash in practice, though.
To put it in perspective, even in a non-dynamic language you wouldn't want to do it this way; you'd prefer to verify that an object implements IDictionary<T>. Ruby would idiomatically prefer that, when necessary, you verify that the method exists, because if it does, the developer probably intends their object to act like one. You can provide additional sanity with unit tests around the client code as an alternative to forcing things to be non-dynamic.
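A minimal sketch of that style of check, applied to the question's example (the particular methods probed for are just one reasonable choice):
def type(hashish = {}, listish = [])
  raise ArgumentError, "first argument must quack like a Hash" unless hashish.respond_to?(:each_pair)
  raise ArgumentError, "second argument must quack like an Array" unless listish.respond_to?(:each)
  # actual code
end
An OpenStruct, for example, responds to :each_pair and would pass the first check, whereas hashish.class == Hash would have rejected it.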
There's usually no need to validate that arguments are of a certain class. In Ruby, you're encouraged to use Duck Typing.
I have found that validating that the input parameters meet your preconditions is a very valuable practice. The stupid person it saves you from is you. This is especially true for Ruby, as it has no compile-time checks. If there are characteristics of your method's input that you know must be true, it makes sense to check them at run time and raise errors with meaningful messages. Otherwise, the code just produces garbage out for garbage in, and you either get a wrong answer or an exception further down the line.
I think that it is unnecessary. I once read on a blog something like "If you need to protect your code from stupid people, ruby isn't the language for you."
If you want compiler/runtime-enforced code contracts, then Ruby isn't for you. If you want a malleable language and easy testability, then Ruby is.

Should I make sure arguments aren't null before using them in a function?

The title may not really explain what I'm trying to get at; I couldn't think of a better way to describe what I mean.
I was wondering if it is good practice to check the arguments that a function accepts for nulls or empty before using them. I have this function which just wraps some hash creation like so.
Public Shared Function GenerateHash(ByVal FilePath As IO.FileInfo) As String
    If (FilePath Is Nothing) Then
        Throw New ArgumentNullException("FilePath")
    End If
    Dim _sha As New Security.Cryptography.MD5CryptoServiceProvider
    Dim _Hash = Convert.ToBase64String(_sha.ComputeHash(New IO.FileStream(FilePath.FullName, IO.FileMode.Open, IO.FileAccess.Read)))
    Return _Hash
End Function
As you can see, it just takes an IO.FileInfo as an argument, and at the start of the function I check to make sure that it is not Nothing.
I'm wondering, is this good practice, or should I just let it get to the actual hasher and let that throw the exception because it is null?
Thanks.
In general, I'd suggest it's good practice to validate all of the arguments to public functions/methods before using them, and fail early rather than after executing half of the function. In this case, you're right to throw the exception.
Depending on what your method is doing, failing early could be important. If your method was altering instance data on your class, you don't want it to alter half of the data, then encounter the null and throw an exception, as your object's data might then be in an intermediate and possibly invalid state.
If you're using an OO language then I'd suggest it's essential to validate the arguments to public methods, but less important with private and protected methods. My rationale here is that you don't know what the inputs to a public method will be - any other code could create an instance of your class and call its public methods, and pass in unexpected/invalid data. Private methods, however, are called from inside the class, and the class should already have validated any data passing around internally.
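To keep with the Ruby flavor of the rest of this page, a minimal sketch of that public/private split (the class and method names are illustrative):
require "digest"

class Hasher
  # Public entry point: validate everything the outside world hands us, and fail early.
  def digest_file(path)
    raise ArgumentError, "path must not be nil" if path.nil?
    raise ArgumentError, "no such file: #{path}" unless File.file?(path)
    compute(path)
  end

  private

  # Private helper: the argument has already been validated above, so no re-checking here.
  def compute(path)
    Digest::MD5.file(path).base64digest
  end
end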
One of my favourite techniques in C++ was to DEBUG_ASSERT on NULL pointers. This was drilled into me by senior programmers (along with const correctness) and is one of the things I was most strict on during code reviews. We never dereferenced a pointer without first asserting it wasn't null.
A debug assert is only active for debug targets (it gets stripped in release) so you don't have the extra overhead in production to test for thousands of if's. Generally it would either throw an exception or trigger a hardware breakpoint. We even had systems that would throw up a debug console with the file/line info and an option to ignore the assert (once or indefinitely for the session). That was such a great debug and QA tool (we'd get screenshots with the assert on the testers screen and information on whether the program continued if ignored).
I suggest asserting all invariants in your code, including unexpected nulls. If the performance of the ifs becomes a concern, find a way to compile them conditionally and keep them active in debug targets. Like source control, this is a technique that has saved my ass more often than it has caused me grief (the most important litmus test of any development technique).
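Ruby has no preprocessor to strip assertions at compile time, but a rough sketch of the same idea gates the check on an environment flag (the helper name is made up):
# Assumed helper: only active when the DEBUG_ASSERTS environment variable is set.
def debug_assert(condition, message = "assertion failed")
  return unless ENV["DEBUG_ASSERTS"]
  raise message unless condition
end

file_path = nil
debug_assert !file_path.nil?, "file_path must not be nil"  # raises only in "debug" runs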
Yes, it's good practice to validate all arguments at the beginning of a method and throw appropriate exceptions like ArgumentException, ArgumentNullException, or ArgumentOutOfRangeException.
If the method is private, such that only you the programmer could pass invalid arguments, then you may choose to assert that each argument is valid (Debug.Assert) instead of throwing.
If NULL is an unacceptable input, throw an exception yourself, like you did in your sample, so that the message is helpful.
Another method of handling NULL inputs is just to respond with a NULL in turn. It depends on the type of function -- in the example above I would keep the exception.
If it's for an externally facing API, then I would say you want to check every parameter, as the input cannot be trusted.
However, if it is only going to be used internally, then the input should be trustworthy, and you can save yourself a bunch of code that adds no value to the software.
You should check all arguments against the set of assumptions that you make in that function about their values.
As in your example, if a null argument to your function doesn't make any sense, and you're assuming that anyone using your function will know this, then being passed a null argument indicates some sort of error, and some sort of action should be taken (e.g. throwing an exception). And if you use asserts (as James Fassett got in and said before me ;-) ), they cost you nothing in a release version. (They cost you almost nothing in a debug version either.)
The same thing applies to any other assumption.
And it's going to be easier to trace the error if you generate it than if you leave it to some standard library routine to throw the exception. You will be able to provide much more useful contextual information.
It's outside the bounds of this question, but you do need to expose the assumptions that your function makes - for example, through the comment header to your function.
According to The Pragmatic Programmer by Andrew Hunt and David Thomas, it is the responsibility of the caller to make sure it gives valid input. So, you must now choose whether you consider a null input to be valid. Unless it makes specific sense to consider null valid (e.g. it is probably a good idea to consider null a legal input if you're testing for equality), I would consider it invalid. That way your program, when it hits incorrect input, will fail sooner. If your program is going to encounter an error condition, you want it to happen as soon as possible. In the event your function does inadvertently get passed a null, you should consider it a bug and react accordingly (i.e. instead of throwing an exception, consider using an assertion that kills the program, at least until you release it).
Classic design by contract: If input is right, output will be right. If input is wrong, there is a bug. (if input is right but output is wrong, there is a bug. That's a gimme.)
I'll add a couple of elaborations to the excellent design-by-contract advice offered by Brian earlier...
The principles of "design by contract" require that you define what is acceptable for the caller to pass in (the valid domain of input values) and then, for any valid input, what the method/provider will do.
For an internal method, you can define NULLs as outside the domain of valid input parameters. In this case, you would immediately assert that the input parameter value is NOT NULL. The key insight in this contract specification is that any call passing in a NULL value IS A CALLER'S BUG and the error thrown by the assert statement is the proper behavior.
Now, while very well defined and parsimonious, if you're exposing the method to external/public callers, you should ask yourself: is that the contract I/we really want?
Probably not. In a public interface, you'd probably accept the NULL (as technically in the domain of inputs that the method accepts), but then decline to process it gracefully with a return message. (More work to meet the naturally more complex customer-facing requirement.)
In either case, what you're after is a protocol that handles all of the cases from both the perspective of the caller and the provider, not lots of scattershot tests that can make it difficult to assess the completeness or lack of completeness of the contractual condition coverage.
Most of the time, letting it just throw the exception is pretty reasonable as long as you are sure the exception won't be ignored.
If you can add something to it, however, it doesn't hurt to wrap the exception with one that is more accurate and rethrow it. Decoding "NullPointerException" is going to take a bit longer than "IllegalArgumentException("FilePath MUST be supplied")" (Or whatever).
Lately I've been working on a platform where you have to run an obfuscator before you test. Every stack trace looks like monkeys typing random crap, so I got in the habit of checking my arguments all the time.
I'd love to see a "nullable" or "nonull" modifier on variables and arguments so the compiler can check for you.
If you're writing a public API, do your caller the favor of helping them find their bugs quickly, and check for valid inputs.
If you're writing an API where the caller might be untrusted (or the caller of the caller), check for valid inputs, because it's good security.
If your APIs are only reachable by trusted callers, like "internal" in C#, then don't feel like you have to write all that extra code. It won't be useful to anyone.
