type conversion in Ruby - ruby

Ruby is strongly typed language. Thus it performs type conversion rather than type casting. Now there is two types of conversions - implicit and explicit.
Based on what rules ruby determines which kind of conversion to be applied on what context?

Ruby is "duck typed", neither strong or weak typed, which means the behavior of a variable/object does not necessarily depend on the class it belongs to, but rather very "blind" and do method call at runtime without type check. If it cannot do that, it raises error.
Ruby does implicit conversion for Integer, String and some other internal classes. Whether to perform a conversion depends on the left operand. For example,
1 + "2"
The left operand is an integer, so ruby tries to do a math operation +. But the right operand is a string, so ruby will try to do a conversion (coersion) from string to integer. (Although it still failed. To make it work, one need to redefine the method + for Integer or we call monkey patch to do an explicit conversion using String#to_i)

Related

Ruby to_s a special method? [duplicate]

I'm learning Ruby and I've seen a couple of methods that are confusing me a bit, particularly to_s vs to_str (and similarly, to_i/to_int, to_a/to_ary, & to_h/to_hash). What I've read explains that the shorter form (e.g. to_s) are for explicit conversions while the longer form are for implicit conversions.
I don't really understand how to_str would actually be used. Would something other than a String ever define to_str? Can you give a practical application for this method?
Note first that all of this applies to each pair of “short” (e.g. to_s/to_i/to_a/to_h) vs. “long” (e.g. to_str/to_int/to_ary/to_hash) coercion methods in Ruby (for their respective types) as they all have the same semantics.
They have different meanings. You should not implement to_str unless your object acts like a string, rather than just being representable by a string. The only core class that implements to_str is String itself.
From Programming Ruby (quoted from this blog post, which is worth reading all of):
[to_i and to_s] are not particularly strict: if an object has some kind of decent representation as a string, for example, it will probably have a to_s method… [to_int and to_str] are strict conversion functions: you implement them only if [your] object can naturally be used every place a string or an integer could be used.
Older Ruby documentation from the Pickaxe has this to say:
Unlike to_s, which is supported by almost all classes, to_str is normally implemented only by those classes that act like strings.
For example, in addition to Integer, both Float & Numeric implement to_int (to_i's equivalent of to_str) because both of them can readily substituted for an Integer (they are all actually numbers). Unless your class has a similarly tight relationship with String, you should not implement to_str.
To understand if you should use/implement to_s/to_str, let's look at some exemples. It is revealing to consider when these method fail.
1.to_s # returns "1"
Object.new.to_s # returns "#<Object:0x4932990>"
1.to_str # raises NoMethodError
Object.new.to_str # raises NoMethodError
As we can see, to_s is happy to turn any object into a string. On the other hand, to_str raises an error when its parameter does not look like a string.
Now let us look at Array#join.
[1,2].join(',') # returns "1,2"
[1,2].join(3) # fails, the argument does not look like a valid separator.
It is useful that Array#join converts to string the items in the array (whatever they really are) before joining them, so Array#join calls to_s on them.
However, the separator is supposed to be a string -- someone calling [1,2].join(3) is likely to be making a mistake. This is why Array#join calls to_str on the separator.
The same principle seems to hold for the other methods. Consider to_a/to_ary on a hash:
{1,2}.to_a # returns [[1, 2]], an array that describes the hash
{1,2}.to_ary # fails, because a hash is not really an array.
In summary, here is how I see it:
call to_s to get a string that describes the object.
call to_str to verify that an object really acts like a string.
implement to_s when you can build a string that describes your object.
implement to_str when your object can fully behave like a string.
I think a case when you could implement to_str yourself is maybe a ColoredString class -- a string that has a color attached to it. If it seems clear to you that passing a colored comma to join is not a mistake and should result in "1,2" (even though that string would not be colored), then do implement to_str on ColoredString.
Zverok has a great easily understandable article about when to use what (explained with to_h and to_hash).
It has to do whether your Object implementing those methods can be converted to a string
-> use to_s
or it is a type of some (enhanced) string
-> use to_str
I've seen a meaningful usage of to_hash in practice for the Configuration class in the gem 'configuration' (GitHub and Configuration.rb)
It represents -- as the name says -- the provided configuration, which in fact is a kind of hash (with additional features), rather than being convertible to one.

Syntax sugar for variable replacement with `round`

We have do-and-replace functions like map!, reject!, reverse!, rotate!. Also we have binary operations in short form like +=, -=.
Do we have something for mathematical round? We need to use a = a.round, and it's a bit weird for me to repeat the variable name. Do you know how to shorten it?
OK, smart guys have already explained, why there is no syntactic sugar for Float#round. Just out of curiosity I’m gonna show, how you might implement this sugar yourself [partially]. Since Float class has no ~# method defined, and you do rounding quite often, you might monkeypatch Float class:
class Float
def ~#
self.round # self is redundant, left just for clarity
end
end
or, in this simple case, just (credits to #sawa):
alias_method :~#, :round
and now:
~5.2
#⇒ 5
a = 2.45 && ~a
#⇒ 2
Since Numerics are immutable, it’s still impossible to modify it inplace, but the above might save you four keyboard hits per rounding.
As for destructive methods, it is impossible since numerals are immutable, and it would not make sense. Would you want a numeral 5.2 that behaves as 5?
As for syntax sugar, it would be a mess if every single method had one. So there isn't. And since syntax sugar is defined in the core level, you cannot do anything in an ordinary Ruby script to create a new one.
Ruby's numeric types are immutable: they are value objects. Therefore you won't find any methods that mutate a number in place.
Because the numeric types are immutable, certain optimizations are possible that would not be possible with mutable numbers. In c-ruby, for example, a reference, which may point to any kind of object, is normally a pointer to an object. But if the reference is to a Fixnum, then the reference contains the integer itself, rather than pointing to an instance of Fixnum. Ruby does a number of magic tricks to hide this optimization, making it appear that an integer really is an instance of a Fixnum.
To make numbers mutable would make this optimization impossible, so I don't expect that Ruby will ever have mutable numeric types.

Does the actual value of a enum class enumeration remain constant/invariant?

Given code for an incomplete server like:
enum class Command : uint32_t {
LOGIN,
MESSAGE,
JOIN_CHANNEL,
PART_CHANNEL,
INVALID
};
Can I expect that converting Command::LOGIN to an integer will always give the same value?
Across compilers?
Across compiler versions?
If I add another enumeration?
If I remove an enumeration?
Converting Command::LOGIN would look something like this:
uint32_t number = static_cast<uint32_t>(Command::LOGIN);
Some extra information on what I am doing here. This enumeration is fed onto the wire by converting it to an integer sending it along to the server/client. I do not really particularly care what the number is, as long as it will always stay the same. If it will not stay the same, then obviously I will have to provide my own numbers through the usual way.
Now my sneaking suspicion is that it will change depending on what compiler was used to compile the code, but I would like to know for sure.
Bonus question: How does the compiler/language determine what number to use for Command::LOGIN?
Before submitting this question, I have noticed some changes from say 3137527848 to 0 and back, so it is obviously not valid to rely on it not changing. I am still curious about how this number is determined, and how or why that number is changing.
From the C++11 Standard (or rather, n3485):
[dcl.enum]/2
If the first enumerator has no initializer, the value of the corresponding constant is zero. An enumerator-definition without an initializer gives the enumerator the value obtained by increasing the value of the previous enumerator by one.
Additionally, [expr.static.cast]/9
A value of a scoped enumeration type can be explicitly converted to an integral type. The value is unchanged if the original value can be represented by the specified type.
I think it's obvious that the values of the enumerators can be represented by uint32_t; if they weren't, [dcl.enum]/5 says "if the initializing value of an enumerator cannot be represented by the underlying type, the program is ill-formed."
So as long as you use the underlying type for conversion (either explicitly or via std::underlying_type<Command>::type), the value of those enumerators are fixed as long as you don't add any enumerators before them (in the same enumeration) or alter their order.
As Nicolas Louis Guillemo pointed out, be aware of possible different endianness when transferring the value.
If you assign explicit integer values to your enum constants then you are guaranteed to always have the same value when converting to the integer type.
Just do something like the following:
enum class Command : uint32_t {
LOGIN = 12,
MESSAGE = 46,
JOIN_CHANNEL = 5,
PART_CHANNEL = 0,
INVALID = 42
};
If you don't specify any values explicitly, the values are set implicitly, starting from zero and increasing by one with each move down the list.
Quoting from draft n3485:
[dcl.enum] paragraph 2
The enumeration type declared with an enum-key of only enum is an
unscoped enumeration, and its enumerators are unscoped enumerators.
The enum-keys enum class and enum struct are semantically equivalent;
an enumeration type declared with one of these is a scoped
enumeration, and its enumerators are scoped enumerators. [...] The
identifiers in an enumerator-list are declared as constants, and can
appear wherever constants are required. An enumerator-definition with
= gives the associated enumerator the value indicated by the constant-expression. If the first enumerator has no initializer, the
value of the corresponding constant is zero. An
enumerator-definition without an initializer gives the enumerator the
value obtained by increasing the value of the previous enumerator by
one.
The drawback of relying on this, is that if the list order somehow changes in the future, then your code might silently break, so I would advise you be explicit.
Command::LOGIN will always be 0 as long as it's the first enum in the list. Just be careful with the rest of the enums, because they will have different binary representations based on if the computer is using big endian or little endian.

Converting Ruby symbols to integers

I find it surprising that Ruby symbols can be typecasted to integers without errors.
So :a.to_i is legal. I was wondering what is the significance of this integer, is it a unique value specific to that symbol?
You shouldn't do this, as Symbol#to_i was removed in Ruby 1.9 and is thus not compatible going forward. Regardless, the docs say this about it:
Returns an integer that is unique for each symbol within a particular execution of a program.
It is roughly equivalent to calling object_id on the symbol, as they both end up calling the C function SYM2ID().
In 1.8.x symbols were immediate objects. Their implementation was fast, and, most of the time, small. But with that came a security concern regarding the lack of garbage collection.
The #to_i and #to_int methods returned a unique integer and were related to the internal implementation.
Symbols-as-immediates and the implicit and explicit integer conversions have all been removed in 1.9.x. You can of course get the object_id. It's interesting that in 1.8.x to_i and object_id did not return the same number.

How does Integer === 3 work?

So as I understand it, the === operator tests to see if the RHS object is a member of the LHS object. That makes sense. But how does this work in Ruby? I'm looking at the Ruby docs and I only see === defined in Object, I don't see it in Integer itself. Is it just not documented?
Integer is a class, which (at least in Ruby) means that it is just a boring old normal object like any other object, which just happens to be an instance of the Class class (instead of, say, Object or String or MyWhateverFoo).
Class in turn is a subclass of Module (although arguably it shouldn't be, because it violates the Liskov Substition Principle, but that is a discussion for another forum, and is also a dead horse that has already been beaten many many times). And in Module#=== you will find the definition you are looking for, which Class inherits from Module and instances of Class (like Integer) understand.
Module#=== is basically defined symmetric to Object#kind_of?, it returns true if its argument is an instance of itself. So, 3 is an instance of Integer, therefore Integer === 3 returns true, just as 3.kind_of?(Integer) would.
So as I understand it, the === operator tests to see if the RHS object is a member of the LHS object.
Not necessarily. === is a method, just like any other method. It does whatever I want it to do. And in some cases the "is member of" analogy breaks down. In this case it is already pretty hard to swallow. If you are a hardcore type theory freak, then viewing a type as a set and instances of that type as members of a set is totally natural. And of course for Array and Hash the definition of "member" is also obvious.
But what about Regexp? Again, if you are formal languages buff and know your Chomsky backwards, then interpreting a Regexp as an infinite set of words and Strings as members of that set feels completely natural, but if not, then it sounds kind of weird.
So far, I have failed to come up with a concise description of precisely what === means. In fact, I haven't even come up with a good name for it. It is usually called the triple equals operator, threequals operator or case equality operator, but I strongly dislike those names, because it has absolutely nothing to do with equality.
So, what does it do? The best I have come up with is: imagine you are making a table, and one of the column headers is Integer. Would it make sense to write 3 in that column? If one of the column headers is /ab*a/, would it make sense to write 'abbbba' in that column?
Based on that definition, it could be called the subsumption operator, but that's even worse than the other examples ...
It's defined on Module, which Class is a subclass of, which Integer is an instance of.
In other words, when you run Integer === 3, you're calling '===' (with the parameter 3) on the object referred to to by the constant Integer, which is an instance of the class named Class. Since Class is a subclass of Module and doesn't define its own ===, you get the implementation of === defined on Module.
See the API docs for Module for more information.
Umm, Integer is a subclass of Object.

Resources