Ruby: Can't Iterate From Time Despite Responding to Succ - ruby

The Ruby docs for Range#each say
The each method can only be used if the begin object of the range supports the succ
method. A TypeError is raised if the object does not have succ method defined (like
Float).
Time does respond to succ (even though it's marked as obsolete since Ruby 1.9.2):
>> Time.now.succ
(irb):11: warning: Time#succ is obsolete; use time + 1
=> 2021-10-02 14:52:56 -0700
However, trying to use Range#each when the elements are Time objects gives me the TypeError the docs mention:
t = Time.now
range = (t..t+1)
range.each { |t| p t } # TypeError (can't iterate from Time)
I'm trying to determine when it's safe to iterate through a Range. Based on the docs, I had thought checking that its beginning element responds to succ was sufficient. This example suggests that's not quite right.
Do folks have better insight into how to determine this?
Thanks!

Based on the docs, I had thought checking that its beginning element responds to succ was sufficient.
Unfortunately, the documentation is somewhat incomplete here. It is not enough that the beginning element responds to the succ message.
The way that Range#each iterates is that it first yields the beginning element. Then it sends succ to the beginning element and yields the return value of that message send. Then it sends succ to that element … and so on.
You can think of it something like this:
class Range
def each
return enum_for(__callee__) { size } unless block_given?
current = self.begin
while (current <=> self.end).negative?
yield current
current = current.succ
end
yield current unless exclude_end?
self
end
end
So, is not enough that the beginning element responds to the succ message, but the object returned by that message send also needs to respond to succ, and the object returned by sending succ to that element as well, and so on.
In addition tho that, all elements also need to respond to <=> with the ending element as argument.
This example suggests that's not quite right.
Since, as you say, Time#succ is deprecated, it would theoretically be possible that Range#each explicitly checks whether the beginning element is an instance of Time, and raises a TypeError because of the use of a deprecated method. In fact, that is actually what happens in at least some Ruby implementations, see for example in Rubinius core/range.rb lines 109–111:
unless first.respond_to?(:succ) && !first.kind_of?(Time)
raise TypeError, "can't iterate from #{first.class}"
end
or TruffleRuby src/main/ruby/truffleruby/core/range.rb lines 277–279:
unless first.respond_to?(:succ) && !Primitive.object_kind_of?(first, Time)
raise TypeError, "can't iterate from #{first.class}"
end
or JRuby core/src/main/java/org/jruby/RubyRange.java lines 611–613:
if (!discreteObject(context, begin)) {
throw context.runtime.newTypeError("can't iterate from " + begin.getMetaClass().getName());
}
in combination with the definition of discreteObject on lines 873–876:
if (obj instanceof RubyTime) return false;
or in YARV range.c line 978–981:
if (!discrete_object_p(beg)) {
rb_raise(rb_eTypeError, "can't iterate from %s",
rb_obj_classname(beg));
}
in combination with the definition of discrete_object_p on lines 311–316:
if (rb_obj_is_kind_of(obj, rb_cTime)) return FALSE; /* until Time#succ removed */
[Note the comment! Also note that I filed a bug report and submitted a patch that has already been merged, so what I wrote above is actually no longer true.]
Interestingly, neither Opal nor IronRuby nor MRuby check for Time. MRuby sticks closely to the ISO/IEC 30170:2012 Information technology — Programming languages — Ruby specification, which does not specify Time#succ (see section 15.2.19.7.23–24), so that makes sense. Both IronRuby and Opal implement Time#succ and don't check for an instance of Time in their respective Range#each methods, so your code would actually work. (The reason for IronRuby is probably because it never implemented Ruby 1.9.2 completely; not sure about Opal.)
If you are using a recent version of YARV (which is true for the vast majority of Ruby programmers), the answer may actually be much simpler: after being deprecated for 10 years, Time#succ was removed. There is no Time#succ anymore in recent versions of YARV. (It is likely that other implementors will follow.)
This shows that relying on deprecated methods is dangerous: they might be removed (like what happened in YARV), or they might not exist in the first place (like what happened in MRuby), or they might accidentally work (like in Opal and IronRuby).
Do folks have better insight into how to determine this?
My approach typically goes something like this:
Carefully read and analyze the documentation for the latest version on Ruby-Doc.
Carefully read and analyze the ISO/IEC 30170:2012 Information technology — Programming languages — Ruby specification.
Carefully read and analyze the ruby/spec.
Carefully read and analyze any relevant discussions on Bugs.Ruby-Lang.Org.
Write exploratory tests on multiple different versions of multiple different implementations and carefully analyze their results and behavior.
Read the relevant parts of the source code of
Rubinius
Opal
TruffleRuby
JRuby
IronRuby
MRuby
YARV
Step back, go to sleep, and ignore the problem for at least two days.
Start again from step #1 with a fresh head.
Discuss the problem with other Ruby developers.
Based on those discussions, start again from step #1 with everything I learned from those discussions.
Write a question on Stack Overflow, but do not submit it yet.
Step back, go to sleep, and ignore the problem for at least two days.
Submit the question to Stack Overflow.

Related

What happens when a method is used on an object created from a built in class?

I understand that classes are like mold from which you can create objects, and a class defines a number of methods and variables (class,instances,local...) inside of it.
Let's say we have a class like this:
class Person
def initialize (name,age)
#name = name
#age = age
end
def greeting
"#{#name} says hi to you!"
end
end
me = Person.new "John", 34
puts me.greeting
As I can understand, when we call Person.new we are creating an object of class Person and initializing some internal attributes for that object, which will be stored in the instance variables #name and #age. The variable me will then be a reference to this newly created object.
When we call me.greeting, what happens is that greeting method is called on the object referenced by me, and that method will use the instance variable #name that is directly tied/attached to that object.
Hence, when calling a method on an object you are actually "talking" to that object, inspecting and using its attributes that are stored in its instance variables. All good for now.
Let's say now that we have the string "hello". We created it using a string literal, just like: string = "hello".
My question is, when creating an object from a built in class (String, Array, Integer...), are we actually storing some information on some instance variables for that object during its creation?
My doubt arises because I can't understand what happens when we call something like string.upcase, how does the #upcase method "work" on string? I guess that in order to return the string in uppercase, the string object previously declared has some instance variables attached to, and the instances methods work on those variables?
Hence, when calling a method on an object you are actually "talking" to that object, inspecting and using its attributes that are stored in its instance variables. All good for now.
No, that is very much not what you are doing in an Object-Oriented Program. (Or really any well-designed program.)
What you are describing is a break of encapsulation, abstraction, and information hiding. You should never inspect and/or use another object's instance variables or any of its other private implementation details.
In Object-Orientation, all computation is performed by sending messages between objects. The only thing you can do is sending messages to objects and the only thing you can observe about an object is the responses to those messages.
Only the object itself can inspect and use its attributes and instance variables. No other object can, not even objects of the same type.
If you send an object a message and you get a response, the only thing you know is what is in that response. You don't know how the object created that response: did the object compute the answer on the fly? Was the answer already stored in an instance variable and the object just responded with that? Did the object delegate the problem to a different object? Did it print out the request, fax it to a temp agency in the Philippines, and have a worker compute the answer by hand with pen and paper? You don't know. You can't know. You mustn't know. That is at the very heart of Object-Orientation.
This is, BTW, exactly how messaging works in real-life. If you send someone a message asking "what is π²" and they answer with "9.8696044011", then you have no idea whether they computed this by hand, used a calculator, used their smart phone, looked it up, asked a friend, or hired someone to answer the question for them.
You can imagine objects as being little computers themselves: they have internal storage, RAM, HDD, SSD, etc. (instance variables), they have code running on them, the OS, the basic system libraries, etc. (methods), but one computer cannot read another computer's RAM (access its instance variables) or run its code (execute its methods). It can only send it a request over the network and look at the response.
So, in some sense, your question is meaningless: from the point of view of Object-Oriented Abstraction, is should be impossible to answer your question, because it should be impossible to know how an object is implemented internally.
It could use instance variables, or it could not. It could be implemented in Ruby, or it could be implemented in another programming language. It could be implemented as a standard Ruby object, or it could be implemented as some secret internal private part of the Ruby implementation.
In fact, it could even not exist at all! (For example, in many Ruby implementations small integers do not actually exist as objects at all. The Ruby implementation will just make it look like they do.)
My question is, when creating an object from a built in class (String, Array, Integer...), are we actually storing some information on some instance variables for that object during its creation?
[…] [W]hat happens when we call something like string.upcase, how does the #upcase method "work" on string? I guess that in order to return the string in uppercase, the string object previously declared has some instance variables attached to, and the instances methods work on those variables?
There is nothing in the Ruby Language Specification that says how the String#upcase method is implemented. The Ruby Language Specification only says what the result is, but it doesn't say anything about how the result is computed.
Note that this is not specific to Ruby. This is how pretty much every programming language works. The Specification says what the results should be, but the details of how to compute those results is left to the implementor. By leaving the decision about the internal implementation details up to the implementor, this frees the implementor to choose the most efficient, most performant implementation that makes sense for their particular implementation.
For example, in the Java platform, there are existing methods available for converting a string to upper case. Therefore, in an implementation like TruffleRuby, JRuby, or XRuby, which sits on top of the Java platform, it makes sense to just call the existing Java methods for converting strings to upper case. Why waste time implementing an algorithm for converting strings to upper case when somebody else has already done that for you? Likewise, in an implementation like IronRuby or Ruby.NET, which sit on top of the .NET platform, you can just use .NET's builtin methods for converting strings to upper case. In an implementation like Opal, you can just use ECMAScript's methods for converting strings to upper case. And so on.
Unfortunately, unlike many other programming languages, the Ruby Language Specification does not exist as a single document in a single place. Ruby does not have a single formal specification that defines what certain language constructs mean.
There are several resources, the sum of which can be considered kind of a specification for the Ruby programming language.
Some of these resources are:
The ISO/IEC 30170:2012 Information technology — Programming languages — Ruby specification – Note that the ISO Ruby Specification was written around 2009–2010 with the specific goal that all existing Ruby implementations at the time would easily be compliant. Since YARV only implements Ruby 1.9+ and MRI only implements Ruby 1.8 and lower, this means that the ISO Ruby Specification only contains features that are common to both Ruby 1.8 and Ruby 1.9. Also, the ISO Ruby Specification was specifically intended to be minimal and only contain the features that are absolutely required for writing Ruby programs. Because of that, it does for example only specify Strings very broadly (since they have changed significantly between Ruby 1.8 and Ruby 1.9). It obviously also does not specify features which were added after the ISO Ruby Specification was written, such as Ractors or Pattern Matching.
The Ruby Spec Suite aka ruby/spec – Note that the ruby/spec is unfortunately far from complete. However, I quite like it because it is written in Ruby instead of "ISO-standardese", which is much easier to read for a Rubyist, and it doubles as an executable conformance test suite.
The Ruby Programming Language by David Flanagan and Yukihiro 'matz' Matsumoto – This book was written by David Flanagan together with Ruby's creator matz to serve as a Language Reference for Ruby.
Programming Ruby by Dave Thomas, Andy Hunt, and Chad Fowler – This book was the first English book about Ruby and served as the standard introduction and description of Ruby for a long time. This book also first documented the Ruby core library and standard library, and the authors donated that documentation back to the community.
The Ruby Issue Tracking System, specifically, the Feature sub-tracker – However, please note that unfortunately, the community is really, really bad at distinguishing between Tickets about the Ruby Programming Language and Tickets about the YARV Ruby Implementation: they both get intermingled in the tracker.
The Meeting Logs of the Ruby Developer Meetings.
New features are often discussed on the mailing lists, in particular the ruby-core (English) and ruby-dev (Japanese) mailing lists.
The Ruby documentation – Again, be aware that this documentation is generated from the source code of YARV and does not distinguish between features of Ruby and features of YARV.
In the past, there were a couple of attempts of formalizing changes to the Ruby Specification, such as the Ruby Change Request (RCR) and Ruby Enhancement Proposal (REP) processes, both of which were unsuccessful.
If all else fails, you need to check the source code of the popular Ruby implementations to see what they actually do.
For example, this is what the ISO/IEC 30170:2012 Information technology — Programming languages — Ruby specification has to say about String#upcase:
15.2.10.5.42 String#upcase
upcase
Visibility: public
Behavior: The method returns a new direct instance of the class String which contains all the characters of the receiver, with all the lower-case characters replaced with the corresponding upper-case characters.
As you can see, there is no mention of instances variables or really any details at all about how the method is implemented. It only specifies the result.
If a Ruby implementor wants to use instance variables, they are allowed to use instances variables, if a Ruby implementor doesn't want to use instance variables, they are allowed to do that, too.
If you check the Ruby Spec Suite for String#upcase, you will find specifications like these (this is just an example, there are quite a few more):
describe "String#upcase" do
it "returns a copy of self with all lowercase letters upcased" do
"Hello".upcase.should == "HELLO"
"hello".upcase.should == "HELLO"
end
describe "full Unicode case mapping" do
it "works for all of Unicode with no option" do
"äöü".upcase.should == "ÄÖÜ"
end
it "updates string metadata" do
upcased = "aßet".upcase
upcased.should == "ASSET"
upcased.size.should == 5
upcased.bytesize.should == 5
upcased.ascii_only?.should be_true
end
end
end
Again, as you can see, the Spec only describes results but not mechanisms. And this is very much intentional.
The same is true for the Ruby-Doc documentation of String#upcase:
upcase(*options) → string
Returns a string containing the upcased characters in self:
s = 'Hello World!' # => "Hello World!"
s.upcase # => "HELLO WORLD!"
The casing may be affected by the given options; see Case Mapping.
There is no mention of any particular mechanism here, nor in the linked documentation about Unicode Case Mapping.
All of this only tells us how String#upcase is specified and documented, though. But how is it actually implemented? Well, lucky for us, the majority of Ruby implementations are Free and Open Source Software, or at the very least make their source code available for study.
In Rubinius, you can find the implementation of String#upcase in core/string.rb lines 819–822 and it looks like this:
def upcase
str = dup
str.upcase! || str
end
It just delegates the work to String#upcase!, so let's look at that next, it is implemented right next to String#upcase in core/string.rb lines 824–843 and looks something like this (simplified and abridged):
def upcase!
return if #num_bytes == 0
ctype = Rubinius::CType
i = 0
while i < #num_bytes
c = #data[i]
if ctype.islower(c)
#data[i] = ctype.toupper!(c)
end
i += 1
end
end
So, as you can see, this is indeed just standard Ruby code using instance variables like #num_bytes which holds the length of the String in platform bytes and #data which is an Array of platform bytes holding the actual content of the String. It uses two helper methods from the Rubinius::CType library (a library for manipulating individual characters as byte-sized integers). The "actual" conversion to upper case is done by Rubinius::CType::toupper!, which is implemented in core/ctype.rb and is extremely simple (to the point of being simplistic):
def self.toupper!(num)
num - 32
end
Another very simple example is the implementation of String#upcase in Opal, which you can find in opal/corelib/string.rb and looks like this:
def upcase
`self.toUpperCase()`
end
Opal is an implementation of Ruby for the ECMAScript platform. Opal cleverly overloads the Kernel#` method, which is normally used to spawn a sub shell (which doesn't exist in ECMAScript) and execute commands in the platform's native command language (which on the ECMAScript platform arguably is ECMAScript). In Opal, Kernel#` is instead used to inject arbitrary ECMAScript code into Ruby.
So, all that `self.toUpperCase()` does, is call the String.prototype.toUpperCase method on self, which does work because of how the String class is defined in Opal:
class ::String < `String`
In other words, Opal implements Ruby's String class by simply inheriting from ECMAScript's String "class" (really the String Constructor function) and is therefore able to very easily and elegantly reuse all the work that has been done implementing Strings in ECMAScript.
Another very simple example is TruffleRuby. Its implementation of String#upcase can be found in src/main/ruby/truffleruby/core/string.rb and looks like this:
def upcase(*options)
s = Primitive.dup_as_string_instance(self)
s.upcase!(*options)
s
end
Similar to Rubinius, String#upcase just delegates to String#upcase!, which is not surprising since TruffleRuby's core library was originally forked from Rubinius's. This is what String#upcase! looks like:
def upcase!(*options)
mapped_options = Truffle::StringOperations.validate_case_mapping_options(options, false)
Primitive.string_upcase! self, mapped_options
end
The Truffle::StringOperations::valdiate_case_mapping_options helper method is not terribly interesting, it is just used to implement the rather complex rules for what the Case Mapping Options that you can pass to the various String methods are allowed to look like. The actual "meat" of TruffleRuby's implementation of String#upcase! is just this: Primitive.string_upcase! self, mapped_options.
The syntax Primitive.some_name was agreed upon between the developers of multiple Ruby implementations as "magic" syntax within the core of the implementation itself to be able to call out from Ruby code into "primitives" or "intrinsics" that are provided by the runtime system, but are not necessarily implemented in Ruby.
In other words, all that Primitive.string_upcase! self, mapped_options tells us is "there is a magic function called string_upcase! defined somewhere deep in the bowels of TruffleRuby itself, which knows how to convert a string to upper case, but we are not supposed to know how it works".
If you are really curious, you can find the implementation of Primitive.string_upcase! in src/main/java/org/truffleruby/core/string/StringNodes.java. The code looks dauntingly long and complex, but all you really need to know is that the Truffle Language Implementation Framework is based on constructing Nodes for an AST-walking interpreter. Once you ignore all the machinery related to constructing the AST nodes, the code itself is actually fairly simple.
Once again, the implementors are relying on the fact that the Truffle Language Implementation Framework already comes with a powerful implementation of strings, which the TruffleRuby developers can simply reuse for their own strings.
By the way, this idea of "primitives" or "intrinsics" is an idea that is used in many programming language implementations. It is especially popular in the Smalltalk world. It allows you to write the definition of your methods in the language itself, which in turn allows features like reflection and tools like documentation generators and IDEs (e.g. for automatic code completion) to work without them having to understand a second language, but still have an efficient implementation in a separate language with privileged access to the internals of the implementation.
For example, because large parts of YARV are implemented in C instead of Ruby, but YARV is the implementation that the documentation on Ruby-Doc and Ruby-Lang is generated from, that means that the RDoc Ruby Documentation Generator actually needs to understand both Ruby and C. And you will notice that sometimes documentation for methods implemented in C is missing, incomplete, or corrupted. Similarly, trying to get information about methods implemented in C using Method#parameters sometimes returns non-sensical or useless results. This would not happen if YARV used something like Intrinsics instead of directly writing the methods in C.
JRuby implements String#upcase in several overloads of org.jruby.RubyString.upcase and String#upcase! in several overloads of org.jruby.RubyString.upcase_bang.
However, in the end, they all delegate to one specific overload of org.jruby.RubyString.upcase_bang defined in core/src/main/java/org/jruby/RubyString.java like this:
private IRubyObject upcase_bang(ThreadContext context, int flags) {
modifyAndKeepCodeRange();
Encoding enc = checkDummyEncoding();
if (((flags & Config.CASE_ASCII_ONLY) != 0 && (enc.isUTF8() || enc.maxLength() == 1)) ||
(flags & Config.CASE_FOLD_TURKISH_AZERI) == 0 && getCodeRange() == CR_7BIT) {
int s = value.getBegin();
int end = s + value.getRealSize();
byte[]bytes = value.getUnsafeBytes();
while (s < end) {
int c = bytes[s] & 0xff;
if (Encoding.isAscii(c) && 'a' <= c && c <= 'z') {
bytes[s] = (byte)('A' + (c - 'a'));
flags |= Config.CASE_MODIFIED;
}
s++;
}
} else {
flags = caseMap(context.runtime, flags, enc);
if ((flags & Config.CASE_MODIFIED) != 0) clearCodeRange();
}
return ((flags & Config.CASE_MODIFIED) != 0) ? this : context.nil;
}
As you can see, this is is a very low-level way of implementing it.
In MRuby, the implementation looks again very different. MRuby is designed to be light-weight, small, and easy to embed into a larger application. It is also designed to be used in small embedded systems such as robots, sensors, and IoT devices. Because of that, it is designed to be very modular: a lot of the parts of MRuby are optional and are distributed as "MGems". Even parts of the core language are optional and can be left out, such as support for the catch and throw keywords, big numbers, the Dir class, meta programming, eval, the Math module, IO and File, and so on.
If we want to find out where String#upcase is implemented, we have to follow a trail of breadcrumbs. We start with the mrb_str_upcase function in src/string.c which looks like this:
static mrb_value
mrb_str_upcase(mrb_state *mrb, mrb_value self)
{
mrb_value str;
str = mrb_str_dup(mrb, self);
mrb_str_upcase_bang(mrb, str);
return str;
}
This is a pattern we have already seen a couple of times: String#upcase just duplicates the String and then delegates to String#upcase!, which is implemented just above in mrb_str_upcase_bang:
static mrb_value
mrb_str_upcase_bang(mrb_state *mrb, mrb_value str)
{
struct RString *s = mrb_str_ptr(str);
char *p, *pend;
mrb_bool modify = FALSE;
mrb_str_modify_keep_ascii(mrb, s);
p = RSTRING_PTR(str);
pend = RSTRING_END(str);
while (p < pend) {
if (ISLOWER(*p)) {
*p = TOUPPER(*p);
modify = TRUE;
}
p++;
}
if (modify) return str;
return mrb_nil_value();
}
As you can see, there is a lot of mechanic in there to extract the underlying data structure from the Ruby String object, iterate over that data structure making sure to not run over the end, etc., but the real work of actually converting to uppercase is actually performed by the TOUPPER macro defined in include/mruby.h:
#define TOUPPER(c) (ISLOWER(c) ? ((c) & 0x5f) : (c))
There you have it! That's how String#upcase works "under the hood" in five different Ruby implementations: Rubinius, Opal, TruffleRuby, JRuby, and MRuby. And it will again be different in IronRuby, YARV, RubyMotion, Ruby.NET, XRuby, MagLev, MacRuby, tinyrb, MRI, IoRuby, or any of the other Ruby implementations of present, future, and past.
This shows you that there are many different ways of approaching how to implement something like String#upcase in a Ruby implementation. There are almost as many different approaches as there are implementations!
My question is, when creating an object from a built in class (String, Array, Integer...), are we actually storing some information on some instance variables for that object during its creation?
Yes, we are, basically:
string = "hello" is shorthand for string = String.new("hello")
take a look at the following:
https://ruby-doc.org/core-3.1.2/String.html#method-c-new (ruby 3)
https://ruby-doc.org/core-2.3.0/String.html#method-c-new (ruby 2)
What's the difference between String.new and a string literal in Ruby?
You can also check the following (to extend the functionalities of the class):
Extend Ruby String class with method to change the contents
So the short answer is:
Dealing with built in classes (String, Array, Integer, ...etc) is almost the same thing as we do in any other class we create

Freezing string literals in early Ruby versions

This question applies in particular to Ruby 1.9 and 2.1, where String literals can not be frozen automatically. In particular I am refering to this article, which suggests to freeze strings, so that repeated evaluation of the code does not create a new String object every time, which among other advantages is said to make the program perform better. As a concrete example, this article proposes the expression
("%09d".freeze % id).scan(/\d{3}/).join("/".freeze)
I want to use this concept in our project, and for testing purpose, I tried the following code:
3.times { x="abc".freeze; puts x.object_id }
In Ruby 2.3, this prints the same object ID every time. In JRuby 1.7, which corresponds on the language level to Ruby 1.9, it prints three different object IDs, although I have explicitly frozen the string.
Could somebody explain the reason for this, and how to use freeze properly in this situation?
In particular I am refering to this article, which suggests to freeze strings, so that repeated evaluation of the code does not create a new String object every time
That is not what Object#freeze does. As the name implies, it "freezes" the object, i.e. it disallows any further modification to the object's internal state. There is nothing in the documentation that even remotely suggests that Object#freeze performs some sort of de-duplication or interning.
You may be thinking of String#-#, but this does not exist in Ruby 2.1. It was only added in Ruby 2.3, and actually had different semantics then:
Ruby 2.3–2.4: returns self if self is already frozen, otherwise returns self.dup.freeze, i.e. a frozen duplicate of the string:
-str → str (frozen)
If the string is frozen, then return the string itself.
If the string is not frozen, then duplicate the string freeze it and return it.
Ruby 2.5+: returns self if self is already frozen, otherwise returns a frozen version of the string that is de-duplicated (i.e. it may be looked up in a cache of existing frozen strings and the existing version returned):
-str → str (frozen)
Returns a frozen, possibly pre-existing copy of the string.
The string will be deduplicated as long as it is not tainted, or has any instance variables set on it.
So, the article you linked to is wrong on three counts:
De-duplication is only performed for strings, not arbitrary objects.
De-duplication is not performed by freeze.
De-duplication is only performed by String#-# starting in Ruby 2.5.
There is also a fourth claim that is wrong in that article, although we can't really blame the author for that since the article is from 2016 and the decision was only changed in 2019: Ruby 3.0 will not have immutable string literals by default.
The one thing that is correct in that article is that the # frozen_string_literal: true pragma (or the corresponding command line option --enable-frozen-string-literal) will not only freeze all static string literals, it will also de-duplicate them.

Why doesn't ruby support method overloading?

Instead of supporting method overloading Ruby overwrites existing methods. Can anyone explain why the language was designed this way?
"Overloading" is a term that simply doesn't even make sense in Ruby. It is basically a synonym for "static argument-based dispatch", but Ruby doesn't have static dispatch at all. So, the reason why Ruby doesn't support static dispatch based on the arguments, is because it doesn't support static dispatch, period. It doesn't support static dispatch of any kind, whether argument-based or otherwise.
Now, if you are not actually specifically asking about overloading, but maybe about dynamic argument-based dispatch, then the answer is: because Matz didn't implement it. Because nobody else bothered to propose it. Because nobody else bothered to implement it.
In general, dynamic argument-based dispatch in a language with optional arguments and variable-length argument lists, is very hard to get right, and even harder to keep it understandable. Even in languages with static argument-based dispatch and without optional arguments (like Java, for example), it is sometimes almost impossible to tell for a mere mortal, which overload is going to be picked.
In C#, you can actually encode any 3-SAT problem into overload resolution, which means that overload resolution in C# is NP-hard.
Now try that with dynamic dispatch, where you have the additional time dimension to keep in your head.
There are languages which dynamically dispatch based on all arguments of a procedure, as opposed to object-oriented languages, which only dispatch on the "hidden" zeroth self argument. Common Lisp, for example, dispatches on the dynamic types and even the dynamic values of all arguments. Clojure dispatches on an arbitrary function of all arguments (which BTW is extremely cool and extremely powerful).
But I don't know of any OO language with dynamic argument-based dispatch. Martin Odersky said that he might consider adding argument-based dispatch to Scala, but only if he can remove overloading at the same time and be backwards-compatible both with existing Scala code that uses overloading and compatible with Java (he especially mentioned Swing and AWT which play some extremely complex tricks exercising pretty much every nasty dark corner case of Java's rather complex overloading rules). I've had some ideas myself about adding argument-based dispatch to Ruby, but I never could figure out how to do it in a backwards-compatible manner.
Method overloading can be achieved by declaring two methods with the same name and different signatures. These different signatures can be either,
Arguments with different data types, eg: method(int a, int b) vs method(String a, String b)
Variable number of arguments, eg: method(a) vs method(a, b)
We cannot achieve method overloading using the first way because there is no data type declaration in ruby(dynamic typed language). So the only way to define the above method is def(a,b)
With the second option, it might look like we can achieve method overloading, but we can't. Let say I have two methods with different number of arguments,
def method(a); end;
def method(a, b = true); end; # second argument has a default value
method(10)
# Now the method call can match the first one as well as the second one,
# so here is the problem.
So ruby needs to maintain one method in the method look up chain with a unique name.
I presume you are looking for the ability to do this:
def my_method(arg1)
..
end
def my_method(arg1, arg2)
..
end
Ruby supports this in a different way:
def my_method(*args)
if args.length == 1
#method 1
else
#method 2
end
end
A common pattern is also to pass in options as a hash:
def my_method(options)
if options[:arg1] and options[:arg2]
#method 2
elsif options[:arg1]
#method 1
end
end
my_method arg1: 'hello', arg2: 'world'
Method overloading makes sense in a language with static typing, where you can distinguish between different types of arguments
f(1)
f('foo')
f(true)
as well as between different number of arguments
f(1)
f(1, 'foo')
f(1, 'foo', true)
The first distinction does not exist in ruby. Ruby uses dynamic typing or "duck typing". The second distinction can be handled by default arguments or by working with arguments:
def f(n, s = 'foo', flux_compensator = true)
...
end
def f(*args)
case args.size
when
...
when 2
...
when 3
...
end
end
This doesn't answer the question of why ruby doesn't have method overloading, but third-party libraries can provide it.
The contracts.ruby library allows overloading. Example adapted from the tutorial:
class Factorial
include Contracts
Contract 1 => 1
def fact(x)
x
end
Contract Num => Num
def fact(x)
x * fact(x - 1)
end
end
# try it out
Factorial.new.fact(5) # => 120
Note that this is actually more powerful than Java's overloading, because you can specify values to match (e.g. 1), not merely types.
You will see decreased performance using this though; you will have to run benchmarks to decide how much you can tolerate.
I often do the following structure :
def method(param)
case param
when String
method_for_String(param)
when Type1
method_for_Type1(param)
...
else
#default implementation
end
end
This allow the user of the object to use the clean and clear method_name : method
But if he want to optimise execution, he can directly call the correct method.
Also, it makes your test clearers and betters.
there are already great answers on why side of the question. however, if anyone looking for other solutions checkout functional-ruby gem which is inspired by Elixir pattern matching features.
class Foo
include Functional::PatternMatching
## Constructor Over loading
defn(:initialize) { #name = 'baz' }
defn(:initialize, _) {|name| #name = name.to_s }
## Method Overloading
defn(:greet, :male) {
puts "Hello, sir!"
}
defn(:greet, :female) {
puts "Hello, ma'am!"
}
end
foo = Foo.new or Foo.new('Bar')
foo.greet(:male) => "Hello, sir!"
foo.greet(:female) => "Hello, ma'am!"
I came across this nice interview with Yukihiro Matsumoto (aka. "Matz"), the creator of Ruby. Incidentally, he explains his reasoning and intention there. It is a good complement to #nkm's excellent exemplification of the problem. I have highlighted the parts that answer your question on why Ruby was designed that way:
Orthogonal versus Harmonious
Bill Venners: Dave Thomas also claimed that if I ask you to add a
feature that is orthogonal, you won't do it. What you want is
something that's harmonious. What does that mean?
Yukihiro Matsumoto: I believe consistency and orthogonality are tools
of design, not the primary goal in design.
Bill Venners: What does orthogonality mean in this context?
Yukihiro Matsumoto: An example of orthogonality is allowing any
combination of small features or syntax. For example, C++ supports
both default parameter values for functions and overloading of
function names based on parameters. Both are good features to have in
a language, but because they are orthogonal, you can apply both at the
same time. The compiler knows how to apply both at the same time. If
it's ambiguous, the compiler will flag an error. But if I look at the
code, I need to apply the rule with my brain too. I need to guess how
the compiler works. If I'm right, and I'm smart enough, it's no
problem. But if I'm not smart enough, and I'm really not, it causes
confusion. The result will be unexpected for an ordinary person. This
is an example of how orthogonality is bad.
Source: "The Philosophy of Ruby", A Conversation with Yukihiro Matsumoto, Part I
by Bill Venners, September 29, 2003 at: https://www.artima.com/intv/ruby.html
Statically typed languages support method overloading, which involves their binding at compile time. Ruby, on the other hand, is a dynamically typed language and cannot support static binding at all. In languages with optional arguments and variable-length argument lists, it is also difficult to determine which method will be invoked during dynamic argument-based dispatch. Additionally, Ruby is implemented in C, which itself does not support method overloading.

Ruby equivalent of C#'s 'yield' keyword, or, creating sequences without preallocating memory

In C#, you could do something like this:
public IEnumerable<T> GetItems<T>()
{
for (int i=0; i<10000000; i++) {
yield return i;
}
}
This returns an enumerable sequence of 10 million integers without ever allocating a collection in memory of that length.
Is there a way of doing an equivalent thing in Ruby? The specific example I am trying to deal with is the flattening of a rectangular array into a sequence of values to be enumerated. The return value does not have to be an Array or Set, but rather some kind of sequence that can only be iterated/enumerated in order, not by index. Consequently, the entire sequence need not be allocated in memory concurrently. In .NET, this is IEnumerable and IEnumerable<T>.
Any clarification on the terminology used here in the Ruby world would be helpful, as I am more familiar with .NET terminology.
EDIT
Perhaps my original question wasn't really clear enough -- I think the fact that yield has very different meanings in C# and Ruby is the cause of confusion here.
I don't want a solution that requires my method to use a block. I want a solution that has an actual return value. A return value allows convenient processing of the sequence (filtering, projection, concatenation, zipping, etc).
Here's a simple example of how I might use get_items:
things = obj.get_items.select { |i| !i.thing.nil? }.map { |i| i.thing }
In C#, any method returning IEnumerable that uses a yield return causes the compiler to generate a finite state machine behind the scenes that caters for this behaviour. I suspect something similar could be achieved using Ruby's continuations, but I haven't seen an example and am not quite clear myself on how this would be done.
It does indeed seem possible that I might use Enumerable to achieve this. A simple solution would be to us an Array (which includes module Enumerable), but I do not want to create an intermediate collection with N items in memory when it's possible to just provide them lazily and avoid any memory spike at all.
If this still doesn't make sense, then consider the above code example. get_items returns an enumeration, upon which select is called. What is passed to select is an instance that knows how to provide the next item in the sequence whenever it is needed. Importantly, the whole collection of items hasn't been calculated yet. Only when select needs an item will it ask for it, and the latent code in get_items will kick into action and provide it. This laziness carries along the chain, such that select only draws the next item from the sequence when map asks for it. As such, a long chain of operations can be performed on one data item at a time. In fact, code structured in this way can even process an infinite sequence of values without any kinds of memory errors.
So, this kind of laziness is easily coded in C#, and I don't know how to do it in Ruby.
I hope that's clearer (I'll try to avoid writing questions at 3AM in future.)
It's supported by Enumerator since Ruby 1.9 (and back-ported to 1.8.7). See Generator: Ruby.
Cliche example:
fib = Enumerator.new do |y|
y.yield i = 0
y.yield j = 1
while true
k = i + j
y.yield k
i = j
j = k
end
end
100.times { puts fib.next() }
Your specific example is equivalent to 10000000.times, but let's assume for a moment that the times method didn't exist and you wanted to implement it yourself, it'd look like this:
class Integer
def my_times
return enum_for(:my_times) unless block_given?
i=0
while i<self
yield i
i += 1
end
end
end
10000.my_times # Returns an Enumerable which will let
# you iterate of the numbers from 0 to 10000 (exclusive)
Edit: To clarify my answer a bit:
In the above example my_times can be (and is) used without a block and it will return an Enumerable object, which will let you iterate over the numbers from 0 to n. So it is exactly equivalent to your example in C#.
This works using the enum_for method. The enum_for method takes as its argument the name of a method, which will yield some items. It then returns an instance of class Enumerator (which includes the module Enumerable), which when iterated over will execute the given method and give you the items which were yielded by the method. Note that if you only iterate over the first x items of the enumerable, the method will only execute until x items have been yielded (i.e. only as much as necessary of the method will be executed) and if you iterate over the enumerable twice, the method will be executed twice.
In 1.8.7+ it has become to define methods, which yield items, so that when called without a block, they will return an Enumerator which will let the user iterate over those items lazily. This is done by adding the line return enum_for(:name_of_this_method) unless block_given? to the beginning of the method like I did in my example.
Without having much ruby experience, what C# does in yield return is usually known as lazy evaluation or lazy execution: providing answers only as they are needed. It's not about allocating memory, it's about deferring computation until actually needed, expressed in a way similar to simple linear execution (rather than the underlying iterator-with-state-saving).
A quick google turned up a ruby library in beta. See if it's what you want.
C# ripped the 'yield' keyword right out of Ruby- see Implementing Iterators here for more.
As for your actual problem, you have presumably an array of arrays and you want to create a one-way iteration over the complete length of the list? Perhaps worth looking at array.flatten as a starting point - if the performance is alright then you probably don't need to go too much further.

Ruby's yield feature in relation to computer science

I recently discovered Ruby's blocks and yielding features, and I was wondering: where does this fit in terms of computer science theory? Is it a functional programming technique, or something more specific?
Ruby's yield is not an iterator like in C# and Python. yield itself is actually a really simple concept once you understand how blocks work in Ruby.
Yes, blocks are a functional programming feature, even though Ruby is not properly a functional language. In fact, Ruby uses the method lambda to create block objects, which is borrowed from Lisp's syntax for creating anonymous functions — which is what blocks are. From a computer science standpoint, Ruby's blocks (and Lisp's lambda functions) are closures. In Ruby, methods usually take only one block. (You can pass more, but it's awkward.)
The yield keyword in Ruby is just a way of calling a block that's been given to a method. These two examples are equivalent:
def with_log
output = yield # We're calling our block here with yield
puts "Returned value is #{output}"
end
def with_log(&stuff_to_do) # the & tells Ruby to convert into
# an object without calling lambda
output = stuff_to_do.call # We're explicitly calling the block here
puts "Returned value is #{output}"
end
In the first case, we're just assuming there's a block and say to call it. In the other, Ruby wraps the block in an object and passes it as an argument. The first is more efficient and readable, but they're effectively the same. You'd call either one like this:
with_log do
a = 5
other_num = gets.to_i
#my_var = a + other_num
end
And it would print the value that wound up getting assigned to #my_var. (OK, so that's a completely stupid function, but I think you get the idea.)
Blocks are used for a lot of things in Ruby. Almost every place you'd use a loop in a language like Java, it's replaced in Ruby with methods that take blocks. For example,
[1,2,3].each {|value| print value} # prints "123"
[1,2,3].map {|value| 2**value} # returns [2, 4, 8]
[1,2,3].reject {|value| value % 2 == 0} # returns [1, 3]
As Andrew noted, it's also commonly used for opening files and many other places. Basically anytime you have a standard function that could use some custom logic (like sorting an array or processing a file), you'll use a block. There are other uses too, but this answer is already so long I'm afraid it will cause heart attacks in readers with weaker constitutions. Hopefully this clears up the confusion on this topic.
There's more to yield and blocks than mere looping.
The series Enumerating enumerable has a series of things you can do with enumerations, such as asking if a statement is true for any member of a group, or if it's true for all the members, or searching for any or all members meeting a certain condition.
Blocks are also useful for variable scope. Rather than merely being convenient, it can help with good design. For example, the code
File.open("filename", "w") do |f|
f.puts "text"
end
ensures that the file stream is closed when you're finished with it, even if an exception occurs, and that the variable is out of scope once you're finished with it.
A casual google didn't come up with a good blog post about blocks and yields in ruby. I don't know why.
Response to comment:
I suspect it gets closed because of the block ending, not because the variable goes out of scope.
My understanding is that nothing special happens when the last variable pointing to an object goes out of scope, apart from that object being eligible for garbage collection. I don't know how to confirm this, though.
I can show that the file object gets closed before it gets garbage collected, which usually doesn't happen immediately. In the following example, you can see that a file object is closed in the second puts statement, but it hasn't been garbage collected.
g = nil
File.open("/dev/null") do |f|
puts f.inspect # #<File:/dev/null>
puts f.object_id # Some number like 70233884832420
g = f
end
puts g.inspect # #<File:/dev/null (closed)>
puts g.object_id # The exact same number as the one printed out above,
# indicating that g points to the exact same object that f pointed to
I think the yield statement originated from the CLU language. I always wonder if the character from Tron was named after CLU too....
I think 'coroutine' is the keyword you're looking for.
E.g. http://en.wikipedia.org/wiki/Yield
Yield in computing and information science:
in computer science, a point of return (and re-entry) of a coroutine

Resources