nil.to_s Produces a Frozen String? - ruby

I'm curious. Is it surprising that the snippet below yields a FrozenError? The magic comment # frozen_string_literal: true is not present.
n = nil
s = n.to_s
s.force_encoding('UTF-8')

This behavior was added in Ruby 2.7, and it is documented explicitly in the release notes:
Module#name, true.to_s, false.to_s, and nil.to_s now always return a frozen String. The returned String is always the same for a given object. [Experimental] [Feature #16150]
The linked issue has additional reasoning behind the change:
Much of the time when a user calls to_s, they are just looking for a simple string representation to display or to interpolate into another string. In my brief exploration, the result of to_s is rarely mutated directly.
It seems that we could save a lot of objects by providing a way to explicitly request a frozen string.
...
This would reduce string allocations dramatically when applied to many common to_s calls.
In summary, it reduces object allocations, which reduces garbage collection overhead, which improves performance.
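If a mutable string is needed, one option is to take an unfrozen copy before mutating it. The following is a minimal sketch, assuming Ruby 2.7 or later; the helper name mutable_string_for is made up for illustration and is not part of Ruby.

```ruby
# Hypothetical helper: always hand back a String that is safe to mutate.
def mutable_string_for(value)
  # dup of a frozen String is unfrozen; the shared frozen original is untouched.
  value.to_s.dup
end

s = mutable_string_for(nil)
s.force_encoding('UTF-8')          # no FrozenError: s is a private copy

puts nil.to_s.frozen?              # true on Ruby 2.7+
puts nil.to_s.equal?(nil.to_s)     # true on Ruby 2.7+: the same object every time
puts s.frozen?                     # false
```

String#+@ (written +str) has a similar effect: it returns the receiver when it is already mutable and an unfrozen duplicate otherwise.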

Related

Freezing string literals in early Ruby versions

This question applies in particular to Ruby 1.9 and 2.1, where String literals cannot be frozen automatically. In particular, I am referring to this article, which suggests freezing strings so that repeated evaluation of the code does not create a new String object every time, which, among other advantages, is said to make the program perform better. As a concrete example, the article proposes the expression
("%09d".freeze % id).scan(/\d{3}/).join("/".freeze)
I want to use this concept in our project, and for testing purpose, I tried the following code:
3.times { x="abc".freeze; puts x.object_id }
In Ruby 2.3, this prints the same object ID every time. In JRuby 1.7, which corresponds on the language level to Ruby 1.9, it prints three different object IDs, although I have explicitly frozen the string.
Could somebody explain the reason for this, and how to use freeze properly in this situation?
In particular I am referring to this article, which suggests freezing strings, so that repeated evaluation of the code does not create a new String object every time
That is not what Object#freeze does. As the name implies, it "freezes" the object, i.e. it disallows any further modification to the object's internal state. There is nothing in the documentation that even remotely suggests that Object#freeze performs some sort of de-duplication or interning.
You may be thinking of String#-@, but this does not exist in Ruby 2.1. It was only added in Ruby 2.3, and actually had different semantics then:
Ruby 2.3–2.4: returns self if self is already frozen, otherwise returns self.dup.freeze, i.e. a frozen duplicate of the string:
-str → str (frozen)
If the string is frozen, then return the string itself.
If the string is not frozen, then duplicate the string freeze it and return it.
Ruby 2.5+: returns self if self is already frozen, otherwise returns a frozen version of the string that is de-duplicated (i.e. it may be looked up in a cache of existing frozen strings and the existing version returned):
-str → str (frozen)
Returns a frozen, possibly pre-existing copy of the string.
The string will be deduplicated as long as it is not tainted, or has any instance variables set on it.
So, the article you linked to is wrong on three counts:
De-duplication is only performed for strings, not arbitrary objects.
De-duplication is not performed by freeze.
De-duplication is only performed by String#-@ starting in Ruby 2.5.
There is also a fourth claim that is wrong in that article, although we can't really blame the author for that since the article is from 2016 and the decision was only changed in 2019: Ruby 3.0 will not have immutable string literals by default.
The one thing that is correct in that article is that the # frozen_string_literal: true pragma (or the corresponding command line option --enable-frozen-string-literal) will not only freeze all static string literals, it will also de-duplicate them.
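The difference between freezing and de-duplicating can be checked directly. This is a small sketch, assuming MRI 2.5 or later; the helper names frozen_copy and interned are invented for the example. The strings are built with String.new so that no literal-level optimization is involved.

```ruby
# Hypothetical helpers: build a fresh String at run time, then freeze or intern it.
def frozen_copy(text)
  String.new(text).freeze      # freeze only blocks mutation
end

def interned(text)
  -String.new(text)            # String#-@ may return a shared frozen copy
end

puts frozen_copy('abc').equal?(frozen_copy('abc'))   # false: two distinct frozen objects
puts interned('abc').frozen?                         # true
puts interned('abc').equal?(interned('abc'))         # true on MRI 2.5+
```

The documentation only promises a "possibly pre-existing" copy, so code should not depend on the identity check returning true on every implementation.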

Ruby to_s a special method? [duplicate]

I'm learning Ruby and I've seen a couple of methods that are confusing me a bit, particularly to_s vs to_str (and similarly, to_i/to_int, to_a/to_ary, & to_h/to_hash). What I've read explains that the shorter forms (e.g. to_s) are for explicit conversions while the longer forms are for implicit conversions.
I don't really understand how to_str would actually be used. Would something other than a String ever define to_str? Can you give a practical application for this method?
Note first that all of this applies to each pair of “short” (e.g. to_s/to_i/to_a/to_h) vs. “long” (e.g. to_str/to_int/to_ary/to_hash) coercion methods in Ruby (for their respective types) as they all have the same semantics.
They have different meanings. You should not implement to_str unless your object acts like a string, rather than just being representable by a string. The only core class that implements to_str is String itself.
From Programming Ruby (quoted from this blog post, which is worth reading all of):
[to_i and to_s] are not particularly strict: if an object has some kind of decent representation as a string, for example, it will probably have a to_s method… [to_int and to_str] are strict conversion functions: you implement them only if [your] object can naturally be used every place a string or an integer could be used.
Older Ruby documentation from the Pickaxe has this to say:
Unlike to_s, which is supported by almost all classes, to_str is normally implemented only by those classes that act like strings.
For example, in addition to Integer, both Float & Numeric implement to_int (to_i's equivalent of to_str) because both of them can readily be substituted for an Integer (they are all actually numbers). Unless your class has a similarly tight relationship with String, you should not implement to_str.
To understand whether you should use or implement to_s or to_str, let's look at some examples. It is revealing to consider when these methods fail.
1.to_s # returns "1"
Object.new.to_s # returns "#<Object:0x4932990>"
1.to_str # raises NoMethodError
Object.new.to_str # raises NoMethodError
As we can see, to_s is happy to turn any object into a string. On the other hand, to_str raises an error when its receiver does not look like a string.
Now let us look at Array#join.
[1,2].join(',') # returns "1,2"
[1,2].join(3) # fails, the argument does not look like a valid separator.
It is useful that Array#join converts to string the items in the array (whatever they really are) before joining them, so Array#join calls to_s on them.
However, the separator is supposed to be a string -- someone calling [1,2].join(3) is likely to be making a mistake. This is why Array#join calls to_str on the separator.
The same principle seems to hold for the other methods. Consider to_a/to_ary on a hash:
{1 => 2}.to_a # returns [[1, 2]], an array that describes the hash
{1 => 2}.to_ary # raises NoMethodError, because a hash is not really an array.
In summary, here is how I see it:
call to_s to get a string that describes the object.
call to_str to verify that an object really acts like a string.
implement to_s when you can build a string that describes your object.
implement to_str when your object can fully behave like a string.
I think a case when you could implement to_str yourself is maybe a ColoredString class -- a string that has a color attached to it. If it seems clear to you that passing a colored comma to join is not a mistake and should result in "1,2" (even though that string would not be colored), then do implement to_str on ColoredString.
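The ColoredString idea can be sketched as follows. The class is hypothetical and kept deliberately small; it only illustrates how defining to_str changes the way core methods such as Array#join, String#+ and String.try_convert treat the object.

```ruby
# Hypothetical class: a string with a color attached.
class ColoredString
  attr_reader :color

  def initialize(text, color)
    @text = text
    @color = color
  end

  # Explicit conversion: a String that describes the object.
  def to_s
    @text
  end

  # Implicit conversion: declares that this object may stand in for a String.
  def to_str
    @text
  end
end

comma = ColoredString.new(',', :red)

puts([1, 2].join(comma))        # 1,2   (join calls to_str on the separator)
puts('a' + comma)               # a,    (String#+ calls to_str on its argument)
p String.try_convert(comma)     # ","
p String.try_convert(3)         # nil   (Integer has no to_str)

begin
  [1, 2].join(3)
rescue TypeError => e
  puts e.class                  # TypeError
end
```

Note that the color is lost in every result, which is exactly the trade-off described above.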
Zverok has a great easily understandable article about when to use what (explained with to_h and to_hash).
It comes down to whether the object implementing those methods can merely be converted to a string
-> use to_s
or is itself a kind of (enhanced) string
-> use to_str
I've seen a meaningful usage of to_hash in practice for the Configuration class in the gem 'configuration' (GitHub and Configuration.rb)
It represents -- as the name says -- the provided configuration, which in fact is a kind of hash (with additional features), rather than being convertible to one.
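A configuration-like object of this kind might look like the sketch below. The Settings class is invented for illustration and is not the API of the 'configuration' gem; it only shows that core methods such as Hash#merge and Hash.try_convert accept any object that responds to to_hash.

```ruby
# Hypothetical class: behaves as "a kind of hash", so it implements to_hash.
class Settings
  def initialize(values)
    @values = values
  end

  # Implicit conversion used by Hash#merge, Hash.try_convert and **splat.
  def to_hash
    @values
  end
end

settings = Settings.new({ host: 'localhost', port: 8080 })

merged = { port: 80, tls: false }.merge(settings)
p merged[:port]                         # 8080
p merged[:host]                         # "localhost"
p Hash.try_convert(settings).keys       # [:host, :port]
p Hash.try_convert('not a hash')        # nil
```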

Why freezing hash literal is not the same as freezing string literal?

I have been reading about ways to reduce memory usage in my Ruby/Rails app, and one thing that is mentioned is freezing objects. I have tried the code below (MRI, Ruby 2.3.3) and it does save memory, according to Activity Monitor, compared to not freezing the string.
pipeline = []
100_000.times { pipeline << 'hello world'.freeze }
However, if I try the same with a hash literal, it uses lots of memory, unless I assign the hash to a variable and freeze it before.
pipeline = []
100_000.times { pipeline << {hello: 'world'}.freeze } # Uses about 25MB
my_hash = {hello: 'world'}
my_hash.freeze
100_000.times { pipeline << my_hash} # This uses about 1MB
Can anyone explain why? I always thought the string case was a bit strange, because it looks like you're simply creating lots of different string objects, freezing each one separately, and adding lots of frozen objects to the array. Don't know why it works, but hey, it did. Now, the hash case is more in line with what I expected, but I don't know why it won't behave like the string.
It's probably the case that the Ruby optimizer can identify that string as being the same from one loop to the next, but it's unable to identify that hash as being identical so it makes new ones. In the second variant you literally use the same hash so the optimizer can handle it.
For proof, look at this:
pipeline = []
100_000.times { pipeline << 'hello world'.freeze }
pipeline.map(&:object_id).uniq.length
# => 1
That's an array of identical objects, one allocation only.
pipeline = []
100_000.times { pipeline << {hello: 'world'}.freeze }
pipeline.map(&:object_id).uniq.length
# => 100000
That's 100,000 different objects.
Can anyone explain why? I always thought the string case was a bit strange, because it looks like you're simply creating lots of different string objects, freezing each one separately, and adding lots of frozen objects to the array.
The expression form
'string literal'.freeze
is a special expression form that is special-cased by the language. It not only freezes the string object, it also performs de-duplication. (Similar to symbols.)
It is a special-cased expression form. It is not evaluating the string literal and then sending it the message freeze. Rather, it is treated as a single entity, a different form of string literal if you will.
In fact, the original proposal did introduce a different form of string literal like this:
'string literal'f
The proposal was changed to make it forwards-compatible: 'foo'f would be a syntax error, if you had to run your code in older versions of Ruby, whereas 'foo'.freeze just works the same way in older versions of Ruby, it only uses more memory.
Note: this means it only works for literals. Here, the string is de-duplicated:
'foo'.freeze
Here, it is not:
foo = 'foo'
foo.freeze
Don't know why it works, but hey, it did.
Basically, it works, because the language specification says so.
Now, the hash case is more in line with what I expected, but I don't know why it won't behave like the string.
Again, it doesn't work, because the language specification only special-cases string literals.
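For hashes, the usual way to get the same saving is to create the object once and reuse it, for instance through a frozen constant. A minimal sketch, with GREETING and build_pipeline as made-up names:

```ruby
# Created and frozen once; every later use refers to this single object.
GREETING = { hello: 'world' }.freeze

# Hypothetical helper that fills an array with references to the shared hash.
def build_pipeline(size)
  Array.new(size) { GREETING }
end

shared = build_pipeline(1_000)
puts shared.map(&:object_id).uniq.length    # 1

# A hash literal inside the block is evaluated anew on every iteration.
fresh = Array.new(1_000) { { hello: 'world' }.freeze }
puts fresh.map(&:object_id).uniq.length     # 1000
```

Hash#freeze is shallow: it prevents adding or removing keys but does not freeze the values stored in the hash.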

Why can't I overwrite self in the Integer class?

I want to be able to write number.incr, like so:
num = 1; num.incr; num
#=> 2
The error I'm seeing states:
Can't change the value of self
If that's true, how do bang! methods work?
You cannot change the value of self
An object is a class pointer and a set of instance variables (note that this link is an old version of Ruby, because it's dramatically simpler, and thus better for explanatory purposes).
"Pointing" at an object means you have a variable which stores the object's location in memory. Then to do anything with the object, you first go to the location in memory (we might say "follow the pointer") to get the object, and then do the thing (e.g. invoke a method, set an ivar).
All Ruby code everywhere is executing in the context of some object. This is where your instance variables get saved, it's where Ruby looks for methods that don't have a receiver (e.g. $stdout is the receiver in $stdout.puts "hi", and the current object is the receiver in puts "hi"). Sometimes you need to do something with the current object. The way to work with objects is through variables, but what variable points at the current object? There isn't one. To fill this need, the keyword self is provided.
self acts like a variable in that it points at the location of the current object. But it is not like a variable, because you can't assign it new value. If you could, the code after that point would suddenly be operating on a different object, which is confusing and has no benefits over just using a variable.
Also remember that the object is tracked by variables which store memory addresses. What is self = 2 supposed to mean? Does it only mean that the current code operates as if it were invoked on 2? Or does it mean that all variables pointing at the old object now have their values updated to point at the new one? It isn't really clear, but the former unnecessarily introduces an identity crisis, and the latter is prohibitively expensive and introduces situations where it's unclear what is correct (I'll go into that a bit more below).
You cannot mutate Fixnums
Some objects are special at the C level in Ruby (false, true, nil, fixnums, and symbols).
Variables pointing at them don't actually store a memory location. Instead, the address itself stores the type and identity of the object. Wherever it matters, Ruby checks to see if it's a special object (e.g. when looking up an instance variable), and then extracts the value from it.
So there isn't a spot in memory where the object 123 is stored. Which means self contains the idea of Fixnum 123 rather than a memory address like usual. As with variables, it will get checked for and handled specially when necessary.
Because of this, you cannot mutate the object itself (though it appears they keep a special global variable to allow you to set instance variables on things like Symbols).
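Some of this is observable from Ruby code. The sketch below assumes a recent MRI (Ruby 2.5 or later), where attempts to attach state to an Integer are rejected rather than stored on the side; the helper ivar_error_for is invented for the example.

```ruby
# Hypothetical helper: returns the class of the error raised when setting
# an instance variable on obj, or nil if the assignment succeeds.
def ivar_error_for(obj)
  obj.instance_variable_set(:@label, 'x')
  nil
rescue StandardError => e
  e.class
end

puts 123.frozen?                         # true: integers cannot be mutated
puts 123.object_id == 123.object_id      # true: every 123 is the same object
p ivar_error_for(123)                    # FrozenError on Ruby 2.5+
p ivar_error_for(Object.new)             # nil: ordinary objects accept ivars

begin
  n = 123
  def n.incr; end                        # singleton methods are rejected too
rescue TypeError => e
  puts e.class                           # TypeError
end
```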
Why are they doing all of this? To improve performance, I assume. A number stored in a register is just a series of bits (typically 32 or 64), which means there are hardware instructions for things like addition and multiplication. That is to say, the ALU is wired to perform these operations in a single clock cycle, rather than implementing the algorithms in software, which would take many orders of magnitude longer. By storing them like this, they avoid the cost of storing and looking up the object in memory, and they gain the advantage that they can directly add the two pointers using hardware. Note, however, that there are still some additional costs in Ruby that you don't have in C (e.g. checking for overflow and converting the result to Bignum).
Bang methods
You can put a bang at the end of any method. It doesn't require the object to change, it's just that people usually try to warn you when you're doing something that could have unexpected side-effects.
class C
  def initialize(val)
    @val = val # => 12
  end # => :initialize
  def bang_method!
    "My val is: #{@val}" # => "My val is: 12"
  end # => :bang_method!
end # => :bang_method!
c = C.new 12 # => #<C:0x007fdac48a7428 @val=12>
c.bang_method! # => "My val is: 12"
c # => #<C:0x007fdac48a7428 @val=12>
Also, there are no bang methods on integers. It wouldn't fit with the paradigm.
Fixnum.instance_methods.grep(/!$/) # => [:!]
# Okay, there's one, but it's actually a boolean negation
1.! # => false
# And it's not a Fixnum method, it's an inherited boolean operator
1.method(:!).owner # => BasicObject
# In reality, you call it this way, and the interpreter translates it
!1 # => false
Alternatives
Make a wrapper object: I'm not going to advocate this one, but it's the closest to what you're trying to do. Basically, create your own class, which is mutable, and then make it look like an integer. There's a great blog post walking through this at http://blog.rubybestpractices.com/posts/rklemme/019-Complete_Numeric_Class.html. It will get you 95% of the way there.
Don't depend directly on the value of a Fixnum: I can't give better advice than this without knowing what you're trying to do / why you feel this is a need.
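A very reduced form of the wrapper-object alternative might look like this. The Counter class is hypothetical and far less complete than the numeric class in the linked post; it only shows that the mutable state lives in the wrapper, while the Integer inside is replaced rather than changed.

```ruby
# Hypothetical mutable wrapper around an Integer.
class Counter
  attr_reader :value

  def initialize(value = 0)
    @value = value
  end

  # Rebinds @value to a new Integer; no Integer object is mutated.
  def incr
    @value += 1
    self
  end

  def to_i
    @value
  end
end

num = Counter.new(1)
num.incr
puts num.value      # 2

# With a plain Integer, the variable is reassigned instead.
n = 1
n += 1
puts n              # 2
```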
Also, you should show your code when you ask questions like this. I misunderstood how you were approaching it for a long time.
It's simply impossible to change self to another object. self is the receiver of the message send. There can be only one.
If that's true, how do bang! methods work?
The bang (!) is simply part of the method name. It has absolutely no special meaning whatsoever. It is a convention among Ruby programmers to name surprising variants of less surprising methods with a bang, but that's just that: a convention.

Do all methods have to return a meaningful value?

Here's a snippet of code from the pickaxe book:
def count_frequency(word_list)
  counts = Hash.new(0)
  for word in word_list
    counts[word] += 1
  end
  counts
end
The counts at the end sets the return value of the method. The value returned is the value of the last calculation.
However, are there not cases where we don't care what the return value of a method is? For example, I have a pair of nested each loops that draw a checkerboard to console. The values of the calculations are fairly meaningless outside the method. I just want a checkerboard drawn.
Is it bad to leave the return value up to circumstance, or should I always be trying to explicitly design methods that return meaningful values?
You don't have to care about the return value if callers never rely on the method returning a particular value. Nothing to worry about.
But for your counts example, returning that value is the whole point of the method. If the method didn't return that value, then it is meaningless, and you definitely need that counts at the end.
There are some cases when the return value is not the main purpose of the method but you still want to return a certain value. One such case is when the method is intended to be used in a jQuery-style method chain like this:
some_object.do_this(args).do_that(args).then_do_this
In such a case, it is important that you return the receiver. This happens in certain libraries or frameworks, but unless you specifically intend it to be used that way, you don't necessarily have to do it that way.
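Returning the receiver for chaining can be sketched like this. The Greeting class and its methods are invented for the example.

```ruby
# Hypothetical class whose mutating methods return self so calls can be chained.
class Greeting
  def initialize
    @parts = []
  end

  def add(word)
    @parts << word
    self                        # without this, add would return @parts
  end

  def shout
    @parts = @parts.map(&:upcase)
    self
  end

  def to_s
    @parts.join(' ')
  end
end

puts Greeting.new.add('hello').add('world').shout.to_s   # HELLO WORLD
```

If add returned the array instead, the chain would fail at shout, because Array has no such method.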
No, the return value is not necessary when the method isn't supposed to return a meaningful value, just like most other programming languages.
In fact, one of the most common methods, puts, returns nil.
#puts "hello"
hello
=> nil
No.
In Ruby, every expression returns a value, even if it is just nil. Not just methods; every line you write. In the case of methods, the value returned is the last value evaluated before it exits. The meaning of that value is up to you. If you document that the method has no return value, then even though it does return a value it is undefined; not part of the API and the caller would be wise not to make use of it.
For example, even nil can have proper meaning if you document it; it is often used to signal that a resource could not be found. However, if a method's sole purpose is to perform a side effect like writing to a file, it will probably return something that has no real meaning; puts returns nil.
Theoretically, if you document clearly that the return value is meaningless, then you could just incidentally return whatever the last expression in the method happens to evaluate to.
Practically, however, nobody reads documentation, so, if your method does return something, then people will come to depend on it. Also, depending on what exactly it is that you are "accidentally" returning, you might leak private internal implementation details of your method or you might even break encapsulation of your object by e.g. returning the value of a private instance variable.
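One way to avoid an accidental return value is to end a side-effect-only method with an explicit nil. The sketch below uses the checkerboard case from the question; the method name, its parameters and the characters used for the squares are assumptions made for the example.

```ruby
require 'stringio'

# Hypothetical method: draws a size x size checkerboard to io.
def draw_checkerboard(size, io = $stdout)
  size.times do |row|
    io.puts(Array.new(size) { |col| (row + col).even? ? '#' : '.' }.join)
  end
  nil   # explicit: otherwise the method would return size (the value of size.times)
end

buffer = StringIO.new
result = draw_checkerboard(2, buffer)

p result               # nil
print buffer.string    # prints "#." and ".#" on two lines
```

Callers then have nothing to depend on except the side effect, which matches what the method is documented to do.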
Take the defined? unary prefix operator, for example. It is specified as returning either a trueish or a falseish value. However, on MRI, it does not just return any trueish value, it actually returns a String describing the kind of expression that is asked about (e.g. 'local-variable', 'method', etc.) And people have become so dependent on this return value that all other Ruby implementations just have to mimic it, even though it is nowhere documented. Now, it turns out that for MRI this information is trivially available, but for JRuby it is not, and keeping this information around incurs a performance penalty.
The E programming language is a purely expression-based language like Ruby or Lisp. Everything is an expression, there are no statements. Everything returns a value. However, unlike those other languages, the implicit return value of a subroutine is not the value of the last expression evaluated inside the subroutine, it is nil. You must explicitly return a value if you want to return something meaningful. That is because the creator of E believes that it is too dangerous to accidentally return something you didn't want. (E is explicitly designed for security, safety, integrity and reliability.)
