Why does `ObjectSpace.each_object(String)` include just about any string? - ruby

After comments by Mike H-R and Stefan to a question of mine, I noticed that ObjectSpace.each_object(String) includes just about any string I can think of:
strings = ObjectSpace.each_object(String)
strings.include?("some random string") # => true
or
strings = ObjectSpace.each_object(String).to_a
strings.include?("some random string") # => true
I thought that strings should include only strings that existed at that point. Why does it include just about any string?
Yet, when I count the length of strings, it returns a finite number:
ObjectSpace.each_object(String).to_a.length # => 15780
This is observed on Ruby 2.1.2p95 (2014-05-08 revision 45877) [x86_64-linux] interpreter and irb.
Does this have anything to do with frozen string literal optimization introduced in Ruby 2.1?

When you write code in IRB, strings are added to the ObjectSpace as each line is typed and evaluated:
strings = ObjectSpace.each_object(String)
strings.include?("some random string") # => true
strings = ObjectSpace.each_object(String).to_a
strings.include?("some other random string") # => false
Inside an .rb file, the string literals are already there, because they are created when the whole file is parsed. Only strings built at run time are missing.
test.rb
strings = ObjectSpace.each_object(String)
strings.include?("some random string") # => true
strings = ObjectSpace.each_object(String).to_a
strings.include?("some other random string") # => true
strings = ObjectSpace.each_object(String).to_a
strings.include?("some other random string " + "dynamically built") # => false

That's because in order to pass "some random string" to the include? method on ObjectSpace's each_object iterator, you have to create the string "some random string" first.
Simply by asking ObjectSpace about the existence of "some random string", you are creating "some random string", so of course it exists in the object space. See what I'm saying? So that explains your first example.
In your second example when you get the array of string objects before referencing "some random string", you would think that you'd get false. As you noted though, that's not the case. I assume this is because you are using a string literal, and Ruby is optimizing your code by creating the string before you actually reference it. I don't know enough about Ruby's internals though to go into specifics on that.
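One way to look at this without the probe finding itself is to skip the very object you are asking about. The following is only a sketch: other_copies is a hypothetical helper, not part of ObjectSpace, and the first count depends on the Ruby version and on whatever else is alive in the process.

```ruby
# Hypothetical helper: counts live String objects equal to `value`,
# skipping `value` itself so the probe does not find itself.
def other_copies(value)
  ObjectSpace.each_object(String).count do |s|
    s == value && !s.equal?(value)
  end
end

# Built at run time, so the parser never saw this exact text as a literal.
probe = %w[dynamically built probe].join(" ")
before = other_copies(probe)

held = probe.dup # a second, distinct object with the same content
after = other_copies(probe)

puts "before: #{before}, after: #{after}"
```

The second count is at least 1 because held is a separate live object, whereas the first count is usually 0 for text that was never written out as a literal.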

Related

Ruby: What does the comment "frozen_string_literal: true" do?

This is the rspec binstub in my project directory.
#!/usr/bin/env ruby
begin
load File.expand_path("../spring", __FILE__)
rescue LoadError
end
# frozen_string_literal: true
#
# This file was generated by Bundler.
#
# The application 'rspec' is installed as part of a gem, and
# this file is here to facilitate running it.
#
require "pathname"
ENV["BUNDLE_GEMFILE"] ||= File.expand_path("../../Gemfile",
Pathname.new(__FILE__).realpath)
require "rubygems"
require "bundler/setup"
load Gem.bin_path("rspec-core", "rspec")
What is this intended to do?
# frozen_string_literal: true
# frozen_string_literal: true is a magic comment, supported for the first time in Ruby 2.3, that tells Ruby that all string literals in the file are implicitly frozen, as if #freeze had been called on each of them. That is, if a string literal is defined in a file with this comment, and you call a method on that string which modifies it, such as <<, you'll get RuntimeError: can't modify frozen String (from Ruby 2.5 on, the error class is FrozenError, a subclass of RuntimeError).
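Like any magic comment, it must appear in the first comment section of the file, before any code (it may follow a shebang line or an encoding comment). In the binstub above it comes after some code, so it has no effect there.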
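To check where the comment takes effect, one can load small generated files and see whether a literal comes out frozen. This is a rough sketch: literal_frozen_in_file? and the $literal_frozen global are made-up names for illustration, and the last result assumes Ruby was not started with --enable=frozen-string-literal.

```ruby
require "tempfile"

# Hypothetical helper: writes `source` to a temporary .rb file, loads it,
# and returns whatever the file stored in $literal_frozen.
def literal_frozen_in_file?(source)
  Tempfile.create(["magic_comment", ".rb"]) do |file|
    file.write(source)
    file.flush
    load file.path
  end
  $literal_frozen
end

body = "$literal_frozen = 'text'.frozen?\n"

at_top        = "# frozen_string_literal: true\n" + body
after_shebang = "#!/usr/bin/env ruby\n# frozen_string_literal: true\n" + body
after_code    = "x = 1\n# frozen_string_literal: true\n" + body

top_result     = literal_frozen_in_file?(at_top)
shebang_result = literal_frozen_in_file?(after_shebang)
code_result    = literal_frozen_in_file?(after_code)

puts top_result     # true
puts shebang_result # true
puts code_result    # false: the comment is ignored once code has appeared
```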
In Ruby 2.3 and later, you can use this magic comment to prepare for frozen string literals possibly becoming the default in a future Ruby version. This was once planned for Ruby 3.0, but the plan was dropped.
When Ruby 2.3 or later is run with the --enable=frozen-string-literal flag, string literals are frozen in all files. You can override the global setting with # frozen_string_literal: false.
If you want a string literal to be mutable regardless of the global or per-file setting, you can prefix it with the unary + operator (being careful with operator precedence) or call .dup on it:
# frozen_string_literal: true
"".frozen?
=> true
(+"").frozen?
=> false
"".dup.frozen?
=> false
You can also freeze a mutable (unfrozen) string with unary -.
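A small sketch of unary minus next to unary plus, using only String.new, String#-@ and String#+@ (all available since Ruby 2.3); the variable names are just for illustration.

```ruby
mutable = String.new("abc") # String.new returns an unfrozen string
frozen  = -mutable          # String#-@ returns a frozen string with the same content

puts mutable.frozen?        # false: the receiver itself is left untouched
puts frozen.frozen?         # true
puts((+frozen).frozen?)     # false: unary + hands back a mutable copy
```

Note that -mutable does not freeze mutable in place; it returns a frozen string, so you need to keep the return value.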
Source: magic_comment defined in ruby/ruby
It improves application performance by not allocating new space for the same string, which also saves garbage-collection time. How? When you freeze a string literal (a string object), you're telling Ruby not to let any of your programs modify it.
Some obvious observations to keep in mind.
1. By freezing string literals, you avoid allocating new memory each time the same literal is evaluated.
Example:
Without magic comment allocates new space for the same string
(Observe the different object IDs printed)
def hello_id
a = 'hello'
a.object_id
end
puts hello_id #=> 70244568358640
puts hello_id #=> 70244568358500
With magic comment, ruby allocates space only once
# frozen_string_literal: true
def hello_id
a = 'hello'
a.object_id
end
puts hello_id #=> 70244568358640
puts hello_id #=> 70244568358640
2. By freezing string literals, your program will raise an exception when trying to modify the string literal.
Example:
Without magic comment, you can modify the string literals.
name = 'Johny'
name << ' Cash'
puts name #=> Johny Cash
With magic comment, an exception will be raised when you modify string literals
# frozen_string_literal: true
name = 'john'
name << ' cash' #=> `<main>': can't modify frozen String (FrozenError)
puts name
There's always more to learn:
https://bugs.ruby-lang.org/issues/8976
https://www.mikeperham.com/2018/02/28/ruby-optimization-with-one-magic-comment/
For Ruby 3.0, Matz (Ruby’s creator) had decided to make all String literals frozen by default.
EDIT 2019: he decided to abandon the idea of making frozen-string-literals default for Ruby 3.0 (source: https://bugs.ruby-lang.org/issues/11473#note-53)
You can use it in Ruby 2.3 and later. Just add this comment at the top of your files.
# frozen_string_literal: true
The above comment at top of a file changes semantics of static string
literals in the file. The static string literals will be frozen and
always returns same object. (The semantics of dynamic string literals
is not changed.)
This way has following benefits:
No ugly f-suffix.
No syntax error on older Ruby.
We need only a line for each file.
Please read this topic for more information.
https://bugs.ruby-lang.org/issues/8976

Check the string with hash key

I am using Ruby 1.9.
I have a hash:
Hash_List={"ruby"=>"fun to learn","the rails"=>"It is a framework"}
I have a string like this:
test_string="I am learning the ruby by myself and also the rails."
I need to check if test_string contains words that match the keys of Hash_List. And if it does, replace the words with the matching hash value.
I used this code to check, but it returns an empty hash:
another_hash=Hash_List.select{|key,value| key.include? test_string}
OK, hold onto your hat:
HASH_LIST = {
"ruby" => "fun to learn",
"the rails" => "It is a framework"
}
test_string = "I am learning the ruby by myself and also the rails."
keys_regex = /\b (?:#{Regexp.union(HASH_LIST.keys).source}) \b/x # => /\b (?:ruby|the\ rails) \b/x
test_string.gsub(keys_regex, HASH_LIST) # => "I am learning the fun to learn by myself and also It is a framework."
Ruby's got some great tricks up its sleeve, one of which is how we can throw a regular expression and a hash at gsub, and it'll search for every match of the regular expression, look up the matching "hits" as keys in the hash, and substitute the values back into the string:
gsub(pattern, hash) → new_str
...If the second argument is a Hash, and the matched text is one of its keys, the corresponding value is the replacement string....
Regexp.union(HASH_LIST.keys) # => /ruby|the\ rails/
Regexp.union(HASH_LIST.keys).source # => "ruby|the\\ rails"
Note that the first returns a regular expression and the second returns a string. This is important when we embed them into another regular expression:
/#{Regexp.union(HASH_LIST.keys)}/ # => /(?-mix:ruby|the\ rails)/
/#{Regexp.union(HASH_LIST.keys).source}/ # => /ruby|the\ rails/
The first can quietly destroy what you think is a simple search, because the (?-mix:...) wrapper embeds its own flags inside the pattern, and they override the flags on the enclosing regular expression.
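Here is a short sketch of how that goes wrong with a case-insensitive search. The variable names and the sample text are made up for the illustration; Regexp#match? needs Ruby 2.4 or later.

```ruby
keys  = ["ruby", "the rails"]
union = Regexp.union(keys)

with_regexp = /\b(?:#{union})\b/i        # embeds "(?-mix:ruby|the\ rails)"
with_source = /\b(?:#{union.source})\b/i # embeds "ruby|the\ rails"

text = "I am learning RUBY"

regexp_hit = with_regexp.match?(text)
source_hit = with_source.match?(text)

puts regexp_hit # false: the embedded (?-mix:...) switches /i off again
puts source_hit # true: the outer /i applies to the whole pattern
```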
The Regexp documentation covers all this well.
This capability is the core to making an extremely high-speed templating routine in Ruby.
You could do that as follows:
Hash_List.each_with_object(test_string.dup) { |(k,v),s| s.sub!(/#{k}/, v) }
#=> "I am learning the fun to learn by myself and also It is a framework."
First, follow naming conventions. Variables are snake_case, and names of classes are CamelCase.
hash = {"ruby" => "fun to learn", "rails" => "It is a framework"}
words = test_string.scan(/\w+/) # => ["I", "am", "learning", ...]
another_hash = hash.select{|key,value| words.include?(key)}
Answering your question: break your test string into words and then check whether the words include a key. Using #scan with /\w+/ rather than #split keeps trailing punctuation, such as the period after "rails", out of the words.
For checking if the string is substring of another string use String#[String] method:
another_hash = hash.select{|key, value| test_string[key]}

How to use a regex to match a symbol like a string literal in a hash object?

Using str as the key in the hash works fine.
string = 'The dog and cat'
replace = {'dog' => 'woof', 'cat' => 'meow'}
replace.default = 'unknown'
string.gsub(/\w+/, replace)
# => "unknown woof unknown meow"
How do I get the same result with a sym as the hash key?
string = 'The dog and cat'
replace = {dog: 'woof', cat: 'meow'}
replace.default = 'unknown'
string.gsub(/\w+/, replace)
# => "unknown unknown unknown unknown" (actual result)
# => "unknown woof unknown meow" using symbols as hash keys? (desired result)
Some attempts I made:
string.gsub(/(\w+)/, replace[:$1])
# => "unknown unknown unknown unknown"
string.split.map(&:to_sym).to_s.gsub(/\w+/, replace)
# => "[:unknown, :unknown, :unknown, :unknown]"
In the code below:
string = 'The dog and cat'
replace = {dog: 'woof', cat: 'meow'}
replace.default = 'unknown'
string.gsub(/\w+/, replace)
You gave the hash replace a default value of 'unknown'. That means that whenever you look up a key that is not present, the hash returns 'unknown'. It works this way because that is how you defined the hash.
Now #gsub passes every match, such as 'The' and 'dog', to the hash as a string, but none of those strings is a key of replace, so, as I said above, the hash returns the default value 'unknown' every time.
Remember: in the hash replace = {dog: 'woof', cat: 'meow'}, the keys are symbols, like :dog and :cat, but #gsub gives you every match as a string.
Thus, to make it work, you need to use a block, as below:
string = 'The dog and cat'
replace = {dog: 'woof', cat: 'meow'}
replace.default = 'unknown'
string.gsub(/\w+/) { |m| replace[m.to_sym] }
# => "unknown woof unknown meow"
If you want to keep all the other words as-is, but just replace what's in the hash, you can do this:
replace = {'dog' => 'woof', 'cat' => 'meow'}
my_string = 'The dog and cat'
my_string.gsub(/\w+/) {|m| (replace.key? m) ? replace[m] : m}
Which will yield:
The woof and meow
Or more compactly (thanks to @Arup for the hint):
my_string.gsub(/\w+/) {|m| replace.fetch(m, m)}
It's not clear why you'd want to replace "unknown" words with unknown; that's not something very useful in normal string processing or template-processing code. As a result, it seems like you're asking an X-Y question, wanting to know how to do "Y" when you should really be doing "X".
Here's some code, and information that could be helpful:
string = 'The dog and cat'
replacement_hash = {dog: 'woof', cat: 'meow'}
First, be very careful using a variable with the same name as a method. Ruby can figure out what you mean usually, but your brain will fail to always make the connection. The brains of people who use, or maintain, your code might not do as well. So, instead of a hash called replace, use something like replacement_hash.
Normally we don't want to define a hash using symbols as the keys if we're going to be doing search and replace actions against a string. Instead we'd want the actual string values; It's more straight-forward that way. That said, this will walk through a hash with symbols as keys, and generate a new hash using the equivalent strings as keys:
replacement_hash2 = Hash[replacement_hash.keys.map(&:to_s).zip(replacement_hash.values)] # => {"dog"=>"woof", "cat"=>"meow"}
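On Ruby 2.5 or later the same conversion can probably be written more directly with Hash#transform_keys. A minimal sketch, where string_keyed is just an illustrative name:

```ruby
replacement_hash = { dog: "woof", cat: "meow" }

# Hash#transform_keys needs Ruby 2.5 or later.
string_keyed = replacement_hash.transform_keys(&:to_s)

replaced = "The dog and cat".gsub(/\b(?:dog|cat)\b/, string_keyed)
puts replaced # The woof and meow
```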
We give gsub a regular expression, and it will walk through the string looking for matches to that pattern. One of its cool features is that we can give it a hash, and for each match found it will look through the associated hash and return the value for matching keys.
Here's how to easily build a pattern that matches the keys in the hash. I'm using a case-insensitive pattern, but YMMV:
key_regex = /\b(?:#{ Regexp.union(replacement_hash.keys.map(&:to_s)).source })\b/i # => /\b(?:dog|cat)\b/i
In its most basic form, here's a gsub that uses a pattern to look up values in a hash:
string.gsub(key_regex, replacement_hash2) # => "The woof and meow"
It's also possible to search the string using a pattern, then pass the "hits" to a block, which then computes the needed replacements:
string.gsub(key_regex) { |w| replacement_hash2[w] } # => "The woof and meow"
or:
string.gsub(key_regex) { |w| replacement_hash[w.to_sym] } # => "The woof and meow"
But wait! There's more!
If you don't want the surgical approach, you can also use a more generic ("shotgun approach") regex pattern and handle both hits and misses when looking in the hash:
string.gsub(/\S+/) { |w| replacement_hash[w.to_sym] || 'unknown' } # => "unknown woof unknown meow"
That turns your original code into a single, simple, line of code. Change the regexp as necessary.
Meditate on this if the contents of the block above don't make sense:
replacement_hash[:dog] # => "woof"
replacement_hash[:foo] # => nil
nil || 'unknown' # => "unknown"
Notice that it's important to convert the matched word into a symbol when looking it up in your replacement_hash hash. That can get sticky, or convoluted, because some strings don't convert to symbols cleanly and result in a double-quoted symbol. You'd have to account for that in your hash definition:
'foo'.to_sym # => :foo
'foo_bar'.to_sym # => :foo_bar
'foo-bar'.to_sym # => :"foo-bar"
'foo bar'.to_sym # => :"foo bar"

Substring syntaxes in Ruby

Python has the following elegant syntax for checking whether one string is a substring of another one:
'ab' in 'abc' # True
Is there an equivalent elegant syntax in Ruby?
I'm aware of the "abc".include? "ab" Ruby syntax, but I'm wondering whether the inverse syntax exists too (where the first parameter is the substring and the second is the string).
There is no such method in the Ruby standard library, but Rails' ActiveSupport provides the #in? method:
1.9.3-p484 :004 > "ab".in? "abc"
=> true
Here is the source code: https://github.com/rails/rails/blob/e20dd73df42d63b206d221e2258cc6dc7b1e6068/activesupport/lib/active_support/core_ext/object/inclusion.rb
Define "elegant".
This does a sub-string search and returns the "hit" if found:
'abc'['ab'] # => "ab"
Using !! converts the value returned to a true/false, so "ab" becomes true:
!!'abc'['ab'] # => true
Knowing that, it's trivial to add it in if you want something closer:
class String
def in?(other)
!!other[self]
end
end
'ab'.in?('abc') # => true
'ab'.in? 'abc' # => true
Or, use require 'active_support/core_ext/object/inclusion' to cherry-pick the Active Support definition that extends all objects to allow in?. See http://edgeguides.rubyonrails.org/active_support_core_extensions.html#in-questionmark. The upside/downside is that it modifies all objects.
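If patching String (or every object) globally feels too heavy, a refinement can limit the new method to the code that asks for it. This is only a sketch: SubstringCheck and Demo are hypothetical names, and it assumes Ruby 2.1 or later, where Module#using is available inside a module body.

```ruby
module SubstringCheck
  refine String do
    # True when the receiver occurs somewhere inside `other`.
    def in?(other)
      other.include?(self)
    end
  end
end

module Demo
  using SubstringCheck # the refinement is active only from here to the end of this module body

  FOUND   = "ab".in?("abc")
  MISSING = "xy".in?("abc")
end

puts Demo::FOUND   # true
puts Demo::MISSING # false
```

Outside of code that calls using SubstringCheck, String is left as it was.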

Ruby: String no longer mixes in Enumerable in 1.9

So how can I still be able to write beautiful code such as:
'im a string meing!'.pop
Note: str.chop isn't sufficient answer
It is not obvious what an enumerable string actually enumerates. Is a string a sequence of ...
lines,
characters,
codepoints or
bytes?
The answer is: all of those, any of those, either of those or neither of those, depending on the context. Therefore, you have to tell Ruby which of those you actually want.
There are several methods in the String class which return enumerators for any of the above. If you want the pre-1.9 behavior, your code sample would be
'im a string meing!'.bytes.to_a.pop
This looks kind of ugly, but there is a reason for it: a string is a sequence. You are treating it as a stack. A stack is not a sequence, in fact it pretty much is the opposite of a sequence.
That's not beautiful :)
Also #pop is not part of Enumerable, it's part of Array.
The reason why String is not enumerable is that there are no 'natural' units to enumerate: should it be on a character basis or a line basis? Because of this, String does not have an #each.
String instead provides the #each_char and #each_byte and #each_line methods for iteration in the way that you choose.
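To make the difference concrete, here is a small sketch that walks the same string in each of those ways. The sample text is made up, and it is written with \u escapes so that the string is UTF-8 regardless of the source file's encoding.

```ruby
text = "h\u00E9llo\nw\u00F6rld" # two lines, with one accented letter in each

lines      = text.each_line.to_a
first_two  = text.each_char.first(2)
codepoints = text.codepoints.first(2)
byte_count = text.bytes.length
char_count = text.chars.length

puts lines.length  # 2
p codepoints       # [104, 233]
puts byte_count    # 13, because each accented letter takes two bytes in UTF-8
puts char_count    # 11
```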
Since you don't like str[str.length], how about
'im a string meing!'[-1] # returns last character as a character value
or
'im a string meing!'[-1,1] # returns last character as a string
or, if you need it modified in place as well, while keeping it an easy one-liner:
class String
def pop
last = self[-1,1]
self.chop!
last
end
end
#!/usr/bin/ruby1.8
s = "I'm a string meing!"
s, last_char = s.rpartition(/./)
p [s, last_char] # => ["I'm a string meing", "!"]
String#rpartition is new in 1.9, but it has been back-ported to 1.8.7. It searches a string for a regular expression, starting at the end and working backwards. It returns the part of the string before the match, the match, and the part of the string after the match (which we discard here).
String#slice! and String#insert are going to get you much closer to what you want without converting your strings to arrays.
For example, to simulate Array#pop you can do:
text = '¡Exclamation!'
mark = text.slice! -1
mark == '!' #=> true
text #=> "¡Exclamation"
Likewise, for Array#shift:
text = "¡Exclamation!"
inverted_mark = text.slice! 0
inverted_mark == '¡' #=> true
text #=> "Exclamation!"
Naturally, to do an Array#push you just use one of the concatenation methods:
text = 'Hello'
text << '!' #=> "Hello!"
text.concat '!' #=> "Hello!!"
To simulate Array#unshift, you use String#insert instead. It's a lot like the inverse of slice, really:
text = 'World!'
text.insert 0, 'Hello, ' #=> "Hello, World!"
You can also grab chunks from the middle of a string in multiple ways with slice.
First you can pass a start position and length:
text = 'Something!'
thing = text.slice 4, 5
And you can also pass a Range object to grab absolute positions:
text = 'This is only a test.'
only = text.slice (8..11)
In Ruby 1.9 using String#slice like this is identical to String#[], but if you use the bang method String#slice! it will actually remove the substring you specify.
text = 'This is only a test.'
only = text.slice! (8..12)
text == 'This is a test.' #=> true
Here's a slightly more complex example where we reimplement a simple version of String#gsub! to do a search and replace:
text = 'This is only a test.'
search = 'only'
replace = 'not'
index = text =~ /#{search}/
text.slice! index, search.length
text.insert index, replace
text == 'This is not a test.' #=> true
Of course, 99.999% of the time you're going to want to use the aforementioned String#gsub!, which will do the exact same thing:
text = 'This is only a test.'
text.gsub! 'only', 'not'
text == 'This is not a test.' #=> true
references:
Ruby String Documentation
