Why is String#index returning nil here? - ruby

On the last two lines:
$ ruby -v
ruby 1.9.1p378 (2010-01-10 revision 26273) [i386-darwin10]
$ irb
irb(main):001:0> def t(str)
irb(main):002:1> str.index str
irb(main):003:1> end
=> nil
irb(main):004:0> t 'abc'
=> 0
irb(main):005:0> t "\x01\x11\xfe"
=> nil
irb(main):006:0> t "\x01\x11\xfe".force_encoding(Encoding::UTF_8)
=> nil
Why does str.index str return nil?

"\x01\x11\xfe" is not valid UTF-8.
If you call t "\x01\x11\xfe".force_encoding(Encoding::BINARY), you'll get the expected 0.
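The following is a minimal sketch of the encoding check. It assumes Ruby 2.0 or later, where source files default to UTF-8. Ruby 1.8 strings carried no encoding at all, which is why 1.8.7 simply compares bytes and returns 0. The variable names are illustrative only.

```ruby
# The literal is tagged with the source encoding (UTF-8 by default since
# Ruby 2.0), but the byte 0xFE can never appear in valid UTF-8.
s = "\x01\x11\xfe"
puts s.encoding          # UTF-8
puts s.valid_encoding?   # false

# Reinterpret a copy as raw bytes (ASCII-8BIT, aliased as BINARY).
# Every byte sequence is valid in that encoding, so index compares bytes.
bin = s.dup.force_encoding(Encoding::BINARY)
puts bin.valid_encoding? # true
puts bin.index(bin)      # 0
```

Working on a copy (dup) leaves the original string's encoding tag untouched.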

The behavior is different on my older Ruby:
$ ruby -v ; irb
ruby 1.8.7 (2010-01-10 patchlevel 249) [x86_64-linux]
irb(main):001:0> def t(str)
irb(main):002:1> str.index str
irb(main):003:1> end
=> nil
irb(main):004:0> t 'abc'
=> 0
irb(main):005:0> t "\x01\x11\xfe"
=> 0

I suspect it has something to do with the difference between single quotes and double quotes in Ruby string literals:
ruby-1.9.1-p378 > def t(str) ; str.index(str) ; end
=> nil
ruby-1.9.1-p378 > t 'abc'
=> 0
ruby-1.9.1-p378 > t "\x01\x11\xfe"
=> nil
ruby-1.9.1-p378 > t '\x01\x11\xfe'
=> 0
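The short answer is that single quotes do minimal text processing, while double quotes allow interpolation, character escaping, and a few other things. So '\x01\x11\xfe' is a plain 12-character string of backslashes, letters and digits, whereas "\x01\x11\xfe" is three raw bytes.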
Some examples:
#interpolation
ruby-1.9.1-p378 > x = 5 ; 'number: #{x}'
=> "number: \#{x}"
ruby-1.9.1-p378 > x = 5 ; "number: #{x}"
=> "number: 5"
#character escaping
ruby-1.9.1-p378 > puts 'tab\tseparated'
tab\tseparated
=> nil
ruby-1.9.1-p378 > puts "tab\tseparated"
tab separated
=> nil
#hex characters
ruby-1.9.1-p378 > puts '\x01\x11\xfe'
\x01\x11\xfe
=> nil
ruby-1.9.1-p378 > puts "\x01\x11\xfe"
�
=> nil
I'm sure someone can explain the details better; this is just what I've observed in my own Ruby code.
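A small sketch that makes the single- vs double-quote difference measurable instead of relying on what the terminal prints (the variable names are made up for the example):

```ruby
x = 5

# Single quotes: only \\ and \' are escape sequences; everything else is literal.
single = '\x01\x11\xfe'
# Double quotes: each \xNN escape becomes one raw byte (0x01, 0x11, 0xFE).
double = "\x01\x11\xfe"

puts single.length   # 12 (backslash, "x", two hex digits, three times)
puts double.bytesize # 3

puts 'tab\tseparated' == "tab\\tseparated" # true: backslash and "t" are kept
puts 'number: #{x}' == "number: \#{x}"     # true: no interpolation
puts "number: #{x}"                        # number: 5
```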

Related

Why a dangerous method doesn't work with a character element of String in Ruby?

When I apply the upcase! method I get:
a="hello"
a.upcase!
a # Shows "HELLO"
But in this other case:
b="hello"
b[0].upcase!
b[0] # Shows h
b # Shows hello
I don't understand why the upcase! applied to b[0] doesn't have any effect.
b[0] returns a new String every time. Check out the object id:
b = 'hello'
# => "hello"
b[0].object_id
# => 1640520
b[0].object_id
# => 25290780
b[0].object_id
# => 24940620
When you select an individual character of a string, you are not referencing that character in place; you are calling an accessor/mutator method ([] or []=) that builds the result:
2.0.0-p643 :001 > hello = "ruby"
=> "ruby"
2.0.0-p643 :002 > hello[0] = "R"
=> "R"
2.0.0-p643 :003 > hello
=> "Ruby"
When you call a dangerous (bang) method on b[0], the accessor first returns a new String containing that character, and upcase! then mutates that new String in place. Because the new String has no connection to the original string, the original is not updated.
2.0.0-p643 :004 > hello = "ruby"
=> "ruby"
2.0.0-p643 :005 > hello[0].upcase!
=> "R"
2.0.0-p643 :006 > hello
=> "ruby"
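A short sketch of both behaviours, assuming you want to change the first character of the original string. String.new is used only so the example string is guaranteed to be mutable regardless of any frozen-literal settings.

```ruby
b = String.new("hello") # an explicitly mutable String

# Each b[0] call builds a brand-new one-character String, so upcase!
# mutates that temporary copy and b itself is untouched.
b[0].upcase!
puts b                 # hello
puts b[0].equal?(b[0]) # false: two different objects

# To change the character inside b, write back through the mutator []=.
b[0] = b[0].upcase
puts b                 # Hello
```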

Print memory address for Ruby array

irb> class A; end
=> nil
irb> a=A.new
=> #<A:0x3094638>
irb> a.inspect
=> "#<A:0x3094638>"
irb> b=[]
=> []
irb> b.inspect
=> "[]"
How do I get the memory address of an array object?
Use the method Object#object_id.
Returns an integer identifier for obj. The same number will be returned on all calls to id for a given object, and no two active objects will share an id. #object_id is a different concept from the :name notation, which returns the symbol id of name.
Example:
Arup-iMac:arup_ruby $ irb
2.1.2 :001 > s = "I am a string"
=> "I am a string"
2.1.2 :002 > obj_id = s.object_id
=> 2156122060
2.1.2 :003 > ObjectSpace._id2ref obj_id
=> "I am a string"
2.1.2 :004 >
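A hedged sketch for newer Rubies: in MRI, object_id has not been derived from the memory address since roughly Ruby 2.7 (objects can be moved by GC compaction), so it is an identifier rather than an address. The hex value printed by the default Kernel#to_s does come from the object's address in MRI; Array overrides to_s, so the Kernel version is bound to the array explicitly. This relies on MRI implementation details and is not portable.

```ruby
b = []

# A stable identifier for the lifetime of the object, not an address.
puts b.object_id

# Kernel#to_s formats the class name and the object's address in MRI.
addr_string = Kernel.instance_method(:to_s).bind(b).call
puts addr_string # prints something like #<Array:0x...>
```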

How can I escape a Ruby symbol with quotes only if needed?

IRB and the Rails console both have a nice way of outputting symbols that quote-escapes them only when necessary. Some examples:
1.9.3p194 :001 > "#test".to_sym
=> :#test
1.9.3p194 :002 > "#Test".to_sym
=> :#Test
1.9.3p194 :003 > "#123".to_sym
=> :"#123"
1.9.3p194 :004 > "##_test".to_sym
=> :##_test
1.9.3p194 :005 > "test?".to_sym
=> :test?
1.9.3p194 :006 > "test!".to_sym
=> :test!
1.9.3p194 :007 > "_test!".to_sym
=> :_test!
1.9.3p194 :008 > "_test?".to_sym
=> :_test?
1.9.3p194 :009 > "A!".to_sym
=> :"A!"
1.9.3p194 :010 > "#a!".to_sym
=> :"#a!"
How would you do this yourself, so that you could do:
puts "This is valid code: #{escape_symbol(some_symbol)}"
The easiest and best way to do this is via Symbol's inspect method:
1.9.3p194 :013 > puts "This is valid code: #{"#a!".to_sym.inspect}"
This is valid code: :"#a!"
=> nil
1.9.3p194 :014 > puts "This is valid code: #{"a!".to_sym.inspect}"
This is valid code: :a!
If you're curious, the C function sym_inspect(VALUE sym) in string.c in Ruby 1.9.3 is what does this.
So, even though you don't need another method to call inspect, this would be the simplest implementation:
def escape_symbol(sym)
  sym.inspect
end
Here's my attempt at implementing it with a few regexes, although I'd suggest using inspect instead if you can (note that Symbol has no gsub, so it is converted with to_s first):
def escape_symbol(sym)
  sym =~ /^[#a-zA-Z_]#?[a-zA-Z_0-9]*$/ || sym =~ /^[a-z_][a-zA-Z_0-9]*[?!]?$/ ? ":#{sym}" : ":\"#{sym.to_s.gsub(/"/, '\\"')}\""
end
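A quick way to check the inspect-based version is a round trip: if inspect emits a valid Ruby literal, evaluating it must give back the same symbol. The symbol list below is arbitrary, and eval is used only to test that property, not as a recommended pattern.

```ruby
# Wrapper from the answer: Symbol#inspect already quotes only when needed.
def escape_symbol(sym)
  sym.inspect
end

symbols = [:a!, :test?, :"#a!", :"foo bar"]
symbols.each { |s| puts "This is valid code: #{escape_symbol(s)}" }

# Evaluating each emitted literal should reproduce the original symbol.
puts symbols.all? { |s| eval(escape_symbol(s)) == s } # true
```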

Ruby Symbol#to_proc leaks references in 1.9.2-p180?

OK, this is my second attempt at debugging the memory issues in my Sinatra app. I believe I have narrowed it down to simple sample code this time.
It seems that when I map over an array with .map(&:some_method), the items in that array are never garbage collected. Running the equivalent .map{|x| x.some_method} works fine.
Demonstration: Given a simple sample class:
class C
  def foo
    "foo"
  end
end
If I run the following in IRB, it gets collected normally:
ruby-1.9.2-p180 :001 > a = 10.times.map{C.new}
=> [...]
ruby-1.9.2-p180 :002 > b = a.map{|x| x.foo}
=> ["foo", "foo", "foo", "foo", "foo", "foo", "foo", "foo", "foo", "foo"]
ruby-1.9.2-p180 :003 > ObjectSpace.each_object(C){}
=> 10
ruby-1.9.2-p180 :004 > a = nil
=> nil
ruby-1.9.2-p180 :005 > b = nil
=> nil
ruby-1.9.2-p180 :006 > GC.start
=> nil
ruby-1.9.2-p180 :007 > ObjectSpace.each_object(C){}
=> 0
So no references to C exist anymore. Good. But if I replace map{|x| x.foo} with map(&:foo) (which is advertised as equivalent), the objects are not collected:
ruby-1.9.2-p180 :001 > a = 10.times.map{C.new}
=> [...]
ruby-1.9.2-p180 :002 > b = a.map(&:foo)
=> ["foo", "foo", "foo", "foo", "foo", "foo", "foo", "foo", "foo", "foo"]
ruby-1.9.2-p180 :003 > ObjectSpace.each_object(C){}
=> 10
ruby-1.9.2-p180 :004 > a = nil
=> nil
ruby-1.9.2-p180 :005 > b = nil
=> nil
ruby-1.9.2-p180 :006 > GC.start
=> nil
ruby-1.9.2-p180 :007 > ObjectSpace.each_object(C){}
=> 10
ruby-1.9.2-p180 :008 >
Is this a Ruby bug? I'll try more versions of Ruby to be sure, but this seems like an obvious issue. Does anyone know what I'm doing wrong?
Edit:
I've tried this in 1.8.7-p352 and it doesn't have the issue. 1.9.3-preview1 does however still have the issue. Is a bug report in order or am I doing something wrong?
Since a.map(&:foo) should be exactly equivalent to a.map{|x| x.foo}, it looks like you have really hit a bug in Ruby here. It can't hurt to file a bug report at http://redmine.ruby-lang.org/; the worst that can happen is that it is ignored. You can reduce the chances of that by providing a patch for the issue.
EDIT: I fired up IRB and tried your code. I can reproduce the issue you describe on ruby 1.9.2p290 (2011-07-09 revision 32553) [x86_64-linux]. However, explicitly calling to_proc on the symbol does not suffer from the same problem:
irb(main):001:0> class C; def foo; end; end
=> nil
irb(main):002:0> a = 10.times.map { C.new }
=> [...]
irb(main):004:0> b = a.map(&:foo.to_proc)
=> [nil, nil, nil, nil, nil, nil, nil, nil, nil, nil]
irb(main):005:0> ObjectSpace.each_object(C){}
=> 10
irb(main):006:0> a = b = nil
=> nil
irb(main):007:0> GC.start
=> nil
irb(main):008:0> ObjectSpace.each_object(C){}
=> 0
It seems we are facing an issue with the implicit Symbol -> Proc conversion here. Maybe I will try to dive a bit into the Ruby source later. If so, I will keep you updated.
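To see what the implicit conversion does, here is a small sketch: &obj calls obj.to_proc, and the Proc that Symbol#to_proc returns calls the named method on its argument. The Shout class is hypothetical and exists only to show that any object with a to_proc method works with &.

```ruby
# Hypothetical class, used only to show that `&obj` calls obj.to_proc.
class Shout
  def to_proc
    ->(s) { s.upcase + "!" }
  end
end

words = %w[ruby gc]

# &:upcase is shorthand for &(:upcase.to_proc); all three produce the same result.
p words.map(&:upcase)         # ["RUBY", "GC"]
p words.map(&:upcase.to_proc) # ["RUBY", "GC"]
p words.map { |w| w.upcase }  # ["RUBY", "GC"]

# Any object that responds to to_proc can be passed with &.
p words.map(&Shout.new)       # ["RUBY!", "GC!"]
```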
EDIT 2:
Simple workaround for the problem:
class Symbol
  def to_proc
    lambda { |x| x.send(self) }
  end
end
class C
  def foo; "foo"; end
end
a = 10.times.map { C.new }
b = a.map(&:foo)
p b
a = b = nil
GC.start
p ObjectSpace.each_object(C) {}
prints 0.

Are strings mutable in Ruby?

Are Strings mutable in Ruby? According to the documentation doing
str = "hello"
str = str + " world"
creates a new string object with the value "hello world" but when we do
str = "hello"
str << " world"
It does not mention that it creates a new object, so does it mutate the str object, which will now have the value "hello world"?
Yes, << mutates the same object, and + creates a new one. Demonstration:
irb(main):011:0> str = "hello"
=> "hello"
irb(main):012:0> str.object_id
=> 22269036
irb(main):013:0> str << " world"
=> "hello world"
irb(main):014:0> str.object_id
=> 22269036
irb(main):015:0> str = str + " world"
=> "hello world world"
irb(main):016:0> str.object_id
=> 21462360
irb(main):017:0>
Just to complement: one implication of this mutability is shown below:
ruby-1.9.2-p0 :001 > str = "foo"
=> "foo"
ruby-1.9.2-p0 :002 > ref = str
=> "foo"
ruby-1.9.2-p0 :003 > str = str + "bar"
=> "foobar"
ruby-1.9.2-p0 :004 > str
=> "foobar"
ruby-1.9.2-p0 :005 > ref
=> "foo"
and
ruby-1.9.2-p0 :001 > str = "foo"
=> "foo"
ruby-1.9.2-p0 :002 > ref = str
=> "foo"
ruby-1.9.2-p0 :003 > str << "bar"
=> "foobar"
ruby-1.9.2-p0 :004 > str
=> "foobar"
ruby-1.9.2-p0 :005 > ref
=> "foobar"
So choose the string methods you use carefully to avoid unexpected behavior: + returns a new object, while << modifies the existing one in place.
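If you need to append in place but keep the old value around, one option is to take an explicit copy first. A minimal sketch (the names are illustrative):

```ruby
str = String.new("foo")
ref = str      # ref and str name the same object
copy = str.dup # copy is an independent String

str << "bar"   # in-place append, visible through every alias
puts ref              # foobar
puts copy             # foo
puts ref.equal?(str)  # true
puts copy.equal?(str) # false
```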
Also, if you want something immutable and unique throughout your application you should go with symbols:
ruby-1.9.2-p0 :001 > "foo" == "foo"
=> true
ruby-1.9.2-p0 :002 > "foo".object_id == "foo".object_id
=> false
ruby-1.9.2-p0 :003 > :foo == :foo
=> true
ruby-1.9.2-p0 :004 > :foo.object_id == :foo.object_id
=> true
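The answers above cover the main point; this is just a note for future readers.
In many languages, string literals are immutable, just like numbers and symbols. In Ruby, strings are mutable by default, and Ruby 3.0 did not change that. A file can opt in to frozen literals with the magic comment "# frozen_string_literal: true" (supported since Ruby 2.3), and starting with Ruby 3.4, mutating a literal from a file without that comment emits a deprecation warning (when deprecation warnings are enabled) as a step toward freezing literals by default in a future version.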
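A minimal sketch of working with frozen strings explicitly, without relying on any version default. It uses freeze directly because the magic comment only takes effect at the very top of a file. FrozenError is available from Ruby 2.5 on, and unary + on a String from Ruby 2.3 on.

```ruby
# Freeze explicitly; a file could instead start with the magic comment
# "# frozen_string_literal: true" to freeze all of its string literals.
frozen = "hello".freeze
puts frozen.frozen? # true

begin
  frozen << " world"
rescue FrozenError => e
  puts "cannot modify: #{e.class}"
end

# +str returns the receiver if it is unfrozen, otherwise an unfrozen copy.
thawed = +frozen
thawed << " world"
puts thawed # hello world
puts frozen # hello
```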

Resources