Why is it that when I assign a constant to a variable and then update the variable, the constant is updated too? Is this expected behavior or a bug?
ruby-1.9.3-p0 :001 > A = { :test => '123' }
=> {:test=>"123"}
ruby-1.9.3-p0 :002 > b = A
=> {:test=>"123"}
ruby-1.9.3-p0 :003 > b[:test] = '456'
=> "456"
ruby-1.9.3-p0 :004 > A
=> {:test=>"456"}
This is expected behavior, but the reason isn't always obvious. The distinction involved is a very important one in languages like Ruby. There are three things in play here:
The constant A
The variable b
The hash { :test => '123' }
The first two are both kinds of variables. The third is an object. The difference between variables and objects is crucial. Variables just refer to objects. When you assign the same object to two variables, they both refer to the same object. There was only ever one object created, so when you change it, both variables refer to the changed object.
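To make the "one object, two names" point concrete, here is a small sketch. The names SETTINGS and alias_name are made up for illustration; equal? is the standard identity check.

```ruby
# SETTINGS and alias_name are illustrative names, not from the question.
SETTINGS = { :test => '123' }

alias_name = SETTINGS            # no copy is made; both names refer to one hash
p alias_name.equal?(SETTINGS)    # => true (same object, not just equal contents)

alias_name[:test] = '456'        # mutates the single shared hash
p SETTINGS[:test]                # => "456"
```

The constant only fixes which object the name SETTINGS refers to. It says nothing about whether that object can be modified.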
This happens because assignment does not copy anything. In your example A and b are references to the same object. To avoid that use:
b = A.dup
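This will initialize b with a copy of A instead of pointing it at the same hash. Note that dup makes a shallow copy: the hash itself is new, but any objects stored inside it are still shared with the original.
For more info, read up on the difference between shallow and deep copies.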
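As a rough illustration of the shallow versus deep distinction, the sketch below uses a nested hash. ORIGINAL and deep_copy are hypothetical names, and the Marshal round trip is just one common deep-copy idiom that works only for objects Marshal can dump.

```ruby
# A nested hash used only for illustration; the names are hypothetical.
ORIGINAL = { :user => { :name => 'alice' } }

# One common deep-copy idiom: serialize and deserialize with Marshal.
# It works only for objects Marshal can dump (no procs, IO objects, etc.).
def deep_copy(obj)
  Marshal.load(Marshal.dump(obj))
end

shallow = ORIGINAL.dup          # new outer hash, but the inner hash is shared
deep    = deep_copy(ORIGINAL)   # new outer hash and new inner hash

shallow[:user][:name] = 'bob'   # reaches through to the shared inner hash
p ORIGINAL[:user][:name]        # => "bob"
p deep[:user][:name]            # => "alice"
```

For a flat hash like the one in the question, dup is enough, because reassigning a key in the copy never touches the original.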
Related
I'm in the process of building my own web server and want to add a logger. After the server gets a message, I want both to log it and to send it to the parsers. I need to make some modifications to the message for logging (e.g. remove the password), but when I change the second variable, the first is changed too!
msg = log_msg = JSON.parse(something)
log_msg[:password] = '[FILTERED]'
raise msg.inspect # => {..., :password => '[FILTERED]'}
How can I avoid this behavior of my code?
UPDATE: This seems even stranger given the following irb session:
2.2.1 :001 > a = b = 1
=> 1
2.2.1 :002 > b = 2
=> 2
2.2.1 :003 > b
=> 2
2.2.1 :004 > a
=> 1
After the assignment, msg and log_msg refer to the same object. If this is not what you expected, try this:
log_msg = JSON.parse(something)
msg = log_msg.dup
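Here is a slightly fuller sketch of the same idea. RAW is a made-up payload standing in for `something`, and filtered_for_log is a hypothetical helper. One thing to watch: JSON.parse returns string keys unless you pass symbolize_names, so :password would not match a parsed "password" key by default.

```ruby
require 'json'

# Hypothetical payload standing in for `something` from the question.
RAW = '{"user":"alice","password":"secret"}'

# Returns a filtered copy for logging and leaves the parsed message untouched.
def filtered_for_log(msg)
  msg.merge(:password => '[FILTERED]')   # merge builds a new hash
end

# symbolize_names makes JSON.parse return symbol keys such as :password.
msg     = JSON.parse(RAW, :symbolize_names => true)
log_msg = filtered_for_log(msg)

p log_msg[:password]   # => "[FILTERED]"
p msg[:password]       # => "secret"
```

Like dup, merge copies only the top level, so nested structures inside the message would still be shared.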
Note that the other example behaves differently because b = 2 assigns a new object to b rather than modifying the object that a and b shared. Fixnum is also special. From the manual:
Fixnum objects have immediate value. This means that when they are assigned or passed as parameters, the actual object is passed, rather than a reference to that object.
This question is tightly linked to the following discussions:
Object assignment in ruby
Does ruby pass by reference or by value
Ruby parameters by reference or by value
Please read them carefully to understand what's going on (they also address your code snippet that assigns integer values).
To assign by value you could use the clone or dup methods. Check the value of object_id to see whether you're working on the same object or not.
a = {} # => {}
b = a # => {}
b.object_id # => 114493940
a.object_id # => 114493940
b = a.clone # => {}
b.object_id # => 115158164
a.object_id # => 114493940
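The two methods are not quite interchangeable. A small sketch of one difference I'm sure of: clone keeps the frozen state of the original, dup does not. The helper same_object? is a made-up name wrapping the object_id comparison shown above.

```ruby
# Returns true when both references point at the very same object.
def same_object?(x, y)
  x.object_id == y.object_id    # equivalent to x.equal?(y)
end

frozen_hash = { :k => 'v' }.freeze

p same_object?(frozen_hash, frozen_hash)       # => true
p same_object?(frozen_hash, frozen_hash.dup)   # => false
p frozen_hash.clone.frozen?                    # => true  (clone keeps the frozen state)
p frozen_hash.dup.frozen?                      # => false (dup does not)
```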
While experimenting in irb with strings, I noticed that when a variable referencing a String is used as a key in a Hash, a new copy of the String is created rather than a reference to the original object. This isn't the case with an Array:
1.9.3-p448 :051 > a = 'str1'
=> "str1"
1.9.3-p448 :052 > b = 'str2'
=> "str2"
1.9.3-p448 :053 > arr = [a,b]
=> ["str1", "str2"]
1.9.3-p448 :054 > arr[0].object_id == a.object_id
=> true
1.9.3-p448 :055 > hash = { a => b }
=> {"str1"=>"str2"}
1.9.3-p448 :056 > hash.keys[0].object_id == a.object_id
=> false
I understand if I just stuck to symbols I wouldn't be asking this question.
What is the purpose of making a copy of the String? I understand that a string comparison would still work, but surely an object_id comparison would be quicker?
From the Hash#[]= documentation:
key should not have its value changed while it is in use as a key (an unfrozen String passed as a key will be duplicated and frozen).
Since strings are mutable by default in Ruby, you could in theory change one after using it as a key in your hash. If that were allowed, the hash would become inconsistent: the entry is filed under the old hash value, so lookups would no longer find the key.
Since strings are ubiquitous and are often passed around by reference, Ruby duplicates and freezes them to protect its hashes from unexpected bugs that are very hard to detect.
Most of the usual kinds of keys are immutable: numbers, symbols, dates. Strings however are mutable, and as Uri Agassi writes, Ruby protects the hash from bugs. It does not do so for arrays used as keys, perhaps for performance reasons (possibly large arrays) or perhaps arrays are not commonly used as keys. Hashes normally compare by the result of the hash method which every object has. If you want it to compare by object_id then you can switch it on: hash.compare_by_identity.
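The behaviors described in these answers can be checked with a short sketch. All variable names are illustrative. I deliberately leave out the lookup with a mutated array key before rehash, since the result there depends on internal hash values.

```ruby
key  = 'str1'.dup            # .dup guarantees an unfrozen, mutable string
hash = {}
hash[key] = 'value'          # Hash#[]= stores a frozen copy of the string

stored = hash.keys[0]
p stored.equal?(key)         # => false (the hash holds its own copy)
p stored.frozen?             # => true

key << '-changed'            # mutating the original does not disturb the hash
p hash['str1']               # => "value"

# Arrays are not copied, so mutating one that is in use as a key can
# make the entry hard to find until the hash is rehashed.
arr_key  = [1, 2]
by_array = { arr_key => 'value' }
arr_key << 3
by_array.rehash              # re-index entries from the current key contents
p by_array[[1, 2, 3]]        # => "value"

# compare_by_identity switches lookups to object identity.
by_id = {}.compare_by_identity
by_id[key] = 1
p by_id[key]                 # => 1
p by_id[key.dup]             # => nil (equal contents, different object)
```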
I have a simple ActiveRecord model called Student with 100 records in the table. I do the following in a rails console session:
ObjectSpace.each_object(ActiveRecord::Base).count
# => 0
x = Student.all
ObjectSpace.each_object(ActiveRecord::Base).count
# => 100
x = nil
GC.start
ObjectSpace.each_object(ActiveRecord::Base).count
# => 0 # Good!
Now I do the following:
ObjectSpace.each_object(ActiveRecord::Base).count
# => 0
x = Student.all.group_by(&:last_name)
ObjectSpace.each_object(ActiveRecord::Base).count
# => 100
x = nil
GC.start
ObjectSpace.each_object(ActiveRecord::Base).count
# => 100 # Bad!
Can anyone explain why this happens and whether there is a smart way to solve this without knowing the underlying hash structure? I know I can do this:
x.keys.each{|k| x[k]=nil}
x = nil
GC.start
and it will remove all Student objects from memory correctly, but I'm wondering if there is a general solution (my real-life problem is widespread and involves more intricate data structures than the hash shown above).
I'm using Ruby 1.9.3-p0 and Rails 3.1.0.
UPDATE (SOLVED)
Per Oscar Del Ben's explanation below, a few ActiveRecord::Relation objects are created in the problematic code snippet. (They are actually created in both snippets, but for some reason they "misbehave" only in the second one. Can someone shed light on why?) These maintain references to the ActiveRecord objects via an instance variable called @records. This instance variable can be set to nil through the reset method on ActiveRecord::Relation. You have to make sure to call it on all the relation objects:
ObjectSpace.each_object(ActiveRecord::Base).count
# => 100
ObjectSpace.each_object(ActiveRecord::Relation).each(&:reset)
GC.start
ObjectSpace.each_object(ActiveRecord::Base).count
# => 0
Note: You can also use Mass.detach (using the ruby-mass gem Oscar Del Ben referenced), though it will be much slower than the code above. Note that the code above does not remove a few ActiveRecord::Relation objects from memory. These seem to be pretty insignificant though. You can try doing:
Mass.index(ActiveRecord::Relation)["ActiveRecord::Relation"].each{|x| Mass.detach Mass[x]}
GC.start
And this would remove some of the ActiveRecord::Relation objects, but not all of them (not sure why, and those that are left have no Mass.references. Weird).
My first thought was that Ruby's GC won't free immutable objects (like symbols!), and that the keys returned by group_by are immutable strings and so won't be garbage collected. That turned out not to be the cause; see the updates below.
UPDATE:
It seems like the problem is not with Rails itself. I tried using group_by alone, and sometimes the objects would not get garbage collected:
oscardelben~/% irb
irb(main):001:0> class Foo
irb(main):002:1> end
=> nil
irb(main):003:0> {"1" => Foo.new, "2" => Foo.new}
=> {"1"=>#<Foo:0x007f9efd8072a0>, "2"=>#<Foo:0x007f9efd807250>}
irb(main):004:0> ObjectSpace.each_object(Foo).count
=> 2
irb(main):005:0> GC.start
=> nil
irb(main):006:0> ObjectSpace.each_object(Foo).count
=> 0
irb(main):007:0> {"1" => Foo.new, "2" => Foo.new}.group_by
=> #<Enumerator: {"1"=>#<Foo:0x007f9efb83d0c8>, "2"=>#<Foo:0x007f9efb83d078>}:group_by>
irb(main):008:0> GC.start
=> nil
irb(main):009:0> ObjectSpace.each_object(Foo).count
=> 2 # Not garbage collected
irb(main):010:0> GC.start
=> nil
irb(main):011:0> ObjectSpace.each_object(Foo).count
=> 0 # Garbage collected
I've dug through the GC internals (which are surprisingly easy to understand), and this seems like a scope issue. Ruby walks through all the objects reachable from the current scope and marks the ones which it thinks are still being used; after that it goes through all the objects in the heap and frees the ones which have not been marked.
In this case I think the hash is still being marked even though it's out of scope. There are many reasons why this may be happening. I'll keep investigating.
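A minimal sketch of how to observe this with ObjectSpace, assuming MRI (the class Probe and the helper live_count are made up for illustration). Objects that are still referenced are guaranteed to survive. Objects that are no longer referenced are only eligible for collection, which matches the "sometimes collected, sometimes not" behavior in the transcript above.

```ruby
class Probe; end   # hypothetical class used only for counting

# Counts the instances of klass that the interpreter currently holds.
def live_count(klass)
  ObjectSpace.each_object(klass).count
end

held = Array.new(3) { Probe.new }   # the array keeps all three reachable
GC.start
p live_count(Probe)                 # => 3, referenced objects are never freed

held = nil                          # drop the only reference
GC.start
p live_count(Probe)                 # often 0, but not guaranteed: MRI scans the
                                    # machine stack conservatively, so a stale
                                    # slot can keep an object alive for a while
```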
UPDATE 2:
I've found what's keeping references to the objects. To do that I used the ruby-mass gem. It turns out that the Active Record relation keeps track of the objects it returned.
User.limit(1).group_by(&:name)
GC.start
ObjectSpace.each_object(ActiveRecord::Base).each do |obj|
  p Mass.references obj # {"ActiveRecord::Relation#70247565268860"=>["@records"]}
end
Unfortunately, calling reset on the relation didn't seem to help, but hopefully this is enough information for now.
I do not know the answer,
but I tried inspecting the heap as described at http://blog.headius.com/2010/07/browsing-memory-jruby-way.html
I have attached a screenshot at https://skitch.com/deepak_kannan/en3dg/java-visualvm
It was a simple program:
class Foo; end
f1 = Foo.new
f2 = Foo.new
GC.start
Then I used jvisualvm as described above. I was running this in irb.
It seems as if JRuby is tracking the object's scope. The object will not get GC'ed if there are any non-weak references to that object.
I know there are other similar questions, such as:
Ruby: how to check if variable exists within a hash definition
Checking if a variable is defined?
But the answers aren't fully satisfactory.
I have:
ruby-1.9.2-p290 :001 > a=Hash.new
=> {}
ruby-1.9.2-p290 :002 > a['one']="hello"
=> "hello"
ruby-1.9.2-p290 :006 > defined?(a['one']['some']).nil?
=> false
ruby-1.9.2-p290 :007 > a['one']['some'].nil?
=> true
It seems like:
if a['one']['some'].nil?
a['one']['some']=Array.new
end
would be sufficient. Is this correct? Would this be correct for any data type? Is defined? needed in this case?
thx
You seem to be confusing two concepts. One is if a variable is defined, and another is if a Hash key is defined. Since a hash is, at some point, a variable, then it must be defined.
defined?(a)
# => nil
a = { }
# => {}
defined?(a)
# => "local-variable"
a.key?('one')
# => false
a['one'] = 'hello'
# => 'hello'
a.key?('one')
# => true
Something can be a key and nil at the same time, this is valid. There is no concept of defined or undefined for a Hash. It is all about if the key exists or not.
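A short sketch of that distinction. The helper describe_key is a made-up name; key? and nil? are the standard methods.

```ruby
# Distinguishes "key absent" from "key present but nil".
def describe_key(hash, key)
  if !hash.key?(key)
    :missing
  elsif hash[key].nil?
    :present_but_nil
  else
    :present
  end
end

h = { 'one' => 'hello', 'two' => nil }

p describe_key(h, 'one')     # => :present
p describe_key(h, 'two')     # => :present_but_nil
p describe_key(h, 'three')   # => :missing
p h['two'].nil? && h['three'].nil?   # => true, so nil? alone cannot tell them apart
```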
The only reason to test with .nil? is to distinguish between the two possible non-true values: nil and false. If you will never be using false in that context, then calling .nil? is unnecessarily verbose. In other words, if (x.nil?) is equivalent to if (!x) provided x will never be literally false.
What you probably want to employ is the ||= pattern that will assign something if the existing value is nil or false:
# Assign an array to this Hash key if nothing is stored there
a['one']['hello'] ||= [ ]
Update: Edited according to remarks by Bruce.
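For nested structures the ||= pattern has to be applied at each level. The sketch below assumes the value at the outer key is itself a hash (in the question it is a String, where [] means substring lookup, so this would not apply as is). append_nested is a hypothetical helper name.

```ruby
# Appends value to the array stored at hash[outer][inner], creating the
# intermediate hash and the array on first use.
def append_nested(hash, outer, inner, value)
  hash[outer] ||= {}            # create the inner hash if absent
  hash[outer][inner] ||= []     # create the array if absent
  hash[outer][inner] << value
  hash
end

a = {}
append_nested(a, 'one', 'some', 1)
append_nested(a, 'one', 'some', 2)
p a['one']['some']    # => [1, 2]
```

Keep in mind that ||= also overwrites a stored false, so it is only a good fit when false is not a meaningful value.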
I had to dig a number of pages deep into Google, but I eventually found this useful bit from the Ruby 1.9 spec:
"In all cases the test [defined?] is conducted without evaluating the operand."
So what's happening is that it looks at:
a['one']['some']
and says "that is sending the [] message to the a object, so it is a method call!", and the result of defined? on that is "method".
Then when you check against nil?, the string "method" clearly isn't nil.
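The question's setup can be replayed in a few lines to see what defined? reports. This is only a sketch of the observable results; no_such_name is a deliberately undefined identifier.

```ruby
a = { 'one' => 'hello' }

p defined?(a)                   # => "local-variable"
p defined?(a['one']['some'])    # => "method" (the final [] call is not run)
p defined?(no_such_name)        # => nil
p a['one']['some']              # => nil ("some" is not a substring of "hello")

# defined? returns either nil or a descriptive string, so calling .nil? on
# its result only tells you whether the expression can be resolved at all.
p defined?(a['one']['some']).nil?   # => false
```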
In addition to @tadman's answer: what you actually did in your example was check whether the string "some" is included in the string "hello", which is stored in your hash under the key "one".
a = {}
a['one'] = 'hello'
a['one']['some'] # searches for the substring "some" in the string stored at key "one"
A simpler example:
b = 'hello'
b['he'] # => 'he'
b['ha'] # => nil
That's why a['one']['some'].nil? returned true, while defined? (which only saw a method call) returned "method" rather than nil as you expected.
I learned that in Ruby, variables hold references to objects, not the objects themselves.
For example:
a = "Tim"
b = a
a[0] = 'J'
Then a and b both have value "Jim".
However if I change the 3rd line to
a = "Jim"
Then a == "Jim" and b == "Tim".
I assume that means the code I changed created a new reference for a.
So why does changing a letter or changing the entire string make so much difference?
Follow-up question: Does Java work the same way?
Thank you.
The single thing to learn here is the difference between assignment and method call.
a = 'Jim'
is an assignment. You create a new string object (literal 'Jim') and assign it to variable a.
On the other side,
a[0] = 'J'
is a method call on an object already referenced by the variable a. A method call can't replace the object referenced by the variable with another one, it can just change the internal state of the object, and/or return another object.
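A quick sketch of the two cases side by side. I use .dup on the literal so the string is mutable even when string literals are frozen. Note that += looks like mutation but is really an assignment.

```ruby
a = 'Tim'.dup     # .dup gives a mutable string even if literals are frozen
b = a

a[0] = 'J'        # method call ([]=): changes the shared object in place
p b               # => "Jim"
p a.equal?(b)     # => true

a << 'my'         # << also mutates in place
p b               # => "Jimmy"

a += '!'          # shorthand for a = a + '!': builds a new string and rebinds a
p a               # => "Jimmy!"
p b               # => "Jimmy"
p a.equal?(b)     # => false
```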
I find that things like this are easiest to figure out using IRB:
>> a = 'Tim'
=> "Tim"
>> a.object_id
=> 2156046480
>> b = a
=> "Tim"
>> b.object_id
=> 2156046480
>> a.object_id == b.object_id
=> true
As you can see a and b have the same object_id, meaning they reference the same object. So when you change one, you change the other. Now assign something new to a:
>> a = 'Jim'
=> "Jim"
>> a.object_id
=> 2156019520
>> b.object_id
=> 2156046480
>> a.object_id == b.object_id
=> false
You made the variable a point to a new object, while b kept the old reference. Changing either of them now will not change the other one.
When you do a[0] = 'J', you're asking
Change the first character of the object referenced by a (which happens to be the same as b) to 'J'
While when you do a = "Jim", you're assigning an entirely new object reference (the string "Jim") to a. b is unaffected because you're not changing anything in the original reference.