I came across this strange feature(?) of arrays in Ruby, and it would be very helpful if someone could explain why they work the way they do.
First, let's look at an example of how things usually work.
a = "Hello" #=> "Hello"
b = a #=> "Hello"
b += " Goodbye" #=> "Hello Goodbye"
b #=> "Hello Goodbye"
a #=> "Hello"
OK, cool: when you use =, it creates a copy of the object (this time a string).
But when you use arrays, this happens:
a = [1,2,3] #=> [1,2,3]
b = a #=> [1,2,3]
b[1] = 5 #=> [1,5,3]
b #=> [1,5,3]
a #=> [1,5,3]
Now that's just strange. It's the only object I've found that doesn't get copied when using =, but instead just creates a reference to the original object.
Can someone also explain how to copy an array (there must be a method) without having it point back to the original object?
Actually, you should re-examine your premise.
The string assignment is really b = b + " Goodbye". The b + " Goodbye" operation returns an entirely new string, so the variable b is pointing to a new object after the assignment.
But when you assign to an individual array element, you are not creating an entirely new array, so a and b continue to point to the same object, which you just changed.
If you are looking for a rationale for the mutating vs functional behavior of arrays, it's simple. There is nothing to be gained by modifying the string. It is most likely necessary to allocate new memory anyway, so an entirely new string is created.
But an array can be arbitrarily large. Creating a new array in order to change just one element could be hugely expensive. And in any case, an array is like any other composite object. Changing an individual attribute does not necessarily affect any other attributes.
And to answer your question, you can always do:
b = a.dup
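As a minimal sketch of that suggestion, reusing the array from the question (the printed values in the comments are what I would expect from a stock Ruby):

```ruby
a = [1, 2, 3]
b = a.dup          # b is a new Array holding the same elements
b[1] = 5           # only b's slot 1 is reassigned

p a                # [1, 2, 3]
p b                # [1, 5, 3]
p a.equal?(b)      # false: two distinct Array objects
```

Note that dup is a shallow copy: the two arrays are separate, but they still share their elements.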
What happens there is that Ruby treats the Array object by reference and not by value.
You can picture it like this:
b = [1,2,3]
a = b
--'b' points to---> [1,2,3] <---'a' points to--
As you can see, both variables point to the same object, which means that any change you make through a will be reflected in b.
As for your question about copying the object, you could use the Object#clone method to do so.
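For what it's worth, clone and dup both make a shallow copy; one difference I am sure of is that clone keeps the frozen state and dup does not. A small sketch (the variable names are just for illustration):

```ruby
original = [1, 2, 3].freeze

cloned = original.clone   # keeps the frozen state
duped  = original.dup     # returns an unfrozen copy

p cloned.frozen?          # true
p duped.frozen?           # false

duped[1] = 5              # allowed, and original is untouched
p original                # [1, 2, 3]
p duped                   # [1, 5, 3]
```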
Try your Array case with a String:
a = "Hello" #=> "Hello"
b = a #=> "Hello"
b[1] = "x" #=> "x"
b #=> "Hxllo"
a #=> "Hxllo"
Strings and Arrays work the same way in this regard.
The key difference in the two cases, as you wrote them, is this:
b += " Goodbye"
This is syntactic shorthand for
b = b + " Goodbye"
which is creating a new string from b + " Goodbye" and then assigning that to b. The way to modify an existing string, rather than creating a new one, is
b << " Goodbye"
And if you plug that into your sequence, you'll see that it modifies both a and b, since both variables refer to the same string object.
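Here is a rough side-by-side of the two operators, using the strings from the question. The .dup calls are only there as a precaution so the example also runs where string literals are frozen:

```ruby
a = "Hello".dup
b = a
b << " Goodbye"     # mutates the one String that a and b share
p a                 # "Hello Goodbye"
p a.equal?(b)       # true

c = "Hello".dup
d = c
d += " Goodbye"     # builds a new String and rebinds d to it
p c                 # "Hello"
p c.equal?(d)       # false
```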
As for deep copying, there's a decent piece about it here:
http://ruby.about.com/od/advancedruby/a/deepcopy.htm
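In case that link goes away, here is a small sketch of why a deep copy can matter. It uses the Marshal round trip, which is one common approach (it only works for objects that Marshal can serialize); the nested array is made up for illustration:

```ruby
nested  = [[1, 2], [3, 4]]
shallow = nested.dup                            # new outer array, same inner arrays
deep    = Marshal.load(Marshal.dump(nested))    # inner arrays are copied too

shallow[0] << 99    # the inner array is shared with nested
deep[1]    << 42    # this inner array belongs to deep only

p nested            # [[1, 2, 99], [3, 4]]
p shallow           # [[1, 2, 99], [3, 4]]
p deep              # [[1, 2], [3, 4, 42]]
```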
Consider the following code
a="123456789"
t=[[1,4],[3,4],[4,5],[1,2]]
p t.map{|x,y|
a[x],a[y]=a[y],a[x]
#p a
a
}
I know Ruby's map method collects the last expression of the given block, but when I use the code above to swap the characters in a using the indexes in t, it doesn't do what I want. My intention was to collect the state of a after each swap. But map always gives an array in which every element is a in its final state, i.e. ["135264789", "135264789", "135264789", "135264789"].
The result suggests that map collected the final value of a after processing every index pair in t. Yet printing a after each swap shows the correct value at each step.
Is this the correct behavior, or am I missing something?
This is because the String#[]= method mutates the string.
Quick fix would be something like this:
a="123456789"
t=[[1,4],[3,4],[4,5],[1,2]]
p t.map{|x,y|
b = "#{a}" # IMPORTANT - this builds a new string
b[x],b[y]=b[y],b[x] # this mutates the new string
#p b
b
}
An alternative to "#{a}" would be a.clone; it does the same thing in this case.
The reason this works is that, instead of directly modifying a with a[x],a[y]=a[y],a[x], you're making a temporary copy of a and modifying that instead.
Edit: I misread the question. If you want to show the result of chaining each operation on the previous result, use dup/clone after the modification, as Stefan said in his answer.
Is my understanding correct?
Yes, I believe it is. I second what Max says, and I'll also elaborate a bit in case it helps.
Each b is a newly created object because it gets created inside the block, so it gets recreated with every new iteration. The a is created outside the block, so the same object (a) keeps getting referenced inside the block for each iteration.
You can better understand how this works by experimenting with #object_id. Try running this code:
a="123456789"
t=[[1,4],[3,4],[4,5],[1,2]]
p t.map { |x,y|
b = "#{a}" # IMPORTANT - this builds a new string
b[x],b[y]=b[y],b[x]
p "a.object_id = #{a.object_id}"
p "b.object_id = #{b.object_id}"
b
}
You will notice that a is the same object for each iteration of the #map method, while b is a new one.
This is an example of the concept of a closure. A closure is a piece of code that retains access to whatever state was visible in the context where it was created, while that context has no access to the closure's own local state. Sort of like a "one way mirror": the enclosed code can see outside, but the outside can't see into the enclosed code.
In Ruby, closures are implemented as blocks: blocks are closures. So, everything that is visible to whatever context a block is created in (in this case, main) is also visible to that block, although the reverse is not true — for example, you can't reference b from outside the block. (Methods are not closures: if your block were a method, it wouldn't be able to see a unless you passed it in as an argument to your method.)
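To make the block-versus-method point concrete, here is a small sketch. The method names peek_without_argument and peek_with_argument are invented for this example:

```ruby
a = "123456789"

# A block is a closure: it sees the local variable a defined above.
from_block = [0].map { a.object_id }.first

# A method body starts a fresh scope, so the bare name a is unknown there.
def peek_without_argument
  a
rescue NameError
  :not_visible
end

# Passing the object in explicitly works, and it is still the same object.
def peek_with_argument(str)
  str.object_id
end

p from_block == a.object_id             # true
p peek_without_argument                 # :not_visible
p peek_with_argument(a) == a.object_id  # true
```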
So, as Max says, when you make changes to a inside your block, you are actually changing (mutating) the same a that you defined up top each time.
Side topic
Now, if you are referencing individual characters in strings, it's important to understand that the underlying structure of strings differs from that of arrays. Also, mutating an element of an array behaves differently from mutating a character of a string.
I'm mentioning this because I have this vague feeling that you are thinking of string character references as pretty much analogous to array element references. This is pretty much only true with respect to syntax.
You may find the results of running this code interesting:
a = '123456789'
p a.object_id
p a[0].object_id
p a[1].object_id
a[0] = '7'
p a.object_id
p a[0].object_id
p a[1].object_id
puts
a = '123456789'.chars
p a.object_id
p a[0].object_id
p a[1].object_id
a[0] = '7'
p a.object_id
p a[0].object_id
p a[1].object_id
In particular, a comparison of the four outputs of a[1].object_id should be instructive, because it shows where strings and arrays differ. If you reassign an element in an array, that element and only that element gets a new object id; a[1] keeps its id. A string, by contrast, does not store its characters as separate objects: every call to a[1] builds a new one-character string, so its object id changes from call to call, while the string object itself remains the same.
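If comparing raw object ids by eye is tedious, the same point can be checked with equal?, which tests object identity. A short sketch with made-up variable names:

```ruby
str = "123456789"
p str[1].equal?(str[1])       # false: each call returns a new one-character string

chars = str.chars
p chars[1].equal?(chars[1])   # true: the array hands back the stored object

second = chars[1]
chars[0] = "7"                # replace a different slot
p chars[1].equal?(second)     # true: slot 1 was not touched
```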
Since you are returning a from map, the result will contain a four times. Those a's all refer to the same object.
You probably want to return a copy of a to preserve its current state:
a = '123456789'
t = [[1, 4], [3, 4], [4, 5], [1, 2]]
r = t.map { |x, y|
a[x], a[y] = a[y], a[x]
a.dup
}
r #=> ["153426789", "153246789", "153264789", "135264789"]
I'm trying to confirm whether my understanding is correct of these six lines of code:
string="this is a sentence"
words=string.split
first_word=words[0]
first_word[0]=first_word[0].upcase
out=words.join(" ")
puts(out)
which prints "This is a sentence" (with the first letter capitalized).
It would appear that changing the "first_word" string, which is defined as the first element of the "words" array, also changes the original "words" array. Is this indeed Ruby's default behavior? Does it not make it more difficult to track where in the code changes to the array take place?
You just need to distinguish between a variable and an object. Your string is an object. first_word is a variable.
Look at this example:
a = "hello"
b = a
c = b
Now all three variables refer to the same object, a string with the value "hello". We say they reference the object. No copy is made.
a[0] = 'H'
This changes the first character of the object, a string which now has the value "Hello". Both b and c contain the same, now changed object.
a = "different"
This assigns a new object to the variable a. b and c still hold the original object.
Is this Ruby's default behaviour? Yes. And it works like this in many other programming languages, too.
Does it make it difficult to track changes? Sometimes.
If you take an element from an array (like your first_word), you need to know:
If you change the object itself, no matter how you access it, all variables will still hold that same object, which has now been changed.
But if you replace the object in the array, as in words[0] = "That", then all your other variables will still hold the original object.
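Applied to the code from the question, the two cases look roughly like this (the printed values in the comments are what I would expect):

```ruby
words = "this is a sentence".split
first_word = words[0]

# Case 1: mutate the object. words[0] and first_word are the same string.
first_word[0] = first_word[0].upcase
p words.join(" ")     # "This is a sentence"

# Case 2: replace the slot. first_word keeps the old object.
words[0] = "That"
p first_word          # "This"
p words.join(" ")     # "That is a sentence"
```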
This behavior comes from how Ruby handles pass-by-value and pass-by-reference.
This is probably one of the more confusing parts of Ruby. It is often said that Ruby is a pass-by-value language. That is true but easy to misread, and you have found a perfect example: Ruby does pass by value, but the values being passed are references to objects. Assignment never copies an object, whether it is an integer, a float, a string, an array, or a hash; it only makes another variable refer to the same object. The difference shows up only when you mutate the object (as with []= on a hash or array) rather than assign a new object to the variable.
original_hash = {name: "schylar"}
reference_hash = original_hash
reference_hash[:name] = "!schylar"
original_hash #=> {:name=>"!schylar"}
original_array = [1,2]
reference_array = original_array
reference_array[0] = 3
original_array #=> [3,2]
original_fixnum = 1
new_object_fixnum = original_fixnum
new_object_fixnum = 2
original_fixnum #=> 1
original_string = "Schylar"
new_object_string = original_string
new_object_string = "!Schylar"
original_string #=> "Schylar"
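The same rule applies to method arguments. Here is a sketch with two hypothetical methods, reassign and mutate, that shows the difference between rebinding a parameter and changing the object it refers to:

```ruby
def reassign(str)
  str = "replaced"   # rebinds the local parameter only
  str
end

def mutate(str)
  str << "!"         # changes the object the caller also refers to
end

name = "Schylar".dup
reassign(name)
p name               # "Schylar"

mutate(name)
p name               # "Schylar!"
```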
If you find yourself needing to copy by value, you may want to rethink the design. A common way to deep-copy complex data types is to use the Marshal methods.
a = {name: "Schylar"}
b = Marshal.load(Marshal.dump(a))
b[:name] = "!!!Schylar"
a #=> {:name => "Schylar"}
I have been using Ruby on Rails for a while without studying Ruby, and now I am working through The Odin Project. I am not really sure about the answer to this question:
What does it mean that strings are "mutable" and why care?
Update:
So now I understand that a mutable string basically means the value in memory can be changed after the string is created.
An immutable string means the value in memory cannot be changed once created; only the reference can be changed.
Based on the result of the following code:
a = "foo"
a.object_id
=> 70218039369160
b = "bar"
a << b
=> "foobar"
a.object_id
=> 70218039369160
Can I say that strings in Ruby are mutable, because the value in the same memory is changed?
a += b
=> "foobarbar"
a.object_id
=> 70218039184800
And the + method in Ruby actually creates a new String object instead of changing the value of the original String object, which is why the object id changed.
My question is: will it cause any security problem if I use += and << interchangeably?
It means that you can modify an existing instance of a string without constructing a new one. Consider the following code:
str1 = "foo"
str2 = str1
str1 += "bar"
In languages like JavaScript, where strings are immutable, the value of str2 will still be "foo" after that code is executed. However, in languages where strings are mutable, like Ruby, when you append "bar" to the end of str1 with the << operator (which appends in place), you are actually modifying the instance, not creating a new one, so str2 will also be modified.
PS: Note that the operator for appending to the end of a string in Ruby is << rather than += (+= actually creates a new string in Ruby too, but not because it is forced to, as in JavaScript).
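As for why you might care: one practical consequence is that you can freeze a string you don't want anyone to modify in place. A sketch, assuming Ruby 2.5 or newer for the FrozenError class; the variable names locked and copy are made up:

```ruby
str1 = "foo".dup
str2 = str1
str1 << "bar"            # in-place append, visible through str2
p str2                   # "foobar"

locked = "foo".freeze
copy = locked + "bar"    # + never touches the receiver, so this is fine
p copy                   # "foobar"

begin
  locked << "bar"        # in-place append on a frozen string
rescue FrozenError => e  # assumes Ruby 2.5+
  p e.class              # FrozenError
end
p locked                 # "foo"
```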
This is what I tried:
a = "Hello world"
a.object_id # => -633222538
b = a
b.object_id # => -633222538
b << " i say" # => "Hello world i say"
a # => "Hello world i say"
Why is it that both the variables b and a have the same object id? Also, when I change b, how did a also change?
Update:
How about when the variable is passed as an argument to a method? Why does the receiving variable have the same reference?
They are referencing the same object:
a = "Hello world" # a now references #-633222538
b = a # b now references #-633222538, too
b << " i say" # this appends " i say" to #-633222538
a # a still references #-633222538
String#<< doesn't assign a new object; it appends to the given string, thus changing the receiver.
If you want a copy, you can use clone or dup:
b = a.clone
a == b #=> true (same string values)
a.equal? b #=> false (different objects)
Regarding integers
There's no difference in referencing:
a = 100
a.object_id #=> 201
b = a
b.object_id #=> 201
Now both a and b reference the same object. The only difference is that integers cannot be changed in Ruby; they are immutable.
Passing variables as arguments
Again, the reference is passed:
a = "foo"
p = proc { |x| x << "bar" }
p.call(a)
a
#=> "foobar"
ENTER REFERENCES
The answer is that variables in Ruby (with a few exceptions, most notably variables bound to integers) don’t hold object values. a doesn’t contain "Hello world". Rather, a contains a reference to a string object. It’s the string object that has the characteristic of containing the letters that make up "Hello world".
In an assignment with a variable name on the left and an object on the right, the variable receives a reference to the object. In an assignment from one variable to another (a = b), the variable on the left receives a copy of the reference stored in the variable on the right, with the result that both variables now contain references to the same object.
The fact that variables hold references to objects has implications for operations that change objects. The string-concat operation
b << " i say"
appends the text " i say" to the string to which b is a reference. The variable a contains another reference to the same string object. Even though the << message goes to b, it causes a change to the object to which the reference in b refers. When you print out a, you see the result: the contents of the string have changed.
Some objects in Ruby are stored in variables as immediate values. These include integers, symbols (which look like :this), and the special objects true, false, and nil. When you assign one of these values to a variable (x = 1), the variable holds the value itself, rather than a reference to it.
Copied and modified from Manning's The Well-Grounded Rubyist.
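In practice the distinction rarely matters for integers and symbols, because they are frozen and have no methods that change them in place. A quick sketch of what I mean:

```ruby
a = 100
b = a
b += 1             # rebinds b; nothing can alter the object 100 itself

p a                # 100
p b                # 101
p 100.frozen?      # true
p :this.frozen?    # true
```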
Looks like you called a mutating method on a variable that shares its object with another variable. If you had instead done b = b + " i say", a would have been left unchanged.
The variables a and b are references to a String object. When you did the b = a assignment, you copied the reference. It doesn't make a new copy of the object. If you want to copy the string object into a new object, you might do something like this:
a = "abc"
b = "" # or: b = String.new
b << a
Now a and b will be different, independent string objects with the value "abc".
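Two shorter ways to get the same effect, as far as I know, are dup and String.new with the original as its argument. A small sketch:

```ruby
a = "abc"
b = a.dup             # copy via dup
c = String.new(a)     # copy via the String constructor

b << "d"
c << "e"

p a                   # "abc"
p b                   # "abcd"
p c                   # "abce"
```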
Suppose I create the following arrays in ruby:
a = ["apple", "cherry"]
b = a.dup
Here, b is a shallow copy of a. So if I do:
a.each{|fruit| fruit << " pie"}
I get both a and b equal to ["apple pie", "cherry pie"]. No problem there. But suppose I change one element of b:
b[1] = "blueberry"
and issue the same "each" command. Now b is ["apple pie", "blueberry"], because a[0] and b[0] are the same strings, but a[1] and b[1] are different strings. I could run the command on both a and b, but then b is ["apple pie pie", "blueberry pie"], because I have run the append operation on the same string twice.
Is there a way to modify in place all the strings of a and b, without duplicates? In this simple example, I could test for the substring " pie", but this wouldn't work for other types of changes (such as deleting the first character).
I tried creating a set containing all the strings, so that each would be unique; but it seems the set creation copies the strings, so they cannot be modified in place. Is there a way to test if two strings are the same in memory? I have googled for that, but found nothing.
The application of this is that I have big arrays of strings, which I "dup" to create a history of them. Now I want to apply a change to the entire history, without double applying (or triple, etc) the changes.
I don't understand your use case; I suspect you're making things more complicated than they need to be.
Is there a way to test if two strings are the same in memory?
Object#object_id is what you're looking for.
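A related option is equal?, which compares identity directly instead of comparing ids yourself. A sketch using the arrays from the question (copy is an extra variable added for illustration):

```ruby
a = ["apple", "cherry"]
b = a.dup
b[1] = "blueberry"

p a[0].equal?(b[0])                  # true: the same String object
p a[1].equal?(b[1])                  # false: b[1] was replaced
p a[0].object_id == b[0].object_id   # true: same check via object_id

copy = a[0].dup
p copy == a[0]                       # true: same contents
p copy.equal?(a[0])                  # false: different object
```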
Is there a way to modify in place all the strings of a and b, without duplicates?
You could keep a set of all the object_ids, similar to what you were already trying. You can retrieve the string with ObjectSpace._id2ref. Something like this:
require 'set'
set = Set.new
a = ["apple", "cherry"]
b = a.dup
b[1] = "blueberry"
# Collect unique string objects
a.each{|s| set << s.object_id}
b.each{|s| set << s.object_id}
# Make pie with each unique string object
set.each{|id| ObjectSpace._id2ref(id) << " pie"}
a
# => ["apple pie", "cherry pie"]
b
# => ["apple pie", "blueberry pie"]
That seems a bit crazy to me, though. Again, I think there's probably a better way to do what you're trying to do, but it's hard to tell based on the information provided.
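One possibly simpler variation, which is my own swap rather than the approach above: skip the id lookup and deduplicate the string objects themselves with uniq, using object_id as the key. A sketch under the same setup:

```ruby
a = ["apple", "cherry"]
b = a.dup
b[1] = "blueberry"

# Keep one reference per distinct object, then mutate each exactly once.
unique_strings = (a + b).uniq(&:object_id)
unique_strings.each { |s| s << " pie" }

p a   # ["apple pie", "cherry pie"]
p b   # ["apple pie", "blueberry pie"]
```

This avoids ObjectSpace._id2ref entirely, since the array already holds the references you need.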