Understanding how hash copy is behaving under the hood

I came up with an example using Ruby hashes where I cannot quite understand what is happening under the hood:
root = {}
base = root
base[:a] = {}
base = base[:a]
p base
=> {}
p root
=> {:a=>{}}
When I assign base = base[:a], base becomes {} as I expected, but why doesn't root become {} too?

I just needed a little push to understand, and thanks to @Stefan I think I can answer my own question. Breaking it down we have:
root = {}
base = root
puts root.object_id
=> 47193371579760
puts base.object_id
=> 47193371579760
So both root and base now reference the same object.
base[:a] = {}
base[:a].object_id
=> 47193372751820
base = base[:a]
puts base.object_id
=> 47193372751820
puts root.object_id
=> 47193371579760
puts root
base[:a] is a new hash object. After base = base[:a], the variable base references this new object, while root still references the original object, which by now contains {:a=>{}}. That's why root doesn't change at the end.

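In Ruby, every variable holds a reference to an object, and Array and Hash objects are mutable, so a change made through one variable is visible through every other variable that references the same object. A similar concept exists in most languages. In JavaScript, for example, it shows up as the difference between a shallow copy and a deep copy of an object.
Here is your example again with a short explanation of each step.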
root = {}
# root = {}
base = root
# base and root both reference the same hash object
base[:a] = {}
# that single object (reachable through both base and root) is now { a: {} }
base = base[:a]
# base no longer references the object root references; it now references the inner hash stored under :a
p base
=> {}
p root
=> {:a=>{}}
base[:b] = 4
=> 4
p base
=> {:b=>4}
p root
=> {:a=>{:b=>4}}
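You can use object_id to verify which object a variable references. Note that clone (like dup) makes only a shallow copy of a hash; for a deep copy of plain data, one common approach is Marshal.load(Marshal.dump(hash)).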
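As a small sketch of the copy behaviour discussed above (the variable names are illustrative and not from the question): clone copies only the top-level hash, while a Marshal round trip is one common way to get an independent copy of plain data.

```ruby
root = { a: { b: 4 } }

shallow = root.clone                        # new outer hash, same inner hash
deep    = Marshal.load(Marshal.dump(root))  # independent copy (plain data only)

shallow[:a][:c] = 5   # mutates the inner hash that root also references
deep[:a][:d]    = 6   # touches only the deep copy

p root[:a].key?(:c)             # true: the change made via shallow is visible
p root[:a].key?(:d)             # false: the deep copy is separate
p shallow[:a].equal?(root[:a])  # true: same inner object
p deep[:a].equal?(root[:a])     # false: different inner object
```

Marshal cannot serialize every object (procs and IO objects, for example), so this only suits hashes of ordinary values.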

Related

Defining a method to update ruby object property

Hey I am having trouble with the following question to be solved using ruby.
Question
Write a function that provides change directory (cd) function for an abstract file system.
Notes:
Root path is '/'.
Path separator is '/'.
Parent directory is addressable as '..'.
Directory names consist only of English alphabet letters (A-Z and a-z).
For example:
path = Path.new('/a/b/c/d')
puts path.cd('../x').current_path
should display '/a/b/c/x'.
Note: Do not use built-in path-related functions.
My Answer
class Path
  def initialize(path)
    @current_path = path
  end
  def current_path
    @current_path
  end
  def cd(new_path)
    if new_path.include? ".."
      z = new_path.split("/")
      b = @current_path
      a = b.split('/')
      a.shift
      a.pop
      @current_path = a.push(z[z.length-1]).join("/")
    else
    end
  end
end
path = Path.new('/a/b/c/d')
path = path.cd('../x')
However this returns a string instead of an object from the 'path' variable.

You need to make the method chainable. There are 2 ways to do it.
The immutable one - create a new instance of the class instead of modifying the existing one, e.g. return Path.new(calculated_path)
The mutable one - modify @current_path and return self at the end of the method #cd

After you've changed @current_path in the object, just return the object ('self')
@current_path = a.push(z[z.length-1]).join("/")
return self
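A minimal sketch of the mutable variant described above. The path handling is rewritten here for illustration (it is not the asker's original logic) and assumes that a leading '/' in the argument means an absolute path.

```ruby
class Path
  attr_reader :current_path

  def initialize(path)
    @current_path = path
  end

  # Mutable variant: update @current_path, then return self so calls chain.
  def cd(new_path)
    # Assumption: an absolute argument starts again from the root.
    parts = new_path.start_with?('/') ? [] : @current_path.split('/').reject(&:empty?)
    new_path.split('/').each do |segment|
      next if segment.empty?
      # '..' drops the last directory (pop on an empty array is a no-op).
      segment == '..' ? parts.pop : parts.push(segment)
    end
    @current_path = '/' + parts.join('/')
    self
  end
end

puts Path.new('/a/b/c/d').cd('../x').current_path   # /a/b/c/x
```

For the immutable variant, the last two lines of cd would instead be Path.new('/' + parts.join('/')), leaving the receiver unchanged.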

Ruby regex into array of hashes but need to drop a key/val pair

I'm trying to parse a file containing a name followed by a hierarchy path. I want to take the named regex matches, turn them into Hash keys, and store the match as a hash. Each hash will get pushed to an array (so I'll end up with an array of hashes after parsing the entire file). This part of the code is working, except now I need to handle bad paths with duplicated hierarchy (top_* is always the top level). It appears that if I'm using named backreferences in Ruby I need to name all of the backreferences. I have gotten the match working in Rubular but now I have the p1 backreference in my resultant hash.
Question: What's the easiest way to not include the p1 key/value pair in the hash? My method is used in other places so we can't assume that p1 always exists. Am I stuck with dropping each key/value pair in the array after calling the s_ary_to_hash method?
NOTE: I'm keeping this question to try and solve the specific issue of ignoring certain hash keys in my method. The regex issue is now in this ticket: Ruby regex - using optional named backreferences
UPDATE: Regex issue is solved, the hier is now always stored in the named 'hier' group. The only item remaining is to figure out how to drop the 'p1' key/value if it exists prior to creating the Hash.
Example file:
name1 top_cat/mouse/dog/top_cat/mouse/dog/elephant/horse
new12 top_ab12/hat[1]/top_ab12/hat[1]/path0_top_ab12/top_ab12path1/cool
tops top_bat/car[0]
ab123 top_2/top_1/top_3/top_4/top_2/top_1/top_3/top_4/dog
Expected output:
[{:name => "name1", :hier => "top_cat/mouse/dog/elephant/horse"},
{:name => "new12", :hier => "top_ab12/hat[1]/path0_top_ab12/top_ab12path1/cool"},
{:name => "tops", :hier => "top_bat/car[0]"},
{:name => "ab123", :hier => "top_2/top_1/top_3/top_4/dog"}]
Code snippet:
def s_ary_to_hash(ary, regex)
  retary = Array.new
  ary.each {|x| (retary << Hash[regex.match(x).names.map{|key| key.to_sym}.zip(regex.match(x).captures)]) if regex.match(x)}
  return retary
end
regex = %r{(?<name>\w+) (?<p1>[\w\/\[\]]+)?(?<hier>(\k<p1>.*)|((?<= ).*$))}
h_ary = s_ary_to_hash(File.readlines(filename), regex)

What about this regex ?
^(?<name>\S+)\s+(?<p1>top_.+?)(?:\/(?<hier>\k<p1>(?:\[.+?\])?.+))?$
Demo
http://rubular.com/r/awEP9Mz1kB
Sample code
def s_ary_to_hash(ary, regex, mappings)
  retary = Array.new
  for item in ary
    tmp = regex.match(item)
    if tmp then
      hash = Hash.new
      retary.push(hash)
      mappings.each { |mapping|
        mapping.map { |key, groups|
          # use the first named group that actually matched
          for group in groups
            if tmp[group] then
              hash[key] = tmp[group]
              break
            end
          end
        }
      }
    end
  end
  return retary
end
regex = %r{^(?<name>\S+)\s+(?<p1>top_.+?)(?:\/(?<hier>\k<p1>(?:\[.+?\])?.+))?$}
h_ary = s_ary_to_hash(
File.readlines(filename),
regex,
[
{:name => ['name']},
{:hier => ['hier','p1']}
]
)
puts h_ary
Output
{:name=>"name1", :hier=>"top_cat/mouse/dog/elephant/horse\r"}
{:name=>"new12", :hier=>"top_ab12/hat[1]/path0_top_ab12/top_ab12path1/cool\r"}
{:name=>"tops", :hier=>"top_bat/car[0]"}
Discussion
Since Ruby 2.0.0 doesn't support branch reset, I have built a solution that adds some more power to the s_ary_to_hash function. It now accepts a third parameter indicating how to build the final array of hashes.
This third parameter is an array of hashes. Each hash in this array has one key (K) corresponding to the key in the final array of hashes. K is associated with an array containing the named group to use from the passed regex (second parameter of s_ary_to_hash function).
If a group equals nil, s_ary_to_hash skips it for the next group.
If all groups equal nil, K is not pushed on the final array of hashes.
Feel free to modify s_ary_to_hash if this isn't a desired behavior.
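For the narrower question of dropping a key such as p1 only when it exists, here is a hedged sketch. The optional drop parameter and the simplified regex are assumptions made for illustration; they are not part of the original method or of the answer above.

```ruby
# Hypothetical variant: `drop` lists the named groups to leave out of each hash.
def s_ary_to_hash(ary, regex, drop = [])
  ary.each_with_object([]) do |line, result|
    match = regex.match(line)
    next unless match

    pairs = match.names.map(&:to_sym).zip(match.captures)
    # reject is harmless when a key in `drop` is not among the regex's groups
    result << pairs.reject { |key, _| drop.include?(key) }.to_h
  end
end

# Simplified stand-in regex, only to show the key being dropped.
regex = %r{(?<name>\w+) (?<p1>\w+)/(?<hier>.+)}
lines = ["name1 top_cat/mouse/dog", "no-match"]

s_ary_to_hash(lines, regex, [:p1]).each { |h| puts "#{h[:name]} -> #{h[:hier]}" }
# prints: name1 -> mouse/dog
```

Because drop defaults to an empty array, existing callers that pass only two arguments keep their current behaviour.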

Edit: I've changed the method s_ary_to_hash to conform with what I now understand to be the criterion for excluding directories, namely, directory d is to be excluded if there is a downstream directory with the same name, or the same name followed by a non-negative integer in brackets. I've applied that to all directories, though I may have misunderstood the question; perhaps it should apply only to the first.
data =<<THE_END
name1 top_cat/mouse/dog/top_cat/mouse/dog/elephant/horse
new12 top_ab12/hat/top_ab12/hat[1]/path0_top_ab12/top_ab12path1/cool
tops top_bat/car[0]
ab123 top_2/top_1/top_3/top_4/top_2/top_1/top_3/top_4/dog
THE_END
text = data.split("\n")
def s_ary_to_hash(ary)
  ary.map do |s|
    name, _, downstream_path = s.partition(' ').map(&:strip)
    arr = []
    downstream_dirs = downstream_path.split('/')
    downstream_dirs.each {|d| puts "'#{d}'"}
    while downstream_dirs.any? do
      dir = downstream_dirs.shift
      arr << dir unless downstream_dirs.any? { |d|
        d == dir || d =~ /#{dir}\[\d+\]/ }
    end
    { name: name, hier: arr.join('/') }
  end
end
s_ary_to_hash(text)
# => [{:name=>"name1", :hier=>"top_cat/mouse/dog/elephant/horse"},
# {:name=>"new12", :hier=>"top_ab12/hat[1]/path0_top_ab12/top_ab12path1/cool"},
# {:name=>"tops", :hier=>"top_bat/car[0]"},
# {:name=>"ab123", :hier=>"top_2/top_1/top_3/top_4/dog"}]
The exclusion criterion is implemented in downstream_dirs.any? { |d| d == dir || d =~ /#{dir}\[\d+\]/ }, where dir is the directory that is being tested and downstream_dirs is an array of all the downstream directories. (When dir is the last directory, downstream_dirs is empty.) Localizing it in this way makes it easy to test and change the exclusion criterion. You could shorten this to a single regex and/or make it a method:
def exclude_dir?(dir, downstream_dirs)
  downstream_dirs.any? { |d| d == dir || d =~ /#{dir}\[\d+\]/ }
end

Here is a non-regexp solution:
result = string.each_line.map do |line|
  name, path = line.split(' ')
  path = path.split('/')
  last_occur_of_root = path.rindex(path.first)
  path = path[last_occur_of_root..-1]
  {name: name, hier: path.join('/')}
end
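To try the rindex approach above on its own, here is a self-contained sketch. The method name strip_repeated_prefix and the SAMPLE constant are made up for this example; the sample lines come from the question.

```ruby
SAMPLE = <<~LINES
  name1 top_cat/mouse/dog/top_cat/mouse/dog/elephant/horse
  tops top_bat/car[0]
LINES

def strip_repeated_prefix(text)
  text.each_line.map do |line|
    name, path = line.split(' ')
    dirs = path.split('/')
    # Keep everything from the last occurrence of the top-level directory.
    { name: name, hier: dirs[dirs.rindex(dirs.first)..-1].join('/') }
  end
end

strip_repeated_prefix(SAMPLE).each { |h| puts "#{h[:name]}: #{h[:hier]}" }
# prints:
# name1: top_cat/mouse/dog/elephant/horse
# tops: top_bat/car[0]
```

This relies on the top-level directory name appearing verbatim where the duplicated hierarchy restarts, which holds for the sample lines in the question.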

When does variables in Ruby determine whether to hold a new reference?

I learned that in Ruby, variables hold references to objects, not the objects themselves.
For example:
a = "Tim"
b = a
a[0] = 'J'
Then a and b both have value "Jim".
However if I change the 3rd line to
a = "Jim"
Then a == "Jim" and b == "Tim".
I assume that means the code I changed created a new reference for a.
So why does changing a letter or changing the entire string make so much difference?
Follow-up question: Does Java work the same way?
Thank you.

The single thing to learn here is the difference between assignment and method call.
a = 'Jim'
is an assignment. You create a new string object (literal 'Jim') and assign it to variable a.
On the other side,
a[0] = 'J'
is a method call on an object already referenced by the variable a. A method call can't replace the object referenced by the variable with another one, it can just change the internal state of the object, and/or return another object.
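The distinction can be sketched with two small helper methods. The names mutate and rebind are made up for this example.

```ruby
# Mutates the object it was given; every variable referencing it sees the change.
def mutate(str)
  str[0] = 'J'
end

# Rebinds only the method's own local variable; the caller's object is untouched.
def rebind(str)
  str = 'Jim'
  str
end

a = +'Tim'   # unary plus returns an unfrozen string, even if literals are frozen
b = a

rebind(a)
p a, b        # both are still "Tim"

mutate(a)
p a, b        # both are now "Jim"
p a.equal?(b) # true: a and b still reference one object
```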

I find that things like this are easiest to figure out using IRB:
>> a = 'Tim'
=> "Tim"
>> a.object_id
=> 2156046480
>> b = a
=> "Tim"
>> b.object_id
=> 2156046480
>> a.object_id == b.object_id
=> true
As you can see a and b have the same object_id, meaning they reference the same object. So when you change one, you change the other. Now assign something new to a:
>> a = 'Jim'
=> "Jim"
>> a.object_id
=> 2156019520
>> b.object_id
=> 2156046480
>> a.object_id == b.object_id
=> false
You made a point to a new object, while b still kept the old reference. Changing either of them now will not change the other one.

When you do a[0] = 'J', you're asking
Change the first character of the object referenced by a (which happens to be the same as b) to 'J'
While when you do a = "Jim", you're assigning an entirely new object reference (the string "Jim") to a. b is unaffected because you're not changing anything in the original reference.

How do I dynamically decide which hash to add a value to?

I have a class that has hashes in various stages of "completion". This is to optimize so that I don't have to keep recreating hashes with root data that I already know. For example this is a counter called @root that would serve as a starting point.
{3=>4, 4=>1, 10=>3, 12=>5, 17=>1}
and it took key+key+key+key+key number of iterations to create @root. But now I have all combinations of [x,y] left to be added to the counter and individually evaluated. So I could do it like:
a = (1..52).to_a
a.combination(2) {|x,y|
  evaluate(x,y)
}
But instead I would like to do this:
a.each{|x|
evaluate(x, "foo")
a.each {|y| evaluate(y, "bar")}
}
Where I have a method like this to keep track of the hash at each state:
def evaluate index, hsh
  case hsh
  when "root"
    @root.key?(index) ? @root[index] += 1 : @root[index] = 1
  when "foo"
    @foo = @root.clone
    @foo.key?(index) ? @foo[index] += 1 : @foo[index] = 1
  when "bar"
    @bar = @foo.clone
    @bar.key?(index) ? @bar[index] += 1 : @bar[index] = 1
  end
end
But there is a lot of repetition in this method. Is there a way that I could do this dynamically without using eval?

Instead of using hsh as a string descriptor, you can pass the hash object directly as a parameter to your evaluate method. E.g. instead of evaluate(x, "foo") you write
@foo = @root.clone
evaluate(x, @foo)
Also note that the @root.clone in your code overwrites the instance variable on every call inside the loop.
Additionally, if you give your hash a default value, you save quite a bit of logic in your code. E.g. the lines
h = Hash.new{0}
...
h[index] += 1
will treat the value as zero if none was set for index. Thus you do not have to take care of that special case inside your evaluate method.
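Putting both suggestions together, a minimal sketch might look like this. The helper name bump and the sample counts are assumptions for illustration, not part of the question.

```ruby
# Hypothetical helper: copy a counter hash and bump one index in the copy.
def bump(counter, index)
  copy = counter.dup   # dup keeps the default value (0) of the original hash
  copy[index] += 1     # no key? check needed thanks to the default value
  copy
end

root = Hash.new(0)
[3, 3, 4, 10].each { |i| root[i] += 1 }

foo = bump(root, 12)   # root's counts plus one count for 12
bar = bump(foo, 3)     # foo's counts plus one more count for 3

puts root[3]   # 2 (root itself is unchanged)
puts bar[3]    # 3
puts bar[12]   # 1
```

Since the hash is passed in directly, the same helper serves every stage and no case statement or eval is needed.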

What is this Hash-like/Tree-like Construct Called?

I want to create a "Config" class that acts somewhere between a hash and a tree. It's just for storing global values, which can have a context.
Here's how I use it:
Config.get("root.parent.child_b") #=> "value"
Here's what the class might look like:
class Construct
def get(path)
# split path by "."
# search tree for nodes
end
def set(key, value)
# split path by "."
# create tree node if necessary
# set tree value
end
def tree
{
:root => {
:parent => {
:child_a => "value",
:child_b => "another value"
},
:another_parent => {
:something => {
:nesting => "goes on and on"
}
}
}
}
end
end
Is there a name for this kind of thing, somewhere between Hash and Tree (not a Computer Science major)? Basically a hash-like interface to a tree.
Something that outputs like this:
t = TreeHash.new
t.set("root.parent.child_a", "value")
t.set("root.parent.child_b", "another value")
desired output format:
t.get("root.parent.child_a") #=> "value"
t.get("root") #=> {"parent" => {"child_a" => "value", "child_b" => "another value"}}
instead of this:
t.get("root") #=> nil
or this (which you get the value from by calling {}.value)
t.get("root") #=> {"parent" => {"child_a" => {}, "child_b" => {}}}

You can implement one in no time:
class TreeHash < Hash
  attr_accessor :value
  def initialize
    block = Proc.new {|h,k| h[k] = TreeHash.new(&block)}
    super &block
  end
  def get(path)
    find_node(path).value
  end
  def set(path, value)
    find_node(path).value = value
  end
  private
  def find_node(path)
    path.split('.').inject(self){|h,k| h[k]}
  end
end
You could improve the implementation by making the Hash methods you don't need private, but it already works the way you wanted. The data is stored in a hash, so you can easily convert it to YAML.
EDIT:
To meet the further expectations (and to make to_yaml work properly by default), you should use this modified version:
class TreeHash < Hash
  def initialize
    block = Proc.new {|h,k| h[k] = TreeHash.new(&block)}
    super &block
  end
  def get(path)
    path.split('.').inject(self){|h,k| h[k]}
  end
  def set(path, value)
    path = path.split('.')
    leaf = path.pop
    path.inject(self){|h,k| h[k]}[leaf] = value
  end
end
This version is a slight trade-off, as you cannot store values in non-leaf nodes.
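A short usage sketch of the second version, checked against the output format the question asks for. The initializer is written slightly differently here (a block passed straight to super()), which is an assumption of this sketch rather than the answer's exact code.

```ruby
class TreeHash < Hash
  def initialize
    # Missing keys are filled with a nested TreeHash on first access.
    super() { |hash, key| hash[key] = TreeHash.new }
  end

  def get(path)
    path.split('.').inject(self) { |node, key| node[key] }
  end

  def set(path, value)
    keys = path.split('.')
    leaf = keys.pop
    keys.inject(self) { |node, key| node[key] }[leaf] = value
  end
end

t = TreeHash.new
t.set('root.parent.child_a', 'value')
t.set('root.parent.child_b', 'another value')

p t.get('root.parent.child_a')   # "value"
expected = { 'parent' => { 'child_a' => 'value', 'child_b' => 'another value' } }
p t.get('root') == expected      # true
```

Note that get on a path that was never set creates empty nodes along the way, because the default block stores a new TreeHash for every missing key.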

I think the name for the structure is really a nested hash, and the code in the question is a reinvention of javascript's dictionaries. Since a dictionary in JS (or Python or ...) can be nested, each value can be another dictionary, which has its own key/val pairs. In javascript, that's all an object is.
And the best bit is being able to use JSON to define it neatly, and pass it around:
tree : {
'root' : {
'parent' : {
'child_a' : "value",
'child_b' : "another value"
},
'another_parent' : {
'something' : {
'nesting' : "goes on and on"
}
}
}
};
In JS you can then do tree.root.parent.child_a.
This answer to another question suggests using the Hashie gem to convert JSON objects into Ruby objects.

I think this resembles a TreeMap data structure similar to the one in Java described here. It does the same thing (key/value mappings) but retrieval might be different since you are using the nodes themselves as the keys. Retrieval from the TreeMap described is abstracted from the implementation since, when you pass in a key, you don't know the exact location of it in the tree.
Hope that makes sense!

Er... it can certainly be done, using a hierarchical hash table, but why do you need the hierarchy? If you only need exactly-matching get and put, why can't you just make a single hash table that happens to use a dot-separated naming convention?
That's all that's needed to implement the functionality you've asked for, and it's obviously very simple...
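A minimal sketch of that flat approach, with a made-up class name (FlatConfig):

```ruby
# Hypothetical FlatConfig: a single Hash whose keys are full dot-separated paths.
class FlatConfig
  def initialize
    @values = {}
  end

  def set(path, value)
    @values[path] = value
  end

  # Exact-match lookup only; a prefix such as "root" returns nil.
  def get(path)
    @values[path]
  end
end

config = FlatConfig.new
config.set('root.parent.child_b', 'another value')
puts config.get('root.parent.child_b')   # another value
p config.get('root')                     # nil
```

The nil for 'root' is exactly the case the question wants to avoid, so this only fits if subtree lookups are not needed.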

Why use a hash-like interface at all? Why not use chaining of methods to navigate your tree? For example config.root.parent.child_b and use instance methods and if needed method_missing() to implement them?
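One way the method-chaining idea could look, as a sketch only. ConfigNode is a hypothetical name, and the data is assumed to be a nested Hash with symbol keys.

```ruby
# Hypothetical ConfigNode: wraps a nested Hash and exposes its keys as methods.
class ConfigNode
  def initialize(data)
    @data = data
  end

  def method_missing(name, *args)
    # Fall back to the normal NoMethodError for unknown keys or calls with arguments.
    return super unless args.empty? && @data.key?(name)

    value = @data[name]
    value.is_a?(Hash) ? ConfigNode.new(value) : value
  end

  # Keep respond_to? consistent with method_missing.
  def respond_to_missing?(name, include_private = false)
    @data.key?(name) || super
  end
end

config = ConfigNode.new({ root: { parent: { child_b: 'another value' } } })
puts config.root.parent.child_b   # another value
```

Keys that collide with existing Object methods (for example :class or :hash) would not reach method_missing, which is one reason to prefer explicit get and set in some cases.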
