Ruby - Iterating over a Nested Hash and counting values - ruby

Despite there being many questions on nested hashes I've not found any solutions to my problem.
I'm pulling in strings and matching each character to a hash like so:
numberOfChars = {}
string.each_char do |c|
RULES.each do |key, value|
if c === key
numberOfChars[value] += 1
end
end
end
This worked fine and would output something like "a appears 3 times" until I realised that my hash needed to be nested, akin to this:
RULES = {
:limb {
:colour {
'a' => 'foo',
'b' => 'bar'
},
'c' => 'baz'
}
}
So how would I go about getting the 'leaf' key and it's value?
While it's iterating over the hash it also needs to count how many times each key appears, e.g. does 'a' appear more than 'b'? If so add a to new hash. But I'm pretty lost as to how that would work in practice without knowing how it'll be iterating over a nested hash to begin with.
It just seems to me an overly convoluted way of doing this but if anyone's got any pointers they'd be hugely appreciated!
Also, if it's not painfully clear already I'm new to Ruby so I'm probably making some fundamental mistakes.

Are you looking for something like this?
RULES = {
:limb => {
:colour => {
'a' => 'foo',
'b' => 'bar'
},
'c' => 'baz',
:color => {
'b' => 'baz'
}
}
}
def count_letters(hash, results = {})
hash.each do |key, value|
if value.kind_of?(Hash)
count_letters(value, results)
else
results[key] = (results[key] || 0) + 1
end
end
results
end
p count_letters(RULES)

Related

Manipulate hash in Ruby

I have a hash that looks like
{
"lt"=>"456",
"c"=>"123",
"system"=>{"pl"=>"valid-player-name", "plv"=>"player_version_1"},
"usage"=>{"trace"=>"1", "cq"=>"versionid", "stream"=>"od",
"uid"=>"9", "pst"=>[["0", "1", "10"]], "dur"=>"0", "vt"=>"2"}
}
How can I go about turning it into a hash that looks like
{
"lt"=>"456",
"c"=>"123",
"pl"=>"valid-player-name",
"plv"=>"player_version_1",
"trace"=>"1",
"cq"=>"versionid",
"stream"=>"od",
"uid"=>"9",
"pst"=>[["0", "1", "10"]], "dur"=>"0", "vt"=>"2"
}
I basically want to get rid of the keys system and usage and keep what's nested inside them
"Low-tech" version :)
h = { ... }
h.merge!(h.delete('system'))
h.merge!(h.delete('usage'))
Assuming no rails:
hash.reject { |key, _| %w(system usage).include? key }.merge(hash['system']).merge(hash['usage'])
With active support:
hash.except('system', 'usage').merge(hash['system']).merge(hash['usage'])
A more generic version.
Merge any key that contains a hash:
h = { ... }
hnew = h.inject(h.dup) { |h2, (k, v)|
h2.merge!(h2.delete(k)) if v.is_a?(Hash)
h2
}
Assuming that your data has the same structure each time, I might opt for something simple and easy to understand like this:
def manipulate_hash(h)
{
"lt" => h["lt"],
"c" => h["c"],
"pl" => h["system"]["pl"],
"plv" => h["system"]["plv"],
"trace" => h["usage"]["trace"],
"cq" => h["usage"]["cq"],
"stream" => h["usage"]["stream"],
"uid" => h["uid"],
"pst" => h["pst"],
"dur" => h["dur"],
"vt" => h["vt"]
}
end
I chose to make the hash using one big hash literal expression that spans multiple lines. If you don't like that, you could build it up on multiple lines like this:
def manipulate_hash
r = {}
r["lt"] = h["lt"]
r["c"] = h["c"]
...
r
end
You might consider using fetch instead of the [] angle brackets. That way, you'll get an exception if the expected key is missing from the hash. For example, replace h["lt"] with h.fetch("lt").
If you plan to have an arbitrarily large list of keys to merge, this is an easily scaleable method:
["system", "usage"].each_with_object(myhash) do |key|
myhash.merge!(myhash.delete(key))
end

Ruby hash vivification weirdness

I'm running ruby 2.2.2:
$ ruby -v
ruby 2.2.2p95 (2015-04-13 revision 50295) [x86_64-linux]
Here I am initializing a hash with one key :b that has a value of Hash.new({})
irb(main):001:0> a = { b: Hash.new({}) }
=> {:b=>{}}
Now, I'm going to attempt to auto-vivify another hash at a[:b][:c] with a key 'foo' and a value 'bar'
irb(main):002:0> a[:b][:c]['foo'] = 'bar'
=> "bar"
At this point, I expected that a would contain something like:
{ :b => { :c => { 'foo' => 'bar' } } }
However, that is not what I'm seeing:
irb(main):003:0> a
=> {:b=>{}}
irb(main):004:0> a[:b]
=> {}
irb(main):005:0> a[:b][:c]
=> {"foo"=>"bar"}
This differs from the following:
irb(main):048:0> a = { :b => { :c => { "foo" => "bar" } } }
=> {:b=>{:c=>{"foo"=>"bar"}}}
irb(main):049:0> a
=> {:b=>{:c=>{"foo"=>"bar"}}}
So what is going on here?
I suspect this is something to do with Hash.new({}) returning a default value of {}, but I'm not exactly sure how to explain the end result...
Apologies for answering my own question, but I figured out what is happening.
The answer here is that we are assigning into the default hash being returned by a[:b], NOT a[:b] directly.
As before, we're going to create a hash with a single key of b and a value of Hash.new({})
irb(main):068:0> a = { b: Hash.new({}) }
=> {:b=>{}}
As you might expect, this should make things like a[:b][:unknown_key] return an empty hash {}, like so:
irb(main):070:0> a[:b].default
=> {}
irb(main):071:0> a[:b][:unknown_key]
=> {}
irb(main):072:0> a[:b].object_id
=> 70127981905400
irb(main):073:0> a[:b].default.object_id
=> 70127981905420
Notice that the object_id for a[:b] is ...5400 while the object_id for a[:b].default is ...5420
So what happens when we do the assignment from the original question?
a[:b][:c]["foo"] = "bar"
First, a[:b][:c] is resolved:
irb(main):075:0> a[:b][:c].object_id
=> 70127981905420
That's the same object_id as the .default object, because :c is treated the same as :unknown_key from above!
Then, we assign a new key 'foo' with a value 'bar' into that hash.
Indeed, check it out, we've effectively altered the default instead of a[:b]:
irb(main):081:0> a[:b].default
=> {"foo"=>"bar"}
Oops!
The answer is probably not as esoteric as it might seem at the onset, but this is just the way Ruby is handling that Hash.
If your initial Hash is the:
a = { b: Hash.new({}) }
b[:b][:c]['foo'] = 'bar'
Then seeing that each 'layer' of the Hash is just referencing the next element, such that:
a # {:b=>{}}
a[:b] # {}
a[:b][:c] # {"foo"=>"bar"}
a[:b][:c]["foo"] # "bar"
Your idea of:
{ :b => { :c => { 'foo' => 'bar' } } }
Is somewhat already accurate, so it makes me think that you already understand what's happening, but felt unsure of what was happening due to the way IRB was perhaps displaying it.
If I'm missing some element of your question though, feel free to comment and I'll revise my answer. But I feel like you understand Hashes better than you're giving yourself credit for in this case.

Parslet subtree doesn't fire

Resume (I shrinked down the following long story to the simple problem)
tree = {:properties => [{:a => 'b'}, {:c => 'd'}]}
big_tree = {:properties => [{:a => 'b'}, {:c => 'd'}], :moves => [{:a => 'b'}, {:c => 'd'}]}
trans = Parslet::Transform.new do
rule(:properties => subtree(:nested)) do
out = {}
nested.each {|pair| out = out.merge pair}
{:properties => out}
end
end
pp tree
pp trans.apply(tree)
pp big_tree
pp trans.apply(big_tree)
# OUTPUT
{:properties=>[{:a=>"b"}, {:c=>"d"}]}
{:properties=>{:a=>"b", :c=>"d"}} # Worked with small tree
{:properties=>[{:a=>"b"}, {:c=>"d"}], :moves=>[{:a=>"b"}, {:c=>"d"}]}
{:properties=>[{:a=>"b"}, {:c=>"d"}], :moves=>[{:a=>"b"}, {:c=>"d"}]} # Didn't work with bigger tree
=========================FULL STORY (Not so relevant after the header)
I am making an SGF files parser with Parslet.
Now I am at the stage to make a Transformer.
From the Parser I already get the structs like that:
[{:properties=>
[{:name=>"GM"#2, :values=>[{:value=>"1"#5}]},
{:name=>"FF"#7, :values=>[{:value=>"4"#10}]},
{:name=>"SZ"#12, :values=>[{:value=>"19"#15}]},
{:name=>"AP"#18, :values=>[{:value=>"SmartGo Kifu:2.2"#21}]},
{:name=>"GN"#40, :values=>[{:value=>"2013-05-11g"#43}]},
{:name=>"PW"#57, :values=>[{:value=>"Dahan"#60}]},
{:name=>"PB"#68, :values=>[{:value=>"SmartGo"#71}]},
{:name=>"DT"#81, :values=>[{:value=>"2013-05-11"#84}]},
{:name=>"KM"#97, :values=>[{:value=>"6.5"#100}]},
{:name=>"RE"#106, :values=>[{:value=>"W+R"#109}]},
{:name=>"RU"#115, :values=>[{:value=>"AGA (Area)"#118}]},
{:name=>"ID"#129, :values=>[{:value=>"ch0"#132}]}],
:moves=>
[{:player=>"B"#137, :place=>"oq"#139},
{:player=>"W"#143, :place=>"dd"#145},
{:player=>"B"#149, :place=>"oo"#151},
...etc...
The ruleset I am using to Transform:
# Rewrite player: COLOR, place: X to COLOR: X
rule( player: simple(:p), place: simple(:pl)) do
if p == 'W'
{ white: pl }
elsif p == 'B'
{ black: pl }
end
end
# Un-nest single-value hash
rule( value: simple(:v)) { v }
# Rewrite name: KEY, values: SINGLE_VALUE to KEY: SINGLE_VALUE
rule( name: simple(:n), values: [ simple(:v) ]) { {n.to_sym => v} }
# A Problem!!!
rule( properties: subtree(:props) ) do
out = {}
props.each {|pair| pair.each {|k, v| out[k] = v}}
{ properties: out }
end
With such rules I get the following struct:
[{:properties=>
[{:GM=>"1"#5},
{:FF=>"4"#10},
{:SZ=>"19"#15},
{:AP=>"SmartGo Kifu:2.2"#21},
{:GN=>"2013-05-11g"#43},
{:PW=>"Dahan"#60},
{:PB=>"SmartGo"#71},
{:DT=>"2013-05-11"#84},
{:KM=>"6.5"#100},
{:RE=>"W+R"#109},
{:RU=>"AGA (Area)"#118},
{:ID=>"ch0"#132}],
:moves=>
[{:black=>"oq"#139},
{:white=>"dd"#145},
{:black=>"oo"#151},
...etc...
Everything is perfect.
The only Problem of mine is that :properties Array of Hashes.
In the end I want to have
[{:properties=>
{:GM=>"1"#5,
:FF=>"4"#10,
:SZ=>"19"#15,
:AP=>"SmartGo Kifu:2.2"#21,
:GN=>"2013-05-11g"#43,
:PW=>"Dahan"#60,
:PB=>"SmartGo"#71,
:DT=>"2013-05-11"#84,
:KM=>"6.5"#100,
:RE=>"W+R"#109,
:RU=>"AGA (Area)"#118,
:ID=>"ch0"#132},
:moves=>
[{:black=>"oq"#139},
{:white=>"dd"#145},
{:black=>"oo"#151},
...etc...
You see? Merge all arrayed hashes inside :properties, because after the previous transformations they now have unique keys. Also flatten the struct a bit.
Hey! I can do it manually. I mean to run a separate method like
merged_stuff = {}
tree.first[:properties].each {|pair| pair.each {|k, v| merged_stuff[k] = v}}
tree.first[:properties] = merged_stuff
But Why I Cannot Do That With The Neat Transform Rules, To Have All Transformation Logic In One Place?
The point is that rule( properties: subtree(:props) ) does not get fired at all. Even if I just return nil from the block, it doesn't change anything. So, seems, that this subtree doesn't catch the things, or I don't.
The problem is that :properties and :moves are keys in the same hash, and subtree apparently doesn't want to match part of a hash. If you remove :moves, the rule will be executed. It is kinda explained in the documentation:
A word on patterns
Given the PORO hash
{
:dog => 'terrier',
:cat => 'suit' }
one might assume that the following rule matches :dog and replaces it by 'foo':
rule(:dog => 'terrier') { 'foo' }
This is frankly impossible. How would 'foo' live besides :cat => 'suit'
inside the hash? It cannot. This is why hashes are either matched completely,
cats n’ all, or not at all.
though I must admit it's not a really clear example.
So the problem rule should look like this:
rule( properties: subtree(:props), moves: subtree(:m) ) do
out = {}
props.each {|pair| pair.each {|k, v| out[k] = v}}
{ properties: out , moves: m}
end
Transform rules match a whole node and replace it, so you need to match the whole hash, not just one key.
rule( properties: subtree(:props), moves: subtree(:moves) )
If you converted the {:name=>"GM", :values=>[{:value=>"1"}]} type things into objects (using OpenStruct say) then you don't need to use subtree, you can use sequence.

Refactor ruby on rails model

Given the following code,
How would you refactor this so that the method search_word has access to issueid?
I would say that changing the function search_word so it accepts 3 arguments or making issueid an instance variable (#issueid) could be considered as an example of bad practices, but honestly I cannot find any other solution. If there's no solution aside from this, would you mind explaining the reason why there's no other solution?
Please bear in mind that it is a Ruby on Rails model.
def search_type_of_relation_in_text(issueid, type_of_causality)
relation_ocurrences = Array.new
keywords_list = {
:C => ['cause', 'causes'],
:I => ['prevent', 'inhibitors'],
:P => ['type','supersets'],
:E => ['effect', 'effects'],
:R => ['reduce', 'inhibited'],
:S => ['example', 'subsets']
}[type_of_causality.to_sym]
for keyword in keywords_list
relation_ocurrences + search_word(keyword, relation_type)
end
return relation_ocurrences
end
def search_word(keyword, relation_type)
relation_ocurrences = Array.new
#buffer.search('//p[text()*= "'+keyword+'"]/a').each { |relation|
relation_suggestion_url = 'http://en.wikipedia.org'+relation.attributes['href']
relation_suggestion_title = URI.unescape(relation.attributes['href'].gsub("_" , " ").gsub(/[\w\W]*\/wiki\//, ""))
if not #current_suggested[relation_type].include?(relation_suggestion_url)
if #accepted[relation_type].include?(relation_suggestion_url)
relation_ocurrences << {:title => relation_suggestion_title, :wiki_url => relation_suggestion_url, :causality => type_of_causality, :status => "A", :issue_id => issueid}
else
relation_ocurrences << {:title => relation_suggestion_title, :wiki_url => relation_suggestion_url, :causality => type_of_causality, :status => "N", :issue_id => issueid}
end
end
}
end
If you need additional context, pass it through as an additional argument. That's how it's supposed to work.
Setting #-type instance variables to pass context is bad form as you've identified.
There's a number of Ruby conventions you seem to be unaware of:
Instead of Array.new just use [ ], and instead of Hash.new use { }.
Use a case statement or a constant instead of defining a Hash and then retrieving only one of the elements, discarding the remainder.
Avoid using return unless strictly necessary, as the last operation is always returned by default.
Use array.each do |item| instead of for item in array
Use do ... end instead of { ... } for multi-line blocks, where the curly brace version is generally reserved for one-liners. Avoids confusion with hash declarations.
Try and avoid duplicating large chunks of code when the differences are minor. For instance, declare a temporary variable, conditionally manipulate it, then store it instead of defining multiple independent variables.
With that in mind, here's a reworking of it:
KEYWORDS = {
:C => ['cause', 'causes'],
:I => ['prevent', 'inhibitors'],
:P => ['type','supersets'],
:E => ['effect', 'effects'],
:R => ['reduce', 'inhibited'],
:S => ['example', 'subsets']
}
def search_type_of_relation_in_text(issue_id, type_of_causality)
KEYWORDS[type_of_causality.to_sym].collect do |keyword|
search_word(keyword, relation_type, issue_id)
end
end
def search_word(keyword, relation_type, issue_id)
relation_occurrences = [ ]
#buffer.search(%Q{//p[text()*= "#{keyword}'"]/a}).each do |relation|
relation_suggestion_url = "http://en.wikipedia.org#{relation.attributes['href']}"
relation_suggestion_title = URI.unescape(relation.attributes['href'].gsub("_" , " ").gsub(/[\w\W]*\/wiki\//, ""))
if (!#current_suggested[relation_type].include?(relation_suggestion_url))
occurrence = {
:title => relation_suggestion_title,
:wiki_url => relation_suggestion_url,
:causality => type_of_causality,
:issue_id => issue_id
}
occurrence[:status] =
if (#accepted[relation_type].include?(relation_suggestion_url))
'A'
else
'N'
end
relation_ocurrences << occurrence
end
end
relation_occurrences
end

Bidirectional Hash table in Ruby

I need a bidirectional Hash table in Ruby. For example:
h = {:abc => 123, :xyz => 789, :qaz => 789, :wsx => [888, 999]}
h.fetch(:xyz) # => 789
h.rfetch(123) # => abc
h.rfetch(789) # => [:xyz, :qaz]
h.rfetch(888) # => :wsx
Method rfetch means reversed fetch and is only my proposal.
Note three things:
If multiple keys map at the same value then rfetch returns all of them, packed in array.
If value is an array then rfetch looks for its param among elements of the array.
Bidirectional Hash means that both fetch and rfetch should execute in constant time.
Does such structure exists in Ruby (including external libraries)?
I thought about implementing it using two one-directional Hashes synchronized when one of them is modified (and packing it into class to avoid synchronization problems) but maybe I could use an already existing solution?
You could build something yourself pretty easily, just use a simple object that wraps two hashes (one for the forward direction, one for the reverse). For example:
class BiHash
def initialize
#forward = Hash.new { |h, k| h[k] = [ ] }
#reverse = Hash.new { |h, k| h[k] = [ ] }
end
def insert(k, v)
#forward[k].push(v)
#reverse[v].push(k)
v
end
def fetch(k)
fetch_from(#forward, k)
end
def rfetch(v)
fetch_from(#reverse, v)
end
protected
def fetch_from(h, k)
return nil if(!h.has_key?(k))
v = h[k]
v.length == 1 ? v.first : v.dup
end
end
Look ups will behave just like normal hash lookups (because they are normal hash lookups). Add some operators and maybe decent to_s and inspect implementations and you're good.
Such a thing works like this:
b = BiHash.new
b.insert(:a, 'a')
b.insert(:a, 'b')
b.insert(:a, 'c')
b.insert(:b, 'a')
b.insert(:c, 'x')
puts b.fetch(:a).inspect # ["a", "b", "c"]
puts b.fetch(:b).inspect # "a"
puts b.rfetch('a').inspect # [:a, :b]
puts b.rfetch('x').inspect # :c
puts b.fetch(:not_there).inspect # nil
puts b.rfetch('not there').inspect # nil
There's nothing wrong with building your tools when you need them.
There is no such structure built-in in Ruby.
Note that Hash#rassoc does something similar, but it returns only the first match and is linear-time:
h = {:abc => 123, :xyz => 789, :qaz => 789, :wsx => [888, 999]}
h.rassoc(123) # => [:abc, 123]
Also, it isn't possible to fullfill your requirements in Ruby in a perfectly safe manner, as you won't be able to detect changes in values that are arrays. E.g.:
h = MyBidirectionalArray.new(:foo => 42, :bar => [:hello, :world])
h.rfetch(:world) # => :bar
h[:bar].shift
h[:bar] # => [:world]
h.rfetch(:world) # => should be nil, but how to detect this??
Computing a hash everytime to detect a change will make your lookup linear-time. You could duplicate the array-values and freeze them, though (like Ruby does for Hash keys that are strings!)
What you seem to need is a Graph class, which could have a different API than a Hash, no? You can check out rgl or similar, but I don't know how they're implemented.
Good luck.
There is a Hash#invert method (http://www.ruby-doc.org/core-2.1.0/Hash.html#method-i-invert) to achieve this. It won't map multiple values to an array though.
Try this:
class Hash
def rfetch val
select { |k,v| v.is_a?(Array) ? v.include?(val) : v == val }.map { |x| x[0] }
end
end
If you're not doing lots of updates to this hash, you might be able to use inverthash.

Resources