What's the best way to convert a dot-notation path (or even an array of strings) into a nested hash? For example, I need to turn the key 'foo.bar.baz' with the value 'qux' into this:
{
'foo' => {
'bar' => {
'baz' => 'qux'
}
}
}
I've done this in PHP, but I managed that by creating a key in the array and then setting a tmp variable to that array key's value by reference so any changes would also take place in the array.
Try this
f = "root/sub-1/sub-2/file"
f.split("/").reverse.inject{|a,n| {n=>a}} #=>{"root"=>{"sub-1"=>{"sub-2"=>"file"}}}
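The one-liner above treats the last path segment as the value. For the dotted-key case in the question, where the value is supplied separately, the same reverse-and-inject idea could be sketched as follows. The method name nest_path and its separator parameter are made up for this illustration.

```ruby
# Hypothetical helper: builds a nested hash from a path and a separate value.
# `path` may be a dotted string or an array of keys; neither is modified.
def nest_path(path, value, separator = ".")
  keys = path.is_a?(Array) ? path : path.to_s.split(separator)
  # Start from the value and wrap it in one hash per key, innermost key first.
  keys.reverse.inject(value) { |inner, key| { key => inner } }
end

from_string = nest_path("foo.bar.baz", "qux")
# from_string => {"foo"=>{"bar"=>{"baz"=>"qux"}}}
from_array = nest_path(%w[foo bar], 1)
# from_array => {"foo"=>{"bar"=>1}}
```

Unlike a version built on arr.shift, this leaves the caller's array untouched.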
I'd probably use recursion. For example:
def hasherizer(arr, value)
  if arr.empty?
    value
  else
    {}.tap do |hash|
      hash[arr.shift] = hasherizer(arr, value)
    end
  end
end
This results in:
> hasherizer 'foo.bar.baz'.split('.'), 'qux'
=> {"foo"=>{"bar"=>{"baz"=>"qux"}}}
I like the method below, which operates on the hash itself (define it in Hash or in your own hash class). It creates new keys or reuses existing ones in the hash to add or update the value.
# set a new or existing nested key's value by a dotted-string key
def dotkey_set(dottedkey, value, deep_hash = self)
  keys = dottedkey.to_s.split('.')
  first = keys.first
  if keys.length == 1
    deep_hash[first] = value
  else
    # in the case that we are creating a hash from a dotted key, we'll assign a default
    deep_hash[first] = (deep_hash[first] || {})
    dotkey_set(keys.slice(1..-1).join('.'), value, deep_hash[first])
  end
end
Usage:
hash = {}
hash.dotkey_set('how.are.you', 'good')
# => "good"
hash
# => {"how"=>{"are"=>{"you"=>"good"}}}
hash.dotkey_set('how.goes.it', 'fine')
# => "fine"
hash
# => {"how"=>{"are"=>{"you"=>"good"}, "goes"=>{"it"=>"fine"}}}
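For hash.dotkey_set to work as in the usage above, the method has to be defined on Hash itself or on a Hash subclass. Here is a minimal sketch using a subclass, with an iterative walk instead of re-joining and re-splitting the key on every call. DotHash is a hypothetical class name, and like the original it assumes that any existing intermediate values are hashes.

```ruby
# Hypothetical Hash subclass used only for this illustration.
class DotHash < Hash
  # Sets a new or existing nested key's value by a dotted-string key.
  def dotkey_set(dottedkey, value)
    *parents, last = dottedkey.to_s.split(".")
    # Walk down the tree, creating empty hashes for missing levels.
    node = parents.inject(self) { |hash, key| hash[key] ||= {} }
    node[last] = value
  end
end

hash = DotHash.new
first_result = hash.dotkey_set("how.are.you", "good")
hash.dotkey_set("how.goes.it", "fine")
# hash => {"how"=>{"are"=>{"you"=>"good"}, "goes"=>{"it"=>"fine"}}}
```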
I did something similar when I wrote an HTTP server that had to move all the parameters passed in the request into a multiple value hash which might contain arrays or strings or hashes...
You can look at the code for the Plezi server and framework... although the code over there deals with values surrounded with []...
It could possibly be adjusted like so:
def add_param_to_hash param_name, param_value, target_hash = {}
begin
a = target_hash
p = param_name.split(/[\/\.]/)
val = param_value
# the following, somewhat complex line, runs through the existing (?) tree, making sure to preserve existing values and add values where needed.
p.each_index { |i| p[i].strip! ; n = p[i].match(/^[0-9]+$/) ? p[i].to_i : p[i].to_sym ; p[i+1] ? [ ( a[n] ||= ( p[i+1].empty? ? [] : {} ) ), ( a = a[n]) ] : ( a.is_a?(Hash) ? (a[n] ? (a[n].is_a?(Array) ? (a[n] << val) : a[n] = [a[n], val] ) : (a[n] = val) ) : (a << val) ) }
rescue Exception => e
warn "(Silent): parameters parse error for #{param_name} ... maybe conflicts with a different set?"
target_hash[param_name] = param_value
end
end
This should preserve existing values while adding new values if they exist.
The long line looks something like this when broken down:
def add_param_to_hash param_name, param_value, target_hash = {}
  begin
    # a will hold the object to which we add.
    # As we walk the tree we change `a`. We start at the root...
    a = target_hash
    p = param_name.split(/[\/\.]/)
    val = param_value
    # the following loop runs through the existing (?) tree, making sure to preserve existing values and add values where needed.
    p.each_index do |i|
      p[i].strip!
      # converts the current key string to either numbers or symbols... you might want to replace this with: n=p[i]
      n = p[i].match(/^[0-9]+$/) ? p[i].to_i : p[i].to_sym
      if p[i+1]
        a[n] ||= ( p[i+1].empty? ? [] : {} ) # is the new object we'll add to
        a = a[n] # move to the next branch.
      else
        if a.is_a?(Hash)
          if a[n]
            if a[n].is_a?(Array)
              a[n] << val # append to the array already stored under this key
            else
              a[n] = [a[n], val]
            end
          else
            a[n] = val
          end
        else
          a << val
        end
      end
    end
  rescue Exception => e
    warn "(Silent): parameters parse error for #{param_name} ... maybe conflicts with a different set?"
    target_hash[param_name] = param_value
  end
end
Brrr... Looking at the code like this, I wonder what I was thinking...
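For comparison, here is a deliberately simplified sketch of the same idea: walk the path, create missing levels, and collect repeated keys into an array. It is not the Plezi code. The name add_param is hypothetical, keys stay strings, numeric array indices are not handled, and it assumes every intermediate value is a hash.

```ruby
# Hypothetical, simplified variant: string keys only, no array indices.
def add_param(target, name, value)
  *parents, last = name.split(/[\/.]/).map(&:strip)
  # Walk (or create) the intermediate hashes.
  node = parents.inject(target) { |hash, key| hash[key] ||= {} }
  if !node.key?(last)
    node[last] = value                # first value for this key
  elsif node[last].is_a?(Array)
    node[last] << value               # already a list: append
  else
    node[last] = [node[last], value]  # second value: promote to a list
  end
  target
end

params = {}
add_param(params, "user.name", "sid")
add_param(params, "user.name", "john")
add_param(params, "user/age", "30")
# params => {"user"=>{"name"=>["sid", "john"], "age"=>"30"}}
```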
I am trying to loop over an array which might look like the following:
names = ['sid','john'] #this array will be dynamic, The values keep changing
I am trying to write a method that defines an empty hash, loops over the array using .each, and stores the values in the hash, but it is not working.
def add_address
  names = ['sid','john']
  addr_arr = {}
  names.each do |n|
    addr_arr['name'] = n
  end
  addr_arr
end
this returns only {"name"=>"john"}.
What am I doing wrong?
The problem with your implementation is that there's only one hash and each time you set a value for the "name" key, the previous value for that key will be deleted and replaced by the new value.
I see addr_arr has arr in the name, so I assume you wanted something like this:
def add_address
  names = ['sid','john']
  addr_arr = []
  names.each do |n|
    addr_arr << { "name" => n }
  end
  addr_arr
end
add_address
#=> [{"name"=>"sid"}, {"name"=>"john"}]
or shorter:
['sid','john'].map{ |name| {"name" => name} }
#=> [{"name"=>"sid"}, {"name"=>"john"}]
If you always use the key 'name', you overwrite its value every time, which I don't think is what you want. I don't know if the following is what you want either, but it should be enough to understand the problem:
names.each do |n|
  addr_arr[n] = n
end
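If the goal really is a single hash rather than an array of hashes, each name needs its own key, or the one key needs to hold a list. Here is a small sketch of both shapes. The key names are assumptions made for illustration, not something the question specifies.

```ruby
names = ['sid', 'john']

# One key holding every name.
all_names = { 'name' => names.dup }
# all_names => {"name"=>["sid", "john"]}

# One distinct key per name, built from its position in the array.
indexed = names.each_with_index.map { |n, i| ["name_#{i}", n] }.to_h
# indexed => {"name_0"=>"sid", "name_1"=>"john"}
```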
If I have a string like this
str =<<END
7312357006,1.121
3214058234,3456
7312357006,1234
1324958723,232.1
3214058234,43.2
3214173443,234.1
6134513494,23.2
7312357006,11.1
END
If a number in the first value shows up again, I want to add their second values together. So the final string would look like this
7312357006,1246.221
3214058234,3499.2
1324958723,232.1
3214173443,234.1
6134513494,23.2
If the final output is an array that's fine too.
There are lots of ways to do this in Ruby. One particularly terse way is to use String#scan:
str = <<END
7312357006,1.121
3214058234,3456
7312357006,1234
1324958723,232.1
3214058234,43.2
3214173443,234.1
6134513494,23.2
7312357006,11.1
END
data = Hash.new(0)
str.scan(/(\d+),([\d.]+)/) {|k,v| data[k] += v.to_f }
p data
# => { "7312357006" => 1246.221,
# "3214058234" => 3499.2,
# "1324958723" => 232.1,
# "3214173443" => 234.1,
# "6134513494" => 23.2 }
This uses the regular expression /(\d+),([\d.]+)/ to extract the two values from each line. The block is called with each pair as arguments, which are then merged into the hash.
This could also be written as a single expression using each_with_object:
data = str.scan(/(\d+),([\d.]+)/)
.each_with_object(Hash.new(0)) {|(k,v), hsh| hsh[k] += v.to_f }
# => (same as above)
There are likewise many ways to print the result, but here are a couple I like:
puts data.map {|kv| kv.join(",") }.join("\n")
# => 7312357006,1246.221
# 3214058234,3499.2
# 1324958723,232.1
# 3214173443,234.1
# 6134513494,23.2
# or:
puts data.map {|k,v| "#{k},#{v}\n" }.join
# => (same as above)
You can see all of these in action on repl.it.
Edit: Although I don't recommend either of these for the sake of readability, here's more just for kicks (requires Ruby 2.4+):
data = str.lines.group_by {|s| s.slice!(/(\d+),/); $1 }
.transform_values {|a| a.sum(&:to_f) }
...or, going straight to a string:
puts str.lines.group_by {|s| s.slice!(/(\d+),/); $1 }
.map {|k,vs| "#{k},#{vs.sum(&:to_f)}\n" }.join
Since repl.it is stuck on Ruby 2.3: Try it online!
You could achieve this using each_with_object, as below:
str = "7312357006,1.121
3214058234,3456
7312357006,1234
1324958723,232.1
3214058234,43.2
3214173443,234.1
6134513494,23.2
7312357006,11.1"
# convert the string into nested pairs of floats
# to briefly summarise the steps: split entries by newline, strip whitespace, split by comma, convert to floats
arr = str.split("\n").map(&:strip).map { |el| el.split(",").map(&:to_f) }
result = arr.each_with_object(Hash.new(0)) do |el, hash|
hash[el.first] += el.last
end
# => {7312357006.0=>1246.221, 3214058234.0=>3499.2, 1324958723.0=>232.1, 3214173443.0=>234.1, 6134513494.0=>23.2}
# You can then call `to_a` on result if you want:
result.to_a
# => [[7312357006.0, 1246.221], [3214058234.0, 3499.2], [1324958723.0, 232.1], [3214173443.0, 234.1], [6134513494.0, 23.2]]
each_with_object iterates through each pair of data, giving the block access to an accumulator (in this case the hash). With this approach we can add each entry to the hash and sum the values of any key that appears more than once.
Hope that helps - let me know if you've any questions.
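Note that the snippet above turns the identifiers into floats as well (7312357006.0). A possible variation, sketched under the assumption that every line has the form id,number, keeps the ids as strings and sums with Rational so that no floating-point error accumulates. The variable names are illustrative.

```ruby
str = <<~DATA
  7312357006,1.121
  3214058234,3456
  7312357006,1234
  1324958723,232.1
  3214058234,43.2
  3214173443,234.1
  6134513494,23.2
  7312357006,11.1
DATA

totals = Hash.new(0)
str.each_line do |line|
  id, amount = line.strip.split(",", 2)
  next if id.nil? || amount.nil?   # skip blank or malformed lines
  totals[id] += amount.to_r        # String#to_r parses "1.121" as 1121/1000
end

# Convert each exact sum to a float only once, when formatting.
lines = totals.map { |id, sum| "#{id},#{sum.to_f}" }
puts lines
```

The ids stay in first-seen order because Ruby hashes preserve insertion order.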
def combine(str)
  str.each_line.with_object(Hash.new(0)) do |s,h|
    k,v = s.split(',')
    h.update(k=>v.to_f) { |k,o,n| o+n }
  end.reduce('') { |s,kv_pair| s << "%s,%g\n" % kv_pair }
end
puts combine str
7312357006,1246.22
3214058234,3499.2
1324958723,232.1
3214173443,234.1
6134513494,23.2
Notes:
using String#each_line is preferable to str.split("\n") as the former returns an enumerator whereas the latter returns a temporary array. Each element generated by the enumerator is a line of str that (unlike the elements of str.split("\n")) ends with a newline character, but that is of no concern.
see Hash::new, specifically when a default value (here 0) is used. If a hash has been defined h = Hash.new(0) and h does not have a key k, h[k] returns the default value, zero (h is not changed). When Ruby encounters the expression h[k] += 1, the first thing she does is expand it to h[k] = h[k] + 1. If h has been defined with a default value of zero, and h does not have a key k, h[k] on the right of the equality (syntactic sugar¹ for h.[](k)) returns zero.
see Hash#update (aka merge!). h.update(k=>v.to_f) is syntactic sugar for h.update({ k=>v.to_f })
see Kernel#sprintf for explanations of the formatting directives %s and %g.
the receiver for the expression reduce('') { |s,kv_pair| s << "%s,%g\n" % kv_pair } (in the penultimate line), is the following hash.
{"7312357006"=>1246.221, "3214058234"=>3499.2, "1324958723"=>232.1,
"3214173443"=>234.1, "6134513494"=>23.2}
¹ Syntactic sugar is a shortcut allowed by Ruby.
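The two behaviours described in the notes, the Hash.new(0) default and the block form of Hash#update, can be seen in isolation in this short sketch. The keys and numbers are arbitrary illustration values.

```ruby
h = Hash.new(0)

missing = h["x"]         # the default, 0, is returned...
still_empty = h.empty?   # ...but nothing was stored, so this is true

h["a"] += 1              # expands to h["a"] = h["a"] + 1, i.e. 0 + 1

# The block is only called for keys that already exist in h.
h.update("a" => 2.5) { |_key, old, new| old + new }  # "a" exists: 1 + 2.5
h.update("b" => 4.0) { |_key, old, new| old + new }  # "b" is new: stored as-is
# h => {"a"=>3.5, "b"=>4.0}
```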
Implemented this solution as hash was giving me issues:
d = []
s.split("\n").each do |line|
x = 0
q = 0
dup = false
line.split(",").each do |data|
if x == 0 and d.include? data then dup = true ; q = d.index(data) elsif x == 0 then d << data end
if x == 1 and dup == false then d << data end
if x == 1 and dup == true then d[q+1] = "#{'%.2f' % (d[q+1].to_f + data.to_f).to_s}" end
if x == 2 and dup == false then d << data end
x += 1
end
end
s = ""
# d holds alternating ids and totals, so emit them two at a time
d.each_slice(2) do |id, total|
  s << "#{id},#{total}\n"
end
puts(s)
I have a number of ranges that I want to merge together if they overlap. The way I'm currently doing this is by using Sets.
This is working. However, when I attempt the same code with larger ranges, as follows, I get a stack level too deep (SystemStackError).
require 'set'
ranges = [Range.new(73, 856), Range.new(82, 1145), Range.new(116, 2914), Range.new(3203, 3241)]
set = Set.new
ranges.each { |r| set << r.to_set }
set.flatten!
sets_subsets = set.divide { |i, j| (i - j).abs == 1 } # this line causes the error
puts sets_subsets
The line that is failing is taken directly from the Ruby Set Documentation.
I would appreciate it if anyone could suggest a fix or an alternative that works for the above example
EDIT
I have put the full code I’m using here:
Basically it is used to add html tags to an amino acid sequence according to some features.
require 'set'
def calculate_formatting_classes(hsps, signalp)
merged_hsps = merge_ranges(hsps)
sp = format_signalp(merged_hsps, signalp)
hsp_class = (merged_hsps - sp[1]) - sp[0]
rank_format_positions(sp, hsp_class)
end
def merge_ranges(ranges)
set = Set.new
ranges.each { |r| set << r.to_set }
set.flatten
end
def format_signalp(merged_hsps, sp)
sp_class = sp - merged_hsps
sp_hsp_class = sp & merged_hsps # overlap regions between sp & merged_hsp
[sp_class, sp_hsp_class]
end
def rank_format_positions(sp, hsp_class)
results = []
results += sets_to_hash(sp[0], 'sp')
results += sets_to_hash(sp[1], 'sphsp')
results += sets_to_hash(hsp_class, 'hsp')
results.sort_by { |s| s[:pos] }
end
def sets_to_hash(set = nil, cl)
return nil if set.nil?
hashes = []
merged_set = set.divide { |i, j| (i - j).abs == 1 }
merged_set.each do |s|
hashes << { pos: s.min.to_i - 1, insert: "<span class=#{cl}>" }
hashes << { pos: s.max.to_i - 0.1, insert: '</span>' } # for ordering
end
hashes
end
working_hsp = [Range.new(7, 136), Range.new(143, 178)]
not_working_hsp = [Range.new(73, 856), Range.new(82, 1145),
Range.new(116, 2914), Range.new(3203, 3241)]
sp = Range.new(1, 20).to_set
# working
results = calculate_formatting_classes(working_hsp, sp)
# Not Working
# results = calculate_formatting_classes(not_working_hsp, sp)
puts results
Here is one way to do this:
ranges = [Range.new(73, 856), Range.new(82, 1145),
Range.new(116, 2914), Range.new(3203, 3241)]
ranges.size.times do
  ranges = ranges.sort_by(&:begin)
  t = ranges.each_cons(2).to_a
  t.each do |r1, r2|
    if (r2.cover? r1.begin) || (r2.cover? r1.end) ||
       (r1.cover? r2.begin) || (r1.cover? r2.end)
      ranges << Range.new([r1.begin, r2.begin].min, [r1.end, r2.end].max)
      ranges.delete(r1)
      ranges.delete(r2)
      t.delete [r1,r2]
    end
  end
end
p ranges
#=> [73..2914, 3203..3241]
The other answers aren't bad, but I prefer a simple recursive approach:
def merge_ranges(*ranges)
  range, *rest = ranges
  return if range.nil?
  # Find the index of the first range in `rest` that overlaps this one
  other_idx = rest.find_index do |other|
    range.cover?(other.begin) || other.cover?(range.begin)
  end
  if other_idx
    # An overlapping range was found; remove it from `rest` and merge
    # it with this one
    other = rest.slice!(other_idx)
    merged = ([range.begin, other.begin].min)..([range.end, other.end].max)
    # Try again with the merged range and the remaining `rest`
    merge_ranges(merged, *rest)
  else
    # No overlapping range was found; move on
    [ range, *merge_ranges(*rest) ]
  end
end
Note: This code assumes each range is ascending (e.g. 10..5 will break it).
Usage:
ranges = [ 73..856, 82..1145, 116..2914, 3203..3241 ]
p merge_ranges(*ranges)
# => [73..2914, 3203..3241]
ranges = [ 0..10, 5..20, 30..50, 45..80, 50..90, 100..101, 101..200 ]
p merge_ranges(*ranges)
# => [0..20, 30..90, 100..200]
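For long lists, a non-recursive alternative is to sort the ranges by their start and sweep through them once, extending the last merged range whenever the next one overlaps it. This is a sketch of that sort-and-sweep technique, not the code from the answer above. The name merge_overlapping is hypothetical, and it assumes ascending, inclusive ranges with comparable endpoints.

```ruby
# Hypothetical helper: merges overlapping ranges in a single pass after sorting.
def merge_overlapping(ranges)
  ranges.sort_by(&:begin).each_with_object([]) do |range, merged|
    last = merged.last
    if last && range.begin <= last.end
      # Overlaps (or touches) the previous range: extend it.
      merged[-1] = last.begin..[last.end, range.end].max
    else
      merged << range
    end
  end
end

first = merge_overlapping([73..856, 82..1145, 116..2914, 3203..3241])
# first => [73..2914, 3203..3241]
second = merge_overlapping([0..10, 5..20, 30..50, 45..80, 50..90, 100..101, 101..200])
# second => [0..20, 30..90, 100..200]
```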
I believe your resulting set has too many items (2881) to be used with divide. As far as I can tell, divide with a two-argument block compares every pair of elements (about 2881² ≈ 8.3 million block calls) and then resolves the groups recursively, so a long run of consecutive integers recurses deeply enough to raise the stack level too deep error.
Without using sets, you can use this code to merge the ranges:
module RangeMerger
  def merge(range_b)
    if cover?(range_b.first) && cover?(range_b.last)
      self
    elsif range_b.cover?(first) && range_b.cover?(last)
      range_b # range_b fully contains this range
    elsif cover?(range_b.first)
      self.class.new(first, range_b.last)
    elsif cover?(range_b.last)
      self.class.new(range_b.first, last)
    else
      nil # Unmergable
    end
  end
end
module ArrayRangePusher
  def <<(item)
    if item.kind_of?(Range)
      item.extend RangeMerger
      each_with_index do |own_item, idx|
        own_item.extend RangeMerger
        if new_range = own_item.merge(item)
          self[idx] = new_range
          return self
        end
      end
    end
    super
  end
end
ranges = [Range.new(73, 856), Range.new(82, 1145), Range.new(116, 2914), Range.new(3203, 3241)]
new_ranges = Array.new
new_ranges.extend ArrayRangePusher
ranges.each do |range|
new_ranges << range
end
puts ranges.inspect
puts new_ranges.inspect
This will output:
[73..856, 82..1145, 116..2914, 3203..3241]
[73..2914, 3203..3241]
which I believe is the intended output for your original problem. It's a bit ugly, but I'm a bit rusty at the moment.
Edit: I don't think this has anything to do with your original problem before the edits which was about merging ranges.
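For the sets_to_hash step in the question, which only needs the runs of consecutive positions in a set of integers, one alternative to Set#divide is to sort the elements and cut wherever two neighbours are not consecutive. Here is a sketch, assuming the set holds only integers. The name consecutive_runs is hypothetical.

```ruby
require 'set'

# Hypothetical helper: turns a set of integers into ranges of consecutive values.
def consecutive_runs(set)
  set.sort
     .slice_when { |a, b| b != a + 1 }   # cut where the sequence has a gap
     .map { |run| run.first..run.last }
end

positions = (73..2914).to_set | (3203..3241).to_set
runs = consecutive_runs(positions)
# runs => [73..2914, 3203..3241]
```

Enumerable#slice_when works iteratively, so the size of the set does not affect the call stack depth.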
I have two hashes:
a = {"0"=>"name", "1"=>"email"}
b = {"0"=>"source", "1"=>"info", "2"=>"extra", "3"=>"name"}
I want a hash created by doing the following:
1) When the two hashes contain identical values, keep value of original hash and discard value of second hash.
2) When the values of second hash are not in first hash, just add to the end of new hash, making sure that the key is ordered.
with this result:
{"0"=>"name", "1"=>"email", "2"=>"source", "3"=>"info", "4"=>"extra"}
I did it this ugly way:
l1 = a.keys.length
l2 = b.keys.length
max = l1 > l2 ? l1 : l2
counter = l1
result = {}
max.times do |i|
  unless a.values.include? b[i.to_s]
    result[counter.to_s] = b[i.to_s]
    counter += 1
  end
end
a.merge!(result)
Is there a built-in ruby method or utility that could achieve this same task in a cleaner fashion?
(a.values + b.values).uniq.map.with_index{|v, i| [i.to_s, v]}.to_h
# => {"0"=>"name", "1"=>"email", "2"=>"source", "3"=>"info", "4"=>"extra"}
First create an array containing the values of both hashes. This can be accomplished with the concat method. Now that we have an array, we can call the uniq method to keep only the unique values. This also preserves the order.
a = { "0" => "name", "1" => "email" }
b = { "0" => "source", "1" => "info", "2" => "extra", "3" => "name" }
values = a.values.concat(b.values).uniq
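A shortcut for building a hash from two arrays is to zip them and pass the pairs to Hash[]. Note that the keys here are integers, not the strings used in the question.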
Hash[[*0..values.length-1].zip(values)]
Output:
{0=>"name", 1=>"email", 2=>"source", 3=>"info", 4=>"extra"}
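If the string keys from the question are required, the index can be converted with to_s while building the pairs. Here is a small sketch that uses Array#| (set union, which keeps first-seen order and drops duplicates) in place of concat and uniq. The variable names are illustrative.

```ruby
a = { "0" => "name", "1" => "email" }
b = { "0" => "source", "1" => "info", "2" => "extra", "3" => "name" }

# Union keeps the values of a first, then the values of b not already present.
values = a.values | b.values

# Pair each value with its position, using string keys as in the question.
merged = values.each_with_index.map { |value, index| [index.to_s, value] }.to_h
# merged => {"0"=>"name", "1"=>"email", "2"=>"source", "3"=>"info", "4"=>"extra"}
```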
a = {"0"=>"name", "1"=>"email"}
b = {"0"=>"source", "1"=>"info", "2"=>"extra", "3"=>"name"}
key = (a.size-1).to_s
#=> "1"
b.each_value.with_object(a) { |v,h| (h[key.next!] = v) unless h.value?(v) }
#=> {"0"=>"name", "1"=>"email", "2"=>"source", "3"=>"info", "4"=>"extra"}
Edit: The issue is that I am unable to get the number of arrays within the hash (x = number of arrays), so that it can be used as function.each_index{|x| code }
I am trying to use the number of rows as a way of repeating an action X times, depending on how much data is pulled from a CSV file.
The terminal reported:
=> Can't convert symbol to integer (TypeError)
Complete error:
=> ~/home/tests/Product.rb:30:in '[]' can't convert symbol into integer (TypeError) from ~home/tests/Product.rub:30:in 'getNumbRel'
from test.rb:36:in '<main>'
The function that is performing the action is:
def getNumRel
  if defined? @releaseHashTable
    return @releaseHashTable[:releasename].length
  else
    @releaseHashTable = readReleaseCSV()
    return @releaseHashTable[:releasename].length
  end
end
The csv data pull is just a hash of arrays, nothing snazzy.
def readReleaseCSV()
$log.info("Method "+"#{self.class.name}"+"."+"#{__method__}"+" has started")
$log.debug("reading product csv file")
# Create a Hash where the default is an empty Array
result = Array.new
csvPath = "#{File.dirname(__FILE__)}"+"/../../data/addingProdRelProjIterTestSuite/releaseCSVdata.csv"
CSV.foreach(csvPath, :headers => true, :header_converters => :symbol) do |row|
row.each do |column, value|
if "#{column}" == "prodid"
proHash = Hash.new { |h, k| h[k] = [ ] }
proHash['relid'] << row[:relid]
proHash['releasename'] << row[:releasename]
proHash['inheritcomponents'] << row[:inheritcomponents]
productId = Integer(value)
if result[productId] == nil
result[productId] = Array.new
end
result[productId][result[productId].length] = proHash
end
end
end
$log.info("Method "+"#{self.class.name}"+"."+"#{__method__}"+" has finished")
@productReleaseArr = result
end
Sorry, couldn't resist, cleaned up your method.
# empty brackets unnecessary, no uppercase in method names
def read_release_csv
# you don't need + here
$log.info("Method #{self.class.name}.#{__method__} has started")
$log.debug("reading product csv file")
# you're returning this array. It is not a hash. [] is preferred over Array.new
result = []
csvPath = "#{File.dirname(__FILE__)}/../../data/addingProdRelProjIterTestSuite/releaseCSVdata.csv"
CSV.foreach(csvPath, :headers => true, :header_converters => :symbol) do |row|
row.each do |column, value|
# to_s is preferred
if column.to_s == "prodid"
proHash = Hash.new { |h, k| h[k] = [ ] }
proHash['relid'] << row[:relid]
proHash['releasename'] << row[:releasename]
proHash['inheritcomponents'] << row[:inheritcomponents]
# to_i is preferred
productId = value.to_i
# this notation is preferred
result[productId] ||= []
# this is identical to what you did and more readable
result[productId] << proHash
end
end
end
$log.info("Method #{self.class.name}.#{__method__} has finished")
@productReleaseArr = result
end
You haven't given much to go on, but it appears that @releaseHashTable contains an Array, not a Hash.
Update: Based on the implementation you posted, you can see that productId is an integer and that the return value of readReleaseCSV() is an array.
In order to get the releasename you want, you have to do this:
@releaseHashTable[productId][n]['releasename']
where productId and n are integers. Note that the hashes are built with string keys such as 'releasename', so a symbol key would not find the stored value. Either you'll have to specify productId and n explicitly, or (if you don't know n) you'll have to introduce a loop to collect all the releasenames for all the releases of a particular productId.
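The loop mentioned above could look roughly like this sketch. The sample data is made up to mirror the shape that readReleaseCSV builds (an array indexed by product id, each slot holding an array of hashes with string keys). The name release_names is hypothetical.

```ruby
# Made-up sample data in the same shape as the result of readReleaseCSV.
release_table = []
release_table[3] = [
  { 'relid' => ['10'], 'releasename' => ['alpha'] },
  { 'relid' => ['11'], 'releasename' => ['beta'] }
]

# Hypothetical helper: collects every release name stored for one product id.
def release_names(table, product_id)
  # Array(nil) is [], so an unknown product id yields an empty list.
  Array(table[product_id]).flat_map { |release| release['releasename'] }
end

names_for_3 = release_names(release_table, 3)
# names_for_3 => ["alpha", "beta"]
names_for_0 = release_names(release_table, 0)
# names_for_0 => []
```

The count the question asks for is then release_names(table, id).length.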
This is what Mark Thomas meant:
> a = [1,2,3] # => [1, 2, 3]
> a[:sym]
TypeError: can't convert Symbol into Integer
# here starts the backtrace
from (irb):2:in `[]'
from (irb):2
An Array is only accessible by an integer index, like a[1], which fetches the second element of the array.
You return an array, and that's why your code fails:
#....
result = Array.new
#....
@productReleaseArr = result
# and then later on you call
@releaseHashTable = readReleaseCSV()
@releaseHashTable[:releasename] # which gives you TypeError: can't convert Symbol into Integer