I have a nested hash that looks something like this:
{
'a' => {
'b' => ['c'],
'd' => {
'e' => ['f'],
'g' => ['h', 'i', 'j', 'k']
},
'l' => ['m', 'n', 'o', 'p']
},
'q' => {
'r' => ['s']
}
}
The hash can have even more nesting, but the values of the last level are always arrays.
I would like to "flatten" the hash into an array of arrays, where each inner array holds all the keys and values that make up one entire "path/branch" of the nested hash, all the way from the lowest-level values to the top of the hash. It's a bit like traversing up through the "tree" from the bottom while collecting keys and values along the way.
For this particular hash, the output should be:
[
['a', 'b', 'c'],
['a', 'd', 'e', 'f'],
['a', 'd', 'g', 'h', 'i', 'j', 'k'],
['a', 'l', 'm', 'n', 'o', 'p'],
['q', 'r', 's']
]
I tried many different things, but nothing has worked so far. Again, keep in mind that there may be more levels than these, so the solution has to be generic.
Note: the order of the arrays and the order of the elements in them is not important.
I did the following, but it's not really working:
tree_def = {
'a' => {
'b' => ['c'],
'd' => {
'e' => ['f'],
'g' => ['h', 'i', 'j', 'k']
},
'l' => ['m', 'n', 'o', 'p']
},
'q' => {
'r' => ['s']
}
}
branches = [[]]
collect_branches = lambda do |tree, current_branch|
tree.each do |key, hash_or_values|
current_branch.push(key)
if hash_or_values.kind_of?(Hash)
collect_branches.call(hash_or_values, branches.last)
else # Reached lowest level in dependency tree (which is always an array)
# Add a new branch
branches.push(current_branch.clone)
current_branch.push(*hash_or_values)
current_branch = branches.last
end
end
end
collect_branches.call(tree_def, branches[0])
branches #=> wrong result
As hinted at in the comments:
Looks pretty straightforward. Descend into hashes recursively, taking note of the keys you visited in this branch. When you see an array, there's no need to recurse further. Append it to the list of keys and return.
Tracking is easy, just pass the temp state down to recursive calls in arguments.
I meant something like this:
def tree_flatten(tree, path = [], &block)
case tree
when Array
block.call(path + tree)
else
tree.each do |key, sub_tree|
tree_flatten(sub_tree, path + [key], &block)
end
end
end
tree_flatten(tree_def) do |path|
p path
end
This code simply prints each flattened path as it is produced, but you can store the paths in an array too, or even modify tree_flatten to return a ready array instead of yielding elements one by one.
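A sketch of the "ready array" variant, assuming Ruby's standard Kernel#enum_for: when tree_flatten is called without a block it returns an Enumerator, so to_a collects every path. The enum_for guard is an illustrative addition, not part of the original answer.

```ruby
# Same traversal as tree_flatten above, but returns an Enumerator
# when called without a block, so callers can use .to_a, .map, etc.
def tree_flatten(tree, path = [], &block)
  return enum_for(:tree_flatten, tree, path) unless block

  case tree
  when Array
    # Leaf reached: emit the collected keys followed by the leaf values.
    block.call(path + tree)
  else
    tree.each do |key, sub_tree|
      # path + [key] builds a new array, so sibling branches never share state.
      tree_flatten(sub_tree, path + [key], &block)
    end
  end
end

tree_def = {
  'a' => {
    'b' => ['c'],
    'd' => { 'e' => ['f'], 'g' => ['h', 'i', 'j', 'k'] },
    'l' => ['m', 'n', 'o', 'p']
  },
  'q' => { 'r' => ['s'] }
}

all_paths = tree_flatten(tree_def).to_a
p all_paths
# => [["a", "b", "c"], ["a", "d", "e", "f"], ["a", "d", "g", "h", "i", "j", "k"], ["a", "l", "m", "n", "o", "p"], ["q", "r", "s"]]
```

The block form still works exactly as before; only the no-block call changes.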
You can do it like that:
def flat_hash(h)
return [h] unless h.kind_of?(Hash)
# [k, *e] builds a new array; e.unshift(k) would mutate the leaf arrays of h itself
h.map{|k,v| flat_hash(v).map{|e| [k, *e]} }.flatten(1)
end
input = {
'a' => {
'b' => ['c'],
'd' => {
'e' => ['f'],
'g' => ['h', 'i', 'j', 'k']
},
'l' => ['m', 'n', 'o', 'p']
},
'q' => {
'r' => ['s']
}
}
p flat_hash(input)
The output will be:
[
["a", "b", "c"],
["a", "d", "e", "f"],
["a", "d", "g", "h", "i", "j", "k"],
["a", "l", "m", "n", "o", "p"],
["q", "r", "s"]
]
This of course calls for a recursive solution. The following method does not mutate the original hash.
Code
def recurse(h)
h.each_with_object([]) do |(k,v),arr|
v.is_a?(Hash) ? recurse(v).each { |a| arr << [k,*a] } : arr << [k,*v]
end
end
Example
h = { 'a'=>{ 'b'=>['c'],
'd'=>{ 'e'=>['f'], 'g' => ['h', 'i', 'j', 'k'] },
'l' => ['m', 'n', 'o', 'p'] },
'q'=>{ 'r'=>['s'] } }
recurse h
#=> [["a", "b", "c"],
# ["a", "d", "e", "f"],
# ["a", "d", "g", "h", "i", "j", "k"],
# ["a", "l", "m", "n", "o", "p"],
# ["q", "r", "s"]]
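One way to check the claim that recurse does not mutate its argument is to deep-freeze the input first: any in-place change would raise an error (FrozenError on Ruby 2.5 and later). deep_freeze is a hypothetical helper written for this check, not a standard Ruby method.

```ruby
def recurse(h)
  h.each_with_object([]) do |(k, v), arr|
    v.is_a?(Hash) ? recurse(v).each { |a| arr << [k, *a] } : arr << [k, *v]
  end
end

# Hypothetical helper: freeze every hash, array and leaf in the structure
# so that any attempt to modify it in place raises an error.
def deep_freeze(obj)
  case obj
  when Hash  then obj.each_value { |v| deep_freeze(v) }
  when Array then obj.each { |v| deep_freeze(v) }
  end
  obj.freeze
end

h = deep_freeze({ 'a' => { 'b' => ['c'],
                           'd' => { 'e' => ['f'], 'g' => ['h', 'i', 'j', 'k'] },
                           'l' => ['m', 'n', 'o', 'p'] },
                  'q' => { 'r' => ['s'] } })

# Succeeds because recurse only builds new arrays ([k, *a], [k, *v]).
result = recurse(h)
p result
```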
Explanation
The operations performed by recursive methods are always difficult to explain. In my experience the best way is to salt the code with puts statements. However, that in itself is not enough, because when viewing the output it is difficult to keep track of the level of recursion at which particular results are obtained and either passed to itself or returned to a version of itself. The solution to that is to indent and un-indent results, which is what I've done below. Note that the way I've structured the code, and the few helper methods I use, are fairly general-purpose, so this approach can be adapted to examine the operations performed by other recursive methods.
INDENT = 8
def indent; @col += INDENT; end
def undent; @col -= INDENT; end
def pu(s); print " "*@col; puts s; end
def puhline; pu('-'*(70-@col)); end
@col = -INDENT
def recurse(h)
begin
indent
puhline
pu "passed h = #{h}"
h.each_with_object([]) do |(k,v),arr|
pu " k = #{k}, v=#{v}, arr=#{arr}"
if v.is_a?(Hash)
pu " calling recurse(#{v})"
ar = recurse(v)
pu " return value=#{ar}"
pu " calculating recurse(v).each { |a| arr << [k,*a] }"
ar.each do |a|
pu " a=#{a}"
pu " [k, *a] = #{[k,*a]}"
arr << [k,*a]
end
else
pu " arr << #{[k,*v]}"
arr << [k,*v]
end
pu "arr = #{arr}"
end.tap { |a| pu "returning=#{a}" }
ensure
puhline
undent
end
end
recurse h
----------------------------------------------------------------------
passed h = {"a"=>{"b"=>["c"], "d"=>{"e"=>["f"], "g"=>["h", "i", "j", "k"]},
    "l"=>["m", "n", "o", "p"]}, "q"=>{"r"=>["s"]}}
 k = a, v={"b"=>["c"], "d"=>{"e"=>["f"], "g"=>["h", "i", "j", "k"]},
    "l"=>["m", "n", "o", "p"]}, arr=[]
 calling recurse({"b"=>["c"], "d"=>{"e"=>["f"], "g"=>["h", "i", "j", "k"]},
    "l"=>["m", "n", "o", "p"]})
        --------------------------------------------------------------
        passed h = {"b"=>["c"], "d"=>{"e"=>["f"], "g"=>["h", "i", "j", "k"]},
            "l"=>["m", "n", "o", "p"]}
         k = b, v=["c"], arr=[]
         arr << ["b", "c"]
        arr = [["b", "c"]]
         k = d, v={"e"=>["f"], "g"=>["h", "i", "j", "k"]}, arr=[["b", "c"]]
         calling recurse({"e"=>["f"], "g"=>["h", "i", "j", "k"]})
                ------------------------------------------------------
                passed h = {"e"=>["f"], "g"=>["h", "i", "j", "k"]}
                 k = e, v=["f"], arr=[]
                 arr << ["e", "f"]
                arr = [["e", "f"]]
                 k = g, v=["h", "i", "j", "k"], arr=[["e", "f"]]
                 arr << ["g", "h", "i", "j", "k"]
                arr = [["e", "f"], ["g", "h", "i", "j", "k"]]
                returning=[["e", "f"], ["g", "h", "i", "j", "k"]]
                ------------------------------------------------------
         return value=[["e", "f"], ["g", "h", "i", "j", "k"]]
         calculating recurse(v).each { |a| arr << [k,*a] }
         a=["e", "f"]
         [k, *a] = ["d", "e", "f"]
         a=["g", "h", "i", "j", "k"]
         [k, *a] = ["d", "g", "h", "i", "j", "k"]
        arr = [["b", "c"], ["d", "e", "f"], ["d", "g", "h", "i", "j", "k"]]
         k = l, v=["m", "n", "o", "p"],
            arr=[["b", "c"], ["d", "e", "f"], ["d", "g", "h", "i", "j", "k"]]
         arr << ["l", "m", "n", "o", "p"]
        arr = [["b", "c"], ["d", "e", "f"], ["d", "g", "h", "i", "j", "k"],
            ["l", "m", "n", "o", "p"]]
        returning=[["b", "c"], ["d", "e", "f"], ["d", "g", "h", "i", "j", "k"],
            ["l", "m", "n", "o", "p"]]
        --------------------------------------------------------------
 return value=[["b", "c"], ["d", "e", "f"], ["d", "g", "h", "i", "j", "k"],
    ["l", "m", "n", "o", "p"]]
 calculating recurse(v).each { |a| arr << [k,*a] }
 a=["b", "c"]
 [k, *a] = ["a", "b", "c"]
 a=["d", "e", "f"]
 [k, *a] = ["a", "d", "e", "f"]
 a=["d", "g", "h", "i", "j", "k"]
 [k, *a] = ["a", "d", "g", "h", "i", "j", "k"]
 a=["l", "m", "n", "o", "p"]
 [k, *a] = ["a", "l", "m", "n", "o", "p"]
arr = [["a", "b", "c"], ["a", "d", "e", "f"], ["a", "d", "g", "h", "i", "j", "k"],
    ["a", "l", "m", "n", "o", "p"]]
 k = q, v={"r"=>["s"]}, arr=[["a", "b", "c"], ["a", "d", "e", "f"],
    ["a", "d", "g", "h", "i", "j", "k"], ["a", "l", "m", "n", "o", "p"]]
 calling recurse({"r"=>["s"]})
        --------------------------------------------------------------
        passed h = {"r"=>["s"]}
         k = r, v=["s"], arr=[]
         arr << ["r", "s"]
        arr = [["r", "s"]]
        returning=[["r", "s"]]
        --------------------------------------------------------------
 return value=[["r", "s"]]
 calculating recurse(v).each { |a| arr << [k,*a] }
 a=["r", "s"]
 [k, *a] = ["q", "r", "s"]
arr = [["a", "b", "c"], ["a", "d", "e", "f"], ["a", "d", "g", "h", "i", "j", "k"],
    ["a", "l", "m", "n", "o", "p"], ["q", "r", "s"]]
returning=[["a", "b", "c"], ["a", "d", "e", "f"], ["a", "d", "g", "h", "i", "j", "k"],
    ["a", "l", "m", "n", "o", "p"], ["q", "r", "s"]]
----------------------------------------------------------------------
#=> [["a", "b", "c"], ["a", "d", "e", "f"], ["a", "d", "g", "h", "i", "j", "k"],
# ["a", "l", "m", "n", "o", "p"], ["q", "r", "s"]]
This will return an Array with all the paths.
def paths(element, path = [], accu = [])
case element
when Hash
element.each do |key, value|
paths(value, path + [key], accu)
end
when Array
accu << (path + element)
end
accu
end
For nicer printing, you can do:
paths(tree_def).map { |path| path.join(".") }
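To confirm that paths is not tied to the depth of the example, here it is run on a hypothetical input with four levels of hashes, then joined with "." as above.

```ruby
def paths(element, path = [], accu = [])
  case element
  when Hash
    # Extend a fresh copy of the path for each key.
    element.each { |key, value| paths(value, path + [key], accu) }
  when Array
    # Leaf: record the full path plus the leaf values.
    accu << (path + element)
  end
  accu
end

# Hypothetical input with four levels of hashes (x > y > z > v).
deep = {
  'x' => {
    'y' => { 'z' => { 'v' => ['1', '2'] } },
    'w' => ['3']
  }
}

p paths(deep)
# => [["x", "y", "z", "v", "1", "2"], ["x", "w", "3"]]
p paths(deep).map { |path| path.join(".") }
# => ["x.y.z.v.1.2", "x.w.3"]
```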
The following method keeps calling itself recursively until it reaches the array values.
Because the recursion splits into multiple branches, op must be a separate copy for each branch. I used a string because op+k always creates a new object here; an array modified in place would instead be shared by every branch, since Ruby passes object references.
hash = {"a"=>{"b"=>["c"], "d"=>{"e"=>["f"], "g"=>["h", "i", "j", "k"]}, "l"=>["m", "n", "o", "p"]}, "q"=>{"r"=>["s"]}}
@output = []
def nested_array(h, op='')
h.map do |k,v|
if Hash === v
nested_array(v, op+k)
else
@output << (op+k+v.join).chars
end
end
end
nested_array(hash)
@output will then hold the desired array:
[
["a", "b", "c"],
["a", "d", "e", "f"],
["a", "d", "g", "h", "i", "j", "k"],
["a", "l", "m", "n", "o", "p"],
["q", "r", "s"]
]
Update: keys and values can be longer than a single character, so the following version of nested_array may work better:
def nested_array(h, op=[])
h.map do |k,v|
if Hash === v
nested_array(v, Array.new(op) << k)
else
@output << ((Array.new(op) << k) + v)
end
end
end
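The reason each branch needs its own copy of the path can be shown with two hypothetical helpers that differ only in how they extend it: shared_path appends to a single array with <<, while copied_path builds a new array with path + [k]. In the shared version, keys from one branch leak into its siblings.

```ruby
tree = { 'a' => { 'b' => ['c'], 'd' => ['e'] } }

# Buggy: one array is shared by every branch and mutated with <<.
def shared_path(h, path, out)
  h.each do |k, v|
    path << k
    v.is_a?(Hash) ? shared_path(v, path, out) : out << path + v
  end
  out
end

# Fixed: path + [k] creates a new array, so each branch has its own copy.
def copied_path(h, path, out)
  h.each do |k, v|
    v.is_a?(Hash) ? copied_path(v, path + [k], out) : out << path + [k] + v
  end
  out
end

p shared_path(tree, [], []) # => [["a", "b", "c"], ["a", "b", "d", "e"]]
p copied_path(tree, [], []) # => [["a", "b", "c"], ["a", "d", "e"]]
```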
All the solutions here are recursive; below is a non-recursive solution.
def flatten(input)
sol = []
while(input.length > 0)
unprocessed_input = []
input.each do |l, r|
if r.is_a?(Array)
sol << l + r
else
r.each { |k, v| unprocessed_input << [l + [k], v] }
end
end
input = unprocessed_input
end
return sol
end
flatten([[[], h]])
Code Explanation:
A hash in array form is [[k1, v1], [k2, v2]].
Each element of input is a pair whose index 0 holds the partial solution (the keys collected so far) and whose index 1 holds the input that is yet to be processed. From [[], { a: {..} }], one pass generates partial solutions of the form [[a], {..}].
Because this format keeps each partial solution paired with its unprocessed input, and makes the unprocessed input easy to accumulate, the input hash is first converted to it: [[[], input_hash]]. That is why the method is called as flatten([[[], h]]).
Output:
[["a", "b", "c"], ["a", "l", "m", "n", "o", "p"], ["q", "r", "s"], ["a", "d", "e", "f"], ["a", "d", "g", "h", "i", "j", "k"]]
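The same [partial_solution, unprocessed_input] pairs also work with a stack instead of level-by-level rounds, which gives depth-first order (the same order as the recursive answers). flatten_dfs is a hypothetical name for this sketch; since the question says order doesn't matter, either traversal is fine.

```ruby
# Depth-first variant: [path_so_far, unprocessed] pairs live on a stack
# instead of being processed one nesting level at a time.
def flatten_dfs(hash)
  result = []
  stack = [[[], hash]]
  until stack.empty?
    path, node = stack.pop
    if node.is_a?(Array)
      # Leaf: the path is complete.
      result << path + node
    else
      # Push children in reverse so they are popped in their original order.
      node.reverse_each { |k, v| stack.push([path + [k], v]) }
    end
  end
  result
end

h = { 'a' => { 'b' => ['c'],
               'd' => { 'e' => ['f'], 'g' => ['h', 'i', 'j', 'k'] },
               'l' => ['m', 'n', 'o', 'p'] },
      'q' => { 'r' => ['s'] } }

p flatten_dfs(h)
# => [["a", "b", "c"], ["a", "d", "e", "f"], ["a", "d", "g", "h", "i", "j", "k"], ["a", "l", "m", "n", "o", "p"], ["q", "r", "s"]]
```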
Let's say I have a nested array like this:
nested = [
[0.5623507523876472, ["h", "e", "l", "l", "o"]],
[0.07381531933500263, ["h", "a", "l", "l", "o"]],
[0.49993338806153054, ["n", "i", "h", "a", "o"]],
[0.6499234734532127, ["k", "o", "n", "n", "i", "c", "h", "i", "w", "a"]]
]
My original goal was to convert it into a hash, but first I have to convert each inner array (in the example above, ["h", "e", "l", "l", "o"]) into a string ("hello").
So my question is: how do I convert nested into:
[
[0.5623507523876472, "hello"],
[0.07381531933500263, "hallo"],
[0.49993338806153054, "nihao"],
[0.6499234734532127, "konnichiwa"]
]
If you don't want to modify the source array nested, use Array#map:
nested = [
[0.5623507523876472, ["h", "e", "l", "l", "o"]],
[0.07381531933500263, ["h", "a", "l", "l", "o"]],
[0.49993338806153054, ["n", "i", "h", "a", "o"]],
[0.6499234734532127, ["k", "o", "n", "n", "i", "c", "h", "i", "w", "a"]]
]
nested_map = nested.map { |a,b| [a,b.join] }
# => [[0.5623507523876472, "hello"],
# [0.07381531933500263, "hallo"],
# [0.49993338806153054, "nihao"],
# [0.6499234734532127, "konnichiwa"]]
If you want to modify the source array nested in place, use the Array#map! method:
nested = [
[0.5623507523876472, ["h", "e", "l", "l", "o"]],
[0.07381531933500263, ["h", "a", "l", "l", "o"]],
[0.49993338806153054, ["n", "i", "h", "a", "o"]],
[0.6499234734532127, ["k", "o", "n", "n", "i", "c", "h", "i", "w", "a"]]
]
nested.map! { |a,b| [a,b.join] }
# => [[0.5623507523876472, "hello"],
# [0.07381531933500263, "hallo"],
# [0.49993338806153054, "nihao"],
# [0.6499234734532127, "konnichiwa"]]
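Since the end goal was a hash, the converted pairs can be passed to Array#to_h (available since Ruby 2.1). The question doesn't say which element should be the key, so this sketch shows both directions; the names scores and words are hypothetical.

```ruby
nested = [
  [0.5623507523876472, ["h", "e", "l", "l", "o"]],
  [0.07381531933500263, ["h", "a", "l", "l", "o"]],
  [0.49993338806153054, ["n", "i", "h", "a", "o"]],
  [0.6499234734532127, ["k", "o", "n", "n", "i", "c", "h", "i", "w", "a"]]
]

# Assumption: the float is the key and the joined word is the value.
scores = nested.map { |score, chars| [score, chars.join] }.to_h

# Alternative assumption: the word is the key and the float is the value.
words = nested.map { |score, chars| [chars.join, score] }.to_h

p scores[0.5623507523876472] # => "hello"
p words["konnichiwa"]        # => 0.6499234734532127
```

Keying by a Float only works for exact lookups, so keying by the word is usually the safer choice if you need to look entries up later.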