Recursively rearrange a flat array into a multi-dimensional array - ruby

I'm using the Ruby RGL library to save a nested structure and later restore it. Calling graph.edges.as_json returns the graph below. Where I'm stuck is turning this array into its nested equivalent.
Example:
[{"source"=>1, "target"=>8},
{"source"=>8, "target"=>10},
{"source"=>8, "target"=>13},
{"source"=>8, "target"=>9},
{"source"=>10, "target"=>102},
{"source"=>102, "target"=>103},
{"source"=>102, "target"=>105},
{"source"=>102, "target"=>101},
{"source"=>103, "target"=>104},
{"source"=>104, "target"=>101},
{"source"=>101, "target"=>96},
]
Needs to turn into:
[{source: 1,
  target: [
    {source: 8,
     target: [
       {source: 10,
        target: [
          {source: 102,
           target: [
             {source: 103,
              target: [
                {source: 104,
                 target: [
                   {source: 101,
                    target: [
                      {source: 96,
                       target: []}
                    ]}
                 ]}
              ]}
...

We can use the following algorithm. Please refer to the comments within the code for details about how this works.
Specifically, note the use of the block form of Hash.new, which lets you define a default value that is stored and returned when a missing key is accessed.
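As a minimal illustration of that behavior: accessing a missing key runs the block, which here stores a fresh node structure in the hash and returns it.
h = Hash.new { |hash, key| hash[key] = { 'source' => key, 'target' => [] } }
h[42]       #=> {"source"=>42, "target"=>[]}
h.key?(42)  #=> true, the default was stored on first access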
require 'set'
edges = [{"source"=>1, "target"=>8},
{"source"=>8, "target"=>10},
{"source"=>8, "target"=>13},
{"source"=>8, "target"=>9},
{"source"=>10, "target"=>102},
{"source"=>102, "target"=>103},
{"source"=>102, "target"=>105},
{"source"=>102, "target"=>101},
{"source"=>103, "target"=>104},
{"source"=>104, "target"=>101},
{"source"=>101, "target"=>96},
]
# Hash which stores the (sub-)trees, using the source id as the key and the tree
# structure as values
graph = Hash.new { |h, k| h[k] = {'source' => k, 'target' => []} }
# A list of target IDs we have seen. We use this later to remove redundant
# sub-trees
targets = Set.new
edges.each do |edge|
  source = edge['source']
  target = edge['target']
  # For each edge, store it in the graph. We use the predefined structure from
  # the hash to add targets to a source. As we can identify any subtree by its
  # source ID in the graph hash, this even allows us to add multiple targets.
  graph[source]['target'] << graph[target]
  targets << target
end
# Cleanup redundant entries, i.e. all those which are referenced in the graph as
# targets. All remaining entries were not targets and are thus roots
targets.each { |target| graph.delete(target) }
# We now have the distinct trees in the graph hash, keyed by their respective
# root source id. To get an array of trees as requested, we can just take the
# values from this hash
trees = graph.values
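For the example edges there is a single root node, so trees ends up holding exactly one nested structure:
trees.size                                     #=> 1
trees.first['source']                          #=> 1
trees.first['target'].map { |t| t['source'] }  #=> [8]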

This could probably be cleaned up a bit, but it does seem to produce the desired result.
class EdgeTree
  attr_reader :edges, :source_groups, :roots, :targets

  def initialize(edges:)
    # store edges
    @edges = edges
    # partition "sources" and "targets"
    @roots, @targets = edges.map { |h| h.values_at("source", "target") }.transpose
    # remove all "targets" from "sources"
    @targets.each(&@roots.method(:delete))
    # group edges by "source" for lookup
    @source_groups = edges.group_by { |h| h["source"] }
  end

  # starting points for unique trees
  def root_groups
    @source_groups.slice(*@roots)
  end

  # accessor method to return desired output
  # all: true will output all trees from their starting source
  def create_tree(all: false)
    link(current_group: all ? source_groups : root_groups)
  end

  private

  # recursive method to build the tree
  def link(current_group: nil)
    current_group.map do |k, v|
      { "source" => k,
        "target" => [*v].map do |h|
          if source_groups.key?(h["target"])
            link(current_group: source_groups.slice(h["target"]))
          else
            { "source" => h["target"], "target" => [] }
          end
        end.flatten(1) }
    end
  end
end
Usage
require 'pp'
edges = [{"source"=>1, "target"=>8},
{"source"=>8, "target"=>10},
{"source"=>8, "target"=>13},
{"source"=>8, "target"=>9},
{"source"=>10, "target"=>102},
{"source"=>102, "target"=>103},
{"source"=>102, "target"=>105},
{"source"=>102, "target"=>101},
{"source"=>103, "target"=>104},
{"source"=>104, "target"=>101},
{"source"=>101, "target"=>96},
]
pp EdgeTree.new(edges: edges).create_tree
Output:
[{"source"=>1,
"target"=>
[{"source"=>8,
"target"=>
[{"source"=>10,
"target"=>
[{"source"=>102,
"target"=>
[{"source"=>103,
"target"=>
[{"source"=>104,
"target"=>
[{"source"=>101,
"target"=>[{"source"=>96, "target"=>[]}]}]}]},
{"source"=>105, "target"=>[]},
{"source"=>101, "target"=>[{"source"=>96, "target"=>[]}]}]}]},
{"source"=>13, "target"=>[]},
{"source"=>9, "target"=>[]}]}]}]
Working Example: https://replit.com/@engineersmnky/EdgeTree
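A quick check that is not part of the original answer: create_tree(all: true) builds a tree starting from every distinct source, not just the roots.
EdgeTree.new(edges: edges).create_tree(all: true).size
#=> 7 (one tree per distinct "source": 1, 8, 10, 102, 103, 104, 101)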

arr = [
{"source"=> 1, "target"=> 8},
{"source"=> 8, "target"=> 10},
{"source"=> 8, "target"=> 13},
{"source"=> 8, "target"=> 9},
{"source"=> 10, "target"=>102},
{"source"=>102, "target"=>103},
{"source"=>102, "target"=>105},
{"source"=>102, "target"=>101},
{"source"=>103, "target"=>104},
{"source"=>104, "target"=>101},
{"source"=>101, "target"=> 96}
]
First create a hash that maps each source to the target of its first edge (uniq keeps only the first edge for each source).
s_to_t = arr.uniq { |g| g["source"] }.map(&:values).to_h
#=> {1=>8, 8=>10, 10=>102, 102=>103, 103=>104, 104=>101, 101=>96}
Note that the first part of this calculation is as follows.
arr.uniq { |g| g["source"] }
#=> [
# {"source"=> 1, "target"=> 8},
# {"source"=> 8, "target"=> 10},
# {"source"=> 10, "target"=>102},
# {"source"=>102, "target"=>103},
# {"source"=>103, "target"=>104},
# {"source"=>104, "target"=>101},
# {"source"=>101, "target"=> 96}
# ]
Next, determine the source node, which is the key that does not appear as a value, assuming there is exactly one key having that property, which is the case with the example.
first_source = (s_to_t.keys - s_to_t.values).first
#=> 1
Now create a recursive method.
def recurse(node, s_to_t)
["source"=> node,
"target"=> s_to_t.key?(node) ? recurse(s_to_t[node], s_to_t) : []
]
end
Let's try it.
recurse(first_source, s_to_t)
#=> [
# {"source"=>1, "target"=>[
# {"source"=>8, "target"=>[
# {"source"=>10, "target"=>[
# {"source"=>102, "target"=>[
# {"source"=>103, "target"=>[
# {"source"=>104, "target"=>[
# {"source"=>101, "target"=>[
# {"source"=>96, "target"=>[]
# }
# ]
# }
# ]
# }
# ]
# }
# ]
# }
# ]
# }
# ]
# }
# ]
# }
# ]

Related

I'm trying to make a 2D array in Ruby that basically functions as an Excel sheet for tracking inventory

I'm trying to create an inventory list in ruby that holds objects based on a name attribute, e.g.
class Item
  attr_reader :name

  def initialize(name)
    @name = name
  end
end
I'm trying to make the inventory look like this: (one row per name)
[
[Item.new("foo"), Item.new("foo"), Item.new("foo"), Item.new("foo")],
[Item.new("bar"), Item.new("bar")],
[Item.new("baz"), Item.new("baz"), Item.new("baz")]
]
I want to be able to push and pop from each row, and I want to be able to create a new row for a specific name if I haven't reached my capacity of 3. I tried to implement it the way I would with vectors in C++, but I think I'm missing some syntax and don't really know where to begin, to be honest.
You can initialize a nested array of size 3 via:
list = [[], [], []]
or via:
list = Array.new(3) { [] }
To push to one of the arrays you can use: ([0] refers to the array at index 0)
list[0].push(Item.new("foo"))
list
#=> [[#<Item @name="foo">], [], []]
To pop it off again:
list[0].pop
#=> #<Item @name="foo">
list
#=> [[], [], []]
To add another row, you simply push a new array to list:
list.push([])
# or
list << []
You might also want to take a look at hashes. A hash with a default value of [] can be very versatile:
list = Hash.new { |h, k| h[k] = [] }
list[:foo].push(Item.new("foo"))
list[:foo].push(Item.new("foo"))
list[:bar].push(Item.new("bar"))
list
#=> {
#     :foo=>[
#       #<Item @name="foo">,
#       #<Item @name="foo">
#     ],
#     :bar=>[
#       #<Item @name="bar">
#     ]
#   }
list[:foo].pop
#=> #<Item @name="foo">
list
#=> {
#     :foo=>[
#       #<Item @name="foo">
#     ],
#     :bar=>[
#       #<Item @name="bar">
#     ]
#   }
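The question also asks to create a new row for a name only while a capacity of 3 rows has not been reached. A minimal sketch of that rule on top of the hash-based list, where MAX_ROWS and add_item are assumed names that are not part of the original answer:
MAX_ROWS = 3

def add_item(list, item)
  # refuse to create a new row for an unseen name once we are at capacity
  return false if !list.key?(item.name) && list.size >= MAX_ROWS
  (list[item.name] ||= []) << item
  true
end

list = {}
add_item(list, Item.new("foo"))  #=> true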

Extend an array of hashes with values from an array

I have this array
types = ['first', 'second', 'third']
and this array of hashes
data = [{query: "A"}, {query: "B"}, {query:"C", type: 'first'}]
Now I have to "extend" each hash in data with each type if it does not already exist. All existing keys of the hash must be copied too (e.g. :query).
So the final result must be:
results = [
{query: "A", type: 'first'}, {query: "A", type: "second"}, {query: "A", type: "third"},
{query: "B", type: 'first'}, {query: "B", type: "second"}, {query: "D", type: "third"},
{query: "C", type: 'first'}, {query: "C", type: "second"}, {query: "C", type: "third"}
]
The data array is quite big, so performance matters.
You can use Array#product to combine both arrays and Hash#merge to add the :type key:
data.product(types).map { |h, t| h.merge(type: t) }
#=> [
# {:query=>"A", :type=>"first"}, {:query=>"A", :type=>"second"}, {:query=>"A", :type=>"third"},
# {:query=>"B", :type=>"first"}, {:query=>"B", :type=>"second"}, {:query=>"B", :type=>"third"},
# {:query=>"C", :type=>"first"}, {:query=>"C", :type=>"second"}, {:query=>"C", :type=>"third"}
# ]
Note that the above will replace existing values for :type with the values from the types array (there can only be one :type key per hash).
If you need more complex logic, you can pass a block to merge which handles existing / conflicting keys, e.g.:
h = { query: 'C', type: 'first' }
t = 'third'
h.merge(type: t) { |_key, v1, v2| v1 } # preserve the existing value
#=> {:query=>"C", :type=>"first"}
h.merge(type: t) { |_key, v1, v2| [v1, v2] } # put both values in an array
#=> {:query=>"C", :type=>["first", "third"]}
We see that each hash in data is mapped to an array of three hashes, and the resulting array of three arrays is then to be flattened, suggesting we skip a step by using the method Enumerable#flat_map on data. The construct is as follows.
n = types.size
#=> 3
data.flat_map { |h| n.times.map { |i| ... } }
where ... produces a hash such as
{:query=>"A", :type=>"second"}
Next we see that the value of :type in the array of hashes returned equals :first then :second then :third then :first and so on. That is, the value cycles among the elements of types. Also, the fact that one of the hashes in data has a key :type is irrelevant, as it will be overwritten. Therefore, for each value of i (0, 1 or 2) in map's block above, we wish to merge h with { type: types[i%n] }:
n = types.size
data.flat_map { |h| n.times.map { |i| h.merge(type: types[i%n]) } }
#=> [{:query=>"A", :type=>"first"}, {:query=>"A", :type=>"second"},
# {:query=>"A", :type=>"third"},
# {:query=>"B", :type=>"first"}, {:query=>"B", :type=>"second"},
# {:query=>"B", :type=>"third"},
# {:query=>"C", :type=>"first"}, {:query=>"C", :type=>"second"},
# {:query=>"C", :type=>"third"}]
We may alternatively make use of the method Array#cycle.
enum = types.cycle
#=> #<Enumerator: ["first", "second", "third"]:cycle>
As the name of the method suggests,
enum.next
#=> "first"
enum.next
#=> "second"
enum.next
#=> "third"
enum.next
#=> "first"
enum.next
#=> "second"
...
ad infinitum. Before continuing let me reset the enumerator.
enum.rewind
See Enumerator#next and Enumerator#rewind.
n = types.size
data.flat_map { |h| n.times.map { h.merge(type: enum.next) } }
#=> <as above>

Ruby chain two 'group_by' methods

I have an array of objects that looks like this:
[
{day: 'Monday', class: 1, name: 'X'},
{day: 'Monday', class: 2, name: 'Y'},
{day: 'Tuesday', class: 1, name: 'Z'},
{day: 'Monday', class: 1, name: 'T'}
]
I want to group them by days, and then by classes i.e.
groupedArray['Monday'] => {'1' => [{name: 'X'}, {name: 'T'}], '2' => [{name: 'Y'}]}
I've seen there is a
group_by { |a| [a.day, a.class]}
But this creates a hash with a [day, class] key.
Is there a way I can achieve this, without having to group them first by day, and then iterate through each day, and group them by class, then pushing them into a new hash?
arr = [
{day: 'Monday', class: 1, name: 'X'},
{day: 'Monday', class: 2, name: 'Y'},
{day: 'Tuesday', class: 1, name: 'Z'},
{day: 'Monday', class: 1, name: 'T'}
]
One way of obtaining the desired hash is to use the form of Hash#update (aka merge!) that employs a block to determine the values of keys that are present in both hashes being merged. Here that is done twice, first when values of :day are the same, then for each such occurrence, when the values of :class are the same (for a given value of :day).
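As a quick refresher (not part of the original answer), the block form of Hash#update receives the common key together with both conflicting values and returns the value to keep:
{ a: 1 }.update(a: 2) { |_key, old, new| old + new }
#=> {:a=>3}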
arr.each_with_object({}) { |g,h|
h.update(g[:day]=>{ g[:class].to_s=>[{name: g[:name] }] }) { |_,h1,h2|
h1.update(h2) { |_,p,q| p+q } } }
#=> {"Monday" =>{"1"=>[{:name=>"X"}, {:name=>"T"}], "2"=>[{:name=>"Y"}]},
# "Tuesday"=>{"1"=>[{:name=>"Z"}]}}
The steps are as follows.
enum = arr.each_with_object({})
#=> #<Enumerator: [{:day=>"Monday", :class=>1, :name=>"X"},
# {:day=>"Monday", :class=>2, :name=>"Y"},
# {:day=>"Tuesday", :class=>1, :name=>"Z"},
# {:day=>"Monday", :class=>1, :name=>"T"}]:each_with_object({})>
We can see the values that will be generated by this enumerator by converting it to an array:
enum.to_a
#=> [[{:day=>"Monday", :class=>1, :name=>"X"}, {}],
# [{:day=>"Monday", :class=>2, :name=>"Y"}, {}],
# [{:day=>"Tuesday", :class=>1, :name=>"Z"}, {}],
# [{:day=>"Monday", :class=>1, :name=>"T"}, {}]]
The empty hash in each array is the hash being built and returned. It is initially empty, but will be partially formed as each element of enum is processed.
The first element of enum is passed to the block (by Enumerator#each) and the block variables are assigned using parallel assignment (sometimes called multiple assignment):
g,h = enum.next
#=> [{:day=>"Monday", :class=>1, :name=>"X"}, {}]
g #=> {:day=>"Monday", :class=>1, :name=>"X"}
h #=> {}
We now perform the block calculation:
h.update(g[:day]=>{ g[:class].to_s=>[{name: g[:name] }] })
#=> {}.update("Monday"=>{ "1"=>[{name: "X"}] })
#=> {"Monday"=>{"1"=>[{:name=>"X"}]}}
This operation returns the updated value of h, the hash being constructed.
Note that update's argument
"Monday"=>{ "1"=>[{name: "X"}] }
is shorthand for
{ "Monday"=>{ "1"=>[{name: "X"}] } }
Because the key "Monday" was not present in both hashes being merged (h had no keys), the block
{ |_,h1,h2| h1.update(h2) { |_,p,q| p+q } }
was not used to determine the value of "Monday".
Now the next value of enum is passed to the block and the block variables are assigned:
g,h = enum.next
#=> [{:day=>"Monday", :class=>2, :name=>"Y"},
# {"Monday"=>{"1"=>[{:name=>"X"}]}}]
g #=> {:day=>"Monday", :class=>2, :name=>"Y"}
h #=> {"Monday"=>{"1"=>[{:name=>"X"}]}}
Note that h was updated. We now perform the block calculation:
h.update(g[:day]=>{ g[:class].to_s=>[{name: g[:name] }] })
# {"Monday"=>{"1"=>[{:name=>"X"}]}}.update("Monday"=>{ "2"=>[{name: "Y"}] })
Both hashes being merged share the key "Monday". We therefore must use the block to determine the merged value of "Monday":
{ |k,h1,h2| h1.update(h2) { |m,p,q| p+q } }
#=> {"1"=>[{:name=>"X"}]}.update("2"=>[{name: "Y"}])
#=> {"1"=>[{:name=>"X"}], "2"=>[{:name=>"Y"}]}
See the doc for update for an explanation of the block variables k, h1 and h2 for the outer update and m, p and q for the inner update. k and m are the values of the common key. As they are not used in the block calculations, I have replaced them with underscores, which is common practice.
So now:
h #=> { "Monday" => { "1"=>[{ :name=>"X" }], "2"=>[{ :name=>"Y"}] } }
Prior to this operation the hash h["Monday"] did not yet have a key "2", so the second update did not require use of the block
{ |_,p,q| p+q }
This block is used, however, when the last element of enum is merged into h, since the values of both :day and :class are the same for the two hashes being merged.
The remaining calculations are similar.
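For instance, when the last edge {day: 'Monday', class: 1, name: 'T'} is processed, both the outer key "Monday" and the inner key "1" collide, so the innermost block concatenates the name arrays:
{ "1" => [{ name: "X" }], "2" => [{ name: "Y" }] }.update("1" => [{ name: "T" }]) { |_, p, q| p + q }
#=> {"1"=>[{:name=>"X"}, {:name=>"T"}], "2"=>[{:name=>"Y"}]}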

Ruby - sort_by using dynamic keys

I have an array of hashes:
array = [
{
id: 1,
name: "A",
points: 20,
victories: 4,
goals: 5,
},
{
id: 1,
name: "B",
points: 20,
victories: 4,
goals: 8,
},
{
id: 1,
name: "C",
points: 21,
victories: 5,
goals: 8,
}
]
To sort them using two keys I do:
array = array.group_by do |key|
[key[:points], key[:goals]]
end.sort_by(&:first).map(&:last)
But in my program, the sort criteria are stored in a database; I can fetch them and store them in an array, for example ["goals","victories"] or ["name","goals"].
How can I sort the array using dynamic keys?
I tried many ways with no success like this:
criterias_block = []
criterias.each do |criteria|
criterias_block << "key[:#{criteria}]"
end
array = array.group_by do |key|
criterias_block
end.sort_by(&:first).map(&:last)
Enumerable#sort_by can do this:
criteria = [:points, :goals]
array.sort_by { |entry|
criteria.map { |c| entry[c] }
}
#=> [{:id=>1, :name=>"A", :points=>20, :victories=>4, :goals=>5},
# {:id=>1, :name=>"B", :points=>20, :victories=>4, :goals=>8},
# {:id=>1, :name=>"C", :points=>21, :victories=>5, :goals=>8}]
This works because when you sort an array like [[1,2], [1,1], [2,3]], Ruby compares by the first elements and uses subsequent elements to break ties.
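For example, sorting such an array of arrays directly shows the tie-breaking behavior:
[[1, 2], [1, 1], [2, 3]].sort
#=> [[1, 1], [1, 2], [2, 3]]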
You can use values_at:
criteria = ["goals", "victories"]
criteria = criteria.map(&:to_sym)
array = array.group_by do |key|
key.values_at(*criteria)
end.sort_by(&:first).map(&:last)
# => [[{:id=>1, :name=>"A", :points=>20, :victories=>4, :goals=>5}],
#     [{:id=>1, :name=>"B", :points=>20, :victories=>4, :goals=>8}],
#     [{:id=>1, :name=>"C", :points=>21, :victories=>5, :goals=>8}]]
values_at returns an array of all the keys requested:
array[0].values_at(*criteria)
# => [5, 4]
I suggest doing it like this.
Code
def sort_it(array,*keys)
array.map { |h| [h.values_at(*keys), h] }.sort_by(&:first).map(&:last)
end
Examples
For array as given by you:
sort_it(array, :goals, :victories)
#=> [{:id=>1, :name=>"A", :points=>20, :victories=>4, :goals=>5},
# {:id=>1, :name=>"B", :points=>20, :victories=>4, :goals=>8},
# {:id=>1, :name=>"C", :points=>21, :victories=>5, :goals=>8}]
sort_it(array, :name, :goals)
#=> [{:id=>1, :name=>"A", :points=>20, :victories=>4, :goals=>5},
# {:id=>1, :name=>"B", :points=>20, :victories=>4, :goals=>8},
# {:id=>1, :name=>"C", :points=>21, :victories=>5, :goals=>8}]
For the first of these examples, you could of course write:
sort_it(array, *["goals", "victories"].map(&:to_sym))

How do I count items for some time period?

I have records in my database like:
id | item_name | 2013-06-05T17:55:13+03:00
I want to group them by 'items per Day', 'items per Hour', 'items per 20 minutes'.
What is the best way to implement it?
The simple way:
by_day = array.group_by{|a| a.datetime.to_date}
by_hour = array.group_by{|a| [a.datetime.to_date, a.datetime.hour]}
by_20_minutes = array.group_by{|a| [a.datetime.to_date, a.datetime.hour, a.datetime.minute/20]}
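To see how those keys bucket a record into a 20-minute window, here is a small sketch assuming each record responds to datetime (the Record Struct is a stand-in for the real model, not part of the original answer):
require 'time'

Record = Struct.new(:id, :item_name, :datetime)
r = Record.new(1, 'foo', Time.parse('2013-06-05T17:55:13+03:00'))

[r.datetime.to_date, r.datetime.hour, r.datetime.minute / 20].map(&:to_s)
#=> ["2013-06-05", "17", "2"]  # date, hour of day, and the third 20-minute slot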
require 'time'
def group_by_period(items)
  groups = { :day => {}, :hour => {}, :t20min => {} }
  items.reduce(groups) do |memo, item|
    # Compute the correct buckets for the item's timestamp.
    timestamp = Time.parse(item[2]).utc
    item_day = timestamp.to_date.to_s
    item_hour = timestamp.iso8601[0..12]
    item_20min = timestamp.iso8601[0..15]
    # Round the minutes down to the nearest 20-minute boundary.
    item_20min[14..15] = format('%02d:00', (item_20min[14..15].to_i / 20) * 20)
    # Place the item in each bucket.
    [[:day, item_day], [:hour, item_hour], [:t20min, item_20min]].each do |k, v|
      memo[k][v] = [] unless memo[k][v]
      memo[k][v] << item
    end
    memo
  end
end
sample_db_output = [
[1, 'foo', '2010-01-01T12:34:56Z'],
[2, 'bar', '2010-01-02T12:34:56Z'],
[3, 'gah', '2010-01-02T13:34:56Z'],
[4, 'zip', '2010-01-02T13:54:56Z']
]
group_by_period(sample_db_output)
# {:day=>
# {"2010-01-01"=>[[1, "foo", "2010-01-01T12:34:56Z"]],
# "2010-01-02"=>
# [[2, "bar", "2010-01-02T12:34:56Z"],
# [3, "gah", "2010-01-02T13:34:56Z"],
# [4, "zip", "2010-01-02T13:54:56Z"]]},
# :hour=>
# {"2010-01-01T12"=>[[1, "foo", "2010-01-01T12:34:56Z"]],
# "2010-01-02T12"=>[[2, "bar", "2010-01-02T12:34:56Z"]],
# "2010-01-02T13"=>
# [[3, "gah", "2010-01-02T13:34:56Z"], [4, "zip", "2010-01-02T13:54:56Z"]]},
# :t20min=>
# {"2010-01-01T12:20:00"=>[[1, "foo", "2010-01-01T12:34:56Z"]],
# "2010-01-02T12:20:00"=>[[2, "bar", "2010-01-02T12:34:56Z"]],
# "2010-01-02T13:20:00"=>[[3, "gah", "2010-01-02T13:34:56Z"]],
# "2010-01-02T13:40:00"=>[[4, "zip", "2010-01-02T13:54:56Z"]]}}
