Finding duplicates in nested arrays - ruby

I have a hash, which contains a hash, which contains a number of arrays, like this:
{ "bob" =>
{
"foo" => [1, 3, 5],
"bar" => [2, 4, 6]
},
"fred" =>
{
"foo" => [1, 7, 9],
"bar" => [8, 10, 12]
}
}
I would like to compare the arrays against each other and be alerted if they contain duplicate values. It is possible for hash["bob"]["foo"] and hash["fred"]["foo"] to share values, but not for hash["bob"]["foo"] and hash["bob"]["bar"]. The same goes for hash["fred"].
I can't even figure out where to begin with this one. I suspect inject will be involved somewhere, but I could be wrong.

This snippet will return an array of duplicates for each key. Duplicates can only be generated for equal keys.
duplicates = (keys = h.values.map(&:keys).flatten.uniq).map do |key|
{key => h.values.map { |h| h[key] }.inject(&:&)}
end
This will return [{"foo"=>[1]}, {"bar"=>[]}] which indicates that the key foo was the only one containing a duplicate of 1.
The snippet above assumes h is the variable name of your hash.
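Note that inject(&:&) only keeps values common to every person, which is stricter than "any two people share a value" once there are more than two people. A minimal sketch of a pairwise check follows; shared_values is a hypothetical helper name, and it assumes the same hash-of-hashes-of-arrays shape as in the question.

```ruby
# Hypothetical helper: for every pair of people, intersect the arrays
# stored under the inner keys they have in common.
def shared_values(h)
  h.keys.combination(2).flat_map do |a, b|
    (h[a].keys & h[b].keys).map do |key|
      [a, b, key, h[a][key] & h[b][key]]
    end
  end.reject { |row| row.last.empty? } # drop pairs with nothing in common
end

h = {
  "bob"  => { "foo" => [1, 3, 5], "bar" => [2, 4, 6] },
  "fred" => { "foo" => [1, 7, 9], "bar" => [8, 10, 12] }
}

shared = shared_values(h)
# Each row is [person_a, person_b, inner_key, common_values]
```

Each row names the two people, the inner key, and the values they share, so the caller can decide how to report them.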

h = {
"bob" =>
{
"foo" => [1, 3, 5],
"bar" => [2, 4, 6]
},
"fred" =>
{
"foo" => [1, 7, 9],
"bar" => [1, 10, 12]
}
}
h.each do |k, v|
numbers = v.values.flatten
puts k if numbers.length > numbers.uniq.length
end

There are many ways to do it.
Here's one that should be easy to read.
It works in Ruby 1.9. It uses + to combine the two arrays and then calls uniq!, which returns nil when it removes nothing, to detect whether any number appears twice.
h = { "bob" =>
{
"foo" => [1, 3, 5],
"bar" => [2, 4, 6]
},
"fred" =>
{
"foo" => [1, 7, 12],
"bar" => [8, 10, 12]
}
}
h.each do |person|
if (person[1]["foo"] + person[1]["bar"]).uniq! != nil
puts "Duplicate in #{person[1]}"
end
end
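The snippet above hard-codes the "foo" and "bar" keys. As a sketch under the assumption that a person may have any number of arrays, the following collects the numbers that repeat within each person; repeated_numbers is a hypothetical name, and group_by is used so it also runs on Ruby 1.9.

```ruby
# Hypothetical helper: map each person to the numbers that occur more than
# once across all of that person's arrays. People without repeats are omitted.
def repeated_numbers(h)
  h.each_with_object({}) do |(name, arrays), result|
    repeated = arrays.values.flatten
                     .group_by { |n| n }
                     .select { |_, occurrences| occurrences.size > 1 }
                     .keys
    result[name] = repeated unless repeated.empty?
  end
end

h = {
  "bob"  => { "foo" => [1, 3, 5],  "bar" => [2, 4, 6] },
  "fred" => { "foo" => [1, 7, 12], "bar" => [8, 10, 12] }
}

repeats = repeated_numbers(h)
```

With the sample data only "fred" is reported, because 12 appears in both of his arrays.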

I'm not sure exactly what you are looking for, but take a look at this possible solution; perhaps you can reuse something.
outer_hash.each do |person, inner_hash|
seen_arrays = Hash.new
inner_hash.each do |inner_key, array|
other = seen_arrays[array]
if other
raise "array #{person}/#{inner_key} is a duplicate of #{other}"
end
seen_arrays[array] = "#{person}/#{inner_key}"
end
end

Related

Group hash of arrays by array element

I have this data structure resulted from a query grouping
{
[0, "AR"]=>2,
[0, nil]=>1,
[0, "AQ"]=>6,
[1, nil]=>4,
[1, "AQ"]=>3,
[2, "BG"]=>1,
[2, nil]=>1,
}
I want to manipulate it so I end up with a structure grouped like this
{
0 => {
'AR' => 2,
'AQ' => 6,
nil => 1
},
1 => {
'AQ' => 3,
nil => 4
},
2 => {
'BG' => 1,
nil => 1
}
}
input = {
[0, "AR"]=>2,
[0, nil]=>1,
[0, "AQ"]=>6,
[1, nil]=>4,
[1, "AQ"]=>3,
[2, "BG"]=>1,
[2, nil]=>1,
}
result = {}
input.each do |k, v|
if result[k[0]]
result[k[0]].merge!({ k[1] => v })
else
result[k[0]] = { k[1] => v }
end
end
puts result
#{0=>{"AR"=>2, nil=>1, "AQ"=>6}, 1=>{nil=>4, "AQ"=>3}, 2=>{"BG"=>1, nil=>1}}
I don't think this is the most succinct way, so I'd welcome any advice!
hash = {
[0, "AR"]=>2,
[0, nil]=>1,
[0, "AQ"]=>6,
[1, nil]=>4,
[1, "AQ"]=>3,
[2, "BG"]=>1,
[2, nil]=>1,
}
new_hash = {}
hash.each{|k, v| new_hash[k[0]] ||= {}; new_hash[k[0]].merge!({k[1] => v})}
puts new_hash # {0=>{"AR"=>2, nil=>1, "AQ"=>6}, 1=>{nil=>4, "AQ"=>3}, 2=>{"BG"=>1, nil=>1}}
Here is one more, very similar to the previous answers but using #each_with_object:
hash = {
[0, "AR"]=>2,
[0, nil]=>1,
[0, "AQ"]=>6,
[1, nil]=>4,
[1, "AQ"]=>3,
[2, "BG"]=>1,
[2, nil]=>1,
}
result_hash = Hash.new { |h,k| h[k] = {} }
hash.each_with_object(result_hash) do |((parent_key, key), value), res|
res[parent_key].merge!(key => value)
end
=> {0=>{"AR"=>2, nil=>1, "AQ"=>6}, 1=>{nil=>4, "AQ"=>3}, 2=>{"BG"=>1, nil=>1}}
I came up with an answer that doesn't require additional variable assignments in its enclosing scope (it has "referential transparency": https://en.wikipedia.org/wiki/Referential_transparency)
input
.group_by { |(arr, num)| arr.first }
.each_with_object(Hash.new) do |(key, vals), hsh|
vals.each do |((key, innerkey), innerval)|
hsh[key] ||= {}
hsh[key][innerkey] = innerval
end
hsh
end
# {0=>{"AR"=>2, nil=>1, "AQ"=>6}, 1=>{nil=>4, "AQ"=>3}, 2=>{"BG"=>1, nil=>1}}
Two high-level steps:
I noticed the output object is grouped by the first array element (here, 0/1/2). I use #group_by to create a hash with that structure.
# output of `#group_by` on first array element:
key: 0, vals: [ [[0, "AR"], 2], [[0, nil], 1], [[0, "AQ"], 6] ]
key: 1, vals: [ [[1, nil], 4], [[1, "AQ"], 3] ]
key: 2, vals: [ [[2, "BG"], 1], [[2, nil], 1] ]
I use #each_with_object to construct the nested hashes. For each vals array above, I extracted the second and third values by destructuring the arrays in the block parameter (((key, innerkey), innerval)) and then the hash assignment was straightforward.
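The two steps described above can also be sketched without mutating any hash, by turning each group of pairs into an inner hash with Array#to_h (available from Ruby 2.1). This is an alternative formulation, not the answer's own code, and the variable names are illustrative.

```ruby
input = {
  [0, "AR"] => 2,
  [0, nil]  => 1,
  [0, "AQ"] => 6,
  [1, nil]  => 4,
  [1, "AQ"] => 3,
  [2, "BG"] => 1,
  [2, nil]  => 1,
}

# Step 1: group the [key, value] pairs by the first element of each key.
grouped = input.group_by { |(outer, _), _| outer }

# Step 2: within each group, keep only [inner_key, value] and build a hash.
result = grouped.map do |outer, pairs|
  [outer, pairs.map { |(_, inner), value| [inner, value] }.to_h]
end.to_h
```

The nested block parameters do the destructuring: (outer, _) unpacks the two-element key, and the trailing name receives the value.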

How to instantiate a multilevel hash with an arbitrary depth

I want to create a hash from the result of a database query such that its keys are the column values except the last one, and its values are the values of the last column (or a default value). For example, if I have rows:
1 2 3 1
1 2 4 9
1 3 2 nil
and a default value of 111, I should get:
{
1 =>
{
2 => { 3 => 1, 4 => 9},
3 => { 2 => 111}
}
}
I want to make the method generic enough to handle an arbitrary number of columns, so the signature could be:
to_lookup(rows, default_value, value_column, *columns)
How would I go about that?
Update: forgot a comma in the output.
[Edit: after reading @cthulhu's answer, I think I may have misinterpreted the question. I assumed that only consecutive rows were to be grouped, rather than all rows. I will leave my answer for the former interpretation.]
I believe this is what you are looking for:
def hashify(arr)
return arr.first.first if arr.first.size == 1
arr.slice_when { |f,s| f.first != s.first }.
each_with_object({}) do |a,h|
key, *rest = a.transpose
h[key.first] = hashify(rest.transpose)
end
end
hashify [[1, 2, 3, 1], [1, 2, 4, 9], [1, 3, 2, nil]]
#=> {1=>{2=>{3=>1, 4=>9}, 3=>{2=>nil}}}
hashify [[1, 2, 3, 1], [1, 2, 4, 9], [2, 3, 2, nil]]
#=> {1=>{2=>{3=>1, 4=>9}}, 2=>{3=>{2=>nil}}}
Replacing nil with the default can be done before or after the construction of the hash.
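As a minimal sketch of the "before" option, the rows can be copied with nil in the last column replaced by the default and then passed to hashify. The name with_default is hypothetical, and it assumes the value is always the last element of each row.

```ruby
# Hypothetical helper: return new rows in which a nil in the last column
# is replaced by the default. The original rows are left untouched.
def with_default(rows, default)
  rows.map { |*keys, value| [*keys, value.nil? ? default : value] }
end

rows = [[1, 2, 3, 1], [1, 2, 4, 9], [1, 3, 2, nil]]
filled = with_default(rows, 111)
```

Calling hashify(with_default(rows, 111)) would then produce the nested hash with 111 in place of nil.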
Enumerable#slice_when was bestowed upon us in v2.2. For earlier versions, you could replace:
arr.slice_when { |f,s| f.first != s.first }
with
arr.chunk { |row| row.first }.map(&:last)
I simplified things by removing the ability to pass a default.
I also simplified the method signature to take only one parameter.
RSpec.describe "#to_lookup" do
def to_lookup(rows)
return rows.first.first if rows.flatten.size == 1
h = {}
rows.group_by { |e| e.first }.each_entry do |k, v|
v.each &:shift
h[k] = to_lookup(v)
end
h
end
let :input do
[
[1, 2, 3, 1],
[1, 2, 4, 9],
[1, 3, 2, 111],
]
end
let :output do
{
1 => {
2 => {3 => 1, 4 => 9},
3 => {2 => 111}
}
}
end
it { expect(to_lookup(input)).to eq(output) }
end
BTW, I wonder what output you want for the following input:
1 2 3 1
1 2 3 2
EDIT: working code snippet: http://rubysandbox.com/#/snippet/566aefa80195f1000c000000
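For comparison, here is a sketch that groups all rows (not only consecutive ones) and applies the default in a single pass. It is not the signature from the question: it assumes each row is an array whose last element is the value and whose other elements are the key path, and the two-argument to_lookup shown here is illustrative.

```ruby
# Illustrative variant: walk down the nested hash along the key path,
# creating intermediate hashes as needed, then store the value at the leaf.
def to_lookup(rows, default)
  rows.each_with_object({}) do |(*keys, value), result|
    *path, last = keys
    leaf = path.inject(result) { |node, key| node[key] ||= {} }
    leaf[last] = value.nil? ? default : value
  end
end

rows = [[1, 2, 3, 1], [1, 2, 4, 9], [1, 3, 2, nil]]
lookup = to_lookup(rows, 111)
```

With this sketch, two rows that share the same key path do not raise; the later row simply overwrites the earlier one.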

Sort hash by array

I have a hash and an array with same length like the following:
h = {:a => 1, :b => 2, :c => 3, :d => 4}
a = [2, 0, 1, 0]
I want to order the hash in increasing order of the values in the array. So the output would be something like:
h = {:b => 2, :d => 4, :c=> 3, :a => 1}
Ideally I want to introduce some randomness for ties. For the previous example, I want either the previous output or:
h = {:d => 4, :b => 2, :c=> 3, :a => 1}
This is what I tried.
b = a.zip(h).sort.map(&:last)
p Hash[b]
# => {:b=>2, :d=>4, :c=>3, :a=>1}
But I am not sure how to introduce the randomness.
h.to_a.sort_by.each_with_index{|el,i| [a[i], rand]}.to_h
You could modify what you have slightly:
def doit(h,a)
Hash[a.zip(h).sort_by { |e,_| [e,rand] }.map(&:last)]
end
doit(h,a) #=> {:b=>2, :d=>4, :c=>3, :a=>1}
doit(h,a) #=> {:d=>4, :b=>2, :c=>3, :a=>1}
doit(h,a) #=> {:b=>2, :d=>4, :c=>3, :a=>1}
doit(h,a) #=> {:b=>2, :d=>4, :c=>3, :a=>1}
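Both answers rely on the same idea: sort by the pair [weight, random number], so the random number only matters when the weights tie. A sketch that makes the random source injectable, which helps when you want reproducible results in tests; sort_by_weights and the rng parameter are hypothetical names.

```ruby
# Hypothetical helper: order the hash entries by the matching weight,
# breaking ties with a number drawn from rng.
def sort_by_weights(h, weights, rng = Random.new)
  h.to_a.zip(weights)
   .sort_by { |_, weight| [weight, rng.rand] }
   .map(&:first)
   .to_h
end

h = { :a => 1, :b => 2, :c => 3, :d => 4 }
a = [2, 0, 1, 0]

sorted = sort_by_weights(h, a, Random.new(42)) # seeded, so repeatable
```

Only the order of :b and :d can vary between runs; :c and :a always come last, in that order.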

Ruby: Building a hash from a string and two array values at a time

I'm trying to build a hash with:
hash = {}
strings = ["one", "two", "three"]
array = [1, 2, 3, 4, 5, 6]
so that I end up with:
hash = { "one" => [1, 2] ,
"two" => [3, 4] ,
"three" => [5, 6] }
I have tried:
strings.each do |string|
array.each_slice(2) do |numbers|
hash[string] = [numbers[0], numbers[1]]
end
end
But that yields:
hash = { "one" => [5,6] , "two" => [5,6], "three" => [5,6] }
I know why it does this (nested loops) but I don't know how to achieve what I'm looking for.
If you want a one-liner:
hash = Hash[strings.zip(array.each_slice(2))]
For example:
>> strings = ["one", "two", "three"]
>> array = [1, 2, 3, 4, 5, 6]
>> hash = Hash[strings.zip(array.each_slice(2))]
=> {"one"=>[1, 2], "two"=>[3, 4], "three"=>[5, 6]}
hash = {}
strings.each { |string| hash[string] = array.slice!(0..1) }
This solution uses methods and techniques you seem familiar with. It is not a one-liner, but if you are new to Ruby it might be easier to follow. Note that slice! removes the elements from array as it goes. The first answer is very elegant, though.
As Mu says, the zip method is the best choice. From the documentation:
Converts any arguments to arrays, then merges elements of self with corresponding elements from each argument. This generates a sequence of self.size n-element arrays, where n is one more than the count of arguments. If the size of any argument is less than enumObj.size, nil values are supplied. If a block is given, it is invoked for each output array, otherwise an array of arrays is returned.
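A small illustration of the nil padding described in that passage, assuming the numbers array is too short to give every string a pair:

```ruby
strings = ["one", "two", "three"]
pairs   = [1, 2, 3, 4].each_slice(2).to_a # only two slices: [1, 2] and [3, 4]

zipped = strings.zip(pairs)
# The third string has no matching slice, so zip supplies nil for it.
hash = Hash[zipped]
```

So if the lengths do not line up, the extra keys end up mapped to nil rather than raising an error.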

How to quickly print Ruby hashes in a table format?

Is there a way to quickly print a ruby hash in a table format into a file?
Such as:
keyA  keyB  keyC  ...
123   234   345
125         347
4456
...
where the values of the hash are arrays of different sizes. Or is using a double loop the only way?
Thanks
Try this gem I wrote (prints hashes, ruby objects, ActiveRecord objects in tables): http://github.com/arches/table_print
Here's a version of steenslag's that works when the arrays aren't the same size:
size = h.values.max_by { |a| a.length }.length
m = h.values.map { |a| a += [nil] * (size - a.length) }.transpose.insert(0, h.keys)
nil seems like a reasonable placeholder for missing values but you can, of course, use whatever makes sense.
For example:
>> h = {:a => [1, 2, 3], :b => [4, 5, 6, 7, 8], :c => [9]}
>> size = h.values.max_by { |a| a.length }.length
>> m = h.values.map { |a| a += [nil] * (size - a.length) }.transpose.insert(0, h.keys)
=> [[:a, :b, :c], [1, 4, 9], [2, 5, nil], [3, 6, nil], [nil, 7, nil], [nil, 8, nil]]
>> m.each { |r| puts r.map { |x| x.nil? ? '' : x }.inspect }
[:a, :b, :c]
[ 1, 4, 9]
[ 2, 5, ""]
[ 3, 6, ""]
["", 7, ""]
["", 8, ""]
h = {:a => [1, 2, 3], :b => [4, 5, 6], :c => [7, 8, 9]}
p h.values.transpose.insert(0, h.keys)
# [[:a, :b, :c], [1, 4, 7], [2, 5, 8], [3, 6, 9]]
No, there's no built-in function. Here's some code that formats it the way you want:
data = { :keyA => [123, 125, 4456], :keyB => [234000], :keyC => [345, 347] }
length = data.values.max_by{ |v| v.length }.length
widths = {}
data.keys.each do |key|
widths[key] = 5 # minimum column width
# longest string len of values
val_len = data[key].max_by{ |v| v.to_s.length }.to_s.length
widths[key] = (val_len > widths[key]) ? val_len : widths[key]
# length of key
widths[key] = (key.to_s.length > widths[key]) ? key.to_s.length : widths[key]
end
result = ""
data.keys.each {|key| result += key.to_s.ljust(widths[key]) + " " }
result += "\n"
for i in 0...length # exclusive range, so no extra blank row is appended
data.keys.each { |key| result += data[key][i].to_s.ljust(widths[key]) + " " }
result += "\n"
end
# TODO write result to file...
Any comments and edits to refine the answer are very welcome.
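One possible refinement, offered as a sketch rather than a definitive version: wrap the same column-width logic in a method that returns the table as a string, so the caller can print it or write it to a file. The name format_table and the min_width parameter are assumptions for illustration.

```ruby
# Hypothetical helper: build a left-aligned text table from a hash whose
# values are arrays of possibly different lengths.
def format_table(data, min_width = 5)
  row_count = data.values.map(&:length).max || 0

  # Column width: the longest of the key, the values, and the minimum width.
  widths = data.map do |key, values|
    [key, ([key.to_s.length, min_width] + values.map { |v| v.to_s.length }).max]
  end.to_h

  lines = [data.keys.map { |key| key.to_s.ljust(widths[key]) }.join(" ").rstrip]
  row_count.times do |i|
    # A missing value is nil, and nil.to_s is "", so the cell is left blank.
    lines << data.map { |key, values| values[i].to_s.ljust(widths[key]) }.join(" ").rstrip
  end
  lines.join("\n") + "\n"
end

data = { :keyA => [123, 125, 4456], :keyB => [234000], :keyC => [345, 347] }
table = format_table(data)
```

Writing the result out is then a single call such as File.write("table.txt", table), where table.txt is a placeholder file name.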

Resources