Grouping data from nested arrays in Ruby

Assume the following list of tuples, each containing a person's name, age, and the books they have read:
list = [
["Peter", 21, ["Book 1", "Book 2", "Book 3", "Book 4"]],
["Amy", 19, ["Book 3", "Book 4"]],
["Sanders", 32, ["Book 1", "Book 2"]],
["Charlie", 21, ["Book 4", "Book 5", "Book 6"]],
["Amanda", 21, ["Book 2", "Book 5"]]
]
What is the optimal way to extract names grouped by the books read, into the following format (basically an array of arrays, each containing the book name and an array of the names of people who read it)?
results = [
["Book 1", ["Sanders", "Peter"]],
["Book 2", ["Sanders", "Amanda", "Peter"]],
["Book 3", ["Peter", "Amy"]],
["Book 4", ["Charlie", "Peter", "Amy"]],
["Book 5", ["Amanda","Charlie"]],
["Book 6", ["Charlie"]]
]
I've tried the following iterating method which extracts the lists of names and puts them into a hash, with the book title as the keys.
book_hash = Hash.new([])
list.each { |name,age,books|
books.each { |x| book_hash[x] = book_hash[x] + [name] }
}
results = book_hash.to_a.sort
However, the above method seems rather inefficient when handling large datasets containing millions of names. I've attempted to use Array#group_by, but so far I've been unable to make it work with nested arrays.
Does anyone have any idea about the above?

A hash is a more suitable output here. You can build it in a single pass:
list.each_with_object({}) do |(name, age, books), hash|
books.each do |book|
(hash[book] ||= []) << name
end
end
If you must make it an array, then append a .to_a to the output of the above.
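Since the question asks about group_by, here is a minimal sketch of that route, assuming Ruby 2.4+ for Hash#transform_values. The names pairs and by_book are illustrative and not from the question. The key step is flattening the nested arrays into [book, name] pairs first, which is what makes group_by applicable.

```ruby
list = [
  ["Peter", 21, ["Book 1", "Book 2", "Book 3", "Book 4"]],
  ["Amy", 19, ["Book 3", "Book 4"]],
  ["Sanders", 32, ["Book 1", "Book 2"]],
  ["Charlie", 21, ["Book 4", "Book 5", "Book 6"]],
  ["Amanda", 21, ["Book 2", "Book 5"]]
]

# Flatten each person into one [book, name] pair per book read.
pairs = list.flat_map { |name, _age, books| books.map { |book| [book, name] } }

# Group the pairs by book, then keep only the names in each group.
by_book = pairs.group_by(&:first).transform_values { |ps| ps.map(&:last) }

# Convert to the array-of-arrays format, ordered by book title.
results = by_book.sort

p results.first # => ["Book 1", ["Peter", "Sanders"]]
```

Names appear in the order in which people occur in list, so the order inside each group may differ from the sample output in the question.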

Related

Iterating over a list to append an array to a key/value pair

I have a list like follows
ID MODEL
001 Model A
001 Model B
001 Model C
002 Model A
002 Model B
002 Model D
I have to perform a query based on the ID, which I have working currently. It's currently performing one query per line in the list. It seems like it would be much faster to reduce the number of queries I have to run.
I thought that if I had a hash that looked like this:
{
:001 => ["Model A", "Model B", "Model C"],
:002 => ["Model A", "Model B", "Model D"]
}
I would be able to perform fewer queries.
The problem I am having is being able to determine how it is possible to iterate over a list like this and generate the hash necessary.
Right now my code looks like this:
id = parsed_line[0]
model = parsed_line[1]
hash["#{id}"] = models << model
inside of a loop that iterates over the text file. Where "models" is an array of the model list.
The problem from here is that the hash then looks like this:
{
:001 => ["Model A", "Model B", "Model C", "Model A", "Model B", "Model D"],
:002 => ["Model A", "Model B", "Model C", "Model A", "Model B", "Model D"]
}
I understand why it's happening, but I do not understand how to get the desired hash.
I'm assuming here that parsed_line is an array that looks like this: [ "001", "Model A" ]. I don't know what models is, but I'm guessing it's unnecessary. Something like this ought to work:
parsed_lines = [ [ "001", "Model A" ],
[ "002", "Model B" ],
# ...
]
hash = {}
parsed_lines.each do |id, model|
hash[id] ||= []
hash[id] << model
end
Or, more simply (using Hash.new's handy block default):
hash = Hash.new {|hash, id| hash[id] = [] } # If `hash[id]` isn't yet set when
# we try to access it, automatically
# initialize it with an empty array
parsed_lines.each do |id, model|
hash[id] << model
end
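To see why the block form matters, here is a small sketch contrasting it with Hash.new([]), where one default array is shared by every missing key. The variable names shared and fresh are illustrative only.

```ruby
# Hash.new([]) hands out the SAME array object for every missing key,
# and << mutates that object without ever storing a key.
shared = Hash.new([])
shared["001"] << "Model A"
shared["002"] << "Model B"
p shared          # => {}
p shared["003"]   # => ["Model A", "Model B"]

# The block form creates and stores a new array per missing key.
fresh = Hash.new { |h, k| h[k] = [] }
fresh["001"] << "Model A"
fresh["002"] << "Model B"
p fresh           # => {"001"=>["Model A"], "002"=>["Model B"]}
```

This is the same kind of sharing that produced the duplicated model lists in the question, where a single models array was appended to for every ID.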
I am guessing that you don't have a database. If all you have is a two-dimensional array and you want to convert it into a hash, then this is what you should do:
model_arrays = [["001", "Model A"], ["001", "Model B"], ["001", "Model C"], ["002", "Model A"], ["002", "Model B"], ["002", "Model D"]]
hash = {}
model_arrays.each do |arr|
hash[arr[0]] ||= []
hash[arr[0]] << arr[1]
end
How about this one-liner (not counting the declaration of parsed_lines):
parsed_lines = [
["001", "Model A"],
["001", "Model B"],
["001", "Model C"],
["002", "Model A"],
["002", "Model B"],
["002", "Model D"]
]
Hash[parsed_lines.group_by(&:first).map{|k,v| [k.to_sym,v.map(&:last)]}]
#=> {:"001"=>["Model A", "Model B", "Model C"], :"002"=>["Model A", "Model B", "Model D"]}
But it is probably better to do what @ArupRakshit stated and use a group_by query and skip this additional processing step.

Pulling out Keys and Values from a hash of arrays

I have a hash like this:
{"examples"=>
[{"year"=>1999,
"provider"=>{"name"=>"abc", "id"=>711},
"url"=> "http://example.com/1",
"reference"=>"abc",
"text"=> "Sample text 1",
"title"=> "Sample Title 1",
"documentId"=>30091286,
"exampleId"=>786652043,
"rating"=>357.08115},
{"year"=>1999,
"provider"=>{"name"=>"abc", "id"=>3243},
"url"=> "http://example.com/2",
"reference"=>"dec",
"text"=> "Sample text 2",
"title"=> "Sample Title 2",
"documentId"=>30091286,
"exampleId"=>786652043,
"rating"=>357.08115},
{"year"=>1999,
"provider"=>{"name"=>"abc", "id"=>191920},
"url"=> "http://example.com/3",
"reference"=>"wer",
"text"=> "Sample text 3",
"title"=> "Sample Title 3",
"documentId"=>30091286,
"exampleId"=>786652043,
"rating"=>357.08115}]
}
and I would like to create a new array by pulling out the keys, and values for just the "text", "url" and "title" keys like below.
[
{"text"=> "Sample text 1", "title"=> "Sample Title 1", "url"=> "http://example.com/1"},
{"text"=> "Sample text 2", "title"=> "Sample Title 2", "url"=> "http://example.com/2"},
{"text"=> "Sample text 3", "title"=> "Sample Title 3", "url"=> "http://example.com/3"}
]
Any help is sincerely appreciated.
You can do it like this:
hash['examples'].map do |hash|
keys = ["text", "title", "url"]
keys.zip(hash.values_at(*keys)).to_h
end
If you are on a Ruby version older than 2.1, use this inside the block instead:
Hash[keys.zip(hash.values_at(*keys))]
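On newer Rubies the same selection can be sketched with Hash#slice, assuming Ruby 2.5 or later. The data below is a trimmed, illustrative copy of the hash from the question, and the names data and picked are not from the original.

```ruby
# Trimmed copy of the question's hash (only some keys kept for brevity).
data = {
  "examples" => [
    { "year" => 1999, "url" => "http://example.com/1",
      "text" => "Sample text 1", "title" => "Sample Title 1" },
    { "year" => 1999, "url" => "http://example.com/2",
      "text" => "Sample text 2", "title" => "Sample Title 2" }
  ]
}

keys = ["text", "title", "url"]

# Hash#slice returns a new hash holding only the requested keys.
picked = data["examples"].map { |example| example.slice(*keys) }

p picked.first
```

Unlike the zip/values_at approach, slice simply omits a requested key that is absent from the hash instead of mapping it to nil.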
Here's another way this could be done (where h is the hash given in the question).
KEEPERS = ['text','url','title']
h.each_key.with_object({}) { |k,g|
g[k] = h[k].map { |h| h.select { |sk,_| KEEPERS.include? sk } } }
#=> {"examples"=>
# [{"url"=>"http://example.com/1", "text"=>"Sample text 1",
# "title"=>"Sample Title 1"},
# {"url"=>"http://example.com/2", "text"=>"Sample text 2",
# "title"=>"Sample Title 2"},
# {"url"=>"http://example.com/3", "text"=>"Sample text 3",
# "title"=>"Sample Title 3"}]}
Here we simply create a new hash (denoted by the outer block variable g) which has all the keys of the original hash h (just one, "examples", but there could be more), and for each associated value, which is an array of hashes, we use Enumerable#map and Hash#select to retain only the desired key/value pairs from each of those hashes.

Sort Ruby String Array by the number in the string

If I have a string array that looks like this:
array = ["STRING1", "STRING05", "STRING20", "STRING4", "STRING3"]
or
array = ["STRING: 1", "STRING: 05", "STRING: 20", "STRING: 4", "STRING: 3"]
How can I sort the array by the number in each string (descending)?
I know that If the array consisted of integers and not strings, I could use:
sort_by { |k, v| -k }
I've searched all around but can't come up with a solution
The following sorts by the number in each string rather than by the string itself (output shown for the second array from the question):
array.sort_by { |x| x[/\d+/].to_i }
=> ["STRING: 1", "STRING: 3", "STRING: 4", "STRING: 05", "STRING: 20"]
Descending order:
array.sort_by { |x| -(x[/\d+/].to_i) }
=> ["STRING: 20", "STRING: 05", "STRING: 4", "STRING: 3", "STRING: 1"]
To sort the array by the number in each string (descending):
array.sort_by { |x| -x[/\d+/].to_i }
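As a quick check, here is a sketch that runs the descending sort against both arrays from the question. The lambda name number_in and the variable names are illustrative. It assumes the first run of digits in each string is the number to sort by.

```ruby
plain  = ["STRING1", "STRING05", "STRING20", "STRING4", "STRING3"]
spaced = ["STRING: 1", "STRING: 05", "STRING: 20", "STRING: 4", "STRING: 3"]

# Extract the first run of digits. A string with no digits yields nil,
# and nil.to_i is 0, so such strings sort as if their number were 0.
number_in = ->(s) { s[/\d+/].to_i }

plain_desc  = plain.sort_by  { |s| -number_in.call(s) }
spaced_desc = spaced.sort_by { |s| -number_in.call(s) }

p plain_desc   # => ["STRING20", "STRING05", "STRING4", "STRING3", "STRING1"]
p spaced_desc  # => ["STRING: 20", "STRING: 05", "STRING: 4", "STRING: 3", "STRING: 1"]
```

Note that to_i ignores leading zeros, so "05" is compared as 5.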

Selecting items from a Ruby Hash

I have a hash in Ruby that looks like this:
{"NameValues"=>[
{"Name"=>"Field 1", "Values"=>["Data 1"]},
{"Name"=>"Field 2", "Values"=>["Data 2"]},
{"Name"=>"Field 3", "Values"=>["Data 3"]},
{"Name"=>"Field 4", "Values"=>["Data 4"]},
{"Name"=>"Field 5", "Values"=>["Data 5"]}
]}
I want to select the contents of the "Values" element by using the name from the "Name" element, e.g., locate the "Data 3" string by searching for "Field 3".
You could use the Enumerable#find method to find the hash by name:
hash = {"NameValues"=>[
{"Name"=>"Field 1", "Values"=>["Data 1"]},
{"Name"=>"Field 2", "Values"=>["Data 2"]},
{"Name"=>"Field 3", "Values"=>["Data 3"]},
{"Name"=>"Field 4", "Values"=>["Data 4"]},
{"Name"=>"Field 5", "Values"=>["Data 5"]}
]}
p hash['NameValues'].find{ |h| h['Name'] == 'Field 3'}['Values']
#=> ["Data 3"]
find basically iterates through the NameValues array until a matching element is found. You can then get the Values from the returned element.
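Because find scans the array on every call, many lookups against the same data may be better served by building an index once. A minimal sketch, assuming names are unique and Ruby 2.1+ for Array#to_h; the name values_by_name is illustrative.

```ruby
hash = { "NameValues" => [
  { "Name" => "Field 1", "Values" => ["Data 1"] },
  { "Name" => "Field 2", "Values" => ["Data 2"] },
  { "Name" => "Field 3", "Values" => ["Data 3"] }
] }

# Build a Name => Values index in one pass over the array.
values_by_name = hash["NameValues"].map { |h| [h["Name"], h["Values"]] }.to_h

p values_by_name["Field 3"]  # => ["Data 3"]
p values_by_name["Field 9"]  # => nil (no such name)
```

A missing name returns nil here, whereas chaining ['Values'] directly onto find raises NoMethodError when nothing matches.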

Array to Hash Ruby

Convert this Array:
a = ["item 1", "item 2", "item 3", "item 4"]
...to a Hash:
{ "item 1" => "item 2", "item 3" => "item 4" }
i.e. elements at even indexes are keys and odd ones are values.
a = ["item 1", "item 2", "item 3", "item 4"]
h = Hash[*a] # => { "item 1" => "item 2", "item 3" => "item 4" }
That's it. The * is called the splat operator.
One caveat per @Mike Lewis (in the comments): "Be very careful with this. Ruby expands splats on the stack. If you do this with a large dataset, expect to blow out your stack."
So, for most general use cases this method is great, but use a different method if you want to do the conversion on lots of data. For example, @Łukasz Niemier (also in the comments) offers this method for large data sets:
h = Hash[a.each_slice(2).to_a]
Ruby 2.1.0 introduced a to_h method on Array that does what you require if your original array consists of arrays of key-value pairs: http://www.ruby-doc.org/core-2.1.0/Array.html#method-i-to_h.
[[:foo, :bar], [1, 2]].to_h
# => {:foo => :bar, 1 => 2}
Just use Hash.[] with the values in the array. For example:
arr = [1,2,3,4]
Hash[*arr] #=> gives {1 => 2, 3 => 4}
Or if you have an array of [key, value] arrays, you can do:
[[1, 2], [3, 4]].inject({}) do |r, s|
r.merge!({s[0] => s[1]})
end # => { 1 => 2, 3 => 4 }
This is what I was looking for when googling this:
[{a: 1}, {b: 2}].reduce({}) { |h, v| h.merge v }
=> {:a=>1, :b=>2}
Enumerator includes Enumerable, and since Ruby 2.1 Enumerable also has a #to_h method. That is why we can write:
a = ["item 1", "item 2", "item 3", "item 4"]
a.each_slice(2).to_h
# => {"item 1"=>"item 2", "item 3"=>"item 4"}
Because #each_slice without a block returns an Enumerator, and as per the above explanation, we can call #to_h on that Enumerator object.
You could try this for a single array:
irb(main):019:0> a = ["item 1", "item 2", "item 3", "item 4"]
=> ["item 1", "item 2", "item 3", "item 4"]
irb(main):020:0> Hash[*a]
=> {"item 1"=>"item 2", "item 3"=>"item 4"}
For an array of arrays:
irb(main):022:0> a = [[1, 2], [3, 4]]
=> [[1, 2], [3, 4]]
irb(main):023:0> Hash[*a.flatten]
=> {1=>2, 3=>4}
a = ["item 1", "item 2", "item 3", "item 4"]
Hash[ a.each_slice( 2 ).map { |e| e } ]
or, if you hate Hash[ ... ]:
a.each_slice( 2 ).each_with_object Hash.new do |(k, v), h| h[k] = v end
or, if you are a lazy fan of broken functional programming:
h = a.lazy.each_slice( 2 ).tap { |a|
break Hash.new { |h, k| h[k] = a.find { |e, _| e == k }[1] }
}
#=> {}
h["item 1"] #=> "item 2"
h["item 3"] #=> "item 4"
All answers assume the starting array is unique. OP did not specify how to handle arrays with duplicate entries, which result in duplicate keys.
Let's look at:
a = ["item 1", "item 2", "item 3", "item 4", "item 1", "item 5"]
You will lose the item 1 => item 2 pair as it is overridden by item 1 => item 5:
Hash[*a]
=> {"item 1"=>"item 5", "item 3"=>"item 4"}
All of the methods, including reduce(&:merge!), result in the same removal.
It could be that this is exactly what you expect, though. But in other cases, you probably want to get a result with an Array for value instead:
{"item 1"=>["item 2", "item 5"], "item 3"=>["item 4"]}
The naïve way would be to create a helper variable, a hash that has a default value, and then fill that in a loop:
result = Hash.new {|hash, k| hash[k] = [] } # Hash.new with block defines unique defaults.
a.each_slice(2) {|k,v| result[k] << v }
result
=> {"item 1"=>["item 2", "item 5"], "item 3"=>["item 4"]}
It might be possible to use assoc and reduce to do the above in one line, but that becomes much harder to reason about and read.
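For comparison, here is a one-line sketch that swaps in group_by instead of assoc and reduce, assuming Ruby 2.4+ for Hash#transform_values. The name grouped is illustrative.

```ruby
a = ["item 1", "item 2", "item 3", "item 4", "item 1", "item 5"]

# Pair up the elements, group the pairs by key, then keep only the values.
grouped = a.each_slice(2).group_by(&:first).transform_values { |pairs| pairs.map(&:last) }

p grouped # => {"item 1"=>["item 2", "item 5"], "item 3"=>["item 4"]}
```

It produces the same result as the loop with the default-block hash, so the choice between them is mostly about readability.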
