Pulling out Keys and Values from a hash of arrays - ruby

I have a hash like this:
{"examples"=>
[{"year"=>1999,
"provider"=>{"name"=>"abc", "id"=>711},
"url"=> "http://example.com/1",
"reference"=>"abc",
"text"=> "Sample text 1",
"title"=> "Sample Title 1",
"documentId"=>30091286,
"exampleId"=>786652043,
"rating"=>357.08115},
{"year"=>1999,
"provider"=>{"name"=>"abc", "id"=>3243},
"url"=> "http://example.com/2",
"reference"=>"dec",
"text"=> "Sample text 2",
"title"=> "Sample Title 2",
"documentId"=>30091286,
"exampleId"=>786652043,
"rating"=>357.08115},
{"year"=>1999,
"provider"=>{"name"=>"abc", "id"=>191920},
"url"=> "http://example.com/3",
"reference"=>"wer",
"text"=> "Sample text 3",
"title"=> "Sample Title 3",
"documentId"=>30091286,
"exampleId"=>786652043,
"rating"=>357.08115}]
}
and I would like to create a new array by pulling out the keys and values for just the "text", "url" and "title" keys, like below.
[
{"text"=> "Sample text 1", "title"=> "Sample Title 1", "url"=> "http://example.com/1"},
{"text"=> "Sample text 2", "title"=> "Sample Title 2", "url"=> "http://example.com/2"},
{"text"=> "Sample text 3", "title"=> "Sample Title 3", "url"=> "http://example.com/3"}
]
Any help is sincerely appreciated.

You can do it like this:
hash['examples'].map do |hash|
keys = ["text", "title", "url"]
keys.zip(hash.values_at(*keys)).to_h
end
On Ruby versions below 2.1, where Array#to_h is not available, use this as the last line of the block instead:
Hash[keys.zip(hash.values_at(*keys))]
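For reference, the approach above can be run end to end as follows. This is only a minimal sketch on a trimmed copy of the question's data; the names data, keys, picked and picked_old are illustrative and not from the original answer.

```ruby
# Trimmed copy of the question's data (two entries, fewer fields).
data = {
  "examples" => [
    { "year" => 1999, "url" => "http://example.com/1",
      "text" => "Sample text 1", "title" => "Sample Title 1" },
    { "year" => 1999, "url" => "http://example.com/2",
      "text" => "Sample text 2", "title" => "Sample Title 2" }
  ]
}

keys = ["text", "title", "url"]

# Pair each wanted key with its value, then turn the pairs into a hash.
picked = data["examples"].map do |example|
  keys.zip(example.values_at(*keys)).to_h
end

# Equivalent form for Rubies that lack Array#to_h.
picked_old = data["examples"].map do |example|
  Hash[keys.zip(example.values_at(*keys))]
end

p picked
```

Note that values_at returns nil for a key that is missing from an entry, so such a key would still appear in the result with a nil value.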

Here's another way this could be done (where h is the hash given in the question).
KEEPERS = ['text','url','title']
h.each_key.with_object({}) { |k,g|
g[k] = h[k].map { |h| h.select { |sk,_| KEEPERS.include? sk } } }
#=> {"examples"=>
# [{"url"=>"http://example.com/1", "text"=>"Sample text 1",
# "title"=>"Sample Title 1"},
# {"url"=>"http://example.com/2", "text"=>"Sample text 2",
# "title"=>"Sample Title 2"},
# {"url"=>"http://example.com/3", "text"=>"Sample text 3",
# "title"=>"Sample Title 3"}]}
Here we simply create a new hash (denoted by the outer block variable g) which has all the keys of the original hash h (just one, "examples", but there could be more), and for each associated value, which is an array of hashes, we use Enumerable#map and Hash#select to retain only the desired key/value pairs from each of those hashes.
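On newer Rubies the same idea can probably be written more compactly. The sketch below assumes Ruby 2.5 or later, where both Hash#transform_values and Hash#slice are available; h is a trimmed stand-in for the question's hash, and keepers and trimmed are illustrative names.

```ruby
# Trimmed stand-in for the hash from the question.
h = {
  "examples" => [
    { "year" => 1999, "url" => "http://example.com/1", "reference" => "abc",
      "text" => "Sample text 1", "title" => "Sample Title 1" },
    { "year" => 1999, "url" => "http://example.com/2", "reference" => "dec",
      "text" => "Sample text 2", "title" => "Sample Title 2" }
  ]
}

keepers = ["text", "url", "title"]

# Keep every top-level key; within each array element keep only the wanted keys.
trimmed = h.transform_values do |entries|
  entries.map { |entry| entry.slice(*keepers) }
end

p trimmed
```

Unlike values_at, slice silently skips keys that an entry does not have, and the original hash h is left unmodified.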

Related

Group List of hashes and index the values

I have an array of hashes.
Some of the hashes have duplicate names.
I want to keep the duplicates, but add a counter to the title,
for example "TITLE #1" and "TITLE #2".
This is my array:
list = []
@temp = {}
@temp["name"] = "Germany"
@temp["id"] = 1
list << @temp
@temp["name"] = "USA"
@temp["id"] = 2
list << @temp
@temp["name"] = "USA"
@temp["id"] = 3
list << @temp
@temp["name"] = "France"
@temp["id"] = 4
list << @temp
@temp["name"] = "France"
@temp["id"] = 5
list << @temp
@temp["name"] = "France"
@temp["id"] = 6
list << @temp
I want the result to be the same as the source, but with a counter added to each "USA" entry: "USA #1" and "USA #2".
Likewise, France should change to "France #1", "France #2" and "France #3".
The Germany element should not change, because it does not occur more than once.
You need @temp = {} at the beginning of each block of code; otherwise the same hash object is appended to list every time.
After executing your code with that modification, list is:
list = [{"name"=>"Germany", "id"=>1},
{"name"=>"USA", "id"=>2},
{"name"=>"USA", "id"=>3},
{"name"=>"France", "id"=>4},
{"name"=>"France", "id"=>5},
{"name"=>"France", "id"=>6}]
We can then obtain your desired result as follows.
list.group_by { |h| h["name"] }.values.flat_map do |a|
a.map.with_index(1) do |h,i|
base = h["name"]
h.merge("name"=>base +" #{i}")
end
end
#=> [{"name"=>"Germany 1", "id"=>1},
# {"name"=>"USA 1", "id"=>2},
# {"name"=>"USA 2", "id"=>3},
# {"name"=>"France 1", "id"=>4},
# {"name"=>"France 2", "id"=>5},
# {"name"=>"France 3", "id"=>6}]
Note
arr = list.group_by { |h| h["name"] }.values
#=> [[{"name"=>"Germany", "id"=>1}],
# [{"name"=>"USA", "id"=>2}, {"name"=>"USA", "id"=>3}],
# [{"name"=>"France", "id"=>4}, {"name"=>"France", "id"=>5},
# {"name"=>"France", "id"=>6}]]
Had I used Enumerable#map rather than Enumerable#flat_map, the result would have been
[[{"name"=>"Germany 1", "id"=>1}],
[{"name"=>"USA 1", "id"=>2}, {"name"=>"USA 2", "id"=>3}],
[{"name"=>"France 1", "id"=>4}, {"name"=>"France 2", "id"=>5},
{"name"=>"France 3", "id"=>6}]]
Using flat_map is equivalent to inserting a splat in front of each of this array's elements.
[*[{"name"=>"Germany 1", "id"=>1}],
*[{"name"=>"USA 1", "id"=>2}, {"name"=>"USA 2", "id"=>3}],
*[{"name"=>"France 1", "id"=>4}, {"name"=>"France 2", "id"=>5},
{"name"=>"France 3", "id"=>6}]]
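The output shown above numbers every name, including Germany, and omits the "#" that the question asks for. A possible variant that matches the requested format more closely is sketched below; it is an adaptation, not the answerer's code, and the name numbered is illustrative.

```ruby
list = [
  { "name" => "Germany", "id" => 1 },
  { "name" => "USA",     "id" => 2 },
  { "name" => "USA",     "id" => 3 },
  { "name" => "France",  "id" => 4 },
  { "name" => "France",  "id" => 5 },
  { "name" => "France",  "id" => 6 }
]

numbered = list.group_by { |h| h["name"] }.values.flat_map do |group|
  # A name that occurs only once is left untouched.
  next group if group.size == 1

  # Otherwise append " #1", " #2", ... to copies of the hashes.
  group.map.with_index(1) do |h, i|
    h.merge("name" => format("%s #%d", h["name"], i))
  end
end

p numbered.map { |h| h["name"] }
```

Because group_by collects equal names together, entries with the same name end up adjacent in the result even if they were not adjacent in list. The hashes in list itself are not modified, since merge returns new hashes.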

Boost elastic [MoreLikeThis] search query for beginning of array

I have Elasticsearch documents with a structure like this:
{
"name": "item1",
"storages": [
{"items": ["a", "b", "c", "d", "e", "f"]},
{"items": ["a 1", "b 2", "c 3", "d 4", "e 5", "f 6"]}]
}
{
"name": "item2",
"storages": [
{"items": ["d", "e", "f", "g", "h", "i", "j"]},
{"items": ["d 4", "e 5", "f 6", "g 7", "h 8", "i 9", "j 10"]}
]
}
and I want to search for a sequence of strings, for example ["d 4","e 5"].
For this I use a MoreLikeThis query:
{
"query": {
"more_like_this" : {
"fields" : ["storages.items"],
"like" : ["d 4","e 5"],
"min_term_freq": 1,
"min_doc_freq": 1
}
}
}
and it works almost fine, but it returns "_score": 0.1620518 for the first document and "_score": 0.13890153 for the second.
I want to boost the score for terms near the beginning of the array ('items'): because "d 4", "e 5" appear at the beginning of the second document's array, it should be ranked higher.
Is there a way to create such a query in Elasticsearch? Maybe it should not be a more_like_this query at all?
The tricky part is that the query could be something like ["d 4","e 5", "xxx"] (xxx is not present in any document, but that's OK).
As you can see in this answer to a related question,
arrays are indexed—made searchable—as multivalue fields, which are
unordered
so you can't count on the order when you search.
Even worse, the array of objects is not stored as you think.
Arrays of objects do not work as you would expect: you cannot query each object independently of the other objects in the array. If you need to be able to do this then you should use the nested datatype instead of the object datatype.

Grouping data from nested arrays in ruby

Assume the following list of tuples, each containing a person's name, age and the books they have read:
list = [
["Peter", 21, ["Book 1", "Book 2", "Book 3", "Book 4"]],
["Amy", 19, ["Book 3", "Book 4"]],
["Sanders", 32, ["Book 1", "Book 2"]],
["Charlie", 21, ["Book 4", "Book 5", "Book 6"]],
["Amanda", 21, ["Book 2", "Book 5"]]
]
What is the optimal way to extract names grouped by the books read, into the following format (basically an array of arrays, each containing the book name and an array of the names of people who read it)?
results = [
["Book 1", ["Sanders", "Peter"]],
["Book 2", ["Sanders", "Amanda", "Peter"]],
["Book 3", ["Peter", "Amy"]],
["Book 4", ["Charlie", "Peter", "Amy"]],
["Book 5", ["Amanda","Charlie"]],
["Book 6", ["Charlie"]]
]
I've tried the following iterating method which extracts the lists of names and puts them into a hash, with the book title as the keys.
book_hash = Hash.new([])
list.each { |name,age,books|
books.each { |x| book_hash[x] = book_hash[x] + [name] }
}
results = book_hash.to_a.sort
However, the above method seems rather inefficient when handling large datasets containing millions of names. I've attempted to use Array#group_by, but so far I've been unable to make it work with nested arrays.
Does anyone have any idea about the above?
A hash is a more suitable output:
list.each_with_object({}) do |(name, age, books), hash|
books.each do |book|
(hash[book] ||= []) << name
end
end
If you must make it an array, then append a .to_a to the output of the above.
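Put together with the question's data, this might look like the sketch below. The names readers_by_book and results are illustrative, and the final sort is only there to order the pairs by book title as in the question.

```ruby
list = [
  ["Peter",   21, ["Book 1", "Book 2", "Book 3", "Book 4"]],
  ["Amy",     19, ["Book 3", "Book 4"]],
  ["Sanders", 32, ["Book 1", "Book 2"]],
  ["Charlie", 21, ["Book 4", "Book 5", "Book 6"]],
  ["Amanda",  21, ["Book 2", "Book 5"]]
]

# One pass over the data: append each reader to the array for every book read.
readers_by_book = list.each_with_object({}) do |(name, _age, books), hash|
  books.each { |book| (hash[book] ||= []) << name }
end

# Convert to an array of [book, readers] pairs, ordered by book title.
results = readers_by_book.to_a.sort

results.each { |book, readers| puts "#{book}: #{readers.join(', ')}" }
```

Appending with << mutates one array per book, whereas the question's book_hash[x] + [name] builds a new array on every step, which is a likely reason it slows down on large inputs.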

Iterating over a list to append an array to a key/value pair

I have a list like follows
ID MODEL
001 Model A
001 Model B
001 Model C
002 Model A
002 Model B
002 Model D
I have to perform a query based on the ID, which I have working currently. It's currently performing one query per line in the list. It seems like it would be much faster to reduce the number of queries I have to run.
I thought if I had a hash that looked like this:
{
:"001" => ["Model A", "Model B", "Model C"],
:"002" => ["Model A", "Model B", "Model D"]
}
I would be able to perform fewer queries.
The problem is that I can't work out how to iterate over a list like this and generate the necessary hash.
Right now my code looks like this:
id = parsed_line[0]
model = parsed_line[1]
hash["#{id}"] = models << model
This is inside a loop that iterates over the text file, where "models" is an array holding the model list.
The problem from here is that the hash then looks like this:
{
:"001" => ["Model A", "Model B", "Model C", "Model A", "Model B", "Model D"],
:"002" => ["Model A", "Model B", "Model C", "Model A", "Model B", "Model D"]
}
I understand why it's happening, but I do not understand how to get the desired hash.
I'm assuming here that parsed_line is an array that looks like this: [ "001", "Model A" ]. I don't know what models is, but I'm guessing it's unnecessary. Something like this ought to work:
parsed_lines = [ [ "001", "Model A" ],
[ "002", "Model B" ],
# ...
]
hash = {}
parsed_lines.each do |id, model|
hash[id] ||= []
hash[id] << model
end
Or, more simply (using Hash.new's handy block default):
hash = Hash.new {|hash, id| hash[id] = [] } # If `hash[id]` isn't yet set when
# we try to access it, automatically
# initialize it with an empty array
parsed_lines.each do |id, model|
hash[id] << model
end
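To connect this back to the text file in the question, the lines can be split while the hash is built. The sketch below makes a few assumptions that are not stated in the question: the file has one header row, the ID is the first whitespace-separated token, and everything after it is the model name. A heredoc stands in for the file, and raw and models_by_id are illustrative names.

```ruby
# Stand-in for the contents of the text file.
raw = <<~TEXT
  ID MODEL
  001 Model A
  001 Model B
  001 Model C
  002 Model A
  002 Model B
  002 Model D
TEXT

# Missing keys are initialised with their own empty array on first access.
models_by_id = Hash.new { |hash, id| hash[id] = [] }

# Skip the header row, then split each line into the ID and the rest.
raw.each_line.drop(1).each do |line|
  id, model = line.chomp.split(" ", 2)
  models_by_id[id] << model
end

p models_by_id
```

Each ID then needs only one query, with models_by_id[id] supplying all of its models.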
I am guessing that you don't have a database. If all you have is a two-dimensional array and you want to convert it into a hash, then this is what you should do:
model_arrays = [["001", "Model A"], ["001", "Model B"], ["001", "Model C"], ["002", "Model A"], ["002", "Model B"], ["002", "Model D"]]
hash = {}
model_arrays.each do |arr|
hash[arr[0]] ||= []
hash[arr[0]] << arr[1]
end
How about this one-liner (sans the declaration of parsed_lines)?
parsed_lines = [
["001", "Model A"],
["001", "Model B"],
["001", "Model C"],
["002", "Model A"],
["002", "Model B"],
["002", "Model D"]
]
Hash[parsed_lines.group_by(&:first).map{|k,v| [k.to_sym,v.map(&:last)]}]
#=> {:"001"=>["Model A", "Model B", "Model C"], :"002"=>["Model A", "Model B", "Model D"]}
But it is probably better to do what @ArupRakshit stated and use a group_by query and skip this additional processing step.

Selecting items from a Ruby Hash

I have a hash in Ruby that looks like this:
{"NameValues"=>[
{"Name"=>"Field 1", "Values"=>["Data 1"]},
{"Name"=>"Field 2", "Values"=>["Data 2"]},
{"Name"=>"Field 3", "Values"=>["Data 3"]},
{"Name"=>"Field 4", "Values"=>["Data 4"]},
{"Name"=>"Field 5", "Values"=>["Data 5"]}
]}
I want to select the contents of the "Values" element by using the name from the "Name" element, e.g., locate the "Data 3" string by searching for "Field 3", etc.
You could use the Enumerable#find method to find the hash by name:
hash = {"NameValues"=>[
{"Name"=>"Field 1", "Values"=>["Data 1"]},
{"Name"=>"Field 2", "Values"=>["Data 2"]},
{"Name"=>"Field 3", "Values"=>["Data 3"]},
{"Name"=>"Field 4", "Values"=>["Data 4"]},
{"Name"=>"Field 5", "Values"=>["Data 5"]}
]}
p hash['NameValues'].find{ |h| h['Name'] == 'Field 3'}['Values']
#=> ["Data 3"]
find basically iterates through the NameValues array until a matching element is found. You can then get the Values from the returned element.
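One thing to watch for is that find returns nil when nothing matches, so chaining ['Values'] directly would raise NoMethodError for an unknown name. The sketch below shows a guarded lookup and, as an alternative for repeated lookups, an index built once; "Field 9", values_by_name, match and missing_values are illustrative names, not part of the original answer.

```ruby
hash = { "NameValues" => [
  { "Name" => "Field 1", "Values" => ["Data 1"] },
  { "Name" => "Field 2", "Values" => ["Data 2"] },
  { "Name" => "Field 3", "Values" => ["Data 3"] }
] }

# Guarded lookup: only read "Values" if an element was actually found.
match = hash["NameValues"].find { |h| h["Name"] == "Field 9" }
missing_values = match && match["Values"]

# For many lookups, build a name => values index once instead of
# scanning the array each time.
values_by_name = hash["NameValues"].map { |h| [h["Name"], h["Values"]] }.to_h

p values_by_name["Field 3"]
p missing_values
```

The index trades a little memory for constant-time lookups, which may matter if the array is large or queried often.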

Resources