Most efficient way to extract an item from a Ruby array of hashes

I have some large Ruby structures that I need to quickly extract data from. I have no control over the format of the data, although I'm open to transforming it under certain circumstances. What is the most efficient way to extract a single item from the following array of hashes, using the displayName as the 'key'?
[
{'displayName'=>'Some Key 1', 'values'=>[1,2,3]},
{'displayName'=>'Some Key 2', 'values'=>["Some text"]},
{'displayName'=>'Some Key 3', 'values'=>["Different text","More text"]},
{'displayName'=>'Some Key 4', 'values'=>[2012-12-12]}
]
Each hash has other keys in it that I've removed to assist understanding.
The challenge is that in certain circumstances, the displayName field will need to be matched on a prefix substring. Does anybody have practical experience knowing when to use .each and match manually, versus .select to get the common-case exact matches and fall back to the prefixes afterwards? Or is there some common trick I'm missing?

If you're doing this once, you'll probably just have to iterate over the set and find what you need:
row = data.find do |row|
row['displayName'] == name
end
row && row['values']
If you're doing it more than once, you should probably make an indexed structure out of it with a simple transform to create a temporary derivative structure:
hashed = Hash[
data.collect do |row|
[ row['displayName'], row['values'] ]
end
]
hashed[name]
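The indexed approach can also cover the prefix case from the question. The following is a minimal sketch, not a definitive implementation: it assumes the search term is a prefix of the displayName, that display names are unique, and that the first matching row wins. The helper name lookup is hypothetical.

```ruby
# Sample rows in the same shape as the question's data.
data = [
  { 'displayName' => 'Some Key 1', 'values' => [1, 2, 3] },
  { 'displayName' => 'Some Key 2', 'values' => ['Some text'] },
  { 'displayName' => 'Some Key 3', 'values' => ['Different text', 'More text'] }
]

# Build the index once: displayName => values.
index = data.map { |row| [row['displayName'], row['values']] }.to_h

# lookup is a hypothetical helper, not part of any library.
# It tries the constant-time exact match first and only scans on a miss.
def lookup(index, name)
  return index[name] if index.key?(name)

  # Fallback: first displayName (in insertion order) starting with the prefix.
  match = index.find { |display_name, _values| display_name.start_with?(name) }
  match && match.last
end

exact   = lookup(index, 'Some Key 2') # exact match via the hash
prefix  = lookup(index, 'Some')       # falls back to the prefix scan
missing = lookup(index, 'Missing')    # nil when nothing matches
```

The exact-match path stays cheap for the common case; the linear scan only runs when the hash lookup misses.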

You can use a simple select, though it may not be as fast as it could be with large arrays:
data = [
{'displayName'=>'Some Key 1', 'values'=>[1,2,3]},
{'displayName'=>'Some Key 2', 'values'=>["Some text"]},
{'displayName'=>'Some Key 3', 'values'=>["Different text","More text"]},
{'displayName'=>'Some Key 4', 'values'=>[2012-12-12]}
]
data.select { |e| e['displayName'] == 'Some Key 2' }.first
You can group_by the desired key instead, which will make access faster:
hashed_data = data.group_by { |e| e['displayName'] }
hashed_data['Some Key 4']
=> [{"displayName"=>"Some Key 4", "values"=>[1988]}]
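Note that group_by returns an array of rows for each key, as the output above shows. If the display names are known to be unique (an assumption, since the question does not say), a one-row-per-key index may be more convenient. A small sketch of both:

```ruby
data = [
  { 'displayName' => 'Some Key 1', 'values' => [1, 2, 3] },
  { 'displayName' => 'Some Key 2', 'values' => ['Some text'] }
]

# group_by keeps every row that shares a key, so each value is an array.
grouped = data.group_by { |e| e['displayName'] }
first_match = grouped.fetch('Some Key 2', []).first

# If names are unique, map each name straight to its row instead.
# (Array#to_h with a block needs Ruby 2.6 or later.)
by_name = data.to_h { |e| [e['displayName'], e] }
single_row = by_name['Some Key 2']
```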

Related

Rxjs GroupBy, Reduce in order to Pivot on ID

I'm looking for a bit of help understanding this example taken from the rxjs docs.
Observable.of<Obj>({id: 1, name: 'aze1'},
{id: 2, name: 'sf2'},
{id: 2, name: 'dg2'},
{id: 1, name: 'erg1'},
{id: 1, name: 'df1'},
{id: 2, name: 'sfqfb2'},
{id: 3, name: 'qfs1'},
{id: 2, name: 'qsgqsfg2'}
)
.groupBy(p => p.id, p => p.name)
.flatMap( (group$) => group$.reduce((acc, cur) => [...acc, cur], ["" + group$.key]))
.map(arr => ({'id': parseInt(arr[0]), 'values': arr.slice(1)}))
.subscribe(p => console.log(p));
So the aim here is to group all the items by id and produce an object with a single ID and a values property which includes all the emitted names with matching IDs.
The second parameter to the groupBy operator identifies the return value, effectively filtering the emitted object's properties down to the name. I suppose the same thing could be achieved by mapping the observable beforehand. Is it possible to pass more than one value to the return value parameter?
The line I am finding very confusing is this one:
.flatMap( (group$) => group$.reduce((acc, cur) => [...acc, cur], ["" + group$.key]))
I get that we now have three grouped observables (for the 3 ids) that are effectively arrays of emitted objects. With each grouped observable, the aim of this code is to reduce it to an array, where the first entry in the array is the key and subsequent entries in the array are the names.
But why is the reduce function initialized with ["" + group$.key], rather than just [group$.key]?
And why is this three dot notation [...acc, cur] used when returning the reduced array on each iteration?
But why is the reduce function initialized with ["" + group$.key], rather than just [group$.key]?
The clue to answer this question is in the .map() function a bit further down in the code.
.map(arr => ({'id': parseInt(arr[0]), 'values': arr.slice(1)}))
                    ^^^^^^^^
Note the use of parseInt. Without the "" + in the flatMap, this simply wouldn't compile, since you'd be passing a number type to a function that expects a string. Remove the parseInt and just use arr[0], and you can remove the "" + as well.
And why is this three dot notation [...acc, cur] used when returning
the reduced array on each iteration?
The spread operator here is used to add to the array without mutating it. But what does it do? It copies all the existing elements of the original array into a new array. In simpler words: take all elements in acc and copy them to a new array with cur at the end. Here is a nice blog post about object mutation in general.

Get the count of duplicate keys in a hash

I have an array of hashes, say
test = [ {:a1=>"a", :b1=>"q"},
{:a1=>"c", :b1=>"z"},
{:a1=>"a", :b1=>"zcq"} ]
I need to count the hashes whose :a1 value is "a" (e.g. :a1=>"a"). The output should be 2 when I search for "a".
How do I find the count for a given value?
Try this:
test.count { |item| item[:a1] == 'a' }
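If the counts for every value are needed rather than just one, the values can be tallied in a single pass. This is a sketch that assumes Ruby 2.7 or later, where Enumerable#tally is available:

```ruby
test = [
  { a1: 'a', b1: 'q' },
  { a1: 'c', b1: 'z' },
  { a1: 'a', b1: 'zcq' }
]

# Count rows whose :a1 value is 'a'.
count_a = test.count { |item| item[:a1] == 'a' }

# Count every distinct :a1 value at once (Enumerable#tally, Ruby 2.7+).
counts = test.map { |item| item[:a1] }.tally
```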

Ruby put in order columns when creating CSV document from Mongoid

I need to create a CSV document from a database. I want to organise the columns in a particular order, and I have a template of this order stored as an array of headers:
header = ["header1", "header2", "header3", "header4", "header5"]
record = [{"header4" =>"value4"}, {"header3" =>"value3"}, {"header5"=>"value5"}, {"header1"=>"value1"}, {"header2"=>"value2"}]
I need to get an array like this:
record = [{"header1" =>"value1"}, {"header2" =>"value2"}, {"header3"=>"value3"}, {"header4"=>"value4"}, {"header5"=>"value5"}]
but when I do
csv << mymodel.attributes.values.sort_by! { |h| header.index(h.keys[0]) }
it does not work.
When you call mymodel.attributes, you get a Hash back which maps attributes names (as strings) to their values. If your attribute names are header1 through header5 then mymodel.attributes will be something like this:
{
'header1' => 'value1',
'header2' => 'value2',
'header3' => 'value3',
'header4' => 'value4',
'header5' => 'value5'
}
Of course, the order depends on how things come out of MongoDB. The easiest way to extract a bunch of values from a Hash in a specified order is to use values_at:
mymodel.attributes.values_at(*header)
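To illustrate, here is a small self-contained sketch that writes one row in header order. The attributes hash is a stand-in for mymodel.attributes (an assumption, since no Mongoid model is available here), and CSV.generate is used only so the result can be inspected as a string.

```ruby
require 'csv'

header = %w[header1 header2 header3 header4 header5]

# Stand-in for mymodel.attributes: a plain Hash in arbitrary order.
attributes = {
  'header4' => 'value4',
  'header3' => 'value3',
  'header5' => 'value5',
  'header1' => 'value1',
  'header2' => 'value2'
}

csv_text = CSV.generate do |csv|
  csv << header
  # values_at returns the values in the order of the keys passed in.
  csv << attributes.values_at(*header)
end
```

Keys that are not listed in header are simply ignored by values_at, and a header with no matching attribute yields nil, which becomes an empty CSV field.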

Ruby: Hash w/ Arrays, Returning Associated Key If Value Is In Array

New to Ruby and have run out of ideas. I have an array of books that I would like to 1) Shelve 2) Find which shelf it is on 3) Remove it from the associated shelf if found. For brevity I have an array of 6 books. Each shelf contains 5 books.
library_catalog = [ "Book1", "Book2", "Book3", "Book4", "Book5", "Book6" ]
shelves = Hash.new(0)
catalog_slice = library_catalog.each_slice(5).to_a
count = 1
catalog_slice.each do | x |
shelves.merge!(count=>x)
count+=1
end
From this I now have a Hash with arrays, as such:
{1=>["Book1", "Book2", "Book3", "Book4", "Book5"], 2=>["Book6"]}
This is where I'm having trouble traversing the hash to find a match inside the array and return the key(shelf). If I have title = "Book1" and I am trying to match and return 1, how would I go about this?
I think this should work.
shelves.select { |k,v| v.include?("Book1")}.keys.first
This selects the entries whose array includes the title you are looking for (in this case "Book1"),
gets the keys of those entries as an array,
and takes the first entry in that array.
To remove the book from the shelf, try this:
key = shelves.select { |k,v| v.include?("Book1")}.keys.first
shelves[key].reject! { |b| b == "Book1" }
This gets a reference to the array and then rejects the entry you want to remove.
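One thing to watch for is a title that is not on any shelf: the key is then nil, and the removal step would fail. A variant sketch using find, which stops at the first match, with a guard for that case. The helper name find_shelf is hypothetical, and the shelves hash here is built without a default value.

```ruby
library_catalog = %w[Book1 Book2 Book3 Book4 Book5 Book6]

# Shelf number => books, five books per shelf.
shelves = {}
library_catalog.each_slice(5).with_index(1) do |books, shelf_number|
  shelves[shelf_number] = books
end

# find_shelf is a hypothetical helper: returns the shelf number or nil.
def find_shelf(shelves, title)
  shelf_number, _books = shelves.find { |_number, books| books.include?(title) }
  shelf_number
end

first_shelf = find_shelf(shelves, 'Book1')
unknown = find_shelf(shelves, 'Not In Catalog')

shelf = find_shelf(shelves, 'Book6')
# Guard against nil so a missing title does not raise.
shelves[shelf].delete('Book6') if shelf
```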

Ruby Nested Hash with Composite Unique Keys

Given a comma separated CSV file in the following format:
Day,User,Requests,Page Views,Browse Time,Total Bytes,Bytes Received,Bytes Sent
"Jul 25, 2012","abc123",3,0,0,13855,3287,10568
"Jul 25, 2012","abc230",1,0,0,1192,331,861
"Jul 25, 2012",,7,0,0,10990,2288,8702
"Jul 24, 2012","123456",3,0,0,3530,770,2760
"Jul 24, 2012","abc123",19,1,30,85879,67791,18088
I wanted to drop the entire dataset (1000 users over 30 days = 30,000 records) into a hash such that Key 1 may be a duplicate key, key 2 may be a duplicate key, but Key 1 & 2 will be unique together.
Example using line 1 above:
report_hash = {"Jul 25, 2012" => {"abc123" => {"PageRequest" => 3, "PageViews" => 0, "BrowseTime" => 0, "TotalBytes" => 13855, "BytesReceived" => 3287, "BytesSent" => 10568}}}
def hashing(file)
#read the CSV file into an Array
report_arr = CSV.read(file)
#drop the header row (drop returns a new array, so reassign it)
report_arr = report_arr.drop(1)
#Create an empty hash to save the data to
report_hash = {}
#for each row in the array,
#if the first element in the array is not a key in the hash, make one
report_arr.each{|row|
if report_hash[row[0]].nil?
report_hash[row[0]] = Hash.new
#If the key exists, does the 2nd key exist? if not, make one
elsif report_hash[row[0]][row[1]].nil?
report_hash[row[0]][row[1]] = Hash.new
end
#throw all the other data into the 2-key hash
report_hash[row[0]][row[1]] = {"PageRequest" => row[2].to_i, "PageViews" => row[3].to_i, "BrowseTime" => row[4].to_i, "TotalBytes" => row[5].to_i, "BytesReceived" => row[6].to_i, "BytesSent" => row[7].to_i}
}
return report_hash
end
I spent several hours learning hashes and associated content to get this far, but feel like there is a much more efficient method to do this. Any suggestions on the proper/more efficient way of creating a nested hash with the first two keys being the first two elements of the array such that they create a "composite" unique key?
You could use the array [day, user] as the hash key.
report_hash = {
["Jul 25, 2012","abc123"] =>
{
"PageRequest" => 3,
"PageViews" => 0,
"BrowseTime" => 0,
"TotalBytes" => 13855,
"BytesReceived" => 3287,
"BytesSent" => 10568
}
}
You just have to make sure the date and user always appear the same. If your date (for example) appears in a different format sometimes, you'll have to normalize it before using it to read or write the hash.
A similar way would be to convert the day + user into a string, using some delimiter between them. But you have to be more careful that the delimiter doesn't appear in the day or the user.
EDIT:
Also make sure you don't modify the hash keys. Using arrays as keys makes this a very easy mistake to make. If you really wanted to, you could modify a copy using dup, like this:
new_key = report_hash.keys.first.dup
new_key[1] = 'another_user'
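Putting the composite-key idea together with the CSV from the question, a minimal sketch might look like the following. It assumes the file content is available as a string (csv_text here is sample data copied from the question) and freezes each key array to guard against the accidental modification mentioned above.

```ruby
require 'csv'

# A few rows from the question, inlined so the sketch is self-contained.
csv_text = <<~ROWS
  Day,User,Requests,Page Views,Browse Time,Total Bytes,Bytes Received,Bytes Sent
  "Jul 25, 2012","abc123",3,0,0,13855,3287,10568
  "Jul 25, 2012","abc230",1,0,0,1192,331,861
  "Jul 24, 2012","abc123",19,1,30,85879,67791,18088
ROWS

report_hash = {}
# headers: true consumes the header row, so no manual drop is needed.
CSV.parse(csv_text, headers: true).each do |row|
  # Freeze the key array so it cannot be mutated after insertion.
  key = [row['Day'], row['User']].freeze
  report_hash[key] = {
    'PageRequest'   => row['Requests'].to_i,
    'PageViews'     => row['Page Views'].to_i,
    'BrowseTime'    => row['Browse Time'].to_i,
    'TotalBytes'    => row['Total Bytes'].to_i,
    'BytesReceived' => row['Bytes Received'].to_i,
    'BytesSent'     => row['Bytes Sent'].to_i
  }
end

total_bytes = report_hash[['Jul 25, 2012', 'abc123']]['TotalBytes']
```

A lookup only needs an array with equal contents; it does not have to be the same object that was used when the entry was stored.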
