Looking up an array of keys on an array of hashes in Ruby like Excel VLOOKUP

This post is very similar to my previous one, but the data structures are different here:
Joining an array of keys to a hash with key value pairs like excel vlookup
My data from a Mysql2::Result comes back as this array of hashes:
data = [{"isbn" => "1234", "title"=>"apple"},{"isbn" => "5678", "title"=>"banana"},{"isbn" => "2121", "title"=>"car"}]
And my original list of isbns that I would like to compare is this array:
isbns = ["1234","2121", "5454", "5678"]
I'm seeking a function which uses the isbns array and returns a result like this:
result = [{"isbn"=>"1234","title"=>"apple"}, {"isbn"=> "2121", "title"=>"car"}, nil, {"isbn"=>"5678","title"=>"banana"}]
The "driving" array is the isbns... imagine doing a vlookup from isbns to data ... any items that are not in data, but in isbns should return nil. The original order of isbns should be returned, and the return data should be an array of hashes.

isbns.map { |isbn| data.find { |h| h["isbn"] == isbn } }
#=> [{"isbn"=>"1234", "title"=>"apple"}, {"isbn"=>"2121", "title"=>"car"}, nil, {"isbn"=>"5678", "title"=>"banana"}]

@Michael Kohl's answer is succinct and correct. However, if these data sets are big, it's inefficient: O(n*m) (about n*m/2 comparisons on average). An alternative is to transform the data array into a hash in O(m), then do the map in O(n), for a total runtime of O(n+m).
data_lookup = data.inject({}) { |m, v| m[v["isbn"]] = v; m } # O(data.size)
result = isbns.map { |isbn| data_lookup[isbn] } # O(isbns.size)
If your data and isbn collections were of size 1000 each, this would be faster by a factor of 250.
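To sanity-check that speedup claim, a micro-benchmark along these lines can be used (a sketch using the stdlib Benchmark module; the sizes and data are synthetic, made up for illustration):
require 'benchmark'

n = 1_000
data  = Array.new(n) { |i| { "isbn" => i.to_s, "title" => "Book #{i}" } }
isbns = Array.new(n) { |i| (i * 2).to_s } # roughly half will be misses

Benchmark.bm(12) do |x|
  x.report("linear scan") do
    isbns.map { |isbn| data.find { |h| h["isbn"] == isbn } }
  end
  x.report("hash lookup") do
    lookup = data.inject({}) { |m, v| m[v["isbn"]] = v; m }
    isbns.map { |isbn| lookup[isbn] }
  end
end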

Related

Ruby: Hash w/ Arrays, Returning Associated Key If Value Is In Array

New to Ruby and have run out of ideas. I have an array of books that I would like to 1) Shelve 2) Find which shelf it is on 3) Remove it from the associated shelf if found. For brevity I have an array of 6 books. Each shelf contains 5 books.
library_catalog = [ "Book1", "Book2", "Book3", "Book4", "Book5", "Book6" ]
shelves = Hash.new(0)
catalog_slice = library_catalog.each_slice(5).to_a
count = 1
catalog_slice.each do |x|
  shelves.merge!(count => x)
  count += 1
end
From this I now have a Hash with arrays, like so:
{1=>["Book1", "Book2", "Book3", "Book4", "Book5"], 2=>["Book6"]}
This is where I'm having trouble: traversing the hash to find a match inside the array and returning the key (shelf). If I have title = "Book1" and I am trying to match and return 1, how would I go about this?
I think this should work.
shelves.select { |k,v| v.include?("Book1")}.keys.first
This selects the entries whose value (the array of titles) includes the title you are looking for (in this case "Book1"), gets the keys of those entries as an array, and takes the first entry of that array.
To remove the book from the shelf, try this:
key = shelves.select { |k,v| v.include?("Book1")}.keys.first
shelves[key].reject! { |b| b == "Book1" }
This gets a reference to the shelf's array and then rejects the entry you want to remove.
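A one-pass alternative (a sketch using Enumerable#find, which stops at the first matching shelf, and Array#delete to remove the book in place):
shelf, _books = shelves.find { |_key, books| books.include?("Book1") }
shelves[shelf].delete("Book1") if shelf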

Combining multiple array/hash selects

I have the following code:
sum = array_of_hashes.select{ |key| (date_range).include? Date.parse(key[:created])}.map { |h| h[:amount] }.sum
size = array_of_hashes.select{ |key| (date_range).include? Date.parse(key[:created])}.size
total = sum / size
sum selects all hashes with a date inside the date range and then adds up all the values of the :amount key.
size counts the number of hashes that are in the date range.
total divides the sum by the size.
How can I combine those so it's not 3 separate items?
I think it's as simple as:
selected = array_of_hashes.select { ... }
average = selected.map { ... }.sum / selected.size
Note: using include? with ranges of dates is pretty inefficient, since it has to traverse the whole date range; I suggest using cover? instead.
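For illustration (a sketch; the range and date are made up): include? walks a Date range element by element via succ, while cover? only compares the endpoints:
require 'date'

date_range = Date.new(2023, 1, 1)..Date.new(2023, 12, 31)
d = Date.parse("2023-06-15")

date_range.include?(d) #=> true, but iterates day by day
date_range.cover?(d)   #=> true, just two endpoint comparisons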
There is really no nicer, more compact way of doing this. One alternative could be the following:
average = (selected = array_of_hashes.select { ... }.map { ... }).sum / selected.size.to_f
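Putting the pieces together (a sketch assuming the same array_of_hashes and date_range; fdiv avoids integer division and the empty? guard avoids dividing by zero):
amounts = array_of_hashes
  .select { |h| date_range.cover?(Date.parse(h[:created])) }
  .map { |h| h[:amount] }

average = amounts.empty? ? 0 : amounts.sum.fdiv(amounts.size)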

How to get rows as Arrays (not Hashes) in Sequel ORM?

In the Sequel ORM for Ruby, the Dataset class has an all method which produces an Array of row hashes: each row is a Hash with column names as keys.
For example, given a table T:
a    b    c
----------------
0    22   "Abe"
1    35   "Betty"
2    58   "Chris"
then:
ds = DB['select a, b, c from T']
ah = ds.all # Array of row Hashes
should produce:
[{"a":0,"b":22,"c":"Abe"},{"a":1,"b":35,"c":"Betty"},{"a":2,"b":58,"c":"Chris"}]
Is there a way built in to Sequel to instead produce an Array of row Arrays, where each row is an array of only the values in each row in the order specified in the query? Sort of how select_rows works in ActiveRecord? Something like this:
aa = ds.rows # Array of row Arrays
which would produce:
[[0,22,"Abe"],[1,35,"Betty"],[2,58,"Chris"]]
Note: the expression:
aa = ds.map { |h| h.values }
produces an array of arrays, but the order of values in the rows is NOT guaranteed to match the order requested in the original query. In this example, aa might look like:
[["Abe",0,22],["Betty",1,35],["Chris",2,58]]
Old versions of Sequel (pre 2.0) had the ability in some adapters to return arrays instead of hashes. But it caused numerous issues, nobody used it, and I didn't want to maintain it, so it was removed. If you really want arrays, you need to drop down to the connection level and use a connection-specific method:
DB.synchronize do |conn|
  rows = conn.exec('SQL Here') # Hypothetical example code
end
The actual code you need will depend on the adapter you are using.
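For example, with the postgres adapter backed by the pg gem, something along these lines should work (a sketch; PG::Result#values returns each row as an array, in the column order of the query):
aa = DB.synchronize do |conn|
  conn.exec('select a, b, c from T').values
end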
For a single column, select_map returns a flat array of values:
DB[:table].select_map(:id)
If you want just an array of arrays of values...
DB['select * from T'].map { |h| h.values }
seems to work
UPDATE: given the updated requirement that the column order match the query order...
cols = [:a, :c, :b]
DB[:T].select(*cols).collect { |h| cols.collect { |c| h[c] } }
Not very pretty, but the order is guaranteed to be the same as the select order.
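Depending on your Sequel version, select_map can also take an array of columns and return an array of arrays, which may be the closest thing to a built-in (worth verifying against the docs for your version):
aa = DB[:T].select_map([:a, :b, :c])
#=> [[0, 22, "Abe"], [1, 35, "Betty"], [2, 58, "Chris"]]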
There does not appear to be a built-in way to do this.
You could make a request for the feature.
I haven't yet found a built-in method to return an array of row arrays where the values in the row arrays are ordered by the column order in the original query. The following function does*, although I suspect an internal method could be more efficient:
def rows(ds)
  ret = []
  column_keys = ds.columns # guaranteed to match query order?
  ds.all do |row_hash|
    ret << column_keys.map { |column_key| row_hash[column_key] }
  end
  ret
end
*This function depends on the order of the array returned by Dataset.columns. If this order is undefined, then this rows function isn't very useful.
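With the example table above, a call like this would be expected to produce the desired shape:
aa = rows(DB['select a, b, c from T'])
#=> [[0, 22, "Abe"], [1, 35, "Betty"], [2, 58, "Chris"]]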
Have you tried this?
ds = DB['select a, b, c from T'].to_a
Not sure if it works, but give it a shot.

increment value in a hash

I have a bunch of posts which have category tags in them.
I am trying to find out how many times each category has been used.
I'm using Rails with MongoDB, but I don't think I need to get the occurrence of categories from the db, so the Mongo part shouldn't matter.
This is what I have so far
@recent_posts = current_user.recent_posts # returns the 10 most recent posts
@categories_hash = {'tech' => 0, 'world' => 0, 'entertainment' => 0, 'sports' => 0}
@recent_posts do |cat|
  cat.categories.each do |addCat|
    @categories_hash.increment(addCat) # obviously this is where I'm having problems
  end
end
the structure of the post is
{"_id" : ObjectId("idnumber"), "created_at" : "Tue Aug 03...", "categories" :["world", "sports"], "message" : "the text of the post", "poster_id" : ObjectId("idOfUserPoster"), "voters" : []}
I'm open to suggestions on how else to get the count of categories, but I will want to get the count of voters eventually, so it seems to me the best way is to increment categories_hash and then add voters.length. One thing at a time, though: I'm just trying to figure out how to increment values in the hash.
If you aren't familiar with map/reduce and you don't care about scaling up, this is not as elegant as map/reduce, but should be sufficient for small sites:
@categories_hash = Hash.new(0)
current_user.recent_posts.each do |post|
  post.categories.each do |category|
    @categories_hash[category] += 1
  end
end
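On Ruby 2.7+, the same counting fits in one line with Enumerable#tally (a sketch, assuming posts respond to categories as above):
@categories_hash = current_user.recent_posts.flat_map(&:categories).tally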
If you're using MongoDB, an elegant way to aggregate tag usage would be to use a map/reduce operation. MongoDB supports map/reduce operations written in JavaScript. Map/reduce runs on the db server(s), i.e. your application does not have to retrieve and analyze every document (which wouldn't scale well for large collections).
As an example, here are the map and reduce functions I use in my blog on the articles collection to aggregate the usage of tags (used to build the tag cloud in the sidebar). Documents in the articles collection have a key named 'tags', which holds an array of strings (the tags).
The map function simply emits 1 on every used tag to count it:
function () {
  if (this.tags) {
    this.tags.forEach(function (tag) {
      emit(tag, 1);
    });
  }
}
The reduce function sums up the counts:
function (key, values) {
  var total = 0;
  values.forEach(function (v) {
    total += v;
  });
  return total;
}
As a result, the database returns a hash that has a key for every tag and its usage count as a value. E.g.:
{ 'rails' => 5, 'ruby' => 12, 'linux' => 3 }
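For reference, with the map and reduce functions above assigned to variables, they could be invoked from the mongo shell along these lines (a sketch; out: { inline: 1 } returns the result inline instead of writing it to a collection):
var result = db.articles.mapReduce(map, reduce, { out: { inline: 1 } });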

Lua - Sorting a table alphabetically

I have a table that is filled with random content that a user enters. I want my users to be able to rapidly search through this table, and one way of facilitating their search is by sorting the table alphabetically. Originally, the table looked something like this:
myTable = {
  Zebra = "black and white",
  Apple = "I love them!",
  Coin = "25cents"
}
I was able to implement a pairsByKeys() function which allowed me to output the table's contents in alphabetical order, but not to store them that way. Because of the way the searching is set up, the table itself needs to be in alphabetical order.
function pairsByKeys (t, f)
  local a = {}
  for n in pairs(t) do
    table.insert(a, n)
  end
  table.sort(a, f)
  local i = 0              -- iterator variable
  local iter = function () -- iterator function
    i = i + 1
    if a[i] == nil then
      return nil
    else
      return a[i], t[a[i]]
    end
  end
  return iter
end
After a time I came to understand (perhaps incorrectly - you tell me) that non-numerically indexed tables cannot be sorted alphabetically. So then I started thinking of ways around that - one way I thought of is sorting the table and then putting each value into a numerically indexed array, something like below:
myTable = {
  [1] = { Apple = "I love them!" },
  [2] = { Coin = "25cents" },
  [3] = { Zebra = "black and white" },
}
In principle, I feel this should work, but for some reason I am having difficulty with it. My table does not appear to be sorting. Here is the function I use, with the above function, to sort the table:
SortFunc = function ()
  local newtbl = {}
  local t = {}
  for title, value in pairsByKeys(myTable) do
    newtbl[title] = value
    tinsert(t, newtbl[title])
  end
  myTable = t
end
myTable still does not end up being sorted. Why?
Lua tables can be hybrid: for numerical keys starting at 1 they use a vector part, and for all other keys they use a hash part.
For example, in {[1] = "foo", [2] = "bar", [4] = "hey", my = "name"}, the keys 1 and 2 will be placed in the vector part, while 4 and my will be placed in the hash part: 4 breaks the sequence, which is why it ends up in the hash part.
For information on how to sort Lua's table take a look here: 19.3 - Sort
Your new table needs consecutive integer keys and needs values themselves to be tables. So you want something on this order:
SortFunc = function (myTable)
  local t = {}
  for title, value in pairsByKeys(myTable) do
    table.insert(t, { title = title, value = value })
  end
  myTable = t
  return myTable
end
This assumes that pairsByKeys does what I think it does...
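If that works, consuming the sorted result is then a plain ipairs loop (a sketch, assuming the SortFunc above):
local sorted = SortFunc(myTable)
for i, entry in ipairs(sorted) do
  print(i, entry.title, entry.value)
end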
