How to improve enumerable find in ruby? - ruby

I am making a request to a webservice API and I have several ids pointing to a location.
In order to re assign the results from the API which contains only the id (sid).
I have the following code:
Location.all.each do |city|
accommodations = Accommodation.within_distance(city.lonlat, [1, 2])
lookups << LookUp.new(city.id, accommodations.select(:supplier,:sid).to_a.map(&:serializable_hash))
end
after the webservice call I try re assigning results ids (sid's) to cities:
results = call_to_api
lookups.each do | lup|
res << {:city=> lup.city, :accommodations => lup.accommodations.map{ |j|
results.find { |i|
i.sid == j['sid']
}
}
}
end
The lookups iteration in incredibly slow and takes up to 50s for just 4000 entries.
So how can I improve from a performance point of view?

Imagine you have three lookups that all have accomodations A, B, and C.
The way it is done now, the first lookup will perform the map and search for A, B, and C.
the second lookup will perform the map and search for A, B, and C.
And so on. Given the basic nature of the search criteria, it doesn't look like the results for accomodation A is really going to change between different lookups in the same collection.
In that case I would consider caching the results of each sid search and if you ever have an accomodation with the same sid, just pull it from the cache.
For example, something like
cache = {}
if cache.include?(yourSID)
// use cache[yourSID]
else
mappings = //doYourMappingHere
// cache it for future use. Might need to dup
cache[yourSID] = mappings
end
Of course this is under the assumption that the same accomodation appears several times.

so here comes the boost
ids = Rails.cache.fetch(:ids) {
ids = {}
Location.all.each do |city|
Accommodation.within_distance(city.lonlat, [1, 2]).each do |acc|
if acc.supplier == 2
if ids.include? acc.id
ids[acc.sid] << city.attributes
else
ids[acc.sid] = [city.attributes]
end
end
end
end
ids
}
results = Rails.cache.fetch(:results) {
results = api.rates_by_ids(ids.keys)
}
p results.size
accommodations_in_cities={}
results.each do |res|
ids[res.sid].each do |city|
if accommodations_in_cities.include? city['id']
accommodations_in_cities[city['id']] << res
else
accommodations_in_cities[city['id']] = [res]
end
end
end
accommodations_in_cities
end

Related

Iterating over big arrays with limited memory and time of execution

I’m having trouble using Ruby to pass some tests that make the array too big and return an error.
Solution.rb: failed to allocate memory (NoMemoryError)
I have failed to pass it twice.
The problem is about scheduling meetings. The method receives two parameters in order: a matrix with all the first days that investors can meet in the company, and a matrix with all the last days.
For example:
firstDay = [1,5,10]
lastDay = [4,10,10]
This shows that the first investor will be able to find himself between the days 1..4, the second between the days 5..10 and the last one in 10..10.
I need to return the largest number of investors that the company will serve. In this case, all of them can be attended to, the first one on day 1, the second one on day 5, and the last one on day 10.
So far, the code works normally, but with some hidden tests with at least 1000 investors, the error I mentioned earlier appears.
Is there a best practice in Ruby to handle this?
My current code is:
def countMeetings(firstDay, lastDay)
GC::Profiler.enable
GC::Profiler.clear
first = firstDay.sort.first
last = lastDay.sort.last
available = []
#Construct the available days for meetings
firstDay.each_with_index do |d, i|
available.push((firstDay[i]..lastDay[i]).to_a)
end
available = available.flatten.uniq.sort
investors = {}
attended_day = []
attended_investor = []
#Construct a list of investor based in their first and last days
firstDay.each_index do |i|
investors[i+1] = (firstDay[i]..lastDay[i]).to_a
end
for day in available
investors.each do |key, value|
next if attended_investor.include?(key)
if value.include?(day)
next if attended_day.include?(day)
attended_day.push(day)
attended_investor.push(key)
end
end
end
attended_investor.size
end
Using Lazy as far as I could understand, I escaped the MemoryError, but I started receiving a runtime error:
Your code was not executed on time. Allowed time: 10s
And my code look like this:
def countMeetings(firstDay, lastDay)
loop_size = firstDay.size
first = firstDay.sort.first
last = lastDay.sort.last
daily_attendance = {}
(first..last).each do |day|
for ind in 0...loop_size
(firstDay[ind]..lastDay[ind]).lazy.each do |investor_day|
next if daily_attendance.has_value?(ind)
if investor_day == day
daily_attendance[day] = ind
end
end
end
end
daily_attendance.size
end
And it went through the cases with few investors. I thought about using multi-thread and the code became the following:
def countMeetings(firstDay, lastDay)
loop_size = firstDay.size
first = firstDay.sort.first
last = lastDay.sort.last
threads = []
daily_attendance = {}
(first..last).lazy.each_slice(25000) do |slice|
slice.each do |day|
threads << Thread.new do
for ind in 0...loop_size
(firstDay[ind]..lastDay[ind]).lazy.each do |investor_day|
next if daily_attendance.has_value?(ind)
if investor_day == day
daily_attendance[day] = ind
end
end
end
end
end
end
threads.each{|t| t.join}
daily_attendance.size
end
Unfortunately, it went back to the MemoryError.
This can be done without consuming any more memory than the range of days. The key is to avoid Arrays and keep things as Enumerators as much as possible.
First, rather than the awkward pair of Arrays that need to be converted into Ranges, pass in an Enumerable of Ranges. This both simplifies the method, and it allows it to be Lazy if the list of ranges is very large. It could be read from a file, fetched from a database or an API, or generated by another lazy enumerator. This saves you from requiring big arrays.
Here's an example using an Array of Ranges.
p count_meetings([(1..4), (5..10), (10..10)])
Or to demonstrate transforming your firstDay and lastDay Arrays into a lazy Enumerable of Ranges...
firstDays = [1,5,10]
lastDays = [4,10,10]
p count_meetings(
firstDays.lazy.zip(lastDays).map { |first,last|
(first..last)
}
)
firstDays.lazy makes everything that comes after lazy. .zip(lastDays) iterates through both Arrays in pairs: [1,4], [5,10], and [10,10]. Then we turn them into Ranges. Because it's lazy it will only map them as needed. This avoids making another big Array.
Now that's fixed, all we need to do is iterate over each Range and increment their attendance for the day.
def count_meetings(attendee_ranges)
# Make a Hash whose default values are 0.
daily_attendance = Hash.new(0)
# For each attendee
attendee_ranges.each { |range|
# For each day they will attend, add one to the attendance for that day.
range.each { |day| daily_attendance[day] += 1 }
}
# Get the day/attendance pair with the maximum value, and only return the value.
daily_attendance.max[1]
end
Memory growth is limited to how big the day range is. If the earliest attendee is on day 1 and the last is on day 1000 daily_attendance is just 1000 entries which is a long time for a conference.
And since you've built the whole Hash anyway, why waste it? Write one function that returns the full attendance, and another that extracts the max.
def count_meeting_attendance(attendee_ranges)
daily_attendance = Hash.new(0)
attendee_ranges.each { |range|
range.each { |day| daily_attendance[day] += 1 }
}
return daily_attendance
end
def max_meeting_attendance(*args)
count_meeting_attendance(*args).max[1]
end
Since this is an exercise and you're stuck with the wonky arguments, we can do the same trick and lazily zip firstDays and lastDays together and turn them into Ranges.
def count_meeting_attendance(firstDays, lastDays)
attendee_ranges = firstDays.lazy.zip(lastDays).map { |first,last|
(first..last)
}
daily_attendance = Hash.new(0)
attendee_ranges.each { |range|
range.each { |day| daily_attendance[day] += 1 }
}
return daily_attendance
end

How can I collect block values in Ruby without an iterator?

Here's some code:
i = 0
collection = []
loop do
i += 1
break if complicated_predicate_of(i)
collection << i
end
collection
I don't know in advance how many times I'll need to iterate; that depends on complicated_predicate_of(i). I could do something like 0.upto(Float::INFINITY).times.collect do ... end but that's pretty ugly.
I'd like to do this:
i = 0
collection = loop.collect do
i += 1
break if complicated_predicate_of(i)
i
end
But, though it's not a syntax error for some reason, loop.collect doesn't seem to collect anything. (Neither does loop.reduce). collection is nil at the end of the statement.
In other words, I want to collect the values of a loop statement without an explicit iterator. Is there some way to achieve this?
You could write
collection = 1.step.take_while do |i|
i <= 3 # this block needs to return *false* to stop the taking
end
Whatever solution you choose in the end, remember that you can always opt to introduce a helper method with a self-explanatory name. Especially if you need to collect numbers like this in many places in your source code.
Say you wanted to hide the intricate bowels of your solution above, then this could be your helper method:
def numbers_until(&block)
i = 0
collection = []
loop do
i += 1
break if yield i
collection << i
end
collection
end
collection = numbers_until do |i|
i > 3 # this block needs to return *true* to stop the taking
end
You could write
def complicated_predicate_of(i)
i > 3
end
1.step.with_object([]) { |i,collection| complicated_predicate_of(i) ?
(break collection) : collection << i }
#=> [1, 2, 3]

Trying to create nested loops dynamically in Ruby

I currently have the following method:
def generate_lineups(max_salary)
player_combos_by_position = calc_position_combinations
lineups = []
player_combos_by_position[:qb].each do |qb_set|
unless salary_of(qb_set) > max_salary
player_combos_by_position[:rb].each do |rb_set|
unless salary_of(qb_set, rb_set) > max_salary
lineups << create_team_from_sets(qb_set, rb_set)
end
end
end
end
return lineups
end
player_combos_by_position is a hash that contains groupings of players keyed by position:
{ qb: [[player1, player2], [player6, player7]], rb: [[player3, player4, player5], [player8, player9, player10]] }
salary_of() takes the sets of players and calculates their total salary.
create_team_from_sets() takes sets of players and returns a new Team of the players
Ideally I want to remove the hardcoded nested loops as I do not know which positions will be available. I think recursion is the answer, but I'm having a hard time wrapping my head around the solution. Any ideas would be greatly appreciated.
Some answers have recommended the use of Array#product. This is normally an elegant solution however I'm dealing with very large sets of data (there's about 161,000 combinations of WRs and about 5000 combinations of RBs to form together alone). In my loops I use the unless salary_of(qb_set, rb_set) > max_salary check to avoid making unnecessary calculations as this weeds out quite a few. I cannot do this using Array#product and therefore the combinations take very long times to put together. I'm looking for away to rule out combinations early and save on computer cycles.
You can use Array#product to get all the possible lineups and then select the ones that are within budget. This allows for variable number of positions.
first_pos, *rest = player_combos_by_position.values
all_lineups = first_pos.product(*rest)
#=> all possible lineups
lineups = all_lineups.
# select lineups within budget
select{|l| salary_of(*l) <= max_salary}.
# create teams from selected lineups
map{|l| create_team_from_sets(*l) }
Other option: Recursive Method (not tested but should get you started)
def generate_lineups(player_groups,max_salary)
first, *rest = player_groups
lineups = []
first.each do |player_group|
next if salary_of(player_group) > max_salary
if rest.blank?
lineups << player_group
else
generate_lineups(rest,max_salary).each do |lineup|
new_lineup = create_team_from_sets(player_group, *lineup)
lineups << new_lineup unless salary_of(*new_lineup) > max_salary
end
end
end
return lineups
end
Usage:
lineups = generate_lineups(player_combos_by_position.values,max_salary)
After reading your edit, I see your problem. Here I've modified my code to show you how you could impose a salary limit for each combination for each position group, as well as for the entire team. Does this help? You may want to consider putting your data in a database and using Rails.
team_max_salary = 300
players = {player1: {position: :qb, salary: 15, rating: 9}, player2: {postion: :rb, salary: 6, rating: 6},...}
group_info = {qb: {nplayers: 2, max_salary: 50}, rb: {nplayers: 2, max_salary: 50}, ... }
groups = group_info.keys
players_by_group = {}
groups.each {|g| players_by_group[g] = []}
players.each {|p| players_by_group[p.position] << p}
combinations_for_team = []
groups.each do |g|
combinations_by_group = players_by_group[g].combinations(group_info[g][:nplayers]).select {|c| salary(c) <= group_info[g][:max_salary]}
# Possibly employ other criteria here to further trim combinations_by_group
combinations_for_team = combinations_for_team.product(combinations_by_group).flatten(1).select {|c| salary(c) <= team_max_salary}
end
I may be missing a flatten(1). Note I've made the player keys symbols (e.g., :AaronRogers`), but you could of course use strings instead.

Subtraction of two arrays with incremental indexes of the other array to a maximum limit

I have lots of math to do on lots of data but it's all based on a few base templates. So instead of say, when doing math between 2 arrays I do this:
results = [a[0]-b[1],a[1]-b[2],a[2]-b[3]]
I want to instead just put the base template: a[0]-b[1] and make it automatically fill say 50 places in the results array. So I don't always have to manually type it.
What would be the ways to do that? And would a good way be to create 1 method that does this automatically. And I just tell it the math and it fills out an array?
I have no clue, I'm really new to programming.
a = [2,3,4]
b = [1,2,3,4]
results = a.zip(b.drop(1)).take(50).map { |v,w| v - w }
Custom
a = [2,3,4..............,1000]
b = [1,2,3,4,.............900]
class Array
def self.calculate_difference(arr1,arr2,limit)
begin
result ||= Array.new
limit.send(:times) {|index| result << arr1[index]-arr2[index+=1]}
result
rescue
raise "Index/Limit Error"
end
end
end
Call by:
Array.calculate_difference(a,b,50)

How to rewrite this Ruby loop in a cleaner fashion

I'm implementing a loop in Ruby, but it looks ugly and I wonder if there's a neater, more Ruby-like way of writing it:
def get_all_items
items = []; page = 1; page_items = nil
while page_items != [] # Loop runs until no more items are received
items += (page_items = get_page_items(page))
page += 1
end
items
end
Note that the get_page_items method runs a HTTP request to get the items for the page, and there is no way of knowing the number of pages, or the total number of items, or the number of items for any page before actually executing the requests in order until one of them returns an empty item set.
Imagine leafing through a catalog and writing down all the products, without knowing in advance how many pages it has, or how many products there are.
I think that this particular problem is compounded because A) there's no API for getting the total number of items and B) the response from get_page_items is always truthy. Further, it doesn't make sense for you to iteratively call a method that is surely making individual requests to your DB with an arbitrary limit, only to concatenate them together. You should, at the risk of repeating yourself, implement this method to prompt a DB query (i.e. model.all).
Normally when you are defining an empty collection, iterating and transforming a set, and then returning a result, you should be using reduce (a.k.a inject):
array.reduce(0) { |result, item| result + item } # a quick sum
Your need to do a form of streaming in this same process makes this difficult without tapping into Enumerable. I find this to be a good compromise that is much more readable, even if a bit distasteful in fondling this items variable too much:
items = []
begin
items << page_items = get_page_items(page ||= 1)
page += 1
end until page_items.empty?
items.flatten
Here's how I'd have written it. You'll see it's actually more lines, but it's easier to read and more Rubyish.
def get_all_items
items = []
page = 1
page_items = get_page_items page
until page_items.empty? # Loop runs until no more items are received
items += page_items
page += 1
page_items = get_page_items page
end
items
end
You could also implement get_page_items as an Enumerator which would eliminate the awkward page += 1 pattern but that might be overkill.
I don't know that this is any better, but it does have a couple of Ruby-isms in it:
def get_all_items
items = []; n = 0; page = 1
while items.push(*get_page_items(page)).length > n
page += 1
n = items.length
end
end
I would use this solution, which is a good compromise between readability and length:
def get_all_items
[].tap do |items|
page = 0
until (page_items = get_page_items(page)).empty?
items << page_items
page += 1
end
end
end
The short version, just for fun ;-)
i=[]; p=0; loop { i+=get_page_items(p+=1).tap { |r| return i if r.empty? } }
I wanted to write a functional solution which would closely resemble the task you want to achieve.
I'd say that your solution comes down to this:
For all page numbers from 1 on, you get the corresponding list of
items; Take lists while they are not empty, and join them into a
single array.
Sounds ok?
Now let's try to translate this, almost literally, to Ruby:
(1..Float::INFINITY). # For all page numbers from 1 on
map{|page| get_page_items page}. # get the corresponding list of items
take_while{|items| !items.empty?}. # Take lists while they are not empty
inject(&:+) # and join them into a single array.
Unfortunately, the above code won't work right away, as Ruby's map is not lazy, i.e. it would try to evaluate on all members of the infinite range first, before our take_while had the chance to peek at the values.
However, implementing a lazy map is not that hard at all, and it could be useful for other stuff. Here's one straightforward implementation, along with nice examples in the blog post.
module Enumerable
def lazy_map
Enumerator.new do |yielder|
self.each do |value|
yielder.yield(yield value)
end
end
end
end
Along with a mockup of your actual HTTP call, which returns arrays of random length between 0 and 4:
# This simulates actual HTTP call, sometimes returning an empty array
def get_page_items page
(1..rand(5)).to_a
end
Now we have all the needed parts to solve our problem easily:
(1..Float::INFINITY). # For all page numbers from 1 on
lazy_map{|page| get_page_items page}. # get the corresponding list of items
take_while{|items| !items.empty?}. # Take lists while they are not empty
inject(&:+) # and join them into a single array.
#=> [1, 1, 2, 3, 1]
It's a small (and almost entirely cosmetic) tweak, but one option would be to replace while page_items != [] with until page_items.empty?. It's a little more "Ruby-ish," in my opinion, which is what you're asking about.
def get_all_items
items = []; page = 0
items << page_items while (page_items = get_page_items(page += 1))
items
end

Resources