Iterating over big arrays with limited memory and time of execution - ruby

I'm having trouble getting a Ruby solution to pass some hidden tests: the arrays grow too large and the run fails with this error:
Solution.rb: failed to allocate memory (NoMemoryError)
I have failed to pass it twice.
The problem is about scheduling meetings. The method receives two parameters, in order: an array with the first day each investor can meet at the company, and an array with the last day each investor can meet.
For example:
firstDay = [1,5,10]
lastDay = [4,10,10]
This means the first investor is available on days 1..4, the second on days 5..10, and the last one only on day 10.
I need to return the largest number of investors that the company can meet. In this case, all of them can be seen: the first one on day 1, the second one on day 5, and the last one on day 10.
So far, the code works normally, but with some hidden tests with at least 1000 investors, the error I mentioned earlier appears.
Is there a best practice in Ruby to handle this?
My current code is:
def countMeetings(firstDay, lastDay)
  GC::Profiler.enable
  GC::Profiler.clear
  first = firstDay.sort.first
  last = lastDay.sort.last
  available = []
  # Construct the available days for meetings
  firstDay.each_with_index do |d, i|
    available.push((firstDay[i]..lastDay[i]).to_a)
  end
  available = available.flatten.uniq.sort
  investors = {}
  attended_day = []
  attended_investor = []
  # Construct a list of investors based on their first and last days
  firstDay.each_index do |i|
    investors[i+1] = (firstDay[i]..lastDay[i]).to_a
  end
  for day in available
    investors.each do |key, value|
      next if attended_investor.include?(key)
      if value.include?(day)
        next if attended_day.include?(day)
        attended_day.push(day)
        attended_investor.push(key)
      end
    end
  end
  attended_investor.size
end
Using lazy enumerators, as far as I understood them, I got past the NoMemoryError, but then I started hitting the time limit:
Your code was not executed on time. Allowed time: 10s
And my code looks like this:
def countMeetings(firstDay, lastDay)
  loop_size = firstDay.size
  first = firstDay.sort.first
  last = lastDay.sort.last
  daily_attendance = {}
  (first..last).each do |day|
    for ind in 0...loop_size
      (firstDay[ind]..lastDay[ind]).lazy.each do |investor_day|
        next if daily_attendance.has_value?(ind)
        if investor_day == day
          daily_attendance[day] = ind
        end
      end
    end
  end
  daily_attendance.size
end
It passed the cases with few investors. I then tried multithreading, and the code became the following:
def countMeetings(firstDay, lastDay)
  loop_size = firstDay.size
  first = firstDay.sort.first
  last = lastDay.sort.last
  threads = []
  daily_attendance = {}
  (first..last).lazy.each_slice(25000) do |slice|
    slice.each do |day|
      threads << Thread.new do
        for ind in 0...loop_size
          (firstDay[ind]..lastDay[ind]).lazy.each do |investor_day|
            next if daily_attendance.has_value?(ind)
            if investor_day == day
              daily_attendance[day] = ind
            end
          end
        end
      end
    end
  end
  threads.each{|t| t.join}
  daily_attendance.size
end
Unfortunately, it went back to the MemoryError.

This can be done without consuming any more memory than the range of days. The key is to avoid Arrays and keep things as Enumerators as much as possible.
First, rather than the awkward pair of Arrays that need to be converted into Ranges, pass in an Enumerable of Ranges. This both simplifies the method, and it allows it to be Lazy if the list of ranges is very large. It could be read from a file, fetched from a database or an API, or generated by another lazy enumerator. This saves you from requiring big arrays.
Here's an example using an Array of Ranges.
p count_meetings([(1..4), (5..10), (10..10)])
Or to demonstrate transforming your firstDay and lastDay Arrays into a lazy Enumerable of Ranges...
firstDays = [1,5,10]
lastDays = [4,10,10]
p count_meetings(
  firstDays.lazy.zip(lastDays).map { |first,last|
    (first..last)
  }
)
firstDays.lazy makes everything that comes after lazy. .zip(lastDays) iterates through both Arrays in pairs: [1,4], [5,10], and [10,10]. Then we turn them into Ranges. Because it's lazy it will only map them as needed. This avoids making another big Array.
Now that's fixed, all we need to do is iterate over each Range and increment their attendance for the day.
def count_meetings(attendee_ranges)
  # Make a Hash whose default values are 0.
  daily_attendance = Hash.new(0)
  # For each attendee
  attendee_ranges.each { |range|
    # For each day they will attend, add one to the attendance for that day.
    range.each { |day| daily_attendance[day] += 1 }
  }
  # Return the highest attendance recorded for any single day.
  # (Hash#max would compare the [day, attendance] pairs by day first.)
  daily_attendance.values.max
end
Memory growth is limited by the size of the day range. If the earliest attendee starts on day 1 and the last one finishes on day 1000, daily_attendance holds just 1000 entries, and that is already a long time for a conference.
And since you've built the whole Hash anyway, why waste it? Write one function that returns the full attendance, and another that extracts the max.
def count_meeting_attendance(attendee_ranges)
  daily_attendance = Hash.new(0)
  attendee_ranges.each { |range|
    range.each { |day| daily_attendance[day] += 1 }
  }
  return daily_attendance
end
def max_meeting_attendance(*args)
  # values.max picks the largest count; Hash#max would pick the largest day.
  count_meeting_attendance(*args).values.max
end
Since this is an exercise and you're stuck with the wonky arguments, we can do the same trick and lazily zip firstDays and lastDays together and turn them into Ranges.
def count_meeting_attendance(firstDays, lastDays)
  attendee_ranges = firstDays.lazy.zip(lastDays).map { |first,last|
    (first..last)
  }
  daily_attendance = Hash.new(0)
  attendee_ranges.each { |range|
    range.each { |day| daily_attendance[day] += 1 }
  }
  return daily_attendance
end
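Note that the counting above reports the busiest day. If the exercise wants what the question describes, the largest number of investors who can each be given a day of their own, a greedy assignment is one option. The sketch below is not part of the original answer: it assumes at most one meeting per day, sorts investors by last day, and gives each one the earliest free day in their window. The name count_meetings_greedy and the next_free Hash are illustrative.

```ruby
# Greedy sketch: assumes one meeting per day (an assumption, not stated in the answer).
def count_meetings_greedy(first_days, last_days)
  # next_free[day] points to a later candidate day once `day` has been taken.
  next_free = {}

  # Find the earliest free day >= day, compressing the chain as we go.
  find = lambda do |day|
    root = day
    root = next_free[root] while next_free.key?(root)
    while next_free.key?(day)
      following = next_free[day]
      next_free[day] = root
      day = following
    end
    root
  end

  served = 0
  # Investors whose window closes first are scheduled first.
  first_days.zip(last_days).sort_by { |first, last| [last, first] }.each do |first, last|
    day = find.call(first)
    next if day > last          # no free day left inside this window
    next_free[day] = day + 1    # mark the day as taken
    served += 1
  end
  served
end

p count_meetings_greedy([1, 5, 10], [4, 10, 10])
```

Only one Hash entry is stored per scheduled meeting, so the ranges are never expanded into arrays, and the work is dominated by the sort.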

Related

How to map one array to two in Ruby and perform some function if a condition is met

I'm trying to improve the readability of a piece of code and also make it more concise if possible.
I have an array that needs to be iterated over. If an item matches some criteria I want to collect it, and I also need to do some other work as I iterate, such as updating the balance.
need_bananas = []
need_apples = []
balance = 10
array.each do |item|
  if need_bananas?(item)
    need_bananas << item
  elsif need_apples?(item)
    need_apples << item
  end
  balance -= item.amount
end
def need_bananas?(item)
  balance >= item.amount
end
def need_apples?(item)
  balance < item.amount
end
This feels too cumbersome, and there must be a way to make it more concise. I have thought about using reduce or partition, but I can't settle on a nice solution.
Thanks in advance.
Would something like this work for you?
balance = 10
need_bananas, need_apples = array.partition do |item|
  (balance -= item.amount) >= 0
end
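To see why this matches the original loop: checking balance >= item.amount before subtracting is the same as checking that the balance is still non-negative after subtracting. Here is a small runnable sketch; the Item struct, the split_by_balance wrapper, and the sample amounts are made up for illustration.

```ruby
# Hypothetical item type standing in for whatever objects the array holds.
Item = Struct.new(:name, :amount)

# Returns [items that fit the balance, items that do not, final balance].
def split_by_balance(items, balance)
  need_bananas, need_apples = items.partition do |item|
    # Subtract first, then test: true sends the item to the first array.
    (balance -= item.amount) >= 0
  end
  [need_bananas, need_apples, balance]
end

items = [Item.new("a", 4), Item.new("b", 5), Item.new("c", 3), Item.new("d", 1)]
bananas, apples, remaining = split_by_balance(items, 10)
p bananas.map(&:name)
p apples.map(&:name)
p remaining
```

As in the original loop, the balance is reduced for every item, including those that end up in need_apples.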

Simple Ruby Rate Limiting

I am trying to build a very simple rate limit algorithm using an array.
Let's use the following rate limit as an example: "5 requests every 5 minutes".
I have an array that stores a list of timestamps (where each element is a Time.now) and is added to the array when an API gets called (assuming it's under the rate limit)
I also used a Mutex here so different threads can both share the timestamp resource as well as ensuring there's no race condition happening.
However, I'd like this array to be self-cleaning of sorts. If there are 5 (or more) elements in the array AND one or more of them are outside the 5 minute interval, those entries should be removed automatically.
This is where I am stuck.
I have the following code:
def initialize(max, interval)
  @max, @interval = max, interval
  @m = Mutex.new
  @timestamp = []
end
def validate_rate
  @m.synchronize do
    if @timestamp.count > @max && self.is_first_ts_expired
      @timestamp.shift
      if self.rate_count < @max
        @timestamp << Time.now
        return false
      else
        return true
      end
    end
  end
end
def is_first_ts_expired
  return false if @@timestamp[@name].first.nil? # no logged entries = no expired timestamps
  return @@timestamp[@name].first <= Time.now - @interval
end
# Gets the number of requests that are under the allowed interval
def rate_count
  count = 0
  @timestamp.each { |x|
    if x >= Time.now - @interval
      count += 1
    end
  }
  count
end
The following is how you will call this simple class. rl.validate_rate will return true if it's under the rate limit, but false if it's above. And ideally it will self-clean the timestamp array when it's greater than the max variable.
rl = RateLimit.new(5, 5.minutes)
raise RateLimitException unless rl.validate_rate do
# stuff
end
I am curious whether the "clean up" is_first_ts_expired check is called in the right place.
I think this is a totally valid approach.
Two quick notes:
1) It seems like you're only allowing insertion into the array when there are fewer than the max number of elements:
if rate_count < @max
  @timestamp << Time.now
  return true
else
  return false
end
However, you're also only clearing out expired elements when there are more than the allowed number of elements in the array:
if @timestamp.count > @max && is_first_ts_expired
  @timestamp.shift
I think in order to get this working, you want to remove that first condition when you are checking if you should clear elements from the array. It will look something like this:
if is_first_ts_expired
  @timestamp.shift
2) You will only ever clean one item out of your array here:
if is_first_ts_expired
  @timestamp.shift
To make this solution more robust, you may want to replace the if with a while so you can clean out multiple expired items. For example:
while is_first_ts_expired do
  @timestamp.shift
end
Updated based on comment below:
Since you'll potentially be going through all of the timestamps if the timestamps are all expired, you'll want to slightly modify is_first_ts_expired to handle an empty timestamp array. Something like this:
def is_first_ts_expired
  current_ts = @timestamp.first
  current_ts && current_ts <= Time.now - @interval
end
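Putting both notes together, a self-contained version might look like the sketch below. It is a rewrite under stated assumptions, not the asker's class: the clock: keyword is a hypothetical addition so the example can be exercised without waiting, interval is given in plain seconds (5.minutes needs ActiveSupport), and validate_rate returns true when the call is allowed, as the question's description says it should.

```ruby
class RateLimit
  # clock: is a hypothetical hook for testing; it defaults to Time.now.
  def initialize(max, interval, clock: -> { Time.now })
    @max = max
    @interval = interval
    @clock = clock
    @mutex = Mutex.new
    @timestamps = []
  end

  # Returns true and records the call when under the limit, false otherwise.
  def validate_rate
    @mutex.synchronize do
      now = @clock.call
      # Drop every expired entry, not just the first one.
      @timestamps.shift while @timestamps.first && @timestamps.first <= now - @interval
      return false if @timestamps.size >= @max
      @timestamps << now
      true
    end
  end
end

rl = RateLimit.new(5, 5 * 60)
puts rl.validate_rate
```

Because expired entries are removed on every call, the array never holds more than max timestamps.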

Trying to create nested loops dynamically in Ruby

I currently have the following method:
def generate_lineups(max_salary)
  player_combos_by_position = calc_position_combinations
  lineups = []
  player_combos_by_position[:qb].each do |qb_set|
    unless salary_of(qb_set) > max_salary
      player_combos_by_position[:rb].each do |rb_set|
        unless salary_of(qb_set, rb_set) > max_salary
          lineups << create_team_from_sets(qb_set, rb_set)
        end
      end
    end
  end
  return lineups
end
player_combos_by_position is a hash that contains groupings of players keyed by position:
{ qb: [[player1, player2], [player6, player7]], rb: [[player3, player4, player5], [player8, player9, player10]] }
salary_of() takes the sets of players and calculates their total salary.
create_team_from_sets() takes sets of players and returns a new Team of the players
Ideally I want to remove the hardcoded nested loops as I do not know which positions will be available. I think recursion is the answer, but I'm having a hard time wrapping my head around the solution. Any ideas would be greatly appreciated.
Some answers have recommended the use of Array#product. This is normally an elegant solution; however, I'm dealing with very large sets of data (there are about 161,000 combinations of WRs and about 5000 combinations of RBs to put together alone). In my loops I use the unless salary_of(qb_set, rb_set) > max_salary check to avoid making unnecessary calculations, as this weeds out quite a few. I cannot do this using Array#product, and therefore the combinations take a very long time to put together. I'm looking for a way to rule out combinations early and save on computer cycles.
You can use Array#product to get all the possible lineups and then select the ones that are within budget. This allows for variable number of positions.
first_pos, *rest = player_combos_by_position.values
all_lineups = first_pos.product(*rest)
#=> all possible lineups
lineups = all_lineups.
# select lineups within budget
select{|l| salary_of(*l) <= max_salary}.
# create teams from selected lineups
map{|l| create_team_from_sets(*l) }
Other option: Recursive Method (not tested but should get you started)
def generate_lineups(player_groups,max_salary)
  first, *rest = player_groups
  lineups = []
  first.each do |player_group|
    next if salary_of(player_group) > max_salary
    # empty? works in plain Ruby; blank? would need ActiveSupport
    if rest.empty?
      lineups << player_group
    else
      generate_lineups(rest,max_salary).each do |lineup|
        new_lineup = create_team_from_sets(player_group, *lineup)
        lineups << new_lineup unless salary_of(*new_lineup) > max_salary
      end
    end
  end
  return lineups
end
Usage:
lineups = generate_lineups(player_combos_by_position.values,max_salary)
After reading your edit, I see your problem. Here I've modified my code to show you how you could impose a salary limit for each combination for each position group, as well as for the entire team. Does this help? You may want to consider putting your data in a database and using Rails.
team_max_salary = 300
players = {player1: {position: :qb, salary: 15, rating: 9}, player2: {position: :rb, salary: 6, rating: 6},...}
group_info = {qb: {nplayers: 2, max_salary: 50}, rb: {nplayers: 2, max_salary: 50}, ... }
groups = group_info.keys
players_by_group = {}
groups.each {|g| players_by_group[g] = []}
# players is a Hash, so each yields a key and its attribute Hash
players.each {|name, attrs| players_by_group[attrs[:position]] << name}
# Start with one empty team so that product has something to extend
combinations_for_team = [[]]
groups.each do |g|
  combinations_by_group = players_by_group[g].combination(group_info[g][:nplayers]).select {|c| salary(c) <= group_info[g][:max_salary]}
  # Possibly employ other criteria here to further trim combinations_by_group
  combinations_for_team = combinations_for_team.product(combinations_by_group).map {|team, group| team + group}.select {|c| salary(c) <= team_max_salary}
end
This is a sketch rather than tested code; salary is assumed to take an array of player keys and return their total salary. Note I've made the player keys symbols (e.g., :AaronRogers), but you could of course use strings instead.
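To rule combinations out early for any number of positions, as the question's edit asks, one possibility is a depth-first recursion that carries the salary spent so far and abandons a branch as soon as it goes over budget. The sketch below is illustrative only: players are assumed to be hashes with a :salary key, and in the question's code salary_of and create_team_from_sets would take the place of the sum and of chosen + [set].

```ruby
# groups: an array of position groups, each an array of player sets.
# Returns every combination of one set per group whose total salary fits.
def generate_lineups(groups, max_salary, chosen = [], spent = 0, lineups = [])
  if groups.empty?
    lineups << chosen
    return lineups
  end
  first, *rest = groups
  first.each do |set|
    cost = spent + set.sum { |player| player[:salary] }
    # Prune here: nothing deeper in this branch can come back under budget.
    next if cost > max_salary
    generate_lineups(rest, max_salary, chosen + [set], cost, lineups)
  end
  lineups
end

# Made-up sample data in the shape of player_combos_by_position.values
qb_sets = [[{ name: "qb1", salary: 10 }], [{ name: "qb2", salary: 30 }]]
rb_sets = [[{ name: "rb1", salary: 5 }, { name: "rb2", salary: 5 }],
           [{ name: "rb3", salary: 25 }]]
p generate_lineups([qb_sets, rb_sets], 35).size
```

Putting the positions with the fewest affordable sets first should let the pruning cut more branches before the large groups are reached.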

converting a hours from a time stamp into a hash listing the hour and frequency in Ruby

Ok, so I'm pretty new at this, I hope I explain this correctly. I'm using Ruby, and I have a program which takes a CSV file and performs some various functions on it. What I'm concerned with here is the TIME portion. I took a column of data which was a string, and used this method to convert it to DateTime and give me just the hour part:
def hour_reg(regdate)
regdate.to_s
time_stamp = DateTime.strptime("#{regdate}", "%m/%d/%y %H:%M").hour
time_stamp
end
That part works fine. Now I'm trying to take that hour and build a hash that lists each hour of the day (1 through 24) and how many times it comes up. For example, if hour 1 (1 AM) came up 3 separate times, the hash would contain {1 => 3}. Here's the code that iterates through the column of times, indicated by ":regdate":
contents.each do |row|
  id = row[0]
  name = row[:first_name]
  zipcode = clean_zipcode(row[:zipcode])
  reg_time = hour_reg(row[:regdate])
end
Basically I want the frequency of each hour. Can anyone help with this? I'm having a great deal of trouble.
You will need to create a Hash with one key per hour, initialized to 0. Note that DateTime#hour returns 0 through 23, so those are the keys to use.
h = { 0 => 0, 1 => 0, ...}
Then do this to increment the hash. I'm assuming the hour_reg method returns an integer.
h[hour_reg(row[:regdate])] += 1
Also you can simplify your hour_reg method to:
def hour_reg(regdate)
  DateTime.strptime("#{regdate}", "%m/%d/%y %H:%M").hour
end
Updating my answer to reflect the discussion in comments:
require 'csv'
# get contents from CSV file
contents = CSV.open 'event_attendees.csv', headers: true, header_converters: :symbol
# create Hash h with 0-23 keys initialized to 0
h = {}
(0..23).each {|x| h[x] = 0}
contents.each do |row|
  reg_time = hour_reg(row[:regdate]).to_i
  h[reg_time] += 1
end
The hour frequency is stored in the "h" hash.
You can simplify the above "contents" block to a single line if you want:
contents.each {|row| h[hour_reg(row[:regdate]).to_i] += 1}
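If hours that never occur do not need to appear in the result, a Hash with a default value of 0 avoids the initialization step. This is a minimal sketch; the hour_frequencies name and the sample timestamps are invented for illustration, and the strings are assumed to follow the same format as the regdate column.

```ruby
require 'date'

# Extract the hour (0 through 23) from a "month/day/year hour:minute" string.
def hour_reg(regdate)
  DateTime.strptime(regdate.to_s, "%m/%d/%y %H:%M").hour
end

# Count how many registrations fall in each hour.
def hour_frequencies(regdates)
  counts = Hash.new(0)   # missing hours start at 0
  regdates.each { |regdate| counts[hour_reg(regdate)] += 1 }
  counts
end

sample = ["11/12/08 10:47", "11/12/08 13:23", "11/12/08 13:30", "11/25/08 19:21"]
p hour_frequencies(sample)
```

With the CSV from the question, the same method could be fed contents.map { |row| row[:regdate] }.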

optimize this ruby code, switch arrays to sets/hash?

I need to optimize this code. Any suggestions to make it faster are welcome; I don't have a specific target. In terms of complexity I want to keep it below O(n^2).
I'm wondering whether converting the array I'm using into a set or hash would help, since those are quicker, right? How much faster, in terms of complexity, might that let this run?
I think the main problem might be my use of Ruby's combination method, which runs pretty slowly. Does anyone know the exact complexity of this method? Is there a faster alternative?
The point of this code is to find the single point with the shortest combined distance to all the other points (i.e. the friend's house that is most convenient for everyone to go to). There is a little extra code here for debugging and printing.
class Point
  attr_accessor :x, :y, :distance, :done, :count
  def initialize(x,y)
    @x = x
    @y = y
    @distance = 0
    @closestPoint = []
    @done = false
    @count = 0
  end
end
class Edge
  attr_accessor :edge1, :edge2, :weight
  def initialize(edge1,edge2,weight)
    @edge1 = edge1
    @edge2 = edge2
    @weight = weight
  end
end
class AdjacencyList
  attr_accessor :name, :minSumList, :current
  def initialize(name)
    @name = name
    @minSumList = []
    @current = nil
    @vList = []
    @edgeList = []
  end
  def addVertex(vertex)
    @vList.push(vertex)
  end
  def generateEdges2
    minSumNode = nil
    current = nil
    last = nil
    @vList.combination(2) { |vertex1, vertex2|
      distance = distance2points(vertex1,vertex2)
      edge = Edge.new(vertex1,vertex2,distance)
      if (current == nil)
        current = vertex1
        minSumNode = vertex1
      end
      vertex1.distance += distance
      vertex2.distance += distance
      vertex1.count += 1
      vertex2.count += 1
      if (vertex1.count == @vList.length-1)
        vertex1.done = true
      elsif (vertex2.count == @vList.length-1)
        vertex2.done = true
      end
      if ((vertex1.distance < minSumNode.distance) && (vertex1.done == true))
        minSumNode = vertex1
      end
      #@edgeList.push(edge)
    }
    return minSumNode.distance
  end
  def generateEdges
    @vList.combination(2) { |vertex1, vertex2|
      distance = distance2points(vertex1,vertex2)
      @edgeList.push(Edge.new(vertex1,vertex2,distance))
    }
  end
  def printEdges
    @edgeList.each {|edge| puts "(#{edge.edge1.x},#{edge.edge1.y}) <=> (#{edge.edge2.x},#{edge.edge2.y}) weight: #{edge.weight}"}
  end
  def printDistances
    @vList.each {|v| puts "(#{v.x},#{v.y} distance = #{v.distance})"}
  end
end
def distance2points(point1,point2)
  xdistance = (point1.x - point2.x).abs
  ydistance = (point1.y - point2.y).abs
  total_raw = xdistance + ydistance
  return totaldistance = total_raw - [xdistance,ydistance].min
end
#pointtest1 = Point.new(0,1)
#pointtest2 = Point.new(2,5)
#pointtest3 = Point.new(3,1)
#pointtest4 = Point.new(4,0)
graph = AdjacencyList.new("graph1")
gets
while (line = gets)
  graph.addVertex(Point.new(line.split[0].to_i,line.split[1].to_i))
end
#graph.addVertex(pointtest1)
#graph.addVertex(pointtest2)
#graph.addVertex(pointtest3)
#graph.addVertex(pointtest4)
puts graph.generateEdges2
#graph.printEdges
#graph.printDistances
Try to do this, and then post some more code:
ruby -rprofile your_script your_args
This will run the script under the profiler, and generate a nice table with results. If you post that here, it's more likely to get better help. Plus, you will have a more exact idea of what's consuming your CPU cycles.
Sets are basically hashes, and the advantage of hashes over arrays is O(1) find operations. Since you are simply iterating over the entire array, hashes will not offer any speed improvements if you simply replace the arrays with hashes.
Your real problem is that the running time of your algorithm is O(n^2), as in given a set of n points it will have to perform n^2 operations since you're matching every point with every other possible point.
This can be somewhat improved using hashes to cache values. For example, let's say you want the distance between point "a" and point "b". You could have a hash @distances which stores @distances["a,b"] = 52 (of course you'll have to be smart about what to use as the key). Basically just try to remove redundant operations wherever you can.
That said, the largest speed boost would be from a smarter algorithm, but I can't think of something applicable off the top of my head right now.
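One candidate for a smarter algorithm, offered as a sketch and not part of the original answers: distance2points computes max(|dx|, |dy|), since it subtracts the smaller component from the sum. That value equals (|du| + |dv|) / 2 for the rotated coordinates u = x + y and v = x - y, so the total distance from each point splits into two one-dimensional sums of absolute differences, each of which can be computed for all points with one sort and a running prefix sum. Points are assumed to be [x, y] integer pairs, and abs_diff_sums and min_total_distance are names invented here.

```ruby
# For every value, the sum of |value - other| over all other values,
# computed with one sort and a running prefix sum.
def abs_diff_sums(values)
  order = values.each_index.sort_by { |i| values[i] }
  total = values.sum
  sums = Array.new(values.size)
  prefix = 0 # sum of the values ranked below the current one
  order.each_with_index do |idx, rank|
    v = values[idx]
    below = v * rank - prefix
    above = (total - prefix - v) - v * (values.size - rank - 1)
    sums[idx] = below + above
    prefix += v
  end
  sums
end

# Smallest combined distance from one point to all the others,
# using the same max(|dx|, |dy|) distance as distance2points.
def min_total_distance(points)
  u_sums = abs_diff_sums(points.map { |x, y| x + y })
  v_sums = abs_diff_sums(points.map { |x, y| x - y })
  u_sums.zip(v_sums).map { |a, b| (a + b) / 2 }.min
end

# The four commented-out test points from the question
p min_total_distance([[0, 1], [2, 5], [3, 1], [4, 0]])
```

This replaces the pairwise loop over combination(2) with two sorts, so the cost grows as n log n instead of n^2.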
There's something many people know, and it won't cost you anything.
While you're trying to guess how to make the code faster, or scouring the internet for some kind of profiler, just run the program under the debugger and interrupt it while it's being slow.
Do it several times, and each time take careful note of what it's doing and why.
Here's an example in python.
The slower it is, the more obvious the problem will be.
