Find the time period with the maximum number of overlapping intervals - algorithm

This is a well-known problem, and I am asking about it here.
You are given the life spans of a number of elephants, where a life span runs from the year of birth to the year of death.
You have to find the period during which the maximum number of elephants are alive.
Example:
1990 - 2013
1995 - 2000
2010 - 2020
1992 - 1999
Answer is 1995 - 1999
I tried hard to solve this, but I am unable to do so.
How can I solve this problem?
I do have an approach for the query "how many elephants were alive in a given year?". I solved that with a segment tree: whenever an elephant's life span is given, increase the count of every year in that span by 1. Can this be used to solve the problem above?
For the question above, I only need the high-level approach; I will code it myself.

Split each date range into start date and end date.
Sort the dates. If a start date and an end date are the same, put the end date first (otherwise you could get an empty date range as the best).
Start with a count of 0.
Iterate through the dates using a sweep-line algorithm:
If you get a start date:
Increment the count.
If the current count is higher than the last best count, set the count, store this start date and set a flag.
If you get an end date:
If the flag is set, store the stored start date and this end date with the count as the best interval so far.
Reset the flag.
Decrement the count.
Example:
For input:
1990 - 2013
1995 - 2000
2010 - 2020
1992 - 1999
Split and sorted: (S = start, E = end)
1990 S, 1992 S, 1995 S, 1999 E, 2000 E, 2010 S, 2013 E, 2020 E
Iterating through them:
count = 0
lastStart = N/A
1990: count = 1
count = 1 > 0, so set flag
and lastStart = 1990
1992: count = 2
count = 2 > 0, so set flag
and lastStart = 1992
1995: count = 3
count = 3 > 0, so set flag
and lastStart = 1995
1999: flag is set, so
record [lastStart (= 1995), 1999] with a count of 3
reset flag
count = 2
2000: flag is not set
reset flag
count = 1
2010: count = 2
since count = 2 < 3, don't set flag
2013: flag is not set
reset flag
count = 1
2020: flag is not set
reset flag
count = 0
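The walkthrough above can be sketched in Python as follows. This is only a minimal sketch, not the answerer's own code: the function name busiest_period and the tuple encoding of events are assumptions made for illustration, and each interval is assumed to have start < end. As in the trace, the best count is only updated when an interval is recorded at an end date.

```python
def busiest_period(intervals):
    """Return ((start, end), count) for a period with the most overlapping intervals."""
    events = []
    for start, end in intervals:
        events.append((start, 1))  # 1 marks a start date
        events.append((end, 0))    # 0 marks an end date; it sorts before a start on the same date
    events.sort()

    count = 0
    best_count = 0
    best = None
    last_start = None
    flag = False
    for date, is_start in events:
        if is_start:
            count += 1
            if count > best_count:
                last_start = date
                flag = True
        else:
            if flag:
                # The count is at a new high: record the interval and its count.
                best = (last_start, date)
                best_count = count
                flag = False
            count -= 1
    return best, best_count


print(busiest_period([(1990, 2013), (1995, 2000), (2010, 2020), (1992, 1999)]))
```

Sorting dominates the running time, so the sketch runs in O(n log n) for n intervals.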

How about this?
Say I have all the above data stored in a file. Read it into two arrays, splitting each line on the " - ".
Hence, now I have birthYear[] which contains all the birth years and deathYear[] containing all the death years.
so birthYear[] = [1990, 1995, 2010, 1992]
deathYear[] = [2013, 2000, 2020, 1999]
Get the min birth year and the max death year. Create a Hashtable with the Key as a year, and the Value as the count.
Hence,
Hashtable<Integer, Integer> numOfElephantsAlive = new Hashtable<Integer, Integer>();
Now, for the years from min(birthYear) to max(deathYear), do the following:
Iterate through the birthYear array and, for each elephant, add every year between its birth year and the corresponding death year to the Hashtable with a count of 1. If the key already exists, add 1 to its value. Hence, for the last case:
1992 - 1999
HashTable.put(1992, 1)
HashTable.put(1993, 1)
and so on for every year.
Say, for example, you have a Hashtable that looks like this at the end of it:
Key Value
1995 3
1996 3
1992 2
1993 1
1994 3
1998 1
1997 2
1999 2
Now you need the range of years in which the number of elephants was at its maximum. First, iterate over the keySet() and find the year with the max value. This is pretty easy.
Next, you need a contiguous range of years. One way to do this:
Call Collections.sort() on a list of the keys and, when you hit the max value, save all contiguous years that share it.
Hence, on hitting 3 at 1994 in our example, we would check all the following years for a 3. This gives you the range as a (min-year, max-year) pair.
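For comparison, here is a rough Python sketch of the same year-counting idea, using a Counter in place of the Hashtable. The function name busiest_years is made up for illustration, and the sketch assumes the death year itself counts as a year in which the elephant was alive. It returns the first contiguous run of years that reach the maximum.

```python
from collections import Counter


def busiest_years(birth_years, death_years):
    """Count the elephants alive in every year, then return (first, last, peak)."""
    alive = Counter()
    for born, died in zip(birth_years, death_years):
        for year in range(born, died + 1):  # death year treated as inclusive (assumption)
            alive[year] += 1

    peak = max(alive.values())
    run = []
    for year in sorted(alive):
        if alive[year] == peak and (not run or year == run[-1] + 1):
            run.append(year)
        elif run:
            break  # the first contiguous run at the peak has ended
    return run[0], run[-1], peak


print(busiest_years([1990, 1995, 2010, 1992], [2013, 2000, 2020, 1999]))
```

Note that the work grows with the total number of years covered, so this is only practical when the life spans are short.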

One approach maybe:
Iterate through the periods. Keep track of a list of periods up to now. Note: At each step, the number of periods increases by 2 (or 1 if there is no overlap with the existing list of periods).
For example
1990 - 2013
Period List contains 1 period { (1990,2013) }
Count List contains 1 entry { 1 }
1995 - 2000
Period List contains 3 periods { (1990,1995), (1995,2000), (2000,2013) }
Count List contains 3 entries { 1, 2, 1 }
2010 - 2020
Period List contains 5 periods { (1990,1995), (1995,2000), (2000,2010), (2010, 2013), (2013, 2020) }
Count List contains 5 entries { 1, 2, 1, 2, 1 }
1992 - 1999
Period List contains 7 periods { (1990,1992), (1992,1995), (1995,1999), (1999,2000), (2000,2010), (2010, 2013), (2013, 2020) }
Count List contains 7 entries { 1, 2, 3, 2, 1, 2, 1 }
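The final Period List and Count List above can also be produced in one pass instead of incrementally. The following Python sketch does that: it cuts the timeline at every start and end year and counts the input periods covering each piece. The name split_periods is hypothetical, and the counting is quadratic, so this only illustrates the idea.

```python
def split_periods(intervals):
    """Cut the timeline at every boundary and count the intervals covering each piece."""
    bounds = sorted({point for interval in intervals for point in interval})
    periods = list(zip(bounds, bounds[1:]))
    counts = [
        sum(1 for start, end in intervals if start <= lo and hi <= end)
        for lo, hi in periods
    ]
    return periods, counts


periods, counts = split_periods([(1990, 2013), (1995, 2000), (2010, 2020), (1992, 1999)])
best = max(range(len(counts)), key=counts.__getitem__)
print(periods[best], counts[best])
```

The period with the largest entry in the count list is the answer.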

1) Arrange the series in ascending order by year, starting from the largest series.
2) Count the years covered by the largest series across the whole data set.
3) Then identify the largest count.
4) The largest count gives your answer for the years. This can be done algorithmically.

Related

Optimizing the algorithm for checking available reservations for the store

I would like to ask about algorithms for checking whether a customer can book a table at a store.
I will describe my problem with the following example:
Restaurant:
User M has a restaurant R. R is open from 08:00 to 17:00.
Restaurant R has 3 tables (T1, T2, T3), each table will have 6 seats.
R offers F1 food, which can be eaten within 2 hours.
Booking:
R has a customer C who has booked table T1 for 5 people with F1 food | B[0]
B[0] has a start time: 9AM
M is the manager of the store, so M wants to know whether the date YYYY-MM-DD is already fully booked or can still be booked.
My current algorithm is:
I will create an array with the elements as the number of minutes of the day, and their default value is 0
24 * 60 = 1440
=> I have: arr[1440] = [0, 0, 0, ...]
Next I will get all the bookings for the day YYYY-MM-DD. The result will be an array B[].
Then I will loop the array B[]
for b in B[]
I then loop from the start_time to the end_time of b in steps of 1 minute.
for time = start_time; time <= end_time; time++
On each iteration I set the element of arr whose index is the corresponding minute of the day to 1.
(It is quite similar to the Sieve of Eratosthenes.)
Then I iterate over arr one more time: if there is at least one 0 in the array, the date YYYY-MM-DD is still bookable.
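To make the steps above concrete, here is a rough Python sketch of the minute-array check for a single table on a single day. The name is_bookable, the representation of bookings as (start_minute, end_minute) pairs, and the half-open treatment of the end minute are all assumptions for illustration. The sketch also only inspects the opening hours (08:00 to 17:00 in the example), since the minutes outside them would otherwise always be 0.

```python
MINUTES_PER_DAY = 24 * 60  # 1440


def is_bookable(bookings, open_minute=8 * 60, close_minute=17 * 60):
    """Return True if at least one minute within opening hours is still free."""
    occupied = [0] * MINUTES_PER_DAY
    for start_minute, end_minute in bookings:
        for minute in range(start_minute, end_minute):  # end minute excluded (assumption)
            occupied[minute] = 1
    return any(occupied[m] == 0 for m in range(open_minute, close_minute))


# One booking from 09:00 to 11:00 leaves the rest of the day free.
print(is_bookable([(9 * 60, 11 * 60)]))
```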
But my algorithm will not be optimal if the store has more tables, or if many days have to be checked (for example from 2022-01-01 to 2022-02-01), ...
Thank you very much.
P/S: Regarding the technology background, I am currently using laravel 9

Ruby - how to generate random time intervals matching a total amount of hours?

I am trying to write a simple script, where the input would be a start date, end date and a total amount of hours (150) and the script would generate a simple report containing random date-time intervals (with ideally weekdays) that would sum the entered amount of hours.
This is what I am trying to achieve:
Start: 2020-01-01
End: 2020-01-31
Total hours: 150
Report:
Jan 1, 2019, 08:02:20 – Jan 1, 2019, 08:55:00: sub time -> 52:40 (52 minutes 40 seconds)
Jan 1, 2019, 09:00:00 – Jan 1, 2019, 09:38:13: sub time -> 38:13 (38 minutes 13 seconds)
...
Jan 3, 2019, 13:15:00 – Jan 3, 2019, 14:45:13: sub time -> 01:30:13 (1 hour 30 minutes 13 seconds)
...
TOTAL TIME: 150 hours (or in minutes)
How do I generate time intervals where the total amount of minutes/hours would be equal to a given number of hours?
I assume the question is loosely-worded in the sense that "random" is not meant in a probability sense; that is, the intent is not to select a set of intervals (that total a given number of hours in length) with a mechanism that ensures all possible sets of such intervals have an equal likelihood of being selected. Rather, I understand that a set of intervals is to be chosen (e.g., for testing purposes) in a way that incorporates elements of randomness.
I have assumed the intervals are to be non-overlapping and the number of intervals is to be specified. I don't understand what "with ideally weekdays" means so I have disregarded that.
The heart of the approach I will propose is the following method.
def rnd_lengths(tot_secs, target_nbr)
  max_secs = 2 * tot_secs/target_nbr - 1
  arr = []
  loop do
    break(arr) if tot_secs.zero?
    l = [(0.5 + max_secs * rand).round, tot_secs].min
    arr << l
    tot_secs -= l
  end
end
The method generates an array of integers (lengths of intervals), measured in seconds, ideally having target_nbr elements. tot_secs is the required combined length of the "random" intervals (e.g., 150*3600).
Each element of the array is drawn randomly from a uniform distribution that ranges from 1 to max_secs (to be computed). This is done sequentially until tot_secs is reached. Should the last random value cause the total to exceed tot_secs, it is reduced to make the total equal tot_secs.
Suppose tot_secs equals 100 and we wish to generate 4 random intervals (target_nbr = 4). That means the average length of the intervals would be 25. As we are using a uniform distribution having an average of (1 + max_secs)/2, we may derive the value of max_secs from the expression
target_nbr * (1 + max_secs)/2 = tot_secs
which is
max_secs = 2 * tot_secs/target_nbr - 1
the first line of the method. For the example I mentioned, this would be
max_secs = 2 * 100/4 - 1
#=> 49
Let's try it.
rnd_lengths(100, 4)
#=> [49, 36, 15]
As you see the array that is returned sums to 100, as required, but it contains only 3 elements. That's why I named the argument target_nbr, as there is no assurance the array returned will have that number of elements. What to do? Try again!
rnd_lengths(100, 4)
#=> [14, 17, 26, 37, 6]
Still not 4 elements, so keep trying:
rnd_lengths(100, 4)
#=> [11, 37, 39, 13]
Success! It may take a few tries to get the correct number of elements, but for parameters likely to be used, and the nature of the probability distribution employed, I wouldn't expect that to be a problem.
Let's put this in a method.
def rdm_intervals(tot_secs, nbr_intervals)
  loop do
    arr = rnd_lengths(tot_secs, nbr_intervals)
    break(arr) if arr.size == nbr_intervals
  end
end
intervals = rdm_intervals(100, 4)
#=> [29, 26, 7, 38]
We can compute random gaps between intervals in the same way. Suppose the intervals fall within a range of 175 seconds (the number of seconds between the start time and end time). Then:
gaps = rdm_intervals(175-100, 5)
#=> [26, 5, 19, 4, 21]
As seen, the gaps sum to 75, as required. We can disregard the last element.
We can now form the intervals. The first interval begins at 26 seconds and ends at 26+29 #=> 55 seconds. The second interval begins at 55+5 #=> 60 seconds and ends at 60+26 #=> 86 seconds, and so on. We therefore find the intervals (each in ranges of seconds from zero) to be:
[26..55, 60..86, 105..112, 116..154]
Note that 175 - 154 = 21, the last element of gaps.
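The arithmetic in this step can be checked with a few lines of Python. This is just an illustrative sketch: the name place_intervals is made up, and the values are the example lengths and gaps from above. Each interval starts one gap after the previous interval ends, and the trailing gap is ignored.

```python
def place_intervals(lengths, gaps):
    """Alternate gaps and interval lengths, returning (start, end) offsets in seconds."""
    ranges = []
    cursor = 0
    for gap, length in zip(gaps, lengths):  # zip drops the extra trailing gap
        start = cursor + gap
        end = start + length
        ranges.append((start, end))
        cursor = end
    return ranges


print(place_intervals([29, 26, 7, 38], [26, 5, 19, 4, 21]))
```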
If one is uncomfortable with the fact that the last elements of intervals and gaps are generally constrained in size, one could of course randomly reposition those elements within their respective arrays.
One might not care if the number of intervals is exactly target_nbr. It would be simpler and faster to just use the first array of interval lengths produced. That's fine, but we still need the above methods to compute the random gaps, as their number must equal the number of intervals plus one:
gaps = rdm_intervals(175-100, intervals.size + 1)
We can now use these two methods to construct a method that will return the desired result. The argument tot_secs of this method equals total number of seconds spanned by the array intervals returned (e.g., 3600 * 150). The method returns an array containing nbr_intervals non-overlapping ranges of Time objects that fall between the given start and end dates.
require 'date'
def construct_intervals(start_date_str, end_date_str, tot_secs, nbr_intervals)
  start_time = Date.strptime(start_date_str, '%Y-%m-%d').to_time
  secs_in_period = Date.strptime(end_date_str, '%Y-%m-%d').to_time - start_time
  intervals = rdm_intervals(tot_secs, nbr_intervals)
  gaps = rdm_intervals(secs_in_period - tot_secs, nbr_intervals+1)
  nbr_intervals.times.with_object([]) do |_,arr|
    start_time += gaps.shift
    end_time = start_time + intervals.shift
    arr << (start_time..end_time)
    start_time = end_time
  end
end
See Date::strptime.
Let's try an example.
start_date_str = '2020-01-01'
end_date_str = '2020-01-31'
tot_secs = 3600*150
#=> 540000
construct_intervals(start_date_str, end_date_str, tot_secs, 4)
#=> [2020-01-06 18:05:04 -0800..2020-01-09 03:48:00 -0800,
# 2020-01-09 06:44:16 -0800..2020-01-11 23:33:44 -0800,
# 2020-01-20 20:30:21 -0800..2020-01-21 17:27:44 -0800,
# 2020-01-27 19:08:38 -0800..2020-01-28 01:38:51 -0800]
construct_intervals(start_date_str, end_date_str, tot_secs, 8)
#=> [2020-01-03 18:43:36 -0800..2020-01-04 10:49:14 -0800,
# 2020-01-08 07:55:44 -0800..2020-01-08 08:17:18 -0800,
# 2020-01-11 00:54:36 -0800..2020-01-11 23:00:53 -0800,
# 2020-01-14 05:20:14 -0800..2020-01-14 22:48:45 -0800,
# 2020-01-16 18:28:28 -0800..2020-01-17 22:50:24 -0800,
# 2020-01-22 02:59:31 -0800..2020-01-22 22:33:08 -0800,
# 2020-01-23 00:36:59 -0800..2020-01-24 12:15:37 -0800,
# 2020-01-29 11:22:21 -0800..2020-01-29 21:46:10 -0800]
START -xxx----xxx--x----xxxxx---xx--xx---xx-xx-x-xxx-- END
We need to fill a timespan with alternating periods of ON and OFF. This can be
denoted by a list of timestamps. Let's say that the period always starts with
an OFF period for simplicity's sake.
From the start/end of the timespan and the total seconds in ON state, we
gather useful facts:
the timespan's total size in seconds total_seconds
the second totals of both the ON (on_total_seconds) and the OFF (off_total_seconds) periods
Once we know these, a workable algorithm looks more or less like this - pardon
the functions without implementation:
# this can be a parameter as well
MIN_PERIODS = 10
MAX_PERIODS = 100
def fill_periods(start_date, end_date, on_total_seconds = 150*60*60)
  # get_total_seconds and add_seconds are the functions left without implementation
  total_seconds = get_total_seconds(start_date, end_date)
  off_total_seconds = total_seconds - on_total_seconds
  # establish two buckets to pull from alternately in populating our array of durations
  on_bucket = on_total_seconds
  off_bucket = off_total_seconds
  result = []
  # populate `result` with durations in seconds. `result` will sum to `total_seconds`
  while on_bucket > 0 || off_bucket > 0 do
    # Kernel#rand takes a single range argument, not two bounds;
    # each slice is clamped to what is left in its bucket, so it is 0 once the bucket is empty
    off_slice = [rand((off_total_seconds / MAX_PERIODS / 2)..(off_total_seconds / MIN_PERIODS / 2)).to_i, off_bucket].min
    off_bucket -= off_slice
    on_slice = [rand((on_total_seconds / MAX_PERIODS / 2)..(on_total_seconds / MIN_PERIODS / 2)).to_i, on_bucket].min
    on_bucket -= on_slice
    # randomness being random, we're going to hit 0 in one bucket before the
    # other. when this happens, just add this (off, on) pair to the last one.
    if off_slice == 0 || on_slice == 0
      last_off, last_on = result.pop(2)
      result << last_off + off_slice << last_on + on_slice
    else
      result << off_slice << on_slice
    end
  end
  # build up an array of datetimes by progressively adding seconds to the last timestamp.
  datetimes = result.each_with_object([start_date]) do |period, memo|
    memo << add_seconds(memo.last, period)
  end
  # we want a list of datetime pairs denoting ON periods. since we know our
  # timespan starts with OFF, we start our list of pairs with the second element.
  datetimes.slice(1..-1).each_slice(2).to_a
end

Schedule meeting problem (count how many meetings an owner can schedule based on investor availabilities)

I tried to solve the task which sounds like "Given the schedules of the days investors are available, determine how many meetings the owner can schedule". The owner is looking to meet new investors to get some funds for his company. The owner must respect the investor's schedule. Note that the owner can only have one meeting per day.
The schedule consists of 2 integer arrays, firstDay and lastDay. Each element in the array firstDay represents the first day an investor is available, and each element in lastDay represents the last day an investor is available, both inclusive.
Example:
firstDay = [1,2,3,3,3]
lastDay = [2,2,3,4,4]
There are 5 investors [i0, i1, i2, i3, i4]
The investor i0 is available from day 1 to day 2 inclusive [1,2]
The investor i1 is available in day 2 only [2,2]
The investor i2 is available in day3 only [3,3]
The investors i3 and i4 are available from day 3 to day 4 only [3,4]
The owner can only meet 4 investors out of 5: i0 in day 1, i1 in day 2, i2 in day 3 and i3 in day 4. The image below shows the scheduled meetings in green and blocked days are in gray.
A graphic shows the scheduled meetings
The task is to implement the function which takes 2 lists of integers as input parameters and returns integer result that represents the maximum number of meetings possible.
Constraints
array length - bigger or equal 1 and less or equal 100000
firstDay[i], lastDay[i] bigger or equal 1 and less or equal 100000 (i bigger than or equal 0 less than n)
firstDay[i] less or equal lastDay[i]
My implementation of this task is the following:
public static int countMeetings(List<int> firstDay, List<int> lastDay)
{
    var count = 0;
    count = firstDay.Concat(lastDay).Distinct().Count();
    if (count > firstDay.Count)
    {
        count = firstDay.Count;
    }
    return count;
}
And this code successfully passes 8 of 12 provided tests. I'll be glad to see and discuss any working solutions to this issue. Thanks.
For the input
firstDay = [1,1,1]
lastDay = [5,5,5]
your code returns 2, but the correct answer is 3.
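One approach that does handle this input is a greedy sweep over the days. It is not taken from the question or the answer above; it is a commonly used technique, sketched here in Python under the assumption that meeting the available investor whose last day comes soonest is always a safe choice. The name count_meetings is made up for illustration.

```python
import heapq


def count_meetings(first_day, last_day):
    """Greedy: on each day, meet the available investor whose window closes first."""
    investors = sorted(zip(first_day, last_day))
    pending = []  # min-heap of last days of investors who are already available
    i = 0
    day = 0
    meetings = 0
    while i < len(investors) or pending:
        if not pending:
            # Nobody is waiting: jump ahead to the next investor's first day.
            day = max(day, investors[i][0])
        while i < len(investors) and investors[i][0] <= day:
            heapq.heappush(pending, investors[i][1])
            i += 1
        while pending and pending[0] < day:
            heapq.heappop(pending)  # this investor's window has already closed
        if pending:
            heapq.heappop(pending)  # hold today's meeting
            meetings += 1
            day += 1
    return meetings


print(count_meetings([1, 2, 3, 3, 3], [2, 2, 3, 4, 4]))
print(count_meetings([1, 1, 1], [5, 5, 5]))
```

Each investor is pushed and popped at most once, so the sketch runs in O(n log n), which should fit the stated limit of 100000 investors.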

rollapply + specnumber = species richness over sampling intervals that vary in length?

I have a community matrix (samples x species of animals). I sampled the animals weekly over many years (in this example, three years). I want to figure out how sampling timing (start week and duration a.k.a. number of weeks) affects species richness. Here is an example data set:
Data <- data.frame(
Year = rep(c('1996', '1997', '1998'), each = 5),
Week = rep(c('1', '2', '3', '4', '5'), 3),
Species1 =sample(0:5, 15, replace=T),
Species2 =sample(0:5, 15, replace=T),
Species3 =sample(0:5, 15, replace=T)
)
The outcome that I want is something along the lines of:
Year StartWeek Duration(weeks) SpeciesRichness
1996 1 1 2
1996 1 2 3
1996 1 3 1
...
1998 5 1 1
I had tried doing this via a combination of rollapply and vegan's specnumber, but got a sample x species matrix instead of a vector of Species Richness. Weird.
For example, I thought that this should give me species richness for sampling windows of two weeks:
test<-rollapply(Data[3:5],width=2,specnumber,align="right")
Thank you for your help!
I figured it out by breaking up the task into two parts:
1. Summing up species abundances using rollapplyr, as implemented in a dplyr mutate_each thingamabob
2. Calculating species richness using vegan.
I did this for each sampling duration window separately.
Here is the bare bones version (I just did this successively for each sampling duration that I wanted by changing the width argument):
library(zoo)    # rollapply
library(dplyr)  # %>%, arrange, group_by, mutate_each, funs
library(vegan)  # specnumber
# rolling sum of abundances over a window of two weeks
weeksum2 <- function(x) {rollapply(x, width = 2, align = 'left', sum, fill=NA)}
sum2weeks <- Data %>%
  arrange(Year, Week) %>%
  group_by(Year) %>%
  mutate_each(funs(weeksum2), -Year, -Week)
# species richness of each two-week window
weeklyspecnumber2 <- specnumber(sum2weeks[,3:ncol(sum2weeks)],
                                groups = interaction(sum2weeks$Week, sum2weeks$Year))
weeklyspecnumber2 <- unlist(weeklyspecnumber2)
weeklyspecnumber2 <- as.data.frame(weeklyspecnumber2)
weeklyspecnumber2$WeekYear <- as.factor(rownames(weeklyspecnumber2))
weeklyspecnumber2 <- tidyr::separate(weeklyspecnumber2, WeekYear, into = c('Week', 'Year'), sep = '[.]')

Algorithm to calculate the number of fortnightly occurring events in a given calendar month

I'm looking for the cleverest algorithm for determining the number of fortnightly occurring events in a given calendar month, within a specific series.
i.e. Given the series is 'Every 2nd Thursday from 7 October 2010' the "events" are falling on (7 Oct 2010, 21 Oct, 4 Nov, 18 Nov, 2 Dec, 16 Dec, 30 Dec, ...)
So what I am after is a function
function(seriesDefinition, month) -> integer
where:
- seriesDefinition is some date that is a valid date in the series,
- month indicates a month and a year
such that it accurately yields: numberFortnightlyEventsInSeriesThatFallInCalendarMonth
Examples:
NumberFortnightlyEventsInMonth('7 Oct 2010', 'Oct 2010') -> 2
NumberFortnightlyEventsInMonth('7 Oct 2010', 'Nov 2010') -> 2
NumberFortnightlyEventsInMonth('7 Oct 2010', 'Dec 2010') -> 3
Note that October has 2 events, November has 2 events, but December has 3 events.
Pseudocode preferred.
I don't want to rely on lookup tables or web service calls or any other external resources other than potentially universal libraries. For example, I think we can safely assume that most programming languages will have some date manipulation functions available.
There is no "clever" algorithm when handling dates, there is only the tedious one. That is, you have to specifically list how many days are in each month, handle leap years (every four years, except every 100 years, except every 400 years), etc.
Well, for the algorithm you are talking about, the usual solution is to calculate the day number counted from some fixed date: (day of the month, plus the cumulative number of days in the previous months, plus number of years * 365, plus (number of years / 4), minus (number of years / 100), plus (number of years / 400)).
Having this, you can easily implement what you need. Calculate which day of the week 1 January of year 1 was. Then you can easily find the number of "every second Thursdays" from that day to 1 Oct 2010 and to 1 Dec 2010. Their difference is the value you are looking for.
My solution ...
Public Function NumberFortnightlyEventsInMonth(seriesDefinition As Date, month As String) As Integer
    Dim monthBeginDate As Date
    monthBeginDate = DateValue("1 " + month)
    Dim lastDateOfMonth As Date
    lastDateOfMonth = DateAdd("d", -1, DateAdd("m", 1, monthBeginDate))
    ' Step 1 - How many days between seriesDefinition and the 1st of [month]
    Dim daysToMonthBegin As Integer
    daysToMonthBegin = DateDiff("d", seriesDefinition, monthBeginDate)
    ' Step 2 - How many fortnights (14 days) fit into the number from Step 1? Round up to the nearest whole number.
    Dim numberFortnightsToFirstOccurenceOfSeriesInMonth As Integer
    numberFortnightsToFirstOccurenceOfSeriesInMonth = (daysToMonthBegin \ 14) + IIf(daysToMonthBegin Mod 14 > 0, 1, 0)
    ' Step 3 - The date of the first date of this series inside that month is seriesDefinition + the number of fortnights from Step 2
    Dim firstDateOfSeriesInMonth As Date
    firstDateOfSeriesInMonth = DateAdd("d", (14 * numberFortnightsToFirstOccurenceOfSeriesInMonth), seriesDefinition)
    ' Step 4 - How many fortnights fit between the date from Step 3 and the last date of the [month]?
    NumberFortnightlyEventsInMonth = 1 + (DateDiff("d", firstDateOfSeriesInMonth, lastDateOfMonth) \ 14)
End Function
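The same four steps translate to Python roughly as follows, relying only on the standard datetime module. The function name and the (year, month) parameters are choices made for this sketch, not part of the original answer.

```python
from datetime import date, timedelta


def fortnightly_events_in_month(series_date, year, month):
    """Count the events of a 14-day series that fall in the given calendar month."""
    month_begin = date(year, month, 1)
    next_month_begin = date(year + month // 12, month % 12 + 1, 1)
    month_end = next_month_begin - timedelta(days=1)

    # Steps 1 and 2: whole fortnights from the series date to the 1st, rounded up.
    days_to_month_begin = (month_begin - series_date).days
    fortnights = -(-days_to_month_begin // 14)  # ceiling division

    # Step 3: the first event on or after the 1st of the month.
    first_event = series_date + timedelta(days=14 * fortnights)

    # Step 4: that event plus every further fortnight that still fits in the month.
    return 1 + (month_end - first_event).days // 14


for month in (10, 11, 12):
    print(month, fortnightly_events_in_month(date(2010, 10, 7), 2010, month))
```

Because every month has at least 28 days, the first event on or after the 1st always falls inside the month, so the result is always 2 or 3.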
