Finding an algorithm to calculate the maximum number of people in town simultaneously

I would like help with the following question of finding an algorithm for the following problem:
Given a list of pairs (a1,b1),...,(an,bn), where ai is the entry date of person i into the city and bi is the departure date of person i. Assume that a person enters the city at the beginning of the day and leaves the city at the end of the day.
For example, if a person entered the city on the 4th of the month and left on the 12th of the same month, then they were in the city for 9 days. Propose an algorithm, as efficient as possible, for calculating the maximum number of people who were in the city at the same time.
My try:
Use two arrays, one for the entries and one for the departures. Sort both arrays in ascending order. Then scan them in a merge-like fashion: at each step, if the current element of the entry array is smaller than (or equal to) the current element of the departure array, increment a counter and advance the entry pointer; otherwise decrement the counter and advance the departure pointer. After each step, update another variable holding the maximum count to be the maximum of the counter and the current maximum.
Then print the maximum count.
This runs in O(n log n). Is there a more efficient way to solve this problem?
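A minimal sketch of this merge-style scan in Python (names are mine), assuming the entry and departure values are comparable day numbers:

def max_simultaneous(pairs):
    entries = sorted(a for a, b in pairs)
    departures = sorted(b for a, b in pairs)
    count = max_count = 0
    i = j = 0
    while i < len(entries):
        # a departure happens at the end of its day, so an entry on the same
        # day is processed first and both people overlap on that day
        if entries[i] <= departures[j]:
            count += 1
            max_count = max(max_count, count)
            i += 1
        else:
            count -= 1
            j += 1
    return max_count

For example, max_simultaneous([(4, 12), (10, 13)]) returns 2.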

Create an array (let's say count) of size 33 and initialize it all with 0s.
Now traverse the pair list and for every pair (ai, bi), do count[ai]++ and count[bi+1]--.
Now take a variable called numberOfPeople and perform the following loop:
int numberOfPeople = 0, maxNumberOfPeople = 0;
for (int i = 1; i <= 31; ++i) {
    numberOfPeople += count[i];
    maxNumberOfPeople = max(numberOfPeople, maxNumberOfPeople);
}
Inside the loop, for every i, numberOfPeople is the number of people present in the city on day i. The time complexity of this solution is O(n).
This assumes that there are at most 31 days in a month and that we are talking about a single month only. The solution can easily be modified if that is not the case.
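A complete sketch of this answer in Python (function name is mine), assuming days lie in 1..31:

def max_simultaneous(pairs, max_day=31):
    count = [0] * (max_day + 2)          # indices 0 .. max_day + 1
    for a, b in pairs:
        count[a] += 1                    # person is present from day a ...
        count[b + 1] -= 1                # ... and gone from day b + 1 onwards
    people = best = 0
    for day in range(1, max_day + 1):
        people += count[day]             # number of people present on this day
        best = max(best, people)
    return best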

Split each pair into two events, arrival and departure, and put them into a collection.
List<(DateTime time, bool isArrival)> events = ...
Then sort the collection by time, e.g.
events.Sort((left, right) => left.time.CompareTo(right.time));
Finally, scan the collection: on arrival add 1, on departure subtract 1 while computing maximum:
int result = 0;
int current = 0;
foreach (var e in events) {
    if (e.isArrival) {
        current += 1;
        result = Math.Max(result, current);
    }
    else {
        current -= 1;
    }
}
It is the sorting procedure that determines the time complexity here. In the general case we have the typical O(n * log(n)) time complexity;
however, if we have real dates with a restricted range (say, they can all be represented in yyyy-MM-dd format), we can perform a radix sort and get O(n) time complexity.


Find the interval that contains the largest number of intervals and report that number

Given lots of intervals [ai, bi], find the interval that contains the largest number of other intervals.
I know how to do this easily in O(n^2), but I must do it in O(n log n).
I thought about making two arrays, one with the starts sorted and one with the ends sorted, but I really don't know what to do next.
I can only use structures like an array or a tuple.
Interval containment is a combination of two conditions: a includes b if a.start <= b.start and a.end >= b.end.
If you sort intervals by (end, size) and process them in that order, then the end condition is easy -- current.end >= x.end if x is an interval you've already processed.
To find out how many intervals the current interval contains, you just need to know how many previously processed intervals have start points >= current.start. Every time you process an interval, put its start point into an order statistic tree or similar. Then you can find the number of contained start points in O(log N) time, which makes your algorithm O(N log N) altogether. That's the same cost as the initial sort.
Note that I've hand-waved over handling of duplicate intervals above. After sorting the interval list, you should do a pass to remove duplicates and add counts for intervals that contain copies of themselves, if that counts as containment in your problem.
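A rough Python sketch of this idea. For brevity it keeps the processed start points in a plain sorted list with bisect (insertion into a list is O(N), so this is O(N^2) worst case); replace it with an order statistic tree or a Fenwick tree over compressed start points to get the promised O(N log N). Duplicate handling is omitted, as noted above.

import bisect

def most_containing(intervals):
    # process by (end, size): every already-processed interval has end <= current end
    order = sorted(intervals, key=lambda iv: (iv[1], iv[1] - iv[0]))
    starts = []                       # sorted start points of processed intervals
    best, best_count = None, -1
    for s, e in order:
        # processed intervals with start >= s are contained in (s, e)
        contained = len(starts) - bisect.bisect_left(starts, s)
        if contained > best_count:
            best, best_count = (s, e), contained
        bisect.insort(starts, s)
    return best, best_count

For example, most_containing([(1, 6), (5, 6), (2, 5), (8, 9)]) returns ((1, 6), 2).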
You can do it in O(n log n) using just two sorts!
First, sort every interval by its start time.
Then, sort them in reverse order of their end time.
A given interval cannot contain any of the intervals placed before it in either the first or the second array, because intervals located before it in the first array start before it, and intervals located before it in the second array end after it.
Now, the sum of the positions of an interval in the two arrays gives an upper bound on the number of intervals it does not contain.
It is only an upper bound because an interval that starts before and ends after another interval is counted twice.
What's more, this estimate is exact for the interval you are searching for, because if another interval starts before it and ends after it, that other interval contains even more intervals.
That's why the interval containing the largest number of other intervals is simply the interval minimizing the sum of those two positions.
## Warning: not tested!
intervals = [(1, 6), (5, 6), (2, 5), (8, 9), (1, 6)]
N = len(intervals)

# link intervals to their position in the array
for i, (start, end) in enumerate(intervals):
    intervals[i] = (start, end, i)

# positionStart[pos] = rank of interval pos in ascending order of start time
positionStart = [None] * N
sortedStart = sorted(intervals, key=lambda inter: (inter[0], -inter[1], inter[2]))
for i, (start, end, pos) in enumerate(sortedStart):
    positionStart[pos] = i

# positionEnd[pos] = rank of interval pos in descending order of end time
positionEnd = [None] * N
sortedEnd = sorted(intervals, key=lambda inter: (-inter[1], inter[0], inter[2]))
for i, (start, end, pos) in enumerate(sortedEnd):
    positionEnd[pos] = i

best = None   # interval containing the most intervals
score = 0     # number of intervals it contains

for i in range(N):
    # upper bound on the number of intervals it does not contain
    result = positionStart[i] + positionEnd[i]
    if N - result - 1 >= score:
        best = (intervals[i][0], intervals[i][1])
        score = N - result - 1

print(f"{best} is the interval containing the most number of intervals, "
      f"containing a total of {score} intervals")

Algorithm: Count the minimum number of distinct sorted subsequences for each update query

This question was asked to me in an interview:
A distinct sorted subsequence containing adjacent values is defined as a subsequence that either has length one or contains only adjacent numbers when sorted. Each element can belong to only one such subsequence. I have Q queries, each updating a single value in A, and for each query I have to answer how many parts the partition of A into distinct sorted subsequences would have if the number of parts were minimized.
For example, the number of parts for A = [1,2,3,4,3,5] can be minimized by partitioning it in the following two ways, both of which contain only two parts:
1) [1,2,3] && [4,3,5] ==> answer 2 (4,3,5 sorted is 3,4,5, all adjacent values)
2) [1,2,3,4,5] && [3] ==> answer 2
Approach I tried: hashing and forming sets, but not all test cases passed because of timeouts.
Problem Statement PDF: PDF
Constraints:
1 <= N, Q <= 3*10^5
1 < A(i) < 10^9
Preprocessing
First you can preprocess A before all the queries and build a table (say times_of) such that, given a number n, one can efficiently obtain the number of times n appears in A through an expression like times_of[n]. In the following example, assuming A is of type int[N], we use a std::map to implement the table. Its construction costs O(N log N) time.
#include <map>

auto preprocess(int *begin, int *end)
{
    std::map<int, std::size_t> times_of;
    while (begin != end)
    {
        ++times_of[*begin];
        ++begin;
    }
    return times_of;
}
Let min and max be the minimum and maximum elements of A respectively. Then the following lemma applies:
The minimum number of distinct sorted subsequences is equal to max{0, times_of[min] - times_of[min-1]} + ... + max{0, times_of[max] - times_of[max-1]}.
A rigorous proof is a bit technical, so I omit it from this answer. Roughly speaking, consider the numbers from small to large: if n appears more times than n-1 does, the extra occurrences have to start times_of[n] - times_of[n-1] new subsequences. For the example A = [1,2,3,4,3,5] above, times_of = {1:1, 2:1, 3:2, 4:1, 5:1} and the sum is 1 + 0 + 1 + 0 + 0 = 2, matching the answer of 2.
With this lemma, we can initially compute the minimum number of distinct sorted subsequences, result, in O(N) time (by iterating through times_of, not by iterating from min to max). The following is a sample code:
std::size_t result = 0;
auto prev = std::make_pair(min - 1, static_cast<std::size_t>(0));
for (auto &cur : times_of)
{
    // times_of[cur.first - 1] == 0
    if (cur.first != prev.first + 1) result += cur.second;
    // times_of[cur.first - 1] == times_of[prev.first]
    else if (cur.second > prev.second) result += cur.second - prev.second;
    prev = cur;
}
Queries
To deal with a query A[u] = v, we first update times_of[A[u]] and times_of[v], which costs O(log N) time. Then, according to the lemma, we only need to recompute a constant number (at most 4) of affected terms to update result. Each recomputation costs O(log N) time (to find the previous or next element in times_of), so a query takes O(log N) time in total.
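A rough sketch of the query step in Python, using a dict in place of std::map (so lookups are average O(1) rather than O(log N)); the helper names are mine:

def term(times_of, n):
    # contribution of value n to the lemma's sum
    return max(0, times_of.get(n, 0) - times_of.get(n - 1, 0))

def apply_query(A, times_of, result, u, v):
    old = A[u]
    affected = {old, old + 1, v, v + 1}    # the only terms the update can change
    for n in affected:
        result -= term(times_of, n)
    times_of[old] -= 1
    if times_of[old] == 0:
        del times_of[old]
    times_of[v] = times_of.get(v, 0) + 1
    A[u] = v
    for n in affected:
        result += term(times_of, n)
    return result                          # the new minimum number of parts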
Keep a list of clusters on the first pass. Each cluster holds a collection of values, with a minimum and maximum value. These clusters could very well be stored in a segment tree (making it easy to merge them in case they ever touch).
Loop through your N numbers and, for each number, either add it to an existing cluster (possibly triggering a merge) or create a new cluster. This may be easier if your clusters store min-1 and max+1.
Now you are done with the initial input of N numbers, and you have several clusters, all of which are likely to be of a reasonable size for radix sort.
However, you do not want to finish the radix sort. Generate the list of counts of each value, then loop through it: every time the count decreases, you have found (difference) many of your final distinct sorted subsequences, as sketched below. (Using max+1 pays off again, because the guaranteed zero at the end means you don't have to add a special case after the loop.)
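A tiny sketch of that final counting step (Python), assuming counts lists the occurrences of each consecutive value in one cluster, with the trailing zero for max+1 already appended:

def subsequences_in_cluster(counts):
    total = 0
    for prev, cur in zip(counts, counts[1:]):
        if cur < prev:                 # count decreased: prev - cur subsequences end here
            total += prev - cur
    return total

For counts = [1, 2, 1, 0] this returns 2, consistent with the lemma in the answer above.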

Cheapest way to move through an array

I have an n x n array. Each field has a cost associated with it (a natural number), and here is my problem:
I start in the first column. I need to find the cheapest way to move through the array (from any field in the first column to any field in the last column) following these two rules:
I can only move to the right, to the top right, to the lower right, and to the bottom.
In a path I can make only k (some constant) moves to the bottom.
Meaning, when I'm at cell x I can move to these cells o:
How do I find the cheapest way to move through the array? I thought of this:
- For each field of the n x n array I keep, in a helper array, the number of bottom moves it takes to reach that field on the cheapest path. For the first column it's all 0s.
- We go through the fields in this order: columns left to right, and within a column, rows top to bottom.
- For each field we check which of its neighbours is the cheapest. If it's the upper one (meaning we reach this field by a bottom move), we check whether it took k bottom moves to reach that neighbour; if not, we set the cost of the analyzed field to the cost of reaching the upper neighbour plus the field's own cost, and in the helper array we record the number of bottom moves as x+1, where x is the number of bottom moves taken to reach the upper neighbour.
- If the upper neighbour is not the cheapest, we take the cost of the cheapest other neighbour and copy its number of bottom moves.
Time complexity is O(n^2), and so is the memory.
Is this correct?
Here is a DP solution in O(N^2) time and O(N) memory:
Dist(i,j) = cheapest cost of a path from cell (i,j) to the last column.
Dist(i,j) = cost(i,j) + min { Dist(i+1,j), Dist(i,j+1), Dist(i+1,j+1), Dist(i-1,j+1) }
Dist(i,N) = cost[i][N]
Cheapest = min { Dist(i,1) }, i in (1,M)
This DP equation shows that you only need the values of the next column (plus the already-computed lower cells of the current column) to compute the current column, so O(N) space suffices for the previous calculation. It also means that, within a column, cells with a higher row index need to be evaluated first.
Pseudo Code :-
int Prev[N], Curr[N];

// last column calculation => base case for DP
for (i = 0; i < M; i++)
    Prev[i] = cost[i][N-1];

// evaluate the columns right to left, and the rows bottom to top
for (j = N-2; j >= 0; j--) {
    for (i = M-1; i >= 0; i--) {
        // treat out-of-range neighbours as +infinity
        Curr[i] = cost[i][j] + min( Curr[i+1], Prev[i], Prev[i-1], Prev[i+1] );
    }
    Prev = Curr;
}

// find the row with the cheapest cost
Cheapest = min(Prev)
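A runnable version of this pseudocode in Python; it implements the recurrence exactly as stated above, without tracking the k bottom-moves limit:

import math

def cheapest_path(cost):
    rows, cols = len(cost), len(cost[0])
    prev = [cost[i][cols - 1] for i in range(rows)]          # base case: last column
    for j in range(cols - 2, -1, -1):                        # columns right to left
        curr = [math.inf] * rows
        for i in range(rows - 1, -1, -1):                    # bottom to top, so curr[i+1] is ready
            best = prev[i]                                   # move right
            if i + 1 < rows:
                best = min(best, curr[i + 1], prev[i + 1])   # move down / lower right
            if i - 1 >= 0:
                best = min(best, prev[i - 1])                # move upper right
            curr[i] = cost[i][j] + best
        prev = curr
    return min(prev)                                         # cheapest entry in the first column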

Alternative to a sorting algorithm: less than O(n log n)

I receive as input a list of candidates for a job. The list is already sorted by each candidate's salary requirement, and each candidate also has a grade from university (this parameter is not sorted). Example:
danny 13000$ 75.56
dan 9000$ 100
bob 5000$ 98
In such a list I need to find the two candidates with the highest grades such that the sum of their salaries is not more than 10000$. (I can assume that no two candidates have the same grade and there are no two pairs of candidates with the same sum of grades, e.g. 94+90 = 91+93.)
I need to find them in complexity O(n).
I understand I cannot use a comparison sort (the minimum is n*log(n)), so how can I do that?
Is it possible?
O(n) solution (assuming the number 10,000 is fixed):
arr <- new empty array of size 10000
for each i from 1 to 10000:
arr[i] = "most valuable" worker with salary up to i
best <- -infinity
for each i from 1 to 5000:
best = max{ best, arr[i] + arr[10000-i]}
return best
The idea is to hold an array where each entry holds the "best" candidate who asked for at most that salary (this is done in the first loop).
In the second loop, you go over all feasible splits of the budget and find the max.
Note that the naive approach for the first loop is O(n) with terrible constants (each iteration is O(n), and it is done 10,000 times). However, you could build arr using DP, going from the lowest salary up and doing something like:
arr[i] = max {arr[i-1], best worker with salary i}
The DP solution is O(n) with only one traversal over the array (or formally O(W+n), where W is the max salary).
Also note: this is a special case of the knapsack problem, with the ability to choose only 2 elements.
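A sketch of this approach in Python. It keeps the two best (grade, index) entries per salary budget so a candidate is never paired with themselves, a detail the outline above glosses over; the names and the top-2 trick are mine.

def merge_top2(a, b):
    # keep the two best (grade, idx) entries with distinct candidate indices
    out = []
    for g, i in sorted(a + b, reverse=True):
        if all(i != j for _, j in out):
            out.append((g, i))
        if len(out) == 2:
            break
    return out

def best_pair(candidates, budget=10000):
    # candidates: list of (salary, grade); returns the best sum of grades of two
    # distinct candidates whose salaries total at most budget, or None
    top = [[] for _ in range(budget + 1)]      # top[i]: best candidates with salary exactly i
    for idx, (salary, grade) in enumerate(candidates):
        if salary <= budget:
            top[salary] = merge_top2(top[salary], [(grade, idx)])
    for i in range(1, budget + 1):             # DP: now best candidates with salary up to i
        top[i] = merge_top2(top[i], top[i - 1])
    best = None
    for i in range(1, budget // 2 + 1):
        for gl, il in top[i]:
            for gr, ir in top[budget - i]:
                if il != ir and (best is None or gl + gr > best):
                    best = gl + gr
    return best

With made-up data, best_pair([(5000, 98), (4000, 91), (9000, 100)]) returns 189 (98 + 91), since the only pair within the 10000$ budget is the first two candidates.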
Since the list is sorted by salary, start with two pointers: i pointing to the start (lowest salary) and j pointing to the highest salary. While the sum of the salaries is over the limit, decrease j. Then increase i and decrease j, staying below the limit, while tracking the maximum grade seen for i and for j.

How to find sum of elements from given index interval (i, j) in constant time?

Given an array, how can we find the sum of the elements in an index interval (i, j) in constant time? You are allowed to use extra space.
Example:
A: 3 2 4 7 1 -2 8 0 -4 2 1 5 6 -1
length = 14
int getsum(int* arr, int i, int j, int len);
// suppose int array "arr" is initialized here
int sum = getsum(arr, 2, 5, 14);
sum should be 10 in constant time.
If you can spend O(n) time to "prepare" auxiliary information, based on which you would be able to calculate sums in O(1), you can easily do it.
Preparation (O(n)):
aux[0] = 0;
foreach i in (1..LENGTH) {
aux[i] = aux[i-1] + arr[i];
}
Query (O(1)), arr is numerated from 1 to LENGTH:
sum(i,j) = aux[j] - aux[i-1];
I think this was the intent because, otherwise, it's impossible: to calculate sum(0, length-1) for an arbitrary length you would have to scan the whole array, which takes at least linear time.
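A 0-indexed version of the same idea in Python, reproducing the example from the question:

A = [3, 2, 4, 7, 1, -2, 8, 0, -4, 2, 1, 5, 6, -1]

# O(n) preparation: prefix[k] = sum of A[0 .. k-1]
prefix = [0] * (len(A) + 1)
for k, x in enumerate(A):
    prefix[k + 1] = prefix[k] + x

# O(1) query: sum of A[i .. j] inclusive
def getsum(i, j):
    return prefix[j + 1] - prefix[i]

print(getsum(2, 5))   # 4 + 7 + 1 - 2 = 10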
It cannot be done in constant time unless you store the information.
You would have to do something like specially modify the array to store, for each index, the sum of all values between the start of the array and this index, then using subtraction on the range to get the difference in sums.
However, nothing in your code sample seems to allow this. The array is created by the user (and can change at any time) and you have no control over it.
Any algorithm that needs to scan a group of elements in a sequential unsorted list will be O(n).
The previous answers are absolutely fine for the question asked. I am just adding a point, in case the question is changed a bit to:
Find the sum of the interval if the array is modified dynamically.
If array elements can change, then we have to recompute the sums stored in the auxiliary array of Pavel Shved's approach.
Recomputing is an O(n) operation, so instead we bring the cost of each update and query down to O(log n) (O(N log N) over all queries) by making use of a Segment Tree.
http://www.geeksforgeeks.org/segment-tree-set-1-sum-of-given-range/
There are three well-known algorithms for range-based queries given [l, r]:
1. Segment tree: total query time O(N log N)
2. Fenwick tree (binary indexed tree): total query time O(N log N)
3. Mo's algorithm (square root decomposition)
The first two algorithms can deal with modifications of the list/array given to you. The third, Mo's algorithm, is an offline algorithm, meaning all the queries need to be given to you beforehand; modifications of the list/array are not allowed. For an implementation, runtime analysis and further reading on this algorithm you can check out this Medium blog. It explains it with code. Very few people actually know about this method.
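For completeness, a minimal Fenwick tree sketch in Python for the dynamic case (point updates and range-sum queries, both O(log n)); the class name is mine:

class Fenwick:
    def __init__(self, values):
        self.n = len(values)
        self.tree = [0] * (self.n + 1)
        for i, v in enumerate(values):
            self.add(i, v)

    def add(self, i, delta):          # arr[i] += delta
        i += 1
        while i <= self.n:
            self.tree[i] += delta
            i += i & -i

    def prefix(self, i):              # sum of arr[0 .. i-1]
        s = 0
        while i > 0:
            s += self.tree[i]
            i -= i & -i
        return s

    def range_sum(self, i, j):        # sum of arr[i .. j] inclusive
        return self.prefix(j + 1) - self.prefix(i)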
This question can be solved in O(n^2) time with O(n) space, or in O(n) time with O(n) space.
The best solution in this case is O(n) time, O(n) space.
Suppose a[] = {1, 3, 5, 2, 6, 4, 9} is given.
If we create an array sum[] in which we keep, at each index, the sum of the elements from index 0 up to that index, then for a[] the sum array will be sum[] = {1, 4, 9, 11, 17, 21, 30}, i.e. {1, 3+1, 3+1+5, ...}. This takes O(n) time and O(n) space.
When we are given the indexes, we fetch the result directly from the sum array: sum(i, j) = sum[j] - sum[i-1]. This takes O(1) time and O(1) space.
So, this program takes O(n) time and O(n) space.
int sum[] = new int[l];          // l is the length of a[]
sum[0] = a[0];
System.out.print(sum[0] + " ");
for (int i = 1; i < l; i++)
{
    sum[i] = sum[i-1] + a[i];
    System.out.print(sum[i] + " ");
}
// this prints 1 4 9 11 17 21 30 and takes O(n) time and O(n) space
// sum(i, j) = sum[j] - sum[i-1] gives the sum of the indexes from i to j in O(1) time
