scheduling n people with given time of travel - algorithm

this is a puzzle but i think it could be a classical algorithm which i am unaware of :
There are n people at the bottom of a mountain, and everyone wants to go up, then down the mountain. Person i takes u[i] time to climb this mountain, and d[i] time to descend it.
However, at same given time atmost 1 person can climb , and .atmost 1 person can descend the mountain. Find the least time to travel up and back down the mountain.
Update 1 :
well i tried with few examples and found that it's not reducible to sorting , or getting the fastest climbers first or vice versa . I think to get optimal solution we may have to try out all possible solutions , so seems to be NP complete.
My initial guess: (WRONG)
The solution i thought is greedy : sort n people by start time in ascending order. Then up jth person up and kth down where u[j]<= d[k] and d[k] is minimum from all k persons on top of mountain. I am not able to prove correctness of this .
Any other idea how to approach ?
A hint would suffice.

Try to think in the following manner: if the people are not sorted in ascending order of time it takes them to climb the mountain than what happens if you find a pair of adjacent people that are not in the correct order(i.e. first one climbs longer than second one) and swap them. Is it possible that the total time increases?

I think it is incorrect. Consider
u = [2,3]
d = [1,3]
Your algorithm gives ordering 0,1 whereas it should be 1,0.
I would suggest another greedy approach:
Create ordering list and add first person.
For current ordering keep track of two values:
mU - time of last person on the mountain - time of the end
mD - time of earliest time of first descending
From people who are not ordered choose the one which minimises abs(mD - d) and abs(mU - u). Then if abs(mD - d) < abs(mU - u) he should go at the beginning of ordering. Otherwise he goes at the end.
Some tweak may still be needed here, but this approach should minimise losses from cases like the one given in the example.

The following solution will only work with n <= 24.
This solution will require dynamic programming and bit-mask technique knowledge to be understood.
Observation: we can easily observe that the optimal total climb up time is fixed, which is equalled to the total climb up time of n people.
For the base case, if n = 1, the solution is obvious.
For n = 2, the solution is simple, just scan through all 4 possibilities and calculate the minimum down time.
For n = 3, we can see that this case will be equal to the case when one person climb up first, followed by two.
And the two person minimum down time can be easily pre-calculated. More important, this two person then can be treated as one person with up time is the total up time of the two, and down time is the minimum down time.
Storing all result for minimum down time for cases from n = 0 to n = 3 in array called 'dp', using bit-mask technique, we represent the state for 3 person as index 3 = 111b, so the result for case n = 3 will be:
for(int i = 0; i < 3; i++){
dp[3] = min(dp[(1<<i)] + dp[3^(1<<i)],dp[3]);
}
For n = 4... 24, the solution will be similar to case n = 3.
Note: The actual formula is not just simple as the code for case n = 3(and it requires similar approach to solve as case n = 2), but will be very similar,

Your approach looks sensible, but it may be over-simplified, could you describe it more precisely here?
From your description, I can't make out whether you are sorting or something else; these are the heuristics that I figured you are using:
Get the fastest climbers first, so the start using the Down path
asap.
Ensure there is always people at the top of the mountain, so
when the Down path becomes available, a person starts descending
immediately.The way you do that is to select first those people who
climb fast and descend slowly.
What if the fastest climber is also the fastest descender? That would leave the Down path idle until the second climber gets to the top, how does your algorithm ensures that this the best order?. I'm not sure that the problem reduces to a Sorting problem, it looks more like a knapsack or scheduling type.

Related

Find minimum steps to convert all elements to zero

You are given an array of positive integers of size N. You can choose any positive number x such that x<=max(Array) and subtract it from all elements of the array greater than and equal to x.
This operation has a cost A[i]-x for A[i]>=x. The total cost for a particular step is the
sum(A[i]-x). A step is only valid if the sum(A[i]-x) is less than or equal to a given number K.
For all the valid steps find the minimum number of steps to make all elements of the array zero.
0<=i<10^5
0<=x<=10^5
0<k<10^5
Can anybody help me with any approach? DP will not work due to high constraints.
Just some general exploratory thoughts.
First, there should be a constraint on N. If N is 3, this is much easier than if it is 100. The naive brute force approach is going to be O(k^N)
Next, you are right that DP will not work with these constraints.
For a greedy approach, I would want to minimize the number of distinct non-zero values, and not maximize how much I took. Our worst case approach is take out the largest each time, for N steps. If you can get 2 pairs of entries to both match, then that shortened our approach.
The obvious thing to try if you can is an A* search. However that requires a LOWER bound (not upper). The best naive lower bound that I can see is ceil(log_2(count_distinct_values)). Unless you're incredibly lucky and the problem can be solved that quickly, this is unlikely to narrow your search enough to be helpful.
I'm curious what trick makes this problem actually doable.
I do have an idea. But it is going to take some thought to make it work. Naively we want to take each choice for x and explore the paths that way. And this is a problem because there are 10^5 choices for x. After 2 choices we have a problem, and after 3 we are definitely not going to be able to do it.
BUT instead consider the possible orders of the array elements (with ties both possible and encouraged) and the resulting inequalities on the range of choices that could have been made. And now instead of having to store a 10^5 choices of x we only need store the distinct orderings we get, and what inequalities there are on the range of choices that get us there. As long as N < 10, the number of weak orderings is something that we can deal with if we're clever.
It would take a bunch of work to flesh out this idea though.
I may be totally wrong, and if so, please tell me and I'm going to delete my thoughts: maybe there is an opportunity if we translate the problem into another form?
You are given an array A of positive integers of size N.
Calculate the histogram H of this array.
The highest populated slot of this histogram has index m ( == max(A)).
Find the shortest sequence of selections of x for:
Select an index x <= m which satisfies sum(H[i]*(i-x)) <= K for i = x+1 .. m (search for suitable x starts from m down)
Add H[x .. m] to H[0 .. m-x]
Set the new m as the highest populated index in H[0 .. x-1] (we ignore everything from H[x] up)
Repeat until m == 0
If there is only a "good" but not optimal solution sought for, I could imagine that some kind of spectral analysis of H could hint towards favorable x selections so that maxima in the histogram pile upon other maxima in the reduction step.

Dyanamic Programming - Coin Change Problem

I am solving the following problem from hackerrank
https://www.hackerrank.com/challenges/coin-change/problem
I 'm unable to solve the problem , so I have looked at the editorial and they mentioned
T(i, m) = T(i, m-i)+T(i+1, m)
I'm unable to get big picture of why this solution works on a higher level. (like a proof in CLRS or simple understandable example)
Solution which I have written is as follows
fun(m){
//base cases
count = 0;
for(i..n){
count+= fun(m-i);
}
}
My solution didn't work because there are some duplicates calls. But how editorial works and what is the difference between my solution and editorial on a higher level..
I think in order for this to work you have to clearly define what T is. Namely, let's define T(i,m) to be the number of ways to make change for m units using only coins with index at least i (i.e. we only look at the ith coin, the (i+1)th coin, all the way to the nth coin while neglecting the first i-1 coins). Further, we define an array C such that C[i] is the value of the ith coin (note that in general C[i] is not the same as i). As a result, if there are n coins (i.e. length of C is n) and we want to make change for W units, we are looking for the value T(0, W) as our answer (make sure you can see why this is the case at this point!).
Now, we proceed by constructing a recursive definition of T(i,m). Note that our solution will either contain an additional ith coin or it won't. In the case that it does, our new target will simply be m - C[i] and the number of ways to make change for this is T(i,m - C[i]) (since our new target is now C[i] less than m). In another case, our solution doesn't contain the ith coin. In this case, we keep the target value the same, but only consider coins with index greater than i. Namely, the number of ways to make change in this case is T(i+1,m). Since these cases are disjoint and exhaustive (either you put the ith coin in the solution or you don't!), we have that
T(i,m) = T(i, m-C[i]) + T(i+1,m)
which is very similar to what you had (the C[i] difference is important). Note that if m <= 0 (since we are assuming that coin values are positive), there are 0 ways to make change. You must keep these base cases in mind when computing T(i,m).
Now it remains to compute T(0, W), which you can easily do recursively. However, you likely noticed that a lot of the subproblems are repeated making this a slow solution. The solution is to use something called dynamic programming or memoization. Namely, whenever a solution is computed, add its value to a table (e.g. T[i,m] where T is a n x W size 2D array). Then whenever you recursively compute something check the table first so you don't compute the same thing twice. This is called memoization. Dynamic programming is simple except you use a little foresight to compute things in the order in which they will be needed. For example, I would compute the base cases first i.e. the column T[ . , 0]. And then I would compute all values bordering this row and column based on the recursive definition.

Best approach to a variation of a bucketing problem

Find the most appropriate team compositions for days in which it is possible. A set of n participants, k days, a team has m slots. A participant specifies how many days he wants to be a part of and which days he is available.
Result constraints:
Participants must not be participating in more days than they want
Participants must not be scheduled in days they are not available in.
Algorithm should do its best to include as many unique participants as possible.
A day will not be scheduled if less than m participants are available for that day.
I find myself solving this problem manually every week at work for my football team scheduling and I'm sure there is a smart programmatic approach to solve it. Currently, we consider only 2 days per week and colleagues write down their name for which day they wanna participate, and it ends up having big lists for each day and impossible to please everyone.
I considered a new approach in which each colleague writes down his name, desired times per week to play and which days he is available, an example below:
Kane 3 1 2 3 4 5
The above line means that Kane wants to play 3 times this week and he is available Monday through Friday. First number represents days to play, next numbers represent available days(1 to 7, MOnday to Sunday).
Days with less than m (in my case, m = 12) participants are not gonna be scheduled. What would be the best way to approach this problem in order to find a solution that does its best to include each participant at least once and also considers their desires(when to play, how much to play).
I can do programming, I just need to know what kind of algorithm to implement and maybe have a brief logical explanation for the choice.
Result constraints:
Participants must not play more than they want
Participants must not be scheduled in days they don't want to play
Algorithm should do its best to include as many participants as possible.
A day will not be scheduled if less than m participants are available for that day.
Scheduling problems can get pretty gnarly, but yours isn't too bad actually. (Well, at least until you put out the first automated schedule and people complain about it and you start adding side constraints.)
The fact that a day can have a match or not creates the kind of non-convexity that makes these problems hard, but if k is small (e.g., k = 7), it's easy enough to brute force through all of the 2k possibilities for which days have a match. For the rest of this answer, assume we know.
Figuring out how to assign people to specific matches can be formulated as a min-cost circulation problem. I'm going to write it as an integer program because it's easier to understand in my opinion, and once you add side constraints you'll likely be reaching for an integer program solver anyway.
Let P be the set of people and M be the set of matches. For p in P and m in M let p ~ m if p is willing to play in m. Let U(p) be the upper bound on the number of matches for p. Let D be the number of people demanded by each match.
For each p ~ m, let x(p, m) be a 0-1 variable that is 1 if p plays in m and 0 if p does not play in m. For all p in P, let y(p) be a 0-1 variable (intuitively 1 if p plays in at least one match and 0 if p plays in no matches, but hold on a sec). We have constraints
# player doesn't play in too many matches
for all p in P, sum_{m in M | p ~ m} x(p, m) ≤ U(p)
# match has the right number of players
for all m in M, sum_{p in P | p ~ m} x(p, m) = D
# y(p) = 1 only if p plays in at least one match
for all p in P, y(p) ≤ sum_{m in M | p ~ m} x(p, m)
The objective is to maximize
sum_{p in P} y(p)
Note that we never actually force y(p) to be 1 if player p plays in at least one match. The maximization objective takes care of that for us.
You can write code to programmatically formulate and solve a given instance as a mixed-integer program (MIP) like this. With a MIP formulation, the sky's the limit for side constraints, e.g., avoid playing certain people on consecutive days, biasing the result to award at least two matches to as many people as possible given that as many people as possible got their first, etc., etc.
I have an idea if you need a basic solution that you can optimize and refine by small steps. I am talking about Flow Networks. Most of those that already know what they are are probably turning their nose because flow network are usually used to solve maximization problem, not optimization problem. And they are right in a sense, but I think it can be initially seen as maximizing the amount of player for each day that play. No need to say it is a kind of greedy approach if we stop here.
No more introduction, the purpose is to find the maximum flow inside this graph:
Each player has a number of days in which he wants to play, represented as the capacity of each edge from the Source to node player x. Each player node has as many edges from player x to day_of_week as the capacity previously found. Each of this 2nd level edges has a capacity of 1. The third level is filled by the edges that link day_of_week to the sink node. Quick example: player 2 is available 2 days: monday and tuesday, both have a limit of player, which is 12.
Until now 1st, 2nd and 4th constraints are satisfied (well, it was the easy part too): after you found the maximum flow of the entire graph you only select those path that does not have any residual capacity both on 2nd level (from players to day_of_weeks) and 3rd level (from day_of_weeks to the sink). It is easy to prove that with this level of "optimization" and under certain conditions, it is possible that it will not find any acceptable path even though it would have found one if it had made different choices while visiting the graph.
This part is the optimization problem that i meant before. I came up with at least two heuristic improvements:
While you visit the graph, store day_of_weeks in a priority queue where days with more players assigned have a higher priority too. In this way the amount of residual capacity of the entire graph is certainly less evenly distributed.
randomness is your friend. You are not obliged to run this algorithm only once, and every time you run it you should pick a random edge from a node in the player's level. At the end you average the results and choose the most common outcome. This is an situation where the majority rule perfectly applies.
Better to specify that everything above is just a starting point: the purpose of heuristic is to find the best approximated solution possible. With this type of problem and given your probably small input, this is not the right way but it is the easiest one when you do not know where to start.

Solving problems with formulas (GCD and LCM)

I'm having a hard time figuring out formulas that are used to solve a given problem more efficiently.
For example, a problem I have encountered was the following:
n children are placed in a circle. Every kth child is given chocolate until a child that has already been given chocolate is
selected again. Determine the nr number of children that don't
receive chocolate, given n and k.
Ex: n = 12, k = 9; nr will be 8.
This problem can be solved in 2 ways:
Creating a boolean array and traversing it until a child that hasn't been given chocolate is selected (not really efficient);
Using the formula: n - n / GCD(n, k);
How would I go about figuring out the 2nd way of solving it (the formula)?
Also, where can I practice this specific type of problem, where there is an obvious, slow way of solving it or an efficient one requiring you to figure out a formula?
Every problem is different, there is no rule to find a solution. You need to analyse the situation and reason about it. Mathematical training helps a lot.
For this concrete example you can proceed like this: number the children from 0 to n-1. If you start at 0, the children getting chocolate are exactly the ones with a number divisible by GCD(n, k). How many are there: n / GDC(n, k) therefore, how many don't get chocolate: n -n / GDC(n, k).

Finding the minimal coverage of an interval with subintervals

Suppose I have an interval (a,b), and a number of subintervals {(ai,bi)}i whose union is all of (a,b). Is there an efficient way to choose a minimal-cardinality subset of these subintervals which still covers (a,b)?
A greedy algorithm starting at a or b always gives the optimal solution.
Proof: consider the set Sa of all the subintervals covering a. Clearly, one of them has to belong to the optimal solution. If we replace it with a subinterval (amax,bmax) from Sa whose right endpoint bmax is maximal in Sa (reaches furthest to the right), the remaining uncovered interval (bmax,b) will be a subset of the remaining interval from the optimal solution, so it can be covered with no more subintervals than the analogous uncovered interval from the optimal solution. Therefore, a solution constructed from (amax,bmax) and the optimal solution for the remaining interval (bmax,b) will also be optimal.
So, just start at a and iteratively pick the interval reaching furthest right (and covering the end of previous interval), repeat until you hit b. I believe that picking the next interval can be done in log(n) if you store the intervals in an augmented interval tree.
Sounds like dynamic programming.
Here's an illustration of the algorithm (assume intervals are in a list sorted by ending time):
//works backwards from the end
int minCard(int current, int must_end_after)
{
if (current < 0)
if (must_end_after == 0)
return 0; //no more intervals needed
else
return infinity; //doesn't cover (a,b)
if (intervals[current].end < must_end_after)
return infinity; //doesn't cover (a,b)
return min( 1 + minCard(current - 1, intervals[current].start),
minCard(current - 1, must_end_after) );
//include current interval or not?
}
But it should also involve caching (memoisation).
There are two cases to consider:
Case 1: There are no over-lapping intervals after the finish time of an interval. In this case, pick the next interval with the smallest starting time and the longest finishing time. (amin, bmax).
Case 2: There are 1 or more intervals overlapping with the last interval you're looking at. In this case, the start time doesn't matter because you've already covered that. So optimize for the finishing time. (a, bmax).
Case 1 always picks the first interval as the first interval in the optimal set as well (the proof is the same as what #RafalDowgrid provided).
You mean so that the subintervals still overlap in such a way that (a,b) remains completely covered at all points?
Maybe splitting up the subintervals themselves into basic blocks associated with where they came from, so you can list options for each basic block interval accounting for other regions covered by the subinterval also. Then you can use a search based on each sub-subinterval and at least be sure no gaps are left.
Then would need to search.. efficiently.. that would be harder.
Could eliminate any collection of intervals that are entirely covered by another set of smaller number and work the problem after the preprocessing.
Wouldn't the minimal for the whole be minimal for at least one half? I'm not sure.
Found a link to a journal but couldn't read it. :(
This would be a hitting set problem and be NP_hard in general.
Couldn't read this either but looks like opposite kind of problem.
Couldn't read it but another link that mentions splitting intervals up.
Here is an available reference on Randomized Algorithms for GeometricOptimization Problems.
Page 35 of this pdf has a greedy algorithm.
Page 11 of Karp (1972) mentions hitting-set and is cited alot.
Google result. Researching was fun but I have to go now.

Resources