Interval schedule maximization when encountering intervals with equivalent end points - algorithm

In interval scheduling, the greedy solution that selects the largest set of mutually compatible intervals starts by sorting the intervals in ascending order of end time.
What confuses me is the case where two or more intervals have the same end time.
When performing the initial sort, should the intervals that share an end time be ordered by start time, and if so, in ascending or descending order?

It doesn't matter with the greedy solution. All you want is to maximize the number of jobs completed. Job length is only used to detect overlaps and eliminate conflicting jobs, not to choose between them.
Given a list of jobs that end at the same time, you will end up choosing only one of them, because they all overlap. There is also no negative impact from choosing any particular one: whichever you pick, the next job still has to start after the same end time.
Depending on application, you may want to choose the longest job or shortest job.
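To illustrate the point about ties, here is a minimal Python sketch of the earliest-end-time greedy with a pluggable tie-breaker. The function name `max_compatible`, the `tie_key` parameter and the sample jobs are made up for this illustration. It assumes every interval has positive length and that an interval may start exactly when the previous one ends.

```python
def max_compatible(intervals, tie_key=None):
    """Earliest-end-time greedy over (start, end) pairs.

    tie_key (hypothetical parameter) only decides the order of
    intervals that share the same end time.
    """
    if tie_key is None:
        order = sorted(intervals, key=lambda iv: iv[1])
    else:
        order = sorted(intervals, key=lambda iv: (iv[1], tie_key(iv)))

    chosen = []
    last_end = float("-inf")
    for start, end in order:
        # Assumption: touching endpoints do not count as a conflict.
        if start >= last_end:
            chosen.append((start, end))
            last_end = end
    return chosen


jobs = [(1, 4), (2, 4), (3, 4), (4, 6), (5, 6), (6, 8)]

# Among equal end times, prefer the latest start (shortest job) ...
shortest_first = max_compatible(jobs, tie_key=lambda iv: -iv[0])
# ... or the earliest start (longest job).
longest_first = max_compatible(jobs, tie_key=lambda iv: iv[0])

print(shortest_first)
print(longest_first)
```

Both orderings pick different jobs but the same number of them, which is all the greedy argument needs.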
Hope it helps!

Related

Interval scheduling where the start and end times are intervals themselves

So the classic interval scheduling problem is: given a bunch of intervals [a_i, b_i], where a_i is the starting time of the interval and b_i is the ending time, find the largest set of non-overlapping intervals. This problem is easy; one can use the greedy algorithm, and that makes sense.
However, what if the starting time of an interval is itself an interval, and so is the ending time? Essentially you have a list of intervals whose endpoints are intervals themselves. The motivation is that often, when one does a "task", one can start the task at any point in some interval of time and then end it at any point in some other interval of time. How would one approach or modify the interval scheduling problem to solve something of this nature?
Another way to look at this problem is that you have a bunch of "interval pairs: [a_i1, b_i1], [a_i2, b_i2]" and you want to apply the interval scheduling algorithm to these interval pairs. Same question, but perhaps a better way of looking at the problem. Can someone provide some help?

Mutually Overlapping Subset of Activities

I am prepping for a final and this was a practice problem. It is not a homework problem.
How do I go about attacking this? Also, more generally, how do I know when to use Greedy vs. Dynamic programming? Intuitively, I think this is a good place to use greedy. I'm also thinking that if I could somehow create an orthogonal line and "sweep" it, checking the #of intersections at each point and updating a global max, then I could just return the max at the end of the sweep. I'm not sure how to plane sweep algorithmically though.
a. We are given a set of activities I1 ... In: each activity Ii is represented by its left-point Li and its right-point Ri. Design a very efficient algorithm that finds the maximum number of mutually overlapping subset of activities (write your solution in English, bullet by bullet).
b. Analyze the time complexity of your algorithm.
Proposed solution:
Ex set: {(0,2) (3,7) (4,6) (7,8) (1,5)}
Max is 3 from interval 4-5
1) Split start and end points into two separate arrays and sort them in non-decreasing order
Start points: [0,1,3,4,7] (SP)
End points: [2,5,6,7,8] (EP)
I know that I can use two pointers to sort of simulate the plane sweep, but I'm not exactly sure how. I'm stuck here.
I'd say your idea of a sweep is good.
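You don't need to worry about planar sweeping, just use the start/end points. Put the sorted start points in one queue and the sorted end points in another. In every step, take the smaller of the two elements at the front of the queues. If it's a start point, increment the current task count, otherwise decrement it. The largest value the count ever reaches is the answer.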
Since you don't need to point which tasks are overlapping - just the count of them - you don't need to worry about specific tasks duration.
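The two-pointer sweep described above could look roughly like this in Python. The function name `max_overlap` is made up for the sketch. It assumes every activity has positive length (left point strictly less than right point) and that two activities which only touch at an endpoint do not count as overlapping; with closed intervals the comparison would be `<=` instead.

```python
def max_overlap(activities):
    """Return the largest number of activities active at the same time.

    activities: iterable of (left, right) pairs with left < right.
    """
    starts = sorted(left for left, _ in activities)
    ends = sorted(right for _, right in activities)

    i = j = 0              # front of the start queue / end queue
    current = best = 0
    while i < len(starts):
        if starts[i] < ends[j]:
            # Next event is a start: one more activity is running.
            current += 1
            best = max(best, current)
            i += 1
        else:
            # Next event is an end (ends win ties): one activity finished.
            current -= 1
            j += 1
    return best


example = [(0, 2), (3, 7), (4, 6), (7, 8), (1, 5)]
print(max_overlap(example))  # 3
```

Sorting dominates the running time, so the whole thing is O(n log n); the sweep itself is a single linear pass over the two sorted arrays.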
Regarding your greedy vs. DP question: in my non-professional opinion, a greedy algorithm does not always give an optimal answer, whereas DP only works for problems that divide well into smaller subproblems. In this case, I wouldn't call your sweep solution either.

greedy algorithm, scheduling

I am trying to understand how the greedy algorithm for the scheduling problem works.
I've been reading and googling for a while, but I still don't understand it.
We have n jobs to schedule on a single resource. The job (i) has a requested start time s(i) and finish time f(i).
There are some greedy ideas which we select...
Accept in increasing order of s ("earliest start time")
Accept in increasing order of f - s ("shortest job time")
Accept in increasing order of number of conflicts ("fewest conflicts")
Accept in increasing order of f ("earliest finish time")
And the book says the last one, accepting in increasing order of f, will always give an optimal solution.
However, it did not explain why it always gives an optimal solution and why the other three do not.
It provided a figure showing why the other three do not give an optimal solution, but I could not understand what it means.
Since I have low reputation, I can not post any image so I will try to draw it.
 |---| |---| |---|
|-------------------------|
increasing order of s
underestimated solution
|-----------| |-----------|
   |-----|
increasing order of f-s
underestimated solution
|----|  |----| |----|  |----|
 |-----| |-----| |-----|
 |-----|    |-----|
 |-----|    |-----|
increasing order of number of conflicts.
underestimated solution
This is what it looks like and I don't see why this is a counterexample of each scenario.
If anyone can explain why each greedy idea does/ does not work, it will be very helpful.
Thank you.
I think I can explain this.
Let's say we have n jobs with start times s[1..n] and finish times f[1..n]. If we sort them by finish time and greedily accept each job that doesn't conflict with those already accepted, we will always complete the maximum number of tasks. Let's see how.
If a job finishes earlier (even if it started later; a short job, say), then we always have more time left for later jobs. Suppose some other choice of first job let us fit in more tasks. That is not possible: no job finishes before the one with the earliest finish time, so nothing could have been completed before it. And any other job we might pick instead finishes at the same time or later, so it leaves no more room for the remaining jobs than the earliest-finishing one does. We can therefore swap it for the earliest-finishing job without losing any task. So in any case this is the optimal choice, and the same argument applies again to the jobs that start after it.
There may be many solutions with the maximum number of tasks; earliest finish time (EFT) gives one of them, and it always contains the maximum number possible.
I hope I could explain it well.
Since #vish4071 has already explained why selecting earliest finish time will lead to optimal solution, I'll only explain the counterexamples. Task [a,b] starts at a and ends at b. I'll use the counterexamples you have provided.
Earliest start time
Suppose tasks [1,10], [2,3], [4,5], [6,7]. The earliest start time strategy will choose [1,10] and then refuse the other 3, since they all collide with the first one. Yet we can see that [2,3], [4,5], [6,7] is the optimal solution, so earliest start time strategy will not always yield the optimal result.
Shortest execution time
Suppose tasks [1,10], [11,20], [9,12]. This strategy would choose [9,12] and then reject the other two, but optimal solution is [1,10], [11,20]. Therefore, shortest execution time strategy will not always lead to optimal result.
Least amount of collisions
This strategy seems promising, but your example with 11 tasks proves it not to be optimal. Suppose tasks: [1,4], 3x[3,6], [5,8], [7,10], [9,12], 3x[11,14] and [13,16]. [7,10] has only 2 collisions with other tasks, which is fewer than any other task, so it would be selected first by the least amount of collisions strategy. Then [1,4] and [13,16] would be selected (3 collisions each), and all the other tasks rejected because they collide with already selected tasks. That is 3 tasks, however 4 tasks can be selected without collision: [1,4], [5,8], [9,12] and [13,16].
You can also see that the earliest finish time strategy will always choose the optimal solution in these examples. Note that more than one optimal solution can exist with same number of selected tasks. In such case, earliest finish time strategy will always choose one of them.
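The three counterexamples can be checked mechanically. The sketch below is only an illustration: the helper names (`conflicts`, `greedy`, `conflict_count`) are made up, tasks are treated as closed intervals [a, b] so that [1,4] and [3,6] collide while [1,4] and [5,8] do not, and the conflict counts are computed once up front rather than updated after each selection.

```python
def conflicts(a, b):
    # Closed intervals: sharing even a single point counts as a collision.
    return a[0] <= b[1] and b[0] <= a[1]


def greedy(tasks, key):
    """Accept tasks in increasing order of key, skipping any collision."""
    chosen = []
    for task in sorted(tasks, key=key):
        if not any(conflicts(task, c) for c in chosen):
            chosen.append(task)
    return chosen


def conflict_count(tasks):
    # Number of *other* tasks each task collides with (minus 1 for itself).
    return lambda t: sum(conflicts(t, u) for u in tasks) - 1


earliest_start = lambda t: t[0]
shortest = lambda t: t[1] - t[0]
earliest_finish = lambda t: t[1]

ex1 = [(1, 10), (2, 3), (4, 5), (6, 7)]
ex2 = [(1, 10), (11, 20), (9, 12)]
ex3 = ([(1, 4)] + [(3, 6)] * 3 + [(5, 8), (7, 10), (9, 12)]
       + [(11, 14)] * 3 + [(13, 16)])

print(len(greedy(ex1, earliest_start)), len(greedy(ex1, earliest_finish)))        # 1 3
print(len(greedy(ex2, shortest)), len(greedy(ex2, earliest_finish)))              # 1 2
print(len(greedy(ex3, conflict_count(ex3))), len(greedy(ex3, earliest_finish)))   # 3 4
```

In each pair, the first number is what the failing strategy selects and the second is what earliest finish time selects on the same input.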

Optimal scheduling system in terms of lowest waiting time for users and maximum users in waiting intervals

I'm trying to look for an algorithm to optimally schedule events, given a set of timeslots. Each event (a,b) is a meeting between 2 users and each timeslot is a fixed amount of time.
E.g. a possible set of events can be: [(1,2),(1,3),(4,2),(4,3),(3,1)] with 4 possible timeslots. All events have to be scheduled in a certain timeslot; however, the waiting time per user (the time between two of their events) should be minimised and, at the same time, the number of users in a waiting timeslot should be maximised.
Do you know of any possible algorithm or heuristic for this problem?
Greetings
Sounds like a combination of Job Shop Scheduling and Meeting Scheduling with a fairness constraint. Both are NP-complete.
Use a simple greedy Construction Heuristic (such as First Fit Decreasing) with Local Search (such as Tabu Search). For these use cases, Local Search leads to better results than Genetic Algorithms, and is also more scalable (see research competitions for proof).
For the fairness constraint "waiting time per user should be minimised", penalize the waiting time squared:
You could get a maybe-better-than-random solution with a simple approach:
sort each pair with the lower-numbered user first
sort the list on first-user (primary key), second-user (secondary sort key)
schedule meetings in that order, with any independent meetings scheduled in parallel. (Like a CPU instruction scheduler looking ahead for independent instructions. Any given user will still have their meetings in the listed order. You're just finding allowed overlaps here.)
I'm unfortunately not an expert on trying to reduce problems to known NP problems like the travelling salesman problem. It's possible there's a polynomial-time solution to this, but it's not obvious to me. If nobody comes up with one, then read on:
If the list isn't too big, you could brute-force check every permutation. For each permutation, schedule all the meetings (with independent meetings in parallel), then sum the last-first meeting times for every user. That's the score for that permutation. Take the permutation with the lowest score.
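A rough Python sketch of that brute-force idea follows. The names `schedule`, `score` and `best_order` are made up, and the scheduling rule is one possible reading of "independent meetings in parallel": each meeting goes into the first timeslot after both users' previous meetings. The sketch ignores any cap on the number of timeslots.

```python
from itertools import permutations


def schedule(order):
    """Assign each meeting (a, b) the earliest slot after both users'
    previous meetings, so independent meetings share a slot."""
    first_slot, last_slot = {}, {}
    for a, b in order:
        slot = max(last_slot.get(a, -1), last_slot.get(b, -1)) + 1
        for user in (a, b):
            first_slot.setdefault(user, slot)   # remember first meeting
            last_slot[user] = slot              # update latest meeting
    return first_slot, last_slot


def score(order):
    """Sum over users of (last meeting slot - first meeting slot)."""
    first_slot, last_slot = schedule(order)
    return sum(last_slot[u] - first_slot[u] for u in first_slot)


def best_order(events):
    # Factorial time: only feasible for short lists.
    return min(permutations(events), key=score)


events = [(1, 2), (1, 3), (4, 2), (4, 3), (3, 1)]
print(score(events))                 # score of the order as given
print(score(best_order(events)))     # best score over all permutations
```

With 5 events there are only 120 permutations; the factorial growth makes this impractical well before a few dozen events.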
Instead of brute force, you could use a random start point and evolve towards a local minimum. Phylogenetics software like phyml uses this technique to search for maximum-likelihood evolutionary tree, which has a similarly factorial search space.
1) Start with a random permutation and evaluate its score.
2) Make some random changes, then evaluate the score.
3) If it's not an improvement, try another modification until you find one that is (maybe with a mechanism to remember which modifications of the starting permutation you have already tried).
4) Repeat from 2 with this new permutation until you've converged on a local minimum.
5) Repeat from 1 for some other starting guesses, and take the best final result.
If you can efficiently figure out the score change from a swap, that will be a big speedup over re-computing the score for a permutation from scratch.
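The random-restart search above might be sketched like this in Python. This is a simplified illustration, not a tuned implementation: `local_search` and its parameters are made-up names, the "random change" is a swap of two positions, and a fixed step budget stands in for a real convergence test. The `score` argument is any function that maps a list to a number, lower being better; the example uses a toy inversion count purely to keep the block self-contained.

```python
import random


def local_search(items, score, restarts=5, max_steps=1000, seed=0):
    """Random-restart hill climbing over permutations of items.

    Returns (best_score, best_permutation). Needs at least two items.
    """
    rng = random.Random(seed)
    best = None
    for _ in range(restarts):
        current = list(items)
        rng.shuffle(current)                     # random starting guess
        current_score = score(current)
        for _ in range(max_steps):
            i, j = rng.sample(range(len(current)), 2)
            current[i], current[j] = current[j], current[i]
            new_score = score(current)
            if new_score < current_score:
                current_score = new_score        # keep the improvement
            else:
                # Not an improvement: undo the swap and try another one.
                current[i], current[j] = current[j], current[i]
        if best is None or current_score < best[0]:
            best = (current_score, list(current))
    return best


def inversions(seq):
    # Toy score: number of out-of-order pairs (0 for a sorted list).
    return sum(seq[i] > seq[j]
               for i in range(len(seq)) for j in range(i + 1, len(seq)))


best_score, best_perm = local_search([4, 2, 5, 1, 3], inversions)
print(best_score, best_perm)
```

Re-scoring from scratch after every swap is the simplest option; computing only the change caused by the swap is where the speedup mentioned above would come from.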
This is similar to a genetic algorithm. You should read up on that and see if any of those ideas can work.

How would I optimally pack a set of tasks into a minimal number of time slots?

I have a set of independent tasks and time slots of the same fixed length, each task of arbitrary length.
How would I distribute the tasks across time slots while minimizing the number of slots used?
You are looking at the bin packing problem, which is NP-hard. There are, however, good polynomial-time approximation algorithms.
Refer to this link: http://en.wikipedia.org/wiki/Bin_packing_problem
For the record, here's the FFD (first fit decreasing) algorithm, which is fast, easy to implement and pretty close to the optimal solution.
1) Sort all the tasks in descending order of cost; this will be our pending list.
2) Pick the first remaining task on the pending list.
3) Find the leftmost slot that it fits in and put it there (open a new slot if none has room).
4) Remove the task from the pending list.
5) While the pending list isn't empty, repeat from 2.
The worst-case performance of this algorithm is 11/9*OPT + 1, that is at worst it will require about 22% more slots than the theoretical minimum.
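As a minimal sketch, the FFD steps above translate to Python roughly as follows. The function name `first_fit_decreasing` and the sample task lengths are made up for the illustration, and "cost" is taken to mean task length.

```python
def first_fit_decreasing(task_lengths, slot_length):
    """Pack tasks into as few fixed-length slots as FFD manages.

    Returns a list of slots, each a list of the task lengths placed in it.
    """
    free = []      # remaining capacity of each open slot
    slots = []     # tasks assigned to each open slot
    for length in sorted(task_lengths, reverse=True):
        if length > slot_length:
            raise ValueError(f"task of length {length} fits in no slot")
        for k, room in enumerate(free):
            if length <= room:           # leftmost slot with enough room
                free[k] -= length
                slots[k].append(length)
                break
        else:
            # No open slot has room: open a new one on the right.
            free.append(slot_length - length)
            slots.append([length])
    return slots


tasks = [5, 7, 5, 2, 4, 2, 5, 1, 6]
packing = first_fit_decreasing(tasks, slot_length=10)
print(packing)   # [[7, 2, 1], [6, 4], [5, 5], [5, 2]]
```

Here the total length is 37, so no packing can use fewer than 4 slots of length 10, and FFD happens to hit that minimum. The linear scan over open slots makes this O(n^2) in the worst case, which is usually fine for modest inputs.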

Resources