Proof of optimality of greedy algorithm for scheduling

I am unable to come up with a formal proof of optimality of algorithm A for the given problem. I have convinced myself that some optimal schedule O can be executed in increasing order of the events' deadlines, but I don't know how to formally prove that the swapping step based on extract_max converges to an optimal solution.
Problem: Given a list of events, each with a deadline date 'd' and a duration of 'l' days, give an algorithm that selects events so that the maximum number of them can be scheduled. Each event must finish by its deadline date 'd', must run continuously for its full duration of 'l' days, and only one event can run at any given time.
**Greedy Algorithm A:**
Create max-heap S            // the schedule, keyed by event length
Sort events by their deadline (increasing).
for (j = 0; j < events.size(); j++)
{
    if event j can be incorporated into schedule S, do so;
    else if (longest event in S > length of j), swap that event out for j.
}
Return S;
END
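For concreteness, here is a minimal runnable sketch of algorithm A (a Python transcription of the pseudocode above, assuming events are given as (deadline, length) pairs; the max-heap is simulated with negated lengths in heapq):

```python
import heapq

def max_events(events):
    """Greedy algorithm A: events is a list of (deadline, length) pairs."""
    events = sorted(events)                   # sort by deadline (increasing)
    chosen = []                               # max-heap of chosen lengths (stored negated)
    used = 0                                  # total time occupied by the chosen events
    for deadline, length in events:
        if used + length <= deadline:         # event j fits: incorporate it
            heapq.heappush(chosen, -length)
            used += length
        elif chosen and -chosen[0] > length:  # longest chosen event is longer: swap it for j
            used += length + heapq.heappop(chosen)   # popped value is -(longest length)
            heapq.heappush(chosen, -length)
    return len(chosen)

print(max_events([(2, 2), (4, 2), (5, 3)]))   # 2: the two short events fit, the long one does not
```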

We can prove this by contradiction. Assume the greedy choice were not part of the optimal solution; that is, if we consider the tasks sorted in ascending order of deadline, assume the optimal solution does not include the one with the earliest deadline. Now consider the task with the earliest deadline in any hypothetical optimal solution. Either it overlaps with the greedy choice, in which case we might as well have chosen the greedy choice, since it finishes no later than the earliest task in the optimal solution and cannot overlap with any earlier tasks in that solution (it being the earliest task). Or it does not overlap, in which case the optimal solution was not optimal after all, since we could freely have added the greedy choice to it. In both cases we have a contradiction: in the first, that the greedy choice could not have been picked; in the second, that the solution without the greedy choice was optimal. So the assumption that the optimal solution does not contain the greedy choice was wrong, and the optimal solution does include the greedy choice.

Related

Scheduling algorithm which minimizes maximum lateness is optimal (it finds a feasible schedule, if one exists)

I need to understand why if a scheduling algorithm A minimizes maximum lateness, then A is optimal (it finds a feasible schedule, if one exists).
I searched on the web for a long time, but without any success.
Can you tell me which ingredients I need to prove that, please?
EDIT
The converse is not true: if A is optimal, it can happen that A does not minimize the maximum lateness.
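Not a full proof, but the key ingredient can be made concrete: a schedule is feasible exactly when its maximum lateness is at most 0, so a schedule that minimizes maximum lateness must be feasible whenever any feasible schedule exists. A minimal sketch, assuming jobs are (duration, deadline) pairs run back to back in list order, with the classic earliest-deadline-first order as the lateness-minimizing schedule:

```python
def max_lateness(jobs):
    """Maximum lateness when (duration, deadline) jobs run back to back in list order."""
    finish, worst = 0, float("-inf")
    for duration, deadline in jobs:
        finish += duration
        worst = max(worst, finish - deadline)   # lateness of this job (<= 0 means on time)
    return worst

jobs = [(3, 9), (2, 4), (4, 11)]
edf = sorted(jobs, key=lambda j: j[1])          # earliest-deadline-first order
print(max_lateness(jobs), max_lateness(edf))    # 1 (given order misses a deadline), -2 (EDF is feasible)
```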

Ordered Knapsack Problem Correctness/Proof

A thief is given the choice of n objects to steal, but only has one knapsack with a capacity of M weight. Each object i has weight w_i and profit p_i. Suppose he also knows the following: the order of these items when sorted by increasing weight is the same as their order when sorted by decreasing value. Give a greedy algorithm to find an optimal solution to this variant of the knapsack problem. Prove the correctness and running time.
So the greedy algorithm I came up with was to sort the items by increasing weight, which is also decreasing value. This means the profit per weight is in decreasing order, so the thief can keep taking the most valuable remaining item until the next one would push the weight past M. The running time is O(n log n), since sorting takes O(n log n) and iterating through the list takes O(n). The part I am stuck on is the proof of correctness. Here is my proof so far:
Suppose there is an instance such that the solution stated above (referred to as GA) is not optimal. Let the optimal solution be referred to as OS, and let the items taken by OS be sorted in increasing value. Since OS is better than GA, the profit earned from GA is less than or equal to the profit earned from OS. Since GA takes the item with the highest profit/weight ratio, the first element, i, must be greater than or equal to the first element of OS. Because OS is better, there must exist an i that is greater than or equal to an item j in the set of GA. But because GA and OS are drawn from the same set, and GA always takes the item with the highest profit/weight, there cannot be an i in OS that is greater than a j in GA.
Can anyone help with the proof? Thanks
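For reference, here is a minimal sketch of the greedy algorithm described in the question, assuming the items are given as (weight, profit) pairs and the capacity is M (parameter names of my choosing):

```python
def ordered_knapsack(items, M):
    """items: (weight, profit) pairs where increasing weight == decreasing profit."""
    chosen, used = [], 0
    for w, p in sorted(items):        # sort by increasing weight (hence decreasing profit)
        if used + w <= M:             # take every item that still fits
            chosen.append((w, p))
            used += w
    return chosen

items = [(5, 10), (3, 40), (8, 7), (2, 60)]    # heavier items are less profitable
print(ordered_knapsack(items, M=10))           # [(2, 60), (3, 40), (5, 10)]
```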
Your approach to the solution is valid and the reasoning about the running time is correct. In the sequel, suppose that the input is "de-trivialized" in the sense that every occurring object actually fits into the knapsack and that it is impossible to select the entire input.
The ordering of the items generated by the sorting is both
decreasing in value
increasing in weight
which makes this a special case of the general knapsack problem. The argument for the proof of correctness is as follows. Let i' denote the breaking index, which is the index of the first item in the sorted sequence that is rejected by the greedy algorithm. For clarity, call the corresponding object the breaking object. Note that
w_j > w_i' for each j > i'
holds, which means that the greedy algorithm also rejects every object succeeding the breaking object (as it does not fit into the knapsack, just like the breaking object).
In total, the greedy algorithm selects a prefix of the sorted sequence; we aim to show that any optimal solution (which we consider fixed in the sequel) is the same prefix.
Note that the optimal solution, as it is optimal, does not leave space for an additional object.
Aiming at a contradiction, let k be the minimal index which occurs in the greedy solution but not in the optimal solution. As it is impossible to select object k additionally into the optimal solution, there must (via minimality of k) be some item in the optimal solution with an index
k' > k
which permits an exchange of items in the optimal solution. As
w_k < w_k' and p_k > p_k'
hold, object k' can be replaced by object k in the optimal solution, which yields a solution with profit larger than the one of the optimal solution, which is a contradiction to its optimality.
Hence, there is no item in the greedy solution which is missing in the optimal solution, which means that the greedy solution is a subset of the optimal solution. On the other hand, the greedy solution is maximal with respect to inclusion, which means that the optimal solution cannot contain an item which is missing in the greedy solution.
Note that the greedy algorithm is also useful for the general knapsack problem: taking the better of the greedy solution and an item with maximum profit yields an approximation algorithm with ratio 2.
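The ratio-2 remark can be sketched for the general 0-1 knapsack as well; this is the standard density-greedy bound, again with hypothetical (weight, profit) pairs, and it assumes every item fits on its own (as in the de-trivialized setting above):

```python
def knapsack_2_approx(items, M):
    """max(density-greedy prefix, best single item) is at least half the optimal profit."""
    prefix_profit, used = 0, 0
    for w, p in sorted(items, key=lambda it: it[1] / it[0], reverse=True):
        if used + w > M:
            break                      # stop at the breaking object
        prefix_profit, used = prefix_profit + p, used + w
    best_single = max(p for w, p in items if w <= M)   # assumes every item fits alone
    return max(prefix_profit, best_single)

print(knapsack_2_approx([(1, 2), (100, 150)], M=100))  # 150: the single big item beats the prefix (2)
```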

Interview Scheduling Algorithm, 1 candidate and N interviewers

Problem
We have 'n' interviewers and their free-busy slots. We want to schedule interviews of a candidate with these 'n' interviewers, one by one. The interviews can be in any order.
Approach 1: First come first serve
Start with the first interviewer.
1. Take the interviewer and create an interview schedule based on their free-busy slots.
2. Then take the next interviewer and repeat step 1.
But with this approach, we can miss out on some cases.
Approach 2: Greedy algorithm
1. Sort the interviewers by the number of free slots available in their free-busy schedules.
2. Create an interview schedule for the interviewer with the fewest free slots.
3. Repeat step 2 for the next interviewer.
Is there any more optimized/better approach for this problem?
We can re-cast this problem in terms of graph theory:
For each interviewer, generate a node. For each interview slot, generate a node too. Now create edges between interviewer nodes and slot nodes for those combinations where the interviewer is available for that slot. This will give you a bipartite graph.
The goal now is to find a maximum cardinality matching, i.e. a matching of interviewers to slots for as many interviewers as possible. Hence, if a full solution is not possible, it will even give you a partial schedule. Common algorithms for this are the Ford-Fulkerson Algorithm (or Edmonds-Karp Algorithm) and the Hopcroft-Karp Algorithm.
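Here is a minimal sketch of the matching step, assuming availability is given as a dict from interviewer to the slots they can take; it uses plain augmenting paths (Kuhn's algorithm), with Hopcroft-Karp or a max-flow solver as faster drop-in replacements:

```python
def max_matching(avail):
    """avail: dict mapping each interviewer to the slots they are free for."""
    slot_owner = {}                            # slot -> interviewer currently matched to it

    def try_assign(person, seen):
        for slot in avail[person]:
            if slot in seen:
                continue
            seen.add(slot)
            # Take the slot if it is free, or if its current owner can move to another slot.
            if slot not in slot_owner or try_assign(slot_owner[slot], seen):
                slot_owner[slot] = person
                return True
        return False

    matched = sum(try_assign(person, set()) for person in avail)
    return matched, {person: slot for slot, person in slot_owner.items()}

avail = {"Alice": ["9am", "10am"], "Bob": ["9am"], "Cara": ["10am", "11am"]}
print(max_matching(avail))   # (3, {'Bob': '9am', 'Alice': '10am', 'Cara': '11am'})
```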

How would the time complexity change from a brute force to a recursive solution?

This is a problem I am working on
I implemented a brute force solution and a divide and conquer (recursive) solution. They both work with this input (from post 1 to 4).
For the brute force solution, what I did was generate all subsets of {1,2,3,4}, iterate through all of them, and check only the ones that contained 1 and 4. I know that my brute force solution runs in O(2^n) time, because there are 2^n subsets and I have to iterate over all of them.
For my recursive solution, what I did was break the problem down so that the only solutions generated and checked are the ones that contain 1 and 4. For example, from 1 you can go to 2, 3, or 4; if you go to 2, you can go to 3 or 4, and so on, until 4 is reached.
After analyzing both algorithms, I realized the only difference is that the divide and conquer solution wouldn't even check the subsets that don't contain both 1 and 4, say {2,3}, while the brute force would. How would this difference affect the time complexity of the recursive algorithm? Would the recursive algorithm also run in O(2^n), or in something less because it checks fewer subsets?
An extra optimization step for the recursive solution is memoization. The reasoning goes: once you stop at post x, the cost from there to the end is always the same, no matter where you came from.
It turns out that all that is required is an array holding, at index i, the lowest price to get from post i to the end. It can be populated going right to left in O(n²) time.
There exists a straightforward dynamic programming solution.
Consider each successively longer optimum canoe journey as a function of shorter optimum journeys. That is, the cheapest way to travel to post 4 is the minimum, over the earlier posts, of the cheapest way to reach that post plus the cost of the rental from it to post 4. Then the cheapest way to reach post 5 is a function of the previously calculated cheapest ways to reach posts 1 through 4.
Implement this recursion in code and your solution should be O(n^2), similar to the perennial CLRS favorite: the rod-cutting problem: http://www.geeksforgeeks.org/dynamic-programming-set-13-cutting-a-rod/
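A minimal sketch of that dynamic program, assuming the rental prices are given as a matrix cost[i][j] for renting a canoe from post i directly to post j (an assumed input format, since the original problem statement is only linked above):

```python
def cheapest_trip(cost):
    """cost[i][j]: price of renting a canoe from post i directly to post j (i < j)."""
    n = len(cost)
    best = [float("inf")] * n
    best[0] = 0                        # we start at the first post for free
    for j in range(1, n):
        # Cheapest way to reach post j: come from some earlier post i and rent i -> j.
        best[j] = min(best[i] + cost[i][j] for i in range(j))
    return best[n - 1]

cost = [
    [0, 1, 5, 7],
    [0, 0, 2, 6],
    [0, 0, 0, 3],
    [0, 0, 0, 0],
]
print(cheapest_trip(cost))   # 6: renting 0->1, 1->2, 2->3 (1+2+3) beats the direct 0->3 rental (7)
```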
The divide-and-conquer approach would check singletons at which a rental was initiated, and then "merge" steps would optimize across the new doubleton/4ton/etc. It'll be much better than O(2^n) because you won't be recalculating all possibilities for every subpath, but it's a greedy algorithm--it makes an assumption about the answer before it has perfect information about the answer.
If you assume that the optimum decision occurs during some merge step of the subsets, you will necessarily miss cases where renting a boat at 1 and returning it at n is the best solution, but the runtime is O(n log n). But, if you check the input and optimize for every merge, you're back at O(2^n).

What is the difference between "hill climbing" and "greedy" algorithms?

Please explain the difference between "hill climbing" and "greedy" algorithms.
It seems both are similar, and I have my doubts that "hill climbing" is an algorithm; it seems to be an optimization technique. Is this correct?
Hill-climbing and greedy algorithms are both heuristics that can be used for optimization problems. In an optimization problem, we generally seek some optimum combination or ordering of problem elements. A given combination or ordering is a solution. In either case, a solution can be evaluated to compare it against other solutions.
In a hill-climbing heuristic, you start with an initial solution. Generate one or more neighboring solutions. Pick the best and continue until there are no better neighboring solutions. This will generally yield one solution. In hill-climbing, we need to know how to evaluate a solution, and how to generate a "neighbor."
In a greedy heuristic, we need to know something special about the problem at hand. A greedy algorithm uses information to produce a single solution.
A good example of an optimization problem is a 0-1 knapsack. In this problem, there is a knapsack with a certain weight limit, and a bunch of items to put in the knapsack. Each item has a weight and a value. The object is to maximize the value of the objects in the knapsack while keeping the weight under the limit.
A greedy algorithm would pick objects of highest density and put them in until the knapsack is full. For example, compared to a brick, a diamond has a high value and a small weight, so we would put the diamond in first.
Here is an example of where a greedy algorithm would fail: say you have a knapsack with capacity 100. You have the following items:
Diamond, value 1000, weight 90 (density = 11.1)
5 gold coins, value 210, weight 20 (density each = 10.5)
The greedy algorithm would put in the diamond and then be done, giving a value of 1000. But the optimal solution would be to include the 5 gold coins, giving value 1050.
The hill-climbing algorithm would generate an initial solution--just randomly choose some items (ensure they are under the weight limit). Then evaluate the solution--that is, determine the value. Generate a neighboring solution. For example, try exchanging one item for another (ensure you are still under the weight limit). If this has a higher value, use this selection and start over.
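To make the contrast concrete, here is a minimal sketch of both heuristics on the diamond/gold-coin example above. For simplicity it starts hill climbing from the empty knapsack rather than a random selection and uses "toggle one item" as the neighbor move; both of those choices are assumptions, not the only options.

```python
import random

items = [("diamond", 1000, 90)] + [(f"coin{i}", 210, 20) for i in range(5)]
CAPACITY = 100

def value(sel):
    return sum(it[1] for it, take in zip(items, sel) if take)

def weight(sel):
    return sum(it[2] for it, take in zip(items, sel) if take)

def greedy():
    # Take items in order of decreasing value density while they still fit.
    sel = [False] * len(items)
    order = sorted(range(len(items)), key=lambda i: items[i][1] / items[i][2], reverse=True)
    for i in order:
        if weight(sel) + items[i][2] <= CAPACITY:
            sel[i] = True
    return sel

def hill_climb(steps=2000):
    sel = [False] * len(items)                 # initial solution: empty knapsack (feasible)
    for _ in range(steps):
        neighbor = sel[:]
        i = random.randrange(len(items))       # neighbor: toggle one random item in or out
        neighbor[i] = not neighbor[i]
        if weight(neighbor) <= CAPACITY and value(neighbor) > value(sel):
            sel = neighbor                     # keep the neighbor only if it is strictly better
    return sel

print(value(greedy()))      # 1000: the diamond goes in first and nothing else fits
print(value(hill_climb()))  # usually 1050 (all five coins), but it can get stuck at 1000
```

Note that the hill climber can also get stuck at 1000 if the diamond happens to be toggled in first, which illustrates that both heuristics can land in a local optimum.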
Hill climbing is not a greedy algorithm.
Yes you are correct. Hill climbing is a general mathematical optimization technique (see: http://en.wikipedia.org/wiki/Hill_climbing). A greedy algorithm is any algorithm that simply picks the best choice it sees at the time and takes it.
An example of this is making change while minimizing the number of coins (at least with USD). You take the most of the highest denomination of coin, then the most of the next highest, until you reach the amount needed.
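A quick sketch of that change-making greedy with US denominations in cents, just to make the "take the most of the highest denomination first" step explicit:

```python
def make_change(amount, denominations=(25, 10, 5, 1)):
    """Greedy change-making: take as many coins of each denomination as possible, largest first."""
    coins = {}
    for d in sorted(denominations, reverse=True):
        coins[d], amount = divmod(amount, d)    # how many of this coin, and what remains
    return coins

print(make_change(67))   # {25: 2, 10: 1, 5: 1, 1: 2}
```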
In this way, hill climbing is a greedy algorithm.

Resources