Let's say we have a set of intervals
[s1,e1],[s2,e2],...,[sn,en]
I would like to find a subset of non-overlapping intervals that has the maximum aggregate time, i.e., the largest total length.
Actually I'm looking for a greedy solution. Does it exist or not?
"Greedy" is not a formal term, but for the purpose of this question, let's define the class of greedy algorithms to be those that impose an a priori total order on intervals (i.e., independent of the input) and repeatedly extend the partial solution by the maximum available interval. Consider the inputs
[0,2],[1,4],[3,5]
[0,2],[1,4]
[1,4],[3,5].
There are three possibilities for the maximum interval among [0,2],[1,4],[3,5]. If [0,2] or [3,5] is maximum, then the greedy algorithm answers incorrectly for the second or third input respectively. If [1,4] is maximum, then the greedy algorithm answers incorrectly for the first input.
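For concreteness, here is a small Python sketch (representation and names are my own) that checks this argument by brute force: for every fixed priority order on the three intervals, the order-driven greedy loses to the true optimum on at least one of the three inputs.

    from itertools import combinations, permutations

    A, B, C = (0, 2), (1, 4), (3, 5)
    INPUTS = [[A, B, C], [A, B], [B, C]]

    def overlaps(x, y):
        return x[0] < y[1] and y[0] < x[1]

    def total(chosen):
        return sum(e - s for s, e in chosen)

    def greedy(intervals, priority):
        # scan in the fixed a priori order, keep whatever still fits
        chosen = []
        for iv in sorted(intervals, key=priority.index):
            if all(not overlaps(iv, c) for c in chosen):
                chosen.append(iv)
        return chosen

    def optimum(intervals):
        # brute force over all subsets of the input
        return max(total(sub)
                   for r in range(len(intervals) + 1)
                   for sub in combinations(intervals, r)
                   if all(not overlaps(x, y) for x, y in combinations(sub, 2)))

    for priority in permutations([A, B, C]):
        bad = [ivs for ivs in INPUTS
               if total(greedy(ivs, priority)) < optimum(ivs)]
        print(priority, "fails on", bad)   # 'bad' is never empty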
A thief is given the choice of n objects to steal, but only has one knapsack with a capacity of M weight. Each object i has weight w_i and profit p_i. Suppose he also knows the following: the order of these items when sorted by increasing weight is the same as their order when sorted by decreasing value. Give a greedy algorithm to find an optimal solution to this variant of the knapsack problem. Prove the correctness and running time.
So the greedy algorithm I came up with was to sort the items by increasing weight, which is also decreasing value. This means the profit per weight is in decreasing order, so the thief can take items in this order until the next item no longer fits (i.e., would push the total weight over M). The running time would be O(n log n), since sorting takes O(n log n) and iterating through the list takes O(n).
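In code, the idea is roughly this (a quick sketch, with items as (weight, profit) pairs):

    def greedy_knapsack(items, M):
        # sorting by increasing weight also sorts by decreasing
        # profit in this variant of the problem
        items = sorted(items, key=lambda wp: wp[0])
        taken, load = [], 0
        for w, p in items:
            if load + w > M:
                break   # every later item is at least as heavy
            taken.append((w, p))
            load += w
        return taken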
The part I am stuck on is the proof of correctness. Here is my proof so far: Suppose there is an instance on which the solution stated above (referred to as GA) is not optimal. Let the optimal solution be referred to as OS, and let the items taken by OS be sorted in increasing value. Since OS is better than GA, the profit earned from GA is less than the profit earned from OS. Since GA takes the item with the highest profit/weight ratio first, its first element i must be greater than or equal to the first element of OS. Because OS is better, there must exist an item i in OS that is greater than or equal to some item j in GA. But because GA and OS draw from the same set, and GA always takes the item with the highest profit/weight, there cannot be an item i in OS that is greater than an item j in GA.
Can anyone help with the proof? Thanks
Your approach to the solution is valid, and the reasoning on the running time is correct. In the sequel, suppose that the input is "de-trivialized" in the sense that every occurring object actually fits into the knapsack and that it is impossible to select the entire input.
The sorted sequence of items is both
decreasing in value
increasing in weight
which makes it a special case of the general knapsack problem. The argument for the proof of correctness is as follows. Let i' denote the breaking index, which is the index of the first item in the sorted sequence that is rejected by the greedy algorithm. For clarity, call the corresponding object the breaking object. Note that
w_j > w_i' for each j > i'
holds, which means that the greedy algorithm also rejects every object succeeding the breaking object (as it does not fit into the knapsack, just like the breaking object).
In total, the greedy algorithm selects a prefix of the sorted sequence; we aim to show that any optimal solution (which we consider fixed in the sequel) is the same prefix.
Note that the optimal solution, as it is optimal, does not leave space for an additional object.
Aiming at a contradiction, let k be the minimal index that occurs in the greedy solution but not in the optimal solution. Since object k cannot simply be added to the optimal solution, there must (by the minimality of k) be some item in the optimal solution with an index
k' > k
which permits an exchange of items in the optimal solution. As
w_k < w_k' and p_k > p_k'
hold, object k' can be replaced by object k in the optimal solution, which yields a feasible solution with a larger profit than the optimal solution, contradicting its optimality.
Hence, there is no item in the greedy solution that is missing from the optimal solution, which means that the greedy solution is a subset of the optimal solution. On the other hand, the greedy solution is maximal with respect to inclusion, so the optimal solution cannot contain an item that is missing from the greedy solution.
Note that the greedy algorithm is also useful for the general knapsack problem; taking the better of the greedy solution and a single item with maximum profit yields an approximation algorithm with ratio 2.
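To illustrate that last remark, here is a sketch of the ratio-2 procedure for the general 0-1 knapsack (my own code and naming, assuming every single item fits on its own): run the density-ordered greedy until the first item that does not fit, then return the better of the greedy prefix and the single most profitable item.

    def knapsack_2approx(items, M):
        # items: (weight, profit) pairs, each with weight <= M
        by_density = sorted(items, key=lambda wp: wp[1] / wp[0], reverse=True)
        prefix, load = [], 0
        for w, p in by_density:
            if load + w > M:
                break   # stop at the breaking object
            prefix.append((w, p))
            load += w
        best_single = max(items, key=lambda wp: wp[1])
        # the better of the two achieves at least half the optimum
        if best_single[1] > sum(p for _, p in prefix):
            return [best_single]
        return prefix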
I have a specific sub-problem for which I am having trouble coming up with an optimal solution. This problem is similar to the subset sum group of problems as well as space filling problems, but I have not seen this specific problem posed anywhere. I don't necessarily need the optimal solution (as I am relatively certain it is NP-hard), but an effective and fast approximation would certainly suffice.
Problem: Given a list of positive integers, find the smallest number of disjoint subsets, together containing every integer in the list, where each subset sums to less than N. Obviously, no integer in the original list can be greater than N.
In my application I have many lists and I can concatenate them into columns of a matrix as long as they fit in the matrix together. For downstream purposes I would like to have as little "wasted" space in the resulting ragged matrix, hence the space filling similarity.
Thus far I am employing a greedy-like approach: processing from the largest integers down, I find the largest integer that fits into the current subset under the limit N. Once even the smallest remaining integer no longer fits into the current subset, I proceed to the next subset in the same way, until all numbers are exhausted. This almost certainly does not find the optimal solution, but it was the best I could come up with quickly.
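In code, the heuristic looks something like this (a rough sketch, assuming every integer is strictly less than N so that each one fits into an empty subset):

    def fill_subsets(values, N):
        remaining = sorted(values, reverse=True)   # largest first
        subsets = []
        while remaining:
            current, total = [], 0
            i = 0
            while i < len(remaining):
                if total + remaining[i] < N:   # "sums to less than N"
                    total += remaining[i]
                    current.append(remaining.pop(i))
                else:
                    i += 1   # try the next (smaller) integer
            subsets.append(current)
        return subsets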
BONUS: My application actually requires batches, where there is a limit on the number of subsets in each batch (M). Thus the larger problem is to find the fewest batches where each batch contains M subsets and each subset sums to less than N.
Straight from Wikipedia (with some amendments in brackets):
In the bin packing problem, objects [integers] of different volumes [values] must be packed into a finite number of bins [sets] or containers, each of volume V [with the subset's sum < V], in a way that minimizes the number of bins [sets] used. In computational complexity theory, it is a combinatorial NP-hard problem.
https://en.wikipedia.org/wiki/Bin_packing_problem
As far as I can tell, this is exactly what you are looking for.
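If a fast approximation suffices, the classic first-fit decreasing heuristic for bin packing is a natural fit; it is known to use at most roughly 11/9 of the optimal number of bins (plus a small additive constant). A sketch, with the capacity test written as "sums to less than N" to match the question:

    def first_fit_decreasing(values, N):
        bins, sums = [], []
        for v in sorted(values, reverse=True):
            for i, s in enumerate(sums):
                if s + v < N:          # place v in the first bin it fits
                    bins[i].append(v)
                    sums[i] += v
                    break
            else:
                bins.append([v])       # no open bin fits: open a new one
                sums.append(v)
        return bins

If I read the bonus correctly, minimizing the number of subsets also minimizes the number of batches, since the batch count is just ceil(number of subsets / M).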
So given a set of intervals, finding a subset of non-overlapping intervals that has the maximal number of intervals can be done in linear time after sorting the intervals by their right endpoints. However, what if we want to output ALL solution subsets with a maximal number of non-overlapping intervals? The running time should be output-sensitive, because for n intervals the number of optimal solutions can be exponential, e.g., as high as O(sqrt(n)^sqrt(n)). So if there are S optimal solutions, can they be enumerated in time linear in S (perhaps with polynomial dependence on n as well)?
Run the standard dynamic programming algorithm for largest independent set in an interval graph. This tells you what the maximum number is. It is straightforward to modify this algorithm to track the number of ways to get said maximum number.
For every interval I, compile a list of the later intervals that do not overlap I and from which an independent set of maximum remaining size can still be formed.
Now run a straightforward recursive enumeration of all independent sets using the information compiled in the last paragraph.
If the size of the maximum independent set is h, this will take O(hS + n^2) time; the n^2 is for the DP and the hS is for the recursion and output.
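Here is a sketch of the whole pipeline in Python (my own code and naming; touching endpoints are treated as compatible). The double loop is the DP, succ holds the compiled lists, and emit is the recursive enumeration restricted to moves that keep the solution maximum:

    def enumerate_maximum_sets(intervals):
        ivs = sorted(intervals, key=lambda se: se[1])   # by right endpoint
        n = len(ivs)
        best = [1] * n   # best[i]: size of a largest independent set
                         # among ivs[i:] that contains interval i
        succ = [[] for _ in range(n)]   # later compatible intervals
        for i in range(n - 1, -1, -1):
            for j in range(i + 1, n):
                if ivs[j][0] >= ivs[i][1]:
                    succ[i].append(j)
                    best[i] = max(best[i], 1 + best[j])
        h = max(best, default=0)   # size of the maximum sets

        def emit(i, chosen, out):
            chosen.append(ivs[i])
            if best[i] == 1:
                out.append(list(chosen))
            for j in succ[i]:
                if best[j] == best[i] - 1:   # only optimality-preserving moves
                    emit(j, chosen, out)
            chosen.pop()

        out = []
        for i in range(n):
            if best[i] == h:
                emit(i, [], out)
        return out

    print(enumerate_maximum_sets([(0, 2), (1, 4), (3, 5)]))
    # [[(0, 2), (3, 5)]]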
Consider the following numbers:
9,44,32,12,7,45,31,98,35,37,41,8,20,27,83,64,61,28,39,93,29,92,17,13,14,55,21,66,72,23,73,99,1,2,88,77,3,65,83,84,62,5,11,74,68,76,78,67,75,69,70,22,71,24,25,26.
I am trying to implement an algorithm that removes the fewest numbers from the list to make the sequence
a) increasing order
b) decreasing order
I already tried with the shortest and longest subsequence. I don't want the code, only an explanation or pseudocode; I can't understand how to solve the problem. Thanks!
This is a lightly camouflaged longest increasing (decreasing) subsequence problem. The algorithm for solving your problem is as follows:
Find the longest increasing (decreasing) subsequence in the array
Remove all elements that do not belong to the longest increasing subsequence.
Since the increasing/decreasing subsequence is the longest possible, the number of elements you remove is the smallest possible.
The Wikipedia article has nice pseudocode for solving the LIS/LDS problem. You can substitute a simple linear search for the binary search unless the original sequence is 1000+ elements long.
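If you later do want code, here is a sketch of the O(n log n) version with predecessor links for reconstruction (my own naming); every element not in the returned subsequence is what you remove. For the decreasing case, run it on the reversed list.

    from bisect import bisect_left

    def longest_increasing_subsequence(seq):
        tail_vals, tail_idx = [], []   # smallest tail value/index of an
                                       # increasing subsequence of each length
        prev = [-1] * len(seq)         # predecessor links
        for i, x in enumerate(seq):
            k = bisect_left(tail_vals, x)   # strictly increasing
            if k > 0:
                prev[i] = tail_idx[k - 1]
            if k == len(tail_vals):
                tail_vals.append(x)
                tail_idx.append(i)
            else:
                tail_vals[k] = x
                tail_idx[k] = i
        lis, i = [], tail_idx[-1] if tail_idx else -1
        while i != -1:                 # walk the links backwards
            lis.append(seq[i])
            i = prev[i]
        return lis[::-1]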
Since the algorithm has already been mentioned, I will add my 2 cents. Most probably this will be asked in an interview, and in those circumstances the running time (efficiency) is a major concern, so the same problem can be tackled with different algorithms depending on the time they take to execute. The best-known algorithm is of order O(n log n); the dynamic programming paradigm can also be applied to yield an O(n^2) solution.
Please explain the difference between "hill climbing" and "greedy" algorithms.
It seems both are similar, and I have doubts that "hill climbing" is an algorithm; it seems to be an optimization technique. Is this correct?
Hill-climbing and greedy algorithms are both heuristics that can be used for optimization problems. In an optimization problem, we generally seek some optimum combination or ordering of problem elements. A given combination or ordering is a solution. In either case, a solution can be evaluated to compare it against other solutions.
In a hill-climbing heuristic, you start with an initial solution. Generate one or more neighboring solutions. Pick the best and continue until there are no better neighboring solutions. This will generally yield one solution. In hill-climbing, we need to know how to evaluate a solution, and how to generate a "neighbor."
In a greedy heuristic, we need to know something special about the problem at hand. A greedy algorithm uses information to produce a single solution.
A good example of an optimization problem is the 0-1 knapsack. In this problem, there is a knapsack with a certain weight limit, and a bunch of items to put in the knapsack. Each item has a weight and a value. The objective is to maximize the value of the objects in the knapsack while keeping the weight under the limit.
A greedy algorithm would pick objects of highest density and put them in until the knapsack is full. For example, compared to a brick, a diamond has a high value and a small weight, so we would put the diamond in first.
Here is an example of where a greedy algorithm would fail: say you have a knapsack with capacity 100. You have the following items:
Diamond, value 1000, weight 90 (density = 11.1)
5 gold coins, each of value 210 and weight 20 (density = 10.5)
The greedy algorithm would put in the diamond and then be done, giving a value of 1000. But the optimal solution would be to include the 5 gold coins, giving value 1050.
The hill-climbing algorithm would generate an initial solution--just randomly choose some items (ensure they are under the weight limit). Then evaluate the solution--that is, determine the value. Generate a neighboring solution. For example, try exchanging one item for another (ensure you are still under the weight limit). If this has a higher value, use this selection and start over.
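To make the contrast concrete, here is a small sketch in Python (my own code; the "toggle one item" neighborhood is just one reasonable choice). Note that hill climbing started from the empty knapsack climbs to the diamond and gets stuck there, which is exactly the local-optimum behavior that random restarts are meant to address:

    ITEMS = [("diamond", 1000, 90)] + [("coin%d" % i, 210, 20) for i in range(5)]
    CAPACITY = 100

    def weight(sel): return sum(w for _, _, w in sel)
    def value(sel): return sum(v for _, v, _ in sel)

    def greedy_by_density():
        sel = []
        for item in sorted(ITEMS, key=lambda it: it[1] / it[2], reverse=True):
            if weight(sel) + item[2] <= CAPACITY:
                sel.append(item)
        return sel

    def neighbors(sel):
        # toggle one item in or out, staying under the weight limit
        for item in ITEMS:
            cand = [it for it in sel if it != item] if item in sel else sel + [item]
            if weight(cand) <= CAPACITY:
                yield cand

    def hill_climb(sel):
        while True:
            best = max(neighbors(sel), key=value, default=sel)
            if value(best) <= value(sel):
                return sel
            sel = best

    print(value(greedy_by_density()))          # 1000: the diamond goes in first
    print(value(hill_climb([])))               # 1000: stuck at a local optimum
    print(value(hill_climb(list(ITEMS[1:]))))  # 1050: the five coins are stable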
Hill climbing is not a greedy algorithm.
Yes you are correct. Hill climbing is a general mathematical optimization technique (see: http://en.wikipedia.org/wiki/Hill_climbing). A greedy algorithm is any algorithm that simply picks the best choice it sees at the time and takes it.
An example of this is making change while minimizing the number of coins (at least with USD): take as many as possible of the highest denomination, then as many as possible of the next highest, and so on until you reach the amount needed.
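A minimal sketch of that change-making greedy (optimal for US coin denominations, though not for arbitrary coin systems):

    def make_change(amount, denominations=(25, 10, 5, 1)):
        coins = []
        for d in denominations:            # largest denomination first
            count, amount = divmod(amount, d)
            coins += [d] * count
        return coins

    print(make_change(68))   # [25, 25, 10, 5, 1, 1, 1]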
In this way, hill climbing is a greedy algorithm.