Point covering problem - algorithm

I recently had this problem on a test: given a set of points m (all on the x-axis) and a set n of lines with endpoints [l, r] (again on the x-axis), find the minimum subset of n such that all points are covered by a line. Prove that your solution always finds the minimum subset.
The algorithm I wrote for it was something to the effect of:
(say lines are stored as arrays with the left endpoint in position 0 and the right in position 1)
algorithm coverPoints(set[] m, set[][] n):
    chosenLines = []
    while m is not empty:
        minX = min(m)
        bestLine = n[0]
        for i=1 to length of n:
            if n[i][0] <= minX and n[i][1] > bestLine[1] then
                bestLine = n[i]
        add bestLine to chosenLines
        for i=0 to length of m:
            if m[i] <= bestLine[1] then delete m[i] from m
    return chosenLines
I'm just not sure if this always finds the minimum solution. It's a simple greedy algorithm so my gut tells me it won't, but one of my friends who is much better than me at this says that for this problem a greedy algorithm like this always finds the minimal solution. For proving mine always finds the minimal solution I did a very hand wavy proof by contradiction where I made an assumption that probably isn't true at all. I forget exactly what I did.
If this isn't a minimal solution, is there a way to do it in less than something like O(n!) time?
Thanks

Your greedy algorithm IS correct.
We can prove this by showing that ANY other covering can only be improved by replacing it with the cover produced by your algorithm.
Let C be any valid covering for a given input (not necessarily an optimal one), and let S be the covering produced by your algorithm. Now let's inspect the points p1, p2, ..., pk that are the minimum points (minX) you deal with at each iteration. C must cover all of them as well. Observe that no segment in C covers two of these points: if some segment covered both pi and pj with i < j, it would contain pi and reach at least as far right as pj, so the segment your algorithm chose for pi (the one covering pi that extends furthest right) would also have covered pj, and pj would have been deleted before it could become a minimum point. Therefore |C| >= k. And what is the cost (segment count) of your algorithm? |S| = k.
That completes the proof.
Two notes:
1) Implementation: Initializing bestLine with n[0] is incorrect, since the loop may be unable to improve it, and n[0] does not necessarily cover minX.
2) This problem is actually a special case of the Set Cover problem. While the general problem is NP-complete, this variation turns out to be solvable in polynomial time.
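For illustration, here is a minimal Python sketch of the greedy algorithm with the initialization issue from note 1 fixed. The function name `cover_points` and the tuple representation of segments are assumptions made for this example, not part of the original question. Unlike the pseudocode, it only considers segments that actually contain the current minimum point, and it raises an error when no such segment exists.

```python
def cover_points(points, segments):
    """Greedy cover: return a minimum list of segments covering all points.

    points   -- list of numbers on the x-axis
    segments -- list of (left, right) pairs with left <= right
    Raises ValueError if some point is not covered by any segment.
    """
    remaining = sorted(set(points))
    chosen = []
    i = 0
    while i < len(remaining):
        min_x = remaining[i]
        # Among the segments that contain min_x, keep the one reaching furthest right.
        best = None
        for left, right in segments:
            if left <= min_x <= right and (best is None or right > best[1]):
                best = (left, right)
        if best is None:
            raise ValueError(f"no segment covers point {min_x}")
        chosen.append(best)
        # Skip every point that the chosen segment covers.
        while i < len(remaining) and remaining[i] <= best[1]:
            i += 1
    return chosen


print(cover_points([1, 2, 5, 8], [(0, 2), (1, 5), (4, 9), (7, 8)]))
# [(1, 5), (4, 9)]
```

Sorting the points once and advancing an index replaces the repeated min(m) and delete steps of the pseudocode; the selection rule is the same.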

Hint: first try proving your algorithm works for sets of size 0, 1, 2... and see if you can generalise this to create a proof by induction.

Related

Can you help explain this Held-Karp TSP Pseudocode?

I am trying to implement the Held-Karp algorithm for the Traveling Salesman Problem by following this pseudocode:
(which I found here: https://en.wikipedia.org/wiki/Held%E2%80%93Karp_algorithm#Example.5B4.5D )
I can do the algorithm by hand but am having trouble actually implementing it in code. It would be great if someone could provide an easy-to-follow explanation.
I also don't understand this:
I thought this part was for setting the distance from the starting city to its connected cities. If that were the case, wouldn't it be C({1}, k) := d1,k and not C({k}, k) := d1,k? Am I just completely misunderstanding this?
I have also heard that this algorithm does not perform very well past about 15-20 cities so for around 40 cities, what would be a good alternative?
Held-Karp is a dynamic programming approach.
In dynamic programming, you break the task into subtasks and use a "dynamic function" to solve larger subtasks using the already computed results of smaller subtasks, until you finally solve your task.
To understand a DP algorithm it's imperative to understand how it defines the subtask and the dynamic function.
In the case of Held-Karp, the subtask is the following:
For a given set of vertices S and a vertex k   (1 ∉ S, k ∈ S)
C(S,k) is the minimal length of the path that starts with vertex 1, traverses all vertices in S and ends with the vertex k.
Given this subtask definition, it's clear why initialization is:
C({k}, k) := d(1,k)
The minimal length of the path from 1 to k, traversing through {k}, is just the edge from 1 to k.
Next, the "dynamic function".
As a side note, a DP algorithm can be written top-down or bottom-up. This pseudocode is bottom-up, meaning it computes smaller tasks first and uses their results for larger tasks. To be more specific, it computes tasks in order of increasing size of the set S, starting from |S| = 1 and going up to |S| = n-1 (i.e. S containing all vertices except 1).
Now, consider a task defined by some S, k. Remember, it corresponds to a path from 1, through S, ending in k.
We break it into:
1) a path from 1, through all vertices in S except k (S\k), which ends in the vertex m   (m ∈ S, m ≠ k):  C(S\k, m)
2) an edge from m to k
It's easy to see that if we look through all possible ways to break C(S,k) like this, and find the minimal path among them, we'll have the answer for C(S, k).
Finally, having computed all C(S, k) for |S| = n-1, we check all of them, completing the cycle with the missing edge from k to 1:  d(k,1). The minimal cycle is the final result.
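The steps above can be sketched in Python roughly as follows. This is an illustrative sketch rather than the pseudocode from the question: the function name `held_karp`, the dictionary `C` keyed by (frozenset, vertex), and the 0-based numbering (vertex 0 plays the role of vertex 1) are all assumptions made for the example.

```python
from itertools import combinations


def held_karp(dist):
    """Return the length of the shortest tour that visits every vertex once.

    dist -- square matrix (list of lists); dist[i][j] is the edge length i -> j.
    Vertex 0 plays the role of vertex 1 in the text.
    """
    n = len(dist)
    if n == 1:
        return 0
    # C[(S, k)]: shortest path from 0 through all vertices of S, ending at k (k in S).
    C = {}
    for k in range(1, n):
        C[(frozenset([k]), k)] = dist[0][k]          # initialization: C({k}, k) = d(1, k)
    for size in range(2, n):                          # bottom-up: increasing |S|
        for subset in combinations(range(1, n), size):
            S = frozenset(subset)
            for k in subset:
                rest = S - {k}                        # S \ k
                C[(S, k)] = min(C[(rest, m)] + dist[m][k] for m in rest)
    full = frozenset(range(1, n))
    # Close the cycle with the edge from k back to the start.
    return min(C[(full, k)] + dist[k][0] for k in range(1, n))


matrix = [
    [0, 2, 9, 10],
    [1, 0, 6, 4],
    [15, 7, 0, 8],
    [6, 3, 12, 0],
]
print(held_karp(matrix))  # 21
```

This version returns only the tour length; recovering the tour itself would require storing, for each (S, k), the m that achieved the minimum.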
Regarding:
I have also heard that this algorithm does not perform very well past about 15-20 cities so for around 40 cities, what would be a good alternative?
Held-Karp has algorithmic complexity of Θ(n² * 2^n). 40² * 2^40 ≈ 1.75 * 10^15 which, I would say, is unfeasible to compute on a single machine in reasonable time.
As David Eisenstat suggested, there are approaches using mixed integer programming that can solve this problem fast enough for N=40.
For example, see this blog post, and this project that builds upon it.

Picking the "spread" from the points on a line

I'm facing an algorithmic problem described as follows: Given a line from 0 to N (really big N), a list of X points on said line, and a number Z (0<=Z<=X) pick Z points from X to maximize the distance between two closest points. The brute-force solution in O(n^2) doesn't seem that difficult but I'm looking for something more sophisticated that can be done in O(n log n) time. Any clues, solutions, advice is very appreciated.
Edit: To answer the question in the first post: it is the minimal distance (between the two closest points) that has to be maximized.
One easy approach is O(XlogN).
First, sort the points.
Next observe that if you already know the minimum distance (call it d) between the points, it's O(X) to see if there's a way of picking Z points all of which are at least distance d apart: take the left-most element, then the next that's at least distance d away, then the next that's at least distance d away from that, and so on. If by the time you've got to the end of the array you have at least Z points, then you have a solution, and if you don't, there is no solution.
Now, you can use a binary search on [0, N] to find the largest d with a solution.
The sort is O(XlogX), the binary search takes O(logN) trials, and each is O(X). Overall, that's O(XlogX + XlogN), but since N >= X that simplifies to O(XlogN).
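A minimal Python sketch of this approach, assuming integer coordinates and 2 <= z <= number of points (the function names are made up for the example). It binary-searches d over [0, span of the points] rather than [0, N], which gives the same answer because no two points can be further apart than the span.

```python
def max_min_distance(points, z):
    """Largest d such that z of the given points are pairwise at least d apart.

    Assumes integer coordinates and 2 <= z <= len(points).
    """
    pts = sorted(points)

    def feasible(d):
        # Greedily take the left-most point, then each next point at least d away.
        count, last = 1, pts[0]
        for p in pts[1:]:
            if p - last >= d:
                count += 1
                last = p
        return count >= z

    lo, hi = 0, pts[-1] - pts[0]      # d = 0 is always feasible
    while lo < hi:
        mid = (lo + hi + 1) // 2      # round up so the loop always makes progress
        if feasible(mid):
            lo = mid
        else:
            hi = mid - 1
    return lo


print(max_min_distance([1, 2, 4, 8, 9], 3))  # 3
```

The search works because feasibility is monotone: if z points can be picked at least d apart, the same points are also at least d-1 apart.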

Variation of knapsack with subexponential growth

So I've come across an interesting problem I'd like to solve. It came up when I was trying to solve a game with nondeterministic transitions. If you've ever heard of this problem or know if it has a name/papers written about it let me know! Here it is.
Given n boxes and m elements where box n1 has i1 elements, box n2 has i2 elements, etc (i.e. i1 + i2 + ... + in = m). Each element has a weight w and value v. Find a selection of exactly one element from each of the n boxes (solution size = n) such that the value is maximized and the weight <= k (some input parameter).
The first thing I noticed is there are i1*i2...*in solutions. This is less than m choose n, which is less than 2^m, so does this mean the problem is in P (sorry my math is a little fuzzy)? Does anyone have any idea of an algorithm that does not involve iterating over every solution? Approximations are fine!
Edit: Okay so this problem is actually identical to the knapsack problem, so it's NP-hard. Let the boxes have two elements each, one of zero size and zero value, and one of nonzero size and nonzero value. This is identical to knapsack. Can anyone think of a clever pseudopolynomial time algorithm/conversion to knapsack?
This looks close enough to http://en.wikipedia.org/wiki/Knapsack_problem#0.2F1_Knapsack_Problem that almost the same definition of m[i, w] as given there will work - let m[i, w] be the maximum value that can be obtained with weight <= w using items up to i. The only difference is that at each stage instead of considering either taking an item or not, consider which of the possible items at each stage you should take.
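As a rough illustration, the pseudopolynomial DP could look like the Python sketch below. It assumes non-negative integer weights, and it deviates slightly from the definition above: the table stores the best value for an exact total weight instead of "weight <= w", which makes it easy to detect that no valid selection exists. The name `best_selection` and the (weight, value) tuples are assumptions for this example.

```python
def best_selection(boxes, capacity):
    """Maximum total value when picking exactly one (weight, value) item per box.

    boxes    -- list of boxes, each a non-empty list of (weight, value) pairs
                with non-negative integer weights
    capacity -- integer weight limit k
    Returns None if no selection fits within the capacity.
    """
    NEG = float("-inf")
    # best[w]: max value using the boxes processed so far with total weight exactly w.
    best = [NEG] * (capacity + 1)
    best[0] = 0
    for box in boxes:
        nxt = [NEG] * (capacity + 1)
        for w, value_so_far in enumerate(best):
            if value_so_far == NEG:
                continue                      # weight w is not reachable
            for weight, value in box:         # choose which item of this box to take
                if w + weight <= capacity:
                    nxt[w + weight] = max(nxt[w + weight], value_so_far + value)
        best = nxt
    result = max(best)
    return None if result == NEG else result


boxes = [[(1, 1), (2, 5)], [(2, 2), (3, 10)]]
print(best_selection(boxes, 4))  # 11
```

The running time is O(m * k) for m elements in total and capacity k, so it is polynomial in the numeric value of k rather than in its bit length.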

Skipping more than one node in Floyd's cycle finding Algorithm

Today I was reading about Floyd's algorithm for detecting a loop in a linked list.
I was wondering: wouldn't it be better to skip more than one node (say 2) for faster loop detection?
For example:
fastptr=fastptr->next->next->next.
Note that the side effects will be taken into account while changing fastptr.
Your suggestion is still correct, but it doesn't change the speed of the algorithm. If you take a look at the tortoise and hare algorithm on Wikipedia:
The key insight in the algorithm is that, for any integers i ≥ μ and k ≥ 0, x_i = x_{i+kλ}, where λ is the length of the loop to be found and μ is the start position of the loop. In particular, whenever i = kλ ≥ μ, it follows that x_i = x_{2i}.
In the last sentence of that quote (bold in the original), you could also say x_i = x_{3i}, or use any other coefficient. The key insight is finding the i; how many nodes you jump at a time to find it is not important, and the order of the algorithm depends on the location of i.
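To make this concrete, here is a small Python sketch of the tortoise-and-hare check in which the fast pointer's stride is a parameter. The `Node` class and the `fast_step` parameter are hypothetical names introduced for this example. With `fast_step=3` the pointers compare x_i with x_{3i} instead of x_{2i}, and they still meet once i is a multiple of the loop length that is at least μ.

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.next = None


def has_cycle(head, fast_step=2):
    """Return True if the list starting at head loops back on itself.

    fast_step -- how many nodes the fast pointer advances per iteration
                 (2 is the classic tortoise and hare; must be >= 2).
    """
    slow = fast = head
    while True:
        for _ in range(fast_step):
            if fast is None:
                return False          # reached the end of the list: no cycle
            fast = fast.next
        if fast is None:
            return False
        slow = slow.next              # slow is behind fast, so it is never None here
        if slow is fast:
            return True


# a -> b -> c -> d -> b (a tail of length 1, then a loop of length 3)
a, b, c, d = Node("a"), Node("b"), Node("c"), Node("d")
a.next, b.next, c.next, d.next = b, c, d, b
print(has_cycle(a, fast_step=2), has_cycle(a, fast_step=3))  # True True
```

Both strides need a number of iterations proportional to μ + λ in the worst case; the larger stride only increases the number of pointer hops per iteration.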

an algorithm to find the minimum size set cover for the Set-cover problem

In the Set Covering problem, we are given a universe U, such that |U| = n, and sets S1, ..., Sk that are subsets of U. A set cover is a collection C of some of the sets from S1, ..., Sk whose union is the entire universe U.
I'm trying to come up with an algorithm that will find a minimum set cover so that I can show that the greedy algorithm for set covering sometimes finds more sets.
Following is what I came up with:
repeat for each set.
1. Cover <- Set_i (i = 1, ..., n)
2. if a set is not a subset of any other set, then take that set into the cover.
but it's not working for some instances.
Please help me figure out an algorithm to find the minimum set cover.
I'm still having problems finding this algorithm online. Does anyone have any suggestions?
Set cover is NP-hard, so it's unlikely that there'll be an algorithm much more efficient than looking at all possible combinations of sets, and checking if each combination is a cover.
Basically, look at all combinations of 1 set, then 2 sets, etc. until they form a cover.
EDIT
This is an example pseudocode. Note that I do not claim that this is efficient. I simply claim that there isn't a much more efficient algorithm (algorithms will be worse than polynomial time unless something really cool is discovered)
for size in 1..|S|:
    for C in combination(S, size):
        if (union(C) == U) return C
where combination(K, n) returns all possible sets of size n whose elements come from K.
EDIT
However, I'm not too sure why you need an algorithm to find the minimum. In the question you state that you want to show that the greedy algorithm for set covering sometimes finds more sets. But this is easily achieved via a counterexample (and a counterexample is shown in the wikipedia entry for set cover). So I am quite puzzled.
EDIT
A possible implementation of combination(K, n) is:
if n == 0: return [{}] //a list containing an empty set
r = []
for k in K:
    K = K \ {k} // remove k from K.
    for s in combination(K, n-1):
        r.append(union({k}, s))
return r
When combining this with the cover problem, however, one probably wants to perform the coverage test in the base case n == 0 instead.
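For a runnable version, the brute-force search can be sketched in Python with `itertools.combinations` from the standard library standing in for combination(K, n). A simple greedy cover is included so the two can be compared on a small instance. The function names and the example instance are assumptions made for illustration; the instance is a small hand-made one, not the one from the Wikipedia entry.

```python
from itertools import combinations


def minimum_set_cover(universe, sets):
    """Return a smallest list of sets whose union contains the universe, or None."""
    universe = set(universe)
    for size in range(len(sets) + 1):             # try covers of increasing size
        for combo in combinations(sets, size):
            if set().union(*combo) >= universe:
                return list(combo)
    return None


def greedy_set_cover(universe, sets):
    """Repeatedly pick the set that covers the most still-uncovered elements."""
    uncovered = set(universe)
    cover = []
    while uncovered:
        best = max(sets, key=lambda s: len(uncovered & s))
        if not uncovered & best:
            return None                           # the sets cannot cover the universe
        cover.append(best)
        uncovered -= best
    return cover


U = {1, 2, 3, 4, 5, 6}
S = [frozenset({1, 2, 3}), frozenset({4, 5, 6}), frozenset({2, 3, 4, 5})]
print(len(minimum_set_cover(U, S)), len(greedy_set_cover(U, S)))  # 2 3
```

On this instance the greedy algorithm first takes the four-element set and then needs both remaining sets, while the exhaustive search finds the two-set cover. The exhaustive search examines up to 2^|S| combinations, so it is only practical for small inputs.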
Try Donald E. Knuth's Algorithm X for exact cover, using a sparse matrix. It must be adapted a little to also solve minimum set cover problems.
