Selecting k sub-posets - algorithm

I ran into the following algorithmic problem while experimenting with classification algorithms. Elements are classified into a polyhierarchy, what I understand to be a poset with a single root. I have to solve the following problem, which looks a lot like the set cover problem.
I uploaded my Latex-ed problem description here.
Devising an approximation algorithm that satisfies 1 & 2 is quite easy, just start at the vertices of G and "walk up" or start at the root and "walk down". Say you start at the root, iteratively expand vertexes and then remove unnecessary vertices until you have at least k sub-lattices. The approximation bound depends on the number of children of a vertex, which is OK for my application.
Does anyone know if this problem has a proper name, or maybe the tree-version of the problem? I would be interested to find out if this problem is NP-hard, maybe someone has ideas for a good NP-hard problem to reduce or has a polynomial algorithm to solve the problem. If you have both collect your million dollar price. ;)

The DAG version is hard by (drum roll) a reduction from set cover. Set k = 2 and do the obvious: condition (2) prevents us from taking the root. (Note that (3) doesn't actually imply (2) because of the lower bound k.)
The tree version is a special case of the series-parallel poset version, which can be solved exactly in polynomial time. Here's a recursive formula that gives a polynomial p(x) where the coefficient of xn is the number of covers of cardinality n.
Single vertex to be covered: p(x) = x.
Other vertex: p(x) = 1 + x.
Parallel composition, where q and r are the polynomials for the two posets: q(x) r(x).
Series composition, where q is the polynomial for the top poset and r, for the bottom: If the top poset contains no vertices to be covered, then p(x) = (q(x) - 1) + r(x); otherwise, p(x) = q(x).

Related

Can you help explain this Held-Karp TSP Pseudocode?

I am trying to implement the Held-Karp algorithm for the Traveling Salesman Problem by following this pseudocode:
(which I found here: https://en.wikipedia.org/wiki/Held%E2%80%93Karp_algorithm#Example.5B4.5D )
I can do the algorithm by hand but am having trouble actually implementing it in code. It would be great if someone could provide an easy-to-follow explanation.
I also don't understand this:
I thought this part was for setting the distance from the starting city to it's connected cities. If that was the case, wouldn't it be it C({1}, k) := d1,k and not C({k}, k) := d1,k? Am I just completely misunderstanding this?
I have also heard that this algorithm does not perform very well past about 15-20 cities so for around 40 cities, what would be a good alternative?
Held-Karp is a dynamic programming approach.
In dynamic programming, you break the task into subtasks and use "dynamic function" to solve larger subtasks using already computed results of smaller subtasks, until you finally solve your task.
To understand a DP algorithm it's imperative to understand how it defines subtask and dynamic function.
In the case of Held-Karp, the subtask is following:
For a given set of vertices S and a vertex k   (1 ∉ S, k ∈ S)
C(S,k) is the minimal length of the path that starts with vertex 1, traverses all vertices in S and ends with the vertex k.
Given this subtask definition, it's clear why initialization is:
C({k}, k) := d(1,k)
The minimal length of the path from 1 to k, traversing through {k}, is just the edge from 1 to k.
Next, the "dynamic function".
A side note, DP algorithm could be written as top-down or bottom-up. This pseudocode is bottom-up, meaning it computes smaller tasks first and uses their results for larger tasks. To be more specific, it computes tasks in the order of increasing size of the set S, starting from |S|=1 and going up to |S| = n-1 (i.e. S containing all vertices, except 1).
Now, consider a task, defined by some S, k. Remember, it corresponds to path from 1, through S, ending in k.
We break it into a:
path from 1, through all vertices in S except k (S\k), which ends in the vertex m   (m ∈ S, m ≠ k):  C(S\k, m)
an edge from m to k
It's easy to see, that if we look through all possible ways to break C(S,k) like this, and find the minimal path among them, we'll have the answer for C(S, k).
Finally, having computed all C(S, k) for |S| = n-1, we check all of them, completing the cycle with the missing edge from k to 1:  d(1,k). The minimal cycle is the final result.
Regarding:
I have also heard that this algorithm does not perform very well past about 15-20 cities so for around 40 cities, what would be a good alternative?
Held-Karp has algorithmic complexity of θ(n²2n). 40² * 240 ≈ 1.75 * 1015 which, I would say, is unfeasible to compute on a single machine in reasonable time.
As David Eisenstat suggested, there are approaches using mixed integer programming that can solve this problem fast enough for N=40.
For example, see this blog post, and this project that builds upon it.

A greedy or dynamic algorithm to subset selection

I have a simple algorithmic question. I would be grateful if you could help me.
We have some 2 dimensional points. A positive weight is associated to them (a sample problem is attached). We want to select a subset of them which maximizes the weights and neither of two selected points overlap each other (for example, in the attached file, we cannot select both A and C because they are in the same row, and in the same way we cannot select both A and B, because they are in the same column.) If there is any greedy (or dynamic) approach I can use. I'm aware of non-overlapping interval selection algorithm, but I cannot use it here, because my problem is 2 dimensional.
Any reference or note is appreciated.
Regards
Attachment:
A simple sample of the problem:
A (30$) -------- B (10$)
|
|
|
|
C (8$)
If you are OK with a good solution, and do not demand the best solution - you can use heuristical algorithms to solve this.
Let S be the set of points, and w(s) - the weightening function.
Create a weight function W:2^S->R (from the subsets of S to real numbers):
W(U) = - INFINITY is the solution is not feasible
Sigma(w(u)) for each u in U otherwise
Also create a function next:2^S -> 2^2^S (a function that gets a subset of S, and returns a set of subsets of S)
next(U) = V you can get V from U by adding/removing one element to/from U
Now, given that data - you can invoke any optimization algorithm in the Artificial Intelligence book, such as Genetic Algorithm or Hill Climbing.
For example, Hill Climbing with random restarts, will be something like that:
1. best<- -INFINITY
2. while there is more time
3. choose a random subset s
4. NEXT <- next(s)
5. if max{ W(v) | for each v in NEXT} < W(s): //s is a local maximum
5.1. if W(s) > best: best <- W(s) //if s is better then the previous result - store it.
5.2. go to 2. //restart the hill climbing from a different random point.
6. else:
6.1. s <- max { NEXT }
6.2. goto 4.
7. return best //when out of time, return the best solution found so far.
The above algorithm is anytime - meaning it will produce better results if given more time.
This can be treated as a linear assignment problem, which can be solved using an algorithm like the Hungarian algorithm. The algorithm tries to minimize the sum of costs, so just negate your weights, and use them as the costs. The assignment of rows to columns will give you the subset of points that you need. There are sparse variants for cases where not every (row,column) pair has an associated point, but you can also just use a large positive cost for these.
Well you can think of this as a binary constraint optimization problem, and there are various algorithms. The easiest algorithm for this problem is backtracking and arc propogation. However, it takes exponential time in the worst case. I am not sure if there are any specific algorithms to take advantage of the geometrical nature of the problem.
This can be solved by a pretty straight forward dynamic programming approach with a exponential time complexity
s = {A, B, C ...}
getMaxSum(s) = max( A.value + getMaxSum(compatibleSubSet(s, A)),
B.value + getMaxSum(compatibleSubSet(s, B)),
...)
where compatibleSubSet(s, A) gets the subset of s that does not overlap with A
To optimize it, you can memorize the result for each subset
Some way to do it:
Write a function that generates subsets ordered from the subset off maximum weight to the subset off minimum weight while ignoring the constraints.
Then call this function repeatedly until a subset that honors the constraints pops up.
In order to improve the performance, you can write a not so dumb generator function that for instance honors the not-on-the-same-row constraint but that ignores the not-on-the-same-column one.

Maximum weighted bipartite matching, constraint: ordering of each graph is preserved

Let's say I have two sets: (n_1, n_2, ...) and (m_1, m_2, ...) and a matching function match(n, m) that returns a value from 0 to 1. I want to find the mapping between the two sets such that the following constraints are met:
Each element must have at most 1 matched element in the opposite set.
Unmatched elements will be paired with a dummy element at cost 1
The sum of the match function when applied to all elements is maximal
I am having trouble expressing this formally, but if you lined up each set parallel to each other with their original ordering and drew a line between matched elements, none of the lines would cross. E.x. [n_1<->m_2, n_2<->m_3] is a valid mapping but [n_1<->m_2, n_2<->m_1] is not.
(I believe the first three are standard weighted bipartite matching constraints, but I specified them in case I misunderstood weighted bipartite matching)
This is relatively straight forward to do with an exhaustive search in exponential time (with respect to the size of the sets), but I'm hoping a polynomial time (ideally O((|n|*|m|)^3) or better) solution exists.
I have searched a fair amount on the "assignment problem"/"weighted bipartite matching" and have seen variations with different constraints, but didn't find one that matched or that I was able to reduce to one with this added ordering constraint. Do you have any ideas on how I might solve this? Or perhaps a rough proof that it is not solvable in polynomial time (for my purposes, a reduction to NP-complete would also work)?
This problem has been studied under the name "maximum-weight non-crossing matching". There's a simple quadratic-time dynamic program. Let M(a, b) be the value of an optimal matching on n1, …, na and m1, …, mb. We have the recurrence
M(0, b) = -b
M(a, 0) = -a
M(a, b) = max {M(a - 1, b - 1) + match(a, b), M(a - 1, b) - 1, M(a, b - 1) - 1}.
By tracing back the argmaxes, you can recover an optimal solution from its value.
If match has significantly fewer than a quadratic number of entries greater than -1, there is an algorithm that runs in time linearithmic in the number of useful entries (Khoo and Cong, A Fast Multilayer General Area Router for MCM Designs).

Point covering problem

I recently had this problem on a test: given a set of points m (all on the x-axis) and a set n of lines with endpoints [l, r] (again on the x-axis), find the minimum subset of n such that all points are covered by a line. Prove that your solution always finds the minimum subset.
The algorithm I wrote for it was something to the effect of:
(say lines are stored as arrays with the left endpoint in position 0 and the right in position 1)
algorithm coverPoints(set[] m, set[][] n):
chosenLines = []
while m is not empty:
minX = min(m)
bestLine = n[0]
for i=1 to length of n:
if n[i][0] <= minX and n[i][1] > bestLine[1] then
bestLine = n[i]
add bestLine to chosenLines
for i=0 to length of m:
if m[i] <= bestLine[1] then delete m[i] from m
return chosenLines
I'm just not sure if this always finds the minimum solution. It's a simple greedy algorithm so my gut tells me it won't, but one of my friends who is much better than me at this says that for this problem a greedy algorithm like this always finds the minimal solution. For proving mine always finds the minimal solution I did a very hand wavy proof by contradiction where I made an assumption that probably isn't true at all. I forget exactly what I did.
If this isn't a minimal solution, is there a way to do it in less than something like O(n!) time?
Thanks
Your greedy algorithm IS correct.
We can prove this by showing that ANY other covering can only be improved by replacing it with the cover produced by your algorithm.
Let C be a valid covering for a given input (not necessarily an optimal one), and let S be the covering according to your algorithm. Now lets inspect the points p1, p2, ... pk, that represent the min points you deal with at each iteration step. The covering C must cover them all as well. Observe that there is no segment in C covering two of these points; otherwise, your algorithm would have chosen this segment! Therefore, |C|>=k. And what is the cost (segments count) in your algorithm? |S|=k.
That completes the proof.
Two notes:
1) Implementation: Initializing bestLine with n[0] is incorrect, since the loop may be unable to improve it, and n[0] does not necessarily cover minX.
2) Actually this problem is a simplified version of the Set Cover problem. While the original is NP-complete, this variation results to be polynomial.
Hint: first try proving your algorithm works for sets of size 0, 1, 2... and see if you can generalise this to create a proof by induction.

Multiple Constraint Knapsack Problem

If there is more than one constraint (for example, both a volume limit and a weight limit, where the volume and weight of each item are not related), we get the multiply-constrained knapsack problem, multi-dimensional knapsack problem, or m-dimensional knapsack problem.
How do I code this in the most optimized fashion? Well, one can develop a brute force recursive solution. May be branch and bound.. but essentially its exponential most of the time until you do some sort of memoization or use dynamic programming which again takes a huge amount of memory if not done well.
The problem I am facing is this
I have my knapsack function
KnapSack( Capacity, Value, i) instead of the common
KnapSack ( Capacity , i ) since I have upper limits on both of those. can anyone guide me with this? or provide suitable resources for solving these problems for reasonably large n
or is this NP complete ?
Thanks
Merge the constraints. Look at http://www.diku.dk/~pisinger/95-1.pdf
chapter 1.3.1 called Merging the Constraints.
An example is say you have
variable , constraint1 , constraint2
1 , 43 , 66
2 , 65 , 54
3 , 34 , 49
4 , 99 , 32
5 , 2 , 88
Multiply the first constraint by some big number then add it to the second constraint.
So you have
variable , merged constraint
1 , 430066
2 , 650054
3 , 340049
4 , 990032
5 , 20088
From there do whatever algorithm you wanted to do with one constraint. The main limiter that comes to mind with this how many digits your variable can hold.
As a good example would serve the following problem:
Given an undirected graph G having positive weights and N vertices.
You start with having a sum of M money. For passing through a vertex i, you must pay S[i] money. If you don't have enough money - you can't pass through that vertex. Find the shortest path from vertex 1 to vertex N, respecting the above conditions; or state that such path doesn't exist. If there exist more than one path having the same length, then output the cheapest one. Restrictions: 1
Pseudocode:
Set states(i,j) as unvisited for all (i,j)
Set Min[i][j] to Infinity for all (i,j)
Min[0][M]=0
While(TRUE)
Among all unvisited states(i,j) find the one for which Min[i][j]
is the smallest. Let this state found be (k,l).
If there wasn't found any state (k,l) for which Min[k][l] is
less than Infinity - exit While loop.
Mark state(k,l) as visited
For All Neighbors p of Vertex k.
If (l-S[p]>=0 AND
Min[p][l-S[p]]>Min[k][l]+Dist[k][p])
Then Min[p][l-S[p]]=Min[k][l]+Dist[k][p]
i.e.
If for state(i,j) there are enough money left for
going to vertex p (l-S[p] represents the money that
will remain after passing to vertex p), and the
shortest path found for state(p,l-S[p]) is bigger
than [the shortest path found for
state(k,l)] + [distance from vertex k to vertex p)],
then set the shortest path for state(i,j) to be equal
to this sum.
End For
End While
Find the smallest number among Min[N-1][j] (for all j, 0<=j<=M);
if there are more than one such states, then take the one with greater
j. If there are no states(N-1,j) with value less than Infinity - then
such a path doesn't exist.
Knapsack with multiple constraints is a packing problem. Read up. http://en.wikipedia.org/wiki/Packing_problem
There are greedy like heuristics that calculate an "efficiency" for each item, that run quickly and yield approximate solutions.
You can use a branch and bound algorithm. You can get an initial lower bound using a greedy like heuristic, which can be used to initialize the incumbent solution. You can calculate upper bounds for various sub-problems by considering each of the m constraints one at time (relaxing the other constraints in the problem), then use the lowest of these bounds as an upper bound for the original problem. This technique is due to Shih. However this technique probably won't work well if no particular constraint tends to dominate the solution, or if the initial solution from the greedy like heuristic is not close to the optimum.
There are better more modern algorithms which are harder to implement, see "multidimensional knapsack problem" papers by J Puchinger!
As you said vol and weight both are positive quantities, try to use that fact that weight always decreases:
knap[position][vol][t]
Now t=0 when wt is positive, t=1 when wt is negative.

Resources