Optimization problem - finding a maximum - algorithm

I have a problem at hand which can be reduced to something like this:
Assume a set of random points in a two-dimensional X-Y plane, where for a given Y there may be several points with different X values, and for a given X several points with different Y values.
Whenever a point (Xi, Yi) is chosen, no other point with X = Xi or Y = Yi can be chosen. We have to choose the maximum number of points.

This can be reduced to a simple maximum flow problem. If you have a point (xi, yi), represent it in the graph by a path from the source S to node xi, from xi to yi, and from yi to the last node (sink) T.
Note, if we have points (2, 2) and (2, 5), there's still only one path from S to x2. All paths (edges) have capacity 1.
The flow in this network is the answer.
More about the general problem:
http://en.wikipedia.org/wiki/Max_flow
Update
I don't have a graphics editor at hand to visualise the problem, but you can easily draw the example by hand. Say the points are (3, 3), (3, 5), (2, 5).
Then edges (paths) would be
S -> x2, S -> x3
y3 -> T, y5 -> T
x3 -> y3, x3 -> y5, x2 -> y5
Flow: S -> x2 -> y5 -> T and S -> x3 -> y3 -> T
The amount of 'water' going from source to sink is 2 and so is the answer.
Also there's a tutorial about max flow algorithms
http://www.topcoder.com/tc?module=Static&d1=tutorials&d2=maxFlow
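The reduction above is easy to try out in Python. Below is my own sketch (not from the linked tutorial): a plain Edmonds-Karp max flow (BFS augmenting paths), which suffices here because every edge capacity is 1; the node labels and the `max_flow` / `max_points` helper names are mine.

```python
from collections import defaultdict, deque

def max_flow(capacity, source, sink):
    """Edmonds-Karp: repeatedly find a BFS augmenting path.
    capacity is a nested dict of residual capacities."""
    flow = 0
    while True:
        # find an augmenting path with BFS
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, c in capacity[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow
        # all capacities are 1 here, so each path adds exactly 1 unit
        v = sink
        while parent[v] is not None:
            u = parent[v]
            capacity[u][v] -= 1
            capacity[v][u] += 1
            v = u
        flow += 1

def max_points(points):
    capacity = defaultdict(lambda: defaultdict(int))
    for x, y in points:
        capacity['S'][('x', x)] = 1        # one unit per unique X
        capacity[('x', x)][('y', y)] = 1   # edge for the point itself
        capacity[('y', y)]['T'] = 1        # one unit per unique Y
    return max_flow(capacity, 'S', 'T')

print(max_points([(3, 3), (3, 5), (2, 5)]))  # 2
```

Running it on the example points gives the flow of 2 described above.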

Isn't this just the Hungarian algorithm?
Create an n×n matrix, with 0 at marked vertices, and 1 at unmarked vertices. The algorithm will choose n vertices, one for each row and column, which minimizes their sum. Simply count all the chosen vertices which equal 0, and you have your answer.
from munkres import Munkres

matrix = [[0, 0, 1],
          [0, 1, 1],
          [1, 0, 0]]
m = Munkres()
total = 0
for row, column in m.compute(matrix):
    if matrix[row][column] == 0:
        print('(%i, %i)' % (row, column))
        total += 1
print('Total: %i' % total)
This runs in O(n³) time, where n is the number of rows in the matrix. The maximum flow solution runs in O(V³), where V is the number of vertices. As long as there are more than n chosen intersections, this runs faster; in fact, it runs orders of magnitude faster as the number of chosen vertices goes up.

Different solution. It turns out that there's a lot of symmetry, and the answer is a lot simpler than I originally thought. The maximum number of things you can ever do is the minimum of the number of unique X's and the number of unique Y's, which you can compute in O(N log N) if you just want the result.
Every other shape is equivalent to a rectangle that contains the points, because it doesn't matter how many points you pull from the middle of a rectangle; the order will never matter (if handled as below). Any shape that you pluck a point from now has one less unique X and one less unique Y, just like a rectangle.
So the optimal solution has nothing to do with connectedness. Pick any point on the edge of the smaller dimension (i.e. if len(unique-Xs) > len(unique-Ys), pick anything that has either the maximum or the minimum X). It doesn't matter how many connections it has, just which dimension is bigger, which you can easily track using the sorted unique lists created above. If you keep unique-x and unique-y counters and decrement them when you delete all the nodes in an element of a list, then each deletion is O(1) and recalculating the lengths is O(1). Repeating this N times is at worst O(N), so the final complexity is O(N log N) (due solely to the sorting).
You can pick any point along the shortest edge because:
if there's only one on that edge, you better pick it now or something else will eliminate it
if there's more than one on that edge, who cares, you will eliminate all of them with your pick anyways
Basically, you're maximizing "max(uniqX,uniqY)" at each point.
Update: IVlad caught an edge case:
If the dimensions are equal, take the edge with the least points. Even if they aren't equal, take the top or bottom of the unique-stack you're eliminating from that has the least points.
Case in point:
Turn 1:
Points: (1, 2); (3, 5); (10, 5); (10, 2); (10, 3)
There are 3 unique X's: 1, 3, 10
There are 3 unique Y's: 2, 3, 5
The "bounding box" is (1,5),(10,5),(10,2),(1,2)
Reaction 1:
The "outer edge" (outermost uniqueX or uniqueY lists of points) that has the least points is the left. Basically, look at the sets of points in x=1,x=10 and y=2,y=5. The set for x=1 is the smallest: one point. Pick the only point for x=1 -> (1,2).
That also eliminates (10,2).
Turn 2:
Points: (3, 5); (10, 5); (10, 3)
There are 2 unique X's: 3, 10
There are 2 unique Y's: 3, 5
The "bounding box" is (3,5),(10,5),(10,3),(3,3)
Reaction 2:
The "edge" of the bounding box that has the fewest points is either the bottom or the left. We've reached the trivial case: with 4 points, all edges are outer edges. Eliminate one, say (10,3).
That also eliminates (10,5).
Turn 3:
Points: (3, 5)
Reaction 3:
Remove (3,5).

For each point, count the number N of other points that its selection would disqualify (those with the same X or Y value). Then repeatedly take, among the still-qualified points, the one with the smallest N. When you are finished, you will have selected the maximum number of points.
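A short sketch of this procedure in Python (my own illustration; the code implements the greedy rule as described, and does not by itself prove the rule is optimal):

```python
def greedy_select(points):
    """Repeatedly take the remaining point that disqualifies the fewest
    others (same X or Y), then drop everything it conflicts with."""
    remaining = set(points)
    chosen = []
    while remaining:
        def conflicts(p):
            # number of points that selecting p would disqualify
            return sum(1 for q in remaining
                       if q != p and (q[0] == p[0] or q[1] == p[1]))
        p = min(remaining, key=conflicts)
        chosen.append(p)
        remaining = {q for q in remaining
                     if q[0] != p[0] and q[1] != p[1]}
    return chosen

print(len(greedy_select([(3, 3), (3, 5), (2, 5)])))  # 2
```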

The XY plane is a red herring. Phrase it as a set of elements, each of which has a set of mutually exclusive elements.
The algorithm then becomes a depth-first search. At each level, for each candidate node, calculate the new excluded set: the union of the currently excluded elements with those excluded by the candidate. Try candidate nodes in order, from fewest excluded elements to most. Keep track of the best solution so far (the one with the fewest excluded nodes). Prune any subtree that is worse than the current best.
As a slight improvement at the cost of possible missed solutions, you can use Bloom filters for keeping track of the excluded sets.

This looks like a problem that can be solved with dynamic programming. Look into the algorithms for longest common substring, or the knapsack problem.

Based on a recommendation from IVlad, I looked into the Hopcroft–Karp algorithm. It's generally better than both the maximum flow algorithm and the Hungarian algorithm for this problem, often significantly. Some comparisons:
In general:
Max Flow: O(V³) where V is the number of vertices.
Hungarian: O(n³) where n is the number of rows in the matrix.
Hopcroft-Karp: O(V √(2V)) where V is the number of vertices.
For a 50×50 matrix, with 50% chosen vertices:
Max Flow: 1,250³ = 1,953,125,000
Hungarian: 50³ = 125,000
Hopcroft-Karp: 1,250 × √2,500 = 62,500
For a 1000×1000 matrix, with 10 chosen vertices:
Max Flow: 10³ = 1,000
Hungarian: 1000³ = 1,000,000,000
Hopcroft-Karp: 10 × √20 ≈ 44.7
The only time the Hungarian algorithm is better is when there is a significantly high proportion of points selected.
For a 100×100 matrix, with 90% chosen vertices:
Max Flow: 9,000³ = 729,000,000,000
Hungarian: 100³ = 1,000,000
Hopcroft-Karp: 9,000 × √18,000 ≈ 1,207,476.7
The Max Flow algorithm is never better.
It's also quite simple, in practice. This code uses an implementation by David Eppstein:
points = {
    0: [0, 1],
    1: [0],
    2: [1, 2],
}
selected = bipartiteMatch(points)[0]  # bipartiteMatch from Eppstein's PADS library
for x, y in selected.items():
    print('(%i, %i)' % (x, y))
print('Total: %i' % len(selected))

Related

highest tower of cubes (with numbers on sides)

Problem:
There are N cubes and M numbers. Each side of a cube has a number from 1 to M. You can stack one cube on another if their touching sides have the same number (the top side of the bottom cube and the bottom side of the top cube have the same number). Find the highest tower of cubes.
Input: the number of cubes N and the number M.
Example:
INPUT: N=5, M=6. Now we generate 5 random cubes, each with 6 sides numbered in [1, M].
[2, 4, 3, 1, 4, 1]
[5, 1, 6, 6, 2, 5]
[2, 5, 3, 1, 1, 6]
[3, 5, 6, 1, 3, 4]
[2, 4, 4, 5, 5, 5]
How you interpret a single array of 6 numbers is up to you. Opposite sides might be index and 5-index (for the first cube, the side opposite one 4 would be the other 4). Opposite sides might also be index and index+1 (for even index) or index-1 (for odd index). I used the second convention.
Now let's say the first cube is our current tower. Depending on the rotation, the number on top might be 1, 2, 3 or 4. If 1 is on top, we can stack the second, third or fourth cube on top of it: all of them have a 1 on one of their sides. The third cube even has two sides with a 1, so we can stack it in two different ways.
I won't analyse it to the end because this post would get too long. The final answer for these cubes (the max height of the tower) is 5.
My current solution (you can SKIP this part):
Now I'm just building the tower recursively. Each call solves this subproblem: find the highest tower given the top number of the current tower and the current set of unused cubes. This way I can memoize results keyed on (top number of tower, set of used cubes). Despite memoization, I think that in the worst case (for small M) this solution has to store M·2^N values (and solve that many subproblems).
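That memoized recursion can be written compactly in Python. This is my own sketch, assuming the index/index+1 pairing of opposite faces described above; the function names are mine:

```python
from functools import lru_cache

def tallest_tower(cubes):
    """Memoized search over (top number, bitmask of used cubes).
    Opposite faces pair up as indices (0,1), (2,3), (4,5)."""
    n = len(cubes)
    # every (bottom, top) orientation of each cube
    orients = [
        [(c[i], c[i + 1]) for i in (0, 2, 4)] +
        [(c[i + 1], c[i]) for i in (0, 2, 4)]
        for c in cubes
    ]

    @lru_cache(maxsize=None)
    def best(top, used):
        # tallest extension given the current top number and used-cube mask
        h = 0
        for j in range(n):
            if used >> j & 1:
                continue
            for bottom, new_top in orients[j]:
                if bottom == top:
                    h = max(h, 1 + best(new_top, used | (1 << j)))
        return h

    # the first cube may rest on any of its faces
    return max(
        (1 + best(t, 1 << j) for j in range(n) for _, t in orients[j]),
        default=0,
    )

cubes = [[2, 4, 3, 1, 4, 1],
         [5, 1, 6, 6, 2, 5],
         [2, 5, 3, 1, 1, 6],
         [3, 5, 6, 1, 3, 4],
         [2, 4, 4, 5, 5, 5]]
print(tallest_tower(cubes))  # 5
```

The cache holds at most M·2^N entries, matching the bound discussed above.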
What I'm looking for:
I'm looking for something that would help me solve this efficiently for small M. I know that there is tile stacking problem (which uses Dynamic Programming) and tower of cubes (which uses DAG longest path) but I don't see the applicability of these solutions to my problem.
You won't find a polynomial time solution: if you did, we'd be able to solve the decision variant of the longest path problem (which is NP-complete) in polynomial time. The reduction is as follows: for every edge in an undirected graph G, create a cube with opposing faces (u, v), where u and v are unique identifiers for the endpoints of the edge. For the remaining 4 faces, assign globally unique identifiers. Solve for the tallest cube tower; its height is the length of the longest path of G, so return whether that length equals the queried value (yes/no).
However, you could still solve it in something like O(M^3 * (N/2)! * log(N)) time (I think that bound is a bit loose, but it's close). Use divide and conquer with memoization. Find all longest paths using cubes [0, N) beginning with a value B in range [0, M) and ending with a value E in range [0, M), for all possible B and E. To compute this, recurse, partitioning the cubes evenly in every possible way, until you hit the bottom (a single cube). Then merge upwards, combining cube stacks that end in X with those beginning with X, for all X in [0, M). Once that's done, at the topmost level just take the max of all the tower heights.

maximum ratio of a min subset and a max subset of size k in a collection of n value pairs

So, say you have a collection of value pairs of the form {x, y}, say {1, 2}, {1, 3} and {2, 5}.
Then you have to find a subset of k pairs (in this case, say k = 2) such that the ratio of the sum of all x in the subset to the sum of all y in the subset is as high as possible.
Could you point me in the direction for relevant theory or algorithms?
It's kind of like maximum subset sum, but since the pairs are "bound" to each other it introduces a restriction that changes it from problems known to me.
Initially I thought that a simple greedy approach could work here, but commentators pointed out some counter examples.
Instead I think a bisection approach should work.
Suppose we want to know whether it is possible to achieve a ratio of g.
We need to add a selection of k vectors to end up above a line of gradient g.
If we project each vector perpendicular to this line to get values p1,p2,p3, then the final vector will be above the line if and only if the sum of the p values is positive.
Now, with the projected values it does seem right that the optimal solution is to choose the largest k.
We can then use bisection to find the highest ratio that is achievable.
Mathematical justification
Suppose we want to have the ratio above g, i.e.
(x1+x2+x3)/(y1+y2+y3) >= g
=> (x1+x2+x3) >= g(y1+y2+y3)
=> (x1 - g*y1) + (x2 - g*y2) + (x3 - g*y3) >= 0
=> p1 + p2 + p3 >= 0
where pi is defined to be xi - g*yi.
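The whole bisection can be sketched in a few lines of Python. This is my own illustration, assuming all x and y are positive (so the achievable ratio lies between the min and max of the individual x/y ratios); the function name is mine:

```python
def best_ratio(pairs, k, iters=60):
    """Bisection on the target ratio g.

    g is achievable iff some k pairs have projections p_i = x_i - g*y_i
    summing to >= 0, i.e. iff the k largest p_i sum to >= 0."""
    def feasible(g):
        p = sorted((x - g * y for x, y in pairs), reverse=True)
        return sum(p[:k]) >= 0

    # sum(x)/sum(y) always lies between the min and max of the x/y ratios
    lo = min(x / y for x, y in pairs)
    hi = max(x / y for x, y in pairs)
    for _ in range(iters):
        mid = (lo + hi) / 2
        if feasible(mid):
            lo = mid
        else:
            hi = mid
    return lo

print(round(best_ratio([(1, 2), (1, 3), (2, 5)], 2), 6))  # 0.428571
```

For the example pairs {1, 2}, {1, 3}, {2, 5} with k = 2, this converges to 3/7, the ratio of picking {1, 2} and {2, 5}.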

Maximum minimum manhattan distance

Input:
A set of points
Coordinates are non-negative integer type.
Integer k
Output:
A point P(x, y) (in or not in the given set) whose Manhattan distance to the closest point of the set is maximal, with max(x, y) <= k
My (naive) solution:
For every (x, y) in the grid which contain given set
BFS to find closest point to (x, y)
...
return maximum;
But I feel it runs very slowly for a large grid; please help me design a better algorithm (or give code / pseudocode) to solve this problem.
Should I, instead of looping over every (x, y) in the grid, just loop over every median x and y?
P.S.: Sorry for my English.
EDIT:
example:
Given P1(x1,y1), P2(x2,y2), P3(x3,y3). Find P(x,y) such that min{dist(P,P1), dist(P,P2),
dist(P,P3)} is maximal
Yes, you can do better. I'm not sure if my solution is optimal, but it beats the naive one.
Instead of doing a separate BFS for every point in the grid, do a 'cumulative' BFS from all the input points at once.
You start with a 2-dimensional array dist[k][k] with cells initialized to +inf, and to zero wherever there is an input point. Then from every input point you expand in every possible direction, writing larger values into dist the further you get from a source; if a cell already holds a value but you can reach it in fewer steps, overwrite it with the smaller value.
In the end, when no more moves can be made, scan dist for the cell with the maximum value. This is your point.
I think this would work quite well in practice.
For k = 3, assuming 1 <= x,y <= k, P1 = (1,1), P2 = (1,3), P3 = (2,2)
dist would be equal in the beginning
0, +inf, +inf,
+inf, 0, +inf,
0, +inf, +inf,
in the next step it would be:
0, 1, +inf,
1, 0, 1,
0, 1, +inf,
and in the next step it would be:
0, 1, 2,
1, 0, 1,
0, 1, 2,
so the output is P = (3,1) or (3,3)
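The cumulative BFS can be sketched in Python as follows. This is my own illustration of the answer above, using the grid 0 <= x, y <= k (the example above indexes from 1, so its coordinates differ); the function name is mine:

```python
from collections import deque

def farthest_point(points, k):
    """Multi-source BFS on the (k+1) x (k+1) grid: a single pass seeded
    with all input points gives, for every cell, the Manhattan distance
    to its nearest input point; return a cell maximizing that distance."""
    INF = float('inf')
    dist = [[INF] * (k + 1) for _ in range(k + 1)]
    queue = deque()
    for x, y in points:
        dist[x][y] = 0
        queue.append((x, y))
    while queue:
        x, y = queue.popleft()
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nx <= k and 0 <= ny <= k and dist[nx][ny] == INF:
                dist[nx][ny] = dist[x][y] + 1
                queue.append((nx, ny))
    # pick any cell with the maximum distance
    best = max((dist[x][y], x, y)
               for x in range(k + 1) for y in range(k + 1))
    return (best[1], best[2]), best[0]

print(farthest_point([(0, 0)], 2))  # ((2, 2), 4)
```

Each cell is enqueued once, so the whole pass is O(k²) regardless of the number of input points.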
If k is not large and you need a point with integer coordinates, you should do as another answer suggested: calculate minimum distances for all points on the grid using BFS, starting from all given points at once.
A faster solution for large k, and probably the only one that can find a point with float coordinates, is the following. It has complexity O(n log n log k).
Search for the resulting maximum distance using binary search. You have to check whether there is any point inside the square [0, k] × [0, k] which is at least a given distance away from all points in the given set. Suppose you can check that fast enough for any distance. It is obvious that if there is such a point for some distance R, there will also be one for every smaller distance r < R (the same point works). Thus you can search for the maximum distance with a binary search procedure.
Now, how to quickly check for the existence of (and find) a point at least r units away from all given points. Draw "Manhattan spheres of radius r" around all given points: these are the sets of points at most r units away from a given point, i.e. squares tilted by 45 degrees with diagonal 2r. Now rotate the picture by 45 degrees so all the squares become axis-parallel, and check for the existence of a point outside the squares with a sweep line algorithm: sort all vertical edges of the squares and process them one by one from left to right. A left border adds a segment mark to the sweep line, a right border erases it, and you check whether there is any unmarked point on the line; this can be implemented with a segment tree. Finally, restrict the check to unmarked points inside the initial square [0, k] × [0, k].
So, again, the overall solution is a binary search on r. Inside it, check whether any point is at least r units away from all given points by constructing the "Manhattan spheres of radius r" and scanning them with a diagonal line from the top-left corner to the bottom-right, storing the number of open spheres at each point of the line in the segment tree. Between the opening and closing of any sphere the line does not change, and if there is a free point there, you have found it for distance r.
The binary search contributes log k to the complexity. Each check is O(n log n) for sorting the square borders and O(n log k) for processing them.
A Voronoi diagram would be another fast solution, and it could also find a non-integer answer. But it is much, much harder to implement, even for the Manhattan metric.
First try
We can turn the 2D problem into a 1D problem by projecting onto the lines y=x and y=-x. If the points are (x1,y1) and (x2,y2) then the Manhattan distance is abs(x1-x2)+abs(y1-y2). Change coordinates to a u-v system with basis U = (1,1), V = (1,-1). The coords of the two points in this basis are u1 = x1-y1, v1 = x1+y1, u2 = x2-y2, v2 = x2+y2, and the Manhattan distance is the larger of abs(u1-u2) and abs(v1-v2).
How this helps: we can just work with the 1D u-values of the points. Sort by u-value, loop through the points, and find the largest difference between pairs of points. Do the same for the v-values.
Calculating the u,v coords is O(n), quicksorting is O(n log n), looping through the sorted list is O(n).
Alas, this does not work well. It fails if we have the points (-10,0), (10,0), (0,-10), (0,10). Let's try a
Voronoi diagram
Construct a Voronoi diagram
using Manhattan distance. This can be calculate in O(n log n) using https://en.wikipedia.org/wiki/Fortune%27s_algorithm
The vertices of the diagram are the points with locally maximal distance to their nearest input points. There is pseudocode for the algorithm on the Wikipedia page. You might need to adapt it for the Manhattan distance.

How to find the minimal bounding rectangles for a set of lines?

Provided a set of N connected lines on a 2D axis, I am looking for an algorithm which will determine the X minimal bounding rectangles.
For example, suppose I am given 10 lines and I would like to bound them with at most 3 (potentially intersecting) rectangles. So if 8 of the lines are clustered closely together, they may use 1 rectangle, and the other two may use a 2nd or perhaps also a 3rd rectangle depending on their proximity to each other.
Thanks.
If the lines are actually a path, then perhaps you wouldn't be averse to the requirement that each rectangle cover a contiguous portion of the path. In this case, there's a dynamic program that runs in time O(n²r), where n is the number of segments and r is the number of rectangles.
Compute a table with entries C(i, j) denoting the cost of covering segments 1, …, i with j rectangles. The recurrence is, for i, j > 0,
C(0, 0) = 0
C(i, 0) = ∞
C(i, j) = min over i' < i of (C(i', j - 1) + [cost of the rectangle covering segments i' + 1, …, i])
There are O(n r) entries, each of which is computed in time O(n). Recover the optimal collection of rectangles at the end by, e.g., storing the best i' for each entry.
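A sketch of this dynamic program in Python. The recurrence is as stated above; the cost function (bounding-box area of the covered segments) and all names are my own example choices, not from the answer:

```python
def min_cost_cover(segments, r):
    """C[i][j] = least total cost of covering the first i segments with
    j rectangles, each covering a contiguous run of the path.
    segments is a list of ((x1, y1), (x2, y2)) endpoint pairs."""
    n = len(segments)
    INF = float('inf')

    def rect_cost(lo, hi):
        # bounding-box area of segments lo..hi-1 (0-based, half-open)
        xs = [x for (x1, y1), (x2, y2) in segments[lo:hi] for x in (x1, x2)]
        ys = [y for (x1, y1), (x2, y2) in segments[lo:hi] for y in (y1, y2)]
        return (max(xs) - min(xs)) * (max(ys) - min(ys))

    C = [[INF] * (r + 1) for _ in range(n + 1)]
    C[0][0] = 0
    for i in range(1, n + 1):
        for j in range(1, r + 1):
            for ip in range(i):  # ip is the i' of the recurrence
                if C[ip][j - 1] < INF:
                    C[i][j] = min(C[i][j], C[ip][j - 1] + rect_cost(ip, i))
    return min(C[n][1:r + 1])  # allow using up to r rectangles

segments = [((0, 0), (0, 2)), ((0, 2), (3, 2))]
print(min_cost_cover(segments, 2))  # 0
```

With two rectangles the L-shaped example is covered by two degenerate (zero-area) boxes; with one rectangle the cost is the 3×2 bounding box.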
I don't know of a simple, optimal algorithm for the general case. Since there are “only” O(n⁴) rectangles whose edges each contain a segment endpoint, I would be tempted to formulate this problem as an instance of generalized set cover.

Computational Geometry set of points algorithm

I have to design an algorithm with running time O(n log n) for the following problem:
Given a set P of n points, determine a value A > 0 such that the shear transformation (x,y) -> (x+Ay,y) does not change the order (in x direction) of points with unequal x-coordinates.
I am having a lot of difficulty even figuring out where to begin.
Any help with this would be greatly appreciated!
Thank you!
I think the y = 0 case shows what's going on.
When x = 0 and A > 0:
(x,y) -> (x+Ay,y)
-> (0+(A*0),0) = (0,0)
When x = 1 and A > 0:
(x,y) -> (x+Ay,y)
-> (1+(A*0),0) = (1,0)
and likewise for the points with unequal x-coordinates (2,0), (3,0), (4,0)...
So I think the starting point may be (0,0), x=0.
Suppose all x,y coordinates are positive numbers. (Without loss of generality, one can add offsets.) In time O(n log n), sort a list L of the points, primarily in ascending order by x coordinate and secondarily in ascending order by y coordinate. In time O(n), process consecutive point pairs (in L order) as follows. Let p, q be any two consecutive points in L, and let px, qx, py, qy denote their x and y coordinate values. There are just a few cases: if px = qx, do nothing. Else (px < qx), if py <= qy, do nothing, since the shear moves q at least as far right as p. Else (px < qx, py > qy), require that px + A*py < qx + A*qy, i.e. A < (qx-px)/(py-qy).
So: go through L in order, and let A' be the smallest of the bounds (qx-px)/(py-qy) over all pairs with px < qx and py > qy. Then choose a value of A a little less than A', for example A'/2. (Or, if the object of the problem is to find the largest such A, report the supremum A'.)
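This method is only a few lines of Python. A sketch under the assumptions above (the function name and the 1.0 fallback for the unconstrained case are my own arbitrary choices):

```python
def find_shear(points):
    """After sorting by (x, y), only consecutive pairs with px < qx and
    py > qy constrain A, each requiring A < (qx - px) / (py - qy);
    return half the tightest bound (any positive A works if there is
    no constraint, so 1.0 is returned then)."""
    pts = sorted(points)
    bound = float('inf')
    for (px, py), (qx, qy) in zip(pts, pts[1:]):
        if px < qx and py > qy:
            bound = min(bound, (qx - px) / (py - qy))
    return bound / 2 if bound < float('inf') else 1.0

print(find_shear([(0, 1), (1, 0)]))  # 0.5
```

Checking consecutive pairs suffices because preserving the x-order of every adjacent pair in the sorted list preserves it transitively for all pairs.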
Ok, here's a rough stab at a method.
Sort the list of points by x order. (This gives the O(nlogn)--all the following steps are O(n).)
Generate a new list of dx_i = x_(i+1) - x_i, the differences between the x coordinates. As the x_i are ordered, all of these dx_i >= 0.
Now for some A, the transformed dx_i(A) will be x_(i+1) - x_i + A * (y_(i+1) - y_i). There will be an order change if this is negative or zero (x_(i+1)(A) <= x_i(A)).
So for each dx_i, find the value of A that would make dx_i(A) zero, namely
A_i = - (x_(i+1) - x_i)/(y_(i+1) - y_i). You now have a list of coefficients that would 'cause' an order swap between a consecutive (in x-order) pair of points. Watch for division by zero, but that's the case where two points have the same y, these points will not change order. Some of the A_i will be negative, discard these as you want A>0. (Negative A_i will also induce an order swap, so the A>0 requirement is a little arbitrary.)
Find the smallest A_i > 0 in the list. So any A with 0 < A < A_i(min) will be a shear that does not change the order of your points. Pick A_i(min) as that will bring two points to the same x, but not past each other.
