Closest Pair Brute Force Algorithm - Basic Operation

Given the following pseudo code for calculating the distance between the two closest points in the plane by brute force:
for (i <-- 1 to n-1) do
    for (j <-- i+1 to n) do
        d <-- min(d, sqrt((xi-xj)^2 + (yi-yj)^2))
return d
The basic operation is computing the square root. But apparently computing square roots in the loop can be avoided by dropping the square root function and comparing the values (xi-xj)^2 + (yi-yj)^2 themselves. I looked it up and found this: "the smaller the number we take the root of, the smaller its square root; in other words, the square root function is strictly increasing". Therefore the basic operation becomes squaring the numbers. Can anyone explain this?

The easiest way to answer your question is by seeing why you can avoid taking the square root. Consider the following set of distances between points, in ascending order:
{2, 5, 10, 15, 30, ...} = {d1,1, d1,2, d1,3, d2,1, d2,2, ...}
Each one of these distances was computed as the square root of the sum of the x and y distances squared. But we can square each item in this set and arrive at the same order:
{4, 25, 100, 225, 900, ...} = {d1,1², d1,2², d1,3², d2,1², d2,2², ...}
Notice that the positions of the distances, including the minimum distance (the first entry), did not change.
It should be noted that your pseudo-code does not actually compute the minimum distance (d is never initialized), but it could easily be modified to do so.
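To make this concrete, here is a minimal Python sketch of the brute-force loop that compares squared distances and takes a single square root at the end (function and variable names are mine):
from math import sqrt, inf

def closest_pair_distance(points):
    # Compare squared distances inside the loop; take one square root
    # at the end. Since sqrt is strictly increasing, the minimum is
    # attained at the same pair of points either way.
    best = inf   # smallest squared distance so far (note: initialized!)
    n = len(points)
    for i in range(n - 1):
        for j in range(i + 1, n):
            dx = points[i][0] - points[j][0]
            dy = points[i][1] - points[j][1]
            best = min(best, dx * dx + dy * dy)
    return sqrt(best)

print(closest_pair_distance([(0, 0), (3, 4), (1, 1)]))   # 1.4142..., pair (0,0)-(1,1)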

highest tower of cubes (with numbers on sides)

Problem:
There are N cubes and M numbers. Each side of a cube has a number from 1 to M. You can stack one cube on another if their touching sides have the same number (the top side of the bottom cube and the bottom side of the top cube carry the same number). Find the highest tower of cubes.
Input: number N of cubes and number M.
Example:
INPUT: N=5, M=6. Now we generate 5 random cubes with 6 sides, each numbered in <1, M>.
[2, 4, 3, 1, 4, 1]
[5, 1, 6, 6, 2, 5]
[2, 5, 3, 1, 1, 6]
[3, 5, 6, 1, 3, 4]
[2, 4, 4, 5, 5, 5]
How you interpret a single array of 6 numbers is up to you. Opposite sides of a cube might be index and 5-index (for the first cube, the side opposite the 4 would be the other 4). Opposite sides might also be index and index+1 or index-1, for index%2==0 or 1 respectively. I used the second scheme.
Now let's say the first cube is our current tower. Depending on the rotation, the top side might show any of 1, 2, 3, 4. If 1 is the value on top, we can stack the second, third or fourth cube on top of it; all of them have a side with value 1. The third cube even has two sides with value 1, so we can stack it in two different ways.
I won't analyse it to the end because this post would get too long. The final answer for these cubes (the max height of the tower) is 5.
My current solution (you can SKIP this part):
Now I'm just building the tower recursively. Each call solves this subproblem: find the highest tower given the top value of the current tower and the current set of unused (or used) cubes. This way I can memoize and store results keyed by the tuple (top value of tower, set of used cubes). Despite the memoization I think that in the worst case (for small M) this solution has to store M*(2^N) values (and solve that many cases).
What I'm looking for:
I'm looking for something that would help me solve this efficiently for small M. I know there is the tile stacking problem (which uses dynamic programming) and the tower of cubes problem (which uses DAG longest path), but I don't see how those solutions apply to my problem.
You won't find a polynomial time solution: if you did, we'd be able to solve the decision variant of the longest path problem (which is NP-complete) in polynomial time. The reduction is as follows: for every edge in an undirected graph G, create a cube with opposing faces (u, v), where u and v are unique identifiers for the vertices of the edge. For the remaining 4 faces, assign globally unique identifiers. Solve for the tallest cube tower; this tower's height is the length of the longest path of G, so return whether the path length equals the queried value (yes/no).
However, you could still solve it in something like O(M^3 * (N/2)! * log N) time (I think that bound is a bit loose, but it's close). Use divide and conquer with memoization. Find all longest towers built from the cubes [0, N) that begin with a value B in [0, M) and end with a value E in [0, M), for all possible B and E. To compute this, recurse, partitioning the cubes evenly in every possible way, until you hit the bottom (a single cube). Then merge upwards, combining cube stacks that end in X with those beginning with X, for all X in [0, M). Once that's all done, at the topmost level just take the max of all the tower heights.
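For reference, here is a minimal Python sketch of the memoized search the question describes, with the used set as a bitmask; its state space is exactly the M*(2^N) mentioned above. The face pairing (f0,f1), (f2,f3), (f4,f5) is the question's second interpretation:
from functools import lru_cache

# Cubes from the question; assumed pairing: opposite faces are
# (f0,f1), (f2,f3), (f4,f5), i.e. index and index+1 for even index.
PAIRS = [[(c[0], c[1]), (c[2], c[3]), (c[4], c[5])] for c in [
    (2, 4, 3, 1, 4, 1),
    (5, 1, 6, 6, 2, 5),
    (2, 5, 3, 1, 1, 6),
    (3, 5, 6, 1, 3, 4),
    (2, 4, 4, 5, 5, 5),
]]

@lru_cache(maxsize=None)
def tallest(top, used):
    # Max height still addable, given the current top value (None for an
    # empty tower) and the bitmask of cubes already used.
    best = 0
    for i, pairs in enumerate(PAIRS):
        if used & (1 << i):
            continue
        for a, b in pairs:                            # choose the vertical axis
            for bottom, new_top in ((a, b), (b, a)):  # and its orientation
                if top is None or bottom == top:
                    best = max(best, 1 + tallest(new_top, used | (1 << i)))
    return best

print(tallest(None, 0))   # 5 for the example above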

Maximum minimum Manhattan distance

Input:
A set of points
Coordinates are non-negative integers
Integer k
Output:
A point P(x, y) (not necessarily in the given set) whose Manhattan distance to the closest point of the set is maximal, with max(x, y) <= k
My (naive) solution:
for every (x, y) in the grid containing the given set:
    BFS to find the point closest to (x, y)
    ...
return maximum
But I feel this runs very slowly for a large grid; please help me design a better algorithm (or give code / pseudocode) to solve this problem.
Instead of looping over every (x, y) in the grid, would it be enough to loop over just the median x, y values?
P.S: Sorry for my English
EDIT:
example:
Given P1(x1,y1), P2(x2,y2), P3(x3,y3), find P(x,y) such that min{dist(P,P1), dist(P,P2), dist(P,P3)} is maximal.
Yes, you can do it better. I'm not sure if my solution is optimal, but it's better than yours.
Instead of doing a separate BFS for every point in the grid, do one 'cumulative' BFS from all the input points at once.
You start with a 2-dimensional array dist[k][k] with every cell initialized to +inf, except for cells that contain an input point, which start at zero. Then from every input point P you expand in every possible direction; the further a cell is from its start point, the bigger the integer you write into dist. If a cell already holds a value but you can reach it in fewer steps (a smaller integer), you overwrite it.
In the end, when no more moves can be made, you scan dist for the cell with the maximum value. That is your point.
I think this would work quite well in practice.
For k = 3, assuming 1 <= x,y <= k, P1 = (1,1), P2 = (1,3), P3 = (2,2),
dist would initially be:
0, +inf, +inf,
+inf, 0, +inf,
0, +inf, +inf,
in the next step it would be:
0, 1, +inf,
1, 0, 1,
0, 1, +inf,
and in the next step it would be:
0, 1, 2,
1, 0, 1,
0, 1, 2,
so the output is P = (3,1) or (3,3)
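A minimal Python version of this cumulative BFS (names are mine; the grid is indexed 1..k as in the example, and BFS steps give Manhattan distance here since there are no obstacles):
from collections import deque

def farthest_cell(points, k):
    # Multi-source BFS: seed the queue with all input points at distance 0.
    INF = float('inf')
    dist = [[INF] * (k + 1) for _ in range(k + 1)]
    q = deque()
    for x, y in points:
        dist[x][y] = 0
        q.append((x, y))
    while q:
        x, y = q.popleft()
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 1 <= nx <= k and 1 <= ny <= k and dist[nx][ny] == INF:
                dist[nx][ny] = dist[x][y] + 1
                q.append((nx, ny))
    # the cell with the maximum value is the answer
    return max((dist[x][y], (x, y))
               for x in range(1, k + 1) for y in range(1, k + 1))

print(farthest_cell([(1, 1), (1, 3), (2, 2)], 3))   # (2, (3, 3))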
If k is not too large and you need a point with integer coordinates, you should do as the other answer suggested: calculate the minimum distance for every point on the grid, using a BFS started from all the given points at once.
A faster solution for large k, and probably the only one which can find a point with float coordinates, is the following. It has complexity O(n log n log k).
Search for the resulting maximum distance using dichotomy (binary search on the answer). You have to check whether there is any point inside the square [0, k] x [0, k] which is at least a given distance away from all points in the given set. Suppose you can check that fast enough for any distance. It is obvious that if there is such a point for some distance R, there will also be one for every smaller distance r < R (the same point works), so you can search for the maximum distance using a binary search procedure.
Now, how to quickly check for the existence of (and find) a point which is at least r units away from all given points. Draw the "Manhattan spheres of radius r" around all given points: these are the sets of points at most r units away from a given point, i.e. squares tilted by 45 degrees with diagonal equal to 2r. Now turn the picture by 45 degrees, and all the squares become axis-parallel. You can then check for the existence of a point outside these squares using a sweeping line algorithm: sort all the vertical edges of the squares and process them one by one from left to right. A left border adds a segment mark to the sweeping line, a right border erases it, and you check whether any unmarked point remains on the line. You can implement this with a segment tree. Finally, you have to check whether any unmarked point on the line lies inside the initial square [0,k] x [0,k].
So, again, the overall solution is a binary search on r. Inside it, you check whether there is any point at least r units away from all given points by constructing the "Manhattan spheres of radius r" and scanning them with a diagonal line from the top-left corner to the bottom-right. While moving the line you store, in the segment tree, the number of open spheres at each point of the line. Between the opening and closing of any sphere the line does not change, so if there is a free point in between, you have found a point valid for distance r.
The binary search contributes log k to the complexity. Each check is n log n for sorting the square borders, plus n log k (n log n?) for processing them all.
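To make the dichotomy concrete, here is a small Python sketch of the outer binary search. The feasible(r) below is a naive O(k^2 * n) stand-in for the sweep line check described above, and it only considers integer candidates; it exists only to show the monotone structure:
def max_min_distance(points, k):
    def feasible(r):
        # naive placeholder: is some integer grid point at Manhattan
        # distance >= r from every input point?
        return any(all(abs(x - px) + abs(y - py) >= r for px, py in points)
                   for x in range(k + 1) for y in range(k + 1))

    lo, hi = 0, 2 * k            # the distance can't exceed 2k on [0,k] x [0,k]
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if feasible(mid):        # feasible is monotone: true below, false above
            lo = mid
        else:
            hi = mid - 1
    return lo

print(max_min_distance([(0, 0)], 2))   # 4, attained at the far corner (2, 2)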
A Voronoi diagram would be another fast solution, and it could also find a non-integer answer. But it is much, much harder to implement, even for the Manhattan metric.
First try
We can turn the 2D problem into a 1D problem by projecting onto the lines y=x and y=-x. If the points are (x1,y1) and (x2,y2) then the Manhattan distance is abs(x1-x2)+abs(y1-y2). Change coordinates to a u-v system with basis U = (1,1), V = (1,-1). The coordinates of the two points in this basis are u1 = x1+y1, v1 = x1-y1 and u2 = x2+y2, v2 = x2-y2, and the Manhattan distance is the larger of abs(u1-u2) and abs(v1-v2), i.e. the Chebyshev distance in the rotated frame.
How does this help? We can work with the 1D u-values of the points: sort by u-value, loop through the sorted list, and find the largest gap between pairs of adjacent points. Do the same for the v-values.
Calculating the u,v coords is O(n), quicksorting is O(n log n), and looping through the sorted lists is O(n).
Alas, this does not work well. It fails on the points (-10,0), (10,0), (0,-10), (0,10). Let's try a
Voronoi diagram
Construct a Voronoi diagram
using the Manhattan distance. This can be calculated in O(n log n) using https://en.wikipedia.org/wiki/Fortune%27s_algorithm
The vertices of the diagram are the points with locally maximal distance to their nearest sites. There is pseudocode for the algorithm on the Wikipedia page; you might need to adapt it for the Manhattan distance.

Finding a square of side length R in the 2D plane

I was interviewing at a high-frequency trading firm, and they asked me to
Find a square whose side length is R, given n points in the 2D plane
conditions:
-- sides parallel to the axes
-- it contains at least 5 of the n points
-- the running time must not depend on R
They told me to give them an O(n) algorithm.
Interesting problem, thanks for posting! Here's my solution. It feels a bit inelegant but I think it meets the problem definition:
Inputs: R, P = {(x_0, y_0), (x_1, y_1), ..., (x_N-1, y_N-1)}
Output: (u,v) such that the square with corners (u,v) and (u+R, v+R) contains at least 5 points from P, or NULL if no such (u,v) exist
Constraint: asymptotic run time should be O(n)
Consider tiling the plane with RxR squares. Construct a sparse matrix, B defined as
B[i][j] = {(x,y) in P | floor(x/R) = i and floor(y/R) = j}
As you construct B, if you find an entry that contains at least five elements, stop and output (u,v) = (i*R, j*R) for the i,j of the entry containing the five points.
If the construction of B did not yield a solution then either there is no solution or else the square with side length R does not line up with our tiling. To test for this second case we will consider points from four adjacent tiles.
Iterate over the non-empty entries of B. For each non-empty entry B[i][j], consider the collection of points contained in the tile represented by the entry itself and in the tiles above and to the right: the points in the entries B[i][j], B[i+1][j], B[i][j+1], and B[i+1][j+1]. There can be no more than 16 points in this collection, since each entry holds fewer than 5. Examine this collection and test whether 5 of its points satisfy the problem criteria; if so, stop and output the solution. (I could specify this step in more detail, but since (a) such an algorithm clearly exists and (b) its asymptotic runtime is O(1), I won't go into it.)
If after iterating the entries in B no solution is found then output NULL.
The construction of B involves just a single pass over P and hence is O(N). B has no more than N elements, so iterating it is O(N). The algorithm for each element in B considers no more than 16 points and hence does not depend on N and is O(1), so the overall solution meets the O(N) target.
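A Python sketch of this tiling idea (names are mine; as a precaution it also roots the 2x2 blocks at tiles just to the left of or below a non-empty tile, which preserves the O(n) bound):
from collections import defaultdict

def square_with_five(points, R):
    # Bucket points into R x R tiles: B[(i, j)] holds the points with
    # floor(x/R) = i and floor(y/R) = j.
    B = defaultdict(list)
    for x, y in points:
        i, j = int(x // R), int(y // R)
        B[(i, j)].append((x, y))
        if len(B[(i, j)]) >= 5:        # a full tile is itself an answer
            return (i * R, j * R)
    # A side-R square spans at most a 2x2 block of tiles. Root candidate
    # blocks at every non-empty tile and at its left/lower neighbours.
    roots = {(i + di, j + dj) for (i, j) in list(B)
             for di in (-1, 0) for dj in (-1, 0)}
    for i, j in roots:
        group = (B[(i, j)] + B[(i + 1, j)] +
                 B[(i, j + 1)] + B[(i + 1, j + 1)])   # at most 16 points
        # A witness square can be slid left/down until its edges touch
        # points, so trying every pair of point coordinates suffices.
        for u, _ in group:
            for _, v in group:
                if sum(u <= x <= u + R and v <= y <= v + R
                       for x, y in group) >= 5:
                    return (u, v)
    return None

print(square_with_five([(0.5, 0.5), (1.0, 1.0), (1.2, 0.2),
                        (0.1, 1.3), (1.4, 1.4), (5.0, 5.0)], 2))   # (0, 0)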
Run through the set once, keeping the 5 largest x values in a (sorted) local array. Maintaining the sorted local array is O(N) (constant work performed at most N times).
Define xMax and xMin as the x-coordinates of the points with the largest and 5th-largest x values respectively (i.e. a[0] and a[4]).
Sort a[] again by y value, and set yMax and yMin likewise, again in constant time.
Define deltaX = xMax - xMin, deltaY = yMax - yMin, and R = the larger of deltaX and deltaY.
The square of side length R located with upper-right at (xMax,yMax) meets the criteria.
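A sketch of that pass in Python, with heapq.nlargest standing in for the sorted local array (names are mine):
import heapq

def square_for_five(points):
    # a[] is the 5 points with the largest x values (ties broken by y);
    # R is sized so the square at (xMax, yMax) covers all five of them.
    a = heapq.nlargest(5, points)
    xMax, xMin = a[0][0], a[4][0]
    ys = sorted(p[1] for p in a)
    yMax, yMin = ys[-1], ys[0]
    R = max(xMax - xMin, yMax - yMin)
    return R, (xMax, yMax)               # upper-right corner of the square

print(square_for_five([(0, 0), (1, 2), (2, 1), (3, 3), (4, 0), (9, 9)]))
# (9, (9, 9)): the square [0,9] x [0,9] contains the five rightmost points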
Observation if R is fixed in advance:
O(N) complexity means no sort is allowed except on a fixed number of points: only a radix sort would meet the criteria, and it requires a bound on the values of xMax-xMin and of yMax-yMin, which was not provided.
Perhaps the trick is to start with the point furthest down and left, and move up and right. The lower-left-most point can be determined in a single pass of the input.
Moving up and right in steps and counting the points in the square requires sorting the points on X and Y in advance, which can only be done in O(N) time if the radix sort constraint is met.

Algorithm for polygon with weight on vertices and operations on edges

I am thinking about an algorithm for the following problem (found on CareerCup):
Given a polygon with N vertices and N edges. There is an integer (possibly negative) on every vertex and an operation from the set {*, +} on every edge. In each step, we remove an edge E from the polygon and merge the two vertices it links (V1, V2) into a new vertex with value V1 op(E) V2. The last case is two vertices joined by two edges; the result is the bigger one.
Return the max result value that can be obtained from a given polygon.
I think we can just use a greedy approach, i.e. for a polygon with k edges find the pair (p, q) of adjacent vertices which produces the maximum value when collapsed: (p, q) = argmax { i op(e) j : i, j adjacent vertices }
Then just call a recursion on polygons:
1. Let function CollapseMaxPair( P(k) ) take a polygon with k edges and return the 'collapsed' polygon with k-1 edges
2. Then our recursion:
P = P(N);
Repeat until two edges are left
P = CollapseMaxPair( P )
maxvalue = max( two remaining values )
What do you think?
I have answered this question here: Google Interview : Find the maximum sum of a polygon and it was pointed out to me that that question is a duplicate of this one. Since no one has answered this question fully yet, I have decided to add this answer here as well.
As you have identified (tagged) correctly, this is indeed very similar to the matrix chain multiplication problem (in what order do I multiply matrices in order to do it quickly).
This can be solved in polynomial time using dynamic programming.
I'm going to instead solve a similar, more classic (and essentially identical) problem: given a formula with numbers, additions and multiplications, what way of parenthesizing it gives the maximal value? For example,
6+1*2 becomes (6+1)*2, which is more than 6+(1*2).
Let us denote our input as a1 to an (real numbers) and o(1),...,o(n-1) (each either * or +). Our approach works as follows: we consider the subproblem F(i,j), the maximal value (over all parenthesizations) of the sub-formula ai,...,aj. We will create a table of such subproblems and observe that F(1,n) is exactly the result we are looking for.
Define F(i,j):
- If i > j, return 0 // no sub-formula of negative length
- If i = j, return ai // the maximal formula for one number is the number itself
- If i < j, return the maximal value, over all m between i (included) and j (not included), of:
    F(i,m) o(m) F(m+1,j) // check all places for a possible parenthesis insertion
This goes through all possible options. The proof of correctness is by induction on the size n = j-i and is pretty straightforward. (One caveat: since the numbers can be negative, the table should really store the minimal value of each sub-formula as well, because the product of two minima can turn out to be the maximum.)
Let's go through the runtime analysis:
If we do not save the values of smaller subproblems, this runs quite slowly; with memoization, however, the algorithm runs in O(n^3).
We create an n*n table T in which the cell at index i,j contains F(i,j). Filling F(i,i), and F(i,j) for j smaller than i, is done in O(1) per cell since we can calculate these values directly. Then we go diagonal by diagonal: we fill F(i,i+1) for all i (which we can do quickly, since we already know all the previous values needed by the recursive formula), and we repeat this for all n diagonals of the table. Filling each cell takes O(n), and each diagonal has O(n) cells, so we fill each diagonal in O(n^2) and the whole table in O(n^3). After filling the table we obviously know F(1,n), which is the solution to your problem.
Now back to your problem
If you translate the polygon into n different formulas (one starting at each vertex, by cutting the cycle there) and run the algorithm for formula values on each of them, the maximum over them is exactly the value you want.
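A Python sketch of the whole scheme via memoized recursion. Per the caveat above, each cell stores a (min, max) pair because the numbers may be negative; the polygon wrapper follows the cut-the-cycle reduction just described (all names are mine):
from functools import lru_cache

def best_value(nums, ops):
    # max value of nums[0] ops[0] nums[1] ... nums[-1] over all parenthesizations
    @lru_cache(maxsize=None)
    def solve(i, j):   # returns (min, max) achievable for nums[i..j]
        if i == j:
            return nums[i], nums[i]
        lo, hi = float('inf'), float('-inf')
        for m in range(i, j):   # outermost split between positions m and m+1
            lmin, lmax = solve(i, m)
            rmin, rmax = solve(m + 1, j)
            op = (lambda a, b: a + b) if ops[m] == '+' else (lambda a, b: a * b)
            for v in (op(lmin, rmin), op(lmin, rmax),
                      op(lmax, rmin), op(lmax, rmax)):
                lo, hi = min(lo, v), max(hi, v)
        return lo, hi
    return solve(0, len(nums) - 1)[1]

def polygon_max(values, edge_ops):
    # edge_ops[i] is the operation on the edge joining vertex i and
    # vertex (i+1) % n; cutting the cycle at each edge gives n chains.
    n = len(values)
    return max(best_value(tuple(values[(s + i) % n] for i in range(n)),
                          tuple(edge_ops[(s + i) % n] for i in range(n - 1)))
               for s in range(n))

print(best_value((6, 1, 2), ('+', '*')))   # 14, i.e. (6+1)*2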
Here's a case where your greedy algorithm fails:
Imagine your polygon is a square with vertices A, B, C, D (top left, top right, bottom right, bottom left), with every edge operation being +. This gives us the edges (A,B), (A,D), (B,C), and (C,D).
Let the weights be A=-1, B=-1, C=-1, and D=1,000,000.
A (-1) ---------- B (-1)
|                 |
|                 |
|                 |
D (1000000) ----- C (-1)
Clearly, the best strategy is to collapse (A,B) and then (B,C), so that you may end up with D by itself. Your algorithm, however, will start with either (A,D) or (D,C), which is not optimal.
A greedy algorithm that combines the min sums has a similar weakness, so we need to think of something else.
I'm starting to see how we want to try to get all positive numbers together on one side and all negatives on the other.
If we think of the initial polygon as a state, then all the possible child states are the subsequent graphs where one edge has been collapsed. This creates a tree-like structure. A BFS or DFS would eventually give us an optimal solution, but at the cost of traversing the entire tree in the worst case, which is probably not as efficient as you'd like.
What you are looking for is a greedy best-first approach to searching down this tree that is provably optimal. Perhaps you could create an A*-like search through it, although I'm not sure what your admissible heuristic would be.
I don't think the greedy algorithm works. Let the vertices be A = 0, B = 1, C = 2, and the edges be AB = a - 5b, BC = b + c, CA = -20. The greedy algorithm selects BC to evaluate first, value 3; then AB, value -15. However, there is a better sequence: evaluate AB first, value -5, then BC, value -3. I don't know of a better algorithm, though.

Optimization problem - finding a maximum

I have a problem at hand which can be reduced to something like this :
Assume a bunch of random points in a two-dimensional X-Y plane, where several points may share the same X and several points may share the same Y.
Whenever a point (Xi,Yi) is chosen, no other point with X = Xi or Y = Yi can be chosen. We have to choose the maximum number of points.
This reduces to a simple maximum flow problem. If you have a point (xi, yi), the graph contains a path from the source S to the node xi, from xi to yi, and from yi to the last node (the sink) T.
Note that if we have the points (2, 2) and (2, 5), there is still only one edge from S to x2. All edges have capacity 1.
The maximum flow in this network is the answer.
About the general problem:
http://en.wikipedia.org/wiki/Max_flow
update
I don't have a graphics editor at hand to visualise the problem, but you can easily draw the example by hand. Let's say the points are (3, 3), (3, 5), (2, 5).
Then the edges (paths) would be
S -> x2, S -> x3
y3 -> T, y5 -> T
x3 -> y3, x3 -> y5, x2 -> y5
Flow: S -> x2 -> y5 -> T and S -> x3 -> y3 -> T
The amount of 'water' going from source to sink is 2 and so is the answer.
Also there's a tutorial about max flow algorithms
http://www.topcoder.com/tc?module=Static&d1=tutorials&d2=maxFlow
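For completeness, here is a small self-contained Edmonds-Karp in Python, run on the example above; the node names mirror the S -> x -> y -> T construction and every edge has capacity 1:
from collections import defaultdict, deque

def max_flow(cap, s, t):
    # Edmonds-Karp: repeatedly push flow along a shortest augmenting path.
    flow = 0
    while True:
        parent = {s: None}
        q = deque([s])
        while q and t not in parent:   # BFS in the residual graph
            u = q.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    q.append(v)
        if t not in parent:
            return flow
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(cap[u][v] for u, v in path)   # bottleneck (1 here)
        for u, v in path:
            cap[u][v] -= push
            cap[v][u] += push
        flow += push

cap = defaultdict(lambda: defaultdict(int))
for x, y in [(3, 3), (3, 5), (2, 5)]:
    cap['S']['x%d' % x] = 1        # duplicate x's still give one S -> x edge
    cap['x%d' % x]['y%d' % y] = 1
    cap['y%d' % y]['T'] = 1

print(max_flow(cap, 'S', 'T'))     # 2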
Isn't this just the Hungarian algorithm?
Create an n×n matrix, with 0 at marked vertices and 1 at unmarked vertices. The algorithm will choose n vertices, one for each row and column, which minimize their sum. Simply count the chosen vertices which equal 0, and you have your answer.
from munkres import Munkres

matrix = [[0, 0, 1],
          [0, 1, 1],
          [1, 0, 0]]
m = Munkres()
total = 0
for row, column in m.compute(matrix):
    if matrix[row][column] == 0:
        print('(%i, %i)' % (row, column))
        total += 1
print('Total: %i' % total)
This runs in O(n³) time, where n is the number of rows in the matrix. The maximum flow solution runs in O(V³), where V is the number of vertices. As long as there are more than n chosen intersections, this runs faster; in fact, it runs orders of magnitude faster as the number of chosen vertices goes up.
Different solution. It turns out that there's a lot of symmetry, and the answer is a lot simpler than I originally thought. The maximum number of points you can ever choose is the minimum of the number of unique X's and unique Y's, which is computable in O(N log N) if you just want the count.
Every other shape is equivalent to a rectangle that contains the points, because it doesn't matter how many points you pull from the middle of a rectangle: the order never matters (if handled as below). Any shape you pluck a point from now has one fewer unique X and one fewer unique Y, just like a rectangle.
So the optimal solution has nothing to do with connectedness. Pick any point that is on the edge of the smallest dimension (i.e. if len(unique-Xs) > len(unique-Ys), pick anything that has either the maximum or the minimum X). It doesn't matter how many connections it has, just which dimension is bigger, and that can easily be determined while looking at the sorted unique lists created above. If you keep a unique-x and a unique-y counter and decrement them when you delete all the nodes at that unique value, then each deletion is O(1) and recalculating the lengths is O(1). Repeating this N times is at worst O(N), so the final complexity is O(N log N) (due solely to the sorting).
You can pick any point along the shortest edge because:
if there's only one on that edge, you better pick it now or something else will eliminate it
if there's more than one on that edge, who cares, you will eliminate all of them with your pick anyways
Basically, you're maximizing "max(uniqX,uniqY)" at each point.
Update: IVlad caught an edge case:
If the dimensions are equal, take the edge with the fewest points. Even if they aren't equal, take the top or the bottom of the unique-stack you're eliminating from that has the fewest points.
Case in point:
Turn 1:
Points: (1, 2); (3, 5); (10, 5); (10, 2); (10, 3)
There are 3 unique X's: 1, 3, 10
There are 3 unique Y's: 2, 3, 5
The "bounding box" is (1,5),(10,5),(10,2),(1,2)
Reaction 1:
The "outer edge" (outermost uniqueX or uniqueY lists of points) that has the least points is the left. Basically, look at the sets of points in x=1,x=10 and y=2,y=5. The set for x=1 is the smallest: one point. Pick the only point for x=1 -> (1,2).
That also eliminates (10,2).
Turn 2:
Points: (3, 5); (10, 5); (10, 3)
There are 2 unique X's: 3, 10
There are 2 unique Y's: 3, 5
The "bounding box" is (3,5),(10,5),(10,3),(3,3)
Reaction 2:
The "edge" of the bounding box that has the least is either the bottom or the left. We reached the trivial case - 4 points means all edges are the outer edges. Eliminate one. Say (10,3).
That also eliminates (10,5).
Turn 3:
Points: (3, 5)
Reaction 3:
Remove (3,5).
For each point, identify the number N of other points that would be disqualified by selecting that point (i.e. the points with the same X or Y value). Then iterate over the not-yet-disqualified points in order of increasing N. When you are finished, you will have chosen the maximum number of points.
The XY plane is a red herring. Phrase it as a set of elements, each of which has a set of mutually exclusive elements.
The algorithm then becomes a depth-first search. At each level, for each candidate node, calculate the set of excluded elements: the union of the currently excluded elements with the elements excluded by the candidate node. Try candidate nodes in order of fewest excluded elements to most. Keep track of the best solution so far (the one excluding the fewest nodes), and prune any subtree that is worse than the current best.
As a slight improvement, at the cost of possibly missing solutions, you can use Bloom filters to keep track of the excluded sets.
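A bare-bones Python sketch of that search, using only the simple 'chosen + remaining <= best' bound as pruning (no Bloom filters; names are mine):
def max_points(points):
    best = 0
    def dfs(chosen, candidates):
        nonlocal best
        best = max(best, chosen)
        if chosen + len(candidates) <= best:
            return        # even choosing every candidate can't beat the best
        # try the candidates that exclude the fewest other points first
        for p in sorted(candidates,
                        key=lambda p: sum(q[0] == p[0] or q[1] == p[1]
                                          for q in candidates)):
            # recurse on the candidates not excluded by choosing p
            dfs(chosen + 1, [q for q in candidates
                             if q[0] != p[0] and q[1] != p[1]])
    dfs(0, list(points))
    return best

print(max_points([(1, 1), (1, 2), (2, 1), (2, 2)]))   # 2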
This looks like a problem that can be solved with dynamic programming. Look into the algorithms for longest common substring, or the knapsack problem.
Based on a recommendation from IVlad, I looked into the Hopcroft–Karp algorithm. It's generally better than both the maximum flow algorithm and the Hungarian algorithm for this problem, often significantly. Some comparisons:
In general:
Max Flow: O(V³) where V is the number of vertices.
Hungarian: O(n³) where n is the number of rows in the matrix
Hopcroft-Karp: O(V √(2V)) where V is the number of vertices.
For a 50×50 matrix, with 50% chosen vertices:
Max Flow: 1,250³ = 1,953,125,000
Hungarian: 50³ = 125,000
Hopcroft-Karp: 1,250 √2,500 = 62,500
For a 1000×1000 matrix, with 10 chosen vertices:
Max Flow: 10³ = 1,000
Hungarian: 1,000³ = 1,000,000,000
Hopcroft-Karp: 10 √20 ≅ 44.7
The only time the Hungarian algorithm is better is when there is a significantly high proportion of points selected.
For a 100×100 matrix, with 90% chosen vertices:
Max Flow: 9,000³ = 729,000,000,000
Hungarian: 100³ = 1,000,000
Hopcroft-Karp: 9,000 √18,000 ≅ 1,207,476.7
The Max Flow algorithm is never better.
It's also quite simple, in practice. This code uses an implementation by David Eppstein:
points = {
    0: [0, 1],
    1: [0],
    2: [1, 2],
}
selected = bipartiteMatch(points)[0]
for x, y in selected.items():
    print('(%i, %i)' % (x, y))
print('Total: %i' % len(selected))
