Split the set of points, algorithm


I'm studying for a test and I'm stuck on this problem:
We are given a set of n < 1000 points and an integer d. The task is to split the points into two disjoint sets A_1 and A_2 (whose union is the given set of n points) in such a way that the Euclidean distance between each pair of points from A_i (for i = 1 or 2) is less than or equal to d. If this is not possible, print -1.
For example, input:
d = 3, and points:
(5,3), (1,1), (4,2), (1,3), (5,2), (2,3), (5,1)
we can create:
A_1 = { (2,3), (1,3), (1,1) }
A_2 = { (5,3), (4,2), (5,2), (5,1) }
since every pair of points from A_i (for i = 1 or 2) is close enough.
I really want to know how to solve it, but I have no idea how. Since n is small, the algorithm can even be O(n^2 log n), but I don't know how to start. I was thinking of computing the distance between each pair of points, then taking the two points with maximum distance and placing them in different sets (if their distance is greater than d), and then repeating this step for the rest of the pairs. The problem is how to decide where I can legally put the next points. Can anyone help with this algorithm?

Let's consider a simple graph with n nodes (corresponding to the n points). Two nodes are connected if the distance between the two corresponding points is greater than d.
If it is possible to create the two disjoint sets, the graph must be bipartite (if some nodes are not connected to any others, they can be put in either set). Conversely, neither side of a bipartition contains an edge, so each side is a valid set.
Thus, we only need to test the bipartiteness of the graph, which is simple:
http://en.wikipedia.org/wiki/Bipartite_graph#Testing_bipartiteness
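A minimal sketch of this idea in Python, assuming the points are integer (x, y) tuples. The name `split_points` and the choice to return `None` instead of printing -1 are illustrative, not part of the answer above. It 2-colors the "too far apart" graph with a breadth-first search and compares squared distances to avoid floating-point rounding.

```python
from collections import deque

def split_points(points, d):
    """Return (A_1, A_2), or None when no valid split exists."""
    n = len(points)

    def far(i, j):
        # Edge of the conflict graph: the two points are more than d apart.
        dx = points[i][0] - points[j][0]
        dy = points[i][1] - points[j][1]
        return dx * dx + dy * dy > d * d

    side = [None] * n                    # 0 or 1 once a point is colored
    for start in range(n):
        if side[start] is not None:
            continue
        side[start] = 0                  # a new component may start on either side
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in range(n):
                if not far(u, v):
                    continue
                if side[v] is None:
                    side[v] = 1 - side[u]
                    queue.append(v)
                elif side[v] == side[u]:
                    return None          # odd cycle: the graph is not bipartite
    a1 = [p for p, s in zip(points, side) if s == 0]
    a2 = [p for p, s in zip(points, side) if s == 1]
    return a1, a2
```

With n < 1000 the O(n^2) scan over all pairs is well within the limits mentioned in the question. Note that one of the two sets comes back empty when no pair of points is more than d apart.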

Think of an array with all the points across the top and all the points down the side.
Fill in the array with a zero in any cell if the two points (left and top) that define the cell are more than d apart and a one if they are at most d apart.
       (5,3) (1,1) (4,2) (1,3) (5,2) (2,3) (5,1)
(5,3)    -     0     1     0     1     1     1
(1,1)    0     -     0     1     0     1     0
(4,2)    1     0     -     0     1     1     1
(1,3)    0     1     0     -     0     1     0
(5,2)    1     0     1     0     -     0     1
(2,3)    1     1     1     1     0     -     0
(5,1)    1     0     1     0     1     0     -
(Note: The matrix is symmetric, so the lower triangle holds the same 0s and 1s as the upper triangle, mirrored across the diagonal.)
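For reference, the 0/1 array can be generated rather than filled in by hand. This is a small sketch, assuming integer (x, y) tuples; `closeness_matrix` is a made-up name, and the diagonal is stored as `None` because the answer ignores it.

```python
def closeness_matrix(points, d):
    """m[i][j] is 1 if points i and j are at most d apart, otherwise 0."""
    n = len(points)
    m = [[None] * n for _ in range(n)]   # diagonal stays None (ignored)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            dx = points[i][0] - points[j][0]
            dy = points[i][1] - points[j][1]
            # Compare squared values so no square root is needed.
            m[i][j] = 1 if dx * dx + dy * dy <= d * d else 0
    return m
```

For the example points and d = 3, the first row (the one for (5,3)) reads 0 1 0 1 1 1 after the diagonal cell.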
Ignore the diagonal. Pay attention to the top-right triangular section.
Skip the 0th column.
Start with the 1st column and, if it doesn't have a 1 in the top row, swap it with another column to its right that has a 1 in the top row. Then swap the same rows too to keep the diagonal blank. (If there isn't one, there is no solution.) [Example: Swap column 2 and 3 and row 2 and 3] Note that the choice of this row may become an optimizing factor.
Move to the next column and if it doesn't have a 1 in the top row, swap with a column to the right that does and swap the corresponding rows. If there is not one, try swapping it with a row below it that has a 1 in that column and do the corresponding column. The rows below it should be ignored if below the diagonal.
We are collecting points in the top left corner of the triangle that have 1's in them. These points can all go in one of the collections.
This is where I get lost in doing this in my head but you have to do a similar process starting in the bottom right corner of the triangle and collecting points that will be in the other collection. Swap rows and corresponding columns to collect 1s in the bottom right corner of the triangle.
Once you have swapped enough rows that you can form a rectangle in the top right corner--a true rectangle without the bottom left corner cut off--and that rectangle contains all the 0's, you have a solution. If you can't do that, there is no solution.
There is a comparison of the lowest row with a 1 in the triangle and the rightmost column with a 1 in the triangle and the cell where they meet. That cell has to be in the triangle for a solution to exist.
(I left you a big "to-do" to find how to interleave the swaps of rows and columns to clean the 0's out of the top-left and bottom-right corners of the triangle. Maybe a discussion here can resolve how to make it work. Or find out it won't work.)

Starting with a distance matrix seems to be a good idea. Then pick every entry in this distance matrix that is greater than d. Such an entry means that the corresponding points have to be in different sets.
Start with two empty sets and iterate all relevant entries ( > d).
If sets are empty, put the two points into them. Otherwise there are three options:
If it is clear which set the points belong to, put them into them (that means, inserting the point preserves the max-distance criterion, which can be obtained from the distance matrix).
If the points cannot be inserted into one of the sets, the problem is not solvable.
If both sets are possible for both points, we have a problem. I would suggest starting a new pair of point sets and putting the points into them. Then if a subsequent point pair is unclear again, check for the second set pair. If it is yet unclear, check for a third set pair and so on. If a point pair is inserted into a previous set, check the following sets, if points are now clear. At the end you have a list of pairs of sets of points, which can be united as you wish.
I just had an idea for a second approach, which is similar, but should be a bit faster.
We also start with the distance matrix.
In addition to the two sets, we maintain a queue (or stack) of newly added entries for each set.
So if we pick the first relevant entry from the distance matrix, its two points are added to the queues, one point per queue. As long as there is an entry in one of the queues, do the following:
Remove the point from the queue and insert it into the set. If the insertion breaks the max-distance criterion, the problem is not solvable. Examine the row or column in the distance matrix for this very point and extract every relevant entry in this row/column. Add the partner point to the queue of the other set (because this has to be in a different set).
If both queues are empty, add the next relevant entry that has not yet been visited to the queues and start over.
This algorithm has the advantage that the points are processed in the order they can impact each other. Therefore, there is no need for more than one pair of sets.
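The second approach could be sketched as follows. This is only an illustration under a few assumptions: points are integer (x, y) tuples, `split_by_queues` is a made-up name, `None` stands for "not solvable", and points that never appear in a relevant entry are simply put into the first set at the end.

```python
from collections import deque

def split_by_queues(points, d):
    """Return the two point lists, or None if the problem is not solvable."""
    n = len(points)

    def far(i, j):
        # A "relevant entry": the distance between i and j is greater than d.
        dx = points[i][0] - points[j][0]
        dy = points[i][1] - points[j][1]
        return dx * dx + dy * dy > d * d

    sets = (set(), set())
    queues = (deque(), deque())
    for i in range(n):
        for j in range(i + 1, n):
            if not far(i, j) or i in sets[0] or i in sets[1]:
                continue
            # Unvisited relevant entry: seed both queues and start over.
            queues[0].append(i)
            queues[1].append(j)
            while queues[0] or queues[1]:
                k = 0 if queues[0] else 1
                p = queues[k].popleft()
                if p in sets[k]:
                    continue                      # already placed here
                if p in sets[1 - k] or any(far(p, q) for q in sets[k]):
                    return None                   # max-distance criterion broken
                sets[k].add(p)
                for q in range(n):
                    if far(p, q) and q not in sets[1 - k]:
                        queues[1 - k].append(q)   # partner must go to the other set
    placed = sets[0] | sets[1]
    sets[0].update(i for i in range(n) if i not in placed)
    return ([points[i] for i in sorted(sets[0])],
            [points[i] for i in sorted(sets[1])])
```

The explicit check against the members of the target set mirrors the description above; it costs O(n) per insertion, so the whole run stays within O(n^2) distance evaluations.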

Related

Intersection of n rectangles - Maximum number of regions where exactly k rectangles intersect

Imagine n axis-aligned rectangles (each specified by its position (x,y), width and height). The rectangles are aligned in a way so that the i-th rectangle necessarily intersects with the (i+1)-th. For example, let n = 3; then 1 necessarily intersects with 2 and 2 with 3. It is important to mention that this is not transitive; 3 can intersect with 1 but there is no guarantee (see figure for two valid alignment examples).
What I'm now looking for is the maximum possible number of regions where exactly k = 2,...,n rectangles intersect with each other (these regions are shown in the figure). In other words, I'm looking for a worst-case alignment of n rectangles so that the number of regions where exactly k rectangles intersect reaches its maximum. Theoretically, the maximum possible number of regions where exactly k rectangles intersect is n over k (the binomial coefficient). However, this formula is geometrically valid only for n < 4, as it is not possible to align (and to draw) rectangles for n >= 4 so that in the worst case n over k regions exist where exactly k rectangles intersect.
The first sub-image of the figure shows the worst-case alignment for n = 3. There are 3 over 2 = 3 regions where exactly two rectangles intersect and 3 over 3 = 1 region where exactly three rectangles intersect. The second sub-image also shows a valid alignment for three rectangles; however, this is not a worst-case alignment as, for example, there is no region where exactly three rectangles intersect.

A WRONG answer; not removed only because of the approach that may or may not be useful.
The geometric data -- which rectangles intersect -- can be abstracted away: all that matters is the following property:
Property P: If rectangles i and j intersect that implies that i intersects with i+1,...,j-1 as well.
If your representation of the problem encodes P it doesn't matter anymore that you started with the rectangles.
Now, how do we keep a record of which rectangles intersect? One way would be a graph with the rectangles as nodes and the intersections as edges, but that isn't very useful because the above property P is not evident in a graph. A better way would be to set up the following matrix:
Represent i-th rectangle with the i-th row of a matrix A that has 0s until the entry A(i,i), 1s from A(i,i) to A(i,i+m), where i+m is the index of the furthest rectangle that intersects with rectangle i. That is, A has n rows, one per the original rectangle, it consists of 0s and 1s, and A(i,j) for j>i is 1 if and only if rectangles i and j intersect. For j
Now, what does it mean that we have an area of exactly k intersecting rectangles? I claim that the above matrix represents that by a column that has exactly k 1s. Why? Suppose that your area is an intersection of rectangles i+1,...,i+k. Take a look at the matrix entry A(i+k,i+k). The column above it has 1s in rows from i+1 to i+k and 0s otherwise.
The above matrix looks superficially similar to a Young skew tableau, thus the comment. But yes, the similarity is superficial because it doesn't originate from a partition.
Now it remains to maximize the number of columns in A that have exactly k 1s. I think the best one would be a matrix with exactly k 1s in each row, which would give n as the answer to the original problem. The answer is obviously wrong, so I'm missing something here. Aaaaah!

Build a matrix where the cell Aij is 1 if rectangles i and j intersect or 0 if they don't. This matrix is symmetric.
Now notice that many 1's close to the diagonal mean more intersections between contiguous-aligned rectangles.
The "worst" case, where k, the number of rectangles that intersect each other, is maximum, is represented in the matrix by k contiguous 1's, with no 0 in between. You can consider the elements after the diagonal. This way the property "rectangle i intersects rectangle j" is fulfilled.
The problem is how to swap rows and columns to achieve this result. It may be a matrix bandwidth reduction problem.
See, for example, this and this.
You can also devise your own algorithm for your special case. Be aware that this may be an NP-hard problem, and that several solutions may exist.

Two dimensional array scan algorithm

I have been given a question about scanning a two-dimensional array that represents a garden. You can step on the grass only if it has been cut down and is not too high. Cut grass is represented by the number 1. High grass is represented by a number bigger than 1; the bigger the number, the higher the grass, and heights are unique. The garden can also contain ant colonies, which are represented by 0. You can't step on an ant colony, no matter what.
Your goal is to cut down all the grass to level 1, but you must cut the smallest grass first before you cut any bigger grass. You start from any corner of the garden you choose, as long as you don't stand on an ant colony.
Once you cut a patch of grass, it becomes 1, which means you can now step on it. Remember, you can't step on grass bigger than 1.
Edit:
- heights are unique
The algorithm should return the number of steps made (or -1 if the task is impossible). Obviously, the fewer steps the better, and you can't go off the board.
Example:
this matrix
[1,1,1,0]
[1,0,2,1]
[1,0,3,1]
output: 3, because you start from the bottom-right corner, then go up, then left (chop the grass), and then down (chop the grass again).
Suggested solution:
Use some kind of flood fill algorithm (recursion in all directions) together with a precomputed data structure, such as a min heap, that holds the smallest remaining grass height. Without a precomputed min heap we can never know whether we are allowed to cut a given patch of grass. We take the minimum number from the heap and start searching for it in the matrix. From every cell we encounter, we go in all directions to search for the number we want.
This solution is obviously the worst, but it solves the problem. I was just wondering if someone has a better one. I can imagine some dynamic programming solution maybe, not sure. Hell =D

An algorithm that finds the shortest path (with the minimum number of steps):
Collect all cells with height > 1 and sort them by height in increasing order. (They are all unique).
Add the starting cell to the beginning of the sorted collection of cells.
Iterate through the collection and find the shortest path between the current cell and the next cell in the collection, assuming that all cells with higher heights are the ant colonies (cannot be visited). This can be done with BFS. Example:
1 2 4
1 3 0
1 1 1
On the first iteration, we need to find the shortest path between bottom-right corner and cell with height = 2. We should run BFS in the 'virtual garden' where all cells with height > 2 are impossible to go through:
1 2 0
1 0 0
1 1 1
Note that you need not actually change the higher cells to zero; just change the passability condition in the BFS.
Join all found shortest paths.
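The steps above can be sketched in Python as follows. The function names (`bfs_steps`, `min_steps`, `min_steps_any_corner`) are made up for illustration, the garden is assumed to be a non-empty rectangular list of lists, and the starting corner is assumed to be walkable from the start (not an ant colony and not grass taller than the first target).

```python
from collections import deque

def bfs_steps(garden, src, dst, limit):
    """Fewest steps from src to dst over cells with 0 < height <= limit, or -1."""
    rows, cols = len(garden), len(garden[0])

    def ok(r, c):
        # Taller grass is treated like an ant colony, without editing the grid.
        return 0 <= r < rows and 0 <= c < cols and 0 < garden[r][c] <= limit

    if not ok(*src):
        return -1
    dist = {src: 0}
    queue = deque([src])
    while queue:
        cur = queue.popleft()
        if cur == dst:
            return dist[cur]
        r, c = cur
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if ok(*nxt) and nxt not in dist:
                dist[nxt] = dist[cur] + 1
                queue.append(nxt)
    return -1

def min_steps(garden, start):
    """Total steps to cut all grass in increasing height order from start."""
    targets = sorted((h, r, c) for r, row in enumerate(garden)
                     for c, h in enumerate(row) if h > 1)
    total, pos = 0, start
    for h, r, c in targets:
        steps = bfs_steps(garden, pos, (r, c), h)
        if steps < 0:
            return -1
        total += steps
        pos = (r, c)
    return total

def min_steps_any_corner(garden):
    """Try every corner as the starting cell and keep the best result."""
    rows, cols = len(garden), len(garden[0])
    corners = {(0, 0), (0, cols - 1), (rows - 1, 0), (rows - 1, cols - 1)}
    results = [s for s in (min_steps(garden, c) for c in corners) if s >= 0]
    return min(results) if results else -1
```

Because the cutting order is forced by the heights, summing the individual shortest paths gives the overall minimum for a fixed start. Each BFS is O(R*C), so the total is O(k * R*C) for k grass cells.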

Recursively compute closest pairs

I am trying to perform the closest pairs algorithm - there is one area however where I am totally stuck.
The problem can be solved in O(n log n) time using the recursive divide and conquer approach, e.g., as follows:
1) Sort points according to their x-coordinates.
2) Split the set of points into two equal-sized subsets by a vertical line x=xmid.
3) Solve the problem recursively in the left and right subsets. This yields the left-side and right-side minimum distances dLmin and dRmin, respectively.
4) Find the minimal distance dLRmin among the set of pairs of points in which one point lies on the left of the dividing vertical and the second point lies to the right.
5) The final answer is the minimum among dLmin, dRmin, and dLRmin.
My problem is with Step 3: Let's say that you've split your 8 element array into two halves, and on the left half you have 4 points - 1,2,3 and 4. Let's also say that points 2 and 3 are the closest pair among those 4 points. Well, if you keep recursively dividing this subset into halves, you will eventually end up calculating the min between 1-2 (on the left), you will calculate the min between 3-4 (on the right), and you will return the minimum-distance pair from those two pairs..
HOWEVER - you've totally missed the distance between 2-3, as you've never calculated it since they were on opposite sides... so how is this issue solved? Notice how Step 4 comes AFTER this recursive process, it seems to be an independent step and only applies to the end result after Step 3, and not to every subsequent division of the subarrays that occurs within Step 3.. is it? Or is there another way of doing this that I'm missing?

The steps are a bit misleading, in that steps 2-5 are all part of the recursion. At every level of recursion, you need to calculate dLmin, dRmin, and dLRmin. The minimum of these is returned as the answer for that level of recursion.
To use your example, at the level that handles points 1 to 4 you would calculate dLmin as the distance between points 1 and 2, dRmin as the distance between points 3 and 4, and then dLRmin as the distance between points 2 and 3. Since dLRmin is the smallest in your hypothetical case, it would be returned.
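To make the structure concrete, here is a rough Python sketch in which the cross-boundary check (step 4) runs inside every recursive call. The names are illustrative, and for brevity the strip is re-sorted by y at each level, which makes this variant O(n log^2 n) rather than the O(n log n) quoted in the question. It needs Python 3.8+ for `math.dist`.

```python
import math

def closest_pair(points):
    """Smallest distance between any two of the given (x, y) points."""
    return _solve(sorted(points))            # step 1: sort by x once

def _solve(pts):
    n = len(pts)
    if n <= 3:                               # small case: brute force
        return min((math.dist(p, q) for i, p in enumerate(pts)
                    for q in pts[i + 1:]), default=math.inf)
    mid = n // 2
    x_mid = pts[mid][0]                      # step 2: dividing vertical line
    # Step 3: dLmin and dRmin come from the two recursive calls.
    d = min(_solve(pts[:mid]), _solve(pts[mid:]))
    # Step 4, done at EVERY level: only points closer than d to the line matter.
    strip = sorted((p for p in pts if abs(p[0] - x_mid) < d),
                   key=lambda p: p[1])
    for i, p in enumerate(strip):
        for q in strip[i + 1:]:
            if q[1] - p[1] >= d:
                break                        # points further up cannot beat d
            d = min(d, math.dist(p, q))
    return d                                 # step 5: min of all three
```

In the 8-point scenario from the question, the call that receives points 1 to 4 is the one that compares points 2 and 3.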

All points with minimum Manhattan distance from all other given points [Optimized]

The problem here is to find the set of all integer points that give the minimum sum of Manhattan distances to a given set of points.
For example:
Let's have a given set of points { P1, P2, P3...Pn }
The basic problem is to find a point, say X, which has the minimum sum of distances to the points { P1, P2, P3... Pn },
i.e. |P1-X| + |P2-X| + .... + |Pn-X| = D, where D is the minimum over all X.
Moving a step further, there can be more than one value of X satisfying the above condition, i.e. more than one X may give the same value D. So, we need to find all such X.
One basic approach that anyone can think of is to find the median of the inputs and then brute force the coordinates, as mentioned in this post.
But the problem with such an approach is: if the median gives two values which are very far apart, then we end up brute forcing all the points in between, which will never run in the given time.
So, is there any other approach which would give the result even when the points are very far apart (where the median gives a range of the order of 10^9)?

You can consider X and Y separately, since they add to the distance independently of each other. This reduces the question to finding, given n points on a line, a point with the minimum sum-of-distances to the other points. This is simple: any point between the two medians (inclusive) will satisfy this.
Proof: If we have an even number of points, there will be two medians. A point between the two medians will have n/2 points to the left and n/2 points to the right, and a total sum-of-distances to those points of S.
If we move it one point to the left, S will go up by n/2 (since we're moving away from the right-most points) and down by n/2 (since we're moving towards the left-most points), so overall S remains the same. This holds true until we hit the left-most median point. When we move one left of the left-most median point, we now have (n/2 + 1) points to the right, and (n/2 - 1) points to the left, so S goes up by two. Continuing to the left will only increase S further.
By the same logic, all points to the right of the right-most median also have a higher S.
If we have an odd number of points, there is only one median. Using the same logic as above, we can show that it has the lowest value of S.

If the median gives you an interval of the order of 10^9 then each point in that interval is as good as any other.
So depending on what you want to do with those points later on, you can either return the range or enumerate the points in that range. There is no way around it.
Obviously in two dimensions you'll get a bounding rectangle, in three dimensions a bounding cuboid, etc.
The result will always be a cartesian product of ranges obtained for each dimension, so you can return a list of those ranges as a result.
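As a sketch of that idea, the ranges can be returned (or merely counted) without enumerating anything. The function names below are hypothetical, and the input is assumed to be a non-empty list of integer coordinate tuples of equal dimension.

```python
def median_range(values):
    """Inclusive range [lo, hi] between the two medians of a list of numbers."""
    s = sorted(values)
    n = len(s)
    return s[(n - 1) // 2], s[n // 2]    # lo == hi when n is odd

def optimal_region(points):
    """One (lo, hi) range per dimension; their Cartesian product is the answer."""
    return [median_range(axis) for axis in zip(*points)]

def count_optimal(points):
    """Number of integer points in the region, computed without listing them."""
    total = 1
    for lo, hi in optimal_region(points):
        total *= hi - lo + 1
    return total
```

For the two points (0,0) and (10,4) this yields the ranges (0,10) and (0,4), that is 11 * 5 = 55 optimal integer points, while the running time depends only on sorting the input.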

Since in manhattan distance each component contributes separately, you can consider them separately too. The optimal answer is ( median(x),median(y) ). You need to look around this point for integer solutions.
NOTE: I did not read your question properly while answering. My answer still holds, but probably you knew about this solution already.

Yes, I also think that for an odd number N of points on a grid, there will be only a single point (i.e. the median) with the minimum sum of Manhattan distances to all other points.
For an even value of N, the scenario will be a little different.
As I understand it, if we have two sets X = {1,2} and Y = {3,4}, their Cartesian product will always have 4 elements,
i.e. X × Y = {1,2} × {3,4} = {(1,3), (1,4), (2,3), (2,4)}. This is what I have understood so far.
For an even number of values we always take the "middle two" values as the median. Taking 2 values from X and 2 from Y will always give a Cartesian product of 4 points.
Correct me if I am wrong.

Peg Game: best place to place ball such that it lands in the target cell

Source: Facebook Hacker Cup Qualification Round 2011
At the arcade, you can play a simple game where a ball is dropped into the top of the game, from a position of your choosing. There are a number of pegs that the ball will bounce off of as it drops through the game. Whenever the ball hits a peg, it will bounce to the left with probability 0.5 and to the right with probability 0.5. The one exception to this is when it hits a peg on the far left or right side, in which case it always bounces towards the middle.
When the game was first made, the pegs were arranged in a regular grid. However, it's an old game, and now some of the pegs are missing. Your goal in the game is to get the ball to fall out of the bottom of the game in a specific location. Given the arrangement of the game, how can we determine the optimal place to drop the ball, such that the probability of getting it to this specific location is maximized?
The image below shows an example of a game with five rows of five columns. Notice that the top row has five pegs, the next row has four pegs, the next five, and so on. With five columns, there are four choices to drop the ball into (indexed from 0). Note that in this example, there are three pegs missing. The top row is row 0, and the leftmost peg is column 0, so the coordinates of the missing pegs are (1,1), (2,1) and (3,2). In this example, the best place to drop the ball is on the far left, in column 0, which gives a 50% chance that it will end in the goal.
x.x.x.x.x
 x...x.x
x...x.x.x
 x.x...x
x.x.x.x.x
 G
x indicates a peg, . indicates empty space.

Start at the bottom and assign a probability of 1 to the goal and 0 to other slots. Then for the next row up, assign probabilities as follows:
1) if there is no peg, use the probability directly below.
2) for a peg, use the average of the probabilities in the adjacent columns one row down.
This will simply propagate the probabilities to the top where each slot will be assigned the probability of reaching the goal from that slot. No tree, no recursion.
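A possible Python sketch of this bottom-up propagation is shown below. Several things here are assumptions rather than part of the answer: positions are counted in half-column units (even rows have pegs at even positions, odd rows at odd positions), the number of rows is odd so the exit slots line up with the drop slots, a peg next to a side wall always sends the ball toward the middle, and `best_drop` is a made-up name.

```python
def best_drop(rows, cols, goal, missing):
    """Return (best drop column, probability of reaching the goal slot)."""
    width = 2 * cols - 1              # horizontal positions in half-column units
    missing = set(missing)            # (row, col) pairs of missing pegs

    def has_peg(r, p):
        # Even rows hold pegs at even positions, odd rows at odd positions.
        return p % 2 == r % 2 and (r, p // 2) not in missing

    # prob[p]: chance of reaching the goal from position p just below row r.
    prob = [0.0] * width
    prob[2 * goal + 1] = 1.0
    for r in range(rows - 1, 0, -1):  # row 0 is skipped: the ball drops between its pegs
        above = [0.0] * width
        for p in range(width):
            if not has_peg(r, p):
                above[p] = prob[p]                # 1) no peg: value directly below
            elif p <= 1:
                above[p] = prob[p + 1]            # left side: always bounces right
            elif p >= width - 2:
                above[p] = prob[p - 1]            # right side: always bounces left
            else:
                above[p] = (prob[p - 1] + prob[p + 1]) / 2   # 2) average of neighbours
        prob = above
    best = max(range(cols - 1), key=lambda j: prob[2 * j + 1])
    return best, prob[2 * best + 1]
```

For the board in the question, `best_drop(5, 5, 0, [(1, 1), (2, 1), (3, 2)])` reproduces the stated result: column 0 with probability 0.5. The work is proportional to rows * cols.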

We can solve this problem using probability theory. We drop the ball in a position and recursively split the ball's path in its one (at the sidewall) or two possible directions. At the first step, we know with probability 1 the position of the ball (we are dropping it after all!). At each subsequent split into two directions, the probability halves. If we end up at the bottom row in the target location, we add the probability of path taken to our total. Repeat this process for all starting positions and take the highest probability of reaching the target.
We can improve this algorithm by removing the recursion and processing row by row using dynamic programming. Start with the first row set to all 0, except for the starting location, which we set to 1. Then calculate the probabilities of reaching each cell in the next row by starting with an array of 0's. For each cell in our current row, add half its probability to the cell to its left in the next row and half to its right, unless it's against the sidewall, in which case we add the full probability to the single cell. Continue doing this for each row until reaching the final row.
So far we've neglected the missing pegs. We can take them into account by having three probabilities for each cell: one for each direction the ball is currently travelling. In the end, we sum up all three, as direction doesn't matter.

This question was in Facebook Hacker Cup 2011.
marcog's solution seems correct, but I solved it a bit differently. I solved it like this:
Set up the board: read the input, set up an NxM board, read the missing pegs and insert holes on the board.
For each possible initial drop hole, do a BFS as follows:
The drop hole has an initial probability of 1.0.
From the current state you can go down, left, right, or both left and right.
If you can only go down, left, or right, add the current state's probability to the next state and add that state to the queue if it is not already on the queue. For example: if you are at (1, 2) with probability 0.5 and can only go down, add 0.5 to state (2,2) and add it to the queue if it is not on the queue already.
If you can go left and right, add half the current state's probability to each possible next state and add them to the queue if they are not already there. For example: if you are at (3, 3) with probability 0.5 and can go both left and right, add 0.25 to (4, 2) and 0.25 to (4, 4) and add them to the queue if they are not already there.
Update current best
Print global best.
My solution (not the cleanest code) in cpp can be downloaded from: https://github.com/piva/Programming-Challenges/blob/master/peggame.cpp
Hope that helped...

Observations:
1) For a given starting position, on each row there is a distribution of probabilities.
2) From one full row to the next, the distribution will simply be blurred except for the edges.
3) Where there are holes, we will see predictable deviation from the blurring in (2).
4) We could separate these deviations out, since the balls are dropped one at a time, so the probabilities obey the superposition principle (quantum computers would be ideal here).
5) Separating out the deviations, we can see that really there is a set of holes overlaid on a grid of pegs, so we can calculate the distribution from the complete set of pegs first (easy) and then go through the pegs individually to see their effect - this assumes that there are more pegs than holes!
6) An edge is really a mirror - we can calculate for an infinite array of these mirrored virtual boards rather than using if conditions for the boundaries.
So I would start at the bottom, in the desired position, and spread the probability. The missing pegs effectively just skip a row, so you keep a register of vertically falling balls.
Ideally, I would start with a complete (fibonacci) tree, and for each of the missing pegs on a row add in the effect of it being missing.

O(R*C) solution
dp[i][j] gives the probability of the ball reaching the goal slot if it is currently at row i and in slot j.
The base case has dp[R-1][goal] = 1.0 and all other slots in row R-1 set to 0.0
The recurrence then is
dp[i][j] = dp[i + 2][j] if the peg below is missing
dp[i][j] = dp[i + 1][left] if the peg is on the right wall
dp[i][j] = dp[i + 1][right] if the peg is on the left wall
dp[i][j] = (dp[i + 1][left] + dp[i + 1][right]) / 2 otherwise
