Rank finding problem: In the 2 dimension space, we shall say that a point A=(a1,a2) dominates a point B=(b1,b2) if and only if a1>b1 and b1>b2. Given a set of n points, the rank of a point X is the number of points dominated by X. Design an algorithm to find the rank of every point.
Sort points by their first coordinate. Then insert them into order-statistics tree, which sorts them by second coordinate.
Rank of the point in the order-statistics tree at the time it is inserted is exactly the number of points, dominated by this point.
Use a stable sort twice to sort by the first attribute and then by the second attribute. The positon in the final sorted array gives the number of points that a given point is dominating.
The wavelet tree data structure solves this problem. I think the construction of it is essentially the same process as what Evgeny and pogo have described.
Related
I have two list of 2d vectors of size m and n and I need to find minimum distance between each node of first list with the nodes of 2d list. I wonder if it is possible to do it in better time than O(nm). Let's assume that we can change data structures as we please. What do you think?
I don't understand why inserting an edge in adjacency matrix takes O(1) time.
For example we want to add an edge from vertex 3 to 5, in oriented graph we need to change graph[2][4] to 1. In oriented do the other way round also.
How can it possible be O(1), if we at least once have to go find the correct row in array, so its already O(|V|)?
In 2D array all operations are considered as O(1).
In 2D array you don't go linearly to find the find the row and the column to add the data.
Here
a[i][[j] = k
is an O(1) operation as you can refer the position of the array directly as index rather than going linearly.
However in Linkedlist it is true that you have to go and find the row/column by visiting all the row/column one by one.
I have a set of N points (in particular this point are binary string) and for each of them I have a discrete metric (the Hamming distance) such that given two points, i and j, Dij is the distance between the i-th and the j-th point.
I want to find a subset of k elements (with k < N of course) such that the distance between this k points is the maximum as possibile.
In other words what I want is to find a sort of "border points" that cover the maximum area in the space of the points.
If k = 2 the answer is trivial because I can try to search the two most distant element in the matrix of distances and these are the two points, but how I can generalize this question when k>2?
Any suggest? It's a NP-hard problem?
Thanks for the answer
One generalisation would be "find k points such that the minimum distance between any two of these k points is as large as possible".
Unfortunately, I think this is hard, because I think if you could do this efficiently you could find cliques efficiently. Suppose somebody gives you a matrix of distances and asks you to find a k-clique. Create another matrix with entries 1 where the original matrix had infinity, and entries 1000000 where the original matrix had any finite distance. Now a set of k points in the new matrix where the minimum distance between any two points in that set is 1000000 corresponds to a set of k points in the original matrix which were all connected to each other - a clique.
This construction does not take account of the fact that the points correspond to bit-vectors and the distance between them is the Hamming distance, but I think it can be extended to cope with this. To show that a program capable of solving the original problem can be used to find cliques I need to show that, given an adjacency matrix, I can construct a bit-vector for each point so that pairs of points connected in the graph, and so with 1 in the adjacency matrix, are at distance roughly A from each other, and pairs of points not connected in the graph are at distance B from each other, where A > B. Note that A could be quite close to B. In fact, the triangle inequality will force this to be the case. Once I have shown this, k points all at distance A from each other (and so with minimum distance A, and a sum of distances of k(k-1)A/2) will correspond to a clique, so a program finding such points will find cliques.
To do this I will use bit-vectors of length kn(n-1)/2, where k will grow with n, so the length of the bit-vectors could be as much as O(n^3). I can get away with this because this is still only polynomial in n. I will divide each bit-vector into n(n-1)/2 fields each of length k, where each field is responsible for representing the connection or lack of connection between two points. I claim that there is a set of bit-vectors of length k so that all of the distances between these k-long bit-vectors are roughly the same, except that two of them are closer together than the others. I also claim that there is a set of bit-vectors of length k so that all of the distances between them are roughly the same, except that two of them are further apart than the others. By choosing between these two different sets, and by allocating the nearer or further pair to the two points owning the current bit-field of the n(n-1)/2 fields within the bit-vector I can create a set of bit-vectors with the required pattern of distances.
I think these exist because I think there is a construction that creates such patterns with high probability. Create n random bit-vectors of length k. Any two such bit-vectors have an expected Hamming distance of k/2 with a variance of k/4 so a standard deviation of sqrt(k)/2. For large k we expect the different distances to be reasonably similar. To create within this set two points that are very close together, make one a copy of the other. To create two points that are very far apart, make one the not of the other (0s in one where the other has 1s and vice versa).
Given any two points their expected distance from each other will be (n(n-1)/2 - 1)k/2 + k (if they are supposed to be far apart) and (n(n-1)/2 -1)k/2 (if they are supposed to be close together) and I claim without proof that by making k large enough the expected difference will triumph over the random variability and I will get distances that are pretty much A and pretty much B as I require.
#mcdowella, I think that probably I don't explain very well my problem.
In my problem I have binary string and for each of them I can compute the distance to the other using the Hamming distance
In this way I have a distance matrix D that has a finite value in each element D(i,j).
I can see this distance matrix like a graph: infact, each row is a vertex in the graph and in the column I have the weight of the arc that connect the vertex Vi to the vertex Vj.
This graph, for the reason that I explain, is complete and it's a clique of itself.
For this reason, if i pick at random k vertex from the original graph I obtain a subgraph that is also complete.
From all the possible subgraph with order k I want to choose the best one.
What is the best one? Is a graph such that the distance between the vertex as much large but also much uniform as possible.
Suppose that I have two vertex v1 and v2 in my subgraph and that their distance is 25, and I have three other vertex v3, v4, v5, such that
d(v1, v3) = 24, d(v1, v4) = 7, d(v2, v3) = 5, d(v2, v4) = 22, d(v1, v5) = 14, d(v1, v5) = 14
With these distance I have that v3 is too far from v1 but is very near to v2, and the opposite situation for v4 that is too far from v2 but is near to v1.
Instead I prefer to add the vertex v5 to my subgraph because it is distant to the other two in a more uniform way.
I hope that now my problem is clear.
You think that your formulation is already correct?
I have claimed that the problem of finding k points such that the minimum distance between these points, or the sum of the distances between these points, is as large as possible is NP-complete, so there is no polynomial time exact answer. This suggests that we should look for some sort of heuristic solution, so here is one, based on an idea for clustering. I will describe it for maximising the total distance. I think it can be made to work for maximising the minimum distance as well, and perhaps for other goals.
Pick k arbitrary points and note down, for each point, the sum of the distances to the other points. For each other point in the data, look at the sum of the distances to the k chosen points and see if replacing any of the chosen points with that point would increase the sum. If so, replace whichever point increases the sum most and continue. Keep trying until none of the points can be used to increase the sum. This is only a local optimum, so repeat with another set of k arbitrary/random points in the hope of finding a better one until you get fed up.
This inherits from its clustering forebear the following property, which might at least be useful for testing: if the points can be divided into k classes such that the distance between any two points in the same class is always less than the distance between any two points in different classes then, when you have found k points where no local improvement is possible, these k points should all be from different classes (because if not, swapping out one of a pair of points from the same class would increase the sum of distances between them).
This problem is known as the MaxMin Diversity Problem (MMDP). It is known to be NP-hard. However, there are algorithms for giving good approximate solutions in reasonable time, such as this one.
I'm answering this question years after it was asked because I was looking for algorithms to solve the same problem, and had trouble even finding out what to call it.
The minimum Manhattan distance between any two points in the cartesian plane is the sum of the absolute differences of the respective X and Y axis. Like, if we have two points (X,Y) and (U,V) then the distance would be: ABS(X-U) + ABS(Y-V). Now, how should I determine the minimum distance between several pairs of points moving only parallel to the coordinate axis such that certain given points need not be visited in the selected path. I need a very efficient algorithm, because the number of avoided points can range up to 10000 with same range for the number of queries. The coordinates of the points would be less than ABS(50000). I would be given the set of points to be avoided in the beginning, so I might use some offline algorithm and/or precomputation.
As an example, the Manhattan distance between (0,0) and (1,1) is 2 from either path (0,0)->(1,0)->(1,1) or (0,0)->(0,1)->(1,1). But, if we are given the condition that (1,0) and (0,1) cannot be visited, the minimum distance increases to 6. One such path would then be: (0,0)->(0,-1)->(1,-1)->(2,-1)->(2,0)->(2,1)->(1,1).
This problem can be solved by breadth-first search or depth-first search, with breadth-first search being the standard approach. You can also use the A* algorithm which may give better results in practice, but in theory (worst case) is no better than BFS.
This is provable because your problem reduces to solving a maze. Obviously you can have so many obstacles that the grid essentially becomes a maze. It is well known that BFS or DFS are the only way to solve mazes. See Maze Solving Algorithms (wikipedia) for more information.
My final recommendation: use the A* algorithm and hope for the best.
You are not understanding the solutions here or we are not understanding the problem:
1) You have a cartesian plane. Therefore, every node has exactly 4 adjacent nodes, given by x+/-1, y+/-1 (ignoring the edges)
2) Do a BFS (or DFS,A*). All you can traverse is x/y +/- 1. Prestore your 10000 obstacles and just check if the node x/y +/-1 is visitable on demand. you don't need a real graph object
If it's too slow, you said you can do an offline calculation - 10^10 only requires 1.25GB to store an indexed obstacle lookup table. leave the algorithm running?
Where am I going wrong?
I'm currently trying to understand dynamic programming, and I found an interesting problem : "Given a chess board of nxn squares and a starting position(xs,ys), find the shortest (as in no. of moves) path a knight can take to an end position(xe,ye)". This is how my solution would sound like :
Initialize the matrix representing the chess board (except the "square" xs,ys) with infinity.
The first value in a queue is the square xs,ys.
while(the queue is not empty){
all the squares available from the first square of the queue (respecting the rules of chess) get "refreshed"
if (i modified the distance value for a "square")
add the recently modified square to the queue
}
Can someone please help me find out what's the computing-time O value for this function? I (kind of) understand big-O, but I just can't put a value for this particular function.
Because you are using a queue, the order that you process the squares is going to be in order of minimum distance. This means that you will only ever modify the distance value for a square once, and therefore the time will be O(n^2), since there are n^2 squares.
Your algorithm is worded poorly
You don't define the contents of your "queue"
you don't define "refreshed"
you're always stuck on the first square, you're not keeping track of a current square.
also, Google Djkistra's algorithm No, don't do dijkstra's algorithm. you don't have a weighted graph.
If you want to use a dynamic programming algorithm to brute force your way to an answer, I'd start at (xe,ye), and you should be able to get O(n^2) on a nxn grid
but if you consider your constraints(your piece moves like a knight, and he moves along a grid, and not an arbitrary graph) you should be able to do this problem in O(n) time
Sounds somewhat like Dijkstra's shortest path algorithm. In which case it is O(N^2), you're finding the "distance" for all possible paths from source to destination in order to determine the lowest one.
This is a breadth first search in my opinion. It is clear that you add a square at most once in the queue and the processing of a queue entry is O(1), so the total complexity is bounded by O(N^2). However, if you can prove a theorem that tells the number of moves to get from position A to B on a NxN chess board is less than N (and intuitively this sounds reasonable for N equals or greater than 8), then your algorithm will be O(N).