Single pair shortest path in a matrix - algorithm

Consider an MxN matrix where the start is at position (0,0) and the finish at (M-1,N-1). Every square holds a positive integer, which is the cost of moving onto that square. What algorithm should I use to find the shortest path from start to finish? We can restate the problem in terms of graphs, where the squares are the vertices and the costs are weighted edges.

Use Dijkstra. As soon as you hear "shortest path", look at Dijkstra. In particular, if you search for "dijkstra adjacency matrix" on Stack Overflow, you will get over a dozen questions discussing various aspects of how to apply Dijkstra to a graph represented as a matrix. You can build an adjacency matrix from your input matrix by looping through the input as follows:
create a (rows * cols) x (rows * cols) adjacency matrix
for each square at position (row, col) in the input matrix:
    let v = row * cols + col
    for each neighboring position u (at row, col+1; row+1, col; and so on):
        set the cost of going from v to u to the input value at u's square (the square being entered)
You can even skip building the adjacency matrix, and simply calculate neighbors and distance-to-neighbors on the fly. This can save quite a lot of memory, at the expense of extra runtime.
You can also save some space by representing the graph as an adjacency list, but adjacency lists are slightly more complicated to implement, and you seem to be just starting out.
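For illustration, here is a minimal Python sketch of the on-the-fly variant, computing neighbours as needed instead of materialising the adjacency matrix. The function name and the assumption that you also pay the starting square's cost are mine, not from the original answer:

import heapq

def grid_shortest_path(grid):
    # Dijkstra on an M x N cost grid. Assumption (mine): entering a square
    # costs that square's value, and the start square's own cost is paid too.
    rows, cols = len(grid), len(grid[0])
    INF = float("inf")
    dist = [[INF] * cols for _ in range(rows)]
    dist[0][0] = grid[0][0]
    heap = [(dist[0][0], 0, 0)]
    while heap:
        d, r, c = heapq.heappop(heap)
        if (r, c) == (rows - 1, cols - 1):
            return d  # the first time we pop the goal, d is optimal
        if d > dist[r][c]:
            continue  # stale heap entry
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols:
                nd = d + grid[nr][nc]  # cost of stepping onto the neighbour
                if nd < dist[nr][nc]:
                    dist[nr][nc] = nd
                    heapq.heappush(heap, (nd, nr, nc))
    return None  # only reachable if the goal was never popped

print(grid_shortest_path([[1, 3, 1], [1, 5, 1], [4, 2, 1]]))  # -> 7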

Related

Greedy colouring algorithm on graph in adjacency list representation

Suppose you have been given a simple undirected graph with a maximum degree of d. You are given d + 1 colors, represented by the numbers 0 to d, and you want to return a valid assignment of colors such that no two adjacent vertices share the same color. As the title suggests, the graph is given in adjacency list representation. The algorithm should run in O(V+E) time.
I think the correct way to approach this is with a greedy coloring algorithm. However, this may sound stupid, but I am stuck on the part where, for each vertex, I try to find the first available color that hasn't been used by its neighbors. I don't know how to do this in O(number of neighbors) time per vertex, which is what the overall O(V+E) bound requires.
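There is no answer in this thread, but the standard trick is worth sketching (my code, not from the original post): for a vertex v, only the colors 0..deg(v) can matter, since v's neighbors block at most deg(v) of them; mark the neighbors' colors in a scratch array of that size, then scan for the first unmarked index.

def greedy_coloring(adj):
    # adj: list of neighbor lists; returns one color in 0..d per vertex, in O(V+E).
    n = len(adj)
    color = [-1] * n  # -1 means "not yet colored"
    for v in range(n):
        deg = len(adj[v])
        used = [False] * (deg + 1)  # colors 0..deg(v) are always enough
        for u in adj[v]:
            if 0 <= color[u] <= deg:
                used[color[u]] = True
        color[v] = used.index(False)  # first color no neighbor uses
    return color

Each vertex does O(deg(v)) work, and summing deg(v) over all vertices gives O(E), so the whole pass is O(V+E).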

Find a subset of k points most distant from each other

I have a set of N points (in particular, these points are binary strings), and for each pair of them I have a discrete metric (the Hamming distance), such that given two points i and j, Dij is the distance between the i-th and the j-th point.
I want to find a subset of k elements (with k < N, of course) such that the distance between these k points is as large as possible.
In other words, I want to find a sort of set of "border points" that covers the maximum area in the space of the points.
If k = 2 the answer is trivial, because I can search for the two most distant elements in the matrix of distances, but how can I generalize the question when k > 2?
Any suggestions? Is it an NP-hard problem?
Thanks for the answers
One generalisation would be "find k points such that the minimum distance between any two of these k points is as large as possible".
Unfortunately, I think this is hard, because I think if you could do this efficiently you could find cliques efficiently. Suppose somebody gives you a matrix of distances and asks you to find a k-clique. Create another matrix with entries 1 where the original matrix had infinity, and entries 1000000 where the original matrix had any finite distance. Now a set of k points in the new matrix where the minimum distance between any two points in that set is 1000000 corresponds to a set of k points in the original matrix which were all connected to each other - a clique.
This construction does not take account of the fact that the points correspond to bit-vectors and the distance between them is the Hamming distance, but I think it can be extended to cope with this. To show that a program capable of solving the original problem can be used to find cliques I need to show that, given an adjacency matrix, I can construct a bit-vector for each point so that pairs of points connected in the graph, and so with 1 in the adjacency matrix, are at distance roughly A from each other, and pairs of points not connected in the graph are at distance B from each other, where A > B. Note that A could be quite close to B. In fact, the triangle inequality will force this to be the case. Once I have shown this, k points all at distance A from each other (and so with minimum distance A, and a sum of distances of k(k-1)A/2) will correspond to a clique, so a program finding such points will find cliques.
To do this I will use bit-vectors of length kn(n-1)/2, where k will grow with n, so the length of the bit-vectors could be as much as O(n^3). I can get away with this because this is still only polynomial in n. I will divide each bit-vector into n(n-1)/2 fields each of length k, where each field is responsible for representing the connection or lack of connection between two points. I claim that there is a set of bit-vectors of length k so that all of the distances between these k-long bit-vectors are roughly the same, except that two of them are closer together than the others. I also claim that there is a set of bit-vectors of length k so that all of the distances between them are roughly the same, except that two of them are further apart than the others. By choosing between these two different sets, and by allocating the nearer or further pair to the two points owning the current bit-field of the n(n-1)/2 fields within the bit-vector I can create a set of bit-vectors with the required pattern of distances.
I think these exist because I think there is a construction that creates such patterns with high probability. Create n random bit-vectors of length k. Any two such bit-vectors have an expected Hamming distance of k/2 with a variance of k/4 so a standard deviation of sqrt(k)/2. For large k we expect the different distances to be reasonably similar. To create within this set two points that are very close together, make one a copy of the other. To create two points that are very far apart, make one the not of the other (0s in one where the other has 1s and vice versa).
Given any two points their expected distance from each other will be (n(n-1)/2 - 1)k/2 + k (if they are supposed to be far apart) and (n(n-1)/2 -1)k/2 (if they are supposed to be close together) and I claim without proof that by making k large enough the expected difference will triumph over the random variability and I will get distances that are pretty much A and pretty much B as I require.
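A quick numerical illustration of the concentration claim (my own sketch, assuming numpy; the field length below is arbitrary): two random bit-vectors of length k have Hamming distance around k/2 with standard deviation sqrt(k)/2, while a copied pair sits at distance 0 and a negated pair at distance k.

import numpy as np

rng = np.random.default_rng(0)
k = 10_000  # field length; larger k concentrates the distances more tightly
a, b = rng.integers(0, 2, size=(2, k))

print(np.count_nonzero(a != b))      # observed distance, close to k/2 = 5000
print(np.sqrt(k) / 2)                # standard deviation, here only 50
print(np.count_nonzero(a != a))      # a copied pair: distance 0
print(np.count_nonzero(a != 1 - a))  # a negated pair: distance k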
@mcdowella, I think I probably didn't explain my problem very well.
In my problem I have binary strings, and for each pair of them I can compute the Hamming distance.
This gives me a distance matrix D with a finite value in every element D(i,j).
I can view this distance matrix as a graph: each row is a vertex, and the entry in column j is the weight of the edge connecting vertex Vi to vertex Vj.
For the reasons just explained, this graph is complete, so it is a clique of itself.
Consequently, if I pick k vertices at random from the original graph, I obtain a subgraph that is also complete.
Among all the possible subgraphs of order k, I want to choose the best one.
What is the best one? A subgraph whose vertices are not only as far apart as possible but also as uniformly spaced as possible.
Suppose I have two vertices v1 and v2 in my subgraph with distance 25, and three other vertices v3, v4, v5 such that
d(v1, v3) = 24, d(v1, v4) = 7, d(v2, v3) = 5, d(v2, v4) = 22, d(v1, v5) = 14, d(v2, v5) = 14
With these distances, v3 is far from v1 but very near to v2, and the opposite holds for v4, which is far from v2 but near to v1.
I would instead prefer to add vertex v5 to my subgraph, because it is distant from the other two in a more uniform way.
I hope my problem is clear now.
Do you think your formulation is still correct for this?
I have claimed that the problem of finding k points such that the minimum distance between these points, or the sum of the distances between these points, is as large as possible is NP-complete, so there is presumably no polynomial-time exact algorithm. This suggests that we should look for some sort of heuristic solution, so here is one, based on an idea for clustering. I will describe it for maximising the total distance. I think it can be made to work for maximising the minimum distance as well, and perhaps for other goals.
Pick k arbitrary points and note down, for each point, the sum of the distances to the other points. For each other point in the data, look at the sum of the distances to the k chosen points and see if replacing any of the chosen points with that point would increase the sum. If so, replace whichever point increases the sum most and continue. Keep trying until none of the points can be used to increase the sum. This is only a local optimum, so repeat with another set of k arbitrary/random points in the hope of finding a better one until you get fed up.
This inherits from its clustering forebear the following property, which might at least be useful for testing: if the points can be divided into k classes such that the distance between any two points in the same class is always less than the distance between any two points in different classes then, when you have found k points where no local improvement is possible, these k points should all be from different classes (because if not, swapping out one of a pair of points from the same class would increase the sum of distances between them).
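A minimal Python sketch of this swap-based local search for the maximise-the-total-distance variant (my own illustration; the function name, restart count, and other parameters are arbitrary):

import random

def max_sum_subset(D, k, restarts=20, seed=0):
    # D: symmetric N x N distance matrix (list of lists).
    # Returns k indices whose pairwise distance sum is locally maximal.
    rng = random.Random(seed)
    n = len(D)

    def total(S):
        return sum(D[i][j] for i in S for j in S if i < j)

    best, best_val = None, float("-inf")
    for _ in range(restarts):
        S = set(rng.sample(range(n), k))
        improved = True
        while improved:
            improved = False
            for out in list(S):
                for inn in range(n):
                    if inn in S:
                        continue
                    # Change in the total sum if we swap `out` for `inn`.
                    gain = sum(D[inn][j] - D[out][j] for j in S if j != out)
                    if gain > 0:
                        S.remove(out)
                        S.add(inn)
                        improved = True
                        break
                if improved:
                    break
        val = total(S)
        if val > best_val:
            best, best_val = set(S), val
    return best, best_val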
This problem is known as the MaxMin Diversity Problem (MMDP). It is known to be NP-hard. However, there are algorithms for giving good approximate solutions in reasonable time, such as this one.
I'm answering this question years after it was asked because I was looking for algorithms to solve the same problem, and had trouble even finding out what to call it.

Using Single Source Shortest Path to traverse a chess board

Say we have an n x n chess board (a matrix, in other words) where each square has a weight. A piece can move horizontally or vertically, but it can't move diagonally. The cost of each move is the difference between the values of the two squares. I want an algorithm that finds the minimum cost for a single chess piece to move from square (1,1) to square (n,n) and whose worst-case time complexity is polynomial.
Could Dijkstra's algorithm be used to solve this? Would my algorithm below solve the problem? Dijkstra's already runs in polynomial time, but what gives it that time complexity?
Pseudocode:
We have an empty set S, an integer V, and an unweighted graph as input. We then build an adjacency matrix showing the cost of each edge, leaving out the infinitely weighted vertices. While not all vertices have been picked, we find a vertex; if its square's value is less than that of the square we're currently on, we move to that square, update V with the difference between the two squares, and update S to mark the vertex as visited. We repeat this process until there are no vertices left.
Thanks.
Since you are trying to find a minimum cost path, you can use Dijkstra's for this. Since Dijkstra's is O(|E| + |V|log|V|) in the worst case, where E is the number of edges and V is the number of vertices in the graph, this satisfies your polynomial time complexity requirement.
However, if your algorithm considers only costs associated with the beginning and end square of a move, and not the intermediate nodes, then you must connect all possible beginning and end squares together so that you can take "short-cuts" around the intermediate nodes.
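This is the same grid-Dijkstra pattern as in the first question above; only the edge weight changes. A hedged sketch of the weight function, assuming "difference" means the absolute difference of the two square values (the question doesn't say):

def move_cost(board, r1, c1, r2, c2):
    # Assumption: "difference of the two squares" = absolute value of the difference.
    return abs(board[r1][c1] - board[r2][c2])

Plugging this in place of the step cost in the earlier grid_shortest_path sketch (nd = d + move_cost(grid, r, c, nr, nc), with no cost for the start square) gives a polynomial-time solution.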

Edge lists vs adjacency lists vs an adjacency matrix

I'm preparing to create a maze solving program. As stated in these two questions: graphs representation : adjacency list vs matrix and
Size of a graph using adjacency list versus adjacency matrix?, they explain the differences between using adjacency lists and adjacency matrices. Unfortunately, I cannot decide on the pros and cons of an edge list compared to these other two, since I have found very little comparing adjacency matrices and edge lists.
An example of going through the adjacency list for the maze (I think) would be:
insertVertex(V) : O(1)
insertEdge(Vertex, Vertex, E) : O(1)
removeVertex(Vertex) : O(deg(v))
removeEdge(Edge) : O(m)
vertices() : O(n)
edges() : O(m)
areAdjacent(Vertex, Vertex) : O(min(deg(v), deg(w)))
endVertices(Edge) : O(1)
incidentEdges(Vertex) : O(deg(v))
space complexity : O(n+m)
So my question is, which has the best time cost an edge list, adjacency list, or adjacency matrix for this maze solving problem?
Let's start with "classical" mazes. They are defined as a rectangular grid, each cell of which is either corridor or wall. The player can move one cell at a time in one of four directions (top, left, bottom, right). Maze example:
S..#.##E
.#.#.#..
.#...#.#
.#.###.#
##.....#
Player starts at position marked S and should reach position E.
For now let's represent each blank cell as a graph vertex. Then each vertex can have at most 4 neighbours. In terms of space usage the adjacency list clearly wins - 4*V vs V^2.
The simplest efficient shortest-path algorithm for a grid maze is BFS. For huge mazes it can be replaced by A*. Both of these algorithms have only one "edge related" operation: take all neighbours of a given node. This is O(1) (we have at most 4 neighbours) for the adjacency list and O(V) for the adjacency matrix.
To save space we can create vertices only for crossroads. However, this has no impact on the calculations above (the number of vertices goes down, but it is still greater than 4, so 4*V < V^2 still holds).
In conclusion, for the grid representation of a maze the adjacency list wins in terms of both time and space usage.
General case
Every maze can be modelled as a set of rooms (vertices) with corridors (edges) that lead to different rooms. Usually the number of rooms is much bigger than the number of corridors per room. In this case the argument for adjacency lists still holds.
Additional note: for a grid maze it's often easier just to use the grid representation as is (a 2-dimensional boolean array) without creating additional graph structures.
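To illustrate that note, a minimal BFS straight over the character grid (my sketch; the maze format follows the example above):

from collections import deque

def solve_maze(maze):
    # BFS shortest path length on a grid maze; '#' is a wall, 'S' start, 'E' exit.
    rows, cols = len(maze), len(maze[0])
    sr, sc = next((r, c) for r in range(rows) for c in range(cols) if maze[r][c] == "S")
    queue = deque([(sr, sc, 0)])
    seen = {(sr, sc)}
    while queue:
        r, c, steps = queue.popleft()
        if maze[r][c] == "E":
            return steps
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and maze[nr][nc] != "#" and (nr, nc) not in seen:
                seen.add((nr, nc))
                queue.append((nr, nc, steps + 1))
    return None  # no path exists

maze = ["S..#.##E",
        ".#.#.#..",
        ".#...#.#",
        ".#.###.#",
        "##.....#"]
print(solve_maze(maze))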

Embedding Graph in Euclidean Space

I have a complete undirected graph where nodes represent points on a plane, and edges are approximate Euclidean distances between the points. I would like to "embed" this graph in a two dimensional space. That is, I want to convert each vertex to an (x,y) position tuple so that for any two vertices v and w, the edge (v,w) has weight close to dist(v,w).
For example, if I had the graph with nodes A, B, C, and D and edges with weights (A,B): 20; (A,C): 22; (A,D): 26; (B, C): 30; (B, D): 20, (C, D): 19, then you could assign the points A: (0,0); B: (10, 0); C: (0, 10); D: (10, 10). Clearly this is imperfect, but it is a reasonable approximation.
I don't care about getting the best possible solution, I just want a reasonable one in a reasonable amount of time.
(In case you want the motivation for this problem. I have a physical system where I have noisy measurements of distances from all pairs of points. Distance measurements are noisy, but tend to be within a factor of two of the true value. I have made all of these measurements, and now have a graph with several thousand nodes, and several million edges, and want to place the points on a plane.)
You may be able to adapt the force-based graph drawing algorithm for your needs.
This algorithm attempts to find a good layout for an undirected graph G(V,E) by treating each vertex in V as a Cartesian point and each edge in E as a linear spring. Additionally, a pair-wise repulsive force (i.e. Coulomb's law) is calculated between vertices globally - this prevents the clustering of vertices in Cartesian space that are non-adjacent in G(V,E).
In your case you could set the equilibrium length of the springs equal to your edge weights - this should give a layout with pair-wise Euclidean vertex distances close to your edge weights.
The algorithm updates an initial distribution (possibly random) in a pseudo-time stepping fashion based on the sum of forces at each vertex. The algorithm terminates when an approximate steady-state is reached. A simplified pseudo-code:
while (not converged)
    for i = vertices in V
        F(i) = sum of spring + repulsive forces on ith vertex
    endfor
    update vertex positions based on force vector F
    if (vertex positions not changing much)
        converged = true
    endif
endwhile
There are a number of optimisations that can be applied to reduce the complexity of the algorithm. For instance, a spatial index (such as a quadtree) can be used to allow for efficient calculation of an approximate repulsive force between "near-by" vertices, rather than the slow global calculation. It's also possible to use multi-level graph agglomeration techniques to improve convergence and optimality.
Finally, note that there are several good libraries for graph drawing that implement optimised versions of this algorithm - you might want to check out Graphviz for instance.
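A compact numpy sketch of this scheme (mine, not from the answer), with spring rest lengths set to the edge weights and a small global repulsion; the step size, iteration count, and repulsion constant are arbitrary:

import numpy as np

def spring_layout(weights, iters=1000, step=0.02, repulsion=1.0, seed=0):
    # weights: symmetric n x n array of target distances (spring rest lengths).
    # Returns an n x 2 array of positions.
    weights = np.asarray(weights, dtype=float)
    n = len(weights)
    rng = np.random.default_rng(seed)
    pos = rng.uniform(-1.0, 1.0, size=(n, 2)) * weights.mean()
    for _ in range(iters):
        delta = pos[:, None, :] - pos[None, :, :]        # delta[i, j] = pos_i - pos_j
        dist = np.linalg.norm(delta, axis=-1)
        np.fill_diagonal(dist, 1.0)                      # avoid division by zero
        unit = delta / dist[..., None]                   # zero vectors on the diagonal
        spring = -(dist - weights)[..., None] * unit     # Hooke's law toward rest length
        repel = (repulsion / dist**2)[..., None] * unit  # Coulomb-style repulsion
        force = (spring + repel).sum(axis=1)             # net force on each vertex
        pos += step * force
        if np.abs(step * force).max() < 1e-6:            # approximate steady state
            break
    return pos

For the A, B, C, D example above:

W = np.array([[0, 20, 22, 26],
              [20, 0, 30, 20],
              [22, 30, 0, 19],
              [26, 20, 19, 0]], dtype=float)
print(spring_layout(W))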
For starters, I think I'd go for a heuristic search approach.
You actually want to find a set of points p_1, p_2, ..., p_n that minimizes the function:
f(X) = Sum over all pairs (i, j) of |dist(p_i, p_j) - weight(n_i, n_j)|
where dist is the Euclidean distance between the candidate positions and weight is the given edge weight.
The problem can be heuristically solved by some algorithms including Hill Climbing and Genetic Algorithms.
I personally like Hill Climbing, and the approach is as follows:
best <- [(0,0), (0,0), ..., (0,0)]
while there is still time:
    S <- randomly initialized vector of points
    flag <- true
    while (flag):
        flag <- false
        candidates <- next(S) (*)
        S <- the X in candidates such that f(X) <= f(Y) for each Y in candidates (**)
        if f(S) was improved:
            flag <- true
        if f(S) <= f(best):
            best <- S
return best
(*) next() generates a list of candidates. It can use information about the gradient of the function (and basically decay into something similar to gradient descent), for example, or sample a few random 'directions' and take the resulting points as candidates (all in the multi-dimensional space, where each coordinate of each point is a dimension).
(**) Here you basically choose the "best" candidate and store it in S, so you continue with it in the next iteration.
Note that the algorithm is anytime, so it is expected to get better the more time you give it. This behavior comes from the random initialization of the starting point, which is likely to change the final result, and from the random selection of candidate points.
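A runnable Python sketch of this scheme (my own illustration, not from the answer; next() here samples random single-point perturbations rather than using gradients, and all parameter values are arbitrary):

import math
import random

def f(X, weights):
    # Sum of |Euclidean distance - target weight| over all pairs.
    n = len(X)
    return sum(abs(math.dist(X[i], X[j]) - weights[i][j])
               for i in range(n) for j in range(i + 1, n))

def hill_climb(weights, time_budget=200, samples=20, scale=1.0, seed=0):
    rng = random.Random(seed)
    n = len(weights)
    span = max(max(row) for row in weights)

    def perturb(X):
        # next(S): move one random point by a small Gaussian step.
        Y = [list(p) for p in X]
        i = rng.randrange(n)
        Y[i][0] += rng.gauss(0, scale)
        Y[i][1] += rng.gauss(0, scale)
        return Y

    best, best_val = None, float("inf")
    for _ in range(time_budget):  # "while there is still time"
        S = [[rng.uniform(0, span), rng.uniform(0, span)] for _ in range(n)]
        improved = True
        while improved:
            improved = False
            candidates = [perturb(S) for _ in range(samples)]
            cand = min(candidates, key=lambda X: f(X, weights))
            if f(cand, weights) < f(S, weights):
                S, improved = cand, True
        val = f(S, weights)
        if val < best_val:
            best, best_val = S, val
    return best, best_val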
