how does a best first search decide on equal distance nodes? - algorithm

I am working on this assignment but am confused about which node the best-first search will move to next. Using Manhattan distance, I found that all the nodes directly connected to the starting node have the same distance. My question: since this is a best-first search and I am supposed to use Manhattan distance as the evaluation function, how will it decide which node to explore next?

const getHeuristic = (row, col) => {
  // Manhattan distance from (row, col) to the end cell
  return Math.abs(end.row - row) + Math.abs(end.col - col);
};

const NEIGHBORS = [
  [-1, 0],
  [0, 1],
  [1, 0],
  [0, -1],
];

const visitNeighbors = (node) => {
  let { row, col } = node;
  NEIGHBORS.forEach((neighbor) => {
    let r = row + neighbor[0];
    let c = col + neighbor[1];
    // check that this neighbor doesn't cause out-of-bounds issues (edge nodes)
    if (checkIndexes(matrix, r, c)) {
      let cost = getHeuristic(r, c);
      pQueue.insert({ row: r, col: c, cost });
    }
  });
};
const visited = new Set(); // closed set, so cells are not expanded twice

let begin = {
  row: start.row,
  col: start.col,
  cost: 0,
};
pQueue.insert(begin);

while (pQueue.size() > 0) {
  let cell = pQueue.pop();
  if (isEquals(cell, end)) {
    // handle when the end is found. If you want to trace the path, use a map
    // to keep track of the parent node when you explore its children.
    return;
  }
  let key = cell.row + "," + cell.col;
  if (visited.has(key)) continue; // already expanded once; skip duplicates
  visited.add(key);
  visitNeighbors(cell);
}
For best-first search,
You would use a heuristic function that best estimates the distance to the goal from the current node. For Manhattan distance, that function could be: Math.abs(goal.row - current.row) + Math.abs(goal.col - current.col). Here you take the difference between the row values and the column values of the node you are currently processing and your goal node, and add their absolute values together.
You would then have a priority queue into which you add the neighbors along with the heuristic cost to get to them.
You would remove the least-costing node from the priority queue based on the heuristic value, explore each of its neighbors, calculate the heuristic cost to get to those nodes, push them to the priority queue, and so on until you reach the end node.
If the estimated distances of nodes are equal, then either node you choose will produce a correct result. Best-first search doesn't guarantee the shortest path anyway, so to answer your question: you don't need a tie-breaker when nodes have the same cost; just remove from the priority queue, and whatever gets removed is still correct. If you need the shortest path, look into the A* algorithm.
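The snippet above assumes a pQueue with insert, pop and size (checkIndexes and isEquals are likewise helpers you would supply). Any priority queue ordered by the cost field works; purely as a minimal sketch, a binary min-heap could look like this:

class PQueue {
  constructor() { this.heap = []; }
  size() { return this.heap.length; }
  insert(node) {
    // push to the end, then sift the new entry up
    this.heap.push(node);
    let i = this.heap.length - 1;
    while (i > 0) {
      const parent = (i - 1) >> 1;
      if (this.heap[parent].cost <= this.heap[i].cost) break;
      [this.heap[parent], this.heap[i]] = [this.heap[i], this.heap[parent]];
      i = parent;
    }
  }
  pop() {
    // take the root, move the last leaf to the root, then sift it down
    const top = this.heap[0];
    const last = this.heap.pop();
    if (this.heap.length > 0) {
      this.heap[0] = last;
      let i = 0;
      while (true) {
        const l = 2 * i + 1, r = 2 * i + 2;
        let smallest = i;
        if (l < this.heap.length && this.heap[l].cost < this.heap[smallest].cost) smallest = l;
        if (r < this.heap.length && this.heap[r].cost < this.heap[smallest].cost) smallest = r;
        if (smallest === i) break;
        [this.heap[i], this.heap[smallest]] = [this.heap[smallest], this.heap[i]];
        i = smallest;
      }
    }
    return top;
  }
}

Equal-cost entries leave such a heap in whatever order the sift operations happen to put them, which is exactly the arbitrary but harmless tie-breaking described above.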
P.S. For further clarification, ask in the comments; you shouldn't post an answer just to ask another question or for clarification.


Find guaranteed ancestors in directed graph

I'm trying to implement an algorithm to find what I call "guaranteed ancestors" in a directed graph. I have a list of nodes, each of which can point to zero, one or multiple child nodes.
Below you see an example of a simple graph. I've marked all circles with a unique number.
Let's imagine we're trying to determine which nodes I'm guaranteed to have visited before reaching node 13 starting at node 0.
My approach when solving this simple example by hand is to start at node 13 and work my way back: which nodes am I guaranteed to visit no matter which direction I go? The first node I notice obeying this property is node 10, since whether I choose to visit node 11 or node 12, I'm guaranteed to eventually reach node 13. Similarly, I can conclude I have to visit node 9 if I want to reach node 13. Working all the way up the graph, I conclude that node 13 has nodes 0, 1, 9 and 10 as its guaranteed ancestors.
I'm not sure what such an algorithm is called, but I'm sure there is a name for this specific search.
Here are the constraints you can assume about my graph:
There is a single defined "head/root" node, which is the only node without any other nodes pointing to it.
The graph is acyclic. (Ideally the algorithm would be able to handle cycles too, but I have a separate check verifying that the graph is acyclic, so this is not a must.)
There are no "dead" nodes, i.e. nodes which can't be reached from the head/root node.
This has to run on more complicated graphs with up to 500 nodes and many nodes with multiple "parents", which could be connected back and forth. Runtime is a priority as well - I assume we should be able to solve this problem in linear time.
I've tried simplifying the problem to the point where I made an algorithm which can determine whether a single node is a guaranteed ancestor of another node, which I believe is pretty simple to do in O(n). However, if I want a complete list of all guaranteed ancestors, I assume I'd have to run this algorithm for every node, leaving me with O(n^2).
Does anyone know the correct name of the algorithm I'm describing?
1. Assign a weight of 1 to every edge.
2. Run Dijkstra to find the shortest path between the root and the target node.
3. Assign a weight of 2 * (edge count of graph) to every edge in that path.
4. Run Dijkstra again to find the cheapest path.
5. Identify the edges that are present in both paths (they could not be avoided, although they were made very expensive).
6. The nodes at both ends of every edge identified in 5 will be critical - i.e. they must ALL be visited by any route between the root and the target.
Consider an example:
The first Dijkstra run would return a path containing node 1 or node 2 (they both lie on 5-hop paths). The second run would return a path containing the other of those two nodes.
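A rough JavaScript sketch of this recipe, purely as an illustration (the edge-list representation, the weights map and the simple O(V^2) Dijkstra are all assumptions of the sketch, not part of the answer):

function dijkstra(n, edges, weights, source) {
  const dist = new Array(n).fill(Infinity);
  const prev = new Array(n).fill(-1);
  const done = new Array(n).fill(false);
  dist[source] = 0;
  for (let iter = 0; iter < n; iter++) {
    let v = -1; // pick the cheapest unfinished node
    for (let i = 0; i < n; i++) {
      if (!done[i] && (v === -1 || dist[i] < dist[v])) v = i;
    }
    if (v === -1 || dist[v] === Infinity) break;
    done[v] = true;
    for (const [from, to] of edges) { // relax v's outgoing edges
      if (from !== v) continue;
      const w = weights.get(from + "," + to);
      if (dist[v] + w < dist[to]) { dist[to] = dist[v] + w; prev[to] = v; }
    }
  }
  return prev;
}

function pathEdges(prev, target) { // edge keys "from,to" along the found path
  const result = new Set();
  for (let v = target; prev[v] !== -1; v = prev[v]) result.add(prev[v] + "," + v);
  return result;
}

function unavoidableNodes(n, edges, root, target) {
  const weights = new Map();
  for (const [a, b] of edges) weights.set(a + "," + b, 1);             // step 1
  const first = pathEdges(dijkstra(n, edges, weights, root), target);  // step 2
  for (const e of first) weights.set(e, 2 * edges.length);             // step 3
  const second = pathEdges(dijkstra(n, edges, weights, root), target); // step 4
  const critical = new Set();
  for (const e of first) {
    if (!second.has(e)) continue;                                      // step 5
    const [a, b] = e.split(",").map(Number);
    critical.add(a);                                                   // step 6
    critical.add(b);
  }
  return critical;
}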
This is almost what the definition of an articulation or cut vertex is in an undirected graph. See Biconnected component:
a cut vertex is any vertex whose removal increases the number of connected components.
The difference is that your graph is directed, and that you consider the root also as such a vertex.
So my suggestion is to temporarily consider the graph to be undirected, and to apply a depth-first algorithm to identify such cut vertices, and include the root.
The algorithm is given as pseudo code in the same Wikipedia article. I have rewritten it in JavaScript, so it can be run here for the graph that you have given as an example:
function buildAdjacencyList(n, edges) {
  // Indexes in adj represent node identifiers.
  // Values in adj are lists of neighbors: start out with empty lists
  let adj = [];
  for (let i = 0; i < n; i++) adj.push([]);
  for (let [start, end] of edges) {
    adj[start].push(end);
    adj[end].push(start); // make edge bidirectional
  }
  return adj;
}

function markArticulationPoints(nodes, node, depth) {
  node.visited = true;
  node.depth = depth;
  node.low = depth;
  for (let neighborId of node.neighbors) {
    let neighbor = nodes[neighborId];
    if (!neighbor.visited) {
      neighbor.parent = node;
      markArticulationPoints(nodes, neighbor, depth + 1);
      if (neighbor.low >= node.depth) node.isArticulation = true;
      if (neighbor.low < node.low) node.low = neighbor.low;
    } else if (neighbor != node.parent && neighbor.depth < node.low) {
      node.low = neighbor.depth;
    }
  }
}

function getArticulationPoints(adj, root) {
  // Create an object for each node, holding the metadata for the algorithm
  let nodes = [];
  for (let i = 0; i < adj.length; i++) {
    nodes.push({
      neighbors: adj[i],
      visited: false,
      depth: Infinity,
      low: Infinity,
      parent: -1,
      isArticulation: i == root // root is considered an articulation point
    });
  }
  markArticulationPoints(nodes, nodes[root], 0); // start DFS algorithm
  // Collect articulation points from the metadata
  let result = [];
  for (let i = 0; i < adj.length; i++) {
    if (nodes[i].isArticulation) result.push(i);
  }
  return result;
}

// Build adjacency list for the example graph, but with undirected edges
let adj = buildAdjacencyList(14, [
  [0, 1],
  [1, 2],
  [1, 3],
  [2, 4],
  [2, 5],
  [4, 5],
  [4, 6],
  [3, 7],
  [7, 8],
  [6, 9],
  [8, 9],
  [9, 10],
  [10, 11],
  [10, 12],
  [11, 13],
  [12, 13]
]);
let result = getArticulationPoints(adj, 0);
console.log("Articulation points:", ...result);

Computing depth of each node in a "maximally packed" DAG

(Note: I thought about asking this on https://cstheory.stackexchange.com/, but decided my question is not theoretical enough -- it's about an algorithm. If there is a better Stack Exchange community for this post, I'm happy to listen!)
I'm using the terminology "starting node" to mean a node with no links into it, and "terminal node" to mean a node with no links out of it. So the following graph has starting nodes A and B and terminal nodes F and G:
I want to draw it with the following rules:
at least one starting node has a depth of 0.
links always point from top to bottom
nodes are packed vertically as closely as possible
Using those rules, the depth for each node is shown in the graph above. Can someone suggest an algorithm to compute the depth of each node that runs in less than O(n^2) time?
update:
I tweaked the graph to show that the DAG may contain starting and terminal nodes at different depths. (This was a case that I didn't consider in my original buggy answer.) I also switched terminology from "x coordinate" to "depth" in order to emphasize that this is about "graphing" and not "graphics".
The x coordinate (depth) of a node corresponds to the longest path from any node without incoming edges to the node in question. For a DAG it can be calculated in O(N):
given DAG G:
  calculate incoming_degree[v] for every v in G
  initialize queue q = {v with incoming_degree[v] == 0}, x[v] = 0 for every v in q
  while (q not empty):
    v = q.pop()  # retrieve and delete first element
    for (w in neighbors of v):
      incoming_degree[w]--
      if (incoming_degree[w] == 0):  # no further way to w exists, evaluate
        q.offer(w)
        x[w] = x[v] + 1
x stores the desired information.
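A runnable JavaScript version of this pseudo-code might look as follows (the array-of-outgoing-neighbour-lists input format is an assumption). It relaxes x[w] with Math.max on every edge rather than only when the in-degree reaches zero, which computes the same longest-path depths without relying on the queue's processing order:

function longestPathDepths(adj) {
  const n = adj.length;
  const indegree = new Array(n).fill(0);
  for (const neighbors of adj) {
    for (const w of neighbors) indegree[w]++;
  }
  const x = new Array(n).fill(0);
  const queue = [];
  for (let v = 0; v < n; v++) {
    if (indegree[v] === 0) queue.push(v); // starting nodes sit at depth 0
  }
  let head = 0;
  while (head < queue.length) {
    const v = queue[head++]; // pop the front without an O(n) shift
    for (const w of adj[v]) {
      x[w] = Math.max(x[w], x[v] + 1); // longest way to w seen so far
      if (--indegree[w] === 0) queue.push(w);
    }
  }
  return x;
}

// e.g. nodes 0 and 1 both point at node 2: depths are [0, 0, 1]
console.log(longestPathDepths([[2], [2], []]));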
Here's one solution which is essentially a two-pass depth-first tree walk. The first pass (traverseA) traces the DAG from the starting nodes (A and B in the O.P.'s example) until it encounters terminal nodes (F and G in the example). It then marks them with the maximum depth as traced through the graph.
The second pass (traverseB) starts at the terminal nodes and traces back towards the starting nodes, marking each node along the way with either its current value or the previous node's value minus one, whichever is smaller (or simply the latter if the node hasn't been visited yet):
function labelDAG() {
    nodes.forEach(function(node) { node.depth = -1; }); // initialize
    // find and mark terminal nodes
    startingNodes().forEach(function(node) { traverseA(node, 0); });
    // walk backwards from the terminal nodes
    terminalNodes().forEach(function(node) { traverseB(node); });
    dumpGraph();
};

function traverseA(node, depth) {
    var targets = targetsOf(node);
    if (targets.length === 0) {
        // we're at a leaf (terminal) node -- set depth
        node.depth = Math.max(node.depth, depth);
    } else {
        // traverse each subtree with depth = depth+1
        targets.forEach(function(target) {
            traverseA(target, depth + 1);
        });
    };
};

// walk backwards from a terminal node, setting each source node's depth value
// along the way.
function traverseB(node) {
    sourcesOf(node).forEach(function(source) {
        if ((source.depth === -1) || (source.depth > node.depth - 1)) {
            // source has not yet been visited, or we found a longer path
            // between the terminal node and the source node.
            source.depth = node.depth - 1;
        }
        traverseB(source);
    });
};

sort graph by distance to end nodes

I have a list of nodes which belong in a graph. The graph is directed and does not contain cycles. Also, some of the nodes are marked as "end" nodes. Every node has a set of input nodes I can use.
The question is the following: how can I sort the nodes in the list ascending by the biggest distance to any reachable end node? Here is an example of how the graph could look.
I have already added the calculated distance (grey) by which I can sort the nodes. The end nodes have the distance 0, while C, D and G have the distance 1. F, however, has the distance 3, even though the route over D would be shorter (2), because the biggest distance counts.
I have made a concept which, I think, would solve the problem. Here is some pseudo-code:
sortedTable<Node, depth>  // used to store nodes and their currently calculated distance
tempTable<Node>           // used to store nodes
currentDepth = 0

- fill tempTable with end nodes

while (tempTable is not empty)
{
    - create empty newTempTable<Node node>

    // add tempTable to sortedTable
    for (every "node" in tempTable)
    {
        if ("node" is in sortedTable)
        {
            - overwrite depth in sortedTable with currentDepth
        }
        else
        {
            - add (node, currentDepth) to sortedTable
        }
        // get the nodes in the next layer
        for (every "newNode" connected to node)
        {
            - add newNode to newTempTable
        }
    }
    - tempTable = newTempTable
    currentDepth++
}
This approach should work. However, the problem with this algorithm is that it basically creates a tree from the graph, rooted at every end node, and then corrects old distance calculations for every depth. For example: G would have the depth 1 (calculated directly over B), then the depth 3 (calculated over A, D and F) and then the depth 4 (calculated over A, C, E and F).
Do you have a better solution to this problem?
It can be done with dynamic programming.
The graph is a DAG, so first do a topological sort on the graph, let the sorted order be v1,v2,v3,...,vn.
Now, set D(v)=0 for all end nodes, and from last to first (according to the topological order) do:
D(v) = max { D(u) + 1, for each edge (v,u) }
It works because the graph is a DAG, and when processed in reverse topological order, the values D(u) for all outgoing edges (v,u) are already known.
Example on your graph:
Topological sort (one possible):
H,G,B,F,D,E,C,A
Then, the algorithm:
init:
  D(B) = D(A) = 0
Go back from last to first:
  D(A) - no out edges, done
  D(C) = max{D(A) + 1} = max{0 + 1} = 1
  D(E) = max{D(C) + 1} = 2
  D(D) = max{D(A) + 1} = 1
  D(F) = max{D(E) + 1, D(D) + 1} = max{2 + 1, 1 + 1} = 3
  D(B) = 0
  D(G) = max{D(B) + 1, D(F) + 1} = max{1, 4} = 4
  D(H) = max{D(G) + 1} = 5
As a side note, if the graph is not a DAG, but a general graph, this is a variant of the Longest Path Problem, which is NP-Complete.
Luckily, it does have an efficient solution when our graph is a DAG.
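A sketch of this DP in JavaScript (the adjacency lists of outgoing edges and the boolean isEnd flag per node are assumptions about the input format):

function distancesToEndNodes(adj, isEnd) {
  const n = adj.length;
  // Kahn's algorithm for a topological order
  const indegree = new Array(n).fill(0);
  for (const outs of adj) for (const u of outs) indegree[u]++;
  const order = [];
  const queue = [];
  for (let v = 0; v < n; v++) if (indegree[v] === 0) queue.push(v);
  let head = 0;
  while (head < queue.length) {
    const v = queue[head++];
    order.push(v);
    for (const u of adj[v]) if (--indegree[u] === 0) queue.push(u);
  }
  // DP in reverse topological order; -Infinity marks "no end node reachable"
  const D = new Array(n).fill(-Infinity);
  for (let i = order.length - 1; i >= 0; i--) {
    const v = order[i];
    if (isEnd[v]) { D[v] = 0; continue; }
    for (const u of adj[v]) D[v] = Math.max(D[v], D[u] + 1);
  }
  return D;
}

// tiny example: 0 -> 1 -> 2 with 2 an end node gives D = [2, 1, 0]
console.log(distancesToEndNodes([[1], [2], []], [false, false, true]));

Sorting the node list ascending by D is then a plain sort on these values.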

Minimizing weighted sum

I came across this problem quite recently.
Suppose there are n points on the x-axis, x[0], x[1] .. x[n-1].
Let the weights associated with these points be w[0], w[1] .. w[n-1].
Starting from any point between 0 and n-1, the objective is to cover all the points such that the sum of w[i]*d[i] is minimized, where d[i] is the distance covered to reach the i-th point from the starting point.
Example:
Suppose the points are: 1 5 10 20 40
Suppose the weights are: 1 2 10 50 13
If I choose to start at point 10 and move to point 20, then to 5, then to 40 and finally to 1, the weighted sum will be 10*0 + 50*(10) + 2*(10+15) + 13*(10+15+35) + 1*(10+15+35+39) = 1429.
I have tried to solve it with a greedy approach: start from the point with the maximum associated weight, move to the point with the second-maximum weight, and so on. But the algorithm does not work. Can someone give me pointers about the approach to take for this problem?
There's a very important fact that leads to a polynomial time algorithm:
Since the points are located on an axis, they form a path graph, which means that for every 3 vertices v1, v2, v3, if v2 is between v1 and v3, then the distance between v1 and v3 equals the distance between v1 and v2 plus the distance between v2 and v3. Therefore, if for example we start at v1, the objective-function value of a path that goes first to v2 and then to v3 will always be at most the value of the path that first goes to v3 and then back to v2, because:
value of the first path = w[2]*D(v1,v2) + w[3]*(D(v1,v2) + D(v2,v3))
value of the second path = w[3]*D(v1,v3) + w[2]*(D(v1,v3) + D(v3,v2)) = w[3]*D(v1,v2) + w[3]*D(v2,v3) + w[2]*(D(v1,v2) + 2*D(v3,v2))
If we subtract the first path's value from the second's, we are left with w[2]*2*D(v3,v2), which is greater than or equal to 0 unless you consider negative weights.
All this means that if we are located at a certain point, there are only 2 options we should ever consider: going to the closest unvisited point on the left, or the closest unvisited point on the right.
This is very significant, as it leaves us with 2^n possible paths rather than n! possible paths (as in the Travelling Salesman Problem).
Solving the TSP / minimum-weight Hamiltonian path on path graphs can be done in polynomial time using dynamic programming; you should apply the exact same method, but modify the way you calculate the objective function.
Since you don't know the starting vertex, you'll have to run this algorithm n times, each time starting from a different vertex, which multiplies the running time by n.
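To make the reduction concrete, here is a small brute-force sketch (mine, not the answerer's) that enumerates only the "closest unvisited point on the left or right" moves from every starting point, assuming the points are given sorted in ascending order:

function minWeightedSum(x, w) {
  const n = x.length;
  // cost of visiting everything outside the visited interval [i..j],
  // standing at `pos`, having already travelled `dist` from the start
  function solve(i, j, pos, dist) {
    if (i === 0 && j === n - 1) return 0;
    let best = Infinity;
    if (i > 0) { // extend the interval to the left
      const d = dist + Math.abs(pos - x[i - 1]);
      best = Math.min(best, w[i - 1] * d + solve(i - 1, j, x[i - 1], d));
    }
    if (j < n - 1) { // extend the interval to the right
      const d = dist + Math.abs(pos - x[j + 1]);
      best = Math.min(best, w[j + 1] * d + solve(i, j + 1, x[j + 1], d));
    }
    return best;
  }
  let best = Infinity;
  for (let s = 0; s < n; s++) { // try every starting point
    best = Math.min(best, solve(s, s, x[s], 0));
  }
  return best;
}

console.log(minWeightedSum([1, 5, 10, 20, 40], [1, 2, 10, 50, 13])); // 849

On the question's example this returns 849, matching the exhaustive optimum reported in the answer below.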
Maybe you should elaborate on what you mean by the algorithm "does not work". The basic idea of the greedy approach that you described seems feasible to me. Do you mean that the greedy approach will not necessarily find the optimal solution? As was pointed out in the comments, this might be an NP-complete problem - although, to be sure, one would have to analyze it further. Some dynamic programming, and maybe some prefix sums for the distance computations, could lead to a polynomial time solution as well.
I quickly implemented the greedy solution in Java (hopefully I understood everything correctly...):
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class MinWeightSum
{
    public static void main(String[] args)
    {
        double x[] = { 1, 5, 10, 20, 40 };
        double w[] = { 1, 2, 10, 50, 13 };

        List<Integer> givenIndices = Arrays.asList(2, 3, 1, 4, 0);
        Path path = createPath(x, w, givenIndices);
        System.out.println("Initial result " + path.sum);

        List<Integer> sortedWeightIndices =
            computeSortedWeightIndices(w);
        Path greedyPath = createPath(x, w, sortedWeightIndices);
        System.out.println("Greedy result " + greedyPath.sum);
        System.out.println("For " + sortedWeightIndices + " sum " + greedyPath.sum);
    }

    private static Path createPath(
        double x[], double w[], List<Integer> indices)
    {
        Path path = new Path(x, w);
        for (Integer i : indices)
        {
            path.append(i);
        }
        return path;
    }

    private static List<Integer> computeSortedWeightIndices(final double w[])
    {
        List<Integer> indices = new ArrayList<Integer>();
        for (int i = 0; i < w.length; i++)
        {
            indices.add(i);
        }
        // sort the indices by descending weight
        Collections.sort(indices, new Comparator<Integer>()
        {
            @Override
            public int compare(Integer i0, Integer i1)
            {
                return Double.compare(w[i1], w[i0]);
            }
        });
        return indices;
    }

    static class Path
    {
        double x[];
        double w[];
        int prevIndex = -1;
        double distance;
        double sum;

        Path(double x[], double w[])
        {
            this.x = x;
            this.w = w;
        }

        void append(int index)
        {
            if (prevIndex != -1)
            {
                distance += Math.abs(x[prevIndex] - x[index]);
            }
            sum += w[index] * distance;
            prevIndex = index;
        }
    }
}
The sequence of indices that you described in the example yields the solution
For [2, 3, 1, 4, 0] sum 1429.0
The greedy approach that you described gives
For [3, 4, 2, 1, 0] sum 929.0
The best solution is
For [3, 2, 4, 1, 0] sum 849.0
which I found by checking all permutations of the indices. (This is not feasible for larger n, of course.)
Suppose you are part way through a solution and have traveled a distance D so far. If you go a further distance x and visit a point with weight w, it costs you (D + x)w. If you then go a further distance y and visit a point with weight v, it costs you (D + x + y)v, and so on. If you sum all of this up, there is a component that depends on the path you take after the distance D: xw + xv + yv + ..., and there is a component that depends on D and the sum of the weights of the points that you still need to visit: D(v + w + ...). The component that depends on D does not depend on anything except the sum of the weights of the points you still need to visit, so it is fixed, in the sense that it is the same regardless of the path you take after going distance D.
It always makes sense to visit points as we pass them, so the best path will start off at a single point (possibly at the edge of the set of points to be visited, possibly in the centre), then expand to an interval of visited points, and then expand that interval until it covers all the points. To pre-calculate the relative costs of visiting all points outside the interval, we only need to know the current position and the extent of the interval, not the distance travelled so far.
So an expensive but polynomial dynamic programming approach has as its state the current position (which must be one of the points), the position, if any, of the first unvisited point to the left of the current position, and the position, if any, of the first unvisited point to the right. There are at most two points we should consider visiting next - the point to the right of the current point and the point to the left. We can work out the cost of these two alternatives by looking at pre-computed costs for states with fewer points left, and store the best result as the best possible cost from this state. We can compute these costs under the fiction that D=0 at the time we reach the current point. When we look up stored costs, they are also stored under this assumption (but with D=0 at their current point, not our current point); since we know the sum of the weights of the points left at that stage, we can compensate by adding that sum of weights times the distance between our current point and the point we are looking up costs for.
That gives cost O(n^3), because you are building a table with O(n^3) cells, with each cell the product of a relatively simple process. However, because it never makes sense to pass points without visiting them, the current point must be next to one of the two ends of the visited interval, so we need to consider only O(n^2) possibilities, which cuts the cost down to O(n^2). A zig-zag path such as (0, 1, -1, 2, -2, 3, -3, 4, -4, ...) might be the best solution for suitably bizarre weights, but it is still the case, even for instance when going from -2 to 3, that -2 is the closest point not yet passed between the two points 3 and -3.
I have put an attempted java implementation at http://www.mcdowella.demon.co.uk/Plumber.java. The test harness checks this DP version against a (slow) almost exhaustive version for a number of randomly generated test cases of length up to and including 12. It still may not be completely bug-free, but hopefully it will fill in the details.
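As a rough sketch of that O(n^2) dynamic programme (this is not the linked Java implementation, just an illustration of the state space described above, and it assumes the points arrive sorted ascending): memoise over (left end of the visited interval, right end, which end we stand at), and charge each move's length times the weight still outside the interval, which is exactly the D-independent accounting described in the first paragraph:

function minWeightedSumDP(x, w) {
  const n = x.length;
  const prefix = new Array(n + 1).fill(0); // prefix[i] = w[0] + ... + w[i-1]
  for (let i = 0; i < n; i++) prefix[i + 1] = prefix[i] + w[i];
  // total weight of the points still outside the visited interval [i..j]
  const outside = (i, j) => prefix[i] + (prefix[n] - prefix[j + 1]);
  const memo = new Map();
  // best future cost when [i..j] is visited and we stand at the left end
  // (side = 0) or the right end (side = 1)
  function f(i, j, side) {
    if (i === 0 && j === n - 1) return 0;
    const key = (i * n + j) * 2 + side;
    if (memo.has(key)) return memo.get(key);
    const pos = side === 0 ? x[i] : x[j];
    let best = Infinity;
    if (i > 0) {
      const step = pos - x[i - 1];
      best = Math.min(best, outside(i, j) * step + f(i - 1, j, 0));
    }
    if (j < n - 1) {
      const step = x[j + 1] - pos;
      best = Math.min(best, outside(i, j) * step + f(i, j + 1, 1));
    }
    memo.set(key, best);
    return best;
  }
  let best = Infinity;
  for (let s = 0; s < n; s++) best = Math.min(best, f(s, s, 0));
  return best;
}

console.log(minWeightedSumDP([1, 5, 10, 20, 40], [1, 2, 10, 50, 13])); // 849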

For every vertex in a graph, find all vertices within a distance d

In my particular case, the graph is represented as an adjacency list and is undirected and sparse, n can be in the millions, and d is 3. Calculating A^d (where A is the adjacency matrix) and picking out the non-zero entries works, but I'd like something that doesn't involve matrix multiplication. A breadth-first search on every vertex is also an option, but it is slow.
def find_d(graph, start, st, d=0):
    st.add(start)
    if d > 0:
        for edge in graph[start]:
            find_d(graph, edge, st, d - 1)
    return st

graph = { 1 : [2, 3],
          2 : [1, 4, 5, 6],
          3 : [1, 4],
          4 : [2, 3, 5],
          5 : [2, 4, 6],
          6 : [2, 5]
        }

print(find_d(graph, 1, set(), 2))
Let's say that we have a function verticesWithin(d,x) that finds all vertices within distance d of vertex x.
One good strategy for a problem such as this, to expose caching/memoisation opportunities, is to ask the question: How are the subproblems of this problem related to each other?
In this case, we can see that for d >= 1, verticesWithin(d,x) is the union of verticesWithin(d-1,y[i]) for all i within range, where y = verticesWithin(1,x). If d == 0 then it's simply {x}. (I'm assuming that a vertex is deemed to be of distance 0 from itself.)
In practice you'll want to look at the adjacency list for the case d == 1, rather than using that relation, to avoid an infinite loop. You'll also want to avoid the redundancy of considering x itself as a member of y.
Also, if the return type of verticesWithin(d,x) is changed from a simple list or set, to a list of d sets representing increasing distance from x, then
verticesWithin(d,x) = init(verticesWithin(d+1,x))
where init is the function that yields all elements of a list except the last one. Obviously this would be a non-terminating recursive relation if transcribed literally into code, so you have to be a little bit clever about how you implement it.
Equipped with these relations between the subproblems, we can now cache the results of verticesWithin, and use these cached results to avoid performing redundant traversals (albeit at the cost of performing some set operations - I'm not entirely sure that this is a win). I'll leave it as an exercise to fill in the implementation details.
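For what it's worth, a minimal memoised sketch of those details in JavaScript (the adjacency-list format, an object or array of neighbour lists, is an assumption; the cache stores one set per (d, x) pair, trading memory for traversal work as discussed):

const cache = new Map();
function verticesWithin(d, x, adj) {
  const key = d + "," + x;
  if (cache.has(key)) return cache.get(key);
  let result;
  if (d === 0) {
    result = new Set([x]); // a vertex is at distance 0 from itself
  } else if (d === 1) {
    result = new Set(adj[x]); // base case taken from the adjacency list
    result.add(x);
  } else {
    result = new Set();
    for (const y of adj[x]) { // union of the neighbours' (d-1)-neighbourhoods
      for (const v of verticesWithin(d - 1, y, adj)) result.add(v);
    }
  }
  cache.set(key, result);
  return result;
}

const graph = { 1: [2, 3], 2: [1, 4, 5, 6], 3: [1, 4],
                4: [2, 3, 5], 5: [2, 4, 6], 6: [2, 5] };
console.log([...verticesWithin(2, 1, graph)]); // every vertex here is within 2 of vertex 1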
You already mention the option of calculating A^d, but this is much, much more than you need (as you already remark).
There is, however, a much cheaper way of using this idea. Suppose you have a (column) vector v of zeros and ones, representing a set of vertices. The vector w := A v now has a nonzero entry at every node that can be reached from the starting node in exactly one step. Iterating, u := A w has a nonzero entry for every node you can reach from the starting node in exactly two steps, etc.
For d = 3, you could do the following (MATLAB pseudo-code):
v = j'th unit vector
w = v
for i = 1:d
    v = A*v
    w = w + v
end
The vector w now has a positive entry for each node that can be reached from the j-th node in at most d steps.
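You never need to materialise A for this: one step of v := A*v over an adjacency list is a few lines. A rough JavaScript sketch of the same iteration (0-indexed nodes and the array-of-neighbour-lists format are assumptions):

// one sparse "A*v": every node i with v[i] != 0 adds v[i] to its neighbours
function matVec(adj, v) {
  const result = new Array(adj.length).fill(0);
  for (let i = 0; i < adj.length; i++) {
    if (v[i] !== 0) {
      for (const j of adj[i]) result[j] += v[i];
    }
  }
  return result;
}

// nodes reachable from `start` in at most d steps, via w = v + A*v + ... + A^d*v
function reachableWithin(adj, start, d) {
  let v = new Array(adj.length).fill(0);
  v[start] = 1; // the start node's unit vector
  const w = v.slice();
  for (let i = 0; i < d; i++) {
    v = matVec(adj, v);
    for (let k = 0; k < w.length; k++) w[k] += v[k];
  }
  const within = [];
  for (let k = 0; k < w.length; k++) if (w[k] > 0) within.push(k);
  return within;
}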
Breadth-first search starting from the given vertex is an optimal solution in this case. You will find all the vertices within distance d, and you will never even visit any vertices at distance >= d + 2.
Here is recursive code, although the recursion can easily be done away with, if so desired, by using a queue.
// Returns the set of all nodes within distance d of x
Set<Node> getNodesWithinDist(Node x, int d)
{
    Set<Node> s = new HashSet<Node>(); // our return value
    s.add(x); // x is within distance 0 of itself
    if (d > 0) {
        for (Node y : adjList(x)) {
            s.addAll(getNodesWithinDist(y, d - 1));
        }
    }
    return s;
}
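For completeness, the queue-based variant mentioned above, sketched in JavaScript rather than Java (the array-of-neighbour-lists format is an assumption); it stops expanding at depth d, so vertices farther than d are never enqueued at all:

function nodesWithinDist(adj, x, d) {
  const dist = new Map([[x, 0]]); // also serves as the visited set
  const queue = [x];
  let head = 0;
  while (head < queue.length) {
    const cur = queue[head++];
    if (dist.get(cur) === d) continue; // don't expand past distance d
    for (const y of adj[cur]) {
      if (!dist.has(y)) {
        dist.set(y, dist.get(cur) + 1);
        queue.push(y);
      }
    }
  }
  return new Set(dist.keys());
}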
