Removing edge that causes a cycle in an undirected graph - algorithm

I have a graph represented by an adjacency matrix, G, and I am trying to use DFS to remove an edge that causes a cycle.
There can be multiple cycles, but I figure it is probably best to remove them one at a time, so I only need my algorithm to find one cycle, and that can be repeated.
Here is the code for what I have got so far:
function [ G, c_flag, c_stack, o_stack, cptr, optr ] =...
        dfs_cycle( G, curr_v, c_stack, o_stack, cptr, optr, c_flag )
    % add current vertex to open list
    optr = optr + 1;
    o_stack(optr) = curr_v;
    % find adjacent vertices
    adj_v = find(G(curr_v,:));
    for next_v = adj_v
        % ensure next_v is not in closed list
        if ~any(c_stack == next_v)
            % if next_v in open list then cycle exists
            if any(o_stack == next_v)
                % remove edge and set flag to 1
                G(curr_v, next_v) = 0;
                G(next_v, curr_v) = 0;
                c_flag = 1;
                break;
            end
            [G, c_flag, c_stack, o_stack, cptr, optr] =...
                dfs_cycle(G, next_v, c_stack, o_stack, cptr, optr, c_flag);
            if c_flag == 1
                break;
            end
            % remove vertex from open list and put into closed list
            o_stack(optr) = 0;
            optr = optr - 1;
            cptr = cptr + 1;
            c_stack(cptr) = next_v;
        end
    end
end
The function is called using:
v_list = find(sum(G)>0);
o_stack = zeros(1,numel(v_list));
c_stack = o_stack;
optr = 0;
cptr = 0;
root_v = v_list(randperm(length(v_list),1));
c_flag = 0;
[G_dash,c_flag,~,~,~,~] =...
dfs_cycle(G, root_v, c_stack, o_stack, cptr, optr, c_flag);
It should return the modified (if cycle found) adjacency matrix, G_dash and c_flag corresponding to whether a cycle was found or not.
However, it doesn't seem to be functioning as it should.
I think I have located the problem: the check if any(o_stack == next_v) returns true because the previously visited vertex (the one we just came from) is usually still in o_stack. However, I am not sure how I should go about fixing this.
Does anyone have any ideas?

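A connected, undirected, acyclic graph is called a tree, and a tree with n nodes has exactly n - 1 edges. For a formal proof, see here.
So, to form a tree from your graph, you just need to run DFS once and keep all the edges used by this DFS (for more information about the tree created by DFS, see wiki link, example section). The unused edges can be removed. (If the graph is not connected, start a DFS from each vertex that has not been visited yet; the result is then one tree per connected component.)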
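The idea above can be sketched in Python rather than the asker's MATLAB. The function name spanning_forest and the list-of-lists matrix format are assumptions made for illustration. It runs an iterative DFS from every unvisited vertex and copies only the edges the DFS actually uses into a new matrix, so every cycle-closing edge is dropped in one pass:

```python
def spanning_forest(adj):
    """Return a new adjacency matrix holding only the DFS tree edges of `adj`.

    `adj` is assumed to be a symmetric 0/1 list-of-lists matrix.
    """
    n = len(adj)
    kept = [[0] * n for _ in range(n)]
    visited = [False] * n
    for root in range(n):
        if visited[root]:
            continue
        visited[root] = True
        stack = [root]
        while stack:
            u = stack[-1]
            # Look for a neighbour of u that the DFS has not reached yet.
            for v in range(n):
                if adj[u][v] and not visited[v]:
                    visited[v] = True
                    kept[u][v] = kept[v][u] = 1  # tree edge: keep it
                    stack.append(v)
                    break
            else:
                stack.pop()  # every neighbour is visited: backtrack
    return kept


if __name__ == "__main__":
    # Triangle 0-1-2 plus vertex 3 attached to vertex 2.
    g = [[0, 1, 1, 0],
         [1, 0, 1, 0],
         [1, 1, 0, 1],
         [0, 0, 1, 0]]
    for row in spanning_forest(g):
        print(row)
```

Because a vertex is marked the moment it is first reached, the vertex you just came from is never mistaken for a cycle, which seems to be the problem described in the question.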

Related

Increasing efficiency on a network randomization algorithm on Julia

My problem is the following. I have the adjacency matrix Mat for a neural network. I want to randomize this network in the sense that I want to choose 4 nodes randomly (say i,j,p,q) such that i and p are connected (which means Mat[p,i] = 1) and j and q are connected, AND i and q are not connected (Mat[q,i] = 0) and j and p are not connected. I then connect i and q, and j and p, and disconnect the previously connected pairs. In one run, I want to do this 10^6 times.
So far I have two versions, one using a for loop and one recursively.
newmat = copy(Mat)
for trial in 1:Niter
    count = 0
    while count < 1
        i,j,p,q = sample(Nodes,4,replace = false) #Choosing 4 nodes at random
        if (newmat[p,i] == 1 && newmat[q,j] == 1) && (newmat[p,j] == 0 && newmat[q,i] == 0)
            newmat[p,i] = 0
            newmat[q,j] = 0
            newmat[p,j] = 1
            newmat[q,i] = 1
            count += 1
        end
    end
end
Doing this recursively runs about as fast until Niter = 10^4, after which I get a stack overflow error. How can I improve this?
I assume you are talking about a recursive variant of the for trial in 1:Niter loop.
To avoid stack overflows like this, a general rule of thumb (in languages without tail recursion elimination) is to not use recursion unless you know the recursion depth will not scale more than logarithmically.
The cases where this applies are mostly algorithms that resemble tree traversals, with a "naturally occurring" recursive structure. Your case of a simple for loop can be viewed as the degenerate variant of that, with a "linked list" tree, but it is not at all natural.
Just don't do it. There's nothing bad about a loop for some sequential processing like this. Julia is an imperative language, after all.
(If you want to do this with a recursive structure for fun or exercise: look up trampolines. They allow you to write code structured as tail recursive, but with the allocation happening by mutation and on the heap.)
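For the curious, here is a rough sketch of the trampoline idea in Python (which, like Julia, does not eliminate tail calls). The names trampoline and count_swaps are made up for illustration and the "work" is just a counter: the function returns a zero-argument closure instead of calling itself, and a plain loop keeps invoking those closures, so the call stack never grows.

```python
def trampoline(step, *args):
    """Call `step`, then keep calling whatever it returns until it is not callable."""
    result = step(*args)
    while callable(result):
        result = result()
    return result


def count_swaps(remaining, done=0):
    # Written in tail-recursive shape, but it returns a thunk (a closure
    # allocated on the heap) instead of recursing directly.
    if remaining == 0:
        return done
    return lambda: count_swaps(remaining - 1, done + 1)


if __name__ == "__main__":
    # Far deeper than the default recursion limit would allow.
    print(trampoline(count_swaps, 10**5))
```

This only works when the final result is itself not callable, and it is slower than the plain loop, so it is a curiosity rather than a recommendation.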
Instead of sampling 4 random nodes and hoping they happen to be connected, you can sample the starting nodes p and q, and look for i and j within the nodes that these are connected to. Here's an implementation of that:
using StatsBase # provides sample
function randomizeconnections(adjmatin)
    adjmat = copy(adjmatin)
    nodes = axes(adjmat, 2)
    niter = 10
    for trial in 1:niter
        p, q = sample(nodes, 2, replace = false)
        @views plist, qlist = findall(adjmat[p, :]), findall(adjmat[q, :])
        pall = copy(plist) # unfiltered targets of p, still needed to filter qlist
        filter!(i -> !in(i, qlist) && i != q, plist)
        filter!(j -> !in(j, pall) && j != p, qlist)
        if isempty(plist) || isempty(qlist)
            @debug "No swappable exclusive target nodes for source nodes $p and $q, skipping trial $trial..."
            continue
        end
        i = rand(plist)
        j = rand(qlist)
        adjmat[p, i] = adjmat[q, j] = false
        adjmat[p, j] = adjmat[q, i] = true
    end
    adjmat
end
Through the course of randomization, it may happen that two nodes don't have any swappable connections i.e. they may share all their end points or one's ending nodes are a subset of the other's. So there's a check for that in the above code, and the loop moves on to the next iteration in that case.
The line with the findalls in the above code effectively creates adjacency lists from the adjacency matrix on the fly. You can instead do that in one go at the beginning, and work with that adjacency list vector instead.
function randomizeconnections2(adjmatin)
    adjlist = [findall(r) for r in eachrow(adjmatin)]
    nodes = axes(adjlist, 1)
    niter = 10
    for trial in 1:niter
        p, q = sample(nodes, 2, replace = false)
        plist = filter(i -> !in(i, adjlist[q]) && i != q, adjlist[p])
        qlist = filter(j -> !in(j, adjlist[p]) && j != p, adjlist[q])
        if isempty(plist) || isempty(qlist)
            @debug "No swappable exclusive target nodes for source nodes $p and $q, skipping trial $trial..."
            continue
        end
        i = rand(plist)
        j = rand(qlist)
        replace!(adjlist[p], i => j)
        replace!(adjlist[q], j => i)
    end
    create_adjmat(adjlist)
end
function create_adjmat(adjlist::Vector{Vector{Int}})
    adjmat = falses(length(adjlist), length(adjlist))
    for (i, l) in pairs(adjlist)
        adjmat[i, l] .= true
    end
    adjmat
end
With the small matrices I tried locally, randomizeconnections2 seems about twice as fast as randomizeconnections, but you may want to confirm whether that's the case with your matrix sizes and values.
Both of these accept (and were tested with) BitMatrix type values as input, which should be more efficient than an ordinary matrix of booleans or integers.

Best way to find the most costly path in graph

I have a directed acyclic graph in which every vertex has a weight >= 0. One vertex is the "start" of the graph and another vertex is the "end" of the graph. The idea is to find the path from the start to the end whose sum of vertex weights is the greatest. For example, I have the following graph:
I(0) -> V1(3) -> F(0)
I(0) -> V1(3) -> V2(1) -> F(0)
I(0) -> V3(0.5) -> V2(1) -> F(0)
The most costly path would be I(0) -> V1(3) -> V2(1) -> F(0), whose cost is 4.
Right now, I am using BFS to just enumerate every path from I to F as in the example above, and then, I choose the one with the greatest sum. I am afraid this method can be really naive.
Is there a better algorithm to do this? Can this problem be reduced to another one?
Since your graph has no cycles*, you can negate the weights of your edges and run the Bellman-Ford algorithm.
* Shortest path algorithms such as Floyd-Warshall and Bellman-Ford do not work on graphs with negative cycles, because you can build a path of arbitrarily small weight by staying in a negative cycle.
You can perform a topological sort, then iterate through the list of vertices returned by the topological sort, from the start vertex to the end vertex and compute the costs. For each directed edge of the current vertex check if you can improve the cost of destination vertex, then move to the next one. At the end cost[end_vertex] will contain the result.
class grph:
    def __init__(self):
        self.no_nodes = 0
        self.a = []
    def build(self, path):
        file = open(path, "r")
        package = file.readline()
        # parse the whole first line, so node counts of 10 or more are read correctly
        self.no_nodes = int(package)
        self.a = []
        for i in range(self.no_nodes):
            self.a.append([10000] * self.no_nodes)  # 10000 marks "no edge"
        for line in file:
            package = line.split(' ')
            source = int(package[0])
            target = int(package[1])
            cost = int(package[2])
            self.a[source][target] = cost
        file.close()
# topological sort via DFS; returns [] if a cycle is found
def tarjan(graph):
    visited = [0] * graph.no_nodes
    path = []
    for i in range(graph.no_nodes):
        if visited[i] == 0:
            if not dfs(graph, visited, path, i):
                return []
    return path[::-1]
# visited: 0 = unseen, 1 = on the current DFS path, 2 = finished
def dfs(graph, visited, path, start):
    visited[start] = 1
    for i in range(graph.no_nodes):
        if graph.a[start][i] != 10000:
            if visited[i] == 1:
                return False
            elif visited[i] == 0:
                visited[i] = 1
                if not dfs(graph, visited, path, i):
                    return False
    visited[start] = 2
    path.append(start)
    return True
# longest-path cost from start to end, relaxing edges in topological order
def lw(graph, start, end):
    topological_sort = tarjan(graph)
    costs = [0] * graph.no_nodes
    i = 0
    while i < len(topological_sort) and topological_sort[i] != start:
        i += 1
    while i < len(topological_sort) and topological_sort[i] != end:
        vertex = topological_sort[i]
        for destination in range(graph.no_nodes):
            if graph.a[vertex][destination] != 10000:
                new_cost = costs[vertex] + graph.a[vertex][destination]
                if new_cost > costs[destination]:
                    costs[destination] = new_cost
        i += 1
    return costs[end]
Input file:
6
0 1 6
1 2 2
3 0 10
1 4 4
2 5 9
4 2 2
0 2 10
In general the longest path problem is NP-hard, but since the graph is a DAG, it can be solved by first negating the weights and then running a shortest-path algorithm. See here.
Because the weights reside on the vertices, before computing you might want to move each vertex's weight onto its incoming edges.
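As a rough illustration of handling vertex weights directly (without negating anything), here is a Python sketch that relaxes vertices in topological order. The function name heaviest_path, the dict/edge-list input format and the use of Kahn's algorithm for the ordering are all assumptions made for this example, not something taken from the answers above.

```python
from collections import deque


def heaviest_path(weights, edges, start, end):
    """Return (cost, path) of the heaviest start->end path in a DAG.

    `weights` maps vertex -> weight, `edges` is a list of (u, v) pairs.
    Returns (None, []) if `end` cannot be reached from `start`.
    """
    succ = {v: [] for v in weights}
    indeg = {v: 0 for v in weights}
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1

    # Kahn's algorithm: repeatedly output a vertex with no remaining in-edges.
    order = []
    queue = deque(v for v in weights if indeg[v] == 0)
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)

    unreachable = float("-inf")
    best = {v: unreachable for v in weights}
    prev = {}
    best[start] = weights[start]
    for u in order:
        if best[u] == unreachable:
            continue  # not reachable from start, so it must not contribute
        for v in succ[u]:
            if best[u] + weights[v] > best[v]:
                best[v] = best[u] + weights[v]
                prev[v] = u

    if best[end] == unreachable:
        return None, []
    path = [end]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return best[end], path[::-1]


if __name__ == "__main__":
    w = {"I": 0, "V1": 3, "V2": 1, "V3": 0.5, "F": 0}
    e = [("I", "V1"), ("V1", "F"), ("V1", "V2"),
         ("V2", "F"), ("I", "V3"), ("V3", "V2")]
    print(heaviest_path(w, e, "I", "F"))
```

On the example graph from the question this picks I -> V1 -> V2 -> F with cost 4. Each vertex and edge is touched a constant number of times, so the cost is linear in the size of the graph rather than in the number of paths.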

Obtain forest out of tree with even number of nodes

I'm stuck on a code challenge, and I want a hint.
PROBLEM: You are given a tree data structure (without cycles) and are asked to remove as many "edges" (connections) as possible, creating smaller trees with even numbers of nodes. This problem is always solvable as there are an even number of nodes and connections.
Your task is to count the removed edges.
Input:
The first line of input contains two integers N and M. N is the number of vertices and M is the number of edges. 2 <= N <= 100.
Next M lines contains two integers ui and vi which specifies an edge of the tree. (1-based index)
Output:
Print the number of edges removed.
Sample Input
10 9
2 1
3 1
4 3
5 2
6 1
7 2
8 6
9 8
10 8
Sample Output :
2
Explanation : On removing the edges (1, 3) and (1, 6), we can get the desired result.
I used BFS to traverse the nodes.
First, maintain a separate array that stores, for each node, the total number of its child nodes + 1 (that is, the size of its subtree).
So, you can initially assign all the leaf nodes the value 1 in this array.
Now start from the last node and count the number of children for each node. This works bottom-up, and the array that stores the number of child nodes avoids recomputing the counts at runtime.
Once you have the counts for all the nodes, just counting the nodes whose count is even gives the answer. Note: I did not include the root node in the final count.
This is my solution. I didn't use a BFS tree; I just allocated another array that holds, for each node, the total number of nodes in its subtree (the node itself plus its descendants).
import java.util.Scanner;
import java.util.Arrays;
public class Solution {
public static void main(String[] args) {
int tree[];
int count[];
Scanner scan = new Scanner(System.in);
int N = scan.nextInt(); //points
int M = scan.nextInt();
tree = new int[N];
count = new int[N];
Arrays.fill(count, 1);
for(int i=0;i<M;i++)
{
int u1 = scan.nextInt();
int v1 = scan.nextInt();
tree[u1-1] = v1;
count[v1-1] += count[u1-1];
int root = tree[v1-1];
while(root!=0)
{
count[root-1] += count[u1-1];
root = tree[root-1];
}
}
System.out.println("");
int counter = -1;
for(int i=0;i<count.length;i++)
{
if(count[i]%2==0)
{
counter++;
}
}
System.out.println(counter);
}
}
If you observe the input, you can see that it is quite easy to count the number of nodes under each node. Treating (a b) as the edge input, in every case a is the child and b is the immediate parent. The input always has edges represented bottom-up.
So it's essentially the number of nodes that have an even count (excluding the root node). I submitted the code below on Hackerrank and all the tests passed. I guess all the cases in the input satisfy the rule.
def find_edges(count):
    root = max(count)
    count_even = 0
    for cnt in count:
        if cnt % 2 == 0:
            count_even += 1
    if root % 2 == 0:
        count_even -= 1
    return count_even
def count_nodes(edge_list, n, m):
    count = [1 for i in range(0, n)]
    for i in range(m-1,-1,-1):
        count[edge_list[i][1]-1] += count[edge_list[i][0]-1]
    return find_edges(count)
I know that this has already been answered here lots and lots of times. I would still like a review of my solution. I tried to construct the child count as the edges were coming in through the input, and it passed all the test cases.
namespace Hackerrank
{
using System;
using System.Collections.Generic;
using System.Linq;
class Program
{
static void Main(string[] args)
{
var tempArray = Console.ReadLine().Split(' ').Select(x => Convert.ToInt32(x)).ToList();
int verticeNumber = tempArray[0];
int edgeNumber = tempArray[1];
Dictionary<int, int> childCount = new Dictionary<int, int>();
Dictionary<int, int> parentDict = new Dictionary<int, int>();
for (int count = 0; count < edgeNumber; count++)
{
var nodes = Console.ReadLine().Split(' ').Select(x => Convert.ToInt32(x)).ToList();
var node1 = nodes[0];
var node2 = nodes[1];
if (childCount.ContainsKey(node2))
childCount[node2]++;
else childCount.Add(node2, 1);
var parent = node2;
while (parentDict.ContainsKey(parent))
{
var par = parentDict[parent];
childCount[par]++;
parent = par;
}
parentDict[node1] = node2;
}
Console.WriteLine(childCount.Count(x => x.Value % 2 == 1) - 1);
}
}
}
My first inclination is to work up from the leaf nodes, because you cannot cut their edges, as that would leave single-vertex subtrees.
Here's the approach that I used to successfully pass all the test cases.
1) Mark vertex 1 as the root.
2) Starting at the current root vertex, consider each child. If the total number of vertices in the child's subtree (the child plus all of its descendants) is even, then you can cut that edge.
3) Descend to the next vertex (a child of the root vertex) and let that be the new root vertex. Repeat step 2 until you have traversed all of the nodes (depth-first search).
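The steps above can be sketched in Python without relying on the order in which edges appear in the input. The function name count_removable_edges is made up for this example; it computes every subtree size with one iterative DFS from the root and counts the non-root vertices whose subtree size is even.

```python
def count_removable_edges(n, edges, root=1):
    """Count edges whose removal leaves only even-sized trees (1-based vertices)."""
    adj = {v: [] for v in range(1, n + 1)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    # Iterative DFS: record a visiting order in which parents precede children.
    parent = {root: None}
    order = []
    stack = [root]
    while stack:
        u = stack.pop()
        order.append(u)
        for v in adj[u]:
            if v != parent[u]:
                parent[v] = u
                stack.append(v)

    # Walk the order backwards so every child is finished before its parent.
    size = {v: 1 for v in adj}
    removable = 0
    for u in reversed(order):
        p = parent[u]
        if p is not None:
            size[p] += size[u]
            if size[u] % 2 == 0:
                removable += 1  # the edge (p, u) can be cut
    return removable


if __name__ == "__main__":
    sample = [(2, 1), (3, 1), (4, 3), (5, 2), (6, 1),
              (7, 2), (8, 6), (9, 8), (10, 8)]
    print(count_removable_edges(10, sample))
```

For the sample input this counts the subtrees rooted at 3 (size 2) and 6 (size 4), giving 2, which matches the expected output.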
Here's the general outline of an alternative approach:
Find all of the articulation points in the graph.
Check each articulation point to see if edges can be removed there.
Remove legal edges and look for more articulation points.
Solution: traverse all the edges, and count the number of even edges.
If we remove an edge from the tree and it results in two trees with an even number of vertices, let's call that edge an even edge.
If we remove an edge from the tree and it results in two trees with an odd number of vertices, let's call that edge an odd edge.
Here is my solution in Ruby
num_vertices, num_edges = gets.chomp.split(' ').map { |e| e.to_i }
graph = Graph.new
(1..num_vertices).to_a.each do |vertex|
graph.add_node_by_val(vertex)
end
num_edges.times do |edge|
first, second = gets.chomp.split(' ').map { |e| e.to_i }
graph.add_edge_by_val(first, second, 0, false)
end
even_edges = 0
graph.edges.each do |edge|
dup = graph.deep_dup
first_tree = nil
second_tree = nil
subject_edge = nil
dup.edges.each do |e|
if e.first.value == edge.first.value && e.second.value == edge.second.value
subject_edge = e
first_tree = e.first
second_tree = e.second
end
end
dup.remove_edge(subject_edge)
if first_tree.size.even? && second_tree.size.even?
even_edges += 1
end
end
puts even_edges
Note - Click Here to check out the code for Graph, Node and Edge classes

How to easily know if a maze has a road from start to goal?

I implemented a maze using a 0/1 array. The entry and goal are fixed in the maze: the entry is always the (0,0) point of the maze, and the goal is always the (m-1,n-1) point. I'm using a breadth-first search algorithm for now, but the speed is not good enough, especially for a large maze (100*100 or so). Could someone help me with this algorithm?
Here is my solution:
queue = []
position = start_node
mark_tried(position)
queue << position
while(!queue.empty?)
p = queue.shift #pop the first element
return true if maze.goal?(p)
left = p.left
visit(queue,left) if can_visit?(maze,left)
right = p.right
visit(queue,right) if can_visit?(maze,right)
up = p.up
visit(queue,up) if can_visit?(maze,up)
down = p.down
visit(queue,down) if can_visit?(maze,down)
end
return false
The can_visit? method checks whether the node is inside the maze, whether the node has been visited, and whether the node is blocked.
Worst answer possible:
1) Go forward until you can't move.
2) Turn left.
3) Rinse and repeat.
If you make it out, there is an end.
A better solution:
Traverse your maze keeping two lists, for open and closed nodes. Use the famous A-Star algorithm to evaluate and choose the next node, and discard nodes which are a dead end. If you run out of nodes on your open list, there is no exit.
Here is a simple algorithm which should be much faster:
From the start/goal, move to the first junction. You can ignore anything between that junction and the start/goal.
Locate all places in the maze which are dead ends (they have three walls). Move back to the next junction and take this path out of the search tree.
After you have removed all dead ends this way, there should be a single path left (or several if there are several ways to reach the goal).
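A minimal Python sketch of the dead-end removal described above, under some assumptions that are not in the answer: 1 means an open cell and 0 a wall, the start is (0,0), the goal is (m-1,n-1), and the name fill_dead_ends is invented. It repeatedly walls off open cells that have at most one open neighbour, never touching the start or the goal.

```python
from collections import deque


def fill_dead_ends(maze):
    """Return a copy of `maze` with dead-end corridors turned into walls."""
    m, n = len(maze), len(maze[0])
    grid = [row[:] for row in maze]
    keep = {(0, 0), (m - 1, n - 1)}  # start and goal are never filled

    def open_neighbours(r, c):
        cells = []
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < m and 0 <= nc < n and grid[nr][nc]:
                cells.append((nr, nc))
        return cells

    # Start from every cell that is currently a dead end.
    queue = deque((r, c) for r in range(m) for c in range(n)
                  if grid[r][c] and (r, c) not in keep
                  and len(open_neighbours(r, c)) <= 1)
    while queue:
        r, c = queue.popleft()
        if not grid[r][c]:
            continue  # already filled
        nbrs = open_neighbours(r, c)
        if len(nbrs) > 1:
            continue  # not a dead end
        grid[r][c] = 0
        # Filling this cell may have turned its neighbour into a dead end.
        for cell in nbrs:
            if cell not in keep:
                queue.append(cell)
    return grid


if __name__ == "__main__":
    maze = [[1, 1, 1],
            [0, 1, 0],
            [1, 1, 1]]
    for row in fill_dead_ends(maze):
        print(row)
```

Note that loops survive this pruning, so if the maze can contain loops that are not connected to the goal, a search over the remaining cells is still needed to decide reachability; the pruning only shrinks the area that search has to cover.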
I would not use the AStar algorithm there yet, unless I really need to, because this can be done with some simple 'coloring'.
# maze is a m x n array
def canBeTraversed(maze):
    m = len(maze)
    n = len(maze[0])
    colored = [ [ False for i in range(0,n) ] for j in range(0,m) ]
    open = [(0,0),]
    while len(open) != 0:
        (x,y) = open.pop()
        if x == m-1 and y == n-1:
            return True
        # check both bounds: a negative index would silently wrap around in Python
        elif 0 <= x < m and 0 <= y < n and maze[x][y] != 0 and not colored[x][y]:
            colored[x][y] = True
            open.extend([(x-1,y), (x,y-1), (x+1,y), (x,y+1)])
    return False
Yes, it's stupid, yes it's brute force and all that (strictly speaking it explores depth-first, since open.pop() takes the most recently added cell).
Here is the A* implementation
def dist(x,y):
    return (abs(x[0]-y[0]) + abs(x[1]-y[1]))**2  # ** is power; ^ would be XOR
def heuristic(x,y):
    return (x[0]-y[0])**2 + (x[1]-y[1])**2
def find(open,f):
    # linear scan for the open node with the smallest f value
    result = None
    min = None
    for x in open:
        tmp = f[x[0]][x[1]]
        if min == None or tmp < min:
            min = tmp
            result = x
    return result
def neighbors(x,m,n):
    def add(result,y,m,n):
        # keep only cells that lie inside the m x n grid
        if 0 <= y[0] < m and 0 <= y[1] < n: result.append(y)
    result = []
    add(result, (x[0]-1,x[1]), m, n)
    add(result, (x[0],x[1]-1), m, n)
    add(result, (x[0]+1,x[1]), m, n)
    add(result, (x[0],x[1]+1), m, n)
    return result
def canBeTraversedAStar(maze):
    m = len(maze)
    n = len(maze[0])
    goal = (m-1,n-1)
    closed = set([])
    open = set([(0,0),])
    g = [ [ 0 for y in range(0,n) ] for x in range(0,m) ]
    h = [ [ heuristic((x,y),goal) for y in range(0,n) ] for x in range(0,m) ]
    f = [ [ h[x][y] for y in range(0,n) ] for x in range(0,m) ]
    while len(open) != 0:
        x = find(open,f)
        if x == (m-1,n-1):
            return True
        open.remove(x)
        closed.add(x)
        for y in neighbors(x,m,n):
            if y in closed: continue
            if maze[y[0]][y[1]] == 0: continue  # blocked cell, same convention as canBeTraversed
            if y not in open:
                open.add(y)
                g[y[0]][y[1]] = g[x[0]][x[1]] + dist(x,y)
                h[y[0]][y[1]] = heuristic(y,goal)
                f[y[0]][y[1]] = g[y[0]][y[1]] + h[y[0]][y[1]]
    return False  # open list exhausted without reaching the goal
Here is my (simple) benchmark code:
import datetime
def tryIt(func,size, runs):
    maze = [ [ 1 for i in range(0,size) ] for j in range(0,size) ]
    begin = datetime.datetime.now()
    for i in range(0,runs): func(maze)
    end = datetime.datetime.now()
    print(size, 'x', size, ':', (end - begin) / runs, 'average on', runs, 'runs')
tryIt(canBeTraversed,100,100)
tryIt(canBeTraversed,1000,100)
tryIt(canBeTraversedAStar,100,100)
tryIt(canBeTraversedAStar,1000,100)
Which outputs:
# For canBeTraversed
100 x 100 : 0:00:00.002650 average on 100 runs
1000 x 1000 : 0:00:00.198440 average on 100 runs
# For canBeTraversedAStar
100 x 100 : 0:00:00.016100 average on 100 runs
1000 x 1000 : 0:00:01.679220 average on 100 runs
The obvious here: going A* to run smoothly requires a lot of optimizations I did not bother to go after...
I would say:
Don't optimize
(Expert only) Don't optimize yet
How much time are you talking about when you say too much? Really, a 100x100 grid is so easily parsed by brute force it's a joke :/
I would have solved this with an AStar implementation. If you want even more speed, you can optimize to only generate the nodes from the junctions rather than every tile/square/step.
A method you can use that does not need to visit all nodes in the maze is as follows:
create an integer[][] with one value per maze "room"
create a queue, add [startpoint, count=1, delta=1] and [goal, count=-1, delta=-1]
start coloring the route by:
popping an object from the head of the queue, put the count at the maze point.
check all reachable rooms for a count with sign opposite to that of the rooms delta, if you find one the maze is solved: run both ways and connect the routes with the biggest steps up and down in room counts.
otherwise add all reachable rooms that have no count to the tail of the queue, with delta added to the room count.
if the queue is empty no path through the maze is possible.
This not only determines if there is a path, but also shows the shortest path possible through the maze.
You don't need to backtrack, so it's O(number of maze rooms).

Fewest number of turns heuristic

Is there any way to ensure that the fewest number of turns heuristic is met by anything except a breadth-first search? Perhaps some more explanation would help.
I have a random graph, much like this:
0 1 1 1 2
3 4 5 6 7
9 a 5 b c
9 d e f f
9 9 g h i
Starting in the top left corner, I need to know the fewest number of steps it would take to get to the bottom right corner. Each set of connected colors is assumed to be a single node, so for instance in this random graph, the three 1's on the top row are all considered a single node, and every adjacent (not diagonal) connected node is a possible next state. So from the start, possible next states are the 1's in the top row or 3 in the second row.
Currently I use a bidirectional search, but the explosiveness of the tree size ramps up pretty quickly. For the life of me, I haven't been able to adjust the problem so that I can safely assign weights to the nodes and have them ensure the fewest number of state changes to reach the goal without it turning into a breadth first search. Thinking of this as a city map, the heuristic would be the fewest number of turns to reach the goal.
It is very important that the fewest number of turns is the result of this search as that value is part of the heuristic for a more complex problem.
You said yourself that each group of numbers represents one node, and each node is connected to adjacent nodes. Then this is a simple shortest-path problem, and you could use (for instance) Dijkstra's algorithm, with each edge having weight 1 (for 1 turn).
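One way to sketch this in Python without building the node graph first is to work on the cells directly: stepping to a neighbouring cell with the same value costs 0 (same node) and stepping to a different value costs 1 (one turn). With only 0 and 1 as weights, a double-ended queue can stand in for Dijkstra's priority queue. This is a swapped-in variant (often called 0-1 BFS), and the name fewest_changes is made up here.

```python
from collections import deque


def fewest_changes(grid):
    """Fewest node changes from the top-left to the bottom-right cell."""
    rows, cols = len(grid), len(grid[0])
    inf = float("inf")
    dist = [[inf] * cols for _ in range(rows)]
    dist[0][0] = 0
    dq = deque([(0, 0)])
    while dq:
        r, c = dq.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                # Same value: still inside the same node, so no turn is spent.
                w = 0 if grid[nr][nc] == grid[r][c] else 1
                if dist[r][c] + w < dist[nr][nc]:
                    dist[nr][nc] = dist[r][c] + w
                    if w == 0:
                        dq.appendleft((nr, nc))  # free moves are handled first
                    else:
                        dq.append((nr, nc))
    return dist[rows - 1][cols - 1]


if __name__ == "__main__":
    grid = ["01112",
            "34567",
            "9a5bc",
            "9deff",
            "99ghi"]
    print(fewest_changes(grid))
```

On the grid from the question this reports 5, for example via 0 -> 3 -> 9 -> g -> h -> i. Every cell is relaxed a bounded number of times, so the running time is proportional to the number of cells.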
This sounds like Dijkstra's algorithm. The hardest part would lie in properly setting up the graph (keeping track of which node gets which children), but if you can devote some CPU cycles to that, you'd be fine afterwards.
Why don't you want a breadth-first search?
Here.. I was bored :-) This is in Ruby but may get you started. Mind you, it is not tested.
class Node
  attr_accessor :parents, :children, :value
  def initialize args={}
    @parents = args[:parents] || []
    @children = args[:children] || []
    @value = args[:value]
  end
  def add_parents *args
    args.flatten.each do |node|
      @parents << node
      node.add_children self unless node.children.include? self
    end
  end
  def add_children *args
    args.flatten.each do |node|
      @children << node
      node.add_parents self unless node.parents.include? self
    end
  end
end
class Graph
  attr_accessor :graph, :root
  def initialize args={}
    @graph = args[:graph]
    @root = Node.new
    prepare_graph
    @root = @graph[0][0]
  end
  private
  def prepare_graph
    # We will iterate through the graph, and only check the values above and to the
    # left of the current cell.
    @graph.each_with_index do |row, i|
      row.each_with_index do |cell, j|
        cell = Node.new :value => cell #in-place modification!
        # Check above
        unless i.zero?
          above = @graph[i-1][j]
          if above.value == cell.value
            # Here it is safe to do this: the new node has no children, no parents.
            cell = above
          else
            cell.add_parents above
            above.add_children cell # Redundant given the code for both of those
            # methods, but implementations may differ.
          end
        end
        # Check to the left!
        unless j.zero?
          left = @graph[i][j-1]
          if left.value == cell.value
            # Well, potentially it's the same as the one above the current cell,
            # so we can't just set one equal to the other: have to merge them.
            left.add_parents cell.parents
            left.add_children cell.children
            cell = left
          else
            cell.add_parents left
            left.add_children cell
          end
        end
      end
    end
  end
end
#j = 0, 1, 2, 3, 4
graph = [
[3, 4, 4, 4, 2], # i = 0
[8, 3, 1, 0, 8], # i = 1
[9, 0, 1, 2, 4], # i = 2
[9, 8, 0, 3, 3], # i = 3
[9, 9, 7, 2, 5]] # i = 4
maze = Graph.new :graph => graph
# Now, going from maze.root on, we have a weighted graph, should it matter.
# If it doesn't matter, you can just count the number of steps.
# Dijkstra's algorithm is really simple to find in the wild.
This looks like the same problem as this Project Euler one: http://projecteuler.net/index.php?section=problems&id=81
The complexity of the solution is O(n), where n is the number of nodes.
What you need is memoization.
At each step you can arrive from at most 2 directions, so pick the one that is cheaper.
It is something like this (just add the code that takes 0 if on the border):
for i in row:
    for j in column:
        matrix[i][j]=min([matrix[i-1][j],matrix[i][j-1]])+matrix[i][j]
And now you have the least expensive solution if you move just right or down.
The solution is in matrix[MAX_i][MAX_j]
If you can go left and up too, then the big-O is much higher (I can figure out optimal solution)
In order for A* to always find the shortest path, your heuristic must never over-estimate the actual cost (such a heuristic is called "admissible"). Simple heuristics like using the Euclidean or Manhattan distance on a grid work well because they're fast to compute and are guaranteed to be less than or equal to the actual cost.
Unfortunately, in your case, unless you can make some simplifying assumptions about the size/shape of the nodes, I'm not sure there's much you can do. For example, consider going from A to B in this case:
B 1 2 3 A
C 4 5 6 D
C 7 8 9 C
C e f g C
C C C C C
The shortest path would be A -> D -> C -> B, but using spatial information would probably give 3 a lower heuristic cost than D.
Depending on your circumstances, you might be able to live with a solution that isn't actually the shortest path, as long as you can get the answer sooner. There's a nice blog post here by Christer Ericson (programmer for God of War 3 on PS3) on the topic: http://realtimecollisiondetection.net/blog/?p=56
Here's my idea for a nonadmissible heuristic: from the point, move horizontally until you're even with the goal, then move vertically until you reach it, and count the number of state changes that you made. You can compute other test paths (e.g. vertically then horizontally) too, and pick the minimum value as your final heuristic. If your nodes are roughly equal size and regularly shaped (unlike my example), this might do pretty well. The more test paths you do, the more accurate you'd get, but the slower it would be.
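A small Python sketch of that heuristic, with the function name l_shaped_heuristic and the (row, column) cell format chosen only for this example. It walks the two L-shaped test paths (horizontal then vertical, and vertical then horizontal), counts the value changes along each, and returns the smaller count.

```python
def _line(a, b):
    """Inclusive integer range from a to b, in either direction."""
    step = 1 if b >= a else -1
    return range(a, b + step, step)


def l_shaped_heuristic(grid, start, goal):
    """Estimate the number of node changes between two (row, col) cells."""
    (r0, c0), (r1, c1) = start, goal
    # Horizontal leg first, then vertical (the corner cell is listed once).
    path_h = [(r0, c) for c in _line(c0, c1)] + [(r, c1) for r in _line(r0, r1)][1:]
    # Vertical leg first, then horizontal.
    path_v = [(r, c0) for r in _line(r0, r1)] + [(r1, c) for c in _line(c0, c1)][1:]

    def changes(path):
        return sum(1 for a, b in zip(path, path[1:])
                   if grid[a[0]][a[1]] != grid[b[0]][b[1]])

    return min(changes(path_h), changes(path_v))


if __name__ == "__main__":
    grid = ["B123A",
            "C456D",
            "C789C",
            "CefgC",
            "CCCCC"]
    print(l_shaped_heuristic(grid, (0, 4), (0, 0)))
```

On the example grid above, going from A to B this returns 4 (A, 3, 2, 1, B along the top row), while the true cost via D and C is 3. That overestimate is exactly why the heuristic is not admissible.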
Hope that's helpful, let me know if any of it doesn't make sense.
This untuned C implementation of breadth-first search can chew through a 100-by-100 grid in less than 1 msec. You can probably do better.
#include <string.h> /* for memset */
int shortest_path(int *grid, int w, int h) {
int mark[w * h]; // for each square in the grid:
// 0 if not visited
// 1 if not visited and slated to be visited "now"
// 2 if already visited
int todo1[4 * w * h]; // buffers for two queues, a "now" queue
int todo2[4 * w * h]; // and a "later" queue
int *readp; // read position in the "now" queue
int *writep[2] = {todo1 + 1, 0};
int x, y, same;
todo1[0] = 0;
memset(mark, 0, sizeof(mark));
for (int d = 0; ; d++) {
readp = (d & 1) ? todo2 : todo1; // start of "now" queue
writep[1] = writep[0]; // end of "now" queue
writep[0] = (d & 1) ? todo1 : todo2; // "later" queue (empty)
// Now consume the "now" queue, filling both the "now" queue
// and the "later" queue as we go. Points in the "now" queue
// have distance d from the starting square. Points in the
// "later" queue have distance d+1.
while (readp < writep[1]) {
int p = *readp++;
if (mark[p] < 2) {
mark[p] = 2;
x = p % w;
y = p / w;
if (x > 0 && !mark[p-1]) { // go left
mark[p-1] = same = (grid[p-1] == grid[p]);
*writep[same]++ = p-1;
}
if (x + 1 < w && !mark[p+1]) { // go right
mark[p+1] = same = (grid[p+1] == grid[p]);
if (y == h - 1 && x == w - 2)
return d + !same;
*writep[same]++ = p+1;
}
if (y > 0 && !mark[p-w]) { // go up
mark[p-w] = same = (grid[p-w] == grid[p]);
*writep[same]++ = p-w;
}
if (y + 1 < h && !mark[p+w]) { // go down
mark[p+w] = same = (grid[p+w] == grid[p]);
if (y == h - 2 && x == w - 1)
return d + !same;
*writep[same]++ = p+w;
}
}
}
}
}
This paper has a slightly faster version of Dijkstra's algorithm, which lowers the constant term. Still O(n) though, since you are really going to have to look at every node.
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.54.8746&rep=rep1&type=pdf
EDIT: THE PREVIOUS VERSION WAS WRONG AND WAS FIXED
Since Dijkstra is out, I'll recommend a simple DP, which has the benefit of running in the optimal time and not requiring you to construct a graph.
D[a][b] is the minimal distance to x=a and y=b using only nodes where the x<=a and y<=b.
And since you can't move diagonally you only have to look at D[a-1][b] and D[a][b-1] when calculating D[a][b]
This gives you the following recurrence relationship:
D[a][b] = min(if grid[a][b] == grid[a-1][b] then D[a-1][b] else D[a-1][b] + 1, if grid[a][b] == grid[a][b-1] then D[a][b-1] else D[a][b-1] + 1)
However doing only the above fails on this case:
0 1 2 3 4
5 6 7 8 9
A b d e g
A f r t s
A z A A A
A A A f d
Therefore you need to cache the minimum of each group of node you found so far. And instead of looking at D[a][b] you look at the minimum of the group at grid[a][b].
Here's some Python code:
Note grid is the grid that you're given as input and it's assumed the grid is N by N
groupmin = {}
for x in range(0, N):
    for y in range(0, N):
        groupmin[grid[x][y]] = N+1  # N+1 serves as 'infinity'
# init first row and column
groupmin[grid[0][0]] = 0
for x in range(1, N):
    gm = groupmin[grid[x-1][0]]
    temp = (gm) if grid[x][0] == grid[x-1][0] else (gm + 1)
    groupmin[grid[x][0]] = min(groupmin[grid[x][0]], temp)
for y in range(1, N):
    gm = groupmin[grid[0][y-1]]
    temp = (gm) if grid[0][y] == grid[0][y-1] else (gm + 1)
    groupmin[grid[0][y]] = min(groupmin[grid[0][y]], temp)
# do the rest of the blocks
for x in range(1, N):
    for y in range(1, N):
        gma = groupmin[grid[x-1][y]]
        gmb = groupmin[grid[x][y-1]]
        a = (gma) if grid[x][y] == grid[x-1][y] else (gma + 1)
        b = (gmb) if grid[x][y] == grid[x][y-1] else (gmb + 1)  # was gma + 1
        temp = min(a, b)
        groupmin[grid[x][y]] = min(groupmin[grid[x][y]], temp)
ans = groupmin[grid[N-1][N-1]]
This will run in O(N^2 * f(x)), where f(x) is the time the hash function takes, which is normally O(1). This is one of the best running times you can hope for, and it has a much lower constant factor than Dijkstra's.
You should easily be able to handle N's of up to a few thousand in a second.
Is there any way to ensure that the fewest number of turns heuristic is met by anything except a breadth-first search?
A faster way, or a simpler way? :)
You can breadth-first search from both ends, alternating, until the two regions meet in the middle. This will be much faster if the graph has a lot of fanout, like a city map, but the worst case is the same. It really depends on the graph.
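A rough Python sketch of searching from both ends, assuming the graph is given as a dict that maps each node to a list of its neighbours (the name bidirectional_bfs and that input format are assumptions). It always expands the smaller of the two frontiers one full level at a time, and stops as soon as a level touches a node already reached from the other side.

```python
def bidirectional_bfs(adj, start, goal):
    """Length of the shortest start-goal path in an unweighted graph, or None."""
    if start == goal:
        return 0
    dist = [{start: 0}, {goal: 0}]   # distances known from each end
    frontier = [[start], [goal]]     # current outermost level of each search
    while frontier[0] and frontier[1]:
        # Expand whichever side currently has the smaller frontier.
        side = 0 if len(frontier[0]) <= len(frontier[1]) else 1
        other = 1 - side
        best = None
        next_level = []
        for u in frontier[side]:
            for v in adj[u]:
                if v in dist[other]:
                    # The two searches meet across the edge (u, v).
                    cand = dist[side][u] + 1 + dist[other][v]
                    if best is None or cand < best:
                        best = cand
                elif v not in dist[side]:
                    dist[side][v] = dist[side][u] + 1
                    next_level.append(v)
        if best is not None:
            return best  # the whole level was scanned, so this is minimal
        frontier[side] = next_level
    return None  # one side ran out of nodes: no path exists


if __name__ == "__main__":
    path_graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3], 5: []}
    print(bidirectional_bfs(path_graph, 0, 4))
    print(bidirectional_bfs(path_graph, 0, 5))
```

Finishing the current level before returning matters: the first meeting edge found inside a level is not necessarily on a shortest path, but the smallest candidate over the whole level is.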
This is my implementation using a simple BFS. Dijkstra would also work (substitute a std::priority_queue that sorts by descending costs for the std::queue) but would seriously be overkill.
The thing to notice here is that we are actually searching on a graph whose nodes do not exactly correspond to the cells in the given array. To get to that graph, I used a simple DFS-based flood fill (you could also use BFS, but DFS is slightly shorter for me). What that does is find all connected components of identical characters and assign each of them to the same colour/node. Thus, after the flood fill we can find out what node each cell belongs to in the underlying graph by looking at the value of colour[row][col]. Then I just iterate over the cells and find all the cells whose adjacent cells do not have the same colour (i.e. are in different nodes). These are the edges of our graph. I maintain a std::set of edges as I iterate over the cells to eliminate duplicate edges. After that it is a simple matter of building an adjacency list from the list of edges, and we are ready for a BFS.
Code (in C++):
#include <queue>
#include <vector>
#include <iostream>
#include <string>
#include <set>
#include <cstring>
using namespace std;
#define SIZE 1001
vector<string> board;
int colour[SIZE][SIZE];
int dr[]={0,1,0,-1};
int dc[]={1,0,-1,0};
int min(int x,int y){ return (x<y)?x:y;}
int max(int x,int y){ return (x>y)?x:y;}
void dfs(int r, int c, int col, vector<string> &b){
if (colour[r][c]<0){
colour[r][c]=col;
for(int i=0;i<4;i++){
int nr=r+dr[i],nc=c+dc[i];
if (nr>=0 && nr<b.size() && nc>=0 && nc<b[0].size() && b[nr][nc]==b[r][c])
dfs(nr,nc,col,b);
}
}
}
int flood_fill(vector<string> &b){
memset(colour,-1,sizeof(colour));
int current_node=0;
for(int i=0;i<b.size();i++){
for(int j=0;j<b[0].size();j++){
if (colour[i][j]<0){
dfs(i,j,current_node,b);
current_node++;
}
}
}
return current_node;
}
vector<vector<int> > build_graph(vector<string> &b){
int total_nodes=flood_fill(b);
set<pair<int,int> > edge_list;
for(int r=0;r<b.size();r++){
for(int c=0;c<b[0].size();c++){
for(int i=0;i<4;i++){
int nr=r+dr[i],nc=c+dc[i];
if (nr>=0 && nr<b.size() && nc>=0 && nc<b[0].size() && colour[nr][nc]!=colour[r][c]){
int u=colour[r][c], v=colour[nr][nc];
if (u!=v) edge_list.insert(make_pair(min(u,v),max(u,v)));
}
}
}
}
vector<vector<int> > graph(total_nodes);
for(set<pair<int,int> >::iterator edge=edge_list.begin();edge!=edge_list.end();edge++){
int u=edge->first,v=edge->second;
graph[u].push_back(v);
graph[v].push_back(u);
}
return graph;
}
int bfs(vector<vector<int> > &G, int start, int end){
vector<int> cost(G.size(),-1);
queue<int> Q;
Q.push(start);
cost[start]=0;
while (!Q.empty()){
int node=Q.front();Q.pop();
vector<int> &adj=G[node];
for(int i=0;i<adj.size();i++){
if (cost[adj[i]]==-1){
cost[adj[i]]=cost[node]+1;
Q.push(adj[i]);
}
}
}
return cost[end];
}
int main(){
string line;
int rows,cols;
cin>>rows>>cols;
for(int r=0;r<rows;r++){
line="";
char ch;
for(int c=0;c<cols;c++){
cin>>ch;
line+=ch;
}
board.push_back(line);
}
vector<vector<int> > actual_graph=build_graph(board);
cout<<bfs(actual_graph,colour[0][0],colour[rows-1][cols-1])<<"\n";
}
This is just a quick hack, lots of improvements can be made. But I think it is pretty close to optimal in terms of runtime complexity, and should run fast enough for boards of size of several thousand (don't forget to change the #define of SIZE). Also, I only tested it with the one case you have provided. So, as Knuth said, "Beware of bugs in the above code; I have only proved it correct, not tried it." :).

Resources