Finding the longest road in a Settlers of Catan game algorithmically - algorithm

I'm writing a Settlers of Catan clone for a class. One of the extra credit features is automatically determining which player has the longest road. I've thought about it, and it seems like some slight variation on depth-first search could work, but I'm having trouble figuring out what to do with cycle detection, how to handle the joining of a player's two initial road networks, and a few other minutiae. How could I do this algorithmically?
For those unfamiliar with the game, I'll try to describe the problem concisely and abstractly: I need to find the longest possible path in an undirected cyclic graph.

This should be a rather easy algorithm to implement.
To begin with, separate out the roads into distinct sets, where all the road segments in each set are somehow connected. There's various methods on doing this, but here's one:
Pick a random road segment, add it to a set, and mark it
Branch out from this segment, ie. follow connected segments in both directions that aren't marked (if they're marked, we've already been here)
If found road segment is not already in the set, add it, and mark it
Keep going from new segments until you cannot find any more unmarked segments that are connected to those currently in the set
If there's unmarked segments left, they're part of a new set, pick a random one and start back at 1 with another set
Note: As per the official Catan Rules, a road can be broken if another play builds a settlement on a joint between two segments. You need to detect this and not branch past the settlement.
Note, in the above, and following, steps, only consider the current players segments. You can ignore those other segments as though they weren't even on the map.
This gives you one or more sets, each containing one or more road segments.
Ok, for each set, do the following:
Pick a random road segment in the set that has only one connected road segment out from it (ie. you pick an endpoint)
If you can't do that, then the whole set is looping (one or more), so pick a random segment in this case
Now, from the segment you picked, do a recursive branching out depth-first search, keeping track of the length of the current road you've found so far. Always mark road segments as well, and don't branch into segments already marked. This will allow the algorithm to stop when it "eats its own tail".
Whenever you need to backtrack, because there are no more branches, take a note of the current length, and if it is longer than the "previous maximum", store the new length as the maximum.
Do this for all the sets, and you should have your longest road.

A simple polynomial-time depth-first search is unlikely to work, since the problem is NP-hard. You will need something that takes exponential time to get an optimal solution. Since the problem is so small, that should be no problem in practice, though.
Possibly the simplest solution would be dynamic programming: keep a table T[v, l] that stores for each node v and each length l the set of paths that have length l and end in v. Clearly T[v, 1] = {[v]} and you can fill out T[v, l] for l > 1 by collecting all paths from T[w, l-1] where w is a neighbor of v that do not already contain v, and then attaching v. This is similar to Justin L.'s solution, but avoids some duplicate work.

I would suggest a breadth-first search.
For each player:
Set a default known maximum of 0
Pick a starting node. Skip if it has zero or more than one connected neighbors and find all of the player's paths starting from it in a breadth-first manner. The catch: Once a node has been traversed to, it is "deactivated" for the all searches left in that turn. That is, it may no longer be traversed.
How can this be implemented? Here's one possible breadth-first algorithm that can be used:
If there are no paths from this initial node, or more than one path, mark it deactivated and skip it.
Keep a queue of paths.
Add a path containing only the initial dead-end node to the queue. Deactivate this node.
Pop the first path out of the queue and "explode" it -- that is, find all valid paths that are the the path + 1 more step in a valid direction. By "valid", the next node must be connected to the last one by a road, and it also must be activated.
Deactivate all nodes stepped to during the last step.
If there are no valid "explosions" of the previous path, then compare that length of that path to the known maximum. If greater than it, it is the new maximum.
Add all exploded paths, if any, back into the queue.
Repeat 4-7 until the queue is empty.
After you do this once, move onto the next activated node and start the process all over again. Stop when all nodes are deactivated.
The maximum you have now is your longest road length, for the given player.
Note that this is slightly inefficient, but if performance doesn't matter, then this would work :)
IMPORTANT NOTE, thanks to Cameron MacFarland
Assume all nodes with cities that do not belong to the current player automatically deactivated always.
Ruby pseudocode (assumes an get_neighbors function for each node)
def explode n
exploded = n.get_neighbors # get all neighbors
exploded.select! { |i| i.activated? } # we only want activated ones
exploded.select! { |i| i.is_connected_to(n) } # we only want road-connected ones
return exploded
end
max = 0
nodes.each do |n| # for each node n
next if not n.activated? # skip if node is deactivated
if explode(n).empty? or explode(n).size > 1
n.deactivate # deactivate and skip if
next # there are no neighbors or
end # more than one
paths = [ [n] ] # start queue
until paths.empty? # do this until the queue is empty
curr_path = paths.pop # pop a path from the queue
exploded = explode(curr_path) # get all of the exploded valid paths
exploded.each { |i| i.deactivate } # deactivate all of the exploded valid points
if exploded.empty? # if no exploded paths
max = [max,curr_path.size].max # this is the end of the road, so compare its length to
# the max
else
exploded.each { |i| paths.unshift(curr_path.clone + i) }
# otherwise, add the new paths to the queue
end
end
end
puts max

Little late, but still relevant. I implemented it in java, see here. Algorithm goes as follows:
Derive a graph from the main graph using all edges from a player, adding the vertices of the main graph connected to the edges
Make a list of ends (vertices where degree==1) and splits (vertices where degree==3)
Add a route to check (possibleRoute) for every end, and for every two edges + one vertex combination found (3, since degree is 3) at every split
Walk every route. Three things can be encountered: an end, intermediate or a split.
End: Terminate possibleRoute, add it to the found list
Intermediate: check if connection to other edge is possible. If so, add the edge. If not, terminate the route and add it to the found routes list
Split: For both edges, do as intermediate. When both routes connect, copy the second route and add it to the list, too.
Checking for a connection is done using two methods: see if the new edge already exists in the possibleRoute (no connection), and asking the new edge if it can connect by giving the vertex and the originating vertex as parameters. A road can connect to a road, but only if the vertex does not contain a city/settlement from another player. A road can connect to a ship, but only if the vertex holds a city or road from the player whom route is being checked.
Loops may exist. Every encountered edge is added to a list if not already in there. When all possibleRoutes are checked, but the amount of edges in this list is less then the total amount of edges of the player, one or more loops exist. When this is the case, any nonchecked edge is made a new possibleRoute from, and is being checked again for the route.
Length of longest route is determined by simply iterating over all routes. The first route encountered having equal or more then 5 edges is returned.
This implementation supports most variants of Catan. The edges can decide for themselves if they want to connect to another, see
SidePiece.canConnect(point, to.getSidePiece());
A road, ship, bridge have SidePiece interface implemented. A road has as implementation
public boolean canConnect(GraphPoint graphPoint, SidePiece otherPiece)
{
return (player.equals(graphPoint.getPlayer()) || graphPoint.getPlayer() == null)
&& otherPiece.connectsWithRoad();
}

What I'd do:
Pick a vertex with a road
Pick at most two roads to follow
Follow the the road; backtrack here if it branches, and choose that which was longer
If current vertex is in the visited list, backtrack to 3
Add the vertex to the visited list, recurse from 3
If there is no more road at 3, return the length
When you've followed at most 2 roads, add them up, note the length
If there were 3 roads at the initial vertex, backtrack to 2.
Sorry for the prolog-ish terminology :)

Here is my version if anyone needs it. Written in Typescript.
longestRoadLengthForPlayer(player: PlayerColors): number {
let longestRoad = 0
let mainLoop = (currentLongestRoad: number, tileEdge: TileEdge, passedCorners: TileCorner[], passedEdges: TileEdge[]) => {
if (!(passedEdges.indexOf(tileEdge) > -1) && tileEdge.owner == player) {
passedEdges.push(tileEdge)
currentLongestRoad++
for (let endPoint of tileEdge.hexEdge.endPoints()) {
let corner = this.getTileCorner(endPoint)!
if ((corner.owner == player || corner.owner == PlayerColors.None) && !(passedCorners.indexOf(corner) > -1)) {
passedCorners.push(corner)
for (let hexEdge of corner.hexCorner.touchingEdges()) {
let edge = this.getTileEdge(hexEdge)
if (edge != undefined && edge != tileEdge) {
mainLoop(currentLongestRoad, edge, passedCorners, passedEdges)
}
}
} else {
checkIfLongestRoad(currentLongestRoad)
}
}
} else {
checkIfLongestRoad(currentLongestRoad)
}
}
for (let tileEdge of this.tileEdges) {
mainLoop(0, tileEdge, [], [])
}
function checkIfLongestRoad(roadLength: number) {
if (roadLength > longestRoad) {
longestRoad = roadLength
}
}
return longestRoad
}

You could use Dijkstra and just change the conditions to choose the longest path instead.
http://en.wikipedia.org/wiki/Dijkstra%27s_algorithm
It's efficient, but won't always find the path that in reality is shortest/longest although it will work most of the times.

Related

Maze algorithm that actually works

I've been working on trying to solve mazes and output the path from start to the end. No matter what I do though, it never really works 100%. It manages to backtrack trough the whole maze like it should and find the end... but actually outputting ONLY the path that it should take won't work. It works on some of the generated mazes, but not generally. It also includes some of the dead-ends.
The problem is that i've probably tried 10 different DFS algorithms online, but none of them works. It's very confusing that so many different sites seem to have algorithms that doesn't work, i've implemented them practically literally.
Would be very kind if someone could give an actual working pseudocode of how to write a simple maze solver using stack that shows the relevant path to take. After 20-25 hours of experimenting I don't think I will get anywhere anymore.
not exactly pseudocode, but it should work
define node: {int id}
define edge: {node a , b ; int weight (default=1)}
define listNeighbours:
input: node n
output: list of neighbours
define dfs:
input: list nodes, list edges, node start, node end
output: list path
bool visited[length(nodes)]
fill(visited , false)
list stack
add(stack , start)
visited[start.id] = true
while ! isEmpty(stack)
node c = first(stack)
if c.id == end.id
return stack
list neighbours = listNeighbours(c)
bool allVisited = true
node next
for node n in neighbours
if visited[n.id]
continue
else
allVisited = false
next = n
if allVisited
pop(stack)
else
push(stack , next)
return null
NOTE: ids of nodes are always <= the total number of nodes in the graph and unique in the graph

Shortest Path by rolling a dice

I have a difficult problem to solve (at least that's how I see it). I have a die (faces 1 to 6) with different values (others than [1-6]), and a board (n-by-m). I have a starting position and a finish position. I can move from a square to another by rolling the die. By doing this I have to add the top face to the sum/cost.
Now I have to calculate how to get from the start position to the end position with a minimum
sum/cost. I have tried almost everything but I can't find the correct algorithm.
I tried Dijkstra but it's useless because in the right path there are some intermediate nodes
that I can reach with a better sum from another path (that proves to be incorrect in the end). How should I change my algorithm?
algorithm overview:
dijkstra : PriorityQueue
if(I can get to a node with a smaller sum)
,remove it from the queue,
I change its cost and its die position
,add it to queue.
This is the code :
public void updateSums() {
PriorityQueue<Pair> q = new PriorityQueue<>(1, new PairComparator());
Help h = new Help();
q.add(new Pair(startLine, startColumn, sums[startLine][startColumn]));
while (!q.isEmpty()) {
Pair current = q.poll();
ArrayList<Pair> neigh = h.getNeighbours(current, table, lines, columns);
table[current.line][current.column].visit(); //table ->matrix with Nodes
for (Pair a : neigh) {
int alt = sums[current.line][current.column] + table[current.line][current.column].die.roll(a.direction);
if (sums[a.line][a.column] > alt) {
q.remove(new Pair(a.line, a.column, sums[a.line][a.column]));
sums[a.line][a.column] = alt; //sums -> matrix with costs
table[a.line][a.column].die.setDie(table[current.line][current.column].die, a.direction);
q.add(new Pair(a.line, a.column, sums[a.line][a.column]));
}
}
}
}
You need to also consider the position of the die in your Dijkstra states.
I.e. you cannot just have sums[lines][column], you'll have to do something such as sums[lines][column][die_config], where die_config is some way you create to convert the die position into an integer.
For example, if you have a die that looks like this initially:
^1 <4 v2 >9 f5 b7 (^ = top face, < = left... down, right, front and
back)
int initial_die[6] = {1,4,2,9,5,7}
You can convert it to an integer by simply considering the index of the face (from 0 to 5) that is pointing up and the one that is to the left. This means your die has less than 36 (see bottom note) possible rotation positions, which you can encode through something such as (0-based) (up*6 + left). By this I mean each face would have a value from 0 through 5 that you decide, regardless of their cost-associated value, so following the example above we would encode the initially top face as being the index 0, the left face as being the index 1, and so on.
So the die with config value 30 means that left = 30%6 (=0) the face that was initially pointing up (initial_die[0]), is currently pointing to the left, and up = (30 - left)/6 (=5) the face that is currently pointing up, is the one that was initially pointing to the back of the die (initial_die[5]). So this means the die currently has the cost 1 on its left face, and the cost 7 on its top face, and you can derive the rest of the die's faces from this information, since you know the initial disposition. (Basically this tells us the die rolled once to its left, followed by once towards its front, in comparison to the initial state)
With this additional information, your Dijkstra will be able to find the correct answer you seek, by considering the cheapest cost that reaches the final node, as you could have multiple with different final die positions.
Note: It doesn't actually have 36 possible positions, because some are impossible, for example two initially opposite sides won't be able to become adjacent on Up/Left. There are in fact only 24 valid positions, but the simple encoding I used above will actually use indexes up to ~34 depending on how you encode your die.

Path finding Algorithms, Stuck on basics

This is a basic college puzzle question i have which i though i will nail, but when it comes to implementing it i got stuck right on the get go.
Background: a 4 digit path finding algorithm, so given for example a target number 1001 and target number 1040, find path using BFs,DFs...Etc.
Allowed moves: add or subtract 1 from any of the 4 digits in any order, you cannot add to 9, or subtract from 0. You cannot add or subtract from the same digit twice i.e. to get from 1000 to 3000, you cannot do 2000 ->3000 as you’ll be adding 1 twice to the first digit. So you'll have to do 2000->2100->3100->3000.
Where I am Stuck: I just cannot figure out how i will represent the problem programmatically, never mind the algorithms, but how i will represent 1001 and move towards 1004 .What data structure? How will I identify the neighbours?
Can someone please push me towards the right direction?
If I understand correctly, you're looking for a way to represent your problem. Perhaps you're having difficulties to find it, because the problem uses a 4 digit number. Try to reduce the dimensions, and consider the same problem with 2 digits numbers (see also this other answer). Suppose you want to move from 10 to 30 using the problem's rules. Now that you have two dimensions, you may easily represent your problem on a 10x10 matrix (each digit can have 10 different values):
:
In the picture above S is your source position (10), and T is your target position (30). Now you can apply the rules of the problem. I don't know what you want to find: how many paths? just a path? the shortest path?
Anyway, a sketch of algorithm for handling moves is:
if you look at T you see that according to the rules of your problem, the only possible neighbours from which you can reach (30) are (20), (31), and (40).
now, let's assume you choose (20), then the only possible neighbour from which you can reach (20) is (21).
now, you're forced to choose (21), and the only possible neighbours from which you can reach (21) are (11) and (31).
finally, if you choose (11), then the only possible neighbours from which you can reach (11) are (10) and (12) ... and (10) is your source S, so you're done.
This is just a sketch for the algorithm (it doesn't say anything on how to choose from the list of possible neighbours), but I hope it gives you an idea.
Finally, when you've solved the problem in the two-dimensional space, you can extend it to a three-dimensional space, and to a four-dimensional space (your original problem), where you have a 10x10x10x10 matrix. Each dimension has always 10 slots/positions.
I hope this helps.
I would start with a directed graph where each node represents your number plus an additional degree of freedom (because of the constraint). E.g. you can think of this as another digit added to your number which represents the "digit changed last". E.g. you start with 1001[0] and this node connects to
1001[0] -> 2001[1]
1001[0] -> 0001[1]
1001[0] -> 1101[2]
1001[0] -> 1011[3]
1001[0] -> 1002[4]
1001[0] -> 0000[4]
or 2001[1] to
2001[1] -> 2101[2]
2001[1] -> 2011[3]
2001[1] -> 2002[4]
2001[1] -> 2000[4]
Please note: the ????[0] nodes are only allowed as starting nodes. And each edge always connects numbers with different last digit.
So for each number xyzw[d] the outgoing edges are given by
xyzw[d] -> (x+1)yzw[1] if x<9 && d!=1
xyzw[d] -> (x-1)yzw[1] if x>0 && d!=1
xyzw[d] -> x(y+1)zw[2] if y<9 && d!=2
xyzw[d] -> x(y-1)zw[2] if y>0 && d!=2
xyzw[d] -> xy(z+1)w[3] if z<9 && d!=3
xyzw[d] -> xy(z-1)w[3] if z>0 && d!=3
xyzw[d] -> xyz(w+1)[4] if w<9 && d!=4
xyzw[d] -> xyz(w-1)[4] if w>0 && d!=4
Your targets are now the four nodes 1004[1] .. 1004[4]. If your algorithm requires a single target node you may add an artificial edge ????[?] -> ????[0] for each number and then have 1004[0] as final.
The first thing that came to my mind is that this is just a point with extra dimensions. That is, rather than [x,y] to represent a point on a 2-D grid, you have [w,x,y,z] to represent a point in a 4-D space.
The one thing I'm not sure in your problem is what happens when you add a 1 to a 9. Is there a remainder that get's carried over, or does the number just get rolled over to zero? Or does this represent an edge that can't be crossed?
In any case, I'd start with thinking of this as a type of point with extra dimensionality, and apply your algorithms appropriately.
You can represent each state as an integer array of length 4. Then have a graph structure to connect the states (I would not advice on building the graph fully, as this alone will require O(10^4) memory).
Then use A* algorithm for traversing the graph and reaching the destination node.
Or simply use a BFS traversal routine where you generate the neighbouring states as you process a state. You'll have to keep a track of the states already visited (some sort of a set like data structure).
When visit a node you need to iterate through all children of the node and visit some of them, so you should only implement children generator function getting one parameter - current number like 1001. It's the "core" of the algorithm, then you need to use recoursion to traverse a tree. The following is code that DOES NOT WORK and goes into infinite recoursion, but is a sketch explaining my idea:
from random import random
def shouldVisit(digits):
#very smart function defining search strategy
return random() > 0.5
def traverse(source, target, prev):
if source == target:
print 'wow!'
return
for i in range(len(source)):
if i == prev:
continue
d = source[i]
if d > 0:
res = list(source)
res[i] = d - 1
if shouldVisit(res):
traverse(res, target, i)
if d < 9:
res = list(source)
res[i] = d + 1
if shouldVisit(res):
traverse(res, target, i)
def traverse0(source, target):
def tolist(num):
digits = []
while num:
digits.append(num % 10)
num /= 10
return digits
traverse(tolist(source), tolist(target), -1)
traverse0(1001, 1040)
very smart stub function shouldVisit may check for example if given node has been already visited

Algorithm to establish ordering amongst a set of items

I have a set of students (referred to as items in the title for generality). Amongst these students, some have a reputation for being rambunctious. We are told about a set of hate relationships of the form 'i hates j'. 'i hates j' does not imply 'j hates i'. We are supposed to arrange the students in rows (front most row numbered 1) in a way such that if 'i hates j' then i should be put in a row that is strictly lesser numbered than that of j (in other words: in some row that is in front of j's row) so that i doesn't throw anything at j (Turning back is not allowed). What would be an efficient algorithm to find the minimum number of rows needed (each row need not have the same number of students)?
We will make the following assumptions:
1) If we model this as a directed graph, there are no cycles in the graph. The most basic cycle would be: if 'i hates j' is true, 'j hates i' is false. Because otherwise, I think the ordering would become impossible.
2) Every student in the group is at least hated by one other student OR at least hates one other student. Of course, there would be students who are both hated by some and who in turn hate other students. This means that there are no stray students who don't form part of the graph.
Update: I have already thought of constructing a directed graph with i --> j if 'i hates j and doing topological sorting. However, since the general topological sort would suit better if I had to line all the students in a single line. Since there is a variation of the rows here, I am trying to figure out how to factor in the change into topological sort so it gives me what I want.
When you answer, please state the complexity of your solution. If anybody is giving code and you don't mind the language, then I'd prefer Java but of course any other language is just as fine.
JFYI This is not for any kind of homework (I am not a student btw :)).
It sounds to me that you need to investigate topological sorting.
This problem is basically another way to put the longest path in a directed graph problem. The number of rows is actually number of nodes in path (number of edges + 1).
Assuming the graph is acyclic, the solution is topological sort.
Acyclic is a bit stronger the your assumption 1. Not only A -> B and B -> A is invalid. Also A -> B, B -> C, C -> A and any cycle of any length.
HINT: the question is how many rows are needed, not which student in which row. The answer to the question is the length of the longest path.
It's from a project management theory (or scheduling theory, I don't know the exact term). There the task is about sorting jobs (vertex is a job, arc is a job order relationship).
Obviously we have some connected oriented graph without loops. There is an arc from vertex a to vertex b if and only if a hates b. Let's assume there is a source (without incoming arcs) and destination (without outgoing arcs) vertex. If that is not the case, just add imaginary ones. Now we want to find length of a longest path from source to destination (it will be number of rows - 1, but mind the imaginary verteces).
We will define vertex rank (r[v]) as number of arcs in a longest path between source and this vertex v. Obviously we want to know r[destination]. Algorithm for finding rank:
0) r_0[v] := 0 for all verteces v
repeat
t) r_t[end(j)] := max( r_{t-1}[end(j)], r_{t-1}[start(j)] + 1 ) for all arcs j
until for all arcs j r_{t+1}[end(j)] = r_t[end(j)] // i.e. no changes on this iteration
On each step at least one vertex increases its rank. Therefore in this form complexity is O(n^3).
By the way, this algorithm also gives you student distribution among rows. Just group students by their respective ranks.
Edit: Another code with the same idea. Possibly it is better understandable.
# Python
# V is a list of vertex indices, let it be something like V = range(N)
# source has index 0, destination has index N-1
# E is a list of edges, i.e. tuples of the form (start vertex, end vertex)
R = [0] * len(V)
do:
changes = False
for e in E:
if R[e[1]] < R[e[0]] + 1:
changes = True
R[e[1]] = R[e[0]] + 1
while changes
# The answer is derived from value of R[N-1]
Of course this is the simplest implementation. It can be optimized, and time estimate can be better.
Edit2: obvious optimization - update only verteces adjacent to those that were updated on the previous step. I.e. introduce a queue with verteces whose rank was updated. Also for edge storing one should use adjacency lists. With such optimization complexity would be O(N^2). Indeed, each vertex may appear in the queue at most rank times. But vertex rank never exceeds N - number of verteces. Therefore total number of algorithm steps will not exceed O(N^2).
Essentailly the important thing in assumption #1 is that there must not be any cycles in this graph. If there are any cycles you can't solve this problem.
I would start by seating all of the students that do not hate any other students in the back row. Then you can seat the students who hate these students in the next row and etc.
The number of rows is the length of the longest path in the directed graph, plus one. As a limit case, if there is no hate relationship everyone can fit on the same row.
To allocate the rows, put everyone who is not hated by anyone else on the row one. These are the "roots" of your graph. Everyone else is put on row N + 1 if N is the length of the longest path from any of the roots to that person (this path is of length one at least).
A simple O(N^3) algorithm is the following:
S = set of students
for s in S: s.row = -1 # initialize row field
rownum = 0 # start from first row below
flag = true # when to finish
while (flag):
rownum = rownum + 1 # proceed to next row
flag = false
for s in S:
if (s.row != -1) continue # already allocated
ok = true
foreach q in S:
# Check if there is student q who will sit
# on this or later row who hates s
if ((q.row == -1 or q.row = rownum)
and s hated by q) ok = false; break
if (ok): # can put s here
s.row = rownum
flag = true
Simple answer = 1 row.
Put all students in the same row.
Actually that might not solve the question as stated - lesser row, rather than equal row...
Put all students in row 1
For each hate relation, put the not-hating student in a row behind the hating student
Iterate till you have no activity, or iterate Num(relation) times.
But I'm sure there are better algorithms - look at acyclic graphs.
Construct a relationship graph where i hates j will have a directed edge from i to j. So end result is a directed graph. It should be a DAG otherwise no solutions as it's not possible to resolve circular hate relations ship.
Now simply do a DFS search and during the post node callbacks, means the once the DFS of all the children are done and before returning from the DFS call to this node, simply check the row number of all the children and assign the row number of this node as row max row of the child + 1. Incase if there is some one who doesn't hate anyone basically node with no adjacency list simply assign him row 0.
Once all the nodes are processed reverse the row numbers. This should be easy as this is just about finding the max and assigning the row numbers as max-already assigned row numbers.
Here is the sample code.
postNodeCb( graph g, int node )
{
if ( /* No adj list */ )
row[ node ] = 0;
else
row[ node ] = max( row number of all children ) + 1;
}
main()
{
.
.
for ( int i = 0; i < NUM_VER; i++ )
if ( !visited[ i ] )
graphTraverseDfs( g, i );`enter code here`
.
.
}

Finding all cycles in a directed graph

How can I find (iterate over) ALL the cycles in a directed graph from/to a given node?
For example, I want something like this:
A->B->A
A->B->C->A
but not:
B->C->B
I found this page in my search and since cycles are not same as strongly connected components, I kept on searching and finally, I found an efficient algorithm which lists all (elementary) cycles of a directed graph. It is from Donald B. Johnson and the paper can be found in the following link:
http://www.cs.tufts.edu/comp/150GA/homeworks/hw1/Johnson%2075.PDF
A java implementation can be found in:
http://normalisiert.de/code/java/elementaryCycles.zip
A Mathematica demonstration of Johnson's algorithm can be found here, implementation can be downloaded from the right ("Download author code").
Note: Actually, there are many algorithms for this problem. Some of them are listed in this article:
http://dx.doi.org/10.1137/0205007
According to the article, Johnson's algorithm is the fastest one.
Depth first search with backtracking should work here.
Keep an array of boolean values to keep track of whether you visited a node before. If you run out of new nodes to go to (without hitting a node you have already been), then just backtrack and try a different branch.
The DFS is easy to implement if you have an adjacency list to represent the graph. For example adj[A] = {B,C} indicates that B and C are the children of A.
For example, pseudo-code below. "start" is the node you start from.
dfs(adj,node,visited):
if (visited[node]):
if (node == start):
"found a path"
return;
visited[node]=YES;
for child in adj[node]:
dfs(adj,child,visited)
visited[node]=NO;
Call the above function with the start node:
visited = {}
dfs(adj,start,visited)
The simplest choice I found to solve this problem was using the python lib called networkx.
It implements the Johnson's algorithm mentioned in the best answer of this question but it makes quite simple to execute.
In short you need the following:
import networkx as nx
import matplotlib.pyplot as plt
# Create Directed Graph
G=nx.DiGraph()
# Add a list of nodes:
G.add_nodes_from(["a","b","c","d","e"])
# Add a list of edges:
G.add_edges_from([("a","b"),("b","c"), ("c","a"), ("b","d"), ("d","e"), ("e","a")])
#Return a list of cycles described as a list o nodes
list(nx.simple_cycles(G))
Answer: [['a', 'b', 'd', 'e'], ['a', 'b', 'c']]
First of all - you do not really want to try find literally all cycles because if there is 1 then there is an infinite number of those. For example A-B-A, A-B-A-B-A etc. Or it may be possible to join together 2 cycles into an 8-like cycle etc., etc... The meaningful approach is to look for all so called simple cycles - those that do not cross themselves except in the start/end point. Then if you wish you can generate combinations of simple cycles.
One of the baseline algorithms for finding all simple cycles in a directed graph is this: Do a depth-first traversal of all simple paths (those that do not cross themselves) in the graph. Every time when the current node has a successor on the stack a simple cycle is discovered. It consists of the elements on the stack starting with the identified successor and ending with the top of the stack. Depth first traversal of all simple paths is similar to depth first search but you do not mark/record visited nodes other than those currently on the stack as stop points.
The brute force algorithm above is terribly inefficient and in addition to that generates multiple copies of the cycles. It is however the starting point of multiple practical algorithms which apply various enhancements in order to improve performance and avoid cycle duplication. I was surprised to find out some time ago that these algorithms are not readily available in textbooks and on the web. So I did some research and implemented 4 such algorithms and 1 algorithm for cycles in undirected graphs in an open source Java library here : http://code.google.com/p/niographs/ .
BTW, since I mentioned undirected graphs : The algorithm for those is different. Build a spanning tree and then every edge which is not part of the tree forms a simple cycle together with some edges in the tree. The cycles found this way form a so called cycle base. All simple cycles can then be found by combining 2 or more distinct base cycles. For more details see e.g. this : http://dspace.mit.edu/bitstream/handle/1721.1/68106/FTL_R_1982_07.pdf .
The DFS-based variants with back edges will find cycles indeed, but in many cases it will NOT be minimal cycles. In general DFS gives you the flag that there is a cycle but it is not good enough to actually find cycles. For example, imagine 5 different cycles sharing two edges. There is no simple way to identify cycles using just DFS (including backtracking variants).
Johnson's algorithm is indeed gives all unique simple cycles and has good time and space complexity.
But if you want to just find MINIMAL cycles (meaning that there may be more then one cycle going through any vertex and we are interested in finding minimal ones) AND your graph is not very large, you can try to use the simple method below.
It is VERY simple but rather slow compared to Johnson's.
So, one of the absolutely easiest way to find MINIMAL cycles is to use Floyd's algorithm to find minimal paths between all the vertices using adjacency matrix.
This algorithm is nowhere near as optimal as Johnson's, but it is so simple and its inner loop is so tight that for smaller graphs (<=50-100 nodes) it absolutely makes sense to use it.
Time complexity is O(n^3), space complexity O(n^2) if you use parent tracking and O(1) if you don't.
First of all let's find the answer to the question if there is a cycle.
The algorithm is dead-simple. Below is snippet in Scala.
val NO_EDGE = Integer.MAX_VALUE / 2
def shortestPath(weights: Array[Array[Int]]) = {
for (k <- weights.indices;
i <- weights.indices;
j <- weights.indices) {
val throughK = weights(i)(k) + weights(k)(j)
if (throughK < weights(i)(j)) {
weights(i)(j) = throughK
}
}
}
Originally this algorithm operates on weighted-edge graph to find all shortest paths between all pairs of nodes (hence the weights argument). For it to work correctly you need to provide 1 if there is a directed edge between the nodes or NO_EDGE otherwise.
After algorithm executes, you can check the main diagonal, if there are values less then NO_EDGE than this node participates in a cycle of length equal to the value. Every other node of the same cycle will have the same value (on the main diagonal).
To reconstruct the cycle itself we need to use slightly modified version of algorithm with parent tracking.
def shortestPath(weights: Array[Array[Int]], parents: Array[Array[Int]]) = {
for (k <- weights.indices;
i <- weights.indices;
j <- weights.indices) {
val throughK = weights(i)(k) + weights(k)(j)
if (throughK < weights(i)(j)) {
parents(i)(j) = k
weights(i)(j) = throughK
}
}
}
Parents matrix initially should contain source vertex index in an edge cell if there is an edge between the vertices and -1 otherwise.
After function returns, for each edge you will have reference to the parent node in the shortest path tree.
And then it's easy to recover actual cycles.
All in all we have the following program to find all minimal cycles
val NO_EDGE = Integer.MAX_VALUE / 2;
def shortestPathWithParentTracking(
weights: Array[Array[Int]],
parents: Array[Array[Int]]) = {
for (k <- weights.indices;
i <- weights.indices;
j <- weights.indices) {
val throughK = weights(i)(k) + weights(k)(j)
if (throughK < weights(i)(j)) {
parents(i)(j) = parents(i)(k)
weights(i)(j) = throughK
}
}
}
def recoverCycles(
cycleNodes: Seq[Int],
parents: Array[Array[Int]]): Set[Seq[Int]] = {
val res = new mutable.HashSet[Seq[Int]]()
for (node <- cycleNodes) {
var cycle = new mutable.ArrayBuffer[Int]()
cycle += node
var other = parents(node)(node)
do {
cycle += other
other = parents(other)(node)
} while(other != node)
res += cycle.sorted
}
res.toSet
}
and a small main method just to test the result
def main(args: Array[String]): Unit = {
val n = 3
val weights = Array(Array(NO_EDGE, 1, NO_EDGE), Array(NO_EDGE, NO_EDGE, 1), Array(1, NO_EDGE, NO_EDGE))
val parents = Array(Array(-1, 1, -1), Array(-1, -1, 2), Array(0, -1, -1))
shortestPathWithParentTracking(weights, parents)
val cycleNodes = parents.indices.filter(i => parents(i)(i) < NO_EDGE)
val cycles: Set[Seq[Int]] = recoverCycles(cycleNodes, parents)
println("The following minimal cycle found:")
cycles.foreach(c => println(c.mkString))
println(s"Total: ${cycles.size} cycle found")
}
and the output is
The following minimal cycle found:
012
Total: 1 cycle found
To clarify:
Strongly Connected Components will find all subgraphs that have at least one cycle in them, not all possible cycles in the graph. e.g. if you take all strongly connected components and collapse/group/merge each one of them into one node (i.e. a node per component), you'll get a tree with no cycles (a DAG actually). Each component (which is basically a subgraph with at least one cycle in it) can contain many more possible cycles internally, so SCC will NOT find all possible cycles, it will find all possible groups that have at least one cycle, and if you group them, then the graph will not have cycles.
to find all simple cycles in a graph, as others mentioned, Johnson's algorithm is a candidate.
I was given this as an interview question once, I suspect this has happened to you and you are coming here for help. Break the problem into three questions and it becomes easier.
how do you determine the next valid
route
how do you determine if a point has
been used
how do you avoid crossing over the
same point again
Problem 1)
Use the iterator pattern to provide a way of iterating route results. A good place to put the logic to get the next route is probably the "moveNext" of your iterator. To find a valid route, it depends on your data structure. For me it was a sql table full of valid route possibilities so I had to build a query to get the valid destinations given a source.
Problem 2)
Push each node as you find them into a collection as you get them, this means that you can see if you are "doubling back" over a point very easily by interrogating the collection you are building on the fly.
Problem 3)
If at any point you see you are doubling back, you can pop things off the collection and "back up". Then from that point try to "move forward" again.
Hack: if you are using Sql Server 2008 there is are some new "hierarchy" things you can use to quickly solve this if you structure your data in a tree.
In the case of undirected graph, a paper recently published (Optimal listing of cycles and st-paths in undirected graphs) offers an asymptotically optimal solution. You can read it here http://arxiv.org/abs/1205.2766 or here http://dl.acm.org/citation.cfm?id=2627951
I know it doesn't answer your question, but since the title of your question doesn't mention direction, it might still be useful for Google search
Start at node X and check for all child nodes (parent and child nodes are equivalent if undirected). Mark those child nodes as being children of X. From any such child node A, mark it's children of being children of A, X', where X' is marked as being 2 steps away.). If you later hit X and mark it as being a child of X'', that means X is in a 3 node cycle. Backtracking to it's parent is easy (as-is, the algorithm has no support for this so you'd find whichever parent has X').
Note: If graph is undirected or has any bidirectional edges, this algorithm gets more complicated, assuming you don't want to traverse the same edge twice for a cycle.
If what you want is to find all elementary circuits in a graph you can use the EC algorithm, by JAMES C. TIERNAN, found on a paper since 1970.
The very original EC algorithm as I managed to implement it in php (hope there are no mistakes is shown below). It can find loops too if there are any. The circuits in this implementation (that tries to clone the original) are the non zero elements. Zero here stands for non-existence (null as we know it).
Apart from that below follows an other implementation that gives the algorithm more independece, this means the nodes can start from anywhere even from negative numbers, e.g -4,-3,-2,.. etc.
In both cases it is required that the nodes are sequential.
You might need to study the original paper, James C. Tiernan Elementary Circuit Algorithm
<?php
echo "<pre><br><br>";
$G = array(
1=>array(1,2,3),
2=>array(1,2,3),
3=>array(1,2,3)
);
define('N',key(array_slice($G, -1, 1, true)));
$P = array(1=>0,2=>0,3=>0,4=>0,5=>0);
$H = array(1=>$P, 2=>$P, 3=>$P, 4=>$P, 5=>$P );
$k = 1;
$P[$k] = key($G);
$Circ = array();
#[Path Extension]
EC2_Path_Extension:
foreach($G[$P[$k]] as $j => $child ){
if( $child>$P[1] and in_array($child, $P)===false and in_array($child, $H[$P[$k]])===false ){
$k++;
$P[$k] = $child;
goto EC2_Path_Extension;
} }
#[EC3 Circuit Confirmation]
if( in_array($P[1], $G[$P[$k]])===true ){//if PATH[1] is not child of PATH[current] then don't have a cycle
$Circ[] = $P;
}
#[EC4 Vertex Closure]
if($k===1){
goto EC5_Advance_Initial_Vertex;
}
//afou den ksana theoreitai einai asfales na svisoume
for( $m=1; $m<=N; $m++){//H[P[k], m] <- O, m = 1, 2, . . . , N
if( $H[$P[$k-1]][$m]===0 ){
$H[$P[$k-1]][$m]=$P[$k];
break(1);
}
}
for( $m=1; $m<=N; $m++ ){//H[P[k], m] <- O, m = 1, 2, . . . , N
$H[$P[$k]][$m]=0;
}
$P[$k]=0;
$k--;
goto EC2_Path_Extension;
#[EC5 Advance Initial Vertex]
EC5_Advance_Initial_Vertex:
if($P[1] === N){
goto EC6_Terminate;
}
$P[1]++;
$k=1;
$H=array(
1=>array(1=>0,2=>0,3=>0,4=>0,5=>0),
2=>array(1=>0,2=>0,3=>0,4=>0,5=>0),
3=>array(1=>0,2=>0,3=>0,4=>0,5=>0),
4=>array(1=>0,2=>0,3=>0,4=>0,5=>0),
5=>array(1=>0,2=>0,3=>0,4=>0,5=>0)
);
goto EC2_Path_Extension;
#[EC5 Advance Initial Vertex]
EC6_Terminate:
print_r($Circ);
?>
then this is the other implementation, more independent of the graph, without goto and without array values, instead it uses array keys, the path, the graph and circuits are stored as array keys (use array values if you like, just change the required lines). The example graph start from -4 to show its independence.
<?php
$G = array(
-4=>array(-4=>true,-3=>true,-2=>true),
-3=>array(-4=>true,-3=>true,-2=>true),
-2=>array(-4=>true,-3=>true,-2=>true)
);
$C = array();
EC($G,$C);
echo "<pre>";
print_r($C);
function EC($G, &$C){
$CNST_not_closed = false; // this flag indicates no closure
$CNST_closed = true; // this flag indicates closure
// define the state where there is no closures for some node
$tmp_first_node = key($G); // first node = first key
$tmp_last_node = $tmp_first_node-1+count($G); // last node = last key
$CNST_closure_reset = array();
for($k=$tmp_first_node; $k<=$tmp_last_node; $k++){
$CNST_closure_reset[$k] = $CNST_not_closed;
}
// define the state where there is no closure for all nodes
for($k=$tmp_first_node; $k<=$tmp_last_node; $k++){
$H[$k] = $CNST_closure_reset; // Key in the closure arrays represent nodes
}
unset($tmp_first_node);
unset($tmp_last_node);
# Start algorithm
foreach($G as $init_node => $children){#[Jump to initial node set]
#[Initial Node Set]
$P = array(); // declare at starup, remove the old $init_node from path on loop
$P[$init_node]=true; // the first key in P is always the new initial node
$k=$init_node; // update the current node
// On loop H[old_init_node] is not cleared cause is never checked again
do{#Path 1,3,7,4 jump here to extend father 7
do{#Path from 1,3,8,5 became 2,4,8,5,6 jump here to extend child 6
$new_expansion = false;
foreach( $G[$k] as $child => $foo ){#Consider each child of 7 or 6
if( $child>$init_node and isset($P[$child])===false and $H[$k][$child]===$CNST_not_closed ){
$P[$child]=true; // add this child to the path
$k = $child; // update the current node
$new_expansion=true;// set the flag for expanding the child of k
break(1); // we are done, one child at a time
} } }while(($new_expansion===true));// Do while a new child has been added to the path
# If the first node is child of the last we have a circuit
if( isset($G[$k][$init_node])===true ){
$C[] = $P; // Leaving this out of closure will catch loops to
}
# Closure
if($k>$init_node){ //if k>init_node then alwaya count(P)>1, so proceed to closure
$new_expansion=true; // $new_expansion is never true, set true to expand father of k
unset($P[$k]); // remove k from path
end($P); $k_father = key($P); // get father of k
$H[$k_father][$k]=$CNST_closed; // mark k as closed
$H[$k] = $CNST_closure_reset; // reset k closure
$k = $k_father; // update k
} } while($new_expansion===true);//if we don't wnter the if block m has the old k$k_father_old = $k;
// Advance Initial Vertex Context
}//foreach initial
}//function
?>
I have analized and documented the EC but unfortunately the documentation is in Greek.
There are two steps (algorithms) involved in finding all cycles in a DAG.
The first step is to use Tarjan's algorithm to find the set of strongly connected components.
Start from any arbitrary vertex.
DFS from that vertex. For each node x, keep two numbers, dfs_index[x] and dfs_lowval[x].
dfs_index[x] stores when that node is visited, while dfs_lowval[x] = min(dfs_low[k]) where
k is all the children of x that is not the directly parent of x in the dfs-spanning tree.
All nodes with the same dfs_lowval[x] are in the same strongly connected component.
The second step is to find cycles (paths) within the connected components. My suggestion is to use a modified version of Hierholzer's algorithm.
The idea is:
Choose any starting vertex v, and follow a trail of edges from that vertex until you return to v.
It is not possible to get stuck at any vertex other than v, because the even degree of all vertices ensures that, when the trail enters another vertex w there must be an unused edge leaving w. The tour formed in this way is a closed tour, but may not cover all the vertices and edges of the initial graph.
As long as there exists a vertex v that belongs to the current tour but that has adjacent edges not part of the tour, start another trail from v, following unused edges until you return to v, and join the tour formed in this way to the previous tour.
Here is the link to a Java implementation with a test case:
http://stones333.blogspot.com/2013/12/find-cycles-in-directed-graph-dag.html
I stumbled over the following algorithm which seems to be more efficient than Johnson's algorithm (at least for larger graphs). I am however not sure about its performance compared to Tarjan's algorithm.
Additionally, I only checked it out for triangles so far. If interested, please see "Arboricity and Subgraph Listing Algorithms" by Norishige Chiba and Takao Nishizeki (http://dx.doi.org/10.1137/0214017)
DFS from the start node s, keep track of the DFS path during traversal, and record the path if you find an edge from node v in the path to s. (v,s) is a back-edge in the DFS tree and thus indicates a cycle containing s.
Regarding your question about the Permutation Cycle, read more here:
https://www.codechef.com/problems/PCYCLE
You can try this code (enter the size and the digits number):
# include<cstdio>
using namespace std;
int main()
{
int n;
scanf("%d",&n);
int num[1000];
int visited[1000]={0};
int vindex[2000];
for(int i=1;i<=n;i++)
scanf("%d",&num[i]);
int t_visited=0;
int cycles=0;
int start=0, index;
while(t_visited < n)
{
for(int i=1;i<=n;i++)
{
if(visited[i]==0)
{
vindex[start]=i;
visited[i]=1;
t_visited++;
index=start;
break;
}
}
while(true)
{
index++;
vindex[index]=num[vindex[index-1]];
if(vindex[index]==vindex[start])
break;
visited[vindex[index]]=1;
t_visited++;
}
vindex[++index]=0;
start=index+1;
cycles++;
}
printf("%d\n",cycles,vindex[0]);
for(int i=0;i<(n+2*cycles);i++)
{
if(vindex[i]==0)
printf("\n");
else
printf("%d ",vindex[i]);
}
}
DFS c++ version for the pseudo-code in second floor's answer:
void findCircleUnit(int start, int v, bool* visited, vector<int>& path) {
if(visited[v]) {
if(v == start) {
for(auto c : path)
cout << c << " ";
cout << endl;
return;
}
else
return;
}
visited[v] = true;
path.push_back(v);
for(auto i : G[v])
findCircleUnit(start, i, visited, path);
visited[v] = false;
path.pop_back();
}
http://www.me.utexas.edu/~bard/IP/Handouts/cycles.pdf
The CXXGraph library give a set of algorithms and functions to detect cycles.
For a full algorithm explanation visit the wiki.

Resources