How to verify if there is a circle in a tree?


Here is a tree:
There is one root.
Each tree node has zero or more children.
Two nodes are allowed to point to the same child. Say, both node A
and node B have child C.
However, it is prohibited that
Node A is a descendant of Node B, and
Node B is a descendant of Node A.
One prohibited case is:
Node A has children Node C and Node D,
Both Node C and Node D have a child Node E,
Node E has A as a child.
The question is, how can such a circle (a cycle) be detected as fast as possible?
UPDATE: I realize this amounts to finding any cycle in a directed graph. I have just managed to work out a solution similar to Tarjan's algorithm.
Thanks for the comments.

Do a Depth First Search through the tree. If at any point you find a node that is already in your backtracking stack, there is a circle.
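A minimal Python sketch of this idea, assuming the structure is given as a dictionary mapping each node to a list of its children (that representation and the names `has_cycle`, `graph` and `root` are illustrative, not from the question). Because two nodes may share a child, the sketch distinguishes nodes that are still on the backtracking stack from nodes that were fully processed earlier; only the former indicate a circle.

```python
def has_cycle(graph, root):
    """Return True if a cycle is reachable from root.

    graph is assumed to map each node to a list of its children.
    """
    ON_STACK, DONE = 1, 2
    state = {}  # nodes missing from this dict have not been seen yet

    def visit(node):
        state[node] = ON_STACK
        for child in graph.get(node, []):
            if state.get(child) == ON_STACK:
                return True  # child is an ancestor of node: a cycle
            if child not in state and visit(child):
                return True
        state[node] = DONE  # fully explored, no cycle through this node
        return False

    return visit(root)


# The prohibited case from the question: A -> C, D; C -> E; D -> E; E -> A
bad = {"A": ["C", "D"], "C": ["E"], "D": ["E"], "E": ["A"]}
# Sharing the child E is allowed as long as E does not lead back to A
ok = {"A": ["C", "D"], "C": ["E"], "D": ["E"], "E": []}
print(has_cycle(bad, "A"), has_cycle(ok, "A"))  # True False
```

Each node and edge is examined once, so the check runs in O(|V|+|E|). For very deep structures the recursion would need to be replaced by an explicit stack to stay within Python's recursion limit.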

Circles can be found using two pointers that are advanced at different speeds. Eventually the pointers will meet, indicating a loop, or the "faster" one will reach the end. The question is usually asked of linked lists though, not trees.
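For reference, the two-pointer technique can be sketched as follows for a singly linked list, where every node has at most one successor. The `ListNode` class and `has_loop` function are hypothetical names used only for illustration. The technique does not carry over directly to nodes with several children, which is why the DFS approach fits the tree question better.

```python
class ListNode:
    """Hypothetical singly linked list node used for illustration."""

    def __init__(self, value, next=None):
        self.value = value
        self.next = next


def has_loop(head):
    """Two-pointer loop check: slow moves one step, fast moves two."""
    slow = fast = head
    while fast is not None and fast.next is not None:
        slow = slow.next
        fast = fast.next.next
        if slow is fast:
            return True  # the pointers met, so there is a loop
    return False  # fast reached the end of the list


a, b, c = ListNode("a"), ListNode("b"), ListNode("c")
a.next, b.next = b, c
print(has_loop(a))  # False
c.next = a          # close the loop: a -> b -> c -> a
print(has_loop(a))  # True
```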

Related

What's the meaning of "explored" in DFS

I am trying to complete the table for this tree. The Iteration column is clear, but I don't understand how to find the Explored column (in the table), or what "explored" means. I didn't find any solution, so can anyone explain what "explored" is, or point me to a source? My start state is A and my goal state is E.

You can find the term "explore" being used in the context of tree traversals.
For instance in class notes from TS University, we find the terms "expanded", "explored", "visited", and "frontier" all used in an explanation of DFS in a (search) tree:
    expansion of nodes
      as states are explored, the corresponding nodes are expanded by applying the successor function
        this generates a new set of (child) nodes
      the fringe (frontier) is the set of nodes not yet visited
        newly generated nodes are added to the fringe
Here you see how exploring happens at the moment of expanding: it is synonymous with visiting. Ignore the term "generated" here, as that is specific to search trees. You could read it as "discovered".
As the frontier consists of nodes that are by definition not yet visited, the set of explored nodes is disjoint from the set of nodes on the frontier. Furthermore, the nodes on the frontier are always direct children of nodes that have been explored. The first node on the frontier will be moved to the explored set in the next iteration.
The table in your question can be completed as follows:
Iteration    Frontier    Explored
A            B,C,D       A
A,B,C,D      E,C,D       A,B
A,B,C,D,E    C,D         A,B,E
Explanation:
Initially, we could say the frontier consists of A (not depicted in the table). It is the caller of the DFS algorithm that should pass this node reference.
In the first iteration the node A is popped from the frontier, marked as explored, and is expanded, i.e. its children are added to the frontier. So that means the frontier consists of B, C, and D.
In the second iteration the node B is popped from the frontier (from its left side), marked as explored, and is expanded: its children are added to the frontier (at its left side). The frontier thus becomes E, C, D.
In the third iteration the node E is popped from the frontier, marked as explored, and as this is the target node, the process stops. The frontier ends up with C, D still there, but these nodes will never be explored.
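The three iterations above can be reproduced with a short Python sketch. The tree shape used here (A has children B, C, D; B has child E) is an assumption inferred from the explanation, since the original picture is not available, and the names `dfs_trace` and `children` are illustrative.

```python
from collections import deque


def dfs_trace(children, start, goal):
    """Run DFS and record (frontier, explored) after every iteration."""
    frontier = deque([start])
    explored = []
    rows = []
    while frontier:
        node = frontier.popleft()  # pop from the left side of the frontier
        explored.append(node)      # mark the node as explored
        if node != goal:
            # Expand: put the children at the left side, keeping their order
            frontier.extendleft(reversed(children.get(node, [])))
        rows.append((list(frontier), list(explored)))
        if node == goal:
            break  # goal reached; the rest of the frontier is never explored
    return rows


# Assumed tree: A -> B, C, D and B -> E
tree = {"A": ["B", "C", "D"], "B": ["E"]}
for frontier, explored in dfs_trace(tree, "A", "E"):
    print(",".join(frontier), "|", ",".join(explored))
# B,C,D | A
# E,C,D | A,B
# C,D | A,B,E
```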

How to remove invalid edges in graded directed acyclic graph?

The following graph sample is a portion of a directed acyclic graph which is to be layered and cleaned up so that only edges connecting consecutive layers are kept.
So what I need is to eliminate edges that form "shortcuts", that is, that jump between non-consecutive layers.
The following considerations apply:
The bluish ring layering is valid because, starting at 83140 and ending at 29518, both branches have the same number (3) of intermediary nodes, and there is no longer path between the start and end node;
The green ring, starting at 94347 and ending at 107263, has an invalid edge (already red-crossed), because the left branch encompasses only one intermediary node, while the right branch encompasses three intermediary nodes. Besides, since the first edge of that branch is already valid (we know it pertains to the valid blue ring), it is possible to know which edge is the right one to cross out. Otherwise it would be impossible to know which layer should be assigned to node 94030, and so it should be eliminated;
If we consider the pink ring after considering the green one, we know that the lower red-crossed edge is to be removed.
BUT if we consider only the yellow ring, both branches seem to be right (they contain the same number of inner nodes), but actually they only seem right because they contain symmetric errors (shortcuts jumping the same amount of nodes on both branches). If we take this ring locally, at least one of the branches would end up in wrong layers, so it is necessary to use more global data to avoid this error.
My questions are:
What typical concepts and operations are involved in the formulation and possible solution of this problem?
Is there an algorithm for that?

First, topologically sort the graph.
Now, from the beginning of the sorted array, start a breadth-first search and try to find the proper "depth" (i.e. distance from the root) of every node. Since a node can have multiple parents, for a node x, depth[x] is the maximum depth of all its parents, plus one. We initialize depth for all nodes as -1.
During the BFS traversal, when we encounter a node p, we try to update the depth of each of its children c, where depth[c] = max(depth[c], depth[p]+1). There are two ways we can detect a child with a shortcut:
if depth[p]+1 < depth[c], it means c has a parent with a higher depth than p. So the edge from p to c must be a shortcut.
if depth[p]+1 > depth[c] and depth[c] != -1, it means c has a parent with a lower depth than p. So p is a better parent, and that other parent must have a shortcut edge to c.
In both cases, we mark c as problematic.
Then, for every "problematic" node x, we check all its parents, whose depth should be depth[x]-1. If any of them has a lower depth than that, its edge to x is a shortcut that needs to be removed.
Since the graph can have multiple roots, we should keep a variable to mark visited nodes, and repeat the above for any node that is left unvisited.
This resolves the yellow ring problem, because before we visit any node, all its predecessors have already been visited and properly ranked. This is ensured by the topological sort.
(Note: we can do this in just one pass. Instead of marking problematic nodes, we can maintain a parent variable for all nodes, and delete the edge from the old parent whenever case 2 occurs. Case 1 should be obvious.)
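The depth rule above can be sketched in Python as a simplified two-pass variant: first compute depth[x] as the longest distance from any root while processing nodes in topological order, then keep only the edges that connect consecutive depths. This is not the exact single-pass bookkeeping described in the note, and the input format (a list of (parent, child) pairs) and the name `remove_shortcut_edges` are assumptions made for illustration.

```python
from collections import defaultdict, deque


def remove_shortcut_edges(edges):
    """Return (depth, kept_edges) for a DAG given as (parent, child) pairs."""
    children = defaultdict(list)
    indegree = defaultdict(int)
    nodes = set()
    for p, c in edges:
        children[p].append(c)
        indegree[c] += 1
        nodes.update((p, c))

    # Roots (no incoming edges) start at depth 0
    depth = {n: 0 for n in nodes}
    queue = deque(n for n in nodes if indegree[n] == 0)
    processed = 0
    while queue:  # processes the nodes in a topological order
        p = queue.popleft()
        processed += 1
        for c in children[p]:
            depth[c] = max(depth[c], depth[p] + 1)
            indegree[c] -= 1
            if indegree[c] == 0:  # all parents of c are ranked
                queue.append(c)
    if processed != len(nodes):
        raise ValueError("the graph contains a cycle")

    # An edge is a shortcut when it skips at least one layer
    kept = [(p, c) for p, c in edges if depth[c] == depth[p] + 1]
    return depth, kept


# Hypothetical example: the edge a -> c jumps over layer 1
depth, kept = remove_shortcut_edges(
    [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
)
print(kept)  # [('a', 'b'), ('b', 'c'), ('c', 'd')]
```

Because a node's depth is final only once all of its parents have been processed, symmetric shortcuts on two branches do not cancel out the way they can when a ring is examined in isolation.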

How to find nodes at a distance k from the given node in binary tree

I came across THIS geeksforgeeks post on finding the nodes at distance k from a given node in a binary tree.
I am not able to understand it even after spending multiple hours on it, especially the part that finds the nodes at distance k among the ancestors.
Can someone please help me with a small dry run of the code/algorithm in the geeksforgeeks post? Or suggest any other easy-to-understand solution that does not use a parent pointer?

Let's say the depth of the target node is D.
If the nodes you want are in the subtree rooted at the target node, their depth should be D+k.
After that, you need to find all ancestors of the target node.
For each ancestor, if its depth is d, the distance between this ancestor and the target node is D-d.
So the final step is to find the nodes in the other subtree of this ancestor whose distance from the ancestor is k - (D-d).
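The steps above can be sketched in Python without parent pointers. The recursive search returns the distance from each ancestor to the target (the D-d of the text), and the remaining distance is then spent in the ancestor's other subtree. The `Node` class, the function names and the example tree are all hypothetical and only serve as an illustration.

```python
class Node:
    """Hypothetical binary tree node."""

    def __init__(self, val, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right


def nodes_at_distance_k(root, target, k):
    result = []

    def collect_down(node, dist):
        # Collect the nodes exactly dist edges below (or at) node
        if node is None or dist < 0:
            return
        if dist == 0:
            result.append(node.val)
            return
        collect_down(node.left, dist - 1)
        collect_down(node.right, dist - 1)

    def search(node):
        # Return the distance from node down to target, or -1 if absent
        if node is None:
            return -1
        if node is target:
            collect_down(node, k)  # nodes inside the target's subtree
            return 0
        for child, other in ((node.left, node.right), (node.right, node.left)):
            d = search(child)
            if d != -1:
                up = d + 1  # distance from this ancestor to the target
                if up == k:
                    result.append(node.val)
                else:
                    # One more edge is spent stepping into the other subtree
                    collect_down(other, k - up - 1)
                return up
        return -1

    search(root)
    return result


# Example tree:      20
#                  /    \
#                 8      22
#                / \
#               4   12
#                  /  \
#                10    14
n14 = Node(14)
n8 = Node(8, Node(4), Node(12, Node(10), n14))
root = Node(20, n8, Node(22))
print(nodes_at_distance_k(root, n8, 2))   # [10, 14, 22]
print(nodes_at_distance_k(root, n14, 3))  # [4, 20]
```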

How can I do this graph traversal?

I have a directed cyclic graph consisting of nodes a, b, c, d, e, f, g, where every node is connected to every other node. The edges may be unidirectional or bidirectional. I need to print out a valid order such as f->a->c->b->e->d->g, so that I can reach the end node from the start node. Note that all the nodes must be present in the output list.
Also note that there may be cycles in the graph.
What I came up with:
Basically, first we can try to find a start node, i.e. a node with no incoming edge (there can be at most one such node). I may or may not find one. I will also do some preprocessing to find the total number of nodes (let's call it n). Then I start a DFS from the start node, marking nodes as visited when I reach them and counting how many nodes I have visited. If I can reach n nodes this way, I am done. If I hit a node from which there are no outgoing edges to any unvisited node, I have hit a dead end: I mark that node as unvisited again, move the pointer back, and return to the previous node to try a different route.
This was the case where I find a start node. If I don't find a start node, I will just have to try this with various nodes.
I have no idea if I am even close to the solution. Can anyone help me in this regard?

In my opinion, if there is no incoming edge to a node, it means that node is a start node. You can traverse the graph using this start node. And if this start node can not visit all the n nodes, then there is no solution (as you said that all the nodes must be present in the output list.). This is because if you start with some other nodes, you won't be able to reach this start node.

The problem with your solution is that if you enter a loop you don't know if and when to exit.
A DFS search under these conditions can easily become a non-polynomial task!
Let me introduce a polynomial algorithm for your problem.
It looks complicated; I hope there is room for simplification.
Here is my suggested solution:
1) For each node, construct the table of the nodes it can reach (if a can reach b and c; b can reach d; c can reach e; then a can reach b, c, d, e, even though there is no single path from a passing through all of them).
If no node can reach all the other ones, you're done: the path you're looking for does not exist.
2) Find loops. That's easy: if a node can reach itself, there is a loop. This should be part of the construction of the table in the previous step.
Once you have found a loop you can shrink it (and its nodes) to a representative node whose incoming (outgoing) connections are the union of the incoming (outgoing) connections of the nodes in the loop.
You keep reducing loops until no more can be reduced.
3) At this point you are left with an acyclic graph. If there is a path connecting all nodes, there is a single node connected to all the others, and starting from it you can perform a depth-first search.
4) Write down the path by replacing the traversal of each representative node with a loop from the entry point of the loop to the exit point.
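Steps 1 and 2 can be sketched in Python as follows. This only builds the reachability table and reports which nodes lie on a loop; it does not perform the shrinking or the final path construction. The adjacency-dictionary input and the name `reachability` are assumptions made for illustration.

```python
from collections import deque


def reachability(graph):
    """Map every node to the set of nodes reachable via one or more edges."""
    reach = {}
    for start in graph:
        seen = set()
        queue = deque(graph[start])
        while queue:
            node = queue.popleft()
            if node in seen:
                continue
            seen.add(node)
            queue.extend(graph.get(node, []))
        reach[start] = seen
    return reach


# The example from step 1: a reaches b and c, b reaches d, c reaches e
graph = {"a": ["b", "c"], "b": ["d"], "c": ["e"], "d": [], "e": []}
reach = reachability(graph)
print(sorted(reach["a"]))  # ['b', 'c', 'd', 'e']

# Candidate start nodes: those that can reach every other node
starts = [n for n in graph if reach[n] | {n} == set(graph)]
# Step 2: a node is on a loop if it can reach itself
on_loop = [n for n in graph if n in reach[n]]
print(starts, on_loop)  # ['a'] []
```

Running one search per node costs O(|V|(|V|+|E|)) in total, which is polynomial.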

Most efficient way to visit nodes of a DAG in order

I have a large (100,000+ nodes) Directed Acyclic Graph (DAG) and would like to run a "visitor" type function on each node in order, where order is defined by the arrows in the graph. i.e. all parents of a node are guaranteed to be visited before the node itself.
If two nodes do not refer to each other directly or indirectly, then I don't care which order they are visited in.
What's the most efficient algorithm to do this?

You would have to perform a topological sort on the nodes, and visit the nodes in the resulting order.
The complexity of such an algorithm is O(|V|+|E|), which is quite good. You want to traverse all nodes, so a faster algorithm would have to solve the problem without even looking at all the edges, which would be dangerous, because a single edge could wreck the order completely.

There are some answers here:
Good graph traversal algorithm
and here:
http://en.wikipedia.org/wiki/Topological_sorting
In general, after visiting a node, you should visit its related nodes, but only the nodes that are not already visited. In order to keep track of the visited nodes, you need to keep the IDs of the nodes in a set (or map), or you can mark the node as visited (somehow).
If you care about the topological order, you must first get hold of a collection of all the un-traversed links ("remaining links") to a node, sorted by the id of the referenced node (typically: map(node-ID -> link-count)). If you haven't got that, you might need to build it using an approach similar to the one above. Then, start by visiting a node whose remaining incoming link count is zero. For each link from that node, reduce the remaining link count for each related node, adding the related node to the set of nodes-to-visit (or just visiting the node) if the count reaches zero.

As mentioned in the other answers, this problem can be solved by topological sorting.
A very simple algorithm for that (not the most efficient):
Keep an array (or map) indegree[] where indegree[node] = number of incoming edges of node
while there is at least one node n with indegree[n] = 0:
    for each node n in nodes where indegree[n] = 0:
        visit(n)
        indegree[n] = -1 # mark n as visited
        for each node x adjacent to n (i.e. each child of n):
            indegree[x] = indegree[x] - 1 # its parent has been visited, so one less edge coming into it
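The indegree-counting idea can be sketched in runnable Python with a queue of "ready" nodes, which avoids rescanning all nodes on every pass and gives O(|V|+|E|) overall (this is commonly known as Kahn's algorithm). The adjacency-dictionary format and the name `visit_in_topological_order` are assumptions made for illustration.

```python
from collections import deque


def visit_in_topological_order(graph, visit):
    """Call visit(n) on every node so that parents always come first.

    graph is assumed to map each node to a list of its children.
    """
    indegree = {n: 0 for n in graph}
    for n in graph:
        for x in graph[n]:
            indegree[x] = indegree.get(x, 0) + 1

    # Nodes with no unvisited parents are ready to be visited
    ready = deque(n for n, d in indegree.items() if d == 0)
    visited = 0
    while ready:
        n = ready.popleft()
        visit(n)
        visited += 1
        for x in graph.get(n, []):
            indegree[x] -= 1  # one parent of x has now been visited
            if indegree[x] == 0:
                ready.append(x)
    if visited != len(indegree):
        raise ValueError("the graph contains a cycle")


dag = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
order = []
visit_in_topological_order(dag, order.append)
print(order)  # ['a', 'b', 'c', 'd']
```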

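You can traverse a DAG in O(|V|+|E|) by just running your DFS from every node with zero indegree, because those are the valid starting points. This works because the graph has no cycles, so those zero-indegree nodes must exist, and between them they must reach the whole graph. Note, however, that a plain pre-order DFS can reach a node through one parent before its other parents have been visited. To guarantee that all parents come first, record each node only after all of its descendants and reverse the result, which is in effect a DFS-based topological sort.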
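A hedged sketch of a DFS-based variant that does guarantee the parents-first order: it starts a DFS from every zero-indegree node, records each node only after all of its descendants (postorder), and reverses that list at the end. An explicit stack is used instead of recursion because the graph in the question has 100,000+ nodes. The adjacency-dictionary format and the name `dfs_topological_order` are assumptions made for illustration.

```python
def dfs_topological_order(graph):
    """Return the nodes of a DAG with every parent before its children."""
    indegree = {n: 0 for n in graph}
    for n in graph:
        for x in graph[n]:
            indegree[x] = indegree.get(x, 0) + 1

    seen = set()
    postorder = []
    for start, d in indegree.items():
        if d != 0:
            continue  # only zero-indegree nodes are starting points
        seen.add(start)
        stack = [(start, iter(graph.get(start, [])))]
        while stack:
            node, children = stack[-1]
            for x in children:
                if x not in seen:
                    seen.add(x)
                    stack.append((x, iter(graph.get(x, []))))
                    break
            else:
                # All descendants of node are finished
                postorder.append(node)
                stack.pop()
    return postorder[::-1]


dag = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
print(dfs_topological_order(dag))  # ['a', 'c', 'b', 'd']
```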
