I've some texts that the sequence follows a specific order. Some texts change in consequence of the traversed trail. My goal is generate static pages for each page, interconnecting them through links.
The question is to solve a problem for a tool that will generate text for printed books (that is static, obviously). So imagine you are reading a book that is represented in the Example 1 (on the image bellow). Initially, you're in the node A, and the text of this page is "Go to page B or page C". Choosing the node C, followed of F -> B -> E -> H, you'll see a content in the node H that should be different than what you would see whether you have been traversed by A -> B -> D -> H, for instance. As it is a printed book, I need to duplicate some paths to make possible change the content of some nodes according with the path traversed.
Example:
In this example, I have two possibilities for traversing:
A -> B -> D
A -> C -> D
Expected result:
Page 1: A (link to page 2 and 3)
Page 2: B (link to page 4)
Page 3: C (link to page 5)
Page 4: D
Page 5: DD'
This simple example generates 5 pages, once the page 4 has a part of text that should be displayed only whether the reading passes through page 3.
To model this problem I chose use the Graph Theory. For a better understanding, I drew in the below graph two examples of the problem I'm trying to solve :
Note that the red dashed edges are not edges in fact. These are a way that I've used to represent when the content of the a given node X changes in consequence of visiting the node Y (reads "the content of node X changes if the path to arrive in X passes by Y").
I read a lot about graphs, traversing strategies (BFS and DFS) and some other topics. My goal is develop a algorithm that rearranges a given graph in a manner to be possible generate the pages mentioned previously. I didn't find any well-known problem that solves this problem, but I believe it should already exist. My research didn't find anything useful, so I tried to solve by myself.
My successfully approach consisted in traversing the graph up to find a node that contains a content that depends of others nodes. Once this node has been found, finds all paths from the dependent nodes to the current node. Traverses these paths duplicating all nodes that contains more than one incoming edge, removing the previous connection and connecting the current node with the duplicated one, and so on until consume all nodes of the path. This algorithm works well, but this approach is not efficient and can be very slow with long texts.
My question is: do you know any other better way to solve this problem? Is there any theory or known algorithm that can solve this kind of problem?
Thanks in advance.
Do a DFS and when you see a visited node, duplicate it, break the link through which you just visited and mark the new node as visited and continue dfs from this node. This method does not visit node multiple times and hence is the fastest(meaning it will visit H1 just exacyly 2 times not n or k times).
This is linear in terms of output graph. That is if the output graph has V' vertices and E' edges its order is O(V'+E'). You can not achieve better as you must visit everything in output graph atleast once.
I am assuming the rules of these red edges are stactic. Keep multiple content in one node instead of duplicating it. Now as the content displayed depends on the path taken to reach it, at each step we can check the "stack" of the DFS to see the path taken to reach it.The stack will give us the exact path taken to reach it (but note that it will not give details whether path visited other decendants of the parent). Then we compare to our static rules that we already have and display the content.
Time complexity analysis(Worst case):
At each step of DFS we check the entire stack against the rules. The maximum length of stack can be h(where h is the hight of tree). Hence the time complexity is O((V+E)*h).
Alternatively, if path visited other decendants of the parent matters(like analysing path A->B->E and if it matters D was already visted ), you can introduce the red edges yourself on the data-structure based upon the rules. Again keep multiple content in a node. While deciding which content to display simply check if the "endpoint" of "red edges originating from this vertice" are already visited. Now use the rules to display the appropriate content
Related
The problem statement
Given a directed graph of N nodes where each node is pointing to any one of the N nodes (can possibly point to itself). Ishu, the coder, is bored and he has discovered a problem out of it to keep himself busy. Problem is as follows:
A node is 'good' if it satisfies one of the following:
It is the special node (marked as node 1)
It is pointing to the special node (node 1)
It is pointing to a good node.
Ishu is going to change pointers of some nodes to make them all 'good'. You have to find the minimum number of pointers to change in order to make all the nodes good (Thus, a Good Graph).
NOTE: Resultant Graph should hold the property that all nodes are good and each node must point to exactly one node.
Problem Constraints
1 <= N <= 10^5
Input Format
First and only argument is an integer array A containing N numbers all between 1 to N, where i-th number is the number of node that i-th node is pointing to.
Output Format
An Integer denoting minimum number of pointer changes.
My Attempted Solution
I tried building a graph by reversing the edges, and then trying to colour connected nodes, the answer would be colors - 1. In short , I was attempting to find number of connected components for a directed graph, which does not make sense(as directed graphs do not have any concept of connected components). Other solutions on the web like this and this also point out to find number of connected components, but again connected components for aa directed graph does not make any sense to me. The question looks a bit trickier to me than a first look at it suggests.
#mcdowella gives an accurate characterization of the graph -- in each connected component, all chains of pointers and up in the same dead end or cycle.
There's a complication if the special node has a non-null pointer, though. If the special node has a non-null pointer, then it may or may not be in the terminal cycle for its connected component. First set the special node pointer to null to make this problem go away, and then:
If the graph has m connected components, then m-1 pointers would have to be changed to make all nodes good.
Since you don't have to actually change the pointers, all you need to do to solve the problem is count the connected components in the graph.
There are many ways to do this. If you think of the edges as undirected, then you could use BFS or DFS to trace each connected component, for example.
With the kind of graph you actually have, though, where each node has a directed pointer, the easiest way to solve this is with a disjoint set data structure: https://en.wikipedia.org/wiki/Disjoint-set_data_structure
Initially you just put each node in its own set, then union the sets on either side of each pointer, and then count the number of sets by counting the number of roots in the disjoint set structure.
If you haven't implemented a disjoint set structure before, it's a little tough to understand just how easy it is. There are couple simple implementations here: How to properly implement disjoint set data structure for finding spanning forests in Python?
I think this is a very similar question to Graphic: Rerouting problem test in python language. Because each node points to just one other node, your graph consists of a set of cycles (counting a node pointing to itself as a cycle) with trees feeding in to them. You can find a cycle by following pointers between nodes and checking for nodes that you have already visited. You need to break one link in each cycle, redirecting it to point to the special node. This routes everything else in that cycle, and all the trees that feed into it, to the special node.
Here is the problem:
assuming two persons are registered in a social networking website, how to decide whether they are connected or not?
my analysis (after reading more): actually, the question is looking for - the shortest path from A to B in a graph. I think both BFS and Dijkstra's Algorithms works here and time complexity is exactly the same (O(V+E)) because it is an unweighted graph, so we can't take advantage of the priority queue. So, a simple queue could resolve the problem. But, both of them doesnt resolve the problem that: find the path between them.
Bidrectrol should be a better solution at this point.
To find a path between the two, you should begin with a breadth first search. First find all neighbors of A, then find all neighbors of all neighbors of A, etc. Once B is hit, not only do you have a path from A to B, but you also have a shortest such path.
Dijkstra's algorithm rocks, and you may be able to speed this up by working from both end, i.e. find neighbors of A and neighbors of B, and compare.
If you do a depth first search, then you're following one path at a time. This will be much much slower.
If you do dfs for finding whether two people are connected on a social network, then it will take too long!
You already know the two persons, so you should use Bidirectional Search.. But, simple bidirectional search won't be enough for a graph as big as a social networking site. You will have to use some heuristics. Wikipedia page has some links to it.
You may also be able to use A* search. From wikipedia : "A* uses a best-first search and finds the least-cost path from a given initial node to one goal node (out of one or more possible goals)."
Edit: I suggest A* because "The additional complexity of performing a bidirectional search means that the A* search algorithm is often a better choice if we have a reasonable heuristic." So, if you can't form a reasonable heuristic, then use Bidirectional search. (Forming a good heuristic is never easy ;).)
One way is to use Union Find, add all links union(from,to), and if find(A) is find(B) is True then A and B are connected. This avoids the recursive search but it actually computes the connectivity of all pairs and doesn't give you the paths that connects A and B.
I think that the true criteria is: there are at least N paths between A and B shorter then K, or A and B are connected diectly. I would go with K = 3 and N near 5, i.e. have 5 common friends.
Note: answer edited.
Any method might end up being very slow. If you need to do this repeatedly, it's best to find the connected components of the graph, after which the task becomes a trivial O(1) operation: if two people are in the same component, they are connected.
Note that finding connected components for the first time might be slow, but keeping them updated as new edges/nodes are added to the graph is fast.
There are several methods for finding connected components.
One method is to construct the Laplacian of the graph, and look at its eigenvalues / eigenvectors. The number of zero eigenvalues gives you the number of connected components. The non-zero elements of the corresponding eigenvectors gives the nodes belonging to the respective components.
Another way is along the following lines:
Create a transformation table of nodes. Element n of the array contains the index of the node that node n transforms to.
Loop through all edges (i,j) in the graph (denoting a connection between i and j):
Compute recursively which node do i and j transform to based on the current table. Let us denote the results by k and l. Update entry k to make it transform to l. Update entries i and j to point to l as well.
Loop through the table again, and update each entry to point directly to the node it recursively transforms to.
Now nodes in the same connected component will have the same entry in the transformation table. So to check if two nodes are connected, just check if they transform to the same value.
Every time a new node or edge is added to the graph, the transformation table needs to be updated, but this update will be much faster than the original calculation of the table.
Can someone explain Breadth-first search to solve the below kind of problems
I need to find all the paths between 4 and 7
You look at all the nodes adjacent to your start node. Then you look at all the nodes adjacent to those (not returning to nodes you've already looked at). Repeat until satisfying node found or no more nodes.
For the kind of problem you indicate, you use the process described above to build a set of paths, terminating any that arrive at the desired destination node, and when your graph is exhausted, the set of paths that so terminated is your solution set.
Breadth first search (BFS) means that you process all of your starting nodes' direct children, before going to deeper levels. BFS is implemented with a queue which stores a list of nodes that need to be processed.
You start the algorithm by adding your start node to the queue. Then you continue to run your algorithm until you have nothing left to process in the queue.
When you dequeue an element from the queue, it becomes your active node. When you are processing your active node, you add all of its children to the back of the queue. You terminate the algorithm when you have an empty queue.
Since you are dealing with a graph instead of a tree, you need to store a list of your processed nodes so you don't enter into a loop.
Once you reach node 7 you have a matching path and you can stop doing the BFS for that recursive path. Meaning that you simply don't add its children to the queue. To avoid loops and to know the exact path you've visited, as you do your recursive BFS calls, pass up the already visited nodes.
Think of it as websites with links to other sites on them. Let A be our root node or our starting website.
A is the starting page (Layer 1)
A links to AA, AB, AC (Layer 2)
AA links to AAA, AAB, AAC (Layer 3)
AB links to ABA, ABB, ABC (Layer 3)
AC links to ACA, ACB, ACC (Layer 3)
This is only three layers deep. You search one layer at a time. So in this case you would start at layer A. If that does not match you go to the next layer, AA, AB and AC. If none of those websites is the one you are searching for then you follow the links and go to the next layer. In other words, you look at one layer at a time.
A depth first search (its complement) you would go from A to AA to AAA. In other words you would go DEEP before going WIDE.
You test each node connected to the root node. Then you test each node connected to the previous nodes. So on, until you find your answer.
Basically, each iteration tests nodes that are the same distance away from the root node.
BEGIN
4;
4,2;
4,2,1; 4,2,3; 4,2,5;
4,2,1;(fail) 4,2,3,6; 4,2,5,6; 4,2,5,11;
4,2,3,6,7;(pass) 4,2,3,6,8; 4,2,5,6,7;(pass) 4,2,5,6,8; 4,2,5,11,12;
4,2,3,6,8,9; 4,2,3,6,8,10; 4,2,5,6,8,9; 4,2,5,6,8,10; 4,2,5,11,12;(fail)
4,2,3,6,8,9;(fail) 4,2,3,6,8,10;(fail) 4,2,5,6,8,9;(fail) 4,2,5,6,8,10;(fail)
END
I have a list of items (blue nodes below) which are categorized by the users of my application. The categories themselves can be grouped and categorized themselves.
The resulting structure can be represented as a Directed Acyclic Graph (DAG) where the items are sinks at the bottom of the graph's topology and the top categories are sources. Note that while some of the categories might be well defined, a lot is going to be user defined and might be very messy.
Example:
(source: theuprightape.net)
On that structure, I want to perform the following operations:
find all items (sinks) below a particular node (all items in Europe)
find all paths (if any) that pass through all of a set of n nodes (all items sent via SMTP from example.com)
find all nodes that lie below all of a set of nodes (intersection: goyish brown foods)
The first seems quite straightforward: start at the node, follow all possible paths to the bottom and collect the items there. However, is there a faster approach? Remembering the nodes I already passed through probably helps avoiding unnecessary repetition, but are there more optimizations?
How do I go about the second one? It seems that the first step would be to determine the height of each node in the set, as to determine at which one(s) to start and then find all paths below that which include the rest of the set. But is this the best (or even a good) approach?
The graph traversal algorithms listed at Wikipedia all seem to be concerned with either finding a particular node or the shortest or otherwise most effective route between two nodes. I think both is not what I want, or did I just fail to see how this applies to my problem? Where else should I read?
It seems to me that its essentially the same operation for all 3 questions. You're always asking "Find all X below node(s) Y, where X is of type Z". All you need is a generic mechanism for 'locate all nodes below node', (solves Q3) and then you can filter the results for 'nodetype=sink' (solves Q1). For Q2, you have the starting-point (your node set) and your ending point (any sink below the starting point) so your solution set is all paths from starting node specified to the sink. So I would suggest that what you basically have a is a tree, and basic tree-traversal algorithms would be the way to go.
Despite the fact that your graph is acyclic, the operations you cite remind me of similar aspects of control flow graph analysis. There is a rich set of algorithms based on dominance that may be applicable. For example, your third operation reminds me od computing dominance frontiers; I believe that algorithm would work directly if you temporarily introduce "entry" and "exit" nodes. The entry node connects the "given set of nodes" and the exit nodes connects the sinks.
Also see Robert Tarjan's basic algorithms.
There is a directed graph (not necessarily connected) of which one or more nodes are distinguished as sources. Any node accessible from any one of the sources is considered 'lit'.
Now suppose one of the edges is removed. The problem is to determine the nodes that were previously lit and are not lit anymore.
An analogy like city electricity system may be considered, I presume.
This is a "dynamic graph reachability" problem. The following paper should be useful:
A fully dynamic reachability algorithm for directed graphs with an almost linear update time. Liam Roditty, Uri Zwick. Theory of Computing, 2002.
This gives an algorithm with O(m * sqrt(n))-time updates (amortized) and O(sqrt(n))-time queries on a possibly-cyclic graph (where m is the number of edges and n the number of nodes). If the graph is acyclic, this can be improved to O(m)-time updates (amortized) and O(n/log n)-time queries.
It's always possible you could do better than this given the specific structure of your problem, or by trading space for time.
If instead of just "lit" or "unlit" you would keep a set of nodes from which a node is powered or lit, and consider a node with an empty set as "unlit" and a node with a non-empty set as "lit", then removing an edge would simply involve removing the source node from the target node's set.
EDIT: Forgot this:
And if you remove the last lit-from-node in the set, traverse the edges and remove the node you just "unlit" from their set (and possibly traverse from there too, and so on)
EDIT2 (rephrase for tafa):
Firstly: I misread the original question and thought that it stated that for each node it was already known to be lit or unlit, which as I re-read it now, was not mentioned.
However, if for each node in your network you store a set containing the nodes it was lit through, you can easily traverse the graph from the removed edge and fix up any lit/unlit references.
So for example if we have nodes A,B,C,D like this: (lame attempt at ascii art)
A -> B >- D
\-> C >-/
Then at node A you would store that it was a source (and thus lit by itself), and in both B and C you would store they were lit by A, and in D you would store that it was lit by both A and C.
Then say we remove the edge from B to D: In D we remove B from the lit-source-list, but it remains lit as it is still lit by A. Next say we remove the edge from A to C after that: A is removed from C's set, and thus C is no longer lit. We then go on to traverse the edges that originated at C, and remove C from D's set which is now also unlit. In this case we are done, but if the set was bigger, we'd just go on from D.
This algorithm will only ever visit the nodes that are directly affected by a removal or addition of an edge, and as such (apart from the extra storage needed at each node) should be close to optimal.
Is this your homework?
The simplest solution is to do a DFS (http://en.wikipedia.org/wiki/Depth-first_search) or a BFS (http://en.wikipedia.org/wiki/Breadth-first_search) on the original graph starting from the source nodes. This will get you all the original lit nodes.
Now remove the edge in question. Do again the DFS. You can the nodes which still remain lit.
Output the nodes that appear in the first set but not the second.
This is an asymptotically optimal algorithm, since you do two DFSs (or BFSs) which take O(n + m) times and space (where n = number of nodes, m = number of edges), which dominate the complexity. You need at least o(n + m) time and space to read the input, therefore the algorithm is optimal.
Now if you want to remove several edges, that would be interesting. In this case, we would be talking about dynamic data structures. Is this what you intended?
EDIT: Taking into account the comments:
not connected is not a problem, since nodes in unreachable connected components will not be reached during the search
there is a smart way to do the DFS or BFS from all nodes at once (I will describe BFS). You just have to put them all at the beginning on the stack/queue.
Pseudo code for a BFS which searches for all nodes reachable from any of the starting nodes:
Queue q = [all starting nodes]
while (q not empty)
{
x = q.pop()
forall (y neighbour of x) {
if (y was not visited) {
visited[y] = true
q.push(y)
}
}
}
Replace Queue with a Stack and you get a sort of DFS.
How big and how connected are the graphs? You could store all paths from the source nodes to all other nodes and look for nodes where all paths to that node contain one of the remove edges.
EDIT: Extend this description a bit
Do a DFS from each source node. Keep track of all paths generated to each node (as edges, not vertices, so then we only need to know the edges involved, not their order, and so we can use a bitmap). Keep a count for each node of the number of paths from source to node.
Now iterate over the paths. Remove any path that contains the removed edge(s) and decrement the counter for that node. If a node counter is decremented to zero, it was lit and now isn't.
I would keep the information of connected source nodes on the edges while building the graph.(such as if edge has connectivity to the sources S1 and S2, its source list contains S1 and S2 ) And create the Nodes with the information of input edges and output edges. When an edge is removed, update the output edges of the target node of that edge by considering the input edges of the node. And traverse thru all the target nodes of the updated edges by using DFS or BFS. (In case of a cycle graph, consider marking). While updating the graph, it is also possible to find nodes without any edge that has source connection (lit->unlit nodes). However, it might not be a good solution, if you'd like to remove multiple edges at the same time since that may cause to traverse over same edges again and again.