I am trying to implement Dijkstra's algorithm for finding the shortest paths between nodes using a Fibonacci heap, with an adjacency-list representation for the graph. According to the algorithm I know, we have to find the minimum node in the heap and then iterate through all its neighbours and update their distances. But to get the current distances of the neighbours (which are stored in each node in the heap), I have to find that particular node in the heap. The 'Find' operation takes O(N) time, where N is the number of nodes in the Fibonacci heap. So is my algorithm correct, or am I missing something? Any help would be greatly appreciated.
An implicit assumption in a Fibonacci heap is that when you enqueue an element into the heap, you can, in time O(1), look up the node that was created for it in the Fibonacci heap. There are two common ways you'll see this done.
First, you could use an invasive Fibonacci heap. In this approach, the actual node structures in the graph have extra fields stored in them corresponding to what the Fibonacci heap needs to keep things linked up in the right order. If you set things up this way, then when you iterate over a node's neighbors, those neighbors already have the necessary fields built into them, which makes it easy to query the Fibonacci heap on those nodes without needing to search for them.
Second, you can have the enqueue operation on the Fibonacci heap return a pointer to the newly created Fibonacci heap node, then somehow associate each node in the graph with that Fibonacci heap node (maybe store a pointer to it, or use a hash table, etc.). This is the approach I took when I coded up a version of Dijkstra's algorithm that uses a Fibonacci heap many years back.
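A minimal sketch of the second approach, in Python. To keep it short, the priority queue here is an indexed binary heap standing in for a Fibonacci heap; the handle pattern is the same: push() returns the node it created, Dijkstra keeps that handle per vertex, and decrease_key() never has to search. The class names HandleHeap and Entry, and the adjacency format (a dict mapping each vertex to a list of (neighbour, weight) pairs), are assumptions for illustration.

```python
class Entry:
    """Handle returned by push(); remembers where the item sits in the heap array."""
    __slots__ = ("key", "item", "pos")

    def __init__(self, key, item, pos):
        self.key, self.item, self.pos = key, item, pos


class HandleHeap:
    """Binary min-heap whose push() returns a handle (stand-in for a Fibonacci heap)."""

    def __init__(self):
        self.a = []

    def __len__(self):
        return len(self.a)

    def push(self, key, item):
        e = Entry(key, item, len(self.a))
        self.a.append(e)
        self._up(e.pos)
        return e  # the caller keeps this handle, so no O(N) find is needed later

    def pop_min(self):
        top = self.a[0]
        last = self.a.pop()
        if self.a:  # move the last entry to the root and sift it down
            self.a[0] = last
            last.pos = 0
            self._down(0)
        top.pos = -1  # mark the handle as no longer in the heap
        return top

    def decrease_key(self, e, key):
        e.key = key
        self._up(e.pos)  # the handle tells us exactly where the entry is

    def _swap(self, i, j):
        a = self.a
        a[i], a[j] = a[j], a[i]
        a[i].pos, a[j].pos = i, j

    def _up(self, i):
        while i > 0 and self.a[(i - 1) // 2].key > self.a[i].key:
            self._swap(i, (i - 1) // 2)
            i = (i - 1) // 2

    def _down(self, i):
        n = len(self.a)
        while True:
            small = i
            for c in (2 * i + 1, 2 * i + 2):
                if c < n and self.a[c].key < self.a[small].key:
                    small = c
            if small == i:
                return
            self._swap(i, small)
            i = small


def dijkstra(adj, source):
    """adj: vertex -> list of (neighbour, weight); every vertex must be a key."""
    INF = float("inf")
    dist = {v: INF for v in adj}
    dist[source] = 0
    heap = HandleHeap()
    handle = {v: heap.push(dist[v], v) for v in adj}  # graph vertex -> heap node
    while heap:
        u = heap.pop_min().item
        if dist[u] == INF:
            break  # everything left in the heap is unreachable
        for v, w in adj[u]:
            h = handle[v]  # O(1) lookup instead of searching the heap
            if h.pos >= 0 and dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                heap.decrease_key(h, dist[v])
    return dist
```

Swapping in a real Fibonacci heap would only change the heap class; the handle dictionary in dijkstra() stays as it is.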
I have two binary trees: A, whose nodes and pointers (left, right, parent) I can access, and B, whose internals I can't access. The idea is to copy A into B by iterating over the nodes of A and inserting each one into B. Since B is an AVL tree, is there a traversal of A (preorder, inorder, postorder) that minimizes the number of rotations when inserting the elements into B?
Edit:
The tree A is balanced; I just don't know the exact implementation.
Iteration on tree A needs to be done using only pointers (the programming language is C and there is no queue or stack data structure that I can make use of).
Rebalancing in AVL happens when the depth of one part of the tree exceeds the depth of some other part of the tree by more than one. So to avoid triggering a rebalance you want to feed nodes into the AVL tree one level at a time; that is, feed it all of the nodes from level N of the original tree before you feed it any of the nodes from level N+1.
That ordering would be achieved by a breadth-first traversal of the original tree.
Edit
OP added:
Iteration on tree A needs to be done using only pointers (the
programming language is C and there is no queue or stack data
structure that I can make use of).
That does not affect the answer to the question as posed, which is still that a breadth-first traversal requires the fewest rebalances.
It does affect the way you will implement the breadth-first traversal. If you can't use a predefined queue then there are several ways that you could implement your own queue in C: an array, if permitted, or some variety of linked list are the obvious choices.
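A sketch of the array option: the queue is a fixed-size buffer plus head and tail indices, with a Python list standing in for a C array. Since every node is enqueued exactly once, a buffer with one slot per node is enough and no wrap-around is needed. The Node class, the function name and the out list (which stands in for inserting each key into B) are hypothetical.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right


def level_order_array_queue(root, capacity):
    """Breadth-first keys of the tree; capacity must be >= the number of nodes."""
    queue = [None] * capacity  # fixed buffer, like a C array sized for the worst case
    head = tail = 0
    out = []
    if root is not None:
        queue[tail] = root
        tail += 1
    while head < tail:
        node = queue[head]  # dequeue
        head += 1
        out.append(node.key)  # stand-in for "insert node.key into B"
        for child in (node.left, node.right):
            if child is not None:
                queue[tail] = child  # enqueue
                tail += 1
    return out
```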
If you aren't allowed to use dynamic memory allocation, and the size of the original tree is not bounded such that you can build a queue using a fixed buffer that is sized for the worst case, then you can abandon the queue-based approach and instead use recursion to visit successively deeper levels of the tree. (Imagine a recursive traversal that stops when it reaches a specified depth in the tree, and only emits a result for nodes at that specified depth. Wrap that recursion in a while or for loop that runs from a depth of zero to the maximum depth of the tree.)
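A sketch of that queue-free idea in Python, with a hypothetical Node class; the out list stands in for inserting each key into B. The only extra memory is the recursion's call stack, whose depth is bounded by the tree height.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right


def emit_level(node, depth, out):
    """Append keys of nodes exactly `depth` levels below `node`, left to right."""
    if node is None:
        return
    if depth == 0:
        out.append(node.key)  # stand-in for "insert node.key into B"
        return  # stop: never descend past the requested level
    emit_level(node.left, depth - 1, out)
    emit_level(node.right, depth - 1, out)


def height(node):
    return 0 if node is None else 1 + max(height(node.left), height(node.right))


def level_order_without_queue(root):
    out = []
    for d in range(height(root)):  # outer loop over depths 0 .. height-1
        emit_level(root, d, out)
    return out
```

Each level costs one partial traversal from the root, so a tree of height h is walked h times.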
If the original tree is not necessarily AVL-balanced, then you can't just copy its shape: inserting its levels in order may still trigger rotations.
To ensure that there is no rebalancing in the new tree, you should create a complete binary tree, and you should insert the nodes in BFS/level order so that every intermediate tree is also complete.
A "complete" tree is one in which every level is full, except possibly the last. Since every complete tree is AVL-balanced, and every intermediate tree is complete, there will be no rebalancing required.
If you can't copy your original tree out into an array or other data structure, then you'll need to do log(N) in-order traversals of the original tree to copy all the nodes. During the first traversal, you select and copy the root. During the second, you select and copy level 2. During the third, you copy level 3, etc.
Whether or not a source node is selected for each level depends only on its index within the source tree, so the actual structure of the source tree is irrelevant.
Since each traversal takes O(N) time, the total time spent traversing is O(N log N). Since each insert takes O(log N) time, though, the N inserts also take O(N log N) in total, so doing log N traversals does not increase the complexity of the overall process.
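A sketch of the level-by-level selection, assuming A is a binary search tree, so a node's in-order rank in A is also the rank its key will have in B. Target positions use heap-style numbering (root 1, children of p at 2p and 2p+1), which is an illustration choice; all names are hypothetical and the out list stands in for the inserts into B. This sketch recomputes each rank's depth from scratch, which costs O(log² N) per node, so each pass here is slower than the O(N) per traversal assumed above.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right


def subtree_size(p, n):
    """Positions <= n in the subtree of heap position p (children of p are 2p and 2p+1)."""
    size, lo, hi = 0, p, p
    while lo <= n:
        size += min(hi, n) - lo + 1
        lo, hi = 2 * lo, 2 * hi + 1
    return size


def depth_of_rank(r, n):
    """Depth of the node with in-order rank r (0-based) in a complete tree of n nodes."""
    p = 1
    while True:
        left = subtree_size(2 * p, n)
        if r < left:
            p = 2 * p
        elif r == left:
            return p.bit_length() - 1  # position p sits on level floor(log2 p)
        else:
            r -= left + 1
            p = 2 * p + 1


def count_nodes(node):
    return 0 if node is None else 1 + count_nodes(node.left) + count_nodes(node.right)


def collect_level(node, d, n, rank, out):
    """One in-order pass: append keys whose rank lands on level d; returns the next rank."""
    if node is None:
        return rank
    rank = collect_level(node.left, d, n, rank, out)
    if depth_of_rank(rank, n) == d:
        out.append(node.key)  # stand-in for "insert node.key into B"
    return collect_level(node.right, d, n, rank + 1, out)


def complete_insertion_order(root):
    n = count_nodes(root)
    out = []
    for d in range(n.bit_length()):  # a complete tree of n nodes has floor(log2 n) + 1 levels
        collect_level(root, d, n, 0, out)
    return out
```

Because only the in-order rank is used, a badly unbalanced source (for example a chain) yields the same order as a balanced one with the same keys.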
Let's say I have an array and I have to answer queries like "find the sum of all elements from index i to j". Can I do the same on a rooted tree, i.e. answer such queries for the path from node i to node j (the only path that exists from i to j)?
I know how to find the LCA using range minimum query, where we flatten the tree into a linear array and then use a segment tree, but I am not able to modify it for sum queries. How do I do that?
This depends on your processing requirements: do you have complexity limits, or is the goal to have working, maintainable code by lunch time?
If it's the latter, take the simple approach:
Locate both nodes in the tree.
Starting at the first node, traverse back to the root, summing the path cost as you go, maintaining the partial sum at each node.
Starting at the second node, also traverse-and-sum (full sum only, no partials) toward the root.
When you encounter a node that's an ancestor of the first one, you've found the LCA.
Add your current sum to the first node's partial sum for the LCA. Done.
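A sketch of the simple approach, assuming the tree is given as parent pointers with edge weights: parent[x] is x's parent (None at the root) and weight[x] is the cost of the edge from x to its parent. Those names are hypothetical. The partial sums along the first node's root path are kept in a dictionary, so each "is this an ancestor of the first node?" check is a single lookup.

```python
def path_sum(parent, weight, u, v):
    """Total edge weight on the unique path between u and v."""
    partial = {}  # ancestor of u (including u itself) -> cost from u up to it
    total, x = 0, u
    while x is not None:
        partial[x] = total
        if parent[x] is not None:
            total += weight[x]
        x = parent[x]
    total, x = 0, v
    while x not in partial:  # climb from v until we reach u's root path: that node is the LCA
        total += weight[x]
        x = parent[x]
    return total + partial[x]  # v-to-LCA cost plus u-to-LCA cost
```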
This algorithm is easily understood by a first-term data structures student. With the repeated check for the LCA, it's O(log(n)^2) for a balanced tree: not bad, but not the optimum of linear pre-work and constant-time LCA queries.
If you need a faster algorithm, then I suggest that you augment the LCA pre-work algorithm so that each node also computes a hash table (e.g. a Python dictionary) of partial sums to each of its ancestors. Once you've done that, you have constant-time LCA computation and a constant-time look-up for each partial sum.
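A sketch of that augmented pre-work, assuming the tree is given as children lists plus edge weights (weight[c] is the cost of the edge from c to its parent); the names are hypothetical. The LCA is taken as an argument, because the constant-time LCA routine itself is not reproduced here. Note that storing a partial sum for every ancestor takes memory proportional to the total depth of all nodes: O(n log n) for a balanced tree, but O(n²) for a path-like one.

```python
def build_ancestor_sums(children, weight, root):
    """For every node, map each ancestor (and the node itself) to the path cost up to it."""
    sums = {root: {root: 0}}
    stack = [root]
    while stack:
        x = stack.pop()
        for c in children.get(x, []):
            w = weight[c]
            d = {a: s + w for a, s in sums[x].items()}  # cost from c to each ancestor of x
            d[c] = 0
            sums[c] = d
            stack.append(c)
    return sums


def query(sums, u, v, lca):
    """Path cost between u and v, given their LCA from whatever LCA routine you use."""
    return sums[u][lca] + sums[v][lca]
```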
Apologies first; English is not my first language.
So here's my understanding of a graph that's represented as an adjacency list: it's usually used for sparse graphs, which most graphs are, and it uses V (number of vertices) lists, so V head pointers + 2E (number of edges) nodes for an undirected graph. Therefore, the space complexity is O(V + E).
Since any node can have up to V-1 edges (excluding itself), checking whether a node is adjacent to another takes O(V) time.
Checking all the edges takes O(2E + V), i.e. O(V + E).
Now, since it's mostly used for sparse graphs, checking adjacency rarely costs O(V); it's simply the number of edges the given vertex has (which is O(V) at worst, since V-1 is the maximum possible).
What I'm wondering is: is it possible to make the list (the edge nodes) a binary tree? Then finding out whether node A is adjacent to node B would take O(log n) time rather than linear O(n).
If it is possible, is it actually done often? Also, what is that kind of data structure called? I've been googling whether such combinations are possible but couldn't find anything. I would be very grateful if anyone could explain this to me in detail, as I'm new to data structures. Thank you.
Edit: I know binary search can be performed on arrays. I'm talking about the linked-list representation; I thought I made that obvious when I mentioned the heads of the lists, but wow.
There's no reason the adjacency list for each vertex couldn't be stored as a binary tree, but there are tradeoffs.
As you say, this adjacency list representation is often used for sparse graphs. Often, "sparse graph" means that a particular vertex is adjacent to few others. So your "adjacency list" for a particular vertex would be very small. Whereas it's true that binary search is O(log n) and sequential search is O(n), when n is very small sequential search is faster. I've seen cases where sequential search beats binary search when n is smaller than 16. It depends on the implementation, of course, but don't count on binary search being faster for small lists.
Another thing to think about is memory. Linked list overhead is one pointer per node. Unless, of course, you're using a doubly linked list. Binary tree overhead is two pointers per node. Perhaps not a big deal, but if you're trying to represent a very large graph, that extra pointer will become important.
If the graph will be updated frequently at run time, you have to take that into account, too. Adding a new edge to a linked list of edges is an O(1) operation. But adding an edge to a binary tree will require O(log n). And you want to make sure you keep that tree balanced. An unbalanced tree starts to act like a linked list.
So, yes, you could make your adjacency lists binary trees. You have to decide whether it's worth the extra effort, based on your application's speed requirements and the nature of your data.
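A minimal sketch of what the question describes: the V head pointers become roots of per-vertex binary search trees of neighbour ids. The names TreeAdjGraph, bst_insert and bst_contains are made up for illustration, and the trees here are deliberately left unbalanced, so lookups are O(log n) only when the insertion order happens to keep them shallow; a real version would use a self-balancing tree such as an AVL or red-black tree, per the point above about keeping the tree balanced.

```python
class TreeNode:
    __slots__ = ("v", "left", "right")  # two child pointers per edge node

    def __init__(self, v):
        self.v, self.left, self.right = v, None, None


def bst_insert(root, v):
    """Insert neighbour v and return the (possibly new) root. Unbalanced: O(depth)."""
    if root is None:
        return TreeNode(v)
    node = root
    while True:
        if v == node.v:
            return root  # edge already present
        side = "left" if v < node.v else "right"
        child = getattr(node, side)
        if child is None:
            setattr(node, side, TreeNode(v))
            return root
        node = child


def bst_contains(root, v):
    while root is not None:
        if v == root.v:
            return True
        root = root.left if v < root.v else root.right
    return False


class TreeAdjGraph:
    """Undirected graph: one BST of neighbour ids per vertex instead of a linked list."""

    def __init__(self, n):
        self.heads = [None] * n  # V "head pointers", now BST roots

    def add_edge(self, a, b):
        self.heads[a] = bst_insert(self.heads[a], b)
        self.heads[b] = bst_insert(self.heads[b], a)

    def adjacent(self, a, b):
        return bst_contains(self.heads[a], b)
```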
I found that there are two ways to implement Prim's algorithm, and that the time complexity with an adjacency matrix is O(V^2), while the time complexity with a heap and an adjacency list is O(E lg(V)).
I'm wondering whether I can use a heap when the graph is represented with an adjacency matrix. Does it make sense? If it does, is there any difference between adjacency matrix + heap and adjacency list + heap?
Generally, the matrix graph-representation is not so good for Prim's algorithm.
This is because of the main iteration of the algorithm, which pops out a node from the heap, and then scans its neighbors. How do you find its neighbors? Using the matrix graph representation, you basically need to loop over an entire matrix row (in the list graph-representation, you just need to loop over the node's list, which can be significantly shorter).
This means that, irrespective of the heap, the total time spent just finding the neighbors of the popped nodes is already Ω(|V|^2), as each node's row is eventually scanned.
So, no - it doesn't make much sense. The heap does not reduce the overall complexity.
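For comparison, the O(V^2) adjacency-matrix version mentioned in the question drops the heap entirely and finds the next vertex with a plain linear scan; since the O(V) row scan per vertex is paid anyway, that linear scan costs nothing extra asymptotically. A minimal sketch, assuming a connected undirected graph given as a V x V matrix with None for missing edges (the function name prim_matrix is hypothetical):

```python
def prim_matrix(w):
    """Total weight of a minimum spanning tree; w[i][j] is an edge weight or None."""
    n = len(w)
    INF = float("inf")
    best = [INF] * n  # cheapest known edge from the tree to each vertex
    in_tree = [False] * n
    best[0] = 0
    total = 0
    for _ in range(n):
        # O(V) scan for the closest vertex: no heap needed
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        total += best[u]
        for v in range(n):  # the whole matrix row is scanned anyway: O(V)
            if not in_tree[v] and w[u][v] is not None and w[u][v] < best[v]:
                best[v] = w[u][v]
    return total
```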
This picture from the Wikipedia article has three nodes of a Fibonacci heap marked in blue. What is the purpose of marking some of the nodes in this data structure?
Intuitively, the Fibonacci heap maintains a collection of trees of different orders, coalescing them when a delete-min occurs. The hope in constructing a Fibonacci heap is that each tree holds a large number of nodes. The more nodes in each tree, the fewer trees need to be stored in the heap, and therefore the less time will be spent coalescing trees on each delete-min.
At the same time, the Fibonacci heap tries to make the decrease-key operation as fast as possible. To do this, Fibonacci heaps allow subtrees to be "cut out" of other trees and moved back up to the root. This makes decrease-key faster, but makes each tree hold fewer nodes (and also increases the number of trees). There is therefore a fundamental tension in the structure of the design.
To get this to work, the shape of the trees in the Fibonacci heap has to be somewhat constrained. Intuitively, the trees in a Fibonacci heap are binomial trees that are allowed to lose a small number of children. Specifically, each non-root node in a Fibonacci heap is allowed to lose at most one child; once it loses a second child, it needs to be "reprocessed" at a later step. The marking step in the Fibonacci heap allows the data structure to count how many children have been lost so far. An unmarked node has lost no children, and a marked node has lost one child. Once a marked node loses another child, it has lost two children and thus needs to be moved back to the root list for reprocessing.
The specifics of why this works are documented in many introductory algorithms textbooks. It's not obvious that this should work at all, and the math is a bit tricky.
Hopefully this provides a useful intuition!
A node is marked when one of its child nodes is cut because of a decrease-key. When a second child is cut, the node also cuts itself from its parent. Marking is done so that you know when the second cut occurs.
A good explanation from Wikipedia: the decrease-key operation takes the node, decreases the key, and, if the heap property becomes violated (the new key is smaller than the key of the parent), cuts the node from its parent. If the parent is not a root, it is marked. If it has been marked already, it is cut as well and its parent is marked. We continue upwards until we reach either a root or an unmarked node. Finally, we set the minimum pointer to the decreased value if it is the new minimum.
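A sketch of that decrease-key logic in Python. Only what the quoted description needs is modelled: parent pointers, a mark bit, a root list and a min pointer. Python lists stand in for the circular doubly linked lists, and node degrees and delete-min/consolidation are omitted. FibNode, FibHeapSketch and link are hypothetical names.

```python
class FibNode:
    def __init__(self, key):
        self.key = key
        self.parent = None
        self.children = []  # a Python list stands in for the circular child list
        self.mark = False


def link(parent, child):
    """Make child a child of parent (used here only to build a test tree)."""
    child.parent = parent
    parent.children.append(child)


class FibHeapSketch:
    """Just enough of a Fibonacci heap to show cuts and cascading cuts."""

    def __init__(self):
        self.roots = []
        self.min = None

    def add_root(self, node):
        node.parent = None
        node.mark = False  # roots are never marked
        self.roots.append(node)
        if self.min is None or node.key < self.min.key:
            self.min = node

    def _cut(self, node):
        """Move node (with its subtree) from its parent's child list to the root list."""
        node.parent.children.remove(node)
        self.add_root(node)

    def decrease_key(self, node, new_key):
        if new_key > node.key:
            raise ValueError("new key is larger than the current key")
        node.key = new_key
        parent = node.parent
        if parent is not None and node.key < parent.key:  # heap order violated
            self._cut(node)
            # cascading cut: climb while we keep hitting marked non-root nodes
            while parent.parent is not None:
                if not parent.mark:
                    parent.mark = True  # first lost child: just remember it
                    break
                grand = parent.parent
                self._cut(parent)  # second lost child: cut this node too
                parent = grand
        if node.key < self.min.key:
            self.min = node
```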