How to answer queries on a path on a tree? - algorithm

Let's say I have an array and I have to answer queries like "find the sum of all elements from index i to j". Can I do the same on a rooted tree, i.e. answer such queries for the path from node i to node j (the only path that exists between i and j)?
I know how to find the LCA using a range-minimum query, where we decompose the tree into a linear array and then use a segment tree, but I am not able to modify that approach for sum queries. How do I do that?

This depends on your processing requirements: do you have complexity limits, or is the goal to have working, maintainable code by lunch time?
If it's the latter, take the simple approach:
Locate both nodes in the tree.
Starting at the first node, traverse back to the root, summing the path cost as you go, maintaining the partial sum at each node.
Starting at the second node, also traverse-and-sum (full sum only, no partials) toward the root.
When you encounter a node that's an ancestor of the first one, you've found the LCA.
Add your current sum to the first node's partial sum for the LCA. Done.
This algorithm is easily understood by a first-term data structures student. With the repeated ancestor check for the LCA, it's O(log(n)^2) on a reasonably balanced tree -- not bad, but not the optimum of linear pre-work and constant-time query return for LCA.
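For concreteness, here is a minimal Python sketch of that walk-up approach (not code from the question); it assumes each node stores its value and a parent pointer, and the names are illustrative:

```python
class Node:
    def __init__(self, value, parent=None):
        self.value = value
        self.parent = parent

def path_sum(a, b):
    """Sum of node values on the unique path from a to b, endpoints included."""
    # Walk from a to the root, recording at each ancestor the sum of the
    # values strictly below it on a's side of the path.
    partial = {}
    running, node = 0, a
    while node is not None:
        partial[node] = running
        running += node.value
        node = node.parent
    # Walk from b upward; the first node that a's walk already visited
    # is the lowest common ancestor.
    running, node = 0, b
    while node not in partial:
        running += node.value
        node = node.parent
    lca = node
    return partial[lca] + running + lca.value

root = Node(1)
left, right = Node(2, root), Node(3, root)
leaf = Node(4, left)
print(path_sum(leaf, right))   # 4 + 2 + 1 + 3 = 10
```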
If you need a faster algorithm, then I suggest that you augment the LCA pre-work so that each node also computes a hash map (e.g. a Python dictionary) of partial sums to each of its ancestors. Once you've done that, you have a constant-time computation for the LCA, and a constant-time look-up for each partial sum.
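One way that augmentation might look, as a rough sketch rather than a definitive implementation: each node gets a dictionary mapping every ancestor to the sum of values from the node up to (but excluding) that ancestor. The `lca` passed to the query is assumed to come from whatever constant-time LCA pre-work you already have, and `value`/`children` are illustrative attribute names. Note that the dictionaries cost O(n * depth) memory in total.

```python
class Node:
    def __init__(self, value, children=()):
        self.value = value
        self.children = list(children)

def precompute_sums(root):
    """Give every node a dict: ancestor -> sum of values from the node
    up to, but excluding, that ancestor (iterative DFS from the root)."""
    root.sum_to_ancestor = {}
    stack = [root]
    while stack:
        node = stack.pop()
        for child in node.children:
            child.sum_to_ancestor = {node: child.value}
            for anc, s in node.sum_to_ancestor.items():
                child.sum_to_ancestor[anc] = child.value + s
            stack.append(child)

def path_sum(u, v, lca):
    """Constant-time query once the LCA of u and v is known."""
    up_u = 0 if u is lca else u.sum_to_ancestor[lca]
    up_v = 0 if v is lca else v.sum_to_ancestor[lca]
    return up_u + up_v + lca.value

leaf_a, leaf_b = Node(3), Node(4)
mid = Node(2, [leaf_a, leaf_b])
root = Node(1, [mid])
precompute_sums(root)
print(path_sum(leaf_a, leaf_b, mid))   # 3 + 2 + 4 = 9
```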

Related

Graph represented as adjacency list, as binary tree, is it possible?

Apologies first, English is not my first language.
So here's my understanding of a graph represented as an adjacency list: it's usually used for sparse graphs, which is the case for most graphs, and it uses V (number of vertices) lists, so V head pointers + 2E (E = number of edges) nodes for an undirected graph. Therefore, space complexity = O(V + E).
Since any node can have up to V-1 edges (excluding itself), it takes O(V) time to check a node's adjacency.
To check all the edges, it takes O(2E + V), so O(V + E).
Now, since it's mostly used for sparse graphs, it's rarely O(V) to check adjacency; in practice it's the number of edges a given vertex has (which is O(V) at worst, since V-1 is the possible maximum).
What I'm wondering is: is it possible to make each list (the edge nodes) a binary tree, so that finding out whether node A is adjacent to node B takes O(log n) time rather than linear O(n)?
If it is possible, is it actually done quite often? Also, what is that kind of data structure called? I've been googling whether such a combination is possible but couldn't find anything. I would be very grateful if anyone could explain this to me in detail, as I'm new to data structures. Thank you.
Edit: I know binary search can be performed on arrays. I'm talking about the linked-list representation; I thought I made that obvious when I said heads of the lists, but wow.
There's no reason the adjacency list for each vertex couldn't be stored as a binary tree, but there are tradeoffs.
As you say, this adjacency-list representation is often used for sparse graphs. Often, "sparse graph" means that a particular vertex is adjacent to few others, so your "adjacency list" for a particular vertex would be very small. While it's true that binary search is O(log n) and sequential search is O(n), when n is very small sequential search is faster. I've seen cases where sequential search beats binary search when n is smaller than 16. It depends on the implementation, of course, but don't count on binary search being faster for small lists.
Another thing to think about is memory. Linked-list overhead is one pointer per node (two, of course, if you're using a doubly linked list). Binary-tree overhead is two pointers per node. Perhaps not a big deal, but if you're trying to represent a very large graph, that extra pointer will become important.
If the graph will be updated frequently at run time, you have to take that into account, too. Adding a new edge to a linked list of edges is an O(1) operation. But adding an edge to a binary tree will require O(log n). And you want to make sure you keep that tree balanced. An unbalanced tree starts to act like a linked list.
So, yes, you could make your adjacency lists binary trees. You have to decide whether it's worth the extra effort, based on your application's speed requirements and the nature of your data.
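Python has no built-in balanced-tree container, so as a rough stand-in this sketch keeps each neighbour list sorted and binary-searches it with `bisect`; the trade-offs (logarithmic adjacency checks, but slower edge insertion than simply appending) mirror the ones discussed above. The class and method names are made up for the example.

```python
import bisect

class Graph:
    """Adjacency 'lists' kept sorted, so a membership test is
    O(log deg(v)) -- the same asymptotic win a balanced binary tree
    of neighbours would give."""
    def __init__(self, n):
        self.adj = [[] for _ in range(n)]

    def add_edge(self, u, v):
        # Keeping the lists sorted makes insertion O(deg); a balanced
        # tree would make it O(log deg) at the cost of extra pointers.
        bisect.insort(self.adj[u], v)
        bisect.insort(self.adj[v], u)

    def adjacent(self, u, v):
        i = bisect.bisect_left(self.adj[u], v)
        return i < len(self.adj[u]) and self.adj[u][i] == v

g = Graph(4)
g.add_edge(0, 1)
g.add_edge(0, 2)
print(g.adjacent(0, 2), g.adjacent(1, 2))   # True False
```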

Time/Space Complexity of Depth First Search

I've looked at various other StackOverflow answers and they are all different from what my lecturer has written in his slides.
Depth First Search has a time complexity of O(b^m), where b is the
maximum branching factor of the search tree and m is the maximum depth
of the state space. Terrible if m is much larger than d, but if search
tree is "bushy", may be much faster than Breadth First Search.
He goes on to say..
The space complexity is O(bm), i.e. space linear in length of action
sequence! Need only store a single path from the root to the leaf
node, along with remaining unexpanded sibling nodes for each node on
path.
Another answer on StackOverflow states that it is O(n + m).
Time Complexity: If you can access each node in O(1) time, then with a branching factor of b and max depth of m, the total number of nodes in this tree would be, in the worst case, 1 + b + b^2 + … + b^(m-1). Using the formula for summing a geometric sequence (or even solving it ourselves) tells us that this sums to (b^m - 1)/(b - 1), so the total time to visit every node is proportional to b^m. Hence the complexity = O(b^m).
On the other hand, if instead of using the branching factor and max depth you have the number of nodes n, then you can directly say that the complexity will be proportional to n or equal to O(n).
The other answers that you have linked in your question are similarly using different terminologies. The idea is same everywhere. Some solutions have added the edge count too to make the answer more precise, but in general, node count is sufficient to describe the complexity.
Space Complexity: The length of the longest path is m. For each node on that path, you have to store its siblings so that when you have visited all of its children and come back to the parent, you know which sibling to explore next. For the m nodes down the path, you have to store up to b extra nodes each. That's how you get an O(b·m) space complexity.
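As a small, hedged illustration of that O(b·m) bound, this sketch runs a DFS over an implicit complete b-ary tree and records the largest number of unexpanded nodes the stack ever holds; the tree encoding (tuples of child indices) is made up for the example.

```python
def dfs_max_frontier(b, m):
    """DFS over a complete b-ary tree of depth m; returns the maximum
    size reached by the stack of unexpanded nodes."""
    stack, max_size = [()], 1      # a node is the path of child indices from the root
    while stack:
        node = stack.pop()
        if len(node) < m:          # expand: push the b children
            for i in range(b):
                stack.append(node + (i,))
        max_size = max(max_size, len(stack))
    return max_size

# The stack never holds more than (b-1) siblings per level of the current
# path plus one set of children, i.e. (b-1)*m + 1 nodes here -- O(b*m).
print(dfs_max_frontier(b=3, m=5))  # 11
```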
The complexity is O(n + m) where n is the number of nodes in your tree, and m is the number of edges.
The reason why your teacher represents the complexity as O(b^m) is probably because he wants to stress the difference between Depth First Search and Breadth First Search.
If your tree has a very large spread compared to its depth, and you're expecting results to be found at the leaves, then DFS clearly makes more sense here, as it reaches leaves faster than BFS, even though both reach the last node in the same amount of time (work).
When a tree is very deep, and non-leaves can give information about deeper nodes, BFS can detect ways to prune the search tree in order to reduce the number of nodes necessary to find your goal. Clearly, the higher up the tree you discover you can prune a subtree, the more nodes you can skip.
This is harder when you're using DFS, because you're prioritizing reaching a leaf over exploring nodes that are closer to the root.
I suppose this DFS time/space complexity is taught in an AI class but not in an algorithms class.
The DFS search tree here has a slightly different meaning:
A node is a bookkeeping data structure used to represent the search
tree. A state corresponds to a configuration of the world. ...
Furthermore, two different nodes can contain the same world state if
that state is generated via two different search paths.
Quoted from the book 'Artificial Intelligence - A Modern Approach'.
So the time/space complexity here focuses on how you visit nodes and check whether each one is the goal state. @displayName already gives a very clear explanation.
The O(n + m) figure, on the other hand, comes from an algorithms class, where the focus is the algorithm itself: we store the graph as an adjacency list and count how nodes are discovered.
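To make the algorithms-class view concrete, here is a hedged sketch of an iterative DFS over an adjacency list (the names and the small example graph are made up): each vertex is marked visited exactly once and each adjacency entry is scanned once, which is where the O(n + m) count comes from.

```python
def dfs_order(adj, start):
    """adj: dict mapping each vertex to a list of its neighbours."""
    visited, order, stack = set(), [], [start]
    while stack:
        v = stack.pop()
        if v in visited:
            continue
        visited.add(v)           # each vertex handled once   -> O(n)
        order.append(v)
        for w in adj[v]:         # each edge looked at twice  -> O(m)
            if w not in visited:
                stack.append(w)
    return order

adj = {0: [1, 2], 1: [0, 3], 2: [0], 3: [1]}
print(dfs_order(adj, 0))         # [0, 2, 1, 3]
```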

How to find the number of edges or vertices between any two vertices of a tree?

It's a general tree (an acyclic graph), so there can be only one such path. What algorithm can I use for this?
EDIT:
Need to find the paths for all pairs of vertices in the tree
I want to extend @templatetypedef's answer here (1).
Note that in your problem, you need to do at least one write per pair of nodes in the tree. There are n*(n-1)/2 of these.
Thus, in terms of big O notation, you cannot find an algorithm that runs better than O(n^2).
Now, use DFS or BFS to find the paths from each node. It runs in O(n+m) (n vertices, m edges), but since it is a tree we know that m = n-1, giving us O(n) per BFS/DFS. Note that a single BFS/DFS from some node v gives you d(v,u) for EVERY node u.
If you repeat it for each node, that gives you O(n^2), which is optimal in terms of big O notation. I do agree you might get better constants with some optimizations, but that's about it.
(1) I started this as a comment, but it got too long and I figured it was worth an answer.
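A hedged sketch of that repeat-a-search-per-node idea, assuming the tree is given as an adjacency list (illustrative names): each BFS is O(n) on a tree, and doing one per source gives the O(n^2) all-pairs table.

```python
from collections import deque

def all_pairs_distances(adj):
    """adj: dict vertex -> list of neighbours, describing a tree.
    Returns dist[u][v] = number of edges on the unique u-v path."""
    dist = {}
    for source in adj:
        d = {source: 0}
        queue = deque([source])
        while queue:                     # one BFS is O(n) on a tree (m = n-1)
            u = queue.popleft()
            for w in adj[u]:
                if w not in d:
                    d[w] = d[u] + 1
                    queue.append(w)
        dist[source] = d
    return dist

adj = {0: [1, 2], 1: [0, 3, 4], 2: [0], 3: [1], 4: [1]}
print(all_pairs_distances(adj)[3][4])    # 2 edges: 3-1-4
```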
One simple option is to do a depth-first search starting at one of the nodes until you reach the other node. You can then look at the path that was taken from the first node to the second, which must be the unique path between those nodes, and then count how many edges are in that path. This runs in linear time because a DFS on a tree only takes linear time.
Hope this helps!
You need to find the Lowest Common Ancestor (LCA). There are different approaches as you can study here: http://leetcode.com/2011/07/lowest-common-ancestor-of-a-binary-tree-part-i.html
I have used the following solution, since I'm not a fan of recursion; it should also work well for your problem, since you need to find all the pairs in an efficient manner.
1) Finding the path A -> B:
- Iterate starting from node A, going up through each parent node and flagging each one with an A flag, until the root is reached.
- Iterate starting from node B, going up until an A flag is found. You have found the LCA node.
- The resulting path is the list from A to the LCA node, plus the reversed list from B to the LCA node.
2) Improvement for finding A -> B:
- Iterate from both nodes simultaneously, flagging each ancestor with an A flag or a B flag respectively, until the A iteration finds a B flag or the B iteration finds an A flag. The first iteration to find the other's flag has found the LCA node (see the sketch below).
3) Finding the paths for all pairs:
You can simply use the solution above for each pair.
Or consider making a first pass iterating over the whole tree to create the lists of flags, then a second pass to identify the LCA node for each pair; that becomes inconvenient when the number of nodes in the tree grows too large.
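A minimal sketch of the one-sided walk-up-and-flag idea from step 1, assuming each node carries a parent pointer; instead of a literal flag it records the distance from A to each ancestor, so the edge count between A and B falls out directly. All names are illustrative.

```python
class Node:
    def __init__(self, parent=None):
        self.parent = parent

def edges_between(a, b):
    """Number of edges on the unique path between nodes a and b."""
    # Step 1: walk from a up to the root, "flagging" every ancestor
    # with its distance from a.
    dist_from_a = {}
    steps, node = 0, a
    while node is not None:
        dist_from_a[node] = steps
        steps += 1
        node = node.parent
    # Step 2: walk from b upward until we hit a flagged node -- the LCA.
    steps, node = 0, b
    while node not in dist_from_a:
        steps += 1
        node = node.parent
    return dist_from_a[node] + steps

root = Node(); x = Node(root); y = Node(root); z = Node(x)
print(edges_between(z, y))   # 3 edges: z-x, x-root, root-y
```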

Determining if a tree walk is breadth first, depth first, or neither

Given a tree T and a sequence of nodes S, with the only constraint on S being that it was produced by some type of recursive visit - that is, a node can only appear in S if all of its ancestors have already appeared - what's a good algorithm to determine whether S is a breadth-first visit, a depth-first visit, or neither?
A brute force approach is to compute every breadth first and depth first sequences and see if any is identical to S. Is there a better approach?
What if we don't want a yes or no answer, but a measure of distance?
UPDATE 1 By measure of distance, I mean that a visit may not be an exact BFS, but it's close (a few edits might make it one); I'd like to be able to order them and say BFS < S < R < U < DFS.
UPDATE 2 Of course, a brute force enumeration of every BFS or DFS can answer the question; I'd like something more efficient.
You have the tree and the sequence, right? In that case it is pretty easy to determine if a sequence is breadth first search or not, and if it is depth first or not.
To check if it is breadth-first: divide the nodes into groups L0, L1, ..., Lk, where L0 is the set of level-0 nodes (there is only one root node, so its size is 1), L1 is the set of level-1 nodes, and so on. If the sequence S = (permutation(L0), permutation(L1), ...) then it is a breadth-first visit.
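A hedged sketch of that level check, assuming the tree is given as a dict of children lists (illustrative names): label every node with its level, then require that the levels appear in non-decreasing order in the sequence.

```python
def is_bfs_order(children, root, seq):
    """children: dict mapping a node to the list of its children.
    True if seq could be produced by some breadth-first visit."""
    # Label every node with its level by walking down from the root.
    level, frontier, d = {}, [root], 0
    while frontier:
        for v in frontier:
            level[v] = d
        frontier = [c for v in frontier for c in children.get(v, [])]
        d += 1
    if len(seq) != len(level) or set(seq) != set(level):
        return False                       # must visit every node exactly once
    # Levels must come in non-decreasing order: all of L0, then L1, then L2, ...
    seq_levels = [level[v] for v in seq]
    return seq_levels == sorted(seq_levels)

children = {1: [2, 3], 2: [4, 5]}
print(is_bfs_order(children, 1, [1, 3, 2, 5, 4]))   # True
print(is_bfs_order(children, 1, [1, 2, 4, 5, 3]))   # False (it is a DFS order)
```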
To check if it is depth-first: start with a pointer to the first node in the sequence and to the root node of the tree; they should be the same. The next element of the sequence must be a child of the previous node, if the previous node has any unvisited children at all. If there is a conflict then it is not a DFS sequence. If there is no such child, then the next sequence element must be an unvisited child of the parent of the previous node, and so on. This approach is not as complicated as it sounds and can easily be implemented with the help of a stack.
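A matching sketch of the stack-based depth-first check, under the same assumptions as above (a dict of children lists, illustrative names): the stack holds the current root-to-node path, and each next element must be an unvisited child of the deepest node on that path that still has unvisited children.

```python
def is_dfs_order(children, root, seq):
    """True if seq could be produced by some depth-first visit
    (children may be explored in any order)."""
    # Set of still-unvisited children for every node (leaves get an empty set).
    remaining = {v: set(cs) for v, cs in children.items()}
    for cs in children.values():
        for c in cs:
            remaining.setdefault(c, set())
    remaining.setdefault(root, set())

    if not seq or seq[0] != root:
        return False
    stack = [root]                         # current root-to-node path
    for v in seq[1:]:
        # Backtrack past ancestors whose children have all been visited.
        while stack and not remaining[stack[-1]]:
            stack.pop()
        # v must be an unvisited child of the deepest remaining ancestor.
        if not stack or v not in remaining[stack[-1]]:
            return False
        remaining[stack[-1]].remove(v)
        stack.append(v)
    return len(seq) == len(remaining)      # every node visited exactly once

children = {1: [2, 3], 2: [4, 5]}
print(is_dfs_order(children, 1, [1, 2, 4, 5, 3]))   # True
print(is_dfs_order(children, 1, [1, 2, 3, 4, 5]))   # False: after 2 comes 4 or 5
```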
I am not very sure about your need for a "measure of distance", but as you can see, both of these approaches can report the number of conflicts. Maybe you can use that to calculate a "distance"?

Why is in-order traversal of a threaded tree O(N)?

I can't seem to figure out how the in-order traversal of a threaded binary tree is O(N).
You have to descend the links to find the leftmost child and then go back along the thread when you want to add the parent to the traversal path. Wouldn't that be O(N^2)?
Thanks!
The traversal of a tree (threaded or not) is O(N) because visiting any node, starting from its parent, is O(1). The visitation of a node consists of three fixed operations: descending to the node from its parent, the visitation proper (spending time at the node), and then returning to the parent. O(1 * N) is O(N).
The ultimate way to look at it is that the tree is a graph, and the traversal crosses each edge in the graph only twice. And the number of edges is proportional to the number of nodes since there are no cycles or redundant edges (each node can be reached by one unique path). A tree with N nodes has exactly N-1 edges: each node has an edge leading to it from its parent node, except for the root node of the tree.
At times it appears as if visiting a node requires more than one descent. For instance, after visiting the rightmost node in a subtree, we have to pop back up numerous levels before we can march to the right into the next subtree. But we did not descend all the way down just to visit that node. Each one-level descent can be accounted for as being necessary for visiting just the node immediately below, and the opposite ascent's cost is lumped with that. By visiting a node V, we also gain access to all the nodes below it, but all those nodes benefit from and share the edge traversal from V's parent down to V, and back up again.
This is related to amortized analysis, which applies in situations where we can globally understand the overall cost based on some general observation about the structure of the problem, but at the detailed level of the individual operations, the costs are distributed in an uneven way that appears confusing.
Amortized analysis helps us understand that, for instance, N insertions into a hash table which resizes itself by growing exponentially are O(N). Most of the insertion operations are quick, but from time to time, we grow the table and process its contents. This is similar to how, from time to time during a tree traversal, we have to perform numerous consecutive ascents to climb out of a deep subtree.
The global observation about the hash table is that each item inserted into the table will move to a larger table on average about three times in three resize operations, and so each insertion can be regarded as "pre-paying" for three re-insertions, which is a fixed cost. Of course, "older" items will be moved more times, but this is offset by "younger" entries that move fewer times, diluting the cost. And the global observation about the tree was already noted above: it has N-1 edges, each of which is traversed exactly twice during the traversal, so the visitation of each node "pays" for the double traversal of its respective edge. Because this is so easy to see, we don't actually have to formally apply amortized analysis to tree traversal.
Now suppose we performed an individual search for each node (and the tree is a balanced search tree). Then the traversal would still not be O(N*N), but rather O(N log N). Suppose we have an ordered search tree which holds consecutive integers. If we iterate over the integers and perform an individual search for each value, then each search is O(log N), and we end up doing N of these. In this situation, the edge traversals are no longer shared, so amortization does not apply. To reach some given node that we are searching for, which is found at depth D, we have to cross D edges twice, for the sake of that node and that node alone. The next search in the loop, for another integer, will be completely independent of the previous one.
It may also help you to think of a linked list, which can be regarded as a very unbalanced tree. To visit all the items in a linked list of length N and return back to the head node is obviously O(N). Searching for each item individually is O(N*N), but in a traversal, we are not searching for each node individually, but using each predecessor as a springboard into finding the next node.
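Here is a hedged sketch of an in-order walk over a right-threaded binary tree, with a tiny hand-threaded example (the field names are made up). The point is that every pointer, whether a child edge or a thread, is followed only a bounded number of times, so the whole walk is O(N).

```python
class TNode:
    """Node of a right-threaded binary tree: when right_is_thread is True,
    .right points to the in-order successor instead of a child."""
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.right_is_thread = False

def leftmost(node):
    while node.left is not None:
        node = node.left
    return node

def inorder(root):
    node = leftmost(root) if root else None
    while node is not None:
        yield node.key
        if node.right_is_thread:
            node = node.right               # O(1) hop along the thread
        elif node.right is not None:
            node = leftmost(node.right)     # each descent is paid for once per edge
        else:
            node = None                     # rightmost node: traversal is done

# Hand-threaded example:   2
#                         / \
#                        1   3      (thread: 1 -> 2; 3 has no successor)
a, b, c = TNode(1), TNode(2), TNode(3)
b.left, b.right = a, c
a.right, a.right_is_thread = b, True
print(list(inorder(b)))    # [1, 2, 3]
```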
There is no loop needed to find the parent. Put differently, you go through each arc between two nodes exactly twice. That is 2 * (number of arcs) = 2 * (number of nodes - 1), which is O(N).

Resources