range search complexity of R tree and R* tree - data-structures

What is the range search complexity for the R-tree and R*-tree? I understand the process of range search: similar to DFS, it visits each node, and if a node's bounding box intersects the target range, the node is included in the result set. More precisely, we also need to consider the branch-and-bound strategy it uses: if a parent node doesn't intersect the target, we don't visit its children. Then the complexity should be smaller than O(n), where n is the number of nodes. I really don't know how to calculate the number of nodes given the number of leaves (or data points).
Could anybody give me an explanation here? Thank you.

Obviously, the worst case must be at least O(n) if your range is [-∞, ∞] in every dimension. Because of the tree overhead, it may then be as bad as O(n log n).
Assuming the answer is a single entry, the average case is probably O(log n): only a few paths through the tree need to be followed (if you have little enough overlap).
The log is to the base of your page size, so it will usually not exceed 5, because you rarely want trees with more than, say, 1000^5 = 10^15 objects.
For all practical purposes, assume the runtime complexity is simply the answer set size, O(s): selecting 2% of your data takes about twice as long as selecting 1%.
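The branch-and-bound pruning the question describes can be sketched in a few lines. This is a toy 2-D structure for illustration only (the Node layout and names are made up, not any real R-tree library's API):

```python
# A minimal sketch of branch-and-bound range search over a hierarchy of
# bounding boxes. Boxes are ((xmin, ymin), (xmax, ymax)) tuples.

def intersects(a, b):
    """Axis-aligned box intersection test."""
    (ax1, ay1), (ax2, ay2) = a
    (bx1, by1), (bx2, by2) = b
    return ax1 <= bx2 and bx1 <= ax2 and ay1 <= by2 and by1 <= ay2

class Node:
    def __init__(self, box, children=None, points=None):
        self.box = box                # bounding box of this subtree
        self.children = children or []
        self.points = points or []    # non-empty only at leaves

def range_search(node, query):
    """Collect all points inside `query`, skipping whole subtrees
    whose bounding box does not intersect it."""
    if not intersects(node.box, query):
        return []                     # prune: no descendant can match
    if not node.children:             # leaf: filter its points
        return [p for p in node.points
                if query[0][0] <= p[0] <= query[1][0]
                and query[0][1] <= p[1] <= query[1][1]]
    result = []
    for child in node.children:
        result.extend(range_search(child, query))
    return result
```

Note that a query covering the whole space prunes nothing and visits every node, which matches the O(n)-and-worse worst-case bound above.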

Related

Ternary tree time complexity

I have an assignment to explain the time complexity of a ternary tree, and I find the info on the subject on the internet a bit contradictory, so I was hoping I could ask here to get a better understanding.
So, with each search in the tree, we move to the left or right child a logarithmic number of times, log3(n), with n being the number of strings in the tree, correct? And no matter what, we would also have to traverse down the middle child L times, where L is the length of the prefix we are searching.
Does the running time then come out to O(log3(n) + L)? I see many people simply saying that it runs in logarithmic time, but does linear time not grow faster, and hence dominate?
Hope I'm making sense, thanks for any answers on the subject!
If the tree is balanced, then yes, any search that needs to visit only one child per iteration will run in logarithmic time.
Notice that O(log_3(n)) = O(ln(n) / ln(3)) = O(c * ln(n)) = O(ln(n)),
so the base of the logarithm does not matter. We say logarithmic time, O(log n).
Notice also that a balanced tree has a height of O(log(n)), where n is the number of nodes. So it looks like your L describes the height of the tree and is therefore also O(log n), so not linear w.r.t. n.
Does this answer your questions?
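For reference, the two kinds of moves can be made explicit in a minimal ternary search tree sketch (illustrative code, not any library's API): left/right comparisons do not consume a character of the key, while middle moves consume exactly one each, which is where the O(log n + L) count comes from.

```python
# A minimal ternary search tree: each node holds one character and
# three children (less-than, equal/next-char, greater-than).

class TSTNode:
    def __init__(self, ch):
        self.ch = ch
        self.left = self.mid = self.right = None
        self.is_end = False           # marks the end of a stored string

def insert(node, key, i=0):
    ch = key[i]
    if node is None:
        node = TSTNode(ch)
    if ch < node.ch:
        node.left = insert(node.left, key, i)
    elif ch > node.ch:
        node.right = insert(node.right, key, i)
    elif i + 1 < len(key):
        node.mid = insert(node.mid, key, i + 1)
    else:
        node.is_end = True
    return node

def contains(node, key, i=0):
    if node is None:
        return False
    ch = key[i]
    if ch < node.ch:
        return contains(node.left, key, i)     # no character consumed
    if ch > node.ch:
        return contains(node.right, key, i)    # no character consumed
    if i + 1 < len(key):
        return contains(node.mid, key, i + 1)  # one character consumed
    return node.is_end
```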

Complete binary tree time complexity

Suppose someone wants to generate a complete binary tree. This tree has h levels, where h can be any positive integer and is an input to the algorithm. What complexity class will the generation lie in, and why?
A complete binary tree is a tree where all levels are full of nodes except possibly the last level; we can define the time complexity in terms of an upper bound.
If we know the height of the tree is h, then the maximum possible number of nodes in the tree is 2^h - 1.
Therefore, the time complexity is O(2^h - 1) = O(2^h).
To sell your algorithm in the market, you need tight upper bounds to prove that your algorithm is better than the others'.
A slightly tighter upper bound for this problem can be given once we know exactly how many nodes are in the tree. Let's say there are N.
Then the time complexity is O(N).
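The generation itself can be sketched with a hypothetical recursive builder (names are illustrative). For the full-levels case it allocates each of the 2^h - 1 nodes exactly once, so the running time is proportional to the node count:

```python
# Build a binary tree with all h levels full, creating every node once.

class Node:
    def __init__(self):
        self.left = self.right = None

def build(h):
    """Return the root of a tree with h full levels (None when h == 0)."""
    if h == 0:
        return None
    node = Node()
    node.left = build(h - 1)   # each subtree has one level fewer
    node.right = build(h - 1)
    return node

def count(node):
    """Count nodes; build(h) yields 2^h - 1 of them."""
    return 0 if node is None else 1 + count(node.left) + count(node.right)
```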

Confusion related to the time complexity of an algorithm

I was going through this algorithm https://codereview.stackexchange.com/questions/63921/print-all-nodes-from-root-to-leaves
In one of the comments it is mentioned that printing the paths from the root to the leaves itself has an average time complexity of O(n log n). I am not quite sure how they came up with that. Any clarification will be much appreciated.
I think this is what they mean:
In the best case, the tree is perfectly balanced and contains N nodes, with log(N) + 1 levels. The tree then has about N/2 leaves.
Every time we move to a lower level, we duplicate the currently accumulated path. If you count copying an array of length k as an O(k) operation, then moving from the second-to-last level to a leaf costs O(log(N)). As there are about N/2 leaves, and for each we do an O(log(N)) operation, you get O(N log(N)).
Instead of duplicating arrays, the function could recursively pass the same array along with the current level number, making sure that the path is printed only up to the level of the leaf.
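That last suggestion, reusing one path buffer instead of copying it at every level, can be sketched as follows (names are illustrative; collecting results still copies once per leaf, but printing in place of collecting would avoid even that):

```python
# Enumerate all root-to-leaf paths with one shared path buffer,
# using append/pop backtracking instead of per-level duplication.

class Node:
    def __init__(self, val, left=None, right=None):
        self.val, self.left, self.right = val, left, right

def root_to_leaf_paths(node, path=None, out=None):
    if path is None:
        path, out = [], []
    if node is None:
        return out
    path.append(node.val)
    if node.left is None and node.right is None:
        out.append(list(path))       # copy only when emitting a result
    else:
        root_to_leaf_paths(node.left, path, out)
        root_to_leaf_paths(node.right, path, out)
    path.pop()                       # backtrack: undo this level's append
    return out
```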

What is the space requirement of many trees?

I was asked the space requirements of my project and I wasn't sure about the answer, so I am asking here. Here is what I do:
I am building a number of perfect binary trees (let's say m).
Every leaf indexes one point (i.e. we keep the data stored, and every tree indexes the points that are stored there).
My thought is that the space requirement is O(n*d + m), where n is the number of points, d is the dimension of a point, and m is the number of trees, but I am saying this from experience and am not sure I fully understand it!
Can anyone help?
To be honest, every leaf contains a number of points, p, but I think I will be able to work out that case if I get an answer to my question above.
In a perfect binary tree with n leaves, the total number of nodes is 2n - 1, which is O(n). More generally, if you have a collection of m perfect binary trees with n total leaves, the total number of nodes is 2n - m, which is still O(n). Therefore, if each of the n leaf nodes stores a d-dimensional point, the total space usage is O(nd).
The number of trees m here actually doesn't need to show up in the big-O space analysis. Think of it this way: if you have m trees, assuming each is nonempty, then you have to have at least n leaf nodes (at least one per tree), so we know that m = O(n). Therefore, even if you do account for the space overhead per tree as O(m), the total space usage of O(nd + m) is equivalent to O(nd).
Hope this helps!
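As a quick numeric sanity check of these counts: a perfect binary tree with l leaves has 2l - 1 nodes, so a forest of m such trees with n total leaves has 2n - m nodes, which is O(n) since m <= n. A small sketch (the helper name is made up):

```python
# Count the nodes of one perfect binary tree by summing level sizes:
# the leaf level has `leaves` nodes, each level above has half as many.

def perfect_tree_nodes(leaves):
    nodes, level = 0, leaves
    while level >= 1:
        nodes += level
        level //= 2
    return nodes

# A forest of m = 3 perfect trees with n = 4 + 8 + 2 = 14 total leaves
# has 2*14 - 3 = 25 nodes in total.
forest_leaves = [4, 8, 2]
total = sum(perfect_tree_nodes(l) for l in forest_leaves)
```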

Check if 2 tree nodes are related (ancestor/descendant) in O(1) with pre-processing

Check if 2 tree nodes are related (i.e. ancestor-descendant)
solve it in O(1) time, with O(N) space (N = # of nodes)
pre-processing is allowed
That's it. My solution (approach) follows below. Please stop here if you want to think it through yourself first.
For a pre-processing I decided to do a pre-order (recursively go through the root first, then children) and give a label to each node.
Let me explain the labels in detail. Each label consists of comma-separated natural numbers like "1,2,1,4,5"; the length of this sequence equals the node's depth + 1. E.g. the label of the root is "1", and the root's children have labels "1,1", "1,2", "1,3", etc. Next-level nodes have labels like "1,1,1", "1,1,2", ..., "1,2,1", "1,2,2", ...
Assume that "the order number" of a node is its 1-based index in the children list of its parent.
Common rule: node's label consists of its parent label followed by comma and "the order number" of the node.
Thus, to answer if two nodes are related (i.e. ancestor-descendant) in O(1), I'll be checking if the label of one of them is "a prefix" of the other's label. Though I'm not sure if such labels can be considered to occupy O(N) space.
Any criticism with fixes, or an alternative approach, is welcome.
You can do it in O(n) preprocessing time, and O(n) space, with O(1) query time, if you store the preorder number and postorder number for each vertex and use this fact:
For two given nodes x and y of a tree T, x is an ancestor of y if and
only if x occurs before y in the preorder traversal of T and after y
in the post-order traversal.
(From this page: http://www.cs.arizona.edu/xiss/numbering.htm)
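That numbering scheme can be sketched directly (the tree is given here as a hypothetical child-list dict; one DFS assigns both numbers, after which each query is just two comparisons):

```python
# O(n) preprocessing: assign each node its preorder and postorder index
# in a single iterative DFS. O(1) query: x is an ancestor of y iff
# pre[x] < pre[y] and post[x] > post[y].

def number_tree(children, root):
    """Return (pre, post) index dicts for the tree rooted at `root`."""
    pre, post = {}, {}
    pre_ctr = post_ctr = 0
    stack = [(root, False)]           # (node, children_already_expanded)
    while stack:
        node, expanded = stack.pop()
        if expanded:                  # all descendants done: postorder visit
            post[node] = post_ctr
            post_ctr += 1
            continue
        pre[node] = pre_ctr           # first visit: preorder number
        pre_ctr += 1
        stack.append((node, True))
        for child in reversed(children.get(node, [])):
            stack.append((child, False))
    return pre, post

def is_ancestor(pre, post, x, y):
    """O(1) relatedness query after preprocessing."""
    return pre[x] < pre[y] and post[x] > post[y]
```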
What you did takes Theta(d) per query in the worst case, where d is the depth of the higher node, so it is not O(1). The space is also not O(N).
If you consider a degenerate tree, e.g. a chain where each node has one child, the labels grow to length Theta(n), so setting (and storing) the labels costs as much as O(n*n). So this labeling scheme won't work.
There are linear-time lowest common ancestor algorithms (at least offline). For instance, have a look here. You can also have a look at Tarjan's offline LCA algorithm. Please note that these articles require that you know in advance the pairs for which you will be performing the LCA. I think there are also algorithms with linear online precomputation time, but they are very complex. For instance, there is a linear-precomputation-time algorithm for the range minimum query problem; as far as I remember, that solution passed through the LCA problem twice. The problem with that algorithm is that it has such a large constant that it requires enormous input to actually be faster than the O(n log(n)) algorithm.
There is a much simpler approach that requires O(n log(n)) additional memory and again answers in constant time.
Hope this helps.
