What is the space requirement of many trees? - data-structures

I was asked the space requirements of my project and I wasn't sure about the answer, so I am asking here. Here is what I do:
I am building a number of perfect binary trees (let's say m).
Every leaf indexes one point (i.e. that we keep the data stored and
every tree indexes the points that are stored there).
My thought is that the space requirement is: O(n*d + m), where n is the number of points, d is the dimension of a point and m is the number of the trees, but I am telling this by experience, not sure if I fully understand it!
Can anyone help?
To be honest, every leaf contains a number of points, p, but I think that I will be able to work out the result, if I will get an answer to my question above.

In a perfect binary tree with n leaves, the total number of nodes is 2n - 1, which is O(n). More generally, if you have a collection of perfect binary trees with n total leaves, the total number of nodes will be 2n - 1. Therefore, if each of the n leaf nodes stores a d-dimensional point, the total space usage is O(nd).
The number of trees m here actually doesn't need to show up in the big-O space analysis. Think of it this way: if you have m trees, assuming each is nonempty, then you have to have at least n leaf nodes (at least one per tree), so we know that m = O(n). Therefore, even if you do account for the space overhead per tree as O(m), the total space usage of O(nd + m) is equivalent to O(nd).
Hope this helps!

Related

range search complexity of R tree and R* tree

What is the range search complexity for R tree and R* Tree? I understand the process of range search: similar to a DFS search, it visits each node and if a node's bounding box intersects the target range, then include the node in the result set. More precisely, we also need to consider the branch-and-bound strategy it uses: if a parent node doesn't intersect with the target, then we don't visit its children nodes. Then the complexity should be smaller than O(n), where n is the number of nodes. I really don't know how to calculate the number of nodes given the number of leaves(or data points).
Could anybody give me an explanation here? Thank you.
Obviously, the worst case must be at least O(n) if your range is [-∞;∞] in every dimension. It may be as bad as O(n log n) then because of the tree.
Assuming the answer is a single entry, the average case probably is O(log n) - only few paths through the tree need to be followed (if you have little enough overlap).
It is log to the base of your page size. So it will usually not exceed 5, because you never want trees with more than say 1000^5=10^15 objects.
For all practical purposes, assume the runtime complexity is simply the answer set size O(s). Select 2% of your data it takes twice as long as 1%.

Why do Fibonacci heaps need cascading cuts?

I'm learning f-heap from 'introduction to algorithms', and the 'decrease-key' operation really confuses me -- why does this need a 'cascading-cut' ?
if this operation is removed:
the cost of make-heap(), insert(), minimum() and union() remains obviously unchanged
extract-min() is still O(D(n)), because in the O(n(H)) 'consolidate' operation, the cost of most rooted trees can be payed when they were added to the root list, and the remaining costs O(D(n))
decrease-key(): obviously O(1)
As for D(n), though i can't explain it precisely, i think it is still O(lgn), cuz without 'cascading-cut', a node may just be moved to root list a bit later, and any node hides under its father does not affect the efficiency. At least, this will not make the situation worse.
apologize for my poor english :(
anyone can help?
thanks
The reason for the cascading cut is to keep D(n) low. It turns out that if you allow any number of nodes to be cut from a tree, then D(n) can grow to be linear, which makes delete and delete-min take linear time.
Intuitively, you want the number of nodes in a tree of order k to be exponential in k. That way, you can have only logarithmically many trees in a consolidated heap. If you can cut an arbitrary number of nodes from a tree, you lose this guarantee. Specifically, you could take a tree of order k, then cut all of its grandchildren. This leaves a tree with k children, each of which are leaves. Consequently, you can create trees of order k with just k + 1 total nodes in them. This means that in the worst case you would need a tree of order n - 1 to hold all the nodes, so D(n) becomes n - 1 rather than O(log n).
Hope this helps!

Why lookup in a Binary Search Tree is O(log(n))?

I can see how, when looking up a value in a BST we leave half the tree everytime we compare a node with the value we are looking for.
However I fail to see why the time complexity is O(log(n)). So, my question is:
If we have a tree of N elements, why the time complexity of looking up the tree and check if a particular value exists is O(log(n)), how do we get that?
Your question seems to be well answered here but to summarise in relation to your specific question it might be better to think of it in reverse; "what happens to the BST solution time as the number of nodes goes up"?
Essentially, in a BST every time you double the number of nodes you only increase the number of steps to solution by one. To extend this, four times the nodes gives two extra steps. Eight times the nodes gives three extra steps. Sixteen times the nodes gives four extra steps. And so on.
The base 2 log of the first number in these pairs is the second number in these pairs. It's base 2 log because this is a binary search (you halve the problem space each step).
For me the easiest way was to look at a graph of log2(n), where n is the number of nodes in the binary tree. As a table this looks like:
log2(n) = d
log2(1) = 0
log2(2) = 1
log2(4) = 2
log2(8) = 3
log2(16)= 4
log2(32)= 5
log2(64)= 6
and then I draw a little binary tree, this one goes from depth d=0 to d=3:
d=0 O
/ \
d=1 R B
/\ /\
d=2 R B R B
/\ /\ /\ /\
d=3 R B RB RB R B
So as the number of nodes, n, in the tree effectively doubles (e.g. n increases by 8 as it goes from 7 to 15 (which is almost a doubling) when the depth d goes from d=2 to d=3, increasing by 1.) So the additional amount of processing required (or time required) increases by only 1 additional computation (or iteration), because the amount of processing is related to d.
We can see that we go down only 1 additional level of depth d, from d=2 to d=3, to find the node we want out of all the nodes n, after doubling the number of nodes. This is true because we've now searched the whole tree, well, the half of it that we needed to search to find the node we wanted.
We can write this as d = log2(n), where d tells us how much computation (how many iterations) we need to do (on average) to reach any node in the tree, when there are n nodes in the tree.
This can be shown mathematically very easily.
Before I present that, let me clarify something. The complexity of lookup or find in a balanced binary search tree is O(log(n)). For a binary search tree in general, it is O(n). I'll show both below.
In a balanced binary search tree, in the worst case, the value I am looking for is in the leaf of the tree. I'll basically traverse from root to the leaf, by looking at each layer of the tree only once -due to the ordered structure of BSTs. Therefore, the number of searches I need to do is number of layers of the tree. Hence the problem boils down to finding a closed-form expression for the number of layers of a tree with n nodes.
This is where we'll do a simple induction. A tree with only 1 layer has only 1 node. A tree of 2 layers has 1+2 nodes. 3 layers 1+2+4 nodes etc. The pattern is clear: A tree with k layers has exactly
n=2^0+2^1+...+2^{k-1}
nodes. This is a geometric series, which implies
n=2^k-1,
equivalently:
k = log(n+1)
We know that big-oh is interested in large values of n, hence constants are irrelevant. Hence the O(log(n)) complexity.
I'll give another -much shorter- way to show the same result. Since while looking for a value we constantly split the tree into two halves, and we have to do this k times, where k is number of layers, the following is true:
(n+1)/2^k = 1,
which implies the exact same result. You have to convince yourself about where that +1 in n+1 is coming from, but it is okay even if you don't pay attention to it, since we are talking about large values of n.
Now let's discuss the general binary search tree. In the worst case, it is perfectly unbalanced, meaning all of its nodes has only one child (and it becomes a linked list) See e.g. https://www.cs.auckland.ac.nz/~jmor159/PLDS210/niemann/s_fig33.gif
In this case, to find the value in the leaf, I need to iterate on all nodes, hence O(n).
A final note is that these complexities hold true for not only find, but also insert and delete operations.
(I'll edit my equations with better-looking Latex math styling when I reach 10 rep points. SO won't let me right now.)
Whenever you see a runtime that has an O(log n) factor in it, there's a very good chance that you're looking at something of the form "keep dividing the size of some object by a constant." So probably the best way to think about this question is - as you're doing lookups in a binary search tree, what exactly is it that's getting cut down by a constant factor, and what exactly is that constant?
For starters, let's imagine that you have a perfectly balanced binary tree, something that looks like this:
*
/ \
* *
/ \ / \
* * * *
/ \ / \ / \ / \
* * * * * * * *
At each point in doing the search, you look at the current node. If it's the one you're looking for, great! You're totally done. On the other hand, if it isn't, then you either descend into the left subtree or the right subtree and then repeat this process.
If you walk into one of the two subtrees, you're essentially saying "I don't care at all about what's in that other subtree." You're throwing all the nodes in it away. And how many nodes are in there? Well, with a quick visual inspection - ideally one followed up with some nice math - you'll see that you're tossing out about half the nodes in the tree.
This means that at each step in a lookup, you either (1) find the node that you're looking for, or (2) toss out half the nodes in the tree. Since you're doing a constant amount of work at each step, you're looking at the hallmark behavior of O(log n) behavior - the work drops by a constant factor at each step, and so it can only do so logarithmically many times.
Now of course, not all trees look like this. AVL trees have the fun property that each time you descend down into a subtree, you throw away roughly a golden ratio fraction of the total nodes. This therefore guarantees you can only take logarithmically many steps before you run out of nodes - hence the O(log n) height. In a red/black tree, each step throws away (roughly) a quarter of the total nodes, and since you're shrinking by a constant factor you again get the O(log n) lookup time you'd like. The very fun scapegoat tree has a tuneable parameter that's used to determine how tightly balanced it is, but again you can show that every step you take throws away some constant factor based on this tuneable parameter, giving O(log n) lookups.
However, this analysis breaks down for imbalanced trees. If you have a purely degenerate tree - one where every node has exactly one child - then every step down the tree that you take only tosses away a single node, not a constant fraction. That means that the lookup time gets up to O(n) in the worst case, since the number of times you can subtract a constant from n is O(n).
If we have a tree of N elements, why the time complexity of looking up
the tree and check if a particular value exists is O(log(n)), how do
we get that?
That's not true. By default, a lookup in a Binary Search Tree is not O(log(n)), where n is a number of nodes. In the worst case, it can become O(n). For instance, if we insert values of the following sequence n, n - 1, ..., 1 (in the same order), then the tree will be represented as below:
n
/
n - 1
/
n - 2
/
...
1
A lookup for a node with value 1 has O(n) time complexity.
To make a lookup more efficient, the tree must be balanced so that its maximum height is proportional to log(n). In such case, the time complexity of lookup is O(log(n)) because finding any leaf is bounded by log(n) operations.
But again, not every Binary Search Tree is a Balanced Binary Search Tree. You must balance it to guarantee the O(log(n)) time complexity.

Time complexity, binary (search) tree

assume I have a complete binary tree up-to a certain depth d. What would the time complexity be to traverse (pre-order traversal) this tree.
I am confused because I know that the amount of nodes in the tree is 2^d, so therefore the time complexity would be BigO(2^d) ? because the tree is growing exponentially.
But, upon research on the internet, Everyone states that's traversal is BigO(n) where n is the number of elements (which would be 2^d in this case), not BigO(2^d), what am I missing?
thanks
n is defined as the number of nodes.
2^d is only the number of nodes when every possible node at that depth is full
ie.
o
/ \
o o
/ \
o o
only has 5 nodes when 2^d is 8
A complete binary tree has every node filled except for last row and all of the nodes are filled to the left. You can find the definition on wikipedia
http://en.wikipedia.org/wiki/Binary_tree#Types_of_binary_trees
Even if you can express the time complexity as O(2^d), that's pretty useless as it's not something that you can use to compare it to the time complexity of any other collection.
Expressing the time complexity as O(n) is on the other hand very useful. It tells you exactly how the collection reacts when you increase the number of items, without having to know exactly how the collection is implement, and you can compare it to the time complexity of other collections.

Check if 2 tree nodes are related (ancestor/descendant) in O(1) with pre-processing

Check if 2 tree nodes are related (i.e. ancestor-descendant)
solve it in O(1) time, with O(N) space (N = # of nodes)
pre-processing is allowed
That's it. I'll be going to my solution (approach) below. Please stop if you want to think yourself first.
For a pre-processing I decided to do a pre-order (recursively go through the root first, then children) and give a label to each node.
Let me explain the labels in details. Each label will consist of comma-separated natural numbers like "1,2,1,4,5" - the length of this sequence equals to (the depth of the node + 1). E.g. the label of the root is "1", root's children will have labels "1,1", "1,2", "1,3" etc.. Next-level nodes will have labels like "1,1,1", "1,1,2", ..., "1,2,1", "1,2,2", ...
Assume that "the order number" of a node is the "1-based index of this node" in the children list of its parent.
Common rule: node's label consists of its parent label followed by comma and "the order number" of the node.
Thus, to answer if two nodes are related (i.e. ancestor-descendant) in O(1), I'll be checking if the label of one of them is "a prefix" of the other's label. Though I'm not sure if such labels can be considered to occupy O(N) space.
Any critics with fixes or an alternative approach is expected.
You can do it in O(n) preprocessing time, and O(n) space, with O(1) query time, if you store the preorder number and postorder number for each vertex and use this fact:
For two given nodes x and y of a tree T, x is an ancestor of y if and
only if x occurs before y in the preorder traversal of T and after y
in the post-order traversal.
(From this page: http://www.cs.arizona.edu/xiss/numbering.htm)
What you did in the worst case is Theta(d) where d is the depth of the higher node, and so is not O(1). Space is also not O(n).
if you consider a tree where a node in the tree has n/2 children (say), the running time of setting the labels will be as high as O(n*n). So this labeling scheme wont work ....
There are linear time lowest common ancestor algorithms(at least off-line). For instance have a look here. You can also have a look at tarjan's offline LCA algorithm. Please note that these articles require that you know the pairs for which you will be performing the LCA in advance. I think there are also online linear time precomputation time algorithms but they are very complex. For instance there is a linear precomputation time algorithm for the range minimum query problem. As far as I remember this solution passed through the LCA problem twice . The problem with the algorithm is that it had such a large constant that it require enormous input to be actually faster then the O(n*log(n)) algorithm.
There is much simpler approach that requires O(n*log(n)) additional memory and again answers in constant time.
Hope this helps.

Resources