Exponential Search vs Binary Search - algorithm

Does a binary search beat an exponential search in any way, except in space complexity?

Both these algorithms search for a value in an ordered list of elements, but they address different issues. Exponential search is explicitly designed for unbounded lists whereas binary search deals with bounded lists.
The idea behind exponential search is very simple: Search for a bound, and then perform a binary search.
Example
Let's take an example. A = [1, 3, 7, 8, 10, 11, 12, 15, 19, 21, 22, 23, 29, 31, 37]. This list can be seen as a binary tree (although there is no need to build the tree):
               15
          ____/  \____
         /            \
      __8__          _23__
     /     \        /     \
    3      11     21      31
   / \    /  \   /  \    /  \
  1   7  10  12 19  22  29  37
Binary search
A binary search for e = 27 (for example) will undergo the following steps
b0) Let T, R be the tree and its root respectively
               15 (R)
          ____/  \____
         /            \
      __8__          _23__
     /     \        /     \
    3      11     21      31
   / \    /  \   /  \    /  \
  1   7  10  12 19  22  29  37
b1) Compare e to R: e > 15. Let T, R be T right subtree and its root respectively
               15
          ____/  \____
         /            \
      __8__          _23_(R)
     /     \        /     \
    3      11     21      31
   / \    /  \   /  \    /  \
  1   7  10  12 19  22  29  37
b2) Compare e to R: e > 23. Let T, R be T right subtree and its root respectively
               15
          ____/  \____
         /            \
      __8__          _23__
     /     \        /     \
    3      11     21      31 (R)
   / \    /  \   /  \    /  \
  1   7  10  12 19  22  29  37
b3) Compare e to R: e < 31. Let T, R be T left subtree and its root respectively
               15
          ____/  \____
         /            \
      __8__          _23__
     /     \        /     \
    3      11     21      31___
   / \    /  \   /  \    /     \
  1   7  10  12 19  22  29 (R)  37
b4) Compare e to R: e <> 29: the element is not in the list, since T has no subtree.
Exponential search
An exponential search for e = 27 (for example) will undergo the following steps
e0) Let T, R be the leftmost subtree (i.e. the leaf 1) and its root (1) respectively
                 15
            ____/  \____
           /            \
        __8__          _23__
       /     \        /     \
      3      11     21      31
     / \    /  \   /  \    /  \
(R) 1   7  10  12 19  22  29  37
e1) Compare e to R: e > 1. Let R be the parent of R and T be the tree having R as root
               15
          ____/  \____
         /            \
      __8__          _23__
     /     \        /     \
(R) 3      11     21      31
   / \    /  \   /  \    /  \
  1   7  10  12 19  22  29  37
e2) Compare e to R: e > 3. Let R be the parent of R and T be the tree having R as root:
               15
          ____/  \____
         /            \
    (R)_8__          _23__
     /     \        /     \
    3      11     21      31
   / \    /  \   /  \    /  \
  1   7  10  12 19  22  29  37
e3) Compare e to R: e > 8. Let R be the parent of R and T be the tree having R as root:
           (R) 15
          ____/  \____
         /            \
      __8__          _23__
     /     \        /     \
    3      11     21      31
   / \    /  \   /  \    /  \
  1   7  10  12 19  22  29  37
e4) Compare e to R: e > 15. R has no parent. Let T be the right subtree of T and R be its root:
               15
          ____/  \____
         /            \
      __8__          _23_(R)
     /     \        /     \
    3      11     21      31
   / \    /  \   /  \    /  \
  1   7  10  12 19  22  29  37
e5..7) See steps b2..4)
Time complexity
For the sake of demonstration, let N = 2^n be the size of A and let indices start from 1. If N is not a power of two, the results are almost the same.
Let 0 <= i <= n be the minimum so that A[2^(i-1)] < e <= A[2^i] (let A[2^-1] = -inf). Note that this kind of interval may not be unique if you have duplicate values, hence the "minimum".
Exponential search
You need i + 1 iterations to find i. (In the example, you are jumping from child to parent repeatedly until you find a parent greater than e or there is no more parent)
Then you use a binary search on the selected interval. The size of this interval is 2^i - 2^(i-1) = 2^(i-1).
The cost of a binary search in an array of size 2^k is variable: you might find the value in the first iteration, or after k iterations. (There are sophisticated analyses depending on the distribution of the elements, but basically it takes between 1 and k iterations and you can't know the number in advance.)
Let j_i, 1 <= j_i <= i - 1 be the number of iterations needed for the binary search in our case (The size of this interval is 2^(i-1)).
Binary search
Let i be the minimum so that A[2^(i-1)] < e <= A[2^i]. Because of the assumption that N = 2^n, the binary search will meet this interval:
We start with the root A[2^(n-1)]. If e > A[2^(n-1)], then i = n because R = A[2^(n-1)] < e <= A[2^n]. Else, we have e <= A[2^(n-1)]. If e > A[2^(n-2)], then i = n-1, else we continue until we find i.
You need n - i + 1 steps to find i using a binary search:
if i = n, you know it at the first iteration (e > R) else, you select the left subtree
if i = n-1, you need two iterations
and so on: if i = 0, you'll need n iterations.
Then you'll need j_i iterations as shown above to complete the search.
Comparison
As you see, the j_i iterations are common to both algorithms. The question is: Is i + 1 < n - i + 1? i.e. Is i < n - i or 2i < n? If yes, the exponential search will be faster than the binary search. If no, the binary search will be faster than the exponential search (or equally fast)
Let's step back: 2i < n is equivalent to (2^i)^2 < 2^n, or 2^i < sqrt(2^n). While 2^i < sqrt(N), the exponential search is faster. As soon as 2^i > sqrt(N), the binary search is faster. Remember that the index of e is lower than or equal to 2^i because e <= A[2^i].
In simple words, if you have N elements and e is in the first sqrt(N) elements, then exponential search will be faster, else binary search will be faster.
It depends on the distribution, but N - sqrt(N) > sqrt(N) if N > 4, and thus the binary search is likely to be faster than the exponential search unless you know that the element will be among the first ones or the list is ridiculously short.
If 2^n < N < 2^(n+1)
I won't go into details, but this does not change the general conclusion.
If the value is beyond the last power of two, the cost of the exponential search to find the bound is already n+2, more than the whole binary search (at most n+1 iterations). You still have a binary search to perform, maybe in a small interval, but binary search is already the winner.
Else you add the value A[N] to the list until you have 2^(n+1) values. This won't change anything for exponential search, and it will slow down the binary search. But this slower binary search remains faster if e is not in the first sqrt(2^(n+1)) values.
Space complexity
That's an interesting question, but I won't go into details such as the size of pointers. If you are performing an exponential search and consuming elements as they arrive (imagine timestamps), you don't need to store the whole list at once. You only have to store one element (the first), then one element (the second), then two elements (the third and the fourth), then four elements, and so on up to 2^(i-1) elements. If i is small, you won't need to store a large list as you would for a regular binary search.
Implementation
Implementation is really not a problem here. See the Wikipedia pages for information: Binary search algorithm and Exponential search.
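For readers who want something concrete, here is a minimal Python sketch of both searches. The function names and the probe counter are my own additions for illustration (they are not taken from the Wikipedia pages); the counter simply records how many list elements each search inspects.

```python
from typing import List, Tuple


def binary_search(a: List[int], x: int, lo: int, hi: int) -> Tuple[int, int]:
    """Search the sorted slice a[lo:hi] for x.

    Returns (index, probes); index is -1 when x is absent.
    """
    probes = 0
    while lo < hi:
        mid = (lo + hi) // 2
        probes += 1
        if a[mid] < x:
            lo = mid + 1
        elif a[mid] > x:
            hi = mid
        else:
            return mid, probes
    return -1, probes


def exponential_search(a: List[int], x: int) -> Tuple[int, int]:
    """Find a bound by doubling, then binary-search inside that bound."""
    if not a:
        return -1, 0
    probes = 0
    bound = 1
    # Double the bound until a[bound] >= x or the bound leaves the list.
    while bound < len(a):
        probes += 1
        if a[bound] >= x:
            break
        bound *= 2
    # If x is present, it lies in a[bound // 2 : bound + 1].
    lo = bound // 2
    hi = min(bound + 1, len(a))
    index, more = binary_search(a, x, lo, hi)
    return index, probes + more


A = [1, 3, 7, 8, 10, 11, 12, 15, 19, 21, 22, 23, 29, 31, 37]
print(exponential_search(A, 27)[0])   # 27 is absent
print(exponential_search(A, 23)[0])   # position of 23 in A
```

Comparing the probe counts on a long list (say `list(range(1 << 16))`) for a target near the front and one near the back is a quick way to see the sqrt(N) trade-off discussed above.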
Applications and how to choose among the two
Use the exponential search only when the sequence is unbounded or when you know the value is likely to be among the first ones. Unbounded: I like the example of timestamps, since they are strictly growing. You can imagine a server with stored timestamps. You can ask for n timestamps and you are looking for a specific timestamp. Ask for 1, then 2, then 4, then 8, ... timestamps and perform the binary search when one timestamp exceeds the value you are looking for.
In other cases, use the binary search.
Remark: the idea behind the first part of the exponential search has some applications:
Guess an integer number when the upper limit is unbounded: Try 1, 2, 4, 8, 16,... and narrow the guess when you exceed the number (this is exponential search);
Find a bridge to cross a river by a foggy day: Make 100 steps left. If you didn't find the bridge, return to the initial point and make 200 steps right. If you still didn't find the bridge, return to the initial point and make 400 steps left. Repeat until you find the bridge (or swim);
Compute a congestion window in the TCP slow start: Double the quantity of data sent until there is congestion. The TCP congestion algorithms are in general more careful and perform something similar to a linear search in the second part of the algorithm, because overshooting has a cost here.
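The first item of this list (guessing an integer with no known upper limit) can be sketched as follows. This is only an illustration: `guess_number` and the `compare` oracle are hypothetical names, and the sketch assumes the secret is an integer greater than or equal to 1.

```python
from typing import Callable


def guess_number(compare: Callable[[int], int]) -> int:
    """Find a secret integer >= 1.

    compare(g) is assumed to return a negative number if g is below the
    secret, 0 if g equals it, and a positive number if g is above it.
    """
    hi = 1
    # Phase 1: try 1, 2, 4, 8, ... until the guess is no longer too small.
    while compare(hi) < 0:
        hi *= 2
    # The previous guess (hi // 2) was too small, so the secret is in [lo, hi].
    lo = hi // 2 + 1 if hi > 1 else 1
    # Phase 2: narrow the guess with a binary search.
    while lo < hi:
        mid = (lo + hi) // 2
        if compare(mid) < 0:
            lo = mid + 1
        else:
            hi = mid
    return lo


secret = 1000003
print(guess_number(lambda g: (g > secret) - (g < secret)))
```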

Related

Maximum depth of a min heap

Consider a min heap containing all the integers from 1 to 1023 exactly once. If root is at depth 0, the maximum depth at which 9 can appear is?
The answer to the question is 8.
But consider that a min heap is a nearly complete binary tree, where:
1) for d <- 0 to h-1, all levels have 2^d nodes.
2) for d <- h, nodes are filled from the left.
Source: http://homepages.math.uic.edu/~leon/cs-mcs401-s08/handouts/nearly_complete.pdf
What is wrong with the answer being 4, given that the level-order traversal would be {1,2 3,4 5 6,7 8 9...}?
A min-heap only requires each element to be greater than its parent node.
So one can put 1 at the root, then 2 as its left child and any element greater than 9 (say 512) as its right child. For 2, one can continue in the same way by putting 3 as its left child and, say, 513 as its right child. The final min heap obtained will be -
1
/ \
/ \
2 512
/ \ /\
/ \ / \
3 513 514 515
/\ /\ /\ /\
/ \
4 516 . . . . . .
/ / \ /\ /\ /\ /\ /\ /\
5 . . .. .. .. .. .. ...
/ /\ /\ /\/\
6 . . . . ...........................
/
7 .......................................................
/
8 ......................................................
/
9
The dots denote filled levels and can be replaced by elements from [517,758], as the levels must be filled.
The depth of 9 is 8
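The construction can be checked mechanically. The sketch below is my own illustration, not part of the answer: it stores the heap in the usual array layout (children of index i at 2*i + 1 and 2*i + 2), puts 1..9 on the leftmost path, and fills every other slot in index order with the remaining values. That filling order differs from the 512, 513, ... used in the drawing, but it gives a valid heap as well. Going deeper than 8 is impossible because 9 would need more than eight smaller ancestors, and only 1..8 are smaller.

```python
from typing import List


def heap_with_deep_value(levels: int = 10, chain_len: int = 9) -> List[int]:
    """Build a min-heap of 1..2**levels - 1 with chain_len at depth chain_len - 1."""
    size = 2 ** levels - 1              # 1023 nodes for 10 levels
    heap = [0] * size
    # Leftmost path from the root: indices 0, 1, 3, 7, ...
    chain = []
    index = 0
    for _ in range(chain_len):
        chain.append(index)
        index = 2 * index + 1
    for value, idx in enumerate(chain, start=1):
        heap[idx] = value               # 1, 2, ..., chain_len down the path
    on_chain = set(chain)
    next_value = chain_len + 1
    # Every other slot gets the remaining values in increasing index order,
    # so each of them is larger than its parent.
    for idx in range(size):
        if idx not in on_chain:
            heap[idx] = next_value
            next_value += 1
    return heap


def depth(index: int) -> int:
    """Depth of an array index, with the root (index 0) at depth 0."""
    return (index + 1).bit_length() - 1


heap = heap_with_deep_value()
print(depth(heap.index(9)))
```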

AVL Rotation - Which node to rotate

I have read many sources about AVL trees, but did not find anyone addressing this issue: When AVL tree gets unbalanced, which node should be rotated first?
Assuming I have the tree:
    10
   /  \
  5    25
      /
    20
and I'm trying to add 15, both the root and its child 25 will be unbalanced.
    10
   /  \
  5    25
      /
    20
   /
 15
I could do a RR rotation (or single rotation) of 25, resulting in the following tree:
    10
   /  \
  5    20
      /  \
    15    25
or a RL rotation (double rotation) about the root, creating the following tree:
      20
     /  \
   10    25
  /  \
 5    15
I am confused about which rotation is the most suitable here and in similar cases.
The RR rotation is correct here. The rotation should be done as low in the tree as the rule is broken, which is at 25 here.
Rotating higher up would, first, not necessarily restore the rule and, second, become more complex, although it doesn't look that way here at first sight.
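A recursive insertion handles this naturally: balance is checked while the recursion unwinds, so the lowest unbalanced node is always repaired first. The sketch below is a standard textbook-style AVL insert written for illustration; the names (`Node`, `insert`, `rotate_right`, ...) are my own, and `rotate_right` is the single rotation the question applies to 25.

```python
from typing import Optional


class Node:
    def __init__(self, key: int) -> None:
        self.key = key
        self.left: Optional["Node"] = None
        self.right: Optional["Node"] = None
        self.height = 1


def height(node: Optional[Node]) -> int:
    return node.height if node else 0


def update(node: Node) -> None:
    node.height = 1 + max(height(node.left), height(node.right))


def rotate_right(y: Node) -> Node:
    x = y.left
    y.left = x.right
    x.right = y
    update(y)
    update(x)
    return x


def rotate_left(x: Node) -> Node:
    y = x.right
    x.right = y.left
    y.left = x
    update(x)
    update(y)
    return y


def insert(node: Optional[Node], key: int) -> Node:
    if node is None:
        return Node(key)
    if key < node.key:
        node.left = insert(node.left, key)
    else:
        node.right = insert(node.right, key)
    update(node)
    # Checked on the way back up, so the deepest broken node is fixed first.
    balance = height(node.left) - height(node.right)
    if balance > 1:
        if key >= node.left.key:        # left-right case: double rotation
            node.left = rotate_left(node.left)
        return rotate_right(node)
    if balance < -1:
        if key < node.right.key:        # right-left case: double rotation
            node.right = rotate_right(node.right)
        return rotate_left(node)
    return node


root = None
for k in [10, 5, 25, 20, 15]:
    root = insert(root, k)
print(root.key, root.right.key, root.right.left.key, root.right.right.key)
```

With the keys from the question, the rotation happens at 25 and the root 10 is left in place.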

Other than backtracking, how do I find the longest path in a graph?

I have a graph shaped as a triangle.
      8
     / \
    1   4
   / \ / \
  4   2   0
 / \ / \ / \
9   1   9   4
In the above graph the longest path is {8, 4, 2, 9}
My current algorithm picks the largest of the adjacent nodes at each step, adds it to a list, and then calculates the sum of that list. This works in the above graph but won't work in situations such as this scenario:
      8
     / \
    0   1
   / \ / \
  4   0   4
 / \ / \ / \
9  99   3   4
My algorithm will mistakenly go through {8,1,4,4} where the correct longest path is {8,0,4,99}
The only solution I can think of is backtracking, where I have to go through all the paths and calculate the max path, which will be insanely slow in a huge graph. Think of a graph with about 100k nodes.
My question is can I do better than this?
Start at the top.
For each node, pick the maximum of its parents (the nodes above connected to it) and add its own value.
Then, in the last row, pick the maximum.
This will just give you the value of the longest path, but you could easily get the actual path by simply starting at the value picked at the bottom and moving upwards, always picking the greater parent.
The running time would be linear in the number of nodes.
Example:
Original:
First example:            Second example:
       8                         8
      / \                       / \
     1   4                     0   1
    / \ / \                   / \ / \
   4   2   0                 4   0   4
  / \ / \ / \               / \ / \ / \
 9   1   9   4             9  99   3   4
Output:
First example:            Second example:
       8                         8
      / \                       / \
     9   12                    8   9
    / \ / \                   / \ / \
  13   14  12               12   9   13
  / \ / \ / \               / \ / \ / \
22  15  23  16            21  111 16  17
Then you'd pick 23 for the first and 111 for the second.
To get the path, we'd have 23-14-12-8, which corresponds to 9-2-4-8, for the first, and 111-12-8-8, which corresponds to 99-4-0-8, for the second.
I'm of course assuming we have a tree, as stated. For general graphs, this problem is quite a lot more difficult - NP-hard, to be exact.
You do not need backtracking here - you can use breadth-first search to propagate the max for the path that you have found so far to the corresponding node, level by level.
Start at the root, and set its max to its own value.
Go through nodes level-by-level
For each node check the max stored in its parent. There may be one or two of these parents. Pick the max of two max-es, add the value of the node itself, and store it in the current node
When the path through the graph is complete, the result would look like this:
Max graph:
       8
      / \
     8   9
    / \ / \
  12   9   13
  / \ / \ / \
21  111 16  17
To recover a path, find the max value in the bottom layer. This is the final node of your path. You can reconstruct the path from the max graph and the original by starting at the max (111), subtracting the value (99), looking for the result (111-99=12) in the max graph, and continuing to that node until you reach the top:
111 - 99 = 12 -- Take 99
12 - 4 = 8 -- Take 4
8 - 0 = 8 -- Take 0
8 is the root -- Take 8
This gives you the max path in reverse. Note that this may not be the unique path (think of a graph filled with equal values to see how there may be multiple max paths). In this case, however, any path that you would recover will satisfy the max path requirement.
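Both answers describe the same level-by-level propagation. A minimal sketch, assuming the graph is given as a list of rows (`triangle[r][c]` is the c-th node of row r, with parents at positions c - 1 and c of the row above); the function name and the input layout are my own choices for illustration.

```python
from typing import List, Tuple


def max_path(triangle: List[List[int]]) -> Tuple[int, List[int]]:
    """Return (best sum, node values along one best top-to-bottom path)."""
    best = [triangle[0][:]]             # best[r][c]: max sum of a path ending at (r, c)
    for r in range(1, len(triangle)):
        prev = best[-1]
        row = []
        for c, value in enumerate(triangle[r]):
            # A node has one or two parents in the row above.
            parents = [prev[p] for p in (c - 1, c) if 0 <= p < len(prev)]
            row.append(value + max(parents))
        best.append(row)

    # Start from the largest value in the bottom row and walk upwards,
    # always moving to the parent with the larger stored sum.
    last = best[-1]
    c = max(range(len(last)), key=last.__getitem__)
    total = last[c]
    path = [triangle[-1][c]]
    for r in range(len(triangle) - 1, 0, -1):
        prev = best[r - 1]
        candidates = [p for p in (c - 1, c) if 0 <= p < len(prev)]
        c = max(candidates, key=prev.__getitem__)
        path.append(triangle[r - 1][c])
    path.reverse()
    return total, path


print(max_path([[8], [1, 4], [4, 2, 0], [9, 1, 9, 4]]))
print(max_path([[8], [0, 1], [4, 0, 4], [9, 99, 3, 4]]))
```

Each node is visited a constant number of times, which matches the linear running time claimed above.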

Balanced Binary Search Tree for numbers

I wanted to draw a balanced binary search tree for numbers from 1 to 20.
            _______10_______
           /                \
       ___5___               15
      /       \            /    \
     3         8         13      18
    / \       / \       /  \    /  \
   2   4     7   9    12   14  17   19
  /         /        /        /
 1         6        11       16
Is the above tree correct and balanced?
In answer to your original question as to whether or not you need to first calculate the height, no, you don't need to. You just have to understand that a balanced tree is one where the height difference between the tallest and shortest node is zero or one, and the simplest way to achieve this is to ensure that you always pick the midpoint of the possible list, when populating the top node in a sub-tree.
Your sample tree is balanced since all leaf nodes are either at the bottom or next-to-bottom level, hence the difference in heights between any two leaf nodes is at most one.
To create a balanced tree from the numbers 1 through 20 inclusive, you can just make the root entry 10 or 11 (the midpoint being 10.5 for those numbers), so that there's an equal quantity of numbers in either sub-tree.
Then just do that recursively for each sub-tree. On the lower side of 10, 5 is the midpoint:
            10
           /  \
          5    11-thru-19 sub-tree
         / \
 1-thru-4   6-thru-9
 sub-tree   sub-tree
Just expand on that and you'll end up with something like:
            _______10_______
           /                \
       ___5___               15
      /       \            /    \
     2         7         13      17
    / \       / \       /       /  \
   1   3     6   8    11      16    18   <- depth of highest leaf node
        \         \     \             \
         4         9     12            19  <- depth of lowest leaf node
                                            ^
                                            |
                                      Difference is 1
The midpoint can be found at the number where the difference between the quantities above and below that number is one or zero. For the whole list of numbers 1 through 20 inclusive, there are nine less than 10 and ten greater than 10 (or, if you chose 11 as the midpoint, the quantities are ten and nine).
The difference between your sample and mine is probably to do with the fact that I preferred to pick the midpoint by rounding down where there was a choice (meaning my right sub-trees tend to be "heavier"). Because your left sub-trees are heavier, you appear to have rounded up.
After choosing 10 as the initial midpoint, there's no leeway on the left sub-tree, you have to choose 5 since it has four above and below it. Any other midpoint would result in a difference of at least two between the two halves (for example, choosing 4 as the midpoint would have the two halves of size three and five). This can still give you a balanced sub-tree depending on the data but it's "safer" to choose the midpoint.
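The recursive midpoint rule can be sketched in a few lines of Python. This is an illustration under my own naming (`Node`, `build`, `leaf_depths`), rounding the midpoint down as described above, applied to the numbers 1 through 20.

```python
from typing import List, Optional


class Node:
    def __init__(self, key: int) -> None:
        self.key = key
        self.left: Optional["Node"] = None
        self.right: Optional["Node"] = None


def build(values: List[int], lo: int = 0, hi: Optional[int] = None) -> Optional[Node]:
    """Build a balanced BST from the sorted slice values[lo:hi]."""
    if hi is None:
        hi = len(values)
    if lo >= hi:
        return None
    mid = (lo + hi - 1) // 2            # midpoint, rounded down
    node = Node(values[mid])
    node.left = build(values, lo, mid)
    node.right = build(values, mid + 1, hi)
    return node


def leaf_depths(node: Optional[Node], depth: int = 0) -> List[int]:
    """Depths of all leaf nodes, with the root at depth 0."""
    if node is None:
        return []
    if node.left is None and node.right is None:
        return [depth]
    return leaf_depths(node.left, depth + 1) + leaf_depths(node.right, depth + 1)


root = build(list(range(1, 21)))
depths = leaf_depths(root)
print(root.key, root.left.key, root.right.key)
print(max(depths) - min(depths))
```

Because each split leaves the two halves differing in size by at most one, the leaves end up on the last two levels only.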

High and Low bits in van Emde Boas Tree

I was trying to understand the concept of vEB tree.
In an example:
I assumed a universe set U = {0, 1, 2, 3 ..... 8}. So the size is 9.
Now let's take a subset S = {0, 1, 3, 4, 6, 7}.
For an operation FindSuccessor (3, S), where I need to know the smallest element > 3 in subset S, I need to know the high and low bits of my element, i.e. 3.
One explanation says it's the first half and second half of the bits, giving the result 00 and 11 as high and low respectively.
Another says:
high = Floor [element/sqrt(|U|)] = Floor [3/ sqrt (9)] = Floor [1] = 1;
low = element % sqrt(|U|) = 3 % sqrt (9) = 0;
Please explain where am I going wrong?
You're not going wrong: the explanations are for two slightly different data structures that coincide only when |U| is a power of two that is also a perfect square (an even power of two). At a high level, we're trying to divide a key k into two halves, each with about √|U| possibilities. The first method achieves this goal directly; the second is an approximation that runs faster on commodity hardware (assuming |U| is a power of two, the worst case is when |U| is not square and the first half has twice as many possibilities as the second). Pick one method and stick with it.
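To make the two splits concrete, here is a small sketch. The function names are mine; `split_sqrt` assumes |U| is a perfect square, and `split_bits` assumes keys are written with a fixed, even number of bits (4 bits here, i.e. a universe of 16).

```python
import math
from typing import Tuple


def split_sqrt(x: int, universe_size: int) -> Tuple[int, int]:
    """high = floor(x / sqrt(|U|)), low = x mod sqrt(|U|)."""
    root = math.isqrt(universe_size)    # assumes universe_size is a perfect square
    return x // root, x % root


def split_bits(x: int, bits: int) -> Tuple[int, int]:
    """high = upper half of the bits, low = lower half of the bits."""
    half = bits // 2
    return x >> half, x & ((1 << half) - 1)


print(split_sqrt(3, 9))     # the |U| = 9 reading from the question
print(split_bits(3, 4))     # 3 = 0011 in four bits
print(split_sqrt(3, 16))    # with |U| = 16 both methods agree
```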
Here's an example of FindSuccessor(3, S). For simplicity, I'm going to bottom out the recursion at three elements.
The tree looks like
          min=0|        aux
          max=7|------->min=0|
          / | \         max=2|
         /  |  \          /|\
        /   |   \        0 1 2
       /    |    \
      v     v     v
min=0|  min=3|  min=6|
max=1|  max=4|  max=7|
 /|      /|      /|
0 1     3 4     6 7
At the root, we split 3 = (1, 0) and check whether the 1th (middle) child has max > 3. It does, so we descend there and use brute force to compute the answer, 4. (Of course, if the tree had more than two levels, we would search recursively.)
A more interesting case is when S = {0, 1, 3, 6, 7}.
          min=0|        aux
          max=7|------->min=0|
          / | \         max=2|
         /  |  \          /|\
        /   |   \        0 1 2
       /    |    \
      v     v     v
min=0|  min=3|  min=6|
max=1|  max=3|  max=7|
 /|      /       /|
0 1     3       6 7
Here, we examine the 1th subtree of the root, {3}, and find that its max is not greater than 3. We find the successor of 1 in the aux data structure, which is 2, and return the min of the 2th subtree, which is 6.
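The two walkthroughs can be reproduced with a flat two-level sketch. This is not a real recursive vEB tree: as in the example, the recursion bottoms out immediately, clusters are plain sorted lists searched by brute force, and the aux structure is just a list of non-empty cluster numbers. The class and attribute names are hypothetical.

```python
from typing import Iterable, List, Optional


class TwoLevel:
    def __init__(self, cluster_size: int, items: Iterable[int]) -> None:
        self.r = cluster_size                       # sqrt(|U|), 3 in the example
        self.clusters: List[List[int]] = [[] for _ in range(cluster_size)]
        for x in sorted(items):
            self.clusters[x // self.r].append(x % self.r)   # store low parts
        # aux: the numbers of the non-empty clusters, in increasing order
        self.aux = [h for h, c in enumerate(self.clusters) if c]

    def successor(self, x: int) -> Optional[int]:
        high, low = x // self.r, x % self.r
        cluster = self.clusters[high]
        # Case 1: the cluster of x has a max beyond low, so the answer is inside it.
        if cluster and cluster[-1] > low:
            nxt = min(v for v in cluster if v > low)        # brute force
            return high * self.r + nxt
        # Case 2: take the successor of high in aux and return that cluster's min.
        for h in self.aux:
            if h > high:
                return h * self.r + self.clusters[h][0]
        return None


print(TwoLevel(3, {0, 1, 3, 4, 6, 7}).successor(3))
print(TwoLevel(3, {0, 1, 3, 6, 7}).successor(3))
```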

Resources