I came up with a problem of detecting "mutations" between two directed trees.
Example:
tree1:
A
/ \
B C - D
/ \ / \ \
G A 2 A 3
| \ |\
1 3 2 3
tree2:
A
/ \
B C - F
/ \ / \
G A 2 3
| \ |\
1 3 2 3
The algorithm should find that there is a mutation with
R
|
C - D
|\ \
X Y Z
Substituted with
R
|
C - D
| \
X Z
Where R, Y and Z are the respective values
I am looking for any ideas, such as:
a link to an algorithm, or a book describing relevant algorithms
pseudocode
code in any language (preferably Python)
a library in any language (preferably Python)
Have you looked at any tree difference problems?
Most tree diff problems produce a list of changes (e.g. insertion, deletion, moving, and relabelling of nodes) rather than a template subtree, but they might give you a starting place.
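To make the "list of changes" output concrete, here is a deliberately naive Python sketch. It assumes a hypothetical `(label, [children])` tuple representation of ordered trees and matches children only by position, so it is not a real tree edit distance algorithm (such as Zhang-Shasha), it cannot detect moves, and it does not find the template-style mutation asked about above. It only shows the kind of edit list a tree diff produces.

```python
from typing import List, Tuple

# Hypothetical representation for this sketch: a tree is (label, [children]).
Tree = Tuple[str, list]


def naive_diff(a: Tree, b: Tree, path: Tuple[int, ...] = ()) -> List[tuple]:
    """Compare two ordered, labelled trees position by position.

    Children are matched only by index, so this is NOT an optimal tree edit
    distance: deleting a first child shows up as relabels of the following
    siblings plus a deletion of the last one. A deleted or inserted child
    stands for its whole subtree.
    """
    changes: List[tuple] = []
    if a[0] != b[0]:
        changes.append(("relabel", path, a[0], b[0]))
    kids_a, kids_b = a[1], b[1]
    for i in range(max(len(kids_a), len(kids_b))):
        child_path = path + (i,)
        if i >= len(kids_b):
            changes.append(("delete", child_path, kids_a[i][0]))
        elif i >= len(kids_a):
            changes.append(("insert", child_path, kids_b[i][0]))
        else:
            changes.extend(naive_diff(kids_a[i], kids_b[i], child_path))
    return changes


old = ("A", [("B", []), ("C", [("2", []), ("A", [])])])
new = ("A", [("B", []), ("C", [("2", [])])])
print(naive_diff(old, new))  # [('delete', (1, 1), 'A')]
```

A path such as `(1, 1)` means "second child of the second child of the root". Real tree diff algorithms replace the positional matching with a minimum-cost alignment of the two child sequences.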
Why does an AA tree perform skew first and then split? What is the reason for this, and why shouldn't the balancing functions be called in the opposite order?
Consider this sub-tree:
     |
     v
  L<-T->R
 / \   / \
A   B C   D
If you apply skew first, and then split, you will get a legal tree.
       |
       v
       T
     /   \
    L     R
   / \   / \
  A   B C   D
If you apply split first, then skew, you will get an illegal tree:
  |
  v
  L->T->R
 /  /  / \
A  B  C   D
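The two orders can be checked with a small Python sketch. It assumes the usual AA-tree formulation: `skew` rotates right when a left child is on the same level as its parent, and `split` rotates left and promotes the middle node when there are two consecutive right links on the same level. `Node`, `is_legal` and `example` are hypothetical helpers written for this illustration; in `example`, L, T and R are on level 2 and A, B, C, D are level-1 leaves, matching the sub-tree above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    key: str
    level: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def level(n: Optional[Node]) -> int:
    return n.level if n is not None else 0  # empty subtrees count as level 0


def skew(t: Optional[Node]) -> Optional[Node]:
    """Remove a left horizontal link by rotating right."""
    if t is not None and t.left is not None and t.left.level == t.level:
        l = t.left
        t.left, l.right = l.right, t
        return l
    return t


def split(t: Optional[Node]) -> Optional[Node]:
    """Remove two consecutive right horizontal links: rotate left, promote."""
    if (t is not None and t.right is not None and t.right.right is not None
            and t.right.right.level == t.level):
        r = t.right
        t.right, r.left = r.left, t
        r.level += 1
        return r
    return t


def is_legal(t: Optional[Node]) -> bool:
    """Check the AA-tree level invariants recursively."""
    if t is None:
        return True
    if t.left is None and t.right is None and t.level != 1:
        return False                                  # leaves are on level 1
    if level(t.left) != t.level - 1:
        return False                                  # no left horizontal links
    if level(t.right) not in (t.level, t.level - 1):
        return False
    if t.right is not None and level(t.right.right) >= t.level:
        return False                                  # no double right links
    if t.level > 1 and (t.left is None or t.right is None):
        return False
    return is_legal(t.left) and is_legal(t.right)


def example() -> Node:
    L = Node("L", 2, Node("A", 1), Node("B", 1))
    R = Node("R", 2, Node("C", 1), Node("D", 1))
    return Node("T", 2, L, R)


good = split(skew(example()))   # skew first, then split
bad = skew(split(example()))    # split first, then skew
print(good.key, good.level, is_legal(good))  # T 3 True
print(bad.key, bad.level, is_legal(bad))     # L 2 False
```

`split` run first finds no double right link yet (T->R->D is not on one level), so it does nothing; `skew` then turns the left link into a right one and creates the chain L->T->R, which nothing repairs. Running `skew` first produces exactly the situation that `split` knows how to fix.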
Does a binary search beat an exponential search in any way, except in space complexity?
Both these algorithms search for a value in an ordered list of elements, but they address different issues. Exponential search is explicitly designed for unbounded lists whereas binary search deals with bounded lists.
The idea behind exponential search is very simple: Search for a bound, and then perform a binary search.
Example
Let's take an example. A = [1, 3, 7, 8, 10, 11, 12, 15, 19, 21, 22, 23, 29, 31, 37]. This list can be seen as a binary tree (although there is no need to build the tree):
                   15
              ____/  \____
             /            \
          __8__          __23__
         /     \        /      \
        3      11      21      31
       / \    /  \    /  \    /  \
      1   7  10  12  19  22  29  37
Binary search
A binary search for e = 27 (for example) goes through the following steps:
b0) Let T, R be the tree and its root respectively
                   15 (R)
              ____/  \____
             /            \
          __8__          __23__
         /     \        /      \
        3      11      21      31
       / \    /  \    /  \    /  \
      1   7  10  12  19  22  29  37
b1) Compare e to R: e > 15. Let T, R be the right subtree of T and its root, respectively.
                   15
              ____/  \____
             /            \
          __8__          __23__ (R)
         /     \        /      \
        3      11      21      31
       / \    /  \    /  \    /  \
      1   7  10  12  19  22  29  37
b2) Compare e to R: e > 23. Let T, R be the right subtree of T and its root, respectively.
                   15
              ____/  \____
             /            \
          __8__          __23__
         /     \        /      \
        3      11      21      31 (R)
       / \    /  \    /  \    /  \
      1   7  10  12  19  22  29  37
b3) Compare e to R: e < 31. Let T, R be the left subtree of T and its root, respectively.
                   15
              ____/  \____
             /            \
          __8__          __23__
         /     \        /      \
        3      11      21      31
       / \    /  \    /  \    /  \
      1   7  10  12  19  22  29  37
                             (R)
b4) Compare e to R: e < 29, but T has no left subtree, so the element is not in the list.
Exponential search
An exponential search for e = 27 (for example) goes through the following steps:
Let T, R be the leftmost subtree (i.e. the leaf 1) and its root (1), respectively.
                   15
              ____/  \____
             /            \
          __8__          __23__
         /     \        /      \
        3      11      21      31
       / \    /  \    /  \    /  \
  (R) 1   7  10  12  19  22  29  37
e1) Compare e to R: e > 1. Let R be the parent of R and T be the tree having R as root
                   15
              ____/  \____
             /            \
          __8__          __23__
         /     \        /      \
    (R) 3      11      21      31
       / \    /  \    /  \    /  \
      1   7  10  12  19  22  29  37
e2) Compare e to R: e > 3. Let R be the parent of R and T be the tree having R as root:
                   15
              ____/  \____
             /            \
       (R)__8__          __23__
         /     \        /      \
        3      11      21      31
       / \    /  \    /  \    /  \
      1   7  10  12  19  22  29  37
e3) Compare e to R: e > 8. Let R be the parent of R and T be the tree having R as root:
               (R) 15
              ____/  \____
             /            \
          __8__          __23__
         /     \        /      \
        3      11      21      31
       / \    /  \    /  \    /  \
      1   7  10  12  19  22  29  37
e4) Compare e to R: e > 15. R has no parent. Let T be the right subtree of T and R be its root:
                   15
              ____/  \____
             /            \
          __8__          __23__ (R)
         /     \        /      \
        3      11      21      31
       / \    /  \    /  \    /  \
      1   7  10  12  19  22  29  37
e5..7) See steps b2..4)
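Both walk-throughs can be reproduced with a short Python sketch that records the value compared at each step (the root R of the current implicit subtree). The function names are hypothetical; the list is 0-indexed in code, while the text uses 1-based positions A[1], A[2], A[4], ...

```python
A = [1, 3, 7, 8, 10, 11, 12, 15, 19, 21, 22, 23, 29, 31, 37]


def binary_search_trace(a, e, lo=0, hi=None):
    """Binary search on a[lo..hi] (inclusive, 0-based).

    Returns (found, values compared); each compared value is the root R of
    the current implicit subtree T.
    """
    if hi is None:
        hi = len(a) - 1
    compared = []
    while lo <= hi:
        mid = (lo + hi) // 2
        compared.append(a[mid])
        if a[mid] == e:
            return True, compared
        if a[mid] < e:
            lo = mid + 1   # continue in the right subtree
        else:
            hi = mid - 1   # continue in the left subtree
    return False, compared


def exponential_search_trace(a, e):
    """Probe A[1], A[2], A[4], ... (1-based) until a value >= e or the end of
    the list, then binary-search the remaining interval."""
    n, compared, bound = len(a), [], 1
    while bound <= n:
        compared.append(a[bound - 1])
        if a[bound - 1] == e:
            return True, compared
        if a[bound - 1] > e:
            break
        bound *= 2     # move R to its parent in the implicit tree
    # e can only lie in 1-based positions bound/2 + 1 .. min(bound, n).
    found, rest = binary_search_trace(a, e, bound // 2, min(bound, n) - 1)
    return found, compared + rest


print(binary_search_trace(A, 27))       # (False, [15, 23, 31, 29])
print(exponential_search_trace(A, 27))  # (False, [1, 3, 8, 15, 23, 31, 29])
```

For e = 27 the binary search makes the 4 comparisons of steps b1 to b4, and the exponential search makes the 7 comparisons of steps e1 to e7; the last three are the same in both.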
Time complexity
For the sake of demonstration, let N = 2^n be the size of A and let indices start from 1. If N is not a power of two, the results are almost the same.
Let 0 <= i <= n be the minimum so that A[2^(i-1)] < e <= A[2^i] (let A[2^-1] = -inf). Note that this kind of interval may not be unique if you have duplicate values, hence the "minimum".
Exponential search
You need i + 1 iterations to find i. (In the example, you are jumping from child to parent repeatedly until you find a parent greater than e or there is no more parent)
Then you use a binary search on the selected interval. The size of this interval is 2^i - 2^(i-1) = 2^(i-1).
The cost of a binary search in an array of size 2^k is variable: you might find the value in the first iteration, or only after k iterations. (There are sophisticated analyses depending on the distribution of the elements, but basically it takes between 1 and k iterations, and you can't know which in advance.)
Let j_i, 1 <= j_i <= i - 1 be the number of iterations needed for the binary search in our case (The size of this interval is 2^(i-1)).
Binary search
Let i be the minimum so that A[2^(i-1)] < e <= A[2^i]. Because of the assumption that N = 2^n, the binary search will meet this interval:
We start with the root A[2^(n-1)]. If e > A[2^(n-1)], then i = n because R = A[2^(n-1)] < e <= A[2^n]. Otherwise, e <= A[2^(n-1)]: if e > A[2^(n-2)], then i = n-1; else we continue until we find i.
You need n - i + 1 steps to find i using a binary search:
if i = n, you know it at the first iteration (e > R); otherwise, you select the left subtree;
if i = n-1, you need two iterations
and so on: if i = 0, you'll need n iterations.
Then you'll need j_i iterations as shown above to complete the search.
Comparison
As you see, the j_i iterations are common to both algorithms. The question is: Is i + 1 < n - i + 1? i.e. Is i < n - i or 2i < n? If yes, the exponential search will be faster than the binary search. If no, the binary search will be faster than the exponential search (or equally fast)
Let's step back: 2i < n is equivalent to (2^i)^2 < 2^n, i.e. 2^i < sqrt(2^n) = sqrt(N). While 2^i < sqrt(N), the exponential search is faster; as soon as 2^i > sqrt(N), the binary search is faster. Remember that the index of e is less than or equal to 2^i because e <= A[2^i].
In simple words, if you have N elements and e is among the first sqrt(N) elements, then the exponential search will be faster; otherwise, the binary search will be faster.
It depends on the distribution, but N - sqrt(N) > sqrt(N) if N > 4, and thus the binary search is likely to be faster than the exponential search unless you know that the element will be among the first ones or the list is ridiculously short.
If 2^n < N < 2^(n+1)
I won't go into details, but this does not change the general conclusion.
If the value is beyond the last power of two, the cost of the exponential search to find the bound is already n + 2 iterations, more than the whole binary search (at most n + 1 iterations, since N < 2^(n+1)). Then you still have a binary search to perform, maybe in a small interval, but the binary search is already the winner.
Otherwise, pad the list with copies of A[N] until it has 2^(n+1) values. This doesn't change anything for the exponential search, and it slows down the binary search. But this slower binary search remains faster if e is not among the first sqrt(2^(n+1)) values.
Space complexity
That's an interesting question which I won't go into in detail (size of the pointers and things like that). If you are performing an exponential search and consuming elements as they arrive (imagine timestamps), you don't need to store the whole list at once. You only have to store one element (the first), then one element (the second), then two elements (the third and the fourth), then four elements, ..., then 2^(i-1) elements. If i is small, you won't need to store a list as large as the one a regular binary search needs.
Implementation
Implementation is really not a problem here. See the Wikipedia pages for information: Binary search algorithm and Exponential search.
Applications and how to choose among the two
Use the exponential search only when the sequence is unbounded or when you know the value is likely to be among the first ones. Unbounded: I like the example of timestamps, which are strictly increasing. Imagine a server that stores timestamps: you can ask it for n timestamps, and you are looking for a specific one. Ask for 1, then 2, then 4, then 8, ... timestamps, and perform the binary search once a timestamp exceeds the value you are looking for.
In other cases, use the binary search.
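The timestamp scenario can be sketched in Python as follows. It assumes a hypothetical accessor `fetch(k)` that returns the first k values of a sorted sequence (fewer if the sequence is shorter); the sketch doubles k until the last fetched value reaches e, then binary-searches only the newly fetched half with the standard library's `bisect.bisect_left`.

```python
import bisect
from typing import Callable, List, Optional


def unbounded_search(fetch: Callable[[int], List[int]], e: int) -> Optional[int]:
    """Exponential search over a sorted sequence of unknown length.

    `fetch(k)` is a hypothetical accessor returning the first k values.
    Returns the index of e, or None if e is absent.
    """
    k = 1
    while True:
        prefix = fetch(k)
        if not prefix:
            return None
        # Stop once the bound is reached or the sequence turned out shorter than k.
        if prefix[-1] >= e or len(prefix) < k:
            break
        k *= 2
    # Everything before position k // 2 is known to be < e.
    i = bisect.bisect_left(prefix, e, k // 2)
    return i if i < len(prefix) and prefix[i] == e else None


timestamps = [3, 5, 8, 13, 21, 34, 55, 89]
print(unbounded_search(lambda k: timestamps[:k], 21))  # 4
```

The requests grow as 1, 2, 4, 8, ..., so the total amount of data fetched stays within a constant factor of the position of e, whatever the (unknown) length of the sequence.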
Remark: the idea behind the first part of the exponential search has some applications:
Guess an integer when the upper limit is unbounded: try 1, 2, 4, 8, 16, ... and narrow down the guess once you exceed the number (this is exponential search);
Find a bridge to cross a river on a foggy day: take 100 steps left. If you didn't find the bridge, return to the initial point and take 200 steps right. If you still didn't find it, return to the initial point and take 400 steps left. Repeat until you find the bridge (or swim);
Compute a congestion window in TCP slow start: double the amount of data sent until there is congestion. TCP congestion control algorithms are in general more careful and do something similar to a linear search in the second part of the algorithm, because overshooting has a cost here.
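The integer-guessing remark is exactly the bound-finding phase described above; the bridge and TCP remarks can be simulated in a few lines of Python. `steps_to_bridge` follows the foggy-day strategy as written (100 steps, then doubling, alternating left and right, returning to the start after each miss), with a hypothetical signed bridge position `p` (negative means left). `toy_cwnd_trace` is a toy model, not real TCP: the window doubles every round trip until it exceeds a hypothetical fixed `capacity`, is then halved, and afterwards grows by one per round trip.

```python
def steps_to_bridge(p: int, first: int = 100) -> int:
    """Total steps walked before reaching a bridge at signed position p.

    Trips alternate left/right with lengths first, 2*first, 4*first, ...;
    an unsuccessful trip of length d costs 2*d (out and back).
    """
    if p == 0:
        return 0
    total, d, direction = 0, first, -1      # the first trip goes left
    while True:
        if (p < 0) == (direction < 0) and abs(p) <= d:
            return total + abs(p)           # bridge found on this trip
        total += 2 * d                      # walk out and come back
        d *= 2
        direction = -direction


def toy_cwnd_trace(capacity: int, rtts: int) -> list:
    """Toy slow start: double until cwnd exceeds `capacity` (our stand-in for
    congestion), then halve and grow linearly by 1 per round trip."""
    cwnd, slow_start, trace = 1, True, []
    for _ in range(rtts):
        trace.append(cwnd)
        if cwnd > capacity:                 # "congestion" detected
            cwnd = max(1, cwnd // 2)
            slow_start = False
        elif slow_start:
            cwnd *= 2                       # exponential phase
        else:
            cwnd += 1                       # careful, linear phase
    return trace


print(steps_to_bridge(150))     # 350: 100 left and back, then 150 right
print(toy_cwnd_trace(10, 10))   # [1, 2, 4, 8, 16, 8, 9, 10, 11, 5]
```

For a bridge at least 100 steps away, this doubling walk stays below 9 times the distance to the bridge, in line with the classic factor-9 analysis of the doubling strategy; a bridge only a few steps to the right is the expensive case, because the first trip goes left.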
I have a connected undirected graph having n nodes. Given two nodes, I want to find the minimum number of edges that would have to be removed in order to ensure that there's only one cycle-free path between those two nodes.
For example, if this is the graph:
1------------2------------5
|                         |
|                         |
3-------------------------4
then given the nodes 1 and 5, the answer will be 1: just remove (for example) the edge between node 3 and node 4.
The brute-force approach is, for each subset of the set of edges, to try removing those edges and test if there's a unique cycle-free path between the two nodes of interest.
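The brute-force approach can be written in a few lines of Python, which is handy as a baseline to test smarter ideas against. This is a sketch under assumptions: edges are hypothetical `(u, v)` tuples, "cycle-free path" is read as "simple path", and simple paths are counted by plain DFS, which is itself exponential in the worst case.

```python
from itertools import combinations


def count_simple_paths(adj, s, t):
    """Count simple (cycle-free) paths from s to t by depth-first search."""
    def dfs(u, seen):
        if u == t:
            return 1
        total = 0
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                total += dfs(v, seen)
                seen.remove(v)
        return total
    return dfs(s, {s})


def min_removals_brute_force(nodes, edges, s, t):
    """Try edge subsets by increasing size and return the size of the first
    subset whose removal leaves exactly one simple s-t path (or None)."""
    for k in range(len(edges) + 1):
        for removed in combinations(edges, k):
            removed_set = set(removed)
            adj = {u: [] for u in nodes}
            for u, v in edges:
                if (u, v) not in removed_set:
                    adj[u].append(v)
                    adj[v].append(u)
            if count_simple_paths(adj, s, t) == 1:
                return k
    return None


example_edges = [(1, 2), (2, 5), (1, 3), (3, 4), (4, 5)]
print(min_removals_brute_force([1, 2, 3, 4, 5], example_edges, 1, 5))  # 1
```

On the example graph, removing the edge 1-2 already leaves the single path 1-3-4-5, so the search stops at subsets of size 1.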
Is there a more efficient approach? (I Googled it, but did not find anything relevant.)
(Dear cryptomanic, I added these examples to help in the discussion about the exact requirements; please edit this part and indicate which of these solutions are valid. m69)
Input graph: (going from X to Y)
O---O---O---O O
/ \ / \ / \
O---O---X O Y---O---O
\ /
O---O---O---O
/ \ \
O---O O
Solution A: (no cycles in between X and Y)
O---O---O---O O
/ / \ / \
O---O---X O Y---O---O
/
O---O---O---O
/ \ \
O---O O
Solution B: (no side-paths in between X and Y)
O---O---O---O O
/ \ / \
O---O---X O Y---O---O
/
O---O---O---O
/ \ \
O---O O
Solution C: (no cycles connected to X and Y)
O---O---O---O O
/ / \ \
O---O---X O Y---O---O
/
O---O---O O
/ \ \
O---O O
Solution D: (completely isolate path from X to Y)
O---O---O---O O
/ \ / \
O---O X O Y O---O
O---O---O---O
/ \ \
O---O O
Solution E: (P can only be used once, so P-Q-R-P is not part of an alternative path)
O---O---O---O O
\ / / \
O---O---X O Y O---O
\ /
O---P---O---O
/ \ \
Q---R O
Solution F:
O---O---O---O O
\ / \ / \
O---O---X O Y---O---O
\ /
O---O---O---O
/ \ \
O---O O
Hey, I have a question about my homework. I was able to solve it; I just want someone to check whether I did it right or wrong.
A B-tree with minimum branching factor t = 3:
[D][G][K][N][V]
/ / / | \ \
/ / / | \ \
/ / / | \ \
AC EF HI LM OPRST WX
Now, when I insert J into the above tree, this is the output I get:
[K]
/ \
/ \
/ \
[D][G] [N][V]
/ / / / \ \
/ / / / \ \
/ / / / \ \
AC EF HIJ LM OPRST WX
After inserting Q into the above tree, this is the final tree I get:
[K]
/ \
/ \
/ \
[D][G] [N][Q][V]
/ / / / / \ \
/ / / / / \ \
/ / / / / \ \
AC EF HIJ LM OP RST WX
Is this final tree correct?
No, the final B-tree is not correct, though the intermediate one is. The last one should look like this:
[K]
/ \
/ \
/ \
[D][G] [N][R][V]
/ / / / / \ \
/ / / / / \ \
/ / / / / \ \
AC EF HIJ LM OPQ ST WX
You missed something very important. In a B-tree, keys are only inserted into leaf nodes, and every full node on the way down is split. In your final tree, you inserted Q into a level-2 node.
Edit: I think you are confused about the insertion algorithm. Insertions only take place in a leaf node. On the downward path from the root to the leaf, any full node that is encountered is split first. If the leaf node is full, it is split first and then the key is inserted. In your case, the leaf node OPRST is split when it is encountered because it has 5 keys and is therefore full (the maximum is 2t - 1 = 5 for t = 3). Thus R moves up and a new leaf node containing the keys S and T is created; the old leaf node now only holds the keys O and P. Q is then compared with R, and the search moves left to the OP node, where Q is finally inserted.
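The procedure above (split every full node on the way down, insert only into a leaf) can be sketched in Python. This assumes the CLRS convention that "minimum branching factor t" means minimum degree t, so a node is full at 2t - 1 = 5 keys; `BTree`, `BTreeNode` and `levels` are hypothetical names. Starting from the question's initial tree and inserting J and then Q reproduces the intermediate tree from the question and the corrected final tree from this answer.

```python
import bisect


class BTreeNode:
    def __init__(self, keys, children=None):
        self.keys = list(keys)
        self.children = list(children or [])

    @property
    def leaf(self):
        return not self.children


class BTree:
    """Minimal B-tree with minimum degree t (non-root nodes: t-1 .. 2t-1 keys)."""

    def __init__(self, t, root):
        self.t = t
        self.root = root

    def _full(self, node):
        return len(node.keys) == 2 * self.t - 1

    def _split_child(self, parent, i):
        """Split the full child parent.children[i] around its median key."""
        t, full = self.t, parent.children[i]
        median = full.keys[t - 1]
        right = BTreeNode(full.keys[t:], full.children[t:])
        full.keys, full.children = full.keys[:t - 1], full.children[:t]
        parent.keys.insert(i, median)
        parent.children.insert(i + 1, right)

    def insert(self, k):
        if self._full(self.root):             # a full root grows the tree by one level
            self.root = BTreeNode([], [self.root])
            self._split_child(self.root, 0)
        node = self.root
        while not node.leaf:
            i = bisect.bisect_right(node.keys, k)
            if self._full(node.children[i]):  # split full nodes before descending
                self._split_child(node, i)
                if k > node.keys[i]:
                    i += 1
            node = node.children[i]
        bisect.insort(node.keys, k)            # keys are only ever added to a leaf


def levels(root):
    """Keys of each level, left to right, one string per node."""
    out, frontier = [], [root]
    while frontier:
        out.append(["".join(n.keys) for n in frontier])
        frontier = [c for n in frontier for c in n.children]
    return out


leaves = [BTreeNode(s) for s in ["AC", "EF", "HI", "LM", "OPRST", "WX"]]
tree = BTree(3, BTreeNode("DGKNV", leaves))
tree.insert("J")
tree.insert("Q")
print(levels(tree.root))
# [['K'], ['DG', 'NRV'], ['AC', 'EF', 'HIJ', 'LM', 'OPQ', 'ST', 'WX']]
```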
If the branching factor is 3, doesn't that mean the minimum number of keys in a non-root node is 3? How can the initial tree be correct?
Initial state would be:
└── E, I, N, S
    ├── A, C, D
    ├── F, G, H
    ├── K, L, M
    ├── O, P, R
    └── T, V, W, X
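Whether the initial tree is valid depends on which convention t refers to. Under the CLRS minimum-degree convention (assumed here), every non-root node holds between t - 1 and 2t - 1 keys, so for t = 3 the question's tree (2 to 5 keys per node) is valid; the reading in this comment, at least 3 keys per non-root node, corresponds to t = 4 in CLRS terms. A small Python check with a hypothetical `(keys, [children])` representation; it only checks key and child counts, not key ordering or equal leaf depth:

```python
def keys_ok(node, t, is_root=True):
    """Key-count check under the CLRS minimum-degree convention: the root has
    1..2t-1 keys, every other node t-1..2t-1, and an internal node with
    m keys has m + 1 children."""
    keys, children = node
    low = 1 if is_root else t - 1
    if not low <= len(keys) <= 2 * t - 1:
        return False
    if children and len(children) != len(keys) + 1:
        return False
    return all(keys_ok(c, t, is_root=False) for c in children)


question_tree = ("DGKNV", [(s, []) for s in ["AC", "EF", "HI", "LM", "OPRST", "WX"]])
comment_tree = ("EINS", [(s, []) for s in ["ACD", "FGH", "KLM", "OPR", "TVWX"]])
print(keys_ok(question_tree, 3), keys_ok(comment_tree, 3))  # True True
print(keys_ok(question_tree, 4), keys_ok(comment_tree, 4))  # False True
```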