Extending an expression tree evaluation algorithm

This is a recursive algorithm that I came up with. I've seen examples of algorithms that are similar to this in books.
f(n)
    if n is an integer
        return n
    else
        l = left child of n
        r = right child of n
        return f(l) n f(r)
It can be used to evaluate expression trees like the one shown on the left in the above image in Θ(n) time. As long as this is all correct, I want to extend it to evaluate expression trees like the one on the right, where common subexpressions are de-duplicated. I think the algorithm can evaluate these types of trees correctly, but I am not sure of the time that it will take. Perhaps some method of storing subtrees should be used? Such as:
f(n, A)
    if n is an integer
        return n
    else
        if n has more than 1 parent AND n is in A (A is a list of stored subtrees)
            return n from A
        else
            l = left child of n
            r = right child of n
            s = f(l, A) n f(r, A)
            add s to list A
            return s
Is this extension correct? It seems really messy. Also I have a feeling it would run in O(n^2) time because the function would be called on n nodes and during each call it would have to iterate over a list of stored nodes that has an upper bound of n. Can this be done in better than quadratic time?

Processing a DAG should work fine if you store the result of a subgraph evaluation at the operator node upon the first visit. Any subsequent visit to that node does not trigger a recursive call to evaluate the subexpression but simply returns the stored value. The technique is called 'memoization'. The run time is proportional to the number of edges in the graph, assuming all operator evaluations are O(1).
Pseudocode:
f(n)
    if n is an integer
        return n
    else
        if property evalResult of n is defined
            return property evalResult of n
        else
            l = left successor of n
            r = right successor of n
            s = f(l) n f(r)
            set property evalResult of n to s
            return s
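A minimal Python sketch of the memoized evaluation above. The `Node` class, its field names, the `visits` counter and the `OPS` table are assumptions made for illustration; the original only says that each inner node holds an operator.

```python
import operator

# Hypothetical operator table (an assumption, not from the original post).
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


class Node:
    """Either a leaf holding an integer or an operator node with two successors."""

    def __init__(self, value=None, op=None, left=None, right=None):
        self.value = value
        self.op = op
        self.left = left
        self.right = right
        self.cached = None  # plays the role of the 'evalResult' property
        self.visits = 0     # number of times the node was actually evaluated


def evaluate(n):
    if n.op is None:              # leaf: n is an integer
        return n.value
    if n.cached is not None:      # result stored on an earlier visit
        return n.cached
    n.visits += 1
    n.cached = OPS[n.op](evaluate(n.left), evaluate(n.right))
    return n.cached


# (2 + 3) * (2 + 3), with the common subexpression shared between both edges.
shared = Node(op="+", left=Node(value=2), right=Node(value=3))
root = Node(op="*", left=shared, right=shared)
result = evaluate(root)
```

Because the result lives on the node itself, no list lookup is needed, and each operator node is evaluated once no matter how many parents it has.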

Related

Why is the time complexity of solving a range minimum query with a segment tree O(log n)?

I was trying to work out how to find, given an array and two indexes, the minimum value between these two indexes in O(log n).
I saw the solution that uses a segment tree but couldn't understand why its time complexity is O(log n). It doesn't seem that way, because if your range is not exactly within the node's range you need to start splitting the search.
First proof:
The claim is that there are at most 2 nodes which are expanded at each level. We will prove this by contradiction.
Consider the segment tree given below.
Let's say that there are 3 nodes that are expanded in this tree. This means that the range is from the left most colored node to the right most colored node. But notice that if the range extends to the right most node, then the full range of the middle node is covered. Thus, this node will immediately return the value and won't be expanded. Thus, we prove that at each level, we expand at most 2 nodes and since there are log n levels, the nodes that are expanded are 2⋅log n = Θ(log n).
Source
Second proof:
There are four cases when querying the interval (x,y):
FIND(R,x,y) //R is the node
    % Case 1
    if R.first = x and R.last = y
        return {R}
    % Case 2
    if y <= R.middle
        return FIND(R.leftChild, x, y)
    % Case 3
    if x >= R.middle + 1
        return FIND(R.rightChild, x, y)
    % Case 4
    P = FIND(R.leftChild, x, R.middle)
    Q = FIND(R.rightChild, R.middle + 1, y)
    return P union Q.
Intuitively, the first three cases reduce the remaining tree height by 1. Since the tree has height log n, if only the first three cases happen, the running time is O(log n).
For the last case, FIND() divides the problem into two subproblems. However, we assert that this can happen at most once. After we call FIND(R.leftChild, x, R.middle), we are querying R.leftChild for the interval [x, R.middle]. R.middle is the same as R.leftChild.last. If x > R.leftChild.middle, then it is Case 3; if x <= R.leftChild.middle, then we will call
FIND ( R.leftChild.leftChild, x, R.leftChild.middle );
FIND ( R.leftChild.rightChild, R.leftChild.middle + 1, R.leftChild.last );
However, the second FIND() returns R.leftChild.rightChild.sum and therefore takes constant time, so the problem is not really separated into two subproblems (strictly speaking, the problem is separated, though one subproblem takes O(1) time to solve).
Since the same analysis holds on the rightChild of R, we conclude that after Case 4 happens the first time, the running time T(h) (h is the remaining level of the tree) would be
T(h) <= T(h-1) + c (c is a constant)
T(1) = c
which yields:
T(h) <= c * h = O(h) = O(log n) (since h is the height of the tree)
Hence we end the proof.
Source
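To make the four cases concrete, here is a small Python sketch of a range-minimum segment tree whose query mirrors FIND and also counts how many nodes it touches. The array layout (root at index 1, children at 2k and 2k+1) and the names `build` and `range_min` are choices made for this sketch, not something taken from the proofs above.

```python
def build(values):
    """Build a min segment tree over values, stored in a flat list."""
    n = len(values)
    tree = [0] * (4 * n)

    def fill(node, lo, hi):
        if lo == hi:
            tree[node] = values[lo]
            return
        mid = (lo + hi) // 2
        fill(2 * node, lo, mid)
        fill(2 * node + 1, mid + 1, hi)
        tree[node] = min(tree[2 * node], tree[2 * node + 1])

    fill(1, 0, n - 1)
    return tree


def range_min(tree, n, x, y):
    """Return (min of values[x..y], number of tree nodes visited)."""
    visited = 0

    def find(node, lo, hi, x, y):
        nonlocal visited
        visited += 1
        if lo == x and hi == y:                       # Case 1
            return tree[node]
        mid = (lo + hi) // 2
        if y <= mid:                                  # Case 2
            return find(2 * node, lo, mid, x, y)
        if x >= mid + 1:                              # Case 3
            return find(2 * node + 1, mid + 1, hi, x, y)
        left = find(2 * node, lo, mid, x, mid)        # Case 4
        right = find(2 * node + 1, mid + 1, hi, mid + 1, y)
        return min(left, right)

    return find(1, 0, n - 1, x, y), visited


values = [5, 2, 7, 1, 9, 3, 8, 4]
tree = build(values)
minimum, visited = range_min(tree, len(values), 1, 6)
```

Running the query over every pair (x, y) and looking at `visited` is a quick way to check that the count stays within a small constant times the tree height.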

asymptotic running time of printing all keys in a red-black tree that fall in a certain range

I'm working on exercise 14.2-4 of CLRS (Intro to Algorithms 3ed):
We wish to augment red-black trees with an operation RB-ENUMERATE(x, a, b) that outputs all the keys k such that a ≤ k ≤ b in a red-black tree rooted at x. Describe how to implement RB-ENUMERATE in Θ(m + lg n) time, where m is the number of keys that are output and n is the number of internal nodes in the tree. (Hint, you do not need to add new attributes to the red-black tree.)
I found an algorithm in a solution online that seems to do the job well, but I can't tell whether the complexity is really Θ(m + lg n).
RB-ENUMERATE(x, a, b)
    T = red-black tree that x belongs in
    nil = T.nil // sentinel NIL leaf node
    if a <= x.key <= b
        print(x)
    if a <= x.key and x.left != nil
        RB-ENUMERATE(x.left, a, b)
    if x.key <= b and x.right != nil
        RB-ENUMERATE(x.right, a, b)
Is this recursive algorithm really Θ(m + lg n) running time, or does this algorithm not satisfy that requirement? I see where the lg n comes from, but I'm worried about the case of m = Θ(lg n), but the running time being asymptotically more than lg n.
In particular, in each call of RB-ENUMERATE, there are either 2 recursive calls, which occur if x falls in the interval, or 1 recursive call, which occurs if x does not fall in the interval. Hence, there are exactly m "instances" of RB-ENUMERATE which make 2 recursive calls, but the number that make 1 recursive call is unclear. What if all m of the "2-recursive" calls occur at the upper levels of the recursion tree, and they all propagate all the way down to the bottom of the red-black tree? In that case, would it not be Θ(m lg n) running time?
Even if a node is within the interval there can be 0, 1 or 2 recursive calls. A black node can have a red node on one side but a nil node on the other (and it doesn't matter which side is which). A red node will have two black children, which will match in being nil or not.
It takes up to lg(n) operations to figure out where to start printing, it then takes m operations to print the nodes of interest and then stops. The algorithm can actually do better than a strict lg(n) because it might not have to walk all the way down the tree before finding the prune point.
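A rough way to check this empirically is to count the calls. The sketch below assumes a plain balanced binary search tree instead of a real red-black tree (the enumeration only relies on the BST ordering), uses `None` in place of the nil sentinel, and collects keys in sorted order rather than printing them. `Node`, `build_balanced` and `enumerate_range` are names made up for this example.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def build_balanced(keys):
    """Build a height-balanced BST from a sorted list of keys."""
    if not keys:
        return None
    mid = len(keys) // 2
    return Node(keys[mid],
                build_balanced(keys[:mid]),
                build_balanced(keys[mid + 1:]))


def enumerate_range(x, a, b, out):
    """Append every key k with a <= k <= b to out; return the number of calls."""
    calls = 1
    # Only descend left if keys >= a can exist there.
    if a <= x.key and x.left is not None:
        calls += enumerate_range(x.left, a, b, out)
    if a <= x.key <= b:
        out.append(x.key)
    # Only descend right if keys <= b can exist there.
    if x.key <= b and x.right is not None:
        calls += enumerate_range(x.right, a, b, out)
    return calls


keys = list(range(0, 100, 2))      # 50 keys, tree height 6
root = build_balanced(keys)
found = []
calls = enumerate_range(root, 11, 31, found)
```

Every visited node that is outside [a, b] lies on the search path for a or for b, so the call count stays within m plus twice the height of the tree.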

Time complexity of a recursive function where n size reduces randomly

I created the following pseudocode but I am not sure how to calculate its complexity:
(Pseudocode)
MyFunction(Q, L)
    if (Q = empty) return
    M = empty queue
    NM = empty queue
    M.Enqueue(Q.Dequeue())
    while (Q is not empty)
        pt = Q.Dequeue()
        if (pt.y > M.peek().y) M.Enqueue(pt)
        else NM.Enqueue(pt)
    L.add(M)
    if (NM is not empty) MyFunction(NM, L)
    return L;
MyFunction receives a set Q of n points and a list L in which we will save k subsets of Q (1<=k<=n). When we calculate the first subset we go through all the n points of Q and select the ones that belong to the first subset. For the second subset we go through all the n points of Q except those that are already in the first subset and so on.
So, every recursive call the number of points will be reduced by an integer x until the number of points is 0. This integer x can be different from one recursive call to the other (it can be any value between 1 and n (n being the current number of points))
What would be the complexity of my algorithm then?
I was thinking that my recurrence relation would be something like this:
T(0) = 1
T(n) = T(n-x) + an
Is this correct? and if so how can I solve it?
Without any information on the distribution of points in Q, we cannot know how they will be dispatched to the M or NM queues.
However, it is easy to calculate the worst-case complexity of your algorithm. To calculate this, we assume that at each recursive call, all points in Q will end up in NM except the one that is being added to M before entering the loop. With this assumption, x becomes 1 in your recurrence relation. And you end up having O(n^2).
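Spelled out as a short derivation, with x = 1 and with the constant a and the base case T(0) = 1 taken from the recurrence in the question:

```latex
\begin{align*}
T(n) &= T(n-1) + an \\
     &= T(n-2) + a(n-1) + an \\
     &\;\;\vdots \\
     &= T(0) + a\sum_{k=1}^{n} k \\
     &= 1 + \frac{a\,n(n+1)}{2} = \Theta(n^2).
\end{align*}
```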

Complexity of edit distance (Levenshtein distance) recursion top down implementation

I have been working all day with a problem which I can't seem to get a handle on. The task is to show that a recursive implementation of edit distance has the time complexity Ω(2^max(n,m)) where n & m are the length of the words being measured.
The implementation is comparable to this small python example
def lev(a, b):
    if("" == a):
        return len(b) # returns if a is an empty string
    if("" == b):
        return len(a) # returns if b is an empty string
    return min(lev(a[:-1], b[:-1])+(a[-1] != b[-1]), lev(a[:-1], b)+1, lev(a, b[:-1])+1)
From: http://www.clear.rice.edu/comp130/12spring/editdist/
I have tried drawing trees of the recursion depth for different short words but I can't find the connection between the tree depth and complexity.
Recursion Formula from my calculation
m = length of word1
n = length of word2
T(m,n) = T(m-1,n-1) + 1 + T(m-1,n) + T(m,n-1)
With the base cases:
T(0,n) = n
T(m,0) = m
But I have no idea on how to proceed since each call leads to 3 new calls as the lengths don't reach 0.
I would be grateful for any tips on how I can proceed to show that the lower bound complexity is Ω(2^max(n,m)).
Your recursion formula:
T(m,n) = T(m-1,n-1) + T(m-1,n) + T(m,n-1) + 1
T(0,n) = n
T(m,0) = m
is right.
You can see that every T(m,n) splits off into three paths. Since every node does O(1) work, we only have to count the nodes.
A shortest path has length min(m,n), so the tree has at least 3^min(m,n) nodes. But some paths are longer. You get the longest path by alternately reducing the first and the second string. This path has length m+n-1, so the whole tree has at most 3^(m+n-1) nodes.
Let m = min(m,n). The tree contains also at least
different paths, one for each possible order of reducing n.
So Ω(2^max(m,n)) and Ω(3^min(m,n)) are lower bounds and O(3^(m+n-1)) is an upper bound.
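One way to get a feel for these bounds is to count the nodes of the recursion tree directly. The sketch below is an illustration only: `call_count` is a made-up helper, and it counts one unit per call (base cases included), which differs from the question's T(0,n) = n base case. The memoization is there only to make the counting fast; it does not change the count.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def call_count(m, n):
    """Number of calls made by the naive lev() on strings of length m and n."""
    if m == 0 or n == 0:
        return 1  # a base case is a single call
    return (1
            + call_count(m - 1, n - 1)
            + call_count(m - 1, n)
            + call_count(m, n - 1))


# Equal-length inputs: compare the call count with 3^min(m,n).
for k in range(1, 8):
    print(k, call_count(k, k), 3 ** k)
```

For equal-length strings the count grows at least as fast as 3^k. Trying very unequal lengths (for example m = 1 and large n) is also worth doing, to see how strongly the count depends on the shorter string.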

How to generate a permutation?

My question is: given a list L of length n, and an integer i such that 0 <= i < n!, how can you write a function perm(L, i) to produce the ith permutation of L in O(n) time? What I mean by ith permutation is just the ith permutation in some implementation-defined ordering that must have the properties:
For any i and any 2 lists A and B, perm(A, i) and perm(B, i) must both map the jth element of A and B to an element in the same position for both A and B.
For any inputs (A, i), (A, j) perm(A, i)==perm(A, j) if and only if i==j.
NOTE: this is not homework. In fact, I solved this 2 years ago, but I've completely forgotten how, and it's killing me. Also, here is a broken attempt I made at a solution:
def perm(s, i):
    n = len(s)
    perm = [0]*n
    itCount = 0
    for elem in s:
        perm[i%n + itCount] = elem
        i = i / n
        n -= 1
        itCount+=1
    return perm
ALSO NOTE: the O(n) requirement is very important. Otherwise you could just generate the n! sized list of all permutations and just return its ith element.
def perm(sequence, index):
    sequence = list(sequence)
    result = []
    for x in range(len(sequence)):
        idx = index % len(sequence)
        index //= len(sequence)  # integer division
        result.append( sequence[idx] )
        # constant time non-order preserving removal
        sequence[idx] = sequence[-1]
        del sequence[-1]
    return result
This is based on the algorithm for shuffling, but it takes the least significant part of the number each time to decide which element to take, instead of using a random number. Alternatively, think of it as converting to some arbitrary base, except that the base shrinks for each additional digit.
Could you use factoradics? You can find an illustration via this MSDN article.
Update: I wrote an extension of the MSDN algorithm that finds i'th permutation of n things taken r at a time, even if n != r.
A computational minimalistic approach (written in C-style pseudocode):
function perm(list,i){
    for(a=list.length;a;a--){
        list.switch(a-1,i mod a);
        i=i/a;
    }
    return list;
}
Note that implementations relying on removing elements from the original list tend to run in O(n^2) time, at best O(n*log(n)) given a special tree style list implementation designed for quickly inserting and removing list elements.
Rather than shrinking the original list and keeping it in order, the above code just moves an element from the end to the vacant location. It still makes a perfect 1:1 mapping between index and permutation, just a slightly more scrambled one, but in pure O(n) time.
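For reference, a runnable Python version of the swap-based idea might look like the sketch below. It works on a copy instead of mutating the input, and it treats arithmetic on i as constant time, which is an assumption (i can be as large as n!).

```python
def perm(items, i):
    """Return the i-th permutation of items (0 <= i < n!) using n swaps."""
    result = list(items)              # copy so the caller's list is untouched
    for a in range(len(result), 0, -1):
        j = i % a                     # next "digit" in the shrinking base
        result[a - 1], result[j] = result[j], result[a - 1]
        i //= a
    return result


# Every index in 0..n!-1 should give a different arrangement.
all_perms = {tuple(perm("abcd", i)) for i in range(24)}
```

Each pass fixes the element at position a-1 and never touches it again, so distinct digit sequences give distinct permutations.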
So, I think I finally solved it. Before I read any answers, I'll post my own here.
def perm(L, i):
    n = len(L)
    if (n == 1):
        return L
    else:
        split = i%n
        return [L[split]] + perm(L[:split] + L[split+1:], i//n)
There are n! permutations. The first character can be chosen from L in n ways. Each of those choices leaves (n-1)! permutations of the remaining elements. So this idea is enough for establishing an order. In general, you will figure out what part you are in, pick the appropriate element and then recurse / loop on the smaller L.
The argument that this works correctly is by induction on the length of the sequence. (sketch) For a length of 1, it is trivial. For a length of n, you use the above observation to split the problem into n parts, each with a question on an L' with length (n-1). By induction, all the L's are constructed correctly (and in linear time). Then it is clear we can use the IH to construct a solution for length n.
