How can I compute the time complexity of this code? - runtime

def long_common_prefix(input_list):
    for i in range(1, len(input_list)):
        input_list[0] = get_prefix(input_list[0], input_list[i])
    return input_list[0]

def get_prefix(s1, s2):
    p1 = p2 = 0
    e1, e2 = len(s1), len(s2)
    currLongest = ''
    while p1 != e1 and p2 != e2:
        if s1[p1] == s2[p2]:
            currLongest += s1[p1]
            p1 += 1
            p2 += 1
        else:
            break
    return currLongest
This is code that computes the longest common prefix of the strings in a list. For example, ['abcd','abc','ab'] gives 'ab'. The code works well, but I wasn't sure how to determine its time and space complexity.
In long_common_prefix, there is only a single loop, which is O(N), and inside the loop it calls the helper get_prefix. Since get_prefix is O(N) and it runs inside the single loop, would the result be O(N^2), where N is the length of the string?
Thanks.

You are right, except that N is the length of the input list; get_prefix runs in linear time (and space) in the lengths of the strings, and the lengths of the strings are in no way related to the length of the input list.
The complexity of get_prefix is O(M), where M is the length of the common prefix.
Time complexity: O(N*M)
Space complexity: O(M)
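For comparison, here is a compact sketch of the same approach (my own rewrite, not the code above; like the original it assumes a non-empty input list). It still does O(M) character comparisons per list element, so the O(N*M) time and O(M) space bounds are unchanged:

def longest_common_prefix(strings):
    # Shrink the running prefix against each of the remaining N-1 strings;
    # each pass costs at most O(M) character comparisons.
    prefix = strings[0]
    for s in strings[1:]:
        i = 0
        while i < len(prefix) and i < len(s) and prefix[i] == s[i]:
            i += 1
        prefix = prefix[:i]  # the prefix only ever shrinks
    return prefix

print(longest_common_prefix(['abcd', 'abc', 'ab']))  # ab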

Related

What is the space and time complexity for this function that gets a node in a linked list based on its position from the tail?

What is the space and time complexity (worst-case) for the following solution?
Node getNodeFromTail(Node head, int x) {
    Node p = head;
    Node q = head;
    int diff = 0;
    while (p.next != null) {
        p = p.next;
        if (diff >= x)
            q = q.next;
        else
            diff++;
    }
    return q;
}
This is the explanation of the code above:
Take this linked list for example: 1 --> 2 --> 3 --> 4 --> 5
x is the position from the tail:
when x is 0, the result should be 5,
when x is 1, the result should be 4,
when x is 2, the result should be 3, and so on.
This is what I think:
Space Complexity
Constant space O(1)
Time Complexity
The while loop walks through the linked list in O(n) time, where n is the length of the linked list.
However, I think there is extra complexity because of the if statement, which I feel should be O(n-x) time.
Thus leaving us with O(n) * O(n-x), which is almost an overall time complexity of O(n^2) (i.e. quadratic).
I just have a feeling that this is not exactly linear time in the worst case.
Is this correct?
Reference: https://stackoverflow.com/a/31190255/12357170
Time complexity is O(n), or linear. The condition of the while loop depends only on the p = p.next instruction inside the loop. The other instructions inside the loop execute in constant time and have no impact on the condition of the while loop. So we can conclude that the loop executes exactly n-1 times for a list of n elements, which is still linear.
Also note that there is no other loop inside the first while loop, as you are suggesting in your question.
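To make the single pass explicit, here is the same two-pointer idea written out in Python (a sketch with a minimal Node class; the names are mine, not from the question). p advances once per iteration, so the loop body runs n-1 times for a list of n nodes regardless of which branch of the if is taken:

class Node:
    def __init__(self, value, next=None):
        self.value = value
        self.next = next

def get_node_from_tail(head, x):
    p = q = head
    diff = 0
    # p moves one step per iteration; q only starts moving once p is x nodes ahead.
    while p.next is not None:
        p = p.next
        if diff >= x:
            q = q.next
        else:
            diff += 1
    return q

# Build 1 --> 2 --> 3 --> 4 --> 5 and fetch the node 2 positions from the tail.
head = Node(1, Node(2, Node(3, Node(4, Node(5)))))
print(get_node_from_tail(head, 2).value)  # 3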

Deriving the cost function from algorithm and proving their complexity is O(...)

When computing algorithm costs by counting 1 for each operation, it gets confusing when a while loop depends on more than one variable. This pseudocode inserts an element into the right place of a heap.
input:  H[k]  // an array of size k, storing a heap
        e     // an element to insert
        s     // last element in array (s < k - 1)
output: array H, with e inserted into H in the right place

s = s+1                  [2]
H[s] = e                 [3]
while s > 1:
    t = s/2              [3]
    if H[s] < H[t]       [3]
        tmp = H[s]       [3]
        H[s] = H[t]      [3]
        H[t] = tmp       [3]
        s = t            [2]
    else
        break            [1]
return H
What would be the cost function f(n), and what is the Big O complexity?
I admit, I was initially confused by the indentation of your pseudo-code. After being prompted by M.K's comment, I reindented your code, and understood what you meant about more than one variable.
Hint: If s is equal to 2^k, the loop will iterate k times in the worst case. The expected average is k/2 iterations.
The reason for k/2 is that, absent any other information, we assume the input e has an equal chance of being any value between the current min and max of the array. If you know the distribution, then you can skew the expected average accordingly. Usually, though, the expected average will be a constant factor of k, and so does not affect the big-O.
Let n be the number of elements in the heap. So, the cost function f(n) represents the cost of the function for a heap of size n. The cost of the function outside of the while loop is constant C1, so f(n) is dominated by the while loop itself, g(n). The cost of each iteration of the loop is also constant C2, so the cost is dependent on the number of iterations. So: f(n) = C1 + g(n+1). And g(n) = C2 + g(n/2). Now, you can solve the characteristic equation for g(n). Note that g(1) is 0, and g(2) is C2.
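To make that concrete, you can unroll the recurrence (assuming for simplicity that n is a power of two):
g(n) = C2 + g(n/2) = 2*C2 + g(n/4) = ... = C2*log2(n) + g(1) = C2*log2(n)
so f(n) = C1 + g(n+1) ≈ C1 + C2*log2(n), which is O(log n), matching the hint above.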
The algorithm as presented uses swaps to sort of bubble the element up into the correct position. To make the inner loop more efficient (it doesn't change the complexity, mind you), the inner loop can instead behave more like an insertion sort would behave, and place the element in the right place only at the end.
s = s+1
while s > 1 and e < H[s/2]:
    H[s] = H[s/2];
    s = s/2;
H[s] = e;
If you look at the while loop, you’ll observe that s divides itself by 2 until you reach 1.
Therefore the number of iterations will be equal to the log of s to the base 2.
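A runnable Python version of the improved insertion loop might look like this (a sketch; I keep the heap 1-indexed in a list, with index 0 unused, to match the pseudocode):

def heap_insert(H, e):
    # H is a 1-indexed min-heap stored in a Python list (H[0] is unused).
    H.append(e)
    s = len(H) - 1
    # Each iteration halves s, so the loop runs at most log2(s) times.
    while s > 1 and e < H[s // 2]:
        H[s] = H[s // 2]  # shift the parent down instead of swapping
        s //= 2
    H[s] = e
    return H

# Example: inserting 1 into the heap [3, 5, 8] bubbles it up to the root.
H = [None, 3, 5, 8]
heap_insert(H, 1)
print(H[1:])  # [1, 3, 8, 5]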

Algorithm run-time complexity recursion BIG-O

I understand the concept of Big O notation, and it's something I've recently taken upon myself to learn.
But say a given algorithm recursively calls itself until the job is complete; if there is an OR in my return statement, how will that affect the Big O notation?
Here's the algorithm thus far:
**Algorithm: orderedSort(a,b,c)**
given strings a,b,c determine whether c is an ordered shuffle of a and b

l := length of (a + b)
if (l = 0) then
    return true
if (l not = length of c)
    return false
else
    d := substring of a position 1 to 1
    e := substring of b position 1 to 1
    f := substring of c position 1 to 1
    if (d = f) then
        return orderedSort(a-d, b, c-f)
    if (e = f) then
        return orderedSort(a, b-e, c-f)
    if (d and e = f) then
        return orderedSort(a-d, b, c-f) or orderedSort(a, b-e, c-f)
Does having the or make it n^2?
It's far worse than you think. If both halves of the "or" need to be evaluated some percentage of the time, then you will end up with O(2^n) (not O(n^2)) recursive calls.
Let's say it takes both halves of the OR 10% of the time. On average you have to go down 10ish levels before you do both halves, so you have around:
1 call with length n
2 calls with length n-10
4 calls with length n-20
8 calls with length n-30
...
2^(n/10) calls with length 0
Also, it's worse than that again, because all those string manipulations (length(a+b), a-d, etc.) take O(n) time, not constant time.
EDIT: I should mention that O(2^n) is not actually correct. It's "exponential time", but O(2^(n/10)) or whatever is strictly less than O(2^n). A correct way to write it is 2^O(n)
EDIT:
A good solution for this problem would use dynamic programming.
Let OK(i,j) = true if the first i+j characters of c are an ordered shuffle of the first i characters of a and the first j characters of b.
OK(i,0) is easy to calculate for all i. Then you can calculate all the OK(i,j) from OK(i,j-1). When you've covered all the cases with i+j = length(c), then return true if any one of them is true.
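A minimal sketch of that DP in Python (the function name and the exact form of the recurrence are mine; the answer above only describes the table). It runs in O(len(a) * len(b)) time instead of the exponential recursion:

def is_ordered_shuffle(a, b, c):
    # ok[i][j] is True when the first i+j characters of c are an ordered
    # shuffle of the first i characters of a and the first j characters of b.
    if len(a) + len(b) != len(c):
        return False
    ok = [[False] * (len(b) + 1) for _ in range(len(a) + 1)]
    ok[0][0] = True
    for i in range(len(a) + 1):
        for j in range(len(b) + 1):
            if i > 0 and ok[i - 1][j] and a[i - 1] == c[i + j - 1]:
                ok[i][j] = True
            if j > 0 and ok[i][j - 1] and b[j - 1] == c[i + j - 1]:
                ok[i][j] = True
    return ok[len(a)][len(b)]

print(is_ordered_shuffle("ab", "cd", "acbd"))  # True
print(is_ordered_shuffle("ab", "cd", "adbc"))  # False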

Complexity of edit distance (Levenshtein distance) recursion top down implementation

I have been working all day on a problem which I can't seem to get a handle on. The task is to show that a recursive implementation of edit distance has time complexity Ω(2^max(n,m)), where n and m are the lengths of the words being measured.
The implementation is comparable to this small python example
def lev(a, b):
    if "" == a:
        return len(b)  # returns if a is an empty string
    if "" == b:
        return len(a)  # returns if b is an empty string
    return min(lev(a[:-1], b[:-1]) + (a[-1] != b[-1]),
               lev(a[:-1], b) + 1,
               lev(a, b[:-1]) + 1)
From: http://www.clear.rice.edu/comp130/12spring/editdist/
I have tried drawing trees of the recursion depth for different short words, but I can't find the connection between the tree depth and the complexity.
Recursion Formula from my calculation
m = length of word1
n = length of word2
T(m,n) = T(m-1,n-1) + 1 + T(m-1,n) + T(m,n-1)
With the base cases:
T(0,n) = n
T(m,0) = m
But I have no idea on how to proceed since each call leads to 3 new calls as the lengths don't reach 0.
I would be grateful for any tips on how I can proceed to show that the lower bound complexity is Ω(2^max(n,m)).
Your recursion formula:
T(m,n) = T(m-1,n-1) + T(m-1,n) + T(m,n-1) + 1
T(0,n) = n
T(m,0) = m
is right.
You can see that every T(m,n) splits into three paths. Since every node does O(1) work, we only have to count the nodes.
A shortest path has length min(m,n), so the tree has at least 3^min(m,n) nodes. But some paths are longer. You get the longest path by alternately reducing the first and the second string; this path has length m+n-1, so the whole tree has at most 3^(m+n-1) nodes.
Let m = min(m,n). The tree also contains at least one path for each possible order of reducing n.
So Ω(2^max(m,n)) and Ω(3^min(m,n)) are lower bounds, and O(3^(m+n-1)) is an upper bound.
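One way to sanity-check these bounds (a sketch, not part of the proof) is to count the nodes of the call tree directly from the recursion structure, treating each call's own work as constant and ignoring the slicing cost:

from functools import lru_cache

@lru_cache(maxsize=None)
def calls(m, n):
    # Number of nodes in the call tree of lev() for strings of lengths m and n.
    if m == 0 or n == 0:
        return 1  # base case: one call, no further recursion
    return 1 + calls(m - 1, n - 1) + calls(m - 1, n) + calls(m, n - 1)

for k in range(1, 6):
    print(k, calls(k, k))  # 4, 19, 94, 481, 2524 -- exponential growth in k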

How to avoid generating all subsequences [duplicate]

Possible Duplicate:
Square Subsequence
I have been trying to solve the "Square Subsequences" problem on interviewstreet.com:
A string is called a square string if it can be obtained by concatenating two copies of the same string. For example, "abab", "aa" are square strings, while "aaa", "abba" are not.
Given a string, how many subsequences of the string are square strings?
I tried working out a DP solution, but this constraint seems impossible to circumvent: S will have at most 200 lowercase characters (a-z).
From what I know, finding all subsequences of a list of length n is O(2^n), which stops being feasible as soon as n is larger than, say, 30.
Is it really possible to systematically check all solutions if n is 200? How do I approach it?
First, for every letter a..z you get a list of their indices in S:
`p[x] = {i : S[i] = x}`, where `x = 'a',..,'z'`.
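That preprocessing step might look like this in Python (a small sketch; the names are mine):

from collections import defaultdict

def index_lists(S):
    # p[x] = positions i (in increasing order) with S[i] == x
    p = defaultdict(list)
    for i, ch in enumerate(S):
        p[ch].append(i)
    return p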
Then we start DP:
S: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
            ^         ^          ^
            r1        l2         r2
Let f(r1,l2,r2) be the number of square subsequences (subsequences that are square strings) SS of length 2L, for any L, such that
SS[L-1] is taken from position r1,
SS[L] is taken from position l2,
SS[2L-1] is taken from position r2,
i.e. the first half ends exactly at r1, and the second half starts exactly at l2 and ends at r2.
The algorithm is then:
Let f[r1,l2,l2] = 1 if S[r1] = S[l2], else 0.
for (l2 in 1..2L-1)
    for (r1 in 0..l2-1)
        for (r2 in l2..2L-1)
            if (f(r1, l2, r2) != 0)
                for (x in 'a'..'z')
                    for (i, j : r1 < i < l2, r2 < j, S[i] = S[j] = x)  // these i,j are found quickly using p[x]
                        f[i, l2, j] += f[r1, l2, r2]
In the end, the answer is the sum of all the values in the f[.,.,.] array.
So basically, we divide S using l2 into two parts and then count the common subsequences.
It's hard for me to provide an exact time complexity estimate right now, but it's surely below n^4, and n^4 is acceptable for n = 200.
There are many algorithms (e.g. the Z-algorithm) which can generate an array of prefix lengths in linear time. That is, for every position i it tells you the length of the longest prefix that can be read starting from position i (of course, for i = 0 the longest prefix is n).
Now notice that if you have a square string starting at the beginning, then there is a position k in this prefix-length array such that the longest length is >= k. So you can count the number of those in linear time again.
Then remove the first letter of your string and do the same thing.
The total complexity of this would be O(n^2).
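For reference, here is a standard Z-function sketch (my own code, not taken from the answer) that produces the prefix-length array described above, where z[i] is the length of the longest common prefix of s and s[i:], in O(n) time:

def z_function(s):
    n = len(s)
    z = [0] * n
    if n > 0:
        z[0] = n  # the whole string matches its own prefix
    l = r = 0     # [l, r) is the rightmost prefix-match window found so far
    for i in range(1, n):
        if i < r:
            z[i] = min(r - i, z[i - l])
        while i + z[i] < n and s[z[i]] == s[i + z[i]]:
            z[i] += 1
        if i + z[i] > r:
            l, r = i, i + z[i]
    return z

print(z_function("ababab"))  # [6, 0, 4, 0, 2, 0]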
