Time complexity analysis of a function with recursion inside a loop

I am trying to analyze the time complexity of the function below. It checks whether a string can be composed of other strings.
set<string> s; // s has been initialized and stores all the strings

bool fun(string word) {
    int len = word.size();
    // something else that can also return true or false with O(1) complexity
    for (int i = 1; i <= len; ++i) {
        string prefix = word.substr(0, i);
        string suffix = word.substr(i);
        if (prefix in s && fun(suffix))
            return true;
        else
            return false;
    }
}
I think the time complexity is O(n), where n is the length of word (am I right?). But since the recursion is inside the loop, I don't know how to prove it.
Edit:
This is not valid C++ (e.g., prefix in s). It is only a sketch of the function's idea; I want to know how to analyze its time complexity.

The way to analyze this is to develop a recurrence relation based on the length of the input and the (unknown) probability that a prefix is in s. Let's assume that the probability of a prefix being in s is given by some function pr(L) of the length L of the prefix. Let the complexity (number of operations) be given by T(len).
If len == 0 (word is the empty string), then T = 1. (The function is missing a final return statement after the loop, but we're assuming that the actual code is only a sketch of the idea, not what's actually executing).
For each loop iteration, denote the loop body complexity by T(len; i). If the prefix is not in s, then the body has constant complexity (T(len; i) = 1). This event has probability 1 - pr(i).
If the prefix is in s, then the function returns true or false according to the recursive call to fun(suffix), which has complexity T(len - i). This event has probability pr(i).
So for each value of i, the loop body complexity is:
T(len; i) = 1 * (1 - pr(i)) + T(len - i) * pr(i)
Finally (and this depends on the intended logic, not the posted code), we have
T(len) = sum i=1...len(T(len; i))
For simplicity, let's treat pr(i) as a constant function with value 0.5. Then the recursive relationship for T(len) is (up to a constant factor, which is unimportant for O() calculations):
T(len) = sum i=1...len(1 + T(len - i)) = len + sum i=0...len-1(T(i))
As noted above, the boundary condition is T(0) = 1. This can be solved by standard recursive function methods. Let's look at the first few terms:
len  T(len)
 0   1
 1   1 + 1 = 2
 2   2 + (2 + 1) = 5
 3   3 + (5 + 2 + 1) = 11
 4   4 + (11 + 5 + 2 + 1) = 23
 5   5 + (23 + 11 + 5 + 2 + 1) = 47
The pattern (for len >= 2) is clearly T(len) = 2 * T(len - 1) + 1. This corresponds to exponential complexity:
T(n) = O(2^n)
Of course, this result depends on the assumption we made about pr(i). (For instance, if pr(i) = 0 for all i, then T(n) = O(1). There would also be non-exponential growth if prefixes had a maximum length, i.e. pr(i) = 0 for all i > M for some M.) The assumption that pr(i) is independent of i is probably unrealistic, but this really depends on how s is populated.
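To see the doubling pattern directly, here is a small Java snippet (a throwaway harness of mine, not part of the question) that evaluates T(len) = len + sum of T(0..len-1) and prints it next to 2 * T(len - 1) + 1; the two agree for len >= 2:

public class RecurrenceCheck {
    public static void main(String[] args) {
        long[] T = new long[11];
        T[0] = 1;                                   // boundary condition T(0) = 1
        for (int len = 1; len <= 10; len++) {
            long sum = 0;
            for (int i = 0; i < len; i++) sum += T[i];
            T[len] = len + sum;                     // T(len) = len + sum of T(0..len-1)
            if (len >= 2) {
                System.out.println("T(" + len + ") = " + T[len]
                        + ", 2*T(" + (len - 1) + ") + 1 = " + (2 * T[len - 1] + 1));
            }
        }
    }
}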

Assuming that you've fixed the bugs others have noted, the i values are the places where the string is split (each i is the leftmost split point, and then you recurse on everything to the right of it). This means that if you unwind the recursion, you are looking at up to n-1 different split points, asking for each substring whether it is a valid word. Things are fine if the beginning of word doesn't match many elements of your set, since then you can skip the recursion. But in the worst case, prefix in s is always true, and you try every possible subset of the n-1 split points. This gives 2^(n-1) different splittings, each of which costs work proportional to its length.
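To watch that worst case happen, here is a hedged Java sketch of the intended function (names and harness are mine; the missing base case and final return are filled in), together with a call counter. With s holding every run of a's and a word of n-1 a's followed by one b, every split must be explored before failing, and the call count comes out to exactly 2^(n-1). (String.repeat needs Java 11 or later.)

import java.util.HashSet;
import java.util.Set;

public class WordBreakWorstCase {
    static long calls = 0;

    // The intended logic of the sketch: succeed if some prefix is in s
    // and the remaining suffix can also be decomposed.
    static boolean fun(String word, Set<String> s) {
        calls++;
        if (word.isEmpty()) return true;        // base case missing in the sketch
        for (int i = 1; i <= word.length(); i++) {
            if (s.contains(word.substring(0, i)) && fun(word.substring(i), s))
                return true;
        }
        return false;                           // no split point worked
    }

    public static void main(String[] args) {
        // Worst case: every proper prefix is in s, but the trailing "b"
        // makes every decomposition fail at the last moment.
        Set<String> s = new HashSet<>();
        for (int n = 2; n <= 20; n++) {
            s.add("a".repeat(n - 1));           // s = {"a", "aa", "aaa", ...}
            calls = 0;
            fun("a".repeat(n - 1) + "b", s);
            System.out.println("n=" + n + "  calls=" + calls);  // = 2^(n-1)
        }
    }
}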

Related

Time complexity for recursive algorithm mapping digits to letters, phone keypad style

The following code returns all possible letter sequences that a sequence of digits could represent, using the standard phone keypad mapping of digits to letters (2 -> "abc", 3 -> "def", ..., 9 -> "wxyz").
Here are some example inputs and outputs:
Input: "2"
Output: ["a","b","c"]
Input: "23"
Output: ["ad", "ae", "af", "bd", "be", "bf", "cd", "ce", "cf"]
If n is the number of digits in the input and k is the maximum number of characters mapped to a single digit, what would the time complexity be?
I have come up with the following recurrence relation (correct me if I am wrong): T(n) = T(n-1) + k^(n-1) * k
But I am not able to figure out the time complexity. Could someone help me understand how to calculate the time complexity of this type of solution?
class Solution {
    public List<String> letterCombinations(String digits) {
        if (digits == null || digits.length() == 0) {
            return Collections.emptyList();
        }
        // Keypad mapping: each digit maps to 3 or 4 letters
        Map<Integer, List<Character>> map = new HashMap<>();
        map.put(2, Arrays.asList('a','b','c'));
        map.put(3, Arrays.asList('d','e','f'));
        map.put(4, Arrays.asList('g','h','i'));
        map.put(5, Arrays.asList('j','k','l'));
        map.put(6, Arrays.asList('m','n','o'));
        map.put(7, Arrays.asList('p','q','r','s'));
        map.put(8, Arrays.asList('t','u','v'));
        map.put(9, Arrays.asList('w','x','y','z'));
        List<String> result = new ArrayList<>();
        recurse(digits, result, "", map, 0);
        return result;
    }

    public void recurse(String digits, List<String> result, String temp,
                        Map<Integer, List<Character>> map, int index) {
        if (index == digits.length()) {
            result.add(temp);   // temp is a complete combination
        } else {
            Integer ch = Character.getNumericValue(digits.charAt(index));
            List<Character> chars = map.get(ch);
            for (int i = 0; i < chars.size(); i++) {
                recurse(digits, result, temp + chars.get(i), map, index + 1);
            }
        }
    }
}
One way to find the time complexity for enumerative algorithms like this (where you gather all ways of doing something) is to think about how many outputs there are, and how long it takes to compute an output.
If you have n characters, each of which maps to k options, then the number of possible results is k^n. The complexity of your algorithm is therefore at least k^n, or Omega(k^n), because k^n outputs are enumerated.
We still need to consider how long it takes to compute each output. Notice that you're building a String of length n by adding one character at a time. Since Strings are immutable, every time you add a character an entirely new String must be created. The work involved in producing a single String of length n this way is 1 + 2 + ... + n = O(n^2).
The work done to create a result can be bounded the same way for every result, so multiplying the number of results by the per-result work gives an upper bound of O(n^2 * k^n). (Note that this overcounts: results sharing a prefix also share the work of building that prefix, so the bound is not tight; see below.)
We can also obtain a recurrence relation as follows. Let i be the same as index in your code, which counts up from 0 to n. Let j be n-i, which means "the number of digits left to process".
We then have T(j) = k*((i+1) + T(j-1)) = k*((n-j+1) + T(j-1)) and T(0) = 1. Your overall time complexity is given by T(j) where j=n.
Explanation: Suppose you have j digits left to process, which corresponds to a single call to recurse. We need to loop over k characters, and on every iteration of that loop, we do i+1 work (to add a char to temp) as well as T(j-1) work (to recurse, having one fewer digit left to process).
If we "unroll" the recurrence, we find:
T(n) = k*(1 + T(n-1))
= k*(1 + k*(2 + T(n-2)))
= k*(1 + k*(2 + k*(3 + ... k*(n + 1)...)))
= 1*k + 2*k^2 + 3*k^3 + ... + n*k^n + 1*k^n
Then we need to upper-bound and lower-bound this sum.
T(n) <= 1*k^n + 2*k^n + ... + n*k^n + 1*k^n
= (1 + 2 + ... + n)*k^n + 1*k^n
= O(n^2 * k^n)
A lower bound is trickier. Let a be any constant with 0 < a < 1. Let m = floor(a * n).
T(n) >= m*k^m + (m+1)*k^(m+1) + ... + n*k^n (there are n-m+1 terms)
>= (n-m+1) * m*k^m (replace each term with m*k^m)
= (constant*n) * (a*n)*k^(a*n)
= constant * n^2 * k^(a*n)
This means that for any constant 0 < a < 1 we have T(n) = Omega(n^2 * k^(a*n)), a lower bound that gets arbitrarily close in the exponent to, but never quite reaches, Omega(n^2 * k^n).
Combined with the upper bound, this brackets T(n) between Omega(n^2 * k^(a*n)) and O(n^2 * k^n). In fact the unrolled sum is dominated by its largest terms: for fixed k > 1, 1*k + 2*k^2 + ... + n*k^n = Theta(n * k^n), so the tight bound is T(n) = Theta(n * k^n), and O(n^2 * k^n) is a correct but slightly loose upper bound.
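As a numeric cross-check (my own snippet, using k = 4), evaluating the recurrence T(j) = k*((n - j + 1) + T(j - 1)) with T(0) = 1 directly shows T(n)/(n * k^n) leveling off at a constant while T(n)/(n^2 * k^n) keeps shrinking:

public class KeypadRecurrenceCheck {
    public static void main(String[] args) {
        int k = 4;
        for (int n = 2; n <= 14; n++) {
            double t = 1.0;                        // T(0) = 1
            for (int j = 1; j <= n; j++) {
                t = k * ((n - j + 1) + t);         // one level of the recurrence
            }
            double nk = n * Math.pow(k, n);
            System.out.printf("n=%2d  T/(n*k^n)=%.3f  T/(n^2*k^n)=%.3f%n",
                    n, t / nk, t / (n * nk));
        }
    }
}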
The complexity in Big-O notation is O(4^n).
This is not exactly what you asked for - you asked for the time complexity - but Big-O is a common way of expressing that.
Note that I did not use k at all - this is because k varies depending on the exact composition of your input string, and Big-O takes the worst possible running time (which in your case is if all the digits mapped to four different letters).
If you want to use k, you could do this:
O(k1^n1 * k2^n2 * k3^n3 ...)
Where k1 is a particular value of k, and n1 is the number of digits in the input string that map to that value of k. In your example, there would only be k1=3 and k2=4, since no other values of k are possible: O(3^n1 * 4^n2)
In my opinion this is as easy as:
if N = digits.length and k = the number of letters on each button, then the work is k*N, so O(kN).

Why is the time complexity of this permutation function O(n!)?

Consider the following code.
public class Permutations {
    static int count = 0;

    static void permutations(String str, String prefix) {
        if (str.length() == 0) {
            System.out.println(prefix);
        } else {
            for (int i = 0; i < str.length(); i++) {
                count++;
                // Fix character i as the next character of the prefix
                String rem = str.substring(0, i) + str.substring(i + 1);
                permutations(rem, prefix + str.charAt(i));
            }
        }
    }

    public static void main(String[] args) {
        permutations("abc", "");
        System.out.println(count);
    }
}
The logic, as I understand it, is this: consider each character of the string as a possible prefix and permute the remaining n-1 characters.
By this logic the recurrence relation comes out to be
T(n) = n( c1 + T(n-1) ) // ignoring the print time
which is obviously O(n!). But when I used a count variable to see whether the algorithm really grows on the order of n!, I found different results.
For a 2-character string, count++ (inside the for loop) runs 4 times; for a 3-character string the final value of count is 15; and for 4- and 5-character strings it is 64 and 325.
That means it grows faster than n!. So why is it said that this (and similar algorithms that generate permutations) is O(n!) in terms of run time?
People say this algorithm is O(n!) because there are n! permutations, but what you are counting here are (in a sense) function calls, and there are more function calls than n!:
When str.length() == n, you do n calls;
For each of these n calls with str.length() == n - 1, you do n - 1 calls;
For each of these n * (n - 1) calls with str.length() == n - 2 you do n - 2 calls;
...
You do n! / k! calls with an input str of length k,¹ and since the length goes from n down to 0, the total number of calls is:
sum k = 0 ... n (n! / k!) = n! sum k = 0 ... n (1 / k!)
But as you may know:
sum k = 0 ... +oo (1 / k!) = e^1 = e
So basically, this sum is always less than the constant e (and greater than 1), so you can say that the number of calls is O(e * n!), which is O(n!).
Runtime complexity is often different from theoretical complexity. In theoretical complexity, people want to know the number of permutations because the algorithm is probably going to check each of these permutations (so there are effectively n! checks done), but in reality there is much more going on.
¹ This formula will actually give you one more than the values you got, since your count does not include the initial function call.
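Here is a quick empirical check of that formula (my harness wrapped around the question's code): the observed count equals the sum of n!/k! over k, minus one for the initial call:

public class PermutationCallCount {
    static long count = 0;

    static void permutations(String str, String prefix) {
        if (str.length() > 0) {
            for (int i = 0; i < str.length(); i++) {
                count++;
                permutations(str.substring(0, i) + str.substring(i + 1),
                        prefix + str.charAt(i));
            }
        }
    }

    static long factorial(int n) {
        long f = 1;
        for (int i = 2; i <= n; i++) f *= i;
        return f;
    }

    public static void main(String[] args) {
        for (int n = 2; n <= 7; n++) {
            count = 0;
            permutations("abcdefg".substring(0, n), "");
            long formula = 0;
            for (int k = 0; k <= n; k++) formula += factorial(n) / factorial(k);
            System.out.println("n=" + n + "  count=" + count
                    + "  sum(n!/k!) - 1 = " + (formula - 1));
        }
    }
}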
This answer is for people like me who don't remember that e = 1/0! + 1/1! + 1/2! + 1/3! + ...
I can explain using a simple example: say we want all the permutations of "abc".
      /  |  \        <--- for the first position, there are 3 choices
     /\  /\  /\      <--- for the second position, there are 2 choices
    ||  ||  ||       <--- for the third position, there is only 1 choice
Above is the recursion tree, and we know that there are 3! leaf nodes, which represent all possible permutations of "abc" (the leaves are also where we act on each result, i.e. print()). But since you are counting all function calls, we need to know how many tree nodes there are in total (leaf + internal).
If it were a complete binary tree, we would know there are 2^n leaf nodes... but how many internal nodes?
x = |__________leaf_____________|------------------------|
let this represent the 2^n leaf nodes; |----| represents the maximum number of
nodes in the level above: since each node has 1 parent, the second-to-last level
cannot have more nodes than the leaf level
since it's binary, we know: second-to-last level = (1/2) * leaf
x = |__________leaf_____________|____2nd_____|-----------|
the same goes for the third-to-last level, which is (1/2) * 2nd
x = |__________leaf_____________|____2nd_____|__3rd_|----|
x can be used to represent the total number of tree nodes, and since we always cut the remaining |-----| in half, we know that total <= 2 * leaf
now for the permutation tree:
x = |____leaf____|------------|
let this represent the n! leaf nodes
since each node in the second-to-last level has exactly 1 branch, we know: second-to-last level = leaf
x = |____leaf____|____2nd_____|-------------|
but the third-to-last level has 2 branches per node, thus = (1/2) * 2nd
x = |____leaf____|____2nd_____|_3rd_|-------|
the fourth-to-last level has 3 branches per node, thus = (1/3) * 3rd
x = |____leaf____|____2nd_____|_3rd_|_4|--| |
(| | means we will no longer consider it)
here we see that total < 3 * leaf, as expected (e ≈ 2.718)
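If you want to see that constant emerge numerically, here is a tiny Java snippet (mine) that accumulates the total-to-leaf ratio, sum of 1/k!; it converges to e almost immediately:

public class NodeLeafRatio {
    public static void main(String[] args) {
        double factorial = 1.0, ratio = 0.0;
        for (int k = 0; k <= 12; k++) {
            if (k > 0) factorial *= k;
            ratio += 1.0 / factorial;       // total/leaf = sum of 1/j! for j <= k
            System.out.printf("k=%2d  total/leaf=%.6f  (e=%.6f)%n",
                    k, ratio, Math.E);
        }
    }
}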

Recursive Algorithm into Iterative

How can I turn the following recursive algorithm into an iterative algorithm?
count(integer: n)
    for i = 1...n
        return count(n-i) + count(n-i)
    return 1
Essentially this algorithm computes the following:
count(n-1) + count(n-2) + ... + count(1)
This is not a tail recursion, so it is not trivial to transform it into an iterative algorithm.
However, recursion can be simulated with a stack and a loop pretty easily, by pushing to the stack rather than recursing.
stack = Stack()
stack.push(n)
count = 0
while (stack.empty() == false):
    current = stack.pop()
    count++
    for i from current-1 down to 1 inclusive:
        stack.push(i)
return count
Another solution is doing it with Dynamic Programming, since you don't need to calculate the same thing multiple times:
DP = new int[n+1]
DP[0] = 1
for i from 1 to n:
    DP[i] = 0
    for j from 0 to i-1:
        DP[i] += DP[j]
return DP[n]
Note that you can even optimize it to run in O(n) rather than O(n^2), by remembering the "so far sum":
sum = 1
current = 1
for i from 1 to n:
    current = sum
    sum = sum + current
return current
Lastly, this actually sums to something you can easily pre-calculate: count(n) = 2^(n-1), count(0) = 1 (You can suspect it from seeing the last iterative solution we have...)
Base: count(0) automatically yields 1, as the loop's body is not reached.
Hypothesis: T(k) = 2^(k-1) for all k < n
Proof:
T(n) = T(n-1) + T(n-2) + ... + T(1) + T(0) =    (induction hypothesis)
     = 2^(n-2) + 2^(n-3) + ... + 2^0 + 1 =
     = sum { 2^i | i=0,...,n-2 } + 1 =          (sum of a geometric series)
     = (1 - 2^(n-1)) / (1 - 2) + 1 = (2^(n-1) - 1) + 1 = 2^(n-1)
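For completeness, here is a runnable Java translation (class and method names are mine) of the stack simulation and the DP table above, checked against the closed form 2^(n-1):

import java.util.ArrayDeque;
import java.util.Deque;

public class CountCheck {
    // Explicit-stack simulation of the recursion
    static long viaStack(int n) {
        Deque<Integer> stack = new ArrayDeque<>();
        stack.push(n);
        long count = 0;
        while (!stack.isEmpty()) {
            int current = stack.pop();
            count++;
            for (int i = current - 1; i >= 1; i--) stack.push(i);
        }
        return count;
    }

    // Dynamic-programming table: DP[i] = DP[0] + ... + DP[i-1]
    static long viaDP(int n) {
        long[] dp = new long[n + 1];
        dp[0] = 1;
        for (int i = 1; i <= n; i++)
            for (int j = 0; j < i; j++) dp[i] += dp[j];
        return dp[n];
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 10; n++)
            System.out.println(n + ": stack=" + viaStack(n) + "  dp=" + viaDP(n)
                    + "  2^(n-1)=" + (1L << (n - 1)));
    }
}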
If you define your problem in the following recursive way:
count(integer: n)
    if n == 0 return 1
    return count(n-1) + count(n-1)
Converting to an iterative algorithm is a typical application of backwards induction where you should keep all previous results:
count(integer: n):
    result[0] = 1
    for i = 1..n
        result[i] = result[i-1] + result[i-1]
    return result[n]
It is clear that this is more complicated than it should be, because the point is to exemplify backwards induction. I could accumulate into a single variable, but I wanted to show a more general scheme that can be extended to other cases. In my opinion the idea is clearer this way.
The pseudocode can be improved after the key idea is clear. In fact, there are two very simple improvements that are applicable only to this specific case:
instead of keeping all previous values, only the last one is necessary
there is no need for two identical calls as there are no side-effects expected
Going beyond that, from this definition of the function it is possible to calculate that count(n) = 2^n.
The statement return count(n-i) + count(n-i) appears to be equivalent to return 2 * count(n-i). In that case:
count(integer: n)
    result = 1
    for i = 1...n
        result = 2 * result
    return result
What am I missing here?

Complexity analysis of SelectionSort

Here's a SelectionSort routine I wrote. Is my complexity analysis that follows correct?
public static void selectionSort(int[] numbers) {
    // Iterate over each cell starting from the last one and working backwards
    for (int i = numbers.length - 1; i >= 1; i--) {
        // Always set the max pos to 0 at the start of each iteration
        int maxPos = 0;
        // Start at cell 1 and iterate up to the second last cell
        for (int j = 1; j < i; j++) {
            // If the number in the current cell is larger than the one in maxPos,
            // set a new maxPos
            if (numbers[j] > numbers[maxPos]) {
                maxPos = j;
            }
        }
        // We now have the position of the maximum number. If the maximum number
        // is greater than the number in the current cell, swap them
        if (numbers[maxPos] > numbers[i]) {
            int temp = numbers[i];
            numbers[i] = numbers[maxPos];
            numbers[maxPos] = temp;
        }
    }
}
Complexity Analysis
Outer loop (comparison & decrement): 2 ops performed n times = 2n ops
Assigning maxPos: n ops
Inner loop (comparison & increment): 2 ops performed ~n² times = 2n² ops
Comparison of array elements (2 array references & a comparison): 3n² ops
Assigning new maxPos: n² ops
Comparison of array elements (2 array references & a comparison): 3n² ops
Assignment & array reference: 2n² ops
Assignment & 2 array references: 3n² ops
Assignment & array reference: 2n² ops
Total number of primitive operations:
2n + n + 2n² + 3n² + n² + 3n² + 2n² + 3n² + 2n² = 16n² + 3n
Leading to Big Oh(n²)
Does that look correct? Particularly when it comes to the inner loop and the stuff inside it...
Yes, O(N²) is correct.
Edit: It's a little hard to guess exactly what they may want as far as "from first principles" goes, but I would guess they're looking for (in essence) something on the order of a proof (or at least an indication) that the basic definition of big-O is met:
there exist positive constants c and n0 such that:
0 ≤ f(n) ≤ c·g(n) for all n ≥ n0.
So the next step after finding 16N² + 3N is to find values of c and n0 that work. For example, c = 19 and n0 = 1 do, since 16N² + 3N ≤ 19N² for all N ≥ 1.
Generally it is pointless (and incorrect) to add up actual operations, because operations take various numbers of processor cycles; some of them dereference values from memory, which takes far longer. It gets even more complex once compilers optimize code, and then there is cache locality and so on, so unless you know really, really well how everything works underneath, you are adding up apples and oranges. You can't just add up "j < i", "j++", and "numbers[i] = numbers[maxPos]" as if they were equal, and you don't need to: for the purpose of complexity analysis, a constant-time block is a constant-time block. You are not doing low-level code optimization.
The complexity is indeed O(N²), but your coefficients are meaningless.
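If you want evidence without hand-counting primitive operations, here is a small Java harness (my own sketch, wrapped around your routine) that counts only the inner-loop comparisons; they come to (n-1)(n-2)/2 ≈ n²/2 and quadruple whenever n doubles, which is the O(n²) signature:

public class SelectionSortCount {
    public static void main(String[] args) {
        for (int n : new int[] {100, 200, 400, 800}) {
            int[] numbers = new int[n];
            for (int i = 0; i < n; i++) numbers[i] = (i * 131) % 997;  // arbitrary data
            long comparisons = 0;
            for (int i = numbers.length - 1; i >= 1; i--) {
                int maxPos = 0;
                for (int j = 1; j < i; j++) {
                    comparisons++;                 // one inner-loop comparison
                    if (numbers[j] > numbers[maxPos]) maxPos = j;
                }
                if (numbers[maxPos] > numbers[i]) {
                    int temp = numbers[i];
                    numbers[i] = numbers[maxPos];
                    numbers[maxPos] = temp;
                }
            }
            System.out.println("n=" + n + "  comparisons=" + comparisons
                    + "  n^2/2=" + (long) n * n / 2);
        }
    }
}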

Time complexity

The problem is finding the majority element in an array.
I understand how this algorithm works, but I don't know why it has O(n log n) time complexity...
a. Both return "no majority." Then neither half of the array has a majority element, and the combined array cannot have a majority element. Therefore, the call returns "no majority."
b. The right side has a majority, and the left doesn't. The only possible majority at this level is the value that formed a majority on the right half; therefore, just compare every element in the combined array and count the number of elements equal to this value. If it is a majority element then return that element, else return "no majority."
c. Same as above, but with the left returning a majority and the right returning "no majority."
d. Both sub-calls return a majority element. Count the number of elements equal to each of the two candidates. If either is a majority element in the combined array, then return it. Otherwise, return "no majority."
The top level simply returns either a majority element or that no majority element exists in the same way.
Therefore, T(1) = 0 and T(n) = 2T(n/2) + 2n = O(n log n)
I think:
Every recursion compares the majority element against the whole array, which takes 2n.
T(n) = 2T(n/2) + 2n = 2(2T(n/4) + 2n) + 2n = ... = 2^k T(n/2^k) + 2n + 4n + 8n + ... + 2^k n = O(n^2)
T(n) = 2T(n/2) + 2n
The question is how many iterations it takes for n to get to 1.
We divide by 2 in each iteration, so we get the series: n, n/2, n/4, n/8, ..., n/(2^k)
So let's find the k that brings us to 1 (the last iteration):
n/(2^k) = 1  =>  n = 2^k  =>  k = log(n)
So we get log(n) levels.
Now, at each level we do at most 2n operations (less, in fact, because n is halved each time, but call it 2n as the worst case).
So in total we get log(n) levels with O(n) operations each: O(n log n).
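To make the arithmetic concrete, here is a small Java snippet (mine) that evaluates T(n) = 2T(n/2) + 2n with T(1) = 0 for powers of two; the result is exactly 2n * log2(n):

public class MajorityRecurrence {
    public static void main(String[] args) {
        for (int m = 1; m <= 10; m++) {
            long n = 1L << m;                     // n = 2^m
            long t = 0;                           // T(1) = 0
            for (int i = 1; i <= m; i++) {
                t = 2 * t + 2 * (1L << i);        // T(2^i) = 2*T(2^(i-1)) + 2*2^i
            }
            System.out.println("n=" + n + "  T(n)=" + t
                    + "  2n*log2(n)=" + 2 * n * m);
        }
    }
}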
I'm not sure if I understand, but couldn't you just create a hash map, walk over the array incrementing hash[value] at each step, then sort the hash map (m log m time) and compare the top two elements? This would cost O(n) + O(m log m) = O(n + m log m), with n the size of the array and m the number of distinct elements in it.
Am I mistaken here? Or...?
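That idea does work. Here is a hedged Java sketch of it (the helper name is mine, and it belongs inside some class); the sort can even be skipped by tracking the best candidate while counting, which drops the cost to O(n) expected time:

import java.util.HashMap;
import java.util.Map;

// Majority via counting: O(n) expected time, O(m) space for m distinct values.
static Integer majorityByHashing(int[] arr) {
    Map<Integer, Integer> freq = new HashMap<>();
    Integer best = null;
    for (int x : arr) {
        int c = freq.merge(x, 1, Integer::sum);   // increment hash[x]
        if (best == null || c > freq.get(best)) best = x;
    }
    // A majority must occupy strictly more than half the array
    return (best != null && freq.get(best) * 2 > arr.length) ? best : null;
}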
When you do this recursively, you split the array in two at each level, make a call for each half, and then perform one of the tests a-d. Test a requires no looping; the other tests require looping through the entire array. On average you will loop through (0 + 1 + 1 + 1) / 4 = 3/4 of the array at each level of the recursion.
The number of levels in the recursion depends on the size of the array. As you split the array in half at each level, the number of levels is log2(n).
So the total work is (n * 3/4) * log2(n). As constants are irrelevant to time complexity, and all logarithms differ only by constant factors, the complexity is O(n log n).
Edit:
If someone is wondering about the algorithm, here's a C# implementation. :)
private int? FindMajority(int[] arr, int start, int len) {
    if (len == 1) return arr[start];
    int len1 = len / 2, len2 = len - len1;
    int? m1 = FindMajority(arr, start, len1);
    int? m2 = FindMajority(arr, start + len1, len2);
    // A majority must fill strictly more than half of the range
    int cnt1 = m1.HasValue ? arr.Skip(start).Take(len).Count(n => n == m1.Value) : 0;
    if (cnt1 * 2 > len) return m1;
    int cnt2 = m2.HasValue ? arr.Skip(start).Take(len).Count(n => n == m2.Value) : 0;
    if (cnt2 * 2 > len) return m2;
    return null;
}
This guy has a lot of videos on recurrence relation, and the different techniques you can use to solve them:
https://www.youtube.com/watch?v=TEzbkIggJfo&list=PLj68PAxAKGoyyBwi6qrfcsqE_4trSO1yL
Basically for this problem I would use the Master Theorem:
https://youtu.be/i5kTZof1LRY
T(1) = 0 and T(n) = 2T(n/2) + 2n
The Master Theorem applies to recurrences of the form T(n) = A*T(n/B) + O(n^D); here A = 2, B = 2, D = 1.
Since A = B^D (2 = 2^1), the Master Theorem gives T(n) = O(n^D log n) = O(n log n).
You can also use another method to solve this (below); it would just take a little bit more time:
https://youtu.be/TEzbkIggJfo?list=PLj68PAxAKGoyyBwi6qrfcsqE_4trSO1yL
I hope this helps you out!
