Time complexity for combination of parentheses - algorithm

I tried to solve the classic problem of implementing an algorithm that prints all valid combinations of n pairs of parentheses.
I found this program (which works perfectly):
public static void addParen(ArrayList<String> list, int leftRem, int rightRem, char[] str, int count) {
    if (leftRem < 0 || rightRem < leftRem) return; // invalid state

    if (leftRem == 0 && rightRem == 0) { /* all out of left and right parentheses */
        String s = String.copyValueOf(str);
        list.add(s);
    } else {
        if (leftRem > 0) { // try a left paren, if there are some available
            str[count] = '(';
            addParen(list, leftRem - 1, rightRem, str, count + 1);
        }
        if (rightRem > leftRem) { // try a right paren, if there's a matching left
            str[count] = ')';
            addParen(list, leftRem, rightRem - 1, str, count + 1);
        }
    }
}

public static ArrayList<String> generateParens(int count) {
    char[] str = new char[count * 2];
    ArrayList<String> list = new ArrayList<String>();
    addParen(list, count, count, str, 0);
    return list;
}
As I understand it, the idea is that we add a left bracket whenever possible, and we add a right bracket only if the number of remaining right brackets is greater than the number of remaining left ones. Once all left and right parentheses have been used, we add the completed combination to the result. We can be sure no duplicate strings are constructed.
To me, this recursion is like a pre-order traversal of a tree: we go to the left child whenever possible; if we can't, we go right, then immediately try to go left again. If that's not possible, we "come back", go right, and repeat the traversal. In my opinion, it's exactly the same idea here.
So, naively, I thought the time complexity would be something like O(log n) or O(n log n), something involving a logarithm. But when I searched, I found something called the "Catalan numbers", which can be used to count the number of combinations of parentheses (https://anonymouscoders.wordpress.com/2015/07/20/its-all-about-catalan/).
What is the time complexity, in your opinion? Can we apply the Master theorem here or not?

The complexity of this code is O(n * Cat(n)), where Cat(n) is the nth Catalan number. There are Cat(n) valid combinations of n pairs of parentheses (see https://en.wikipedia.org/wiki/Catalan_number), and for each one a string of length 2n is created.
Since Cat(n) = choose(2n, n) / (n + 1), O(n * Cat(n)) = O(choose(2n, n)) = O(4^n / sqrt(n)) (see https://en.wikipedia.org/wiki/Central_binomial_coefficient).
There are two main flaws in your reasoning. The first is that the search tree is not balanced: the subtree you search when you close a brace is not the same size as the subtree you search when you add another left brace, so the more common methods for computing complexity don't work. The second is that even if you assume the tree is balanced, the height of the search tree would be proportional to n, and the number of leaves found would be O(2^n). This differs from the analysis of a binary search tree, where you usually have n items in the tree and the height is O(log n).
I don't think there's any standard way to compute the time complexity here -- ultimately you're going to be reproducing something like the math done when you count valid parenthetical strings -- and the Master theorem isn't going to power you through that.
But there is a useful insight here: if a program generates f(n) things and the cost of generating each is c(n), then the program's complexity can't be better than O(c(n)f(n)). Here, f(n) = Cat(n) and c(n) = 2n, so you can quickly get a lower bound for the complexity even if analyzing the code is difficult. This trick would have immediately led you to discard the idea that the complexity is O(log n) or O(n log n).
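As a quick sanity check, here is a small sketch of mine (it assumes the question's generateParens is available in the same class) that compares the number of generated strings against Cat(n) computed from the recurrence Cat(n) = Cat(n-1) * 2(2n-1)/(n+1):

public static void main(String[] args) {
    long cat = 1;                                  // Cat(0) = 1
    for (int n = 1; n <= 12; n++) {
        cat = cat * 2 * (2 * n - 1) / (n + 1);     // Cat(n) from Cat(n-1); the division is always exact
        int generated = generateParens(n).size();  // number of strings the question's code produces
        System.out.println(n + ": generated=" + generated + "  Cat(n)=" + cat);
    }
}

The two columns match, which is consistent with the O(n * Cat(n)) bound above.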

Related

How do I find the complexity of this recursive algorithm? Replace pattern in string with binary number

This algorithm essentially finds the star (*) characters inside a given binary string and replaces each with both 0 and 1 to output all the different combinations of the binary string.
I originally thought this algorithm was O(2^n); however, that only takes into account the number of stars (*) in the string. What about the length of the string? If there are no stars in the given string, the algorithm should still be linear, because the number of recursive calls depends on the string length, but my original O(2^n) does not seem to account for that: it would become O(1) if n = 0.
How should I go about finding out its time and space complexity? Thanks.
Code:
static void RevealStr(StringBuilder str, int i) {
    // base case: prints each possibility when reached
    if (str.length() == i) {
        System.out.println(str);
        return;
    }
    // recursive step if wild card (*) found
    if (str.charAt(i) == '*') {
        // exploring permutations for 0 and 1 in the stack frame
        for (char ch = '0'; ch <= '1'; ch++) {
            // making the change to our string
            str.setCharAt(i, ch);
            // recur to reach permutations
            RevealStr(str, i + 1);
            // undo changes to backtrack
            str.setCharAt(i, '*');
        }
        return;
    } else {
        // if no wild card found, recur next char
        RevealStr(str, i + 1);
    }
}
Edit: I am currently thinking of something like O(2^s + l), where s is the number of stars and l is the length of the string.
The idea of Big-O notation is to give an upper-bound estimate: if an algorithm is O(N^4), that simply means it can't do any worse than that.
Let's say an algorithm actually runs in O(N) time; we can still say it is O(N^2), since O(N) never does worse than O(N^2). But in a computational sense we want the estimate to be as tight as possible, because a tight bound gives a better idea of how well the algorithm actually performs.
In your example, both O(2^N) and O(2^L), where N is the length of the string and L is the number of * characters, are valid upper bounds. But since O(2^L) better captures the algorithm's dependence on the presence of * characters, it is the better, tighter estimate (as L <= N).
Update: The space complexity is implementation dependent. In your current implementation, assuming the StringBuilder is passed by reference and no copies of the string are made in each recursive call, the space complexity is indeed O(N), i.e. the size of the recursive call stack. If it were passed by value and copied onto the stack before every call, the overall space would be O(N * N), i.e. O(max_number_of_recursive_calls * size_of_string), since each copy costs O(size_of_string).
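For contrast, here is a sketch of my own (not the question's code) of a pass-by-value style: every branch builds a fresh String, so each recursive call pays an extra O(N) for the copy, which is where the extra factor described above would come from.

static void revealStrCopying(String str, int i) {
    if (str.length() == i) {
        System.out.println(str);
        return;
    }
    if (str.charAt(i) == '*') {
        // each substring/concatenation below copies the whole string: O(N) work per call
        revealStrCopying(str.substring(0, i) + '0' + str.substring(i + 1), i + 1);
        revealStrCopying(str.substring(0, i) + '1' + str.substring(i + 1), i + 1);
    } else {
        revealStrCopying(str, i + 1);
    }
}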
To resolve this we can do a manual run:
Base: n=1
RevealStr("*", 1)
It meets the condition of the first if, so we only run it once, for output *
Next: n=2
RevealStr("**", 1)
RevealStr("0*", 2)
RevealStr("00", 2)
RevealStr("01", 2)
RevealStr("1*", 2)
RevealStr("10", 2)
RevealStr("11", 2)
Next: n=3
RevealStr("***", 1)
RevealStr("0**", 2)
RevealStr("00*", 2)
RevealStr("000", 3)
RevealStr("001", 3)
RevealStr("01*", 2)
RevealStr("010", 3)
RevealStr("011", 3)
RevealStr("1**", 2)
RevealStr("10*", 2)
RevealStr("100", 3)
RevealStr("101", 3)
RevealStr("11*", 2)
RevealStr("110", 3)
RevealStr("111", 3)
You can see that with n=2, RevealStr was called 7 times, while with n=3 it was called 15. This follows the function F(n) = 2^(n+1) - 1.
For the worst-case scenario, the complexity seems to be O(2^n), where n is the number of stars.
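If you want to check that count empirically, here is a sketch of mine (the helper name countCalls is my own) that runs the same recursion but returns the number of calls instead of printing:

static long countCalls(StringBuilder str, int i) {
    long calls = 1;                            // count this invocation
    if (str.length() == i) return calls;       // base case: one complete string reached
    if (str.charAt(i) == '*') {
        for (char ch = '0'; ch <= '1'; ch++) {
            str.setCharAt(i, ch);
            calls += countCalls(str, i + 1);
            str.setCharAt(i, '*');             // backtrack
        }
    } else {
        calls += countCalls(str, i + 1);
    }
    return calls;
}
// e.g. countCalls(new StringBuilder("**"), 0) == 7, countCalls(new StringBuilder("***"), 0) == 15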

O(n^2) or O(n log n)?

Algorithm
Basically, is the algorithm below O(n log n) or O(n^2)? I'm sure the algorithm has a name, but I'm not sure what it is.
pseudo-code:
def sort(list):
    dest = new list
    for each element in list (call it a):
        for each element in dest (call it c):
            if a <= c, insert a into dest directly before c
    return dest
in Java:
public static List<Integer> destSort(List<Integer> list) {
    List<Integer> dest = new ArrayList<>();
    for (Integer a : list) {
        if (dest.isEmpty()) {
            dest.add(a);
        } else {
            boolean added = false;
            for (int j = 0; j < dest.size(); j++) {
                int b = dest.get(j);
                if (a <= b) {
                    dest.add(j, a);
                    added = true;
                    break;
                }
            }
            if (!added) {
                dest.add(a);
            }
        }
    }
    return dest;
}
Simply speaking, this algorithm walks a list, and inserts each element into a newly created list in its correct location.
Complexity
This is how I think about the complexity of this algorithm:
For each element in the list, dest increases in size by 1
This means that, at each step, the worst-case cost is proportional to the current size of dest
Summing those up, we'd get 0 + 1 + 2 + 3 + ... + (n - 1)
The sum 0 + 1 + ... + (n - 1) = n(n - 1)/2
This is (n^2 - n)/2, and by removing constants and low-degree terms, we get O(n^2)
Therefore the complexity is O(n^2).
However, I was recently browsing this answer, in which the author states:
O(n log n): There was a mix-up at the printer's office, and our phone
book had all its pages inserted in a random order. Fix the ordering so
that it's correct by looking at the first name on each page and then
putting that page in the appropriate spot in a new, empty phone book.
This, to me, sounds like the same algorithm, so my question is:
Is the algorithm I described the same as the one described by @John Feminella?
If it is, why is my calculation of O(n^2) incorrect?
If it isn't, how do they differ?
The algorithm you have described is different than the O(n log n) algorithm described in the linked answer. Your algorithm is, in fact, O(n^2).
The key difference is in the way the correct location for each element is determined. In your algorithm, each element's position is found by a linear scan, meaning you check it against every already-sorted element. The linked algorithm is predicated on the O(log n) method used for finding a person's name:
O(log n): Given a person's name, find the phone number by picking a random point about halfway through the part of the book you haven't searched yet, then checking to see whether the person's name is at that point. Then repeat the process about halfway through the part of the book where the person's name lies. (This is a binary search for a person's name.)
If you use this method to find where each page should go in the new book, you only end up doing O(log n) operations for each page, instead of O(n) operations per page as in your algorithm.
Incidentally, the algorithm you have described is essentially an insertion sort, although it uses two lists instead of sorting in-place.
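To connect this to the linked answer, here is a sketch of mine (not part of either answer) of the same two-list sort where each insertion point is found with a binary search. The comparison count drops to O(n log n), though inserting into an ArrayList still shifts elements, so element moves remain O(n^2) in the worst case.

// Assumes the same java.util imports as the question's code.
public static List<Integer> destSortBinary(List<Integer> list) {
    List<Integer> dest = new ArrayList<>();
    for (Integer a : list) {
        int pos = Collections.binarySearch(dest, a); // O(log n) comparisons per element
        if (pos < 0) pos = -(pos + 1);               // not found: convert to the insertion point
        dest.add(pos, a);                            // inserting still shifts up to O(n) elements
    }
    return dest;
}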

Amortized worst case complexity of binary search

For a binary search of a sorted array of 2^n-1 elements in which the element we are looking for appears, what is the amortized worst-case time complexity?
Found this on my review sheet for my final exam. I can't even figure out why we would want the amortized time complexity of binary search, because its worst case is O(log n). According to my notes, the amortized cost takes the upper bound on an algorithm's total cost and divides it by the number of items, so wouldn't that be as simple as the worst-case time complexity divided by n, meaning O(log n)/(2^n - 1)?
For reference, here is the binary search I've been using:
public static boolean binarySearch(int x, int[] sorted) {
    int s = 0;                 // start
    int e = sorted.length - 1; // end
    while (s <= e) {
        int mid = s + (e - s) / 2;
        if (sorted[mid] == x)
            return true;
        else if (sorted[mid] < x)
            s = mid + 1;
        else
            e = mid - 1;
    }
    return false;
}
I'm honestly not sure what this means - I don't see how amortization interacts with binary search.
Perhaps the question is asking what the average cost of a successful binary search would be. You could imagine binary searching for all n elements of the array and looking at the average cost of such an operation. In that case, there's one element for which the search makes one probe, two for which the search makes two probes, four for which it makes three probes, etc. This averages out to O(log n).
Hope this helps!
Amortized cost is the total cost over all possible queries divided by the number of possible queries. You will get slightly different results depending on how you count queries that fail to find the item. (Either don't count them at all, or count one for each gap where a missing item could be.)
So for a search of 2^n - 1 items (just as an example to keep the math simple), there is one item you would find on your first probe, 2 items would be found on the second probe, 4 on the third probe, ... 2^(n-1) on the nth probe. There are 2^n "gaps" for missing items (remembering to count both ends as gaps).
With your algorithm, finding an item on probe k costs 2k-1 comparisons. (That's 2 compares for each of the k-1 probes before the kth, plus one where the test for == returns true.) Searching for an item not in the table costs 2n comparisons.
I'll leave it to you to do the math, but I can't leave the topic without expressing how irked I am when I see binary search coded this way. Consider:
public static boolean binarySearch(int x, int[] sorted) {
    int s = 0;             // start
    int e = sorted.length; // end
    // Loop invariant: if x is at sorted[k] then s <= k < e
    int mid = (s + e) / 2;
    while (mid != s) {
        if (sorted[mid] > x) e = mid; else s = mid;
        mid = (s + e) / 2;
    }
    return (mid < e) && (sorted[mid] == x); // mid == e means the array was empty
}
You don't short-circuit the loop when you hit the item you're looking for, which seems like a defect, but on the other hand you do only one comparison on every item you look at, instead of two comparisons on each item that doesn't match. Since half of all items are found at leaves of the search tree, what seems like a defect turns out to be a major gain. Indeed, the number of elements where short-circuiting the loop is beneficial is only about the square root of the number of elements in the array.
Grind through the arithmetic, computing the amortized search cost (counting "cost" as the number of comparisons to sorted[mid]), and you'll see that this version is approximately twice as fast. It also has constant cost (within ±1 comparison), depending only on the number of items in the array and not on where or even if the item is found. Not that that's important.
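To make "grind through the arithmetic" concrete, here is a small sketch of my own that uses only the costs stated above (2k - 1 comparisons for an item found on the kth probe for the question's version, and roughly n + 1 comparisons, constant within ±1, for the deferred-equality version) and averages over the 2^n - 1 successful searches:

public static void main(String[] args) {
    for (int n = 2; n <= 20; n++) {
        long items = (1L << n) - 1;                        // 2^n - 1 items in the table
        long totalCompares = 0;
        for (int k = 1; k <= n; k++) {
            long foundOnProbeK = 1L << (k - 1);            // 2^(k-1) items are found on probe k
            totalCompares += foundOnProbeK * (2L * k - 1); // each costs 2k - 1 comparisons
        }
        double avgShortCircuit = (double) totalCompares / items; // approaches 2n - 3
        double avgDeferred = n + 1;                        // deferred-equality version, roughly constant
        System.out.printf("n=%2d  short-circuit avg=%.2f  deferred ~%.0f%n",
                n, avgShortCircuit, avgDeferred);
    }
}

The short-circuiting average tends to 2n - 3 versus about n + 1 for the deferred-equality version, which is where "approximately twice as fast" comes from.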

Sort name & time complexity

I "invented" "new" sort algorithm. Well, I understand that I can't invent something good, so I tried to search it on wikipedia, but all sort algorithms seems like not my. So I have three questions:
What is name of this algorithm?
Why it sucks? (best, average and worst time complexity)
Can I make it more better still using this idea?
So, idea of my algorithm: if we have an array, we can count number of sorted elements and if this number is less that half of length we can reverse array to make it more sorted. And after that we can sort first half and second half of array. In best case, we need only O(n) - if array is totally sorted in good or bad direction. I have some problems with evaluation of average and worst time complexity.
Code on C#:
public static void Reverse(int[] array, int begin, int end) {
    int length = end - begin;
    for (int i = 0; i < length / 2; i++)
        Algorithms.Swap(ref array[begin + i], ref array[begin + length - i - 1]);
}

public static bool ReverseIf(int[] array, int begin, int end) {
    int countSorted = 1;
    for (int i = begin + 1; i < end; i++)
        if (array[i - 1] <= array[i])
            countSorted++;
    int length = end - begin;
    if (countSorted <= length / 2)
        Reverse(array, begin, end);
    if (countSorted == 1 || countSorted == (end - begin))
        return true;
    else
        return false;
}

public static void ReverseSort(int[] array, int begin, int end) {
    if (begin == end || begin == end + 1)
        return;
    // if we use an if-statement (not while), then the array {2,3,1} transforms into {2,1,3} and the algorithm stops
    while (!ReverseIf(array, begin, end)) {
        int pivot = begin + (end - begin) / 2;
        ReverseSort(array, begin, pivot + 1);
        ReverseSort(array, pivot, end);
    }
}

public static void ReverseSort(int[] array) {
    ReverseSort(array, 0, array.Length);
}
P.S.: Sorry for my English.
The best case is Theta(n), for, e.g., a sorted array. The worst case is Theta(n^2 log n).
Upper bound
Secondary subproblems have a sorted array preceded or succeeded by an arbitrary element. These are O(n log n). If preceded, we do O(n) work, solve a secondary subproblem on the first half and then on the second half, and then do O(n) more work – O(n log n). If succeeded, do O(n) work, sort the already sorted first half (O(n)), solve a secondary subproblem on the second half, do O(n) work, solve a secondary subproblem on the first half, sort the already sorted second half (O(n)), do O(n) work – O(n log n).
Now, in the general case, we solve two primary subproblems on the two halves and then slowly exchange elements over the pivot using secondary invocations. There are O(n) exchanges necessary, so a straightforward application of the Master Theorem yields a bound of O(n^2 log n).
Lower bound
For k >= 3, we construct an array A(k) of size 2^k recursively using the above analysis as a guide. The bad cases are the arrays [2^k + 1] + A(k).
Let A(3) = [1, ..., 8]. This sorted base case keeps Reverse from being called.
For k > 3, let A(k) = [2^(k-1) + A(k-1)[1], ..., 2^(k-1) + A(k-1)[2^(k-1)]] + A(k-1). Note that the primary subproblems of [2^k + 1] + A(k) are equivalent to [2^(k-1) + 1] + A(k-1).
After the primary recursive invocations, the array is [2^(k-1) + 1, ..., 2^k, 1, ..., 2^(k-1), 2^k + 1]. There are Omega(2^k) elements that have to move Omega(2^k) positions, and each of the secondary invocations that moves an element so far has O(1) sorted subproblems and thus is Omega(n log n).
Clearly more coffee is required – the primary subproblems don't matter. This makes it not too bad to analyze the average case, which is Theta(n^2 log n) as well.
With constant probability, the first half of the array contains at least half of the least quartile and at least half of the greatest quartile. In this case, regardless of whether Reverse happens, there are Omega(n) elements that have to move Omega(n) positions via secondary invocations.
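For concreteness, here is a small Java sketch of how I read the construction above (treat it as an assumption rather than part of the proof): it builds A(k) and the bad-case array [2^k + 1] + A(k).

// Requires k >= 3, as in the construction above.
static int[] buildA(int k) {
    if (k == 3) {
        int[] base = new int[8];
        for (int i = 0; i < 8; i++) base[i] = i + 1;                        // A(3) = [1, ..., 8]
        return base;
    }
    int[] prev = buildA(k - 1);                                             // A(k-1), length 2^(k-1)
    int[] a = new int[prev.length * 2];
    for (int i = 0; i < prev.length; i++) a[i] = prev[i] + prev.length;     // shifted copy of A(k-1) first
    System.arraycopy(prev, 0, a, prev.length, prev.length);                 // then A(k-1) itself
    return a;
}

static int[] badCase(int k) {
    int[] a = buildA(k);
    int[] bad = new int[a.length + 1];
    bad[0] = a.length + 1;                                                  // the leading 2^k + 1 element
    System.arraycopy(a, 0, bad, 1, a.length);
    return bad;
}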
It seems this algorithm, even if it performs horribly with "random" data (as demonstrated by Per in their answer), is quite efficient for "fixing up" arrays which are "nearly-sorted". Thus if you chose to develop this idea further (I personally wouldn't, but if you wanted to think about it as an exercise), you would do well to focus on this strength.
This reference on Wikipedia, in the Inversion article, alludes to the issue very well. Mahmoud's book is quite insightful, noting that there are various ways to measure "sortedness". For example, if we use the number of inversions to characterize a "nearly-sorted" array, then we can use insertion sort to sort it extremely quickly. However, if your arrays are "nearly-sorted" in a slightly different way (e.g. a deck of cards which is cut or loosely shuffled), then insertion sort will not be the best sort to "fix up" the list.
Input: an array of size N that has already been sorted, with roughly N/k inversions.
I might do something like this for an algorithm:
Calculate the number of inversions (O(N lg(lg(N))), or assume it is small and skip this step)
If the number of inversions is < [threshold], sort the array using insertion sort (it will be fast)
Otherwise the array is not close to being sorted; resort to using your favorite comparison (or better) sorting algorithm
There are better ways to do this, though; one can "fix up" such an array in O(log(N) * (number of new elements)) time if you preprocess it enough or use the right data structure, like an array with linked-list properties or something similar that supports binary search.
You can generalize this idea even further. Whether "fixing up" an array will work depends on the kind of fixing-up that is required. Thus, if you update these sortedness statistics whenever you add an element to the list or modify it, you can dispatch to a good "fix-it-up" algorithm.
But unfortunately this would all be a pain to code. You might just be able to get away with using a priority queue.
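As a rough Java illustration of that dispatch idea (my sketch, with an arbitrary threshold, and a cheap adjacent-inversion count standing in for the real sortedness statistics):

static void adaptiveSort(int[] a) {
    int adjacentInversions = 0;                   // cheap O(n) sortedness estimate, not the full inversion count
    for (int i = 1; i < a.length; i++)
        if (a[i - 1] > a[i]) adjacentInversions++;

    if (adjacentInversions <= a.length / 8) {     // "nearly sorted": insertion sort is fast here
        for (int i = 1; i < a.length; i++) {
            int key = a[i], j = i - 1;
            while (j >= 0 && a[j] > key) { a[j + 1] = a[j]; j--; }
            a[j + 1] = key;
        }
    } else {
        java.util.Arrays.sort(a);                 // fall back to a general-purpose sort
    }
}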

What would be the time complexity of counting the number of all structurally different binary trees?

Using the method presented here: http://cslibrary.stanford.edu/110/BinaryTrees.html#java
12. countTrees() Solution (Java)
/**
 For the key values 1...numKeys, how many structurally unique
 binary search trees are possible that store those keys?
 Strategy: consider that each value could be the root.
 Recursively find the size of the left and right subtrees.
*/
public static int countTrees(int numKeys) {
    if (numKeys <= 1) {
        return(1);
    }
    else {
        // there will be one value at the root, with whatever remains
        // on the left and right each forming their own subtrees.
        // Iterate through all the values that could be the root...
        int sum = 0;
        int left, right, root;
        for (root = 1; root <= numKeys; root++) {
            left = countTrees(root - 1);
            right = countTrees(numKeys - root);
            // number of possible trees with this root == left*right
            sum += left * right;
        }
        return(sum);
    }
}
I have a sense that it might be n(n-1)(n-2)...1, i.e. n!
If using a memoizer, is the complexity O(n)?
The number of structurally unique binary trees with n nodes is the nth Catalan number. Catalan numbers can be calculated as
Cat(n) = (2n)! / ((n + 1)! * n!) = Cat(n - 1) * 2(2n - 1) / (n + 1),
which can be computed in O(n).
http://mathworld.wolfram.com/BinaryTree.html
http://en.wikipedia.org/wiki/Catalan_number#Applications_in_combinatorics
It's easy enough to count the number of calls to countTrees this algorithm uses for a given node count. After a few trial runs, it looks to me like it requires 5*3^(n-2) calls for n >= 2, which grows much more slowly than n!. The proof of this assertion is left as an exercise for the reader. :-)
A memoized version required O(n) calls, as you suggested.
Incidentally, the number of binary trees with n nodes equals the n-th Catalan number.
The obvious approaches to calculating Cn all seem to be linear in n, so a memoized implementation of countTrees is probably the best one can do.
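For reference, a memoized variant might look like the sketch below (my code, not from the linked article): caching countTrees(k) keeps the number of distinct evaluations at O(n), though each evaluation still loops over its candidate roots.

public static long countTreesMemo(int numKeys) {
    long[] memo = new long[numKeys + 1];
    return countTreesMemo(numKeys, memo);
}

private static long countTreesMemo(int numKeys, long[] memo) {
    if (numKeys <= 1) return 1;
    if (memo[numKeys] != 0) return memo[numKeys];   // already computed
    long sum = 0;
    for (int root = 1; root <= numKeys; root++) {
        sum += countTreesMemo(root - 1, memo) * countTreesMemo(numKeys - root, memo);
    }
    memo[numKeys] = sum;
    return sum;
}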
I'm not sure how many look-up table hits the memoized version would make (it's definitely super-linear and carries the overhead of function calls), but since the mathematical proof shows the result is the nth Catalan number, one can quickly cook up a linear-time tabular method:
int C = 1;
for (int i = 1; i <= n; i++) {
    C = (2 * (2 * (i - 1) + 1) * C / ((i - 1) + 2));
}
return C;
Note the difference between Memoization and Tabulation here
