Balanced Parenthesis Order number - algorithm

Consider the strings of length six; the order would be: “()()()”, “()(())”, “(())()”, “(()())”, “((()))”.
In the above example, the strings in which the first opening parenthesis is closed earliest come first; if that is the same for two strings, the rule is applied recursively to the next opening parenthesis.
Given a particular balanced parenthesis sequence, how can I find its order number? For example, for ()(()) the output is 2. I'd like an O(n) algorithm, where n is the number of pairs of balanced parentheses (3 in the above case). The input can contain around 100000 pairs.

First let g(n,k) be the number of strings of length 2n + k that contain n opening and n + k closing parentheses and correctly close k parentheses that were opened before the string starts (so no prefix closes more than is open). Can we calculate g(n,k)?
Let's try recursion. For that we first need a base case. It is clear that if there are no more opening parentheses to place, then there is only one possibility: only closing parentheses. So g(0,k) = 1. There is our base case.
Next the recursive case. The first character is either an opening parenthesis or a closing parenthesis. If it is an opening parenthesis, then there are g(n-1,k+1) ways to finish. If it is a closing parenthesis, then there are g(n,k-1) ways to finish. But we can't have a negative number of open parentheses, so g(n,-1) = 0. Putting it together:
g(0,k) = 1 (for k ≥ 0)
g(n,-1) = 0
g(n,k) = g(n-1, k+1) + g(n, k-1)
This lets us calculate g, but it is not efficient: we are effectively going to list every possible string in the recursive calls. However, there is a trick: memoize the results. That is, every time you call g(n, k), check whether you've called it before, and if you have, just return that answer. Otherwise, calculate the answer, cache it, and then return it. (For more on this trick, and alternate strategies, look up dynamic programming.)
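A minimal JavaScript sketch of the memoized recursion follows. It assumes BigInt support, because the counts grow like Catalan numbers and ordinary numbers overflow quickly. The wrapper name makeCounter is illustrative. For very long strings the recursion depth (roughly 2n) becomes a problem, and an iterative table is the safer choice.

```javascript
// Illustrative wrapper (hypothetical name): returns g(n, k), the number of ways
// to finish a string when n opening parentheses remain to be placed and k are open.
function makeCounter() {
  const memo = new Map(); // cache of earlier answers, keyed by "n,k"
  function g(n, k) {
    if (k < 0) return 0n; // cannot close more than is open
    if (n === 0) return 1n; // only closing parentheses remain
    const key = n + "," + k;
    if (memo.has(key)) return memo.get(key); // called before: reuse the answer
    // The next character is either "(" or ")".
    const value = g(n - 1, k + 1) + g(n, k - 1);
    memo.set(key, value);
    return value;
  }
  return g;
}

const g = makeCounter();
console.log(g(3, 0)); // 5n: the five length-six strings listed above
```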
OK, so now we can generate counts of something related, but how can we use this to get your answer?
Well, note the following. Suppose that partway through your string you find an open parenthesis where there logically could be a close parenthesis instead (that is, at least one parenthesis is open at that point). Suppose that at that point there are n opening parentheses still to place, counting this one, and k open parentheses. Then there are g(n, k-1) possible strings that are the same as yours up to that point, have a close parenthesis there instead (so they come before yours), and do whatever afterwards. So summing g(n, k-1) over all such open parentheses gives you the number of strings before yours. Adding one to that gives you your position.
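Here is a sketch of the ranking step. This version fills g bottom-up as a table (the dynamic-programming alternative) instead of memoizing it. It assumes the input is already a valid balanced string and uses BigInt. The names buildTable and rank are hypothetical. The table has O(n^2) entries, so this sketch does not meet the O(n) target from the question.

```javascript
// Hypothetical helper: g[n][k] for all n + k <= pairs, filled bottom-up.
function buildTable(pairs) {
  const g = [];
  for (let n = 0; n <= pairs; n++) {
    g.push(new Array(pairs - n + 1).fill(0n));
    for (let k = 0; n + k <= pairs; k++) {
      if (n === 0) g[n][k] = 1n; // only closing parentheses remain
      else g[n][k] = g[n - 1][k + 1] + (k > 0 ? g[n][k - 1] : 0n);
    }
  }
  return g;
}

// Hypothetical name: 1-based position of a balanced string in the order
// where ")" sorts before "(" (assumes the input is balanced).
function rank(s) {
  const g = buildTable(s.length / 2);
  let remaining = s.length / 2; // "(" still to place, including the current one
  let open = 0;
  let before = 0n;
  for (const ch of s) {
    if (ch === "(") {
      // A ")" here would be legal if something is open; those strings come first.
      if (open > 0) before += g[remaining][open - 1];
      remaining--;
      open++;
    } else {
      open--;
    }
  }
  return before + 1n;
}

console.log(rank("()(())")); // 2n
```

The loop adds g(n, k-1) only at open parentheses that could have been a close, as the paragraph above describes.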

I got the answer from Ruskey's thesis, which describes this algorithm for ranking and unranking binary trees.
http://webhome.cs.uvic.ca/~ruskey/Publications/Thesis/ThesisPage16.png


Check if string includes part of Fibonacci Sequence

What approach should I take to create an algorithm that finds out whether a Fibonacci sequence exists in a given string?
The string contains only digits with no whitespace, and there may be more than one sequence; I need to find all of them.
If, as your comment says, the first number must have fewer than 6 digits, you can simply search for all positions where one of the 25 Fibonacci numbers occurs (there are only 25 with fewer than 6 digits) and then try to expand this one-number sequence in both directions.
After your update:
You can even speed things up when you are only looking for sequences of at least 3 numbers.
Prebuild the 25 three-number strings that start with each of the first 25 Fibonacci numbers; this should give far fewer matches than the search for single Fibonacci numbers I suggested above.
Then search for them (as described above) and try to expand the 3-number sequences you find.
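Here is a sketch of the triplet idea in JavaScript. It assumes, as in the comment mentioned above, that the first number has fewer than 6 digits. The name fibonacciRuns is hypothetical. The sketch only expands forwards, because every earlier starting term has its own prebuilt triplet. It reports every match, including runs that are suffixes of longer ones, without deduplicating.

```javascript
// Hypothetical helper: find runs of 3+ consecutive Fibonacci numbers written
// back to back, assuming the first number has fewer than 6 digits.
function fibonacciRuns(s) {
  const fib = [1n, 1n]; // extended on demand; BigInt keeps long runs exact
  const term = (idx) => {
    while (fib.length <= idx) fib.push(fib[fib.length - 1] + fib[fib.length - 2]);
    return String(fib[idx]);
  };
  const runs = [];
  // The 25 prebuilt triplets: one for each Fibonacci number below 100000.
  for (let i = 0; term(i).length < 6; i++) {
    const triplet = term(i) + term(i + 1) + term(i + 2);
    for (let p = s.indexOf(triplet); p !== -1; p = s.indexOf(triplet, p + 1)) {
      // Expand forwards while the next Fibonacci number follows directly.
      let end = p + triplet.length;
      let next = i + 3;
      while (s.startsWith(term(next), end)) {
        end += term(next).length;
        next++;
      }
      runs.push({ start: p, end, terms: next - i });
    }
  }
  return runs;
}

console.log(fibonacciRuns("987159725844"));
```

For "987159725844" this finds a single run, 987, 1597, 2584, covering indices 0 to 10.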
Here's how I would approach this.
The main algorithm could search for triplets then try to extend them to as long a sequence as possible.
This leaves us with the subproblem of finding triplets. If you are scanning through a string looking for Fibonacci numbers, one thing you can take advantage of is that the next number must have the same number of digits or one more digit.
For example, if you have the string "987159725844" and are considering "[987]159725844", then the next things you need to look at are "987[159]725844" and "987[1597]25844". Then the next part you would find is "[2584]4" or "[25844]".
Once you have the 3 numbers A, B and C, you can check whether they satisfy the Fibonacci recurrence C == A + B (equivalently, C - B == A). If they do, you can check whether they actually come from the Fibonacci sequence by seeing whether the ratio is roughly 1.6 and then running the Fibonacci iteration backwards down to the initial conditions 1, 1.
The overall algorithm would then work by scanning through looking for all triples starting with width 1, then width 2, width 3 up to 6.
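The triple check from this answer might look like the following sketch. The name isFibonacciTriple is illustrative. It tests C == A + B and then runs the recurrence backwards, requiring it to reach the initial terms 1, 1. The rough 1.6 ratio test is left out, because the backward run already decides membership on its own.

```javascript
// Illustrative name: true if a, b, c are three consecutive Fibonacci numbers.
function isFibonacciTriple(a, b, c) {
  if (c !== a + b) return false; // must satisfy the recurrence
  let x = a, y = b; // consecutive terms, x before y
  // Run the iteration backwards: the term before (x, y) is y - x.
  while (!(x === 1 && y === 1)) {
    const prev = y - x;
    if (prev < 1 || prev > x) return false; // not a Fibonacci pair
    y = x;
    x = prev;
  }
  return true;
}

console.log(isFibonacciTriple(987, 1597, 2584)); // true
console.log(isFibonacciTriple(3, 4, 7)); // false
```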
I'd say you should first find all the relevant Fibonacci numbers (with 6 or fewer digits there are no more than 30 of them) and store them in an array.
Then loop over every position in your input string and try to find the longest Fibonacci number starting there (that is, browse the array backwards).
If a Fibonacci number is found, switch to a secondary algorithm: go through the array from that number's index to the end, trying to match each following item against the next part of the string. When the matching ends, go back to the main algorithm and keep searching the input string from the current position.
Neither of these two algorithms is recursive or expensive.
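Here is a sketch of the table-based scan. It assumes the table holds the Fibonacci numbers with at most 6 digits. The minTerms filter and the name scanWithTable are my own additions, not part of the answer. Because 1 appears twice in the table, the backwards browse picks the second 1, so a run starting 1, 1, 2 is reported from its second term.

```javascript
// Hypothetical helper: returns [start, end, count] for every position where
// at least minTerms consecutive table entries match back to back.
function scanWithTable(s, minTerms = 3) {
  const FIBS = [1, 1]; // Fibonacci numbers with at most 6 digits (30 entries)
  while (String(FIBS[FIBS.length - 1] + FIBS[FIBS.length - 2]).length <= 6) {
    FIBS.push(FIBS[FIBS.length - 1] + FIBS[FIBS.length - 2]);
  }
  const runs = [];
  for (let pos = 0; pos < s.length; pos++) {
    // Main loop: browse the array backwards for the longest match at pos.
    let idx = FIBS.length - 1;
    while (idx >= 0 && !s.startsWith(String(FIBS[idx]), pos)) idx--;
    if (idx < 0) continue;
    // Secondary loop: match the following array items one after another.
    let end = pos + String(FIBS[idx]).length;
    let count = 1;
    for (let j = idx + 1; j < FIBS.length && s.startsWith(String(FIBS[j]), end); j++) {
      end += String(FIBS[j]).length;
      count++;
    }
    if (count >= minTerms) runs.push([pos, end, count]);
  }
  return runs;
}

console.log(JSON.stringify(scanWithTable("987159725844"))); // [[0,11,3]]
```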
Update:
OK. If no tables are allowed, you can still use this approach: in the first loop, get the next Fibonacci number by applying your formula instead of indexing into the array.

Other Ways of Verifying Balanced Parenthesis?

A classic example of why stacks are important is the problem of verifying whether a string of parentheses is balanced. You start with an empty stack and keep pushing and popping elements as you scan the string (push each opening parenthesis, pop it when its closing parenthesis arrives); at the end, you check whether the stack is empty, and if so, report that the string is indeed balanced.
However, I am looking for other, less efficient approaches to this problem. I want to show my students the usefulness of the stack data structure by first coming up with an exponential or otherwise non-linear algorithm that solves the problem, and then introducing the stack solution. Is anyone familiar with methods other than the stack-based approach?
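For comparison, the stack-based check described in the question could be sketched as follows. It assumes the three bracket types (), [] and {} and ignores any other characters. The name isBalancedStack is illustrative.

```javascript
// Illustrative name: the linear-time stack check, for bracket types ()[]{}.
function isBalancedStack(s) {
  const partner = { ")": "(", "]": "[", "}": "{" };
  const stack = [];
  for (const ch of s) {
    if ("([{".includes(ch)) {
      stack.push(ch); // remember the open until its close arrives
    } else if (ch in partner) {
      // Popping an empty stack yields undefined, which never matches.
      if (stack.pop() !== partner[ch]) return false;
    }
  }
  return stack.length === 0; // leftover opens mean the string is unbalanced
}

console.log(isBalancedStack("([[{}]])")); // true
console.log(isBalancedStack("([[{]}])")); // false
```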
Find the last opening parenthesis and check that it is closed, with no parenthesis of a different type between it and its close (since it is the last opening one, its close must be the very next parenthesis).
If it is, delete the pair and repeat the process until no opening parenthesis is left.
If the string is not empty at the end of the process, or you find a parenthesis of a different kind, the string is not balanced.
Example:
([[{}]])
The last opening is {, so look for }; after you find it, delete the pair from the string and continue with:
([[]])
etc.
If the string looks like this:
([[{]}])
then after you find the last opening ({), you see a parenthesis of a different kind (]) before the closing one, so it is not balanced.
Worst-case complexity: O(n^2)
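Here is a sketch of this last-opening removal in JavaScript. It assumes the same three bracket types and drops any other characters first. The name isBalancedLastOpen is hypothetical. Each round rescans from the end and splices the array, and that is where the quadratic worst case comes from.

```javascript
// Hypothetical name: repeatedly remove the last opening bracket with its close.
function isBalancedLastOpen(s) {
  const opens = "([{", closes = ")]}";
  const chars = [...s].filter((ch) => opens.includes(ch) || closes.includes(ch));
  while (chars.length > 0) {
    // Find the last opening bracket; everything after it is a close.
    let i = chars.length - 1;
    while (i >= 0 && !opens.includes(chars[i])) i--;
    if (i < 0) return false; // only closes are left
    // The very next bracket must be the close of the same type.
    if (i + 1 >= chars.length || closes.indexOf(chars[i + 1]) !== opens.indexOf(chars[i])) {
      return false;
    }
    chars.splice(i, 2); // delete the pair and repeat
  }
  return true;
}

console.log(isBalancedLastOpen("([[{}]])")); // true
console.log(isBalancedLastOpen("([[{]}])")); // false
```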
I assume that, for pedagogical purposes, it would be best to show a simple algorithm that your students might actually have come up with themselves. If so, I think a very intuitive algorithm is to just remove occurrences of () until there aren't any more to remove:
boolean isBalancedParens(String s) {
    while (s.contains("()")) {
        s = s.replace("()", "");
    }
    return s.isEmpty();
}
Under reasonable assumptions about the performance of the various methods called, this takes worst-case O(n^2) time and O(n) extra space.
This problem raises a number of interesting questions in algorithm analysis which are possibly at too high a level for your class, but were fun to think about. I sketch the worst-case and expected runtimes for all the algorithms, which are somewhere between log-linear and quadratic.
The only exponential-time algorithm I could think of was the equivalent of Bogosort: generate all possible balanced strings until you find one which matches. That seemed too weird even for a class exercise. Even weirder would be the modified Bogocheck, which only generates all ()-balanced strings and uses some cleverness to figure out which actual parenthesis to use in the comparison. (If you're interested, I could expand on this possibility.)
In most of the algorithms presented here, I use a procedure called "scan maintaining paren depth". This procedure examines characters one at a time in the order specified (forwards or backwards), maintaining a total count of observed open parentheses (of all types) less observed close parentheses (again, of all types). When scanning backwards, the meanings of "open" and "close" are reversed. If the count ever becomes negative, the string is not balanced and the entire procedure can immediately return failure.
Here are two algorithms which use constant space, both of which are worst-case quadratic in string length.
Algorithm 1: Find matching paren
Scan left-to-right. For each close encountered, scan backwards starting with the close maintaining paren depth. When the paren depth reaches zero, compare the character which caused the depth to reach 0 with the close which started the backwards scan; if they don't match, immediately fail. Also fail if the backwards scan hits the beginning of the string without the paren depth reaching zero.
If the end of the string is reached without failure being detected, and the total paren depth is zero (so no open is left unmatched), the string is balanced.
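Here is a sketch of Algorithm 1. It assumes the bracket types (), [] and {} and ignores other characters. The name isBalancedMatchBack is illustrative. Besides the backward scans, it keeps one overall depth counter. That counter catches unmatched opens, which never start a backward scan. Extra space stays constant.

```javascript
// Illustrative name: Algorithm 1, find the matching paren by scanning backwards.
function isBalancedMatchBack(s) {
  const OPENS = "([{", CLOSES = ")]}";
  let total = 0; // overall depth, so unmatched opens are caught at the end
  for (let i = 0; i < s.length; i++) {
    if (OPENS.includes(s[i])) total++;
    if (!CLOSES.includes(s[i])) continue;
    total--;
    // Backward scan: here a close raises the depth and an open lowers it.
    let depth = 0;
    let j = i;
    for (; j >= 0; j--) {
      if (CLOSES.includes(s[j])) depth++;
      else if (OPENS.includes(s[j])) depth--;
      if (depth === 0) break;
    }
    if (j < 0) return false; // reached the start without depth 0
    if (OPENS.indexOf(s[j]) !== CLOSES.indexOf(s[i])) return false; // wrong type
  }
  return total === 0;
}

console.log(isBalancedMatchBack("{(){[]}()}[]")); // true
```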
Algorithm 2: Depthwise scan
Set depth to 1.
LOOP: Scan left-to-right from the first character, maintaining paren depth. If an open is encountered and the paren depth is incremented to depth, remember the open. If the paren depth is depth and a close is encountered, check to see if it matches the remembered open; if it does not, fail immediately.
If the end of the string is reached before any open is remembered, report success. If the end of the string is reached and the last remembered open was never matched by a close, report failure. Otherwise, increment depth and repeat the LOOP.
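Here is a sketch of Algorithm 2 under the same assumptions. The name isBalancedDepthwise is illustrative. Each pass of the outer loop is one full left-to-right scan. The variable level plays the role of depth in the description above.

```javascript
// Illustrative name: Algorithm 2, one full scan per nesting level.
function isBalancedDepthwise(s) {
  const OPENS = "([{", CLOSES = ")]}";
  for (let level = 1; ; level++) {
    let depth = 0;
    let remembered = -1; // type of the last open that reached this level
    let matched = true; // whether that open has been closed
    let sawOpen = false;
    for (const ch of s) {
      if (OPENS.includes(ch)) {
        depth++;
        if (depth === level) {
          remembered = OPENS.indexOf(ch);
          matched = false;
          sawOpen = true;
        }
      } else if (CLOSES.includes(ch)) {
        if (depth === level) {
          if (CLOSES.indexOf(ch) !== remembered) return false; // wrong type
          matched = true;
        }
        depth--;
        if (depth < 0) return false; // more closes than opens so far
      }
    }
    if (!sawOpen) return true; // nothing reaches this level: success
    if (!matched) return false; // last remembered open was never closed
  }
}

console.log(isBalancedDepthwise("{(){[]}()}[]")); // true
```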
Both of the above have worst case (quadratic) performance on a completely nested string ((…()…)). However, the average time complexity is trickier to compute.
Each loop in Algorithm 2 takes precisely Θ(N) time. If the total paren depth of the string is not 0, or there is any point in the string where the cumulative paren depth is negative, then failure will be reported in the first scan, taking linear time. That accounts for the vast majority of strings if the inputs are randomly selected from among all strings containing parenthesis characters. The remaining strings are those which would match if all opens were replaced with ( and all closes with ), including the strings which are correctly balanced. For these, the expected number of scans is the expected maximum parenthesis depth of the string, which is Θ(√N) (the classical result on the average height of random Dyck paths), so the total expected time is Θ(N√N).
Algorithm 1 is rather more difficult to analyse in the general case, but for completely random strings it seems safe to guess that the first mismatch will be found in expected linear time. I don't have a proof for this, though. If the string is actually balanced, success will be reported at the termination of the scan, and the work performed is the sum of the span lengths of each pair of balanced parentheses. Up to lower-order terms, that sum is proportional to the area under the string's depth profile, which for a random balanced string is Θ(N√N) on average.
Here is an algorithm which is guaranteed to be O(N log N) for any input, but which requires Θ(N) additional space:
Algorithm 3: Sort matching pairs
Create an auxiliary vector of length N, whose ith element is the 2-tuple consisting of the cumulative paren depth of the character at position i, and the index i itself. The paren depth of an open is defined as the paren depth just before the open is counted, and the paren depth of a close is the paren depth just after the close is counted; the consequence is that matching open and close have the same paren depth.
Now sort the auxiliary vector in ascending order using lexicographic comparison of the tuples. Any O(N log N) sorting algorithm can be used; note that a stable sort is not necessary because all the tuples are distinct. [Note 1].
Finally iterate over the sorted vector, selecting two elements at a time. Reject the string if the two elements do not have the same depth, or are not a matching pair of open and close (using the index in the tuple to look up the character in the original string).
If the entire sorted vector can be scanned without failure, then the string was balanced.
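Here is a sketch of Algorithm 3 in JavaScript, using the built-in Array.prototype.sort as the O(N log N) sort. Two choices belong to this sketch rather than the answer. Only bracket characters get an entry in the auxiliary vector. A negative depth rejects the string immediately, as in the scan procedure described earlier. The name isBalancedBySorting is illustrative.

```javascript
// Illustrative name: Algorithm 3, sort (depth, index) tuples and pair them up.
function isBalancedBySorting(s) {
  const OPENS = "([{", CLOSES = ")]}";
  const aux = [];
  let depth = 0;
  for (let i = 0; i < s.length; i++) {
    if (OPENS.includes(s[i])) {
      aux.push([depth, i]); // depth just before the open is counted
      depth++;
    } else if (CLOSES.includes(s[i])) {
      depth--;
      if (depth < 0) return false; // more closes than opens so far
      aux.push([depth, i]); // depth just after the close is counted
    }
  }
  // Lexicographic order on (depth, index); all tuples are distinct.
  aux.sort((a, b) => a[0] - b[0] || a[1] - b[1]);
  if (aux.length % 2 !== 0) return false;
  for (let k = 0; k < aux.length; k += 2) {
    const [d1, i1] = aux[k];
    const [d2, i2] = aux[k + 1];
    if (d1 !== d2) return false; // a matching pair must share its depth
    if (!OPENS.includes(s[i1]) || CLOSES.indexOf(s[i2]) !== OPENS.indexOf(s[i1])) {
      return false; // not an open followed by its own close
    }
  }
  return true;
}

console.log(isBalancedBySorting("{(){[]}()}[]")); // true
```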
Finally, a regex-based solution, because everyone loves regexes. :) This algorithm destroys the input string (unless a copy is made), but requires only constant additional storage.
Algorithm 4: Regex to the rescue!
Repeat the following search and replace until the search fails to find anything. (I wrote it for GNU sed using a basic regular expression, where \| is a GNU extension rather than POSIX; in case that's too obscure, the pattern consists precisely of an alternation of each possible matched open-close pair.)
s/()\|\[]\|{}//g
When the above loop terminates, if the string is not empty then it was not originally balanced; if it is empty, it was.
Note the g, which means that the search and replace is performed across the entire string on each pass. Each pass will take time proportional to the remaining length of the string at the beginning of the pass, but for simplicity we can say that the cost of a pass is O(N). The number of passes performed is the maximum paren depth of the string, which is Θ(N) in the worst case, but for a random balanced string has an expected value of Θ(√N). So in the worst case the execution time is Θ(N^2), and the simple per-pass bound gives an expected time of O(N√N).
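The same loop can be written with a JavaScript regular expression instead of sed. This sketch assumes the string contains only bracket characters. The name isBalancedRegex is illustrative. A global replace removes every adjacent pair in one pass, so the loop runs once per nesting level, plus a final pass that changes nothing.

```javascript
// Illustrative name: Algorithm 4, delete adjacent matched pairs until none remain.
function isBalancedRegex(s) {
  const pair = /\(\)|\[\]|\{\}/g; // alternation of the three matched pairs
  let previous;
  do {
    previous = s;
    s = s.replace(pair, ""); // one global pass over the whole string
  } while (s !== previous);
  return s.length === 0; // empty means the original string was balanced
}

console.log(isBalancedRegex("{(){[]}()}[]")); // true
```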
Notes
An O(N) stable counting sort on the paren depth is possible. In that case, the total algorithm would be O(N) instead of O(N log N), but that wasn't what you wanted, right? You could also use a stable sort just on the paren depth, in which case you could replace the second element of the tuple with the character itself. That would still be O(N log N), if the sort was O(N log N).
If your students are already familiar with recursion, here's a simple idea: look at the first parenthesis, find all matching closing parentheses, and for each of these pairs, recurse with the substring inside them and the substring after them; e.g.:
input:        "{(){[]}()}[]"
option 1:      ^     ^
recurse with: "(){[]" and "()}[]"
              "{(){[]}()}[]"
option 2:      ^        ^
recurse with: "(){[]}()" and "[]"
If the input is an empty string, return true. If the input starts with a closing parenthesis, or if the input does not contain a closing parenthesis matching the first parenthesis, return false.
function balanced(input) {
    var opening = "{([", closing = "})]";
    if (input.length === 0)
        return true;
    var type = opening.indexOf(input.charAt(0));
    if (type === -1)
        return false;
    for (var pos = 1; pos < input.length; pos++) { // forward search
        if (closing.indexOf(input.charAt(pos)) === type) {
            var inside = input.slice(1, pos);
            var after = input.slice(pos + 1);
            if (balanced(inside) && balanced(after))
                return true;
        }
    }
    return false;
}
console.log(balanced("{(({[][]}()[{}])({[[]]}()[{}]))}"));
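Using forward search is better for concatenations of short balanced substrings; using backward search is better for deeply nested strings. But neither direction is guaranteed to be fast: for example, forward search on a completely nested string ((…())) tries every closing parenthesis as a candidate at every level, which takes exponential time.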

Maximize evaluation of expression with one parenthesis insertion

I encountered this problem in a programming contest:
Given an expression x1 op x2 op x3 op ... op xn, where each op is either addition '+' or multiplication '*' and each xi is a digit from 1 to 9, the goal is to insert exactly one pair of parentheses into the expression so that it maximizes the value of the expression.
n is at most 2500.
E.g.:
Input:
3+5*7+8*4
Output:
303
Explanation:
3+5*(7+8)*4
There was another constraint in the problem: at most 15 '*' signs will be present. This simplified the problem, as there are just 17 options for bracket insertion and brute force works in O(17*n).
I have been wondering: if this constraint were not present, could I theoretically solve the problem in O(n^2)? It seems to me like a DP problem. I say "theoretically" because the answers can be huge (on the order of 9^2500). So if I ignore the time complexity of working with big numbers, is O(n^2) possible?
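Here is a sketch of the brute force in JavaScript, using BigInt because the values grow quickly. It assumes the input alternates digits and operators with no spaces. The names maximizeWithOneParenPair and evaluate are hypothetical. Following the observation in the answer below, it only tries "(" at the start or right after a "*", and ")" at the end or right before a "*". With m multiplications that gives O(m^2) candidates, each evaluated in O(n).

```javascript
// Hypothetical helper: nums[0] ops[0] nums[1] ... with "*" before "+".
function evaluate(nums, ops) {
  let sum = 0n;
  let product = nums[0];
  for (let i = 0; i < ops.length; i++) {
    if (ops[i] === "*") {
      product *= nums[i + 1];
    } else {
      sum += product; // a "+" ends the current product
      product = nums[i + 1];
    }
  }
  return sum + product;
}

// Hypothetical name: best value with one pair of parentheses inserted.
function maximizeWithOneParenPair(expr) {
  const nums = [], ops = [];
  for (let i = 0; i < expr.length; i++) {
    if (i % 2 === 0) nums.push(BigInt(expr[i]));
    else ops.push(expr[i]);
  }
  const lefts = [0], rights = [];
  ops.forEach((op, i) => {
    if (op === "*") {
      lefts.push(i + 1); // "(" just before nums[i + 1]
      rights.push(i); // ")" just after nums[i]
    }
  });
  rights.push(nums.length - 1);
  let best = evaluate(nums, ops); // parentheses that change nothing
  for (const l of lefts) {
    for (const r of rights) {
      if (r <= l) continue;
      // Collapse nums[l..r] into the value of the parenthesized part.
      const inner = evaluate(nums.slice(l, r + 1), ops.slice(l, r));
      const value = evaluate(
        [...nums.slice(0, l), inner, ...nums.slice(r + 1)],
        [...ops.slice(0, l), ...ops.slice(r)]
      );
      if (value > best) best = value;
    }
  }
  return best;
}

console.log(maximizeWithOneParenPair("3+5*7+8*4")); // 303n
```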
If there is no multiplication, you are finished.
If there is no addition, you are finished.
The leading and trailing operations of the subterm that has to be evaluated (the ones just inside the parentheses) are always additions, because parentheses around a multiplication do not alter the outcome.
If a subterm contains only additions, you do not need to evaluate parts of it: multiplying by the full subterm will always give a bigger result, since we only have positive numbers (digits).
Traverse the term once, trying to place the opening parenthesis after each * that is followed by a + (in the worst case, every *), and within that loop traverse a second time, trying to place the closing parenthesis before each later * that immediately follows a +.
You can solve the problem in O(ma/2), where m is the number of multiplications and a is the number of additions. This is smaller than n^2.
Possible places for parenthesis shown with ^:
1*2*^3+4+5^*6*^7+8^

Make palindrome from given word

I am given a word like abca. I want to know how many letters I need to add to make it a palindrome.
In this case it's 1, because if I add b, I get abcba.
First, let's consider an inefficient recursive solution:
Suppose the string is of the form aSb, where a and b are letters and S is a (possibly empty) substring. Strings of length 0 or 1 are already palindromes, so f of them is 0.
If a==b, then f(aSb) = f(S).
If a!=b, then you need to add a letter: either add an a at the end, or add a b in the front. We need to try both and see which is better. So in this case, f(aSb) = 1 + min(f(aS), f(Sb)).
This can be implemented with a recursive function which will take exponential time to run.
To improve performance, note that this function will only be called with substrings of the original string. There are only O(n^2) such substrings. So by memoizing the results of this function, we reduce the time taken to O(n^2), at the cost of O(n^2) space.
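Here is a bottom-up version of this recurrence. It fills the same O(n^2) table that memoization would fill, indexed by the start and end of each substring. The name minInsertionsToPalindrome is illustrative.

```javascript
// Illustrative name: f(word) from the recurrence, computed bottom-up.
function minInsertionsToPalindrome(word) {
  const n = word.length;
  if (n === 0) return 0;
  // f[i][j] = letters needed to make word[i..j] a palindrome (0 when i >= j).
  const f = Array.from({ length: n }, () => new Array(n).fill(0));
  for (let len = 2; len <= n; len++) {
    for (let i = 0; i + len - 1 < n; i++) {
      const j = i + len - 1;
      if (word[i] === word[j]) {
        f[i][j] = f[i + 1][j - 1]; // f(aSa) = f(S)
      } else {
        f[i][j] = 1 + Math.min(f[i][j - 1], f[i + 1][j]); // 1 + min(f(aS), f(Sb))
      }
    }
  }
  return f[0][n - 1];
}

console.log(minInsertionsToPalindrome("abca")); // 1
```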
The basic algorithm would look like this:
Iterate over half of the string and check whether a character exists at the corresponding position at the other end (e.g., if you have abca, then the first character is an a and the string also ends with a).
If they match, proceed to the next character.
If they don't match, note that a character needs to be added.
Note that you can only move backwards from the end when the characters match. For example, if the string is abcdeffeda, then the outer characters match. We then need to consider bcdeffed. The outer characters don't match, so a b needs to be added. But we don't want to continue with cdeffe (i.e., removing/ignoring both outer characters); we simply remove b and continue with cdeffed. Similarly for c, which means our algorithm returns 2 string modifications and not more.

Algorithm to find length of longest sequence of blanks in a given string

I'm looking for an algorithm to find the length of the longest sequence of blanks in a given string, examining as few characters as possible.
Hint: your program should become faster as the length of the sequence of blanks increases.
I know the O(n) solution, but I'm looking for a more efficient one.
You won't be able to find a solution with lower complexity than O(n), because in the worst case you need to examine every character, for example with an input string that has at most one consecutive whitespace character, or one that is entirely whitespace.
You can do some optimizations though, but it'll still be considered O(n).
For example:
Let M be the current longest match so far as you go through your list. Also assume you can access input elements in O(1), for example you have an array as input.
When you see a non-whitespace character, you can skip M elements if the character at current + M is also non-whitespace: no run of whitespace longer than M can fit between them.
And when you see a whitespace character, if current + M - 1 is not whitespace, you know you don't have the longest run, so you can skip in that case as well.
But in the worst case (when all characters are blank) you have to examine every character. So it can't be better than O(n) in complexity.
Rationale: assume the whole string is blank and your algorithm outputs n without having examined all n characters. Then if any unexamined character were not blank, your answer would be wrong. So for this particular input you have to examine the whole string.
There's no way to make it faster than O(N) in the worst case. However, here are a few optimizations, assuming 0-based indexing.
If you already have a complete sequence of L blanks (by complete I mean a sequence that is not a subsequence of a larger sequence), and L is at least as large as half the size of your string, you can stop.
If you have a complete sequence of L blanks, once you hit a space at position i, check whether the character at position i + L is also a space. If it is, continue scanning forwards from position i, as you might find a larger sequence; however, if you encounter a non-space at some position j before i + L, the run starting at i is too short, so repeat this check from j + 1. If it isn't a space, there's no way you can build a larger sequence starting at i, so continue scanning from i + L + 1.
If you have a complete sequence of blanks of length L, and you are at position i and you have k positions left to examine, and k <= L, you can stop your search, as obviously there's no way you'll be able to find anything better anymore.
To prove that you can't make it faster than O(N), consider a string that contains no spaces. You will have to access each character once, so it's O(N). Same with a string that contains nothing but spaces.
The obvious idea: you can jump by K+1 places (where K is the current longest space sequence) and scan back if you found a space.
This way you check roughly (n + n/K)/2 = n(K+1)/(2K) positions.
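Here is a sketch of the jump idea in JavaScript. The name longestBlankRun is illustrative, and only the space character counts as a blank. With K the best run found so far, it probes the position K places past the last known non-blank. If that position is not a blank, the whole window is skipped unread. Otherwise it scans back and forward to measure the run there, and it stops once fewer than K + 1 characters remain.

```javascript
// Illustrative name: longest run of spaces, skipping windows that cannot win.
function longestBlankRun(s) {
  const n = s.length;
  let best = 0; // K: the longest run found so far
  let i = 0; // s[i - 1] is not a blank (or i === 0)
  while (i + best < n) {
    const p = i + best; // any longer run starting in [i, p] must cover p
    if (s[p] !== " ") {
      i = p + 1; // skip the window without reading its interior
      continue;
    }
    let start = p; // scan back to the start of this run
    while (start > i && s[start - 1] === " ") start--;
    let end = p + 1; // scan forward to the end of this run
    while (end < n && s[end] === " ") end++;
    best = Math.max(best, end - start);
    i = end + 1; // s[end] is not a blank (or end === n)
  }
  return best;
}

console.log(longestBlankRun("a  b   c")); // 3
```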
Edit:
Another idea would be to apply a kind of binary search. It works as follows: for a given k, you write a procedure that checks whether there is a sequence of spaces of length >= k. This can be done in O(n/k) steps. Then you find the maximal k with binary search.
Edit:
During the subsequent searches, you can use the knowledge that a sequence of some length k already exists, and start skipping by k from the very beginning.
Whatever you do, the worst case will always be Θ(n) if the blanks are in the last part of the string (or the last "checked" part of the string).
