Algorithm to locate unbalanced parentheses in a string - algorithm

PostScript/PDF string literals are surrounded by parentheses, and are allowed to contain unescaped parentheses as long as the parentheses are fully balanced. So for instance
( () ) % valid string constant
( ( ) % invalid string constant, the inner ( should be escaped
I know an algorithm to tell me if there are any unbalanced parentheses in a string; what I'm looking for is an algorithm that will locate a minimal set of parentheses that are unbalanced, so that I can then stick backslashes in front of them to make the whole a valid string literal. More examples:
( ⟶ \(
() ⟶ ()
(() ⟶ \(() or (\()
()) ⟶ ()\) or (\))
()( ⟶ ()\(

A modification of the standard stack based algorithm to detect imbalanced parenthesis should work for you. Here's some pseudo code:
void find_unbalaned_indices(string input)
{
// initialize 'stack' containing of ints representing index at
// which a lparen ( was seen
stack<int index> = NIL
for (i=0 to input.size())
{
// Lparen. push into the stack
if (input[i] == '(')
{
// saw ( at index=i
stack.push(i);
}
else if (input[i] == ')')
{
out = stack.pop();
if (out == NIL)
{
// stack was empty. Imbalanced RParen.
// index=i needs to be escaped
...
}
// otherwise, this rparen has a balanced lparen.
// nothing to do.
}
}
// check if we have any imbalanced lparens
while (stack.size() != 0)
{
out = stack.pop();
// out is imbalanced
// index = out.index needs to be escaped.
}
}
Hope this helps.

def escape(s):
return ''.join(r(')(', r('()', s)))
def r(parens, chars):
return reversed(list(escape_oneway(parens, chars)))
def escape_oneway(parens, chars):
"""Given a sequence of characters (possibly already escaped),
escape those close-parens without a matching open-paren."""
depth = 0
for x in chars:
if x == parens[0]:
depth += 1
if x == parens[1]:
if depth == 0:
yield '\\' + x
continue
else:
depth -= 1
yield x

Related

Find the minimum number of edits to balance parentheses?

I was very confused about this question. I know about finding the edit distance between 2 strings using recursion and dynamic programming as an improvement, however am confused about how to go with this one.
Not sure if my thinking is correct. But we have a string of parenthesis which is unbalanced say
String s = "((())))";
How to find the String with balanced Parenthesis which requires minimum number of edits ?
Can some one explain this with an example ?
I am still not sure if I am explaining it correctly.
Given a string consisting of left and right parentheses, we are asked to balance it by performing a minimal number of delete, insert, and replace operations.
To begin with, let's look at the input string and distinguish matched pairs from unmatched characters. We can mark all the characters belonging to matched pairs by executing the following algorithm:
Find an unmarked '(' that is followed by an unmarked ')', with zero or more marked characters between the two.
If there is no such pair of characters, terminate the algorithm.
Otherwise, mark the '(' and the ')'.
Return to step 1.
The marked pairs are already balanced at zero cost, so the optimal course of action is to do nothing further with them.
Now let's consider the unmarked characters. Notice that no unmarked '(' is followed by an unmarked ')', or else the pair would have been marked. Therefore, if we scan the unmarked characters from left to right, we will find zero or more ')' characters followed by zero or more '(' characters.
To balance the sequence of ')' characters, it is optimal to rewrite every other one to '(', starting with the first one and excluding the last one. If there is an odd number of ')' characters, it is optimal to delete the last one.
As for the sequence of '(' characters, it is optimal to rewrite every other one to ')', starting with the second one. If there is a leftover '(' character, we delete it.
The following Python code implements the steps described above and displays the intermediate results.
def balance(s): # s is a string of '(' and ')' characters in any order
n = len(s)
print('original string: %s' % s)
# Mark all matched pairs
marked = n * [ False ]
left_parentheses = []
for i, ch in enumerate(s):
if ch == '(':
left_parentheses.append(i)
else:
if len(left_parentheses) != 0:
marked[i] = True
marked[left_parentheses.pop()] = True
# Display the matched pairs and unmatched characters.
matched, remaining = [], []
for i, ch in enumerate(s):
if marked[i]:
matched.append(ch)
remaining.append(' ')
else:
matched.append(' ')
remaining.append(ch)
print(' matched pairs: %s' % ''.join(matched))
print(' unmatched: %s' % ''.join(remaining))
cost = 0
deleted = n * [ False ]
new_chars = list(s)
# Balance the unmatched ')' characters.
right_count, last_right = 0, -1
for i, ch in enumerate(s):
if not marked[i] and ch == ')':
right_count += 1
if right_count % 2 == 1:
new_chars[i] = '('
cost += 1
last_right = i
if right_count % 2 == 1: # Delete the last ')' if we couldn't match it.
deleted[last_right] = True # The cost was incremented during replacement.
# Balance the unmatched '(' characters.
left_count, last_left = 0, -1
for i, ch in enumerate(s):
if not marked[i] and ch == '(':
left_count += 1
if left_count % 2 == 0:
new_chars[i] = ')'
cost += 1
else:
last_left = i
if left_count % 2 == 1: # Delete the last '(' if we couldn't match it.
deleted[last_left] = True # This character wasn't replaced, so we must
cost += 1 # increment the cost now.
# Display the outcome of replacing and deleting.
balanced = []
for i, ch in enumerate(new_chars):
if marked[i] or deleted[i]:
balanced.append(' ')
else:
balanced.append(ch)
print(' balance: %s' % ''.join(balanced))
# Display the cost of balancing and the overall balanced string.
print(' cost: %d' % cost)
result = []
for i, ch in enumerate(new_chars):
if not deleted[i]: # Skip deleted characters.
result.append(ch)
print(' new string: %s' % ''.join(result))
balance(')()(()())))()((())((')
For the test case ')()(()())))()((())((', the output is as follows.
original string: )()(()())))()((())((
matched pairs: ()(()()) () (())
unmatched: ) )) ( ((
balance: ( ) ( )
cost: 4
new string: (()(()()))()((()))
The idea is simple:
Find final string having left over open and close brackets which couldn't make pair. Remember that in this final string, close brackets will be present 1st and then open brackets.
Now we will have to edit open brackets and close brackets separately.
eg: for close brackets:
(1) if it is of even length:
min edit to balance will be to change half close brackets to open brackets.
So minEdit = closeBracketCount/2 .
(2) If it is of odd length:
min edit to balance will be to do above step 1 and remove the remaining 1 bracket.
So minEdit = closeBracketCount/2 + 1
For open brackets:
(1) if it is of even length:
min edit to balance will be to change half open brackets to close brackets.
So minEdit = openBracketCount/2.
(2) If it is of odd length:
min edit to balance will be to do above step 1 and remove the remaining 1 bracket.
So minEdit = openBracketCount/2 + 1
Here is the running code: http://codeshare.io/bX1Dt
Let me know your thoughts.
While this interesting problem can be solved with dynamic programming as mentioned in the comments, there exists an easier solution to it. You can solve it with the greedy algorithm.
Idea for this greedy algorithm comes from how we check the validity of parentheses expression. You set counter to 0 and traverse the parentheses string, add 1 at "(" and substract 1 at ")". If counter always stays above or at 0 and finishes at 0, you have a valid string.
This implies that if the lowest value that we encountered while traversing is -maxi, we need to add exactly -maxi "(" at the start. Adjust final counter value for added "(" and add enough ")" at the end to finish at 0.
Here is the pseudo-code for the algorithm:
counter = 0
mini = 0
for each p in string:
if p == "(":
counter++
else:
counter--
mini = min(counter, mini)
add -mini "(" at the start of the string
counter -= mini
add counter ")" at the end of the string
I tired to solve the problem with DP algorithm and it passed a few test cases made up by myself. Let me know if you think it's correct.
Let P(i,j) be the minimum number of edits to make string S[i..j] balanced.
When S[i] equals S[j], the number of minimum edits is obviously P(i+1,j-1)
There are a few options to make the string balanced when S[i] != S[j], but in the end we could either add '(' to the front of i or ')' at the end of j, or remove the parenthesis at i or j. In all these cases, the minimum number of edits is min{P(i+1, j), P(i, j-1)} + 1.
We therefore have below DP formula:
P(i,j) = 0 if i > j
= P(i + 1, j - 1) if S[i] matches S[j] OR S[i] and S[j] are not parenthesis
= min{P(i + 1, j), P(i, j - 1)} + 1
I would use stack to balance them efficiently. Here is python code:
a=['(((((','a(b)c)','((())))',')()(()())))()((())((']
def balance(s):
st=[]
l=len(s)
i=0
while i<l:
if s[i]=='(':
st.append(i)
elif s[i]==')':
if st:
st.pop()
else:
del s[i]
i-=1
l-=1
i+=1
while st:
del s[st.pop()]
return ''.join(s)
for i in a:
print balance(list(i))
Output:
Empty
a(b)c
((()))
()(()())()(())
//fisher
public int minInsertions(String s) {
Stack<Character> stack = new Stack<>();
int insertionsNeeded = 0;
for (int i = 0; i < s.length(); i++) {
char c = s.charAt(i);
if (c == '(') {
if (stack.isEmpty()) {
stack.add(c);
} else {
if (stack.peek() == ')') {
//in this case, we need to add one more ')' to get two consecutive right paren, then we could pop the one ')' and one '(' off the stack
insertionsNeeded++;
stack.pop();
stack.pop();
stack.add(c);
} else {
stack.add(c);
}
}
} else if (c == ')') {
if (stack.isEmpty()) {
//in this case, we need to add one '(' before we add this ')' onto this stack
insertionsNeeded++;
stack.add('(');
stack.add(c);
} else {
if (stack.peek() == ')') {
//in this case, we could pop the one ')' and one '(' off the stack
stack.pop();
stack.pop();
} else {
stack.add(c);
}
}
}
}
if (stack.isEmpty()) {
return insertionsNeeded;
} else {
while (!stack.isEmpty()) {
char pop = stack.pop();
if (pop == '(') {
insertionsNeeded += 2;
} else {
insertionsNeeded++;
stack.pop();
}
}
return insertionsNeeded;
}
}
}

Algorithm to check matching parenthesis

This relates to the Coursera Scala course so I want to directly ask you NOT to give me the answer to the problem, but rather to help me debug why something is happening, as a direct answer would violate the Coursera honor code.
I have the following code:
def balance(chars: List[Char]): Boolean = {
val x = 0
def loop(list: List[Char]): Boolean = {
println(list)
if (list.isEmpty) if(x == 0) true
else if (list.head == '(') pushToStack(list.tail)
else if (list.head == ')') if(x <= 0) false else decreaseStack(list.tail)
else loop(list.tail)
true
}
def pushToStack(myList: List[Char]) { x + 1; loop(myList)}
def decreaseStack(myList: List[Char]) { x - 1; loop(myList)}
loop(chars)
}
A simple explanation:
If the code sees a "(" then it adds 1 to a variable. If it sees a ")" then it first checks whether the variable is equal to or smaller than 0. If this is the case, it returns false. If the value is bigger than 0 then it simply decreases one from the variable.
I have tried running the following:
if(balance("This is surely bad :-( ) (".toList)) println("balanced") else println("not balanced");
Clearly this is not balanced, but my code is returning balanced.
Again: I am not asking for help in writing this program, but rather help in explained why the code is returning "balanced" when clearly the string is not balanced
--EDIT--
def balance(chars: List[Char]): Boolean = {
val temp = 0;
def loop(list: List[Char], number: Int): Boolean = {
println(list)
if (list.isEmpty) if(number == 0) true
else if (list.head == '(') loop(list.tail, number + 1)
else if (list.head == ')') if(number <= 0) false else loop(list.tail, number - 1)
else loop(list.tail,number)
true
}
loop(chars,0)
}
^^ Still prints out balanced
You are using an immutable x when you really want a mutable x.
Here, let me rewrite it for you in a tail recursive style to show you what you're actually doing:
#tailrec def loop(myList: List[Char], cur: Int = 0): Boolean = myList match{
case "(" :: xs =>
val tempINeverUse = cur+1
loop(xs, cur) //I passed in 0 without ever changing "cur"!
case ")" :: xs if cur < 0 => false //This is a bug, regardless if you fix the rest of it
case ")" :: xs =>
val tempINeverUse = cur-1
loop(xs, cur) //Passed in 0 again!
case x :: xs => loop(xs, cur)
case Nil => cur == 0 //Since I've never changed it, it will be 0.
}
You need to keep a context of parenthesis in comments or in quotes as well. You can use a counter to achieve that. If the counter is set for a comment or a double quote then ignore any parenthesis that comes your way. Reset the counter whenever you find a finishing comment or double quote

Remove redundant parentheses from an arithmetic expression

This is an interview question, for which I did not find any satisfactory answers on stackoverflow or outside. Problem statement:
Given an arithmetic expression, remove redundant parentheses. E.g.
((a*b)+c) should become a*b+c
I can think of an obvious way of converting the infix expression to post fix and converting it back to infix - but is there a better way to do this?
A pair of parentheses is necessary if and only if they enclose an unparenthesized expression of the form X % X % ... % X where X are either parenthesized expressions or atoms, and % are binary operators, and if at least one of the operators % has lower precedence than an operator attached directly to the parenthesized expression on either side of it; or if it is the whole expression. So e.g. in
q * (a * b * c * d) + c
the surrounding operators are {+, *} and the lowest precedence operator inside the parentheses is *, so the parentheses are unnecessary. On the other hand, in
q * (a * b + c * d) + c
there is a lower precedence operator + inside the parentheses than the surrounding operator *, so they are necessary. However, in
z * q + (a * b + c * d) + c
the parentheses are not necessary because the outer * is not attached to the parenthesized expression.
Why this is true is that if all the operators inside an expression (X % X % ... % X) have higher priority than a surrounding operator, then the inner operators are anyway calculated out first even if the parentheses are removed.
So, you can check any pair of matching parentheses directly for redundancy by this algorithm:
Let L be operator immediately left of the left parenthesis, or nil
Let R be operator immediately right of the right parenthesis, or nil
If L is nil and R is nil:
Redundant
Else:
Scan the unparenthesized operators between the parentheses
Let X be the lowest priority operator
If X has lower priority than L or R:
Not redundant
Else:
Redundant
You can iterate this, removing redundant pairs until all remaining pairs are non-redundant.
Example:
((a * b) + c * (e + f))
(Processing pairs from left to right):
((a * b) + c * (e + f)) L = nil R = nil --> Redundant
^ ^
(a * b) + c * (e + f) L = nil R = nil --> Redundant
^ ^ L = nil R = + X = * --> Redundant
a * b + c * (e + f) L = * R = nil X = + --> Not redundant
^ ^
Final result:
a * b + c * (e + f)
I just figured out an answer:
the premises are:
1. the expression has been tokenized
2. no syntax error
3. there are only binary operators
input:
list of the tokens, for example:
(, (, a, *, b, ), +, c, )
output:
set of the redundant parentheses pairs (the orders of the pairs are not important),
for example,
0, 8
1, 5
please be aware of that : the set is not unique, for instance, ((a+b))*c, we can remove outer parentheses or inner one, but the final expression is unique
the data structure:
a stack, each item records information in each parenthese pair
the struct is:
left_pa: records the position of the left parenthese
min_op: records the operator in the parentheses with minimum priority
left_op: records current operator
the algorithm
1.push one empty item in the stack
2.scan the token list
2.1 if the token is operand, ignore
2.2 if the token is operator, records the operator in the left_op,
if min_op is nil, set the min_op = this operator, if the min_op
is not nil, compare the min_op with this operator, set min_op as
one of the two operators with less priority
2.3 if the token is left parenthese, push one item in the stack,
with left_pa = position of the parenthese
2.4 if the token is right parenthese,
2.4.1 we have the pair of the parentheses(left_pa and the
right parenthese)
2.4.2 pop the item
2.4.3 pre-read next token, if it is an operator, set it
as right operator
2.4.4 compare min_op of the item with left_op and right operator
(if any of them exists), we can easily get to know if the pair
of the parentheses is redundant, and output it(if the min_op
< any of left_op and right operator, the parentheses are necessary,
if min_op = left_op, the parentheses are necessary, otherwise
redundant)
2.4.5 if there is no left_op and no right operator(which also means
min_op = nil) and the stack is not empty, set the min_op of top
item as the min_op of the popped-up item
examples
example one
((a*b)+c)
after scanning to b, we have stack:
index left_pa min_op left_op
0
1 0
2 1 * * <-stack top
now we meet the first ')'(at pos 5), we pop the item
left_pa = 1
min_op = *
left_op = *
and pre-read operator '+', since min_op priority '*' > '+', so the pair(1,5) is redundant, so output it.
then scan till we meet last ')', at the moment, we have stack
index left_pa min_op left_op
0
1 0 + +
we pop this item(since we meet ')' at pos 8), and pre-read next operator, since there is no operator and at index 0, there is no left_op, so output the pair(0, 8)
example two
a*(b+c)
when we meet the ')', the stack is like:
index left_pa min_op left_op
0 * *
1 2 + +
now, we pop the item at index = 1, compare the min_op '+' with the left_op '*' at index 0, we can find out the '(',')' are necessary
This solutions works if the expression is a valid. We need mapping of the operators to priority values.
a. Traverse from two ends of the array to figure out matching parenthesis from both ends.
Let the indexes be i and j respectively.
b. Now traverse from i to j and find out the lowest precedence operator which is not contained inside any parentheses.
c. Compare the priority of this operator with the operators to left of open parenthesis and right of closing parenthesis. If no such operator exists, treat its priority as -1. If the priority of the operator is higher than these two, remove the parenthesis at i and j.
d. Continue the steps a to c until i<=j.
Push one empty item in the stack
Scan the token list
2.1 if the token is operand, ignore.
2.2 if the token is operator, records the operator in the left_op,
if min_op is nil, set the min_op = this operator, if the min_op
is not nil, compare the min_op with this operator, set min_op as
one of the two operators with less priority.
2.3 if the token is left parenthese, push one item in the stack,
with left_pa = position of the parenthesis.
2.4 if the token is right parenthesis:
2.4.1 we have the pair of the parentheses(left_pa and the
right parenthesis)
2.4.2 pop the item
2.4.3 pre-read next token, if it is an operator, set it
as right operator
2.4.4 compare min_op of the item with left_op and right operator
(if any of them exists), we can easily get to know if the pair
of the parentheses is redundant, and output it(if the min_op
< any of left_op and right operator, the parentheses are necessary,
if min_op = left_op, the parentheses are necessary, otherwise
redundant)
2.4.5 if there is no left_op and no right operator(which also means
min_op = nil) and the stack is not empty, set the min_op of top
item as the min_op of the popped-up item
examples
The code below implements a straightforward solution. It is limited to +, -, *, and /, but it can be extended to handle other operators if needed.
#include <iostream>
#include <set>
#include <stack>
int loc;
std::string parser(std::string input, int _loc) {
std::set<char> support = {'+', '-', '*', '/'};
std::string expi;
std::set<char> op;
loc = _loc;
while (true) {
if (input[loc] == '(') {
expi += parser(input, loc + 1);
} else if (input[loc] == ')') {
if ((input[loc + 1] != '*') && (input[loc + 1] != '/')) {
return expi;
} else {
if ((op.find('+') == op.end()) && (op.find('-') == op.end())) {
return expi;
} else {
return '(' + expi + ')';
}
}
} else {
char temp = input[loc];
expi = expi + temp;
if (support.find(temp) != support.end()) {
op.insert(temp);
}
}
loc++;
if (loc >= input.size()) {
break;
}
}
return expi;
}
int main() {
std::string input("(((a)+((b*c)))+(d*(f*g)))");
std::cout << parser(input, 0);
return 0;
}
I coded it previously in https://calculation-test.211368e.repl.co/trim.html. This doesn't have some errors in other answers.
(6 / (-2454) ** (((234)))) + (-5435) --> 6 / (-2454) ** 234 + (-5435)
const format = expression => {
var change = [], result = expression.replace(/ /g, "").replace(/\*\*/g, "^"), _count;
function replace(index, string){result = result.slice(0, index) + string + result.slice(index + 1)}
function add(index, string){result = result.slice(0, index) + string + result.slice(index)}
for (var count = 0; count < result.length; count++){
if (result[count] == "-"){
if ("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890)".includes(result[count - 1])){
change.push(count);
}else if (result[count - 1] != "("){
add(count, "(");
count++;
_count = count + 1;
while ("1234567890.".includes(result[_count])) _count++;
if (_count < result.length - 1){
add(_count, ")");
}else{
add(_count + 2, ")");
}
}
}
}
change = change.sort(function(a, b){return a - b});
const len = change.length;
for (var count = 0; count < len; count++){replace(change[0] + count * 2, " - "); change.shift()}
return result.replace(/\*/g, " * ").replace(/\^/g, " ** ").replace(/\//g, " / ").replace(/\+/g, " + ");
}
const trim = expression => {
var result = format(expression).replace(/ /g, "").replace(/\*\*/g, "^"), deleting = [];
const brackets = bracket_pairs(result);
function bracket_pairs(){
function findcbracket(str, pos){
const rExp = /\(|\)/g;
rExp.lastIndex = pos + 1;
var depth = 1;
while ((pos = rExp.exec(str))) if (!(depth += str[pos.index] == "(" ? 1 : -1 )) {return pos.index}
}
function occurences(searchStr, str){
var startIndex = 0, index, indices = [];
while ((index = str.indexOf(searchStr, startIndex)) > -1){
indices.push(index);
startIndex = index + 1;
}
return indices;
}
const obrackets = occurences("(", result);
var cbrackets = [];
for (var count = 0; count < obrackets.length; count++) cbrackets.push(findcbracket(result, obrackets[count]));
return obrackets.map((e, i) => [e, cbrackets[i]]);
}
function remove(deleting){
function _remove(index){result = result.slice(0, index) + result.slice(index + 1)}
const len = deleting.length;
var deleting = deleting.sort(function(a, b){return a - b});
for (var count = 0; count < len; count++){
_remove(deleting[0] - count);
deleting.shift()
}
}
function precedence(operator, position){
if (!"^/*-+".includes(operator)) return "^/*-+";
if (position == "l" || position == "w") return {"^": "^", "/": "^", "*": "^/*", "-": "^/*", "+": "^/*-+"}[operator];
if (position == "r") return {"^": "^", "/": "^/*", "*": "^/*", "-": "^/*-+", "+": "^/*-+"}[operator];
}
function strip_bracket(string){
var result = "", level = 0;
for (var count = 0; count < string.length; count++){
if (string.charAt(count) == "(") level++;
if (level == 0) result += string.charAt(count);
if (string.charAt(count) == ")") level--;
}
return result.replace(/\s{2,}/g, " ");
}
for (var count = 0; count < brackets.length; count++){
const pair = brackets[count];
if (result[pair[0] - 1] == "(" && result[pair[1] + 1] == ")"){
deleting.push(...pair);
}else{
const left = precedence(result[pair[0] - 1], "l"), right = precedence(result[pair[1] + 1], "r");
var contents = strip_bracket(result.slice(pair[0] + 1, pair[1])), within = "+";
for (var _count = 0; _count < contents.length; _count++) if (precedence(contents[_count], "w").length < precedence(within, "w").length) within = contents[_count];
if (/^[0-9]+$/g.test(contents) || contents == ""){
deleting.push(...pair);
continue;
}
if (left.includes(within) && right.includes(within)){
if (!isNaN(result.slice(pair[0] + 1, pair[1]))){
if (Number(result.slice(pair[0] + 1, pair[1])) >= 0 && !"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890".includes(result[pair[0] - 1])) deleting.push(...pair);
}else if (!"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890".includes(result[pair[0] - 1])) deleting.push(...pair);
}
}
}
remove(deleting);
result = format(result);
return result;
}
<input id="input">
<button onclick="document.getElementById('result').innerHTML = trim(document.getElementById('input').value)">Remove and format</button>
<div id="result"></div>
I think that you are looking for kind of algorithm as seen in the following photo.
This algorithm is "almost" ready, since a lot of bugs arise once the more complex it becomes, the more complicated it gets. The way I work on this thing, is 'build-and-write-code-on-the-fly', which means that for up to 4 parentheses, things are easy. But after the expression goes more complex, there are things that I cannot predict while writing down thoughts on paper. And there comes the compiler to tell me what to correct. It would not be a lie if I state that it is not me to have written the algorithm, but the (C#) compiler instead! So far, it took me 1400 lines. It is not that the commands were difficult to write. It was their arrangement that was a real puzzle. This program you are looking for, is characterized by a really high grade of complexity. Well, if you need any primary ideas, please let me know and I will reply. Thanx!
Algorithm

Algorithm to determine if a range contains a case-insensitive search phrase?

Let's say you have thousands of files organized in the following way: First you sort them by their filename (case sensitive, so that upper case files come before lower case), then you grouped them into folders that contain the name of the first and the last file in that folder. E.g., the folders may look like:
Abel -> Cain
Camel -> Sloth
Stork -> basket
basking -> sleuth
tiger -> zebra
Now, given a case-insensitive search string s, determine which folders that can contain a file that matches s. You cannot and do not have to look inside a folder - the file does not actually have to exist.
Some examples:
("Abel", "Cain") matches s = "blue", since it contains "Blue"
("Stork", "basket") matches s = "arctic", since it contains "arctic"
("FA", "Fb") matches s = "foo", since it contains "FOo"
("Fa", "Fb") does NOT match s = "foo"
Formally: Given a closed range [a,b] and a lower case string s, determine if there's any string c in [a,b] such that lower(c) = s.
My first hunch was to do a case-insensitive search against the bounds of the range. But it can be easily seen from the last example that this is not correct.
A bruce-force solution is to generate all potential file names. For example, the input string "abc" would produce the candidates "ABC", "ABc", "AbC", "Abc", "aBC", "aBc", "abC", "abc". Then you just test each against the bounds. An example of this brute-force solution will follow below. This is O(2^n) though.
My question is if there's an algorithm for this that is both fast and correct?
Brute-force solution in Clojure:
(defn range-contains
[first last string]
(and (<= (compare first string) 0)
(>= (compare last string) 0)))
(defn generate-cases
"Generates all lowercase/uppercase combinations of a word"
[string]
(if (empty? string)
[nil]
(for [head [(java.lang.Character/toUpperCase (first string))
(java.lang.Character/toLowerCase (first string))]
tail (generate-cases (rest string))]
(cons head tail))))
(defn range-contains-insensitive
[first last string]
(let [f (fn [acc candidate] (or acc (range-contains first last (apply str candidate))))]
(reduce f false (generate-cases string))))
(fact "Range overlapping case insensitive"
(range-contains-insensitive "A" "Z" "g") => true
(range-contains-insensitive "FA" "Fa" "foo") => true
(range-contains-insensitive "b" "z" "a") => false
(range-contains-insensitive "B" "z" "a") => true)
I think that instead of creating all the upper-lower case combinations, this can be solved by checking upper, then lower for each character separately, which changes 2^N into 2N.
The idea is the following:
keep "lowdone" and "highdone" flags, which indicate whether s can definitely come after the low limit while still potentially coming before the high limit, and vice versa
go character by character through the string
check if the uppercase version of the current letter can come after the corresponding low limit letter while at the same time coming before the high limit letter, then check the same for the lowercase version of the letter, if neither letter satisfies both conditions, return false (don't check low limit if "lowdone" is true, don't check high limit if "highdone" is true - when comparing ABC and ACA, once we are past the second letter, we don't care about the third letter)
if a case satisfies both conditions, check if it comes strictly after the low limit letter or the low limit is too short to have a corresponding letter, if so, lowdone = true
analogous for highdone = true
Does this sound good? Code in C# (could probably be written more concisely):
public Bracket(string l, string u)
{
Low = l;
High = u;
}
public bool IsMatch(string s)
{
string su = s.ToUpper();
string sl = s.ToLower();
bool lowdone = false;
bool highdone = false;
for (int i = 0; i < s.Length; i++)
{
char[] c = new char[]{su[i], sl[i]};
bool possible = false;
bool ld = lowdone;
bool hd = highdone;
for (int j = 0; j < 2; j++)
{
if ((lowdone || i >= Low.Length || c[j] >= Low[i]) && (highdone || i >= High.Length || c[j] <= High[i]))
{
if (i >= Low.Length || c[j] > Low[i])
ld = true;
if (i >= High.Length || c[j] < High[i])
hd = true;
possible = true;
}
}
lowdone = ld;
highdone = hd;
if (!possible)
return false;
}
if (!lowdone && Low.Length > s.Length)
return false;
return true;
}
}
In the spirit of full disclosure, I guess I should also add the algorithm I came up with (Java, uses Guava):
public static boolean inRange(String search, String first, String last) {
int len = search.length();
if (len == 0) {
return true;
}
char low = Strings.padEnd(first, len, (char) 0).charAt(0);
char high = Strings.padEnd(last, len, (char) 0).charAt(0);
char capital = Character.toLowerCase(search.charAt(0));
char small = Character.toUpperCase(search.charAt(0));
if (low == high) {
if (capital == low || small == low) {
// All letters equal - remove first letter and restart
return inRange(search.substring(1), first.substring(1), last.substring(1));
}
return false;
}
if (containsAny(Ranges.open(low, high), capital, small)) {
return true; // Definitely inside
}
if (!containsAny(Ranges.closed(low, high), capital, small)) {
return false; // Definitely outside
}
// Edge case - we are on a bound and the bounds are different
if (capital == low || small == low) {
return Ranges.atLeast(first.substring(1)).contains(search.substring(1).toLowerCase());
}
else {
return Ranges.lessThan(last.substring(1)).contains(search.substring(1).toUpperCase());
}
}
private static <T extends Comparable<T>> boolean containsAny(Range<T> range, T value1, T value2) {
return range.contains(value1) || range.contains(value2);
}

Distributed algorithm to compute the balance of the parentheses

This is an interview question: "How to build a distributed algorithm to compute the balance of the parentheses ?"
Usually he balance algorithm scans a string form left to right and uses a stack to make sure that the number of open parentheses always >= the number of close parentheses and finally the number of open parentheses == the number of close parentheses.
How would you make it distributed ?
You can break the string into chunks and process each separately, assuming you can read and send to the other machines in parallel. You need two numbers for each string.
The minimum nesting depth achieved relative to the start of the string.
The total gain or loss in nesting depth across the whole string.
With these values, you can compute the values for the concatenation of many chunks as follows:
minNest = 0
totGain = 0
for p in chunkResults
minNest = min(minNest, totGain + p.minNest)
totGain += p.totGain
return new ChunkResult(minNest, totGain)
The parentheses are matched if the final values of totGain and minNest are zero.
I would apply the map-reduce algorithm in which the map function would compute a part of the string return either an empty string if parentheses are balanced or a string with the last parenthesis remaining.
Then the reduce function would concatenate the result of two returned strings by map function and compute it again returning the same result than map. At the end of all computations, you'd either obtain an empty string or a string containing the un-balanced parenthesis.
I'll try to have a more detailed explain on #jonderry's answer. Code first, in Scala
def parBalance(chars: Array[Char], chunkSize: Int): Boolean = {
require(chunkSize > 0, "chunkSize must be greater than 0")
def traverse(from: Int, until: Int): (Int, Int) = {
var count = 0
var stack = 0
var nest = 0
for (n <- from until until) {
val cur = chars(c)
if (cur == '(') {
count += 1
stack += 1
}
else if (cur == ')') {
count -= 1
if (stack > 0) stack -= 1
else nest -= 1
}
}
(nest, count)
}
def reduce(from: Int, until: Int): (Int, Int) = {
val m = (until + from) / 2
if (until - from <= chunkSize) {
traverse(from, until)
} else {
parallel(reduce(from, m), reduce(m, until)) match {
case ((minNestL, totGainL), (minNestR, totGainR)) => {
((minNestL min (minNestR + totGainL)), (totGainL + totGainR))
}
}
}
}
reduce(0, chars.length) == (0,0)
}
Given a string, if we remove balanced parentheses, what's left will be in a form )))(((, give n for number of ) and m for number of (, then m >= 0, n <= 0(for easier calculation). Here n is minNest and m+n is totGain. To make a true balanced string, we need m+n == 0 && n == 0.
In a parallel operation, how to we derive those for node from it's left and right? For totGain we just needs to add them up. When calculating n for node, it can just be n(left) if n(right) not contribute or n(right) + left.totGain whichever is smaller.

Resources