Remove redundant parentheses from an arithmetic expression - algorithm

This is an interview question for which I did not find any satisfactory answers, on Stack Overflow or elsewhere. Problem statement:
Given an arithmetic expression, remove redundant parentheses. E.g.
((a*b)+c) should become a*b+c
I can think of an obvious way: convert the infix expression to postfix and then convert it back to infix. But is there a better way to do this?

A pair of parentheses is necessary if and only if it encloses an unparenthesized expression of the form X % X % ... % X, where each X is either a parenthesized expression or an atom and each % is a binary operator, and at least one of the operators % has lower precedence than an operator attached directly to the parenthesized expression on either side of it. A pair that encloses the whole expression has no such neighbouring operator, so it is always redundant. So, e.g., in
q * (a * b * c * d) + c
the surrounding operators are {+, *} and the lowest precedence operator inside the parentheses is *, so the parentheses are unnecessary. On the other hand, in
q * (a * b + c * d) + c
there is a lower precedence operator + inside the parentheses than the surrounding operator *, so they are necessary. However, in
z * q + (a * b + c * d) + c
the parentheses are not necessary because the outer * is not attached to the parenthesized expression.
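This holds because if every operator inside an expression (X % X % ... % X) has precedence at least as high as the surrounding operators, the inner operators are evaluated first anyway, even with the parentheses removed. (This assumes associative operators such as + and *. With - or /, an operator of equal precedence on the left still needs the parentheses: a - (b + c) is not a - b + c.)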
So, you can check any pair of matching parentheses directly for redundancy with this algorithm:
    Let L be the operator immediately left of the left parenthesis, or nil
    Let R be the operator immediately right of the right parenthesis, or nil
    If L is nil and R is nil:
        Redundant
    Else:
        Scan the unparenthesized operators between the parentheses
        Let X be the lowest-precedence operator
        If X has lower precedence than L or R:
            Not redundant
        Else:
            Redundant
You can iterate this, removing redundant pairs until all remaining pairs are non-redundant.
Example:
((a * b) + c * (e + f))
(Processing pairs from left to right):
((a * b) + c * (e + f))    outer pair:   L = nil, R = nil         --> Redundant
(a * b) + c * (e + f)      pair (a * b): L = nil, R = +, X = *    --> Redundant
a * b + c * (e + f)        pair (e + f): L = *, R = nil, X = +    --> Not redundant
Final result:
a * b + c * (e + f)
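The check-and-iterate procedure above can be sketched in Python. This is a minimal sketch under stated assumptions, not the answerer's code: tokens are single characters, and only the associative operators + and * are supported, because the precedence-only rule is not safe for - or /. The names PREC, matching_pairs, is_redundant and remove_redundant are hypothetical.

```python
# Assumption: only the associative operators + and *, so equal precedence
# on either side never makes parentheses necessary.
PREC = {'+': 1, '*': 2}


def matching_pairs(tokens):
    """Return (open, close) index pairs, ordered by opening position."""
    stack, pairs = [], []
    for i, t in enumerate(tokens):
        if t == '(':
            stack.append(i)
        elif t == ')':
            pairs.append((stack.pop(), i))
    return sorted(pairs)


def is_redundant(tokens, lo, hi):
    # L and R: operators directly outside the pair, or None ("nil").
    left = PREC.get(tokens[lo - 1]) if lo > 0 else None
    right = PREC.get(tokens[hi + 1]) if hi + 1 < len(tokens) else None
    if left is None and right is None:
        return True
    # X: lowest-precedence operator not nested in inner parentheses.
    depth, lowest = 0, None
    for t in tokens[lo + 1:hi]:
        if t == '(':
            depth += 1
        elif t == ')':
            depth -= 1
        elif depth == 0 and t in PREC:
            lowest = PREC[t] if lowest is None else min(lowest, PREC[t])
    if lowest is None:
        return True  # only an atom or an already parenthesized group inside
    return all(lowest >= p for p in (left, right) if p is not None)


def remove_redundant(expr):
    tokens = [c for c in expr if not c.isspace()]
    changed = True
    while changed:  # iterate until every remaining pair is necessary
        changed = False
        for lo, hi in matching_pairs(tokens):
            if is_redundant(tokens, lo, hi):
                del tokens[hi]
                del tokens[lo]
                changed = True
                break  # indices shifted; recompute the pairs
    return ''.join(tokens)
```

For example, remove_redundant("((a*b)+c*(e+f))") follows the same three steps as the worked example. Recomputing all pairs after each removal is quadratic or worse, which is fine for a sketch but not for long inputs.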

I just figured out an answer.
The premises are:
1. the expression has been tokenized
2. there are no syntax errors
3. there are only binary operators
Input:
a list of the tokens, for example:
(, (, a, *, b, ), +, c, )
Output:
the set of redundant parenthesis pairs (the order of the pairs does not matter),
for example,
0, 8
1, 5
Be aware that the set is not unique: for instance, in ((a+b))*c we can remove either the outer or the inner parentheses, but the final expression is unique.
The data structure:
a stack; each item records information about one pair of parentheses.
Each item has the fields:
left_pa: the position of the left parenthesis
min_op: the operator inside the parentheses with the lowest priority
left_op: the most recent operator seen inside the parentheses
The algorithm:
1. Push one empty item onto the stack.
2. Scan the token list:
2.1 If the token is an operand, ignore it.
2.2 If the token is an operator, record it in left_op of the top item.
    If min_op is nil, set min_op to this operator; otherwise set min_op
    to whichever of the two operators has the lower priority.
2.3 If the token is a left parenthesis, push a new item onto the stack,
    with left_pa = the position of the parenthesis.
2.4 If the token is a right parenthesis:
    2.4.1 We have a pair of parentheses (left_pa and this right parenthesis).
    2.4.2 Pop the item.
    2.4.3 Peek at the next token; if it is an operator, take it as the right operator.
    2.4.4 Compare min_op of the popped item with left_op of the new top item
          and with the right operator (whichever of them exist) to decide
          whether the pair is redundant, and output it if so: if min_op has
          lower priority than either left_op or the right operator, the
          parentheses are necessary; if min_op has the same priority as
          left_op, the parentheses are necessary; otherwise they are redundant.
    2.4.5 If there is neither a left_op nor a right operator (which also means
          the top item's min_op is still nil) and the stack is not empty,
          set min_op of the top item to the min_op of the popped item.
Examples
Example one:
((a*b)+c)
After scanning up to b, the stack is:
index  left_pa  min_op  left_op
0
1      0
2      1        *       *        <- stack top
Now we meet the first ')' (at position 5), so we pop the item:
left_pa = 1
min_op = *
left_op = *
and peek at the next operator, '+'. Since min_op '*' has higher priority than '+', the pair (1, 5) is redundant, so we output it.
Then we scan until we meet the last ')'. At that moment the stack is:
index  left_pa  min_op  left_op
0
1      0        +       +
We pop this item (since we meet ')' at position 8) and peek at the next token. There is no next operator, and the item at index 0 has no left_op, so we output the pair (0, 8).
Example two:
a*(b+c)
When we meet the ')', the stack is:
index  left_pa  min_op  left_op
0               *       *
1      2        +       +
Now we pop the item at index 1 and compare its min_op '+' with the left_op '*' at index 0; '+' has lower priority, so the parentheses '(' and ')' are necessary.
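A minimal Python sketch of this single-pass stack algorithm follows. It assumes single-character tokens and the operators + - * / with the usual priorities, and it reads step 2.4.4's "min_op = left_op" as "same priority as left_op", which keeps a-(b+c) intact. The names PREC, redundant_pairs and remove_pairs are illustrative, not from the answer.

```python
# Assumption: usual priorities for the four basic operators.
PREC = {'+': 1, '-': 1, '*': 2, '/': 2}


def redundant_pairs(tokens):
    """Return (left, right) index pairs of redundant parentheses."""
    stack = [[None, None, None]]  # each item: [left_pa, min_op, left_op]
    pairs = []
    for pos, tok in enumerate(tokens):
        if tok in PREC:  # step 2.2
            top = stack[-1]
            top[2] = tok
            if top[1] is None or PREC[tok] < PREC[top[1]]:
                top[1] = tok
        elif tok == '(':  # step 2.3
            stack.append([pos, None, None])
        elif tok == ')':  # step 2.4
            left_pa, min_op, _ = stack.pop()
            left_op = stack[-1][2]
            nxt = tokens[pos + 1] if pos + 1 < len(tokens) else None
            right_op = nxt if nxt in PREC else None
            necessary = min_op is not None and (
                (left_op is not None and PREC[min_op] <= PREC[left_op])
                or (right_op is not None and PREC[min_op] < PREC[right_op]))
            if not necessary:
                pairs.append((left_pa, pos))
            if left_op is None and right_op is None:  # step 2.4.5
                stack[-1][1] = min_op
    return pairs


def remove_pairs(tokens, pairs):
    drop = {i for pair in pairs for i in pair}
    return ''.join(t for i, t in enumerate(tokens) if i not in drop)
```

With tokens = list("((a*b)+c)"), redundant_pairs(tokens) yields [(1, 5), (0, 8)], matching example one, and remove_pairs(tokens, ...) gives a*b+c.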

This solution works if the expression is valid. We need a mapping from operators to priority values.
a. Traverse the array from both ends to find matching parentheses, moving inward from both ends.
Let their indexes be i and j respectively.
b. Traverse from i to j and find the lowest-precedence operator that is not contained inside any nested parentheses.
c. Compare the priority of this operator with the operators to the left of the opening parenthesis and to the right of the closing parenthesis. If no such operator exists, treat its priority as -1. If the priority of the operator is higher than both of them, remove the parentheses at i and j.
d. Repeat steps a to c while i <= j.


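The code below implements a straightforward recursive solution. It is limited to +, -, *, and / and to single-character operands without spaces, but it can be extended to handle other operators if needed. A parenthesized group keeps its parentheses only if it contains + or - and sits next to * or / (or right after a -), or if it contains * or / and sits right after a /; otherwise its operators are merged into the enclosing level.
#include <iostream>
#include <set>
#include <string>

// Current read position, shared by all recursive calls.
std::size_t loc;

// Parses from input[_loc] up to the matching ')' (or the end of the input)
// and returns that text with redundant parentheses removed.
// leftOp: the operator just before the '(' that opened this group, or '\0'.
// parentOps: receives this group's operators if its parentheses are dropped.
std::string parser(const std::string& input, std::size_t _loc, char leftOp,
                   std::set<char>& parentOps) {
    const std::set<char> support = {'+', '-', '*', '/'};
    std::string expi;
    std::set<char> op;  // operators visible at this nesting level
    loc = _loc;
    while (loc < input.size()) {
        char c = input[loc];
        if (c == '(') {
            char before = expi.empty() ? '\0' : expi.back();
            expi += parser(input, loc + 1, before, op);
        } else if (c == ')') {
            char rightOp = loc + 1 < input.size() ? input[loc + 1] : '\0';
            bool hasAdd = op.count('+') || op.count('-');
            bool hasMul = op.count('*') || op.count('/');
            // + or - inside must stay grouped next to *, / or after a -;
            // * or / inside must stay grouped after a /.
            bool needed = (hasAdd && (leftOp == '*' || leftOp == '/' || leftOp == '-' ||
                                      rightOp == '*' || rightOp == '/')) ||
                          (hasMul && leftOp == '/');
            if (needed) {
                return '(' + expi + ')';
            }
            parentOps.insert(op.begin(), op.end());
            return expi;
        } else {
            expi += c;
            if (support.count(c)) {
                op.insert(c);
            }
        }
        loc++;
    }
    return expi;
}

int main() {
    std::string input("(((a)+((b*c)))+(d*(f*g)))");
    std::set<char> topOps;
    std::cout << parser(input, 0, '\0', topOps) << '\n';  // prints a+b*c+d*f*g
    return 0;
}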

I coded this earlier at https://calculation-test.211368e.repl.co/trim.html. It avoids some of the errors in the other answers, for example:
(6 / (-2454) ** (((234)))) + (-5435) --> 6 / (-2454) ** 234 + (-5435)
const format = expression => {
    var change = [], result = expression.replace(/ /g, "").replace(/\*\*/g, "^"), _count;
    function replace(index, string){result = result.slice(0, index) + string + result.slice(index + 1)}
    function add(index, string){result = result.slice(0, index) + string + result.slice(index)}
    for (var count = 0; count < result.length; count++){
        if (result[count] == "-"){
            if ("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890)".includes(result[count - 1])){
                change.push(count);
            }else if (result[count - 1] != "("){
                add(count, "(");
                count++;
                _count = count + 1;
                while ("1234567890.".includes(result[_count])) _count++;
                if (_count < result.length - 1){
                    add(_count, ")");
                }else{
                    add(_count + 2, ")");
                }
            }
        }
    }
    change = change.sort(function(a, b){return a - b});
    const len = change.length;
    for (var count = 0; count < len; count++){replace(change[0] + count * 2, " - "); change.shift()}
    return result.replace(/\*/g, " * ").replace(/\^/g, " ** ").replace(/\//g, " / ").replace(/\+/g, " + ");
}
const trim = expression => {
    var result = format(expression).replace(/ /g, "").replace(/\*\*/g, "^"), deleting = [];
    const brackets = bracket_pairs(result);
    function bracket_pairs(){
        function findcbracket(str, pos){
            const rExp = /\(|\)/g;
            rExp.lastIndex = pos + 1;
            var depth = 1;
            while ((pos = rExp.exec(str))) if (!(depth += str[pos.index] == "(" ? 1 : -1 )) {return pos.index}
        }
        function occurences(searchStr, str){
            var startIndex = 0, index, indices = [];
            while ((index = str.indexOf(searchStr, startIndex)) > -1){
                indices.push(index);
                startIndex = index + 1;
            }
            return indices;
        }
        const obrackets = occurences("(", result);
        var cbrackets = [];
        for (var count = 0; count < obrackets.length; count++) cbrackets.push(findcbracket(result, obrackets[count]));
        return obrackets.map((e, i) => [e, cbrackets[i]]);
    }
    function remove(deleting){
        function _remove(index){result = result.slice(0, index) + result.slice(index + 1)}
        const len = deleting.length;
        var deleting = deleting.sort(function(a, b){return a - b});
        for (var count = 0; count < len; count++){
            _remove(deleting[0] - count);
            deleting.shift()
        }
    }
    function precedence(operator, position){
        if (!"^/*-+".includes(operator)) return "^/*-+";
        if (position == "l" || position == "w") return {"^": "^", "/": "^", "*": "^/*", "-": "^/*", "+": "^/*-+"}[operator];
        if (position == "r") return {"^": "^", "/": "^/*", "*": "^/*", "-": "^/*-+", "+": "^/*-+"}[operator];
    }
    function strip_bracket(string){
        var result = "", level = 0;
        for (var count = 0; count < string.length; count++){
            if (string.charAt(count) == "(") level++;
            if (level == 0) result += string.charAt(count);
            if (string.charAt(count) == ")") level--;
        }
        return result.replace(/\s{2,}/g, " ");
    }
    for (var count = 0; count < brackets.length; count++){
        const pair = brackets[count];
        if (result[pair[0] - 1] == "(" && result[pair[1] + 1] == ")"){
            deleting.push(...pair);
        }else{
            const left = precedence(result[pair[0] - 1], "l"), right = precedence(result[pair[1] + 1], "r");
            var contents = strip_bracket(result.slice(pair[0] + 1, pair[1])), within = "+";
            for (var _count = 0; _count < contents.length; _count++) if (precedence(contents[_count], "w").length < precedence(within, "w").length) within = contents[_count];
            if (/^[0-9]+$/g.test(contents) || contents == ""){
                deleting.push(...pair);
                continue;
            }
            if (left.includes(within) && right.includes(within)){
                if (!isNaN(result.slice(pair[0] + 1, pair[1]))){
                    if (Number(result.slice(pair[0] + 1, pair[1])) >= 0 && !"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890".includes(result[pair[0] - 1])) deleting.push(...pair);
                }else if (!"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890".includes(result[pair[0] - 1])) deleting.push(...pair);
            }
        }
    }
    remove(deleting);
    result = format(result);
    return result;
}
<input id="input">
<button onclick="document.getElementById('result').innerHTML = trim(document.getElementById('input').value)">Remove and format</button>
<div id="result"></div>

I think you are looking for the kind of algorithm shown in the following image.
This algorithm is "almost" ready: the more complex the expression becomes, the more bugs arise. The way I work on this is 'build and write code on the fly', which means that up to 4 parentheses, things are easy. But as the expression gets more complex, there are things I cannot predict while writing down my thoughts on paper, and then the compiler tells me what to correct. It would not be a lie to say that it was not me who wrote the algorithm, but the (C#) compiler! So far it has taken me 1400 lines. The individual commands were not difficult to write; their arrangement was the real puzzle. The program you are looking for is highly complex. If you need any initial ideas, please let me know and I will reply. Thanks!
[Image: Algorithm]

Related

Algorithm to get all the possible combinations of operations from given numbers

I want to write a function that given a set of numbers, for example:
2, 3
It returns all the combinations of operations with +, -, *, and /.
The result for these two numbers would be:
2+3
2-3
2*3
2/3
For the numbers:
2, 3, 4
it would be:
(2+3)+4
(2+3)-4
(2+3)*4
(2+3)/4
(2-3)+4
(2-3)-4
(2-3)*4
(2-3)/4
...
2+(3+4)
2+(3*4)
2+(3-4)
2+(3/4)
...
3+(2+4)
3+(2*4)
3+(2-4)
3+(2/4)
...
and so on
The order of the operators doesn't matter, the point is to obtain all the results from all the possible combinations of operations.
I would tackle this by using Reverse Polish Notation (RPN), where you can just append operators and operands to a string while following a few simple rules.
For example, the expression 2 + (3 * 4) would be 2 3 4 * + in Reverse Polish Notation. On the other hand, (2 + 3) * 4 would be 2 3 + 4 *.
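Because the question asks for infix results, it helps to be able to turn a generated RPN string back into a fully parenthesized infix expression. A minimal sketch, assuming single-character operands and the four operators (the function name rpn_to_infix is hypothetical):

```python
def rpn_to_infix(rpn, operators='+-*/'):
    """Convert an RPN string of single-character tokens to fully parenthesized infix."""
    stack = []
    for tok in rpn:
        if tok in operators:
            right = stack.pop()  # the top of the stack is the right operand
            left = stack.pop()
            stack.append('(' + left + tok + right + ')')
        else:
            stack.append(tok)
    if len(stack) != 1:
        raise ValueError('not a complete RPN expression')
    return stack[0]
```

So rpn_to_infix('234*+') gives (2+(3*4)) and rpn_to_infix('23+4*') gives ((2+3)*4).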
If we already have a partial expression, we can either add an operand or an operator.
Adding an operand can always be done and will increase the size of the stack by 1. On the other hand, adding an operator decreases the size of the stack by 1 (remove the two top-most operands and add the result) and can therefore only be done if the stack has at least two entries. At the end, to form a valid expression, the stack size has to be exactly 1.
This motivates a recursive function with the following interface:
getSubexpressions(remainingOperands, currentStackSize)
The function returns a list of subexpressions that can be appended to a partial expression with stack size currentStackSize and using the operands remainingOperands.
The base case of this recursive function is when there are no more remaining operands and the stack size is 1:
if remainingOperands = ∅ and currentStackSize = 1
    return { "" }
In this case, we can only add the empty string to the expression.
In all other cases, we need to gather a set of subexpressions
subexpressions = { } // initialize an empty set
If we can add an operator, we can simply append it:
if currentStackSize >= 2
    for each possible operator o
        subexpressions.add(o + getSubexpressions(remainingOperands, currentStackSize - 1))
The notation o + getSubexpressions(remainingOperands, currentStackSize - 1) is shorthand for concatenating the operator o with each of the subexpressions returned from the call to getSubexpressions().
We are almost there. The last remaining bit is to add potential operands:
for each o in remainingOperands
    subexpressions.add(o + getSubexpressions(remainingOperands \ { o }, currentStackSize + 1))
The notation remainingOperands \ { o } stands for set difference, i.e., the set of remaining operands without o.
That's it. In full:
getSubexpressions(remainingOperands, currentStackSize)
    if remainingOperands = ∅ and currentStackSize = 1
        return { "" }
    subexpressions = { } // initialize an empty set
    if currentStackSize >= 2
        for each possible operator o
            subexpressions.add(o + getSubexpressions(remainingOperands, currentStackSize - 1))
    for each o in remainingOperands
        subexpressions.add(o + getSubexpressions(remainingOperands \ { o }, currentStackSize + 1))
    return subexpressions
This recursive call will usually have overlapping subcalls. Therefore, you can use memoization to cache intermediate results instead of re-calculating them over and over.
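A minimal sketch of the memoized version in Python, assuming single-character operands and using functools.lru_cache as the cache; the remaining operands are passed as a frozenset so they can serve as a cache key. The names OPERATORS and get_subexpressions are illustrative.

```python
from functools import lru_cache

OPERATORS = ('+', '-', '*', '/')


@lru_cache(maxsize=None)
def get_subexpressions(remaining, stack_size):
    """All RPN suffixes that complete a partial expression with `stack_size`
    values on the stack, using every operand in the frozenset `remaining`."""
    if not remaining and stack_size == 1:
        return ('',)
    results = []
    if stack_size >= 2:  # an operator pops two values and pushes one
        for op in OPERATORS:
            results += [op + rest for rest in get_subexpressions(remaining, stack_size - 1)]
    for o in sorted(remaining):  # an operand pushes one value
        results += [o + rest for rest in get_subexpressions(remaining - {o}, stack_size + 1)]
    return tuple(results)
```

get_subexpressions(frozenset('123'), 0) returns 192 RPN strings, starting with 12+3+, the same set as the C# program's output. Sorting the operands only makes the order deterministic.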
Here is a proof-of-concept implementation without memoization in C#. Especially the operand management could be designed more efficiently with more appropriate data structures:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

class Program
{
    static void Main(string[] args)
    {
        // The same StringBuilder is yielded and later mutated,
        // so each result must be consumed (printed) immediately.
        foreach (var expr in GetSubexpressions(new List<string> { "1", "2", "3" }, 0, new StringBuilder()))
        {
            Console.WriteLine(expr);
        }
    }

    static char[] operators = { '+', '-', '*', '/' };

    static IEnumerable<StringBuilder> GetSubexpressions(IList<string> remainingOperands, int currentStackSize, StringBuilder sb)
    {
        if (remainingOperands.Count() == 0 && currentStackSize == 1)
        {
            yield return sb;
            yield break;
        }
        if (currentStackSize >= 2)
        {
            foreach (var o in operators)
            {
                sb.Append(o);
                foreach (var expr in GetSubexpressions(remainingOperands, currentStackSize - 1, sb))
                    yield return expr;
                sb.Remove(sb.Length - 1, 1);
            }
        }
        for (int i = 0; i < remainingOperands.Count; ++i)
        {
            var operand = remainingOperands[i];
            remainingOperands.RemoveAt(i);
            sb.Append(operand);
            foreach (var expr in GetSubexpressions(remainingOperands, currentStackSize + 1, sb))
                yield return expr;
            sb.Remove(sb.Length - operand.Length, operand.Length);
            remainingOperands.Insert(i, operand);
        }
    }
}
The program prints the following output:
12+3+
12-3+
12*3+
12/3+
12+3-
12-3-
12*3-
12/3-
12+3*
12-3*
12*3*
12/3*
12+3/
12-3/
12*3/
12/3/
123++
123-+
123*+
123/+
123+-
123--
123*-
123/-
123+*
123-*
123**
123/*
123+/
123-/
123*/
123//
13+2+
13-2+
13*2+
13/2+
13+2-
13-2-
13*2-
13/2-
13+2*
13-2*
13*2*
13/2*
13+2/
13-2/
13*2/
13/2/
132++
132-+
132*+
132/+
132+-
132--
132*-
132/-
132+*
132-*
132**
132/*
132+/
132-/
132*/
132//
21+3+
21-3+
21*3+
21/3+
21+3-
21-3-
21*3-
21/3-
21+3*
21-3*
21*3*
21/3*
21+3/
21-3/
21*3/
21/3/
213++
213-+
213*+
213/+
213+-
213--
213*-
213/-
213+*
213-*
213**
213/*
213+/
213-/
213*/
213//
23+1+
23-1+
23*1+
23/1+
23+1-
23-1-
23*1-
23/1-
23+1*
23-1*
23*1*
23/1*
23+1/
23-1/
23*1/
23/1/
231++
231-+
231*+
231/+
231+-
231--
231*-
231/-
231+*
231-*
231**
231/*
231+/
231-/
231*/
231//
31+2+
31-2+
31*2+
31/2+
31+2-
31-2-
31*2-
31/2-
31+2*
31-2*
31*2*
31/2*
31+2/
31-2/
31*2/
31/2/
312++
312-+
312*+
312/+
312+-
312--
312*-
312/-
312+*
312-*
312**
312/*
312+/
312-/
312*/
312//
32+1+
32-1+
32*1+
32/1+
32+1-
32-1-
32*1-
32/1-
32+1*
32-1*
32*1*
32/1*
32+1/
32-1/
32*1/
32/1/
321++
321-+
321*+
321/+
321+-
321--
321*-
321/-
321+*
321-*
321**
321/*
321+/
321-/
321*/
321//

DNA subsequence dynamic programming question

I'm trying to solve a DNA problem which is more of an improved(?) version of the LCS problem.
In the problem, a semi-substring of a string is a part of the string in which one or no letter may be skipped. For example, the string "desktop" has the semi-substrings {"destop", "dek", "stop", "skop", "desk", "top"}, all of which have one or no letter skipped.
Now I am given two DNA strings consisting of {a, t, g, c}. I'm trying to find the longest common semi-substring (LSS), and if there is more than one LSS, print out the one that comes first.
For example, the two DNA strings {attgcgtagcaatg, tctcaggtcgatagtgac} print out "tctagcaatg",
and aaaattttcccc, cccgggggaatatca print out "aattc".
I'm trying to use the common LCS algorithm but cannot solve it with tables, although I did solve the version with no letters skipped. Any advice?
This is a variation on the dynamic programming solution for LCS, written in Python.
First I build a suffix tree for all the substrings that can be made from each string with the skip rule. Then I intersect the two suffix trees. Then I look for the longest string that can be made from that intersection tree.
Please note that this is technically O(n^2). Its worst case is when both strings are the same character, repeated over and over again, because you wind up with a lot of what is logically something like "an 'l' at position 42 in one string could have matched the 'l' at position 54 in the other". But in practice it will be O(n).
def find_subtree(text, max_skip=1):
    tree = {}
    tree_at_position = {}

    def subtree_from_position(position):
        if position not in tree_at_position:
            this_tree = {}
            if position < len(text):
                char = text[position]
                # Make sure that we've populated the further tree.
                subtree_from_position(position + 1)
                # If this char appeared later, include those possible matches.
                if char in tree:
                    for char2, subtree in tree[char].items():
                        this_tree[char2] = subtree
                # And now update the new choices.
                for skip in range(max_skip + 1, 0, -1):
                    if position + skip < len(text):
                        this_tree[text[position + skip]] = subtree_from_position(position + skip)
                tree[char] = this_tree
            tree_at_position[position] = this_tree
        return tree_at_position[position]

    subtree_from_position(0)
    return tree


def find_longest_common_semistring(text1, text2):
    tree1 = find_subtree(text1)
    tree2 = find_subtree(text2)

    answered = {}
    def find_intersection(subtree1, subtree2):
        unique = (id(subtree1), id(subtree2))
        if unique not in answered:
            answer = {}
            for k, v in subtree1.items():
                if k in subtree2:
                    answer[k] = find_intersection(v, subtree2[k])
            answered[unique] = answer
        return answered[unique]

    found_longest = {}
    def find_longest(tree):
        if id(tree) not in found_longest:
            best_candidate = ''
            for char, subtree in tree.items():
                candidate = char + find_longest(subtree)
                if len(best_candidate) < len(candidate):
                    best_candidate = candidate
            found_longest[id(tree)] = best_candidate
        return found_longest[id(tree)]

    intersection_tree = find_intersection(tree1, tree2)
    return find_longest(intersection_tree)


print(find_longest_common_semistring("attgcgtagcaatg", "tctcaggtcgatagtgac"))
Let g(c, rs, rt) represent the longest common semi-substring of strings S and T ending at rs and rt, where rs and rt are the ranked occurrences of the character c in S and T, respectively, and K is the number of skips allowed. Then we can form a recursion, which we have to evaluate for all pairs of occurrences of c in S and T.
JavaScript code:
function f(S, T, K){
    // mapS maps a char to indexes of its occurrences in S
    // rsS maps the index in S to that char's rank (index) in mapS
    const [mapS, rsS] = mapString(S)
    const [mapT, rsT] = mapString(T)
    // h is used to memoize g
    const h = {}

    function g(c, rs, rt){
        if (rs < 0 || rt < 0)
            return 0
        if (h.hasOwnProperty([c, rs, rt]))
            return h[[c, rs, rt]]
        // (We are guaranteed to be on
        // a match in this state.)
        let best = [1, c]
        let idxS = mapS[c][rs]
        let idxT = mapT[c][rt]
        if (idxS == 0 || idxT == 0)
            return best
        for (let i=idxS-1; i>=Math.max(0, idxS - 1 - K); i--){
            for (let j=idxT-1; j>=Math.max(0, idxT - 1 - K); j--){
                if (S[i] == T[j]){
                    const [len, str] = g(S[i], rsS[i], rsT[j])
                    if (len + 1 >= best[0])
                        best = [len + 1, str + c]
                }
            }
        }
        return h[[c, rs, rt]] = best
    }

    let best = [0, '']
    for (let c of Object.keys(mapS)){
        for (let i=0; i<(mapS[c]||[]).length; i++){
            for (let j=0; j<(mapT[c]||[]).length; j++){
                let [len, str] = g(c, i, j)
                if (len > best[0])
                    best = [len, str]
            }
        }
    }
    return best
}

function mapString(s){
    let map = {}
    let rs = []
    for (let i=0; i<s.length; i++){
        if (!map[s[i]]){
            map[s[i]] = [i]
            rs.push(0)
        } else {
            map[s[i]].push(i)
            rs.push(map[s[i]].length - 1)
        }
    }
    return [map, rs]
}

console.log(f('attgcgtagcaatg', 'tctcaggtcgatagtgac', 1))
console.log(f('aaaattttcccc', 'cccgggggaatatca', 1))
console.log(f('abcade', 'axe', 1))

Generate all valid combinations of N pairs of parentheses

UPDATE (detailed explanation of the task):
We have a string consisting of the digits 0 and 1, separated by the operators |, ^ or &. The task is to create all fully parenthesized expressions, so that each final expression is divided into "2 parts".
For example
0^1 -> (0)^(1), but without extraneous parentheses such as 0^1 -> (((0))^(1))
Example for expression 1|0&1:
(1)|((0)&(1))
((1)|(0))&(1)
As you can see, both expressions above have a left and a right part:
left: (1); right: ((0)&(1))
left: ((1)|(0)); right: (1)
I tried the following code, but it does not work correctly (see output):
// expression has type string
// result has type Array (ArrayList in Java)
function setParens(expression, result) {
    if (expression.length === 1) return "(" + expression + ")";
    for (var i = 0; i < expression.length; i++) {
        var c = expression[i];
        if (c === "|" || c === "^" || c === "&") {
            var left = expression.substring(0, i);
            var right = expression.substring(i + 1);
            leftParen = setParens(left, result);
            rightParen = setParens(right, result);
            var newExp = leftParen + c + rightParen;
            result.push(newExp);
        }
    }
    return expression;
}
function test() {
    var r = [];
    setParens('1|0&1', r);
    console.log(r);
}
test();
code output: ["(0)&(1)", "(0)|0&1", "(1)|(0)", "1|0&(1)"]
Assuming the input expression is not already partially parenthesized and you want only fully parenthesized results:
FullyParenthesize(expression[1...n])
    result = {}
    // looking for operators
    for p = 1 to n do
        // binary operator; parenthesize LHS and RHS
        // parenthesize the binary operation
        if expression[p] is a binary operator then
            lps = FullyParenthesize(expression[1 ... p - 1])
            rps = FullyParenthesize(expression[p + 1 ... n])
            for each lp in lps do
                for each rp in rps do
                    result = result U {"(" + lp + expression[p] + rp + ")"}
    // no binary operations <=> single variable
    if result == {} then
        result = {"(" + expression + ")"}
    return result
Example: 1|2&3
FullyParenthesize("1|2&3")
    result = {}
    binary operator | at p = 2
        lps = FullyParenthesize("1")
            no operators
            result = {"(" + "1" + ")"}
            return result = {"(1)"}
        rps = FullyParenthesize("2&3")
            result = {}
            binary operator & at p = 2
                lps = FullyParenthesize("2")
                    no operators
                    result = {"(" + "2" + ")"}
                    return result = {"(2)"}
                rps = FullyParenthesize("3")
                    no operators
                    result = {"(" + "3" + ")"}
                    return result = {"(3)"}
                lp = "(2)"
                rp = "(3)"
                result = result U {"(" + "(2)" + "&" + "(3)" + ")"}
            return result = {"((2)&(3))"}
        lp = "(1)"
        rp = "((2)&(3))"
        result = result U {"(" + "(1)" + "|" + "((2)&(3))" + ")"}
    binary operator & at p = 4
        ...
        result = result U {"(" + "((1)|(2))" + "&" + "(3)" + ")"}
    return result = {"((1)|((2)&(3)))", "(((1)|(2))&(3))"}
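You will have C_k = (2k)! / ((k+1)! k!) unique fully parenthesized expressions (without repeated parentheses) given an input expression with k binary operators; C_k is the k-th Catalan number, e.g. 2 for k = 2 (as in the example above) and 5 for k = 3.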
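The FullyParenthesize pseudocode translates almost line for line into Python. A minimal sketch, assuming single-character operands and the operators |, ^ and & from the question; the names OPS and fully_parenthesize are illustrative:

```python
OPS = set('|^&')


def fully_parenthesize(expr):
    """Return the set of all fully parenthesized forms of expr."""
    results = set()
    for p, ch in enumerate(expr):
        if ch in OPS:  # split at every binary operator
            for lp in fully_parenthesize(expr[:p]):
                for rp in fully_parenthesize(expr[p + 1:]):
                    results.add('(' + lp + ch + rp + ')')
    if not results:  # no operator: a single operand
        results.add('(' + expr + ')')
    return results
```

fully_parenthesize('1|2&3') returns the two expressions from the trace above, and an input with three operators yields 5 = C_3 results. Without memoization the recursion re-solves the same substrings many times, which only matters for long inputs.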

Algorithm for simple string compression

I would like to find the shortest possible encoding for a string in the following form:
abbcccc = a2b4c
[NOTE: this greedy algorithm does not guarantee shortest solution]
By remembering all previous occurrences of each character, it is straightforward to find the first occurrence of a repeating string (the minimal end index including all repetitions, i.e. the maximal remaining string after all repetitions) and replace it with an RLE (Python 3 code):
def singleRLE_v1(s):
    occ = dict() # for each character remember all previous indices of occurrences
    for idx,c in enumerate(s):
        if not c in occ: occ[c] = []
        for c_occ in occ[c]:
            s_c = s[c_occ:idx]
            i = 1
            while s[idx+(i-1)*len(s_c) : idx+i*len(s_c)] == s_c:
                i += 1
            if i > 1:
                rle_pars = ('(',')') if len(s_c) > 1 else ('','')
                rle = ('%d'%i) + rle_pars[0] + s_c + rle_pars[1]
                s_RLE = s[:c_occ] + rle + s[idx+(i-1)*len(s_c):]
                return s_RLE
        occ[c].append(idx)
    return s # no repeating substring found
To make it robust for iterative application, we have to exclude a few cases where an RLE must not be applied (e.g. '11' or '))'), and we also have to make sure the RLE does not make the string longer (which can happen with a two-character substring occurring twice, as in 'abab'):
def singleRLE(s):
    "find first occurrence of a repeating substring and replace it with RLE"
    occ = dict() # for each character remember all previous indices of occurrences
    for idx,c in enumerate(s):
        if idx>0 and s[idx-1] in '0123456789': continue # no RLE for e.g. '11' or other parts of previous inserted RLE
        if c == ')': continue # no RLE for '))...)'
        if not c in occ: occ[c] = []
        for c_occ in occ[c]:
            s_c = s[c_occ:idx]
            i = 1
            while s[idx+(i-1)*len(s_c) : idx+i*len(s_c)] == s_c:
                i += 1
            if i > 1:
                print("found %d*'%s'" % (i,s_c))
                rle_pars = ('(',')') if len(s_c) > 1 else ('','')
                rle = ('%d'%i) + rle_pars[0] + s_c + rle_pars[1]
                if len(rle) <= i*len(s_c): # in case of a tie prefer RLE
                    s_RLE = s[:c_occ] + rle + s[idx+(i-1)*len(s_c):]
                    return s_RLE
        occ[c].append(idx)
    return s # no repeating substring found
Now we can safely call singleRLE on the previous output as long as we find a repeating string:
def iterativeRLE(s):
    s_RLE = singleRLE(s)
    while s != s_RLE:
        print(s_RLE)
        s, s_RLE = s_RLE, singleRLE(s_RLE)
    return s_RLE
With the above inserted print statements we get e.g. the following trace and result:
>>> iterativeRLE('xyabcdefdefabcdefdef')
found 2*'def'
xyabc2(def)abcdefdef
found 2*'def'
xyabc2(def)abc2(def)
found 2*'abc2(def)'
xy2(abc2(def))
'xy2(abc2(def))'
But this greedy algorithm fails for this input:
>>> iterativeRLE('abaaabaaabaa')
found 3*'a'
ab3abaaabaa
found 3*'a'
ab3ab3abaa
found 2*'b3a'
a2(b3a)baa
found 2*'a'
a2(b3a)b2a
'a2(b3a)b2a'
whereas one of the shortest solutions is 3(ab2a).
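To check that a candidate encoding really reproduces the input, a small decoder for this notation helps. A minimal sketch, assuming the format used above: a decimal count followed either by a single character or by a parenthesized group, with groups allowed to nest; the name expand is hypothetical.

```python
def expand(rle):
    """Decode RLE strings such as '3(ab2a)' or 'xy2(abc2(def))'."""
    def parse(i):
        out = []
        while i < len(rle) and rle[i] != ')':
            if rle[i].isdigit():
                j = i
                while rle[j].isdigit():  # read the full repeat count
                    j += 1
                count = int(rle[i:j])
                if rle[j] == '(':
                    inner, i = parse(j + 1)  # i now points at the ')'
                    i += 1
                else:
                    inner, i = rle[j], j + 1  # count applies to one char
                out.append(inner * count)
            else:
                out.append(rle[i])
                i += 1
        return ''.join(out), i
    return parse(0)[0]
```

Both the greedy result a2(b3a)b2a and the shorter 3(ab2a) expand back to abaaabaaabaa, so the difference is only in length.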
Since a greedy algorithm does not work, some search is necessary. Here is a search with some pruning: in a branch where the first idx0 characters of the string were not touched, we do not look for a repeating substring within those characters again, and when replacing multiple occurrences of a substring, we always replace all consecutive occurrences. (Because candidates are taken from the front of the list with pop(0), the search actually proceeds breadth-first, despite the function name.)
def isRLE(s):
    "is this a well nested RLE? (only well nested RLEs can be further nested)"
    nestCnt = 0
    for c in s:
        if c == '(':
            nestCnt += 1
        elif c == ')':
            if nestCnt == 0:
                return False
            nestCnt -= 1
    return nestCnt == 0

def singleRLE_gen(s,idx0=0):
    "find all occurrences of a repeating substring with first repetition not ending before index idx0 and replace each with RLE"
    print("looking for repeated substrings in '%s', first rep. not ending before index %d" % (s,idx0))
    occ = dict() # for each character remember all previous indices of occurrences
    for idx,c in enumerate(s):
        if idx>0 and s[idx-1] in '0123456789': continue # sub-RLE cannot start after number
        if not c in occ: occ[c] = []
        for c_occ in occ[c]:
            s_c = s[c_occ:idx]
            if not isRLE(s_c): continue # avoid RLEs for e.g. '))...)'
            if idx+len(s_c) < idx0: continue # pruning: this substring has been tried before
            if c_occ-len(s_c) >= 0 and s[c_occ-len(s_c):c_occ] == s_c: continue # pruning: always take all repetitions
            i = 1
            while s[idx+(i-1)*len(s_c) : idx+i*len(s_c)] == s_c:
                i += 1
            if i > 1:
                rle_pars = ('(',')') if len(s_c) > 1 else ('','')
                rle = ('%d'%i) + rle_pars[0] + s_c + rle_pars[1]
                if len(rle) <= i*len(s_c): # in case of a tie prefer RLE
                    s_RLE = s[:c_occ] + rle + s[idx+(i-1)*len(s_c):]
                    #print("   replacing %d*'%s' -> %s" % (i,s_c,s_RLE))
                    yield s_RLE,c_occ
        occ[c].append(idx)

def iterativeRLE_depthFirstSearch(s):
    shortestRLE = s
    candidatesRLE = [(s,0)]
    while len(candidatesRLE) > 0:
        candidateRLE,idx0 = candidatesRLE.pop(0)
        for rle,idx in singleRLE_gen(candidateRLE,idx0):
            if len(rle) <= len(shortestRLE):
                shortestRLE = rle
                print("new optimum: '%s'" % shortestRLE)
            candidatesRLE.append((rle,idx))
    return shortestRLE
Sample output:
>>> iterativeRLE_depthFirstSearch('tctttttttttttcttttttttttctttttttttttct')
looking for repeated substrings in 'tctttttttttttcttttttttttctttttttttttct', first rep. not ending before index 0
new optimum: 'tc11tcttttttttttctttttttttttct'
new optimum: '2(tctttttttttt)ctttttttttttct'
new optimum: 'tctttttttttttc2(ttttttttttct)'
looking for repeated substrings in 'tc11tcttttttttttctttttttttttct', first rep. not ending before index 2
new optimum: 'tc11tc10tctttttttttttct'
new optimum: 'tc11t2(ctttttttttt)tct'
new optimum: 'tc11tc2(ttttttttttct)'
looking for repeated substrings in 'tc5(tt)tcttttttttttctttttttttttct', first rep. not ending before index 2
...
new optimum: '2(tctttttttttt)c11tct'
...
new optimum: 'tc11tc10tc11tct'
...
new optimum: 'tc11t2(c10t)tct'
looking for repeated substrings in 'tc11tc2(ttttttttttct)', first rep. not ending before index 6
new optimum: 'tc11tc2(10tct)'
...
new optimum: '2(tc10t)c11tct'
...
'2(tc10t)c11tct'
The following is my C++ implementation, which works in place in O(n) time and O(1) extra space.
#include <string>
#include <vector>
using namespace std;
class Solution {
public:
    int compress(vector<char>& chars) {
        int n = (int)chars.size();
        if (chars.empty()) return 0;
        int left = 0, right = 0, currCharIndx = left;
        while (right < n) {
            if (chars[currCharIndx] != chars[right]) {
                // The run chars[currCharIndx .. right-1] just ended: write the character, then its count if > 1.
                int len = right - currCharIndx;
                chars[left++] = chars[currCharIndx];
                if (len > 1) {
                    string freq = to_string(len);
                    for (int i = 0; i < (int)freq.length(); i++) {
                        chars[left++] = freq[i];
                    }
                }
                currCharIndx = right;
            }
            right++;
        }
        // Flush the final run.
        int len = right - currCharIndx;
        chars[left++] = chars[currCharIndx];
        if (len > 1) {
            string freq = to_string(len);
            for (int i = 0; i < (int)freq.length(); i++) {
                chars[left++] = freq[i];
            }
        }
        return left;
    }
};
You need to keep track of three pointers: right iterates over the input, currCharIndx keeps track of the first position of the current run of identical characters, and left keeps track of the write position in the compressed string.
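For comparison, here is a minimal Python sketch of the same three-index idea. It assumes the input is a Python list of single-character strings that is modified in place; the name compress mirrors the C++ method, and read/start/write play the roles of right/currCharIndx/left. Unlike the C++ version, it handles the final run inside the loop by treating the end of the input as a run boundary.

```python
def compress(chars: list) -> int:
    """In-place run-length compression of a list of single characters.

    Returns the length of the compressed prefix of `chars`.
    """
    n = len(chars)
    write = 0  # next position to write compressed output (C++: left)
    start = 0  # first index of the run being scanned (C++: currCharIndx)
    for read in range(1, n + 1):  # read plays the role of right
        # A run ends at the end of the input or when the character changes.
        if read == n or chars[read] != chars[start]:
            run_len = read - start
            chars[write] = chars[start]
            write += 1
            if run_len > 1:
                # Write the run length digit by digit; this never overtakes
                # `read`, because a run of k >= 2 chars needs at most k slots.
                for digit in str(run_len):
                    chars[write] = digit
                    write += 1
            start = read
    return write
```

For example, `chars = list("aabbccc")` becomes `a2b2c3` in its first 6 slots and the function returns 6.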

Find the minimum number of edits to balance parentheses?

I was very confused by this question. I know how to find the edit distance between two strings using recursion, with dynamic programming as an improvement, but I am confused about how to approach this one.
I am not sure if my thinking is correct, but say we have an unbalanced string of parentheses:
String s = "((())))";
How do I find the balanced string that requires the minimum number of edits?
Can someone explain this with an example?
I am still not sure if I am explaining it correctly.
Given a string consisting of left and right parentheses, we are asked to balance it by performing a minimal number of delete, insert, and replace operations.
To begin with, let's look at the input string and distinguish matched pairs from unmatched characters. We can mark all the characters belonging to matched pairs by executing the following algorithm:
1. Find an unmarked '(' that is followed by an unmarked ')', with zero or more marked characters between the two.
2. If there is no such pair of characters, terminate the algorithm.
3. Otherwise, mark the '(' and the ')'.
4. Return to step 1.
The marked pairs are already balanced at zero cost, so the optimal course of action is to do nothing further with them.
Now let's consider the unmarked characters. Notice that no unmarked '(' is followed by an unmarked ')', or else the pair would have been marked. Therefore, if we scan the unmarked characters from left to right, we will find zero or more ')' characters followed by zero or more '(' characters.
To balance the sequence of ')' characters, it is optimal to rewrite every other one to '(', starting with the first one and excluding the last one. If there is an odd number of ')' characters, it is optimal to delete the last one.
As for the sequence of '(' characters, it is optimal to rewrite every other one to ')', starting with the second one. If there is a leftover '(' character, we delete it.
The following Python code implements the steps described above and displays the intermediate results.
def balance(s):  # s is a string of '(' and ')' characters in any order
    n = len(s)
    print('original string: %s' % s)
    # Mark all matched pairs
    marked = n * [ False ]
    left_parentheses = []
    for i, ch in enumerate(s):
        if ch == '(':
            left_parentheses.append(i)
        else:
            if len(left_parentheses) != 0:
                marked[i] = True
                marked[left_parentheses.pop()] = True
    # Display the matched pairs and unmatched characters.
    matched, remaining = [], []
    for i, ch in enumerate(s):
        if marked[i]:
            matched.append(ch)
            remaining.append(' ')
        else:
            matched.append(' ')
            remaining.append(ch)
    print('  matched pairs: %s' % ''.join(matched))
    print('      unmatched: %s' % ''.join(remaining))
    cost = 0
    deleted = n * [ False ]
    new_chars = list(s)
    # Balance the unmatched ')' characters.
    right_count, last_right = 0, -1
    for i, ch in enumerate(s):
        if not marked[i] and ch == ')':
            right_count += 1
            if right_count % 2 == 1:
                new_chars[i] = '('
                cost += 1
            last_right = i
    if right_count % 2 == 1:  # Delete the last ')' if we couldn't match it.
        deleted[last_right] = True  # The cost was incremented during replacement.
    # Balance the unmatched '(' characters.
    left_count, last_left = 0, -1
    for i, ch in enumerate(s):
        if not marked[i] and ch == '(':
            left_count += 1
            if left_count % 2 == 0:
                new_chars[i] = ')'
                cost += 1
            else:
                last_left = i
    if left_count % 2 == 1:  # Delete the last '(' if we couldn't match it.
        deleted[last_left] = True  # This character wasn't replaced, so we must
        cost += 1                  # increment the cost now.
    # Display the outcome of replacing and deleting.
    balanced = []
    for i, ch in enumerate(new_chars):
        if marked[i] or deleted[i]:
            balanced.append(' ')
        else:
            balanced.append(ch)
    print('        balance: %s' % ''.join(balanced))
    # Display the cost of balancing and the overall balanced string.
    print('           cost: %d' % cost)
    result = []
    for i, ch in enumerate(new_chars):
        if not deleted[i]:  # Skip deleted characters.
            result.append(ch)
    print('     new string: %s' % ''.join(result))
balance(')()(()())))()((())((')
For the test case ')()(()())))()((())((', the output is as follows.
original string: )()(()())))()((())((
  matched pairs:  ()(()())  () (())
      unmatched: )        ))  (    ((
        balance: (        )   (    )
           cost: 4
     new string: (()(()()))()((()))
The idea is simple:
First, find the string of leftover open and close brackets that could not be paired. Note that in this leftover string, all the close brackets come first, followed by all the open brackets.
Now we edit the open brackets and the close brackets separately.
E.g., for close brackets:
(1) If there is an even number of them:
the minimum edit to balance them is to change half of the close brackets to open brackets.
So minEdit = closeBracketCount/2.
(2) If there is an odd number of them:
the minimum edit is to do step (1) and remove the one remaining bracket.
So minEdit = closeBracketCount/2 + 1 (integer division).
For open brackets:
(1) If there is an even number of them:
the minimum edit to balance them is to change half of the open brackets to close brackets.
So minEdit = openBracketCount/2.
(2) If there is an odd number of them:
the minimum edit is to do step (1) and remove the one remaining bracket.
So minEdit = openBracketCount/2 + 1 (integer division).
Here is the running code: http://codeshare.io/bX1Dt
Let me know your thoughts.
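The two counting rules above can be combined into a single pass. This is a minimal Python sketch, not the code behind the link; the function name min_edits_to_balance is hypothetical, and it finds the leftover brackets with a running counter (which cancels the same pairs a stack would) instead of building the leftover string explicitly.

```python
def min_edits_to_balance(s: str) -> int:
    """Minimum insert/delete/replace edits to balance a string of '(' and ')'.

    Sketch of the counting rule: cancel matched pairs, leaving `close_left`
    unmatched ')' followed by `open_left` unmatched '('.
    """
    open_left = 0   # '(' seen so far that are still waiting for a ')'
    close_left = 0  # ')' that found no pending '(' to close
    for ch in s:
        if ch == '(':
            open_left += 1
        elif ch == ')':
            if open_left > 0:
                open_left -= 1  # this ')' closes a pending '('
            else:
                close_left += 1
    # count/2 replacements, plus one deletion when the count is odd
    return close_left // 2 + close_left % 2 + open_left // 2 + open_left % 2
```

For the question's example "((())))", one ')' is left over, so the result is 1 (delete it); for ')()(()())))()((())((' there are 3 leftover ')' and 3 leftover '(', giving 2 + 2 = 4.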
While this interesting problem can be solved with dynamic programming, as mentioned in the comments, there is an easier solution: a greedy algorithm.
The idea for this greedy algorithm comes from how we check the validity of a parentheses expression. Set a counter to 0 and traverse the string, adding 1 at "(" and subtracting 1 at ")". If the counter always stays at or above 0 and finishes at 0, the string is valid.
This implies that if the lowest value encountered while traversing is mini (which is never positive), we need to add exactly -mini "(" at the start. Then adjust the final counter value for the added "(" and add enough ")" at the end to finish at 0.
Here is the pseudo-code for the algorithm:
counter = 0
mini = 0
for each p in string:
    if p == "(":
        counter++
    else:
        counter--
    mini = min(counter, mini)
add -mini "(" at the start of the string
counter -= mini
add counter ")" at the end of the string
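A direct Python translation of the pseudo-code, as a sketch. It assumes the input contains only '(' and ')' (any other character is treated as ')', exactly as in the pseudo-code's else branch), and it returns the balanced string rather than only the number of insertions; the name balance_by_insertion is hypothetical.

```python
def balance_by_insertion(s: str) -> str:
    """Greedy sketch: balance a '('/')' string using insertions only."""
    counter = 0
    mini = 0  # lowest value the running counter reaches (never positive)
    for p in s:
        counter += 1 if p == '(' else -1
        mini = min(mini, counter)
    # Prepending -mini '(' keeps the counter from ever dropping below 0;
    # it also raises the final counter by -mini.
    counter -= mini
    # Appending `counter` ')' brings the final value back to 0.
    return '(' * (-mini) + s + ')' * counter
```

For the question's string "((())))" the counter bottoms out at -1 and ends at -1, so one '(' is prepended and nothing is appended, giving "(((())))".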
I tried to solve the problem with a DP algorithm, and it passed a few test cases that I made up myself. Let me know if you think it's correct.
Let P(i,j) be the minimum number of edits to make string S[i..j] balanced.
When S[i] matches S[j] (S[i] is '(' and S[j] is ')'), the minimum number of edits is obviously P(i+1,j-1).
There are a few options to make the string balanced when S[i] does not match S[j], but in the end we could either add '(' before position i or ')' after position j, or remove the parenthesis at i or j. In all these cases, the minimum number of edits is min{P(i+1, j), P(i, j-1)} + 1.
We therefore have the DP formula below:
P(i,j) = 0                                  if i > j
       = P(i + 1, j - 1)                    if S[i] matches S[j], or S[i] and S[j] are not parentheses
       = min{P(i + 1, j), P(i, j - 1)} + 1  otherwise
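Checking the recurrence by hand suggests it is missing a case: it never splits S[i..j] into two independently balanced parts. For "()()", S[0] matches S[3], so P(0,3) = P(1,2); the middle ")(" does not match and costs 2, so the result is 2 even though the string is already balanced. Below is a minimal sketch of an interval DP that adds the missing cases. It assumes the string contains only '(' and ')' and that each deletion, insertion, or replacement costs 1; the function name min_edits_dp is hypothetical. It runs in O(n^3) time, which is fine for short strings but much slower than the counting approaches above.

```python
from functools import lru_cache


def min_edits_dp(s: str) -> int:
    """O(n^3) interval DP: minimum insert/delete/replace edits to balance s.

    P(i, j) is the cost for s[i..j]. Either s[i] is not paired with any
    original character (cost 1: delete it, or insert a partner for it), or
    it is paired with some s[k], replacing either end if needed, and the
    inside s[i+1..k-1] and the rest s[k+1..j] are balanced independently.
    """
    n = len(s)

    @lru_cache(maxsize=None)
    def P(i: int, j: int) -> int:
        if i > j:
            return 0  # empty substring is already balanced
        best = 1 + P(i + 1, j)  # s[i] left unpaired
        for k in range(i + 1, j + 1):
            # Cost of turning s[i] into '(' and s[k] into ')'.
            pair_cost = (s[i] != '(') + (s[k] != ')')
            best = min(best, pair_cost + P(i + 1, k - 1) + P(k + 1, j))
        return best

    return P(0, n - 1)
```

The split over k is what handles "()()": pairing S[0] with S[1] leaves "()" as an independent remainder with cost 0.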
I would use a stack to balance them efficiently. Here is Python code:
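a = ['(((((', 'a(b)c)', '((())))', ')()(()())))()((())((']
def balance(s):
    st = []  # indices of '(' characters not yet matched
    l = len(s)
    i = 0
    while i < l:
        if s[i] == '(':
            st.append(i)
        elif s[i] == ')':
            if st:
                st.pop()
            else:
                # unmatched ')': delete it and stay at the same index
                del s[i]
                i -= 1
                l -= 1
        i += 1
    # delete the unmatched '(' from right to left so earlier indices stay valid
    while st:
        del s[st.pop()]
    return ''.join(s)
for i in a:
    print(balance(list(i)))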
Output:
Empty
a(b)c
((()))
()(()())()(())
//fisher
import java.util.Stack;
class Solution {
    // Counts insertions only, for the variant of the problem in which every '('
    // must be closed by two consecutive ')' characters.
    public int minInsertions(String s) {
        Stack<Character> stack = new Stack<>();
        int insertionsNeeded = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '(') {
                if (stack.isEmpty()) {
                    stack.add(c);
                } else {
                    if (stack.peek() == ')') {
                        //in this case, we need to add one more ')' to get two consecutive right parentheses, then we can pop the one ')' and one '(' off the stack
                        insertionsNeeded++;
                        stack.pop();
                        stack.pop();
                        stack.add(c);
                    } else {
                        stack.add(c);
                    }
                }
            } else if (c == ')') {
                if (stack.isEmpty()) {
                    //in this case, we need to add one '(' before we add this ')' onto the stack
                    insertionsNeeded++;
                    stack.add('(');
                    stack.add(c);
                } else {
                    if (stack.peek() == ')') {
                        //in this case, we can pop the one ')' and one '(' off the stack
                        stack.pop();
                        stack.pop();
                    } else {
                        stack.add(c);
                    }
                }
            }
        }
        if (stack.isEmpty()) {
            return insertionsNeeded;
        } else {
            while (!stack.isEmpty()) {
                char pop = stack.pop();
                if (pop == '(') {
                    insertionsNeeded += 2;
                } else {
                    insertionsNeeded++;
                    stack.pop();
                }
            }
            return insertionsNeeded;
        }
    }
}
