Tree from balanced parenthesis - algorithm

I have to find height of tree and find protection number (or just to generate a tree) from balanced parentheses.
For example:
()()()() creates tree like a list with height 3.
I have no idea how to convert parentheses to tree. I found some 'answers':
http://www.cs.utsa.edu/~wagner/knuth/fasc4a.pdf (second page contains all examples for tree with 4 nodes)
paragraph - Binary Trees, Forests, Non-Crossing Pairs :
https://sahandsaba.com/interview-question-generating-all-balanced-parentheses.html
However, I still don't know how to create a tree from such defined parentheses. I have some impression that in Knuth, authors treat it as something obvious.
Do I miss something or it's not that simple?
Is it necessary to create a forest and then a binary tree?

A pair of parentheses represents a node. What appears within those parentheses represents its left child's subtree (according to the same rules). What appears at the right of those parentheses represents the node's right child's subtree (again, according to the same rules).
The conversion of this encoding into a binary tree can be done recursively like this:
function makeBinaryTree(input):
i = 0 # character index in input
function recur():
if i >= input.length or input[i] == ")":
i = i + 1
return NIL
i = i + 1
node = new Node
node.left = recur()
if i >= input.length or input[i] == ")":
i = i + 1
return node
node.right = recur()
return node
return recur()
Here is an implementation in JavaScript that performs the conversion for each of those 4-noded trees, and pretty prints the resulting trees:
function makeBinaryTree(input) {
let i = 0; // character index in input
return recur();
function recur() {
if (i >= input.length || input[i++] === ")") return null;
let node = { left: recur(), right: null };
if (i >= input.length || input[i] === ")") {
i++;
return node;
}
node.right = recur();
return node;
}
}
// Helper function to pretty print a tree
const disc = "\u2B24";
function treeAsLines(node) {
let left = [""], right = [""];
if (node.left) left = treeAsLines(node.left);
if (node.right) right = treeAsLines(node.right);
while (left.length < right.length) left.push(" ".repeat(left[0].length));
while (left.length > right.length) right.push(" ".repeat(left[0].length));
let topLeft = "", topRight = "";
let i = left[0].indexOf(disc);
if (i > -1) topLeft = "┌".padEnd(left[0].length-i+1, "─");
i = right[0].indexOf(disc);
if (i > -1) topRight = "┐".padStart(i+2, "─");
return [topLeft.padStart(left[0].length+1) + disc + topRight.padEnd(right[0].length+1)]
.concat(left.map((line, i) => line + " " + right[i]));
}
// The trees as listed in Table 1 of http://www.cs.utsa.edu/~wagner/knuth/fasc4a.pdf
let inputs = [
"()()()()",
"()()(())",
"()(())()",
"()(()())",
"()((()))",
"(())()()",
"(())(())",
"(()())()",
"(()()())",
"(()(()))",
"((()))()",
"((())())",
"((()()))",
"(((())))"
];
for (let input of inputs) {
let tree = makeBinaryTree(input);
console.log(input);
console.log(treeAsLines(tree).join("\n"));
}

If I understood Knuth correctly, the representation works as the following: A pair of matching parentheses represents a node, e.g. () = A. Two consecutive pairs of matching parentheses means that the second node is the right child of the first one, e.g. ()() = A -> B. And two pairs of embedded parentheses means the inside node is the left child of the outside node, i.e. (()) = B <- A. Therefore, ()()()() = A -> B -> C -> D.
A possible algorithm to convert parentheses to binary tree would be:
convert(parentheses):
if parentheses is empty:
return Nil
root = Node()
left_start = 1
left_end = Nil
open = 0
for p = 0 to |parentheses|-1:
if parentheses[p] == '(':
open += 1
else
open -= 1
if open == 0:
left_end = p
break
root.left = convert(parentheses[left_start:left_end] or empty if index out of bound)
root.right = convert(parentheses[left_end+1:] or empty if index out of bound)
return root
It works by converting parentheses (L)R in the binary tree L <- A -> R recursively.

Related

check whether binary tree is balanced or not?

I was working on the problem of checking the height of the balanced binary tree and found one tutorial like
if (height of left subtree == -1)
return -1.
public boolean isBalanced(TreeNode root) {
if (root == null){
return true;
}
return height(root) != -1;
}
int height(TreeNode node){
`if (node == null){`
return 0;
}
int left = height(node.left);
int right = height(node.right);
int bf = Math.abs(left- right);
if (bf \> 1 || left == -1 || right == -1){
return -1;
}
return Math.max(left,right)+1;
}
So, I was wondering in what case the height of the left subtree of the binary tree will be -1?
Thank you
I looked over the tutorial and it was not explained properly.
Consider a tree. Which has a root and a child in left. Its child on left has only child on left and no right child and so on.
R
/
C0
/
C1
/
C2
/
C3
Height is called with root R. Now Root R calls height on its left child C0 and right child (NULL).
C0 does this to C1 then C1 does this same thing on C2 and so on.
So we'll calculate from C3 then to C2 then to C1, because that's how recursion is going to get executed.
In your function height(C3) now we call its height(C3->left) and height(C3->right). Both are NULL. Left and Right for them = 0.
Height(C3) returns 1. As
(Math.max(left,right)+1; => Math.max(0,0)+1; => 1)
Now For function call height(C2)
left = Height(C2->left) i.e. C3
//hence left = 1;
right = Height(C2->right) i.e. NULL
//hence right = 0;
bf = Math.abs(left-right); => Math.abs(1-0); => 1
still all conditions are false and we return:
Math.max(left,right)+1; => Math.max(1,0)+1; => 2
Now let's come to height(C1)
left = height(C1->left) i.e. height(C2)
//left = 2
right = height(C1->right) i.e. height(NULL)
//right = 0
Now,
bf = Math.abs(left-right); => bf = Math.abs(2-0); => bf = 2
since bf > 1 now, the if condition is executed and you return -1.
Now, for height(C0)
left = height(C0->left); => height(C1)
left = -1 //this is where you get height = -1
Now from this point above all your returns will be -1 because of the condition. Saying that this tree is not balanced.

Is there a way to find a path in a tree?

Let's say we have a tree like the one below. Is there an algorithm that given 2 nodes find the path that connects them. For example, given (A,E) it will return [A,B,E], or given (D,G) it will return [D,B,A,C,G]
A
/ \
B C
/ \ / \
D E F G
You will need to have a tree implementation where a child node has a link to its parent.
Then for both nodes you can build the path from the node to the root, just by following the parent link.
Then compare the two paths, starting from the ends of the paths (where the root is): as long as they are the same, remove that common node from both paths.
Finally you are left with two diverting paths. Reverse the second, and join the two paths, putting the last removed node in between the two.
Here is an implementation in JavaScript:
function getPathToRoot(a) {
if (a.parent == null) return [a];
return [a].concat(getPathToRoot(a.parent));
}
function findPath(a, b) {
let p = getPathToRoot(a);
let q = getPathToRoot(b);
let common = null;
while (p.length > 0 && q.length > 0 && p[p.length-1] == q[q.length-1]) {
common = p.pop();
q.pop();
}
return p.concat(common, q.reverse());
}
// Create the example tree
let nodes = {};
for (let label of "ABCDEFG") {
nodes[label] = { label: label, parent: null };
}
nodes.B.parent = nodes.C.parent = nodes.A;
nodes.D.parent = nodes.E.parent = nodes.B;
nodes.F.parent = nodes.G.parent = nodes.C;
// Perform the algorithm
let path = findPath(nodes.E, nodes.G);
// Output the result
console.log("Path from E to G is:");
for (let node of path) {
console.log(node.label);
}

DNA subsequence dynamic programming question

I'm trying to solve DNA problem which is more of improved(?) version of LCS problem.
In the problem, there is string which is string and semi-substring which allows part of string to have one or no letter skipped. For example, for string "desktop", it has semi-substring {"destop", "dek", "stop", "skop","desk","top"}, all of which has one or no letter skipped.
Now, I am given two DNA strings consisting of {a,t,g,c}. I"m trying to find longest semi-substring, LSS. and if there is more than one LSS, print out the one in the fastest order.
For example, two dnas {attgcgtagcaatg, tctcaggtcgatagtgac} prints out "tctagcaatg"
and aaaattttcccc, cccgggggaatatca prints out "aattc"
I'm trying to use common LCS algorithm but cannot solve it with tables although I did solve the one with no letter skipped. Any advice?
This is a variation on the dynamic programming solution for LCS, written in Python.
First I'm building up a Suffix Tree for all the substrings that can be made from each string with the skip rule. Then I'm intersecting the suffix trees. Then I'm looking for the longest string that can be made from that intersection tree.
Please note that this is technically O(n^2). Its worst case is when both strings are the same character, repeated over and over again. Because you wind up with a lot of what logically is something like, "an 'l' at position 42 in the one string could have matched against position l at position 54 in the other". But in practice it will be O(n).
def find_subtree (text, max_skip=1):
tree = {}
tree_at_position = {}
def subtree_from_position (position):
if position not in tree_at_position:
this_tree = {}
if position < len(text):
char = text[position]
# Make sure that we've populated the further tree.
subtree_from_position(position + 1)
# If this char appeared later, include those possible matches.
if char in tree:
for char2, subtree in tree[char].iteritems():
this_tree[char2] = subtree
# And now update the new choices.
for skip in range(max_skip + 1, 0, -1):
if position + skip < len(text):
this_tree[text[position + skip]] = subtree_from_position(position + skip)
tree[char] = this_tree
tree_at_position[position] = this_tree
return tree_at_position[position]
subtree_from_position(0)
return tree
def find_longest_common_semistring (text1, text2):
tree1 = find_subtree(text1)
tree2 = find_subtree(text2)
answered = {}
def find_intersection (subtree1, subtree2):
unique = (id(subtree1), id(subtree2))
if unique not in answered:
answer = {}
for k, v in subtree1.iteritems():
if k in subtree2:
answer[k] = find_intersection(v, subtree2[k])
answered[unique] = answer
return answered[unique]
found_longest = {}
def find_longest (tree):
if id(tree) not in found_longest:
best_candidate = ''
for char, subtree in tree.iteritems():
candidate = char + find_longest(subtree)
if len(best_candidate) < len(candidate):
best_candidate = candidate
found_longest[id(tree)] = best_candidate
return found_longest[id(tree)]
intersection_tree = find_intersection(tree1, tree2)
return find_longest(intersection_tree)
print(find_longest_common_semistring("attgcgtagcaatg", "tctcaggtcgatagtgac"))
Let g(c, rs, rt) represent the longest common semi-substring of strings, S and T, ending at rs and rt, where rs and rt are the ranked occurences of the character, c, in S and T, respectively, and K is the number of skips allowed. Then we can form a recursion which we would be obliged to perform on all pairs of c in S and T.
JavaScript code:
function f(S, T, K){
// mapS maps a char to indexes of its occurrences in S
// rsS maps the index in S to that char's rank (index) in mapS
const [mapS, rsS] = mapString(S)
const [mapT, rsT] = mapString(T)
// h is used to memoize g
const h = {}
function g(c, rs, rt){
if (rs < 0 || rt < 0)
return 0
if (h.hasOwnProperty([c, rs, rt]))
return h[[c, rs, rt]]
// (We are guaranteed to be on
// a match in this state.)
let best = [1, c]
let idxS = mapS[c][rs]
let idxT = mapT[c][rt]
if (idxS == 0 || idxT == 0)
return best
for (let i=idxS-1; i>=Math.max(0, idxS - 1 - K); i--){
for (let j=idxT-1; j>=Math.max(0, idxT - 1 - K); j--){
if (S[i] == T[j]){
const [len, str] = g(S[i], rsS[i], rsT[j])
if (len + 1 >= best[0])
best = [len + 1, str + c]
}
}
}
return h[[c, rs, rt]] = best
}
let best = [0, '']
for (let c of Object.keys(mapS)){
for (let i=0; i<(mapS[c]||[]).length; i++){
for (let j=0; j<(mapT[c]||[]).length; j++){
let [len, str] = g(c, i, j)
if (len > best[0])
best = [len, str]
}
}
}
return best
}
function mapString(s){
let map = {}
let rs = []
for (let i=0; i<s.length; i++){
if (!map[s[i]]){
map[s[i]] = [i]
rs.push(0)
} else {
map[s[i]].push(i)
rs.push(map[s[i]].length - 1)
}
}
return [map, rs]
}
console.log(f('attgcgtagcaatg', 'tctcaggtcgatagtgac', 1))
console.log(f('aaaattttcccc', 'cccgggggaatatca', 1))
console.log(f('abcade', 'axe', 1))

Remove redundant parentheses from an arithmetic expression

This is an interview question, for which I did not find any satisfactory answers on stackoverflow or outside. Problem statement:
Given an arithmetic expression, remove redundant parentheses. E.g.
((a*b)+c) should become a*b+c
I can think of an obvious way of converting the infix expression to post fix and converting it back to infix - but is there a better way to do this?
A pair of parentheses is necessary if and only if they enclose an unparenthesized expression of the form X % X % ... % X where X are either parenthesized expressions or atoms, and % are binary operators, and if at least one of the operators % has lower precedence than an operator attached directly to the parenthesized expression on either side of it; or if it is the whole expression. So e.g. in
q * (a * b * c * d) + c
the surrounding operators are {+, *} and the lowest precedence operator inside the parentheses is *, so the parentheses are unnecessary. On the other hand, in
q * (a * b + c * d) + c
there is a lower precedence operator + inside the parentheses than the surrounding operator *, so they are necessary. However, in
z * q + (a * b + c * d) + c
the parentheses are not necessary because the outer * is not attached to the parenthesized expression.
Why this is true is that if all the operators inside an expression (X % X % ... % X) have higher priority than a surrounding operator, then the inner operators are anyway calculated out first even if the parentheses are removed.
So, you can check any pair of matching parentheses directly for redundancy by this algorithm:
Let L be operator immediately left of the left parenthesis, or nil
Let R be operator immediately right of the right parenthesis, or nil
If L is nil and R is nil:
Redundant
Else:
Scan the unparenthesized operators between the parentheses
Let X be the lowest priority operator
If X has lower priority than L or R:
Not redundant
Else:
Redundant
You can iterate this, removing redundant pairs until all remaining pairs are non-redundant.
Example:
((a * b) + c * (e + f))
(Processing pairs from left to right):
((a * b) + c * (e + f)) L = nil R = nil --> Redundant
^ ^
(a * b) + c * (e + f) L = nil R = nil --> Redundant
^ ^ L = nil R = + X = * --> Redundant
a * b + c * (e + f) L = * R = nil X = + --> Not redundant
^ ^
Final result:
a * b + c * (e + f)
I just figured out an answer:
the premises are:
1. the expression has been tokenized
2. no syntax error
3. there are only binary operators
input:
list of the tokens, for example:
(, (, a, *, b, ), +, c, )
output:
set of the redundant parentheses pairs (the orders of the pairs are not important),
for example,
0, 8
1, 5
please be aware of that : the set is not unique, for instance, ((a+b))*c, we can remove outer parentheses or inner one, but the final expression is unique
the data structure:
a stack, each item records information in each parenthese pair
the struct is:
left_pa: records the position of the left parenthese
min_op: records the operator in the parentheses with minimum priority
left_op: records current operator
the algorithm
1.push one empty item in the stack
2.scan the token list
2.1 if the token is operand, ignore
2.2 if the token is operator, records the operator in the left_op,
if min_op is nil, set the min_op = this operator, if the min_op
is not nil, compare the min_op with this operator, set min_op as
one of the two operators with less priority
2.3 if the token is left parenthese, push one item in the stack,
with left_pa = position of the parenthese
2.4 if the token is right parenthese,
2.4.1 we have the pair of the parentheses(left_pa and the
right parenthese)
2.4.2 pop the item
2.4.3 pre-read next token, if it is an operator, set it
as right operator
2.4.4 compare min_op of the item with left_op and right operator
(if any of them exists), we can easily get to know if the pair
of the parentheses is redundant, and output it(if the min_op
< any of left_op and right operator, the parentheses are necessary,
if min_op = left_op, the parentheses are necessary, otherwise
redundant)
2.4.5 if there is no left_op and no right operator(which also means
min_op = nil) and the stack is not empty, set the min_op of top
item as the min_op of the popped-up item
examples
example one
((a*b)+c)
after scanning to b, we have stack:
index left_pa min_op left_op
0
1 0
2 1 * * <-stack top
now we meet the first ')'(at pos 5), we pop the item
left_pa = 1
min_op = *
left_op = *
and pre-read operator '+', since min_op priority '*' > '+', so the pair(1,5) is redundant, so output it.
then scan till we meet last ')', at the moment, we have stack
index left_pa min_op left_op
0
1 0 + +
we pop this item(since we meet ')' at pos 8), and pre-read next operator, since there is no operator and at index 0, there is no left_op, so output the pair(0, 8)
example two
a*(b+c)
when we meet the ')', the stack is like:
index left_pa min_op left_op
0 * *
1 2 + +
now, we pop the item at index = 1, compare the min_op '+' with the left_op '*' at index 0, we can find out the '(',')' are necessary
This solutions works if the expression is a valid. We need mapping of the operators to priority values.
a. Traverse from two ends of the array to figure out matching parenthesis from both ends.
Let the indexes be i and j respectively.
b. Now traverse from i to j and find out the lowest precedence operator which is not contained inside any parentheses.
c. Compare the priority of this operator with the operators to left of open parenthesis and right of closing parenthesis. If no such operator exists, treat its priority as -1. If the priority of the operator is higher than these two, remove the parenthesis at i and j.
d. Continue the steps a to c until i<=j.
Push one empty item in the stack
Scan the token list
2.1 if the token is operand, ignore.
2.2 if the token is operator, records the operator in the left_op,
if min_op is nil, set the min_op = this operator, if the min_op
is not nil, compare the min_op with this operator, set min_op as
one of the two operators with less priority.
2.3 if the token is left parenthese, push one item in the stack,
with left_pa = position of the parenthesis.
2.4 if the token is right parenthesis:
2.4.1 we have the pair of the parentheses(left_pa and the
right parenthesis)
2.4.2 pop the item
2.4.3 pre-read next token, if it is an operator, set it
as right operator
2.4.4 compare min_op of the item with left_op and right operator
(if any of them exists), we can easily get to know if the pair
of the parentheses is redundant, and output it(if the min_op
< any of left_op and right operator, the parentheses are necessary,
if min_op = left_op, the parentheses are necessary, otherwise
redundant)
2.4.5 if there is no left_op and no right operator(which also means
min_op = nil) and the stack is not empty, set the min_op of top
item as the min_op of the popped-up item
examples
The code below implements a straightforward solution. It is limited to +, -, *, and /, but it can be extended to handle other operators if needed.
#include <iostream>
#include <set>
#include <stack>
int loc;
std::string parser(std::string input, int _loc) {
std::set<char> support = {'+', '-', '*', '/'};
std::string expi;
std::set<char> op;
loc = _loc;
while (true) {
if (input[loc] == '(') {
expi += parser(input, loc + 1);
} else if (input[loc] == ')') {
if ((input[loc + 1] != '*') && (input[loc + 1] != '/')) {
return expi;
} else {
if ((op.find('+') == op.end()) && (op.find('-') == op.end())) {
return expi;
} else {
return '(' + expi + ')';
}
}
} else {
char temp = input[loc];
expi = expi + temp;
if (support.find(temp) != support.end()) {
op.insert(temp);
}
}
loc++;
if (loc >= input.size()) {
break;
}
}
return expi;
}
int main() {
std::string input("(((a)+((b*c)))+(d*(f*g)))");
std::cout << parser(input, 0);
return 0;
}
I coded it previously in https://calculation-test.211368e.repl.co/trim.html. This doesn't have some errors in other answers.
(6 / (-2454) ** (((234)))) + (-5435) --> 6 / (-2454) ** 234 + (-5435)
const format = expression => {
var change = [], result = expression.replace(/ /g, "").replace(/\*\*/g, "^"), _count;
function replace(index, string){result = result.slice(0, index) + string + result.slice(index + 1)}
function add(index, string){result = result.slice(0, index) + string + result.slice(index)}
for (var count = 0; count < result.length; count++){
if (result[count] == "-"){
if ("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890)".includes(result[count - 1])){
change.push(count);
}else if (result[count - 1] != "("){
add(count, "(");
count++;
_count = count + 1;
while ("1234567890.".includes(result[_count])) _count++;
if (_count < result.length - 1){
add(_count, ")");
}else{
add(_count + 2, ")");
}
}
}
}
change = change.sort(function(a, b){return a - b});
const len = change.length;
for (var count = 0; count < len; count++){replace(change[0] + count * 2, " - "); change.shift()}
return result.replace(/\*/g, " * ").replace(/\^/g, " ** ").replace(/\//g, " / ").replace(/\+/g, " + ");
}
const trim = expression => {
var result = format(expression).replace(/ /g, "").replace(/\*\*/g, "^"), deleting = [];
const brackets = bracket_pairs(result);
function bracket_pairs(){
function findcbracket(str, pos){
const rExp = /\(|\)/g;
rExp.lastIndex = pos + 1;
var depth = 1;
while ((pos = rExp.exec(str))) if (!(depth += str[pos.index] == "(" ? 1 : -1 )) {return pos.index}
}
function occurences(searchStr, str){
var startIndex = 0, index, indices = [];
while ((index = str.indexOf(searchStr, startIndex)) > -1){
indices.push(index);
startIndex = index + 1;
}
return indices;
}
const obrackets = occurences("(", result);
var cbrackets = [];
for (var count = 0; count < obrackets.length; count++) cbrackets.push(findcbracket(result, obrackets[count]));
return obrackets.map((e, i) => [e, cbrackets[i]]);
}
function remove(deleting){
function _remove(index){result = result.slice(0, index) + result.slice(index + 1)}
const len = deleting.length;
var deleting = deleting.sort(function(a, b){return a - b});
for (var count = 0; count < len; count++){
_remove(deleting[0] - count);
deleting.shift()
}
}
function precedence(operator, position){
if (!"^/*-+".includes(operator)) return "^/*-+";
if (position == "l" || position == "w") return {"^": "^", "/": "^", "*": "^/*", "-": "^/*", "+": "^/*-+"}[operator];
if (position == "r") return {"^": "^", "/": "^/*", "*": "^/*", "-": "^/*-+", "+": "^/*-+"}[operator];
}
function strip_bracket(string){
var result = "", level = 0;
for (var count = 0; count < string.length; count++){
if (string.charAt(count) == "(") level++;
if (level == 0) result += string.charAt(count);
if (string.charAt(count) == ")") level--;
}
return result.replace(/\s{2,}/g, " ");
}
for (var count = 0; count < brackets.length; count++){
const pair = brackets[count];
if (result[pair[0] - 1] == "(" && result[pair[1] + 1] == ")"){
deleting.push(...pair);
}else{
const left = precedence(result[pair[0] - 1], "l"), right = precedence(result[pair[1] + 1], "r");
var contents = strip_bracket(result.slice(pair[0] + 1, pair[1])), within = "+";
for (var _count = 0; _count < contents.length; _count++) if (precedence(contents[_count], "w").length < precedence(within, "w").length) within = contents[_count];
if (/^[0-9]+$/g.test(contents) || contents == ""){
deleting.push(...pair);
continue;
}
if (left.includes(within) && right.includes(within)){
if (!isNaN(result.slice(pair[0] + 1, pair[1]))){
if (Number(result.slice(pair[0] + 1, pair[1])) >= 0 && !"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890".includes(result[pair[0] - 1])) deleting.push(...pair);
}else if (!"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890".includes(result[pair[0] - 1])) deleting.push(...pair);
}
}
}
remove(deleting);
result = format(result);
return result;
}
<input id="input">
<button onclick="document.getElementById('result').innerHTML = trim(document.getElementById('input').value)">Remove and format</button>
<div id="result"></div>
I think that you are looking for kind of algorithm as seen in the following photo.
This algorithm is "almost" ready, since a lot of bugs arise once the more complex it becomes, the more complicated it gets. The way I work on this thing, is 'build-and-write-code-on-the-fly', which means that for up to 4 parentheses, things are easy. But after the expression goes more complex, there are things that I cannot predict while writing down thoughts on paper. And there comes the compiler to tell me what to correct. It would not be a lie if I state that it is not me to have written the algorithm, but the (C#) compiler instead! So far, it took me 1400 lines. It is not that the commands were difficult to write. It was their arrangement that was a real puzzle. This program you are looking for, is characterized by a really high grade of complexity. Well, if you need any primary ideas, please let me know and I will reply. Thanx!
Algorithm

Distributed algorithm to compute the balance of the parentheses

This is an interview question: "How to build a distributed algorithm to compute the balance of the parentheses ?"
Usually he balance algorithm scans a string form left to right and uses a stack to make sure that the number of open parentheses always >= the number of close parentheses and finally the number of open parentheses == the number of close parentheses.
How would you make it distributed ?
You can break the string into chunks and process each separately, assuming you can read and send to the other machines in parallel. You need two numbers for each string.
The minimum nesting depth achieved relative to the start of the string.
The total gain or loss in nesting depth across the whole string.
With these values, you can compute the values for the concatenation of many chunks as follows:
minNest = 0
totGain = 0
for p in chunkResults
minNest = min(minNest, totGain + p.minNest)
totGain += p.totGain
return new ChunkResult(minNest, totGain)
The parentheses are matched if the final values of totGain and minNest are zero.
I would apply the map-reduce algorithm in which the map function would compute a part of the string return either an empty string if parentheses are balanced or a string with the last parenthesis remaining.
Then the reduce function would concatenate the result of two returned strings by map function and compute it again returning the same result than map. At the end of all computations, you'd either obtain an empty string or a string containing the un-balanced parenthesis.
I'll try to have a more detailed explain on #jonderry's answer. Code first, in Scala
def parBalance(chars: Array[Char], chunkSize: Int): Boolean = {
require(chunkSize > 0, "chunkSize must be greater than 0")
def traverse(from: Int, until: Int): (Int, Int) = {
var count = 0
var stack = 0
var nest = 0
for (n <- from until until) {
val cur = chars(c)
if (cur == '(') {
count += 1
stack += 1
}
else if (cur == ')') {
count -= 1
if (stack > 0) stack -= 1
else nest -= 1
}
}
(nest, count)
}
def reduce(from: Int, until: Int): (Int, Int) = {
val m = (until + from) / 2
if (until - from <= chunkSize) {
traverse(from, until)
} else {
parallel(reduce(from, m), reduce(m, until)) match {
case ((minNestL, totGainL), (minNestR, totGainR)) => {
((minNestL min (minNestR + totGainL)), (totGainL + totGainR))
}
}
}
}
reduce(0, chars.length) == (0,0)
}
Given a string, if we remove balanced parentheses, what's left will be in a form )))(((, give n for number of ) and m for number of (, then m >= 0, n <= 0(for easier calculation). Here n is minNest and m+n is totGain. To make a true balanced string, we need m+n == 0 && n == 0.
In a parallel operation, how to we derive those for node from it's left and right? For totGain we just needs to add them up. When calculating n for node, it can just be n(left) if n(right) not contribute or n(right) + left.totGain whichever is smaller.

Resources