Fastest way to get the number in tail of a string - algorithm

Given strings such as "XXXXXXX XXXXX 756" and "XXXXX XXXXXX35665" (where X is any character), what is the fastest way to get the number at the end of the string?
EDIT: This is just a for-fun question. Solving it is quite simple, but I want to know the fastest algorithm to achieve it. Rock on!

In C, a quick O(n), one-pass algorithm (it ignores negative signs) is:
int suffixedNumber(char* string) {
int result = 0;
char ch;
while (ch = *string++)
// Check whether <= '9' first, because most characters are > '9'.
result = (ch <= '9' && ch >= '0') ? 10*result + (ch - '0') : 0;
return result;
}
If you're alright with gotos, you can get a ≈20% faster algorithm that (in order of importance) :
returns -1 when there is no number at the end of string
avoids checking for end-of-string when ch >= '0'
avoids resetting result to zero when ch is nonnumeric
avoids multiplying result by ten when a number starts
avoids setting result to zero at the beginning
int suffixedNumber(char* string) {
int result;
char ch;
nonnumber: // STATE: Waiting for the start of a number.
ch = *string++;
if (ch > '9') goto nonnumber; // Decide this boundary first (> '9' most frequent)
if (ch < '0') { // Decide this boundary next
if (ch == '\0') return -1; // Decide this boundary last ('\0' least frequent)
goto nonnumber;
}
result = ch - '0';
number: // STATE: In the middle of a number.
ch = *string++;
if (ch > '9') goto nonnumber; // Decide this boundary first (> '9' most frequent)
if (ch < '0') { // Decide this boundary next
if (ch == '\0') return result; // Decide this boundary last ('\0' least frequent)
goto nonnumber;
}
result = 10*result + (ch - '0');
goto number;
}

Assuming the text can be streamed in reverse order (a reasonable assumption since strings in most languages are backed by an array of characters with O(1) access), construct the number by reading the text backwards until you hit a character that is not a digit or the text has been consumed entirely.
numDigits = 0
number = 0
while(numDigits <> length and characterAt[length - numDigits] is a digit)
    number = number + (parseCharacterAt[length - numDigits] * (10 ^ numDigits))
    numDigits = numDigits + 1
end while
if(numDigits is 0)
    Error ("No digits at the end")
else return number
Note: (10 ^ numDigits) can be trivially optimized with another variable.
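A minimal Python sketch of the backward scan described above, with the power of ten kept in a running variable as the note suggests. The function name trailing_number and the choice to raise ValueError are assumptions made for illustration; the sketch uses zero-based indexing and only accepts ASCII digits.

```python
def trailing_number(text: str) -> int:
    """Return the integer formed by the trailing ASCII digits of text.

    Hypothetical helper: raises ValueError when text does not end in a digit.
    """
    number = 0
    place = 1  # running power of ten, replaces 10 ** num_digits
    i = len(text) - 1
    # Walk backwards while the current character is a digit.
    while i >= 0 and "0" <= text[i] <= "9":
        number += (ord(text[i]) - ord("0")) * place
        place *= 10
        i -= 1
    if i == len(text) - 1:
        raise ValueError("no digits at the end")
    return number


print(trailing_number("XXXXXXX XXXXX 756"))   # 756
print(trailing_number("XXXXX XXXXXX35665"))   # 35665
```

Only the k trailing digits (plus one non-digit) are inspected, so the cost depends on the length of the number, not on the length of the whole string.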

Without knowing the language or context: if the number of digits or characters is fixed, a simple substring will do; otherwise, use a regex that matches consecutive digits anchored at the end of the string (i.e. /\d+$/).
There is probably a faster algorithm if you drop down to the C++ level, but I favour expressiveness.
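For the regex route, a small sketch in Python using the standard re module. The helper name trailing_number_regex is made up for this example, and returning None when there is no trailing number is just one possible convention.

```python
import re

# Digits anchored at the end of the string; re.ASCII restricts \d to 0-9.
_TRAILING_DIGITS = re.compile(r"\d+$", re.ASCII)


def trailing_number_regex(text):
    """Return the trailing number as an int, or None if there is none."""
    match = _TRAILING_DIGITS.search(text)
    return int(match.group()) if match else None


print(trailing_number_regex("XXXXXXX XXXXX 756"))  # 756
print(trailing_number_regex("no digits here"))     # None
```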

I would just call the lastIndexOf('X') of your String object and proceed from there. Not backward looping mess.

Use a regular expression:
/^.+?(\d+)$/
and take the first capture group (\d+).
Edit:
If you can't use regular expressions, the fastest way will be something like this:
i = string.len
while i > 0:
    break if string[i].isNotNum
    i--
end
out = substring(string, i,string.len)

Stating the problem:
We want to find the index of the last non-digit character
Reflection:
This implies that we check that each character after this point is a digit, which means we will need to perform at least O(k) comparisons, where k is the number of digits at the end of the string.
Implementation:
Linear backward search, possibly involving bitwise trickery to "vectorize" the operations (comparing multiple characters at once) or leveraging a multithreading effort.

Definitely don't use regex; it is way too slow. Just loop backwards until you find the first non-numeric character:
string s = "XXXXX XXXXXX35665";
int i = s.Length;
while (--i >= 0 && Char.IsNumber(s[i]));
s=s.Substring(i + 1);
That should do the trick.

Extracting the tail number:
str = "XXXXXx....XXXX333";
parseInt(str.match(/\d*$/)[0], 10)
If a floating-point number is wanted:
parseFloat(str.match(/[\d.]*$/)[0])
Incrementing the tail number:
str.replace(/(\d+)$/, function(){return parseInt(arguments[0], 10) + 1;})

Related

Fuzzy string record search algorithm (supporting word transpose and character transpose)

I am trying to find the best algorithm for my particular application. I have searched around on SO, Google, read various articles about Levenshtein distances, etc. but honestly it's a bit out of my area of expertise. And most seem to find how similar two input strings are, like a Hamming distance between strings.
What I'm looking for is different, more of a fuzzy record search (and I'm sure there is a name for it, that I don't know to Google). I am sure someone has solved this problem before and I'm looking for a recommendation to point me in the right direction for my further research.
In my case I am needing a fuzzy search of a database of entries of music artists and their albums. As you can imagine, the database will have millions of entries so an algorithm that scales well is crucial. It's not important to my question that Artist and Album are in different columns, the database could just store all words in one column if that helped the search.
The database to search:
|-------------------|---------------------|
| Artist | Album |
|-------------------|---------------------|
| Alanis Morissette | Jagged Little Pill |
| Moby | Everything is Wrong |
| Air | Moon Safari |
| Pearl Jam | Ten |
| Nirvana | Nevermind |
| Radiohead | OK Computer |
| Beck | Odelay |
|-------------------|---------------------|
The query text will contain from just one word in the entire Artist_Album concatenation up to the entire thing. The query text is coming from OCR and is likely to have single character transpositions but the most likely thing is the words are not guaranteed to have the right order. Additionally, there could be extra words in the search that aren't a part of the album (like cover art text). For example, "OK Computer" might be at the top of the album and "Radiohead" below it, or some albums have text arranged in columns which intermixes the word orders.
Possible search strings:
C0mputer Rad1ohead
Pearl Ten Jan
Alanis Jagged Morisse11e Litt1e Pi11
Air Moon Virgin Records
Moby Everything
Note that with OCR, some letters will look like numbers, or the wrong letter completely (Jan instead of Jam). And in the case of Radiohead's OK Computer and Moby's Everything Is Wrong, the query text doesn't even have all of the words. In the case of Air's Moon Safari, the extra words Virgin Records are searched, but Safari is missing.
Is there a general algorithm that could return the single likeliest result from the database, and if none meet some "likeliness" score threshold, it returns nothing? I'm actually developing this in Python, but that's just a bonus, I'm looking more for where to get started researching.
Let's break the problem down in two parts.
First, you want to define some measure of likeness (this is called a metric). This metric should return a small number if the query text closely matches the album/artist cover, and return a larger number otherwise.
Second, you want a datastructure that speeds up this process. Obviously, you don't want to calculate this metric against every record each time a query is run.
part 1: the metric
You already mentioned Levenshtein distance, which is a great place to start.
Think outside the box though.
LD makes certain assumptions (each character replacement is equally likely, deletion is equally likely as insertion, etc). You can obviously improve the performance of this metric by taking into account what faults OCR is likely to introduce.
E.g. turning a '1' into an 'i' should not be penalized as harshly as turning a '0' into an '_'.
I would implement the metric in two stages. For any given two strings:
split both strings in tokens (assume space as the separator)
look for the most similar words (using a modified version of LD)
assign a final score based on 'matching words', 'missing words' and 'added words' (preferably weighted)
This is an example implementation (fiddle around with the constants):
static double m(String a, String b){
String[] aParts = a.split(" ");
String[] bParts = b.split(" ");
boolean[] bUsed = new boolean[bParts.length];
int matchedTokens = 0;
int tokensInANotInB = 0;
int tokensInBNotInA = 0;
for(int i=0;i<aParts.length;i++){
String a0 = aParts[i];
boolean wasMatched = false; // set to true once a0 is paired with a token of b
for(int j=0;j<bParts.length;j++){
String b0 = bParts[j];
double d = levenshtein(a0, b0);
/* If we match the token a0 with a token from b0
* update the number of matchedTokens
* escape the loop
*/
if(d < 2){
bUsed[j]=true;
wasMatched = true;
matchedTokens++;
break;
}
}
if(!wasMatched){
tokensInANotInB++;
}
}
for(boolean partUsed : bUsed){
if(!partUsed){
tokensInBNotInA++;
}
}
return (matchedTokens
+ tokensInANotInB * -0.3 // the query is allowed to contain extra words at minimal cost
+ tokensInBNotInA * -0.5 // the album title should not contain too many extra words
) / java.lang.Math.max(aParts.length, bParts.length);
}
This function uses a modified levenshtein function:
static double levenshtein(String x, String y) {
double[][] dp = new double[x.length() + 1][y.length() + 1];
for (int i = 0; i <= x.length(); i++) {
for (int j = 0; j <= y.length(); j++) {
if (i == 0) {
dp[i][j] = j;
}
else if (j == 0) {
dp[i][j] = i;
}
else {
// Math.min takes two arguments, so nest the calls for three candidates
dp[i][j] = Math.min(Math.min(
dp[i - 1][j - 1] + costOfSubstitution(x.charAt(i - 1), y.charAt(j - 1)),
dp[i - 1][j] + 1),
dp[i][j - 1] + 1);
}
}
}
return dp[x.length()][y.length()];
}
Which uses the function 'cost of substitution' (which works as explained)
static double costOfSubstitution(char a, char b){
if(a == b)
return 0.0;
else{
// 1 and i
if(a == '1' && b == 'i')
return 0.5;
if(a == 'i' && b == '1')
return 0.5;
// 0 and O
if(a == '0' && b == 'o')
return 0.5;
if(a == 'o' && b == '0')
return 0.5;
if(a == '0' && b == 'O')
return 0.5;
if(a == 'O' && b == '0')
return 0.5;
// default
return 1.0;
}
}
I only included a couple of examples (turning '1' into 'i' or '0' into 'o').
But I'm sure you get the idea.
part 2: the datastructure
Look into BK-trees. They are a specific datastructure to hold metric information. Your metric needs to be a genuine metric (in the mathematical sense of the word). But that's easily arranged.
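To give a feel for how a BK-tree prunes the search, here is a rough Python sketch. It is not a production index: the class name BKTree and its methods are invented for this example, and it uses plain integer Levenshtein distance rather than the weighted variant above, since a BK-tree keys its children by distance value (fractional costs such as 0.5 would first have to be scaled to integers).

```python
def levenshtein(a: str, b: str) -> int:
    """Plain integer edit distance (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j - 1] + (ca != cb),  # substitute or match
                           prev[j] + 1,               # delete from a
                           cur[j - 1] + 1))           # insert into a
        prev = cur
    return prev[-1]


class BKTree:
    """Each node is (word, children) with children keyed by distance."""

    def __init__(self, distance):
        self.distance = distance
        self.root = None

    def add(self, word):
        if self.root is None:
            self.root = (word, {})
            return
        node = self.root
        while True:
            d = self.distance(word, node[0])
            if d == 0:
                return  # word already stored
            child = node[1].get(d)
            if child is None:
                node[1][d] = (word, {})
                return
            node = child

    def search(self, query, max_dist):
        """Return sorted (distance, word) pairs within max_dist of query."""
        results = []
        stack = [self.root] if self.root else []
        while stack:
            word, children = stack.pop()
            d = self.distance(query, word)
            if d <= max_dist:
                results.append((d, word))
            # Triangle inequality: only these subtrees can contain matches.
            for edge, child in children.items():
                if d - max_dist <= edge <= d + max_dist:
                    stack.append(child)
        return sorted(results)


tree = BKTree(levenshtein)
for w in ["radiohead", "nirvana", "computer", "nevermind", "odelay"]:
    tree.add(w)
print(tree.search("rad1ohead", 1))  # [(1, 'radiohead')]
```

The tree here indexes single words; matching whole artist/album records would still need the token-level scoring from part 1 on top of the per-word lookups.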

Dafny insert method, a postcondition might not hold on this return path

I have an array "line" which has a string contained in it of length "l" and an array "nl" which has a string contained in it of length "p".
Note: "l" and "p" don't necessarily have to be the lengths of the corresponding arrays. The parameter "at" is the position where the insertion will be made inside "line".
In summary: a string of length "p" will be inserted into "line", moving the chars of "line" from position "at" onwards 'p' positions to the right in order to make room for the insertion.
My logic for the ensures clause is to check that the elements inserted in "line" are the same as the chars contained in "nl" and appear in the same order.
Here is the code:
method insert(line:array<char>, l:int, nl:array<char>, p:int, at:int)
requires line != null && nl != null;
requires 0 <= l+p <= line.Length && 0 <= p <= nl.Length ;
requires 0 <= at <= l;
modifies line;
ensures forall i :: (0<=i<p) ==> line[at+i] == nl[i]; // error
{
var i:int := 0;
var positionAt:int := at;
while(i<l && positionAt < l)
invariant 0<=i<l+1;
invariant at<=positionAt<=l;
{
line[positionAt+p] := line[positionAt];
line[positionAt] := ' ';
positionAt := positionAt + 1;
i := i + 1;
}
positionAt := at;
i := 0;
while(i<p && positionAt < l)
invariant 0<=i<=p;
invariant at<=positionAt<=l;
{
line[positionAt] := nl[i];
positionAt := positionAt + 1;
i := i + 1;
}
}
Here are the errors that I am receiving.
Thanks.
I suspect that your algorithm is not correct, because it does not seem to take into account the fact that shifting the characters starting at position at by p places might write them over the end of the string in line.
My experience has been that, in order to be successful with verification:
Good standards of code development are crucial. Good variable naming, code formatting, and other code conventions are even more important than usual.
Writing code that is logically simple is really helpful. Try to avoid extraneous extra variables. Try to simplify arithmetic and logical expressions wherever practical.
Starting with a correct algorithm makes verification easier. Of course, this is easier said than done!
It is often helpful to write out the strongest loop invariants you can think of.
Working backwards from the postcondition is often helpful. In your case, take the postcondition and the negation of the final loop condition, and use these to work out what the invariant of the final loop must be in order to imply the postcondition. Then work backwards from that to the previous loop, etc.
When manipulating arrays, using a ghost variable which contains the original value of the array as a sequence is very often an effective strategy. Ghost variables do not appear in the compiler output, so they will not affect the performance of your program.
It is often helpful to write down assertions for the exact state of the array, even if the postcondition only requires some weaker property.
Here is a verified implementation of your desired procedure:
// l is length of the string in line
// p is length of the string in nl
// at is the position to insert nl into line
method insert(line:array<char>, l:int, nl:array<char>, p:int, at:int)
requires line != null && nl != null
requires 0 <= l+p <= line.Length // line has enough space
requires 0 <= p <= nl.Length // string in nl is shorter than nl
requires 0 <= at <= l // insert position within line
modifies line
ensures forall i :: (0<=i<p) ==> line[at+i] == nl[i] // ok now
{
ghost var initialLine := line[..];
// first we need to move the characters to the right
var i:int := l;
while(i>at)
invariant line[0..i] == initialLine[0..i]
invariant line[i+p..l+p] == initialLine[i..l]
invariant at<=i<=l
{
i := i - 1;
line[i+p] := line[i];
}
assert line[0..at] == initialLine[0..at];
assert line[at+p..l+p] == initialLine[at..l];
i := 0;
while(i<p)
invariant 0<=i<=p
invariant line[0..at] == initialLine[0..at]
invariant line[at..at+i] == nl[0..i]
invariant line[at+p..l+p] == initialLine[at..l]
{
line[at + i] := nl[i];
i := i + 1;
}
assert line[0..at] == initialLine[0..at];
assert line[at..at+p] == nl[0..p];
assert line[at+p..l+p] == initialLine[at..l];
}
http://rise4fun.com/Dafny/ZoCv

VBScript Failed to filter non ascii characters

I have this function:
Private Sub CheckParams(Values)
Dim Str, Ch
If IsArray(Values) then
Str = Join(Values, "")
Else
Str = Values
End If
For I = 1 To Len(Str)
Ch = Asc(Mid(Str, I, 1))
If Not ((Ch = 9) Or (Ch = 10) Or (Ch = 13) Or ((Ch > 31) And (Ch < 128))) Then
SetError("script result contains illegal characters.")
End If
Next
End Sub
This function raises an error if the input value contains characters outside the set allowed by the If statement in the For loop. The problem is that when my input value contains Japanese characters, the validation passes without error. I think the Asc() function, which returns the ANSI code of a character, doesn't know how to handle Japanese characters. What seems to be the problem here? Does the Asc() function return negative numbers?
Kanji and Kana are most likely represented as 2-Byte Unicode characters, so you could try something like this:
ch = AscW(MidB(str, i, 2))
I found the solution. It is very similar to Ansgar's solution, but instead of MidB I used Mid with a length of 1:
ch = AscW(Mid(str, i, 1))

What is the best algorithm to find whether an anagram is of a palindrome?

In this problem we consider only strings of lower-case English letters (a-z).
A string is a palindrome if it has exactly the same sequence of characters when traversed left-to-right as right-to-left. For example, the following strings are palindromes:
"kayak"
"codilitytilidoc"
"neveroddoreven"
A string A is an anagram of a string B if it consists of exactly the same characters, but possibly in another order. For example, the following strings are each other's anagrams:
A="mary" B="army" A="rocketboys" B="octobersky" A="codility" B="codility"
Write a function
int isAnagramOfPalindrome(String S);
which returns 1 if the string S is an anagram of some palindrome, or returns 0 otherwise.
For example your function should return 1 for the argument "dooernedeevrvn", because it is an anagram of a palindrome "neveroddoreven". For argument "aabcba", your function should return 0.
'Algorithm' would be too big a word for it.
You can construct a palindrome from the given character set if each character occurs in that set an even number of times (with the possible exception of one character).
For any other set, you can easily show that no palindrome exists.
Proof is simple in both cases, but let me know if that wasn't clear.
In a palindrome, every character must have a copy of itself, a "twin", on the other side of the string, except in the case of the middle letter, which can act as its own twin.
The algorithm you seek would create a length-26 array, one for each lowercase letter, and start counting the characters in the string, placing the quantity of character n at index n of the array. Then, it would pass through the array and count the number of characters with an odd quantity (because one letter there does not have a twin). If this number is 0 or 1, place that single odd letter in the center, and a palindrome is easily generated. Else, it's impossible to generate one, because two or more letters with no twins exist, and they can't both be in the center.
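As a sketch of the "a palindrome is easily generated" step, the following Python function counts the letters and, when at most one count is odd, actually builds one such palindrome. The name palindrome_from_letters is made up for this example, and it uses collections.Counter instead of a fixed length-26 array, so it does not assume lowercase a-z input.

```python
from collections import Counter


def palindrome_from_letters(s: str):
    """Return some palindrome that is an anagram of s, or None if none exists."""
    counts = Counter(s)
    odd = [ch for ch, n in counts.items() if n % 2 == 1]
    if len(odd) > 1:
        return None  # two or more letters without a twin
    # Put half of every letter's occurrences on the left side.
    half = "".join(ch * (n // 2) for ch, n in sorted(counts.items()))
    middle = odd[0] if odd else ""
    return half + middle + half[::-1]


print(palindrome_from_letters("dooernedeevrvn"))  # deenorvvroneed
print(palindrome_from_letters("aabcba"))          # None
```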
I came up with this solution for Javascript.
This solution is based on the premise that a string is an anagram of a palindrome if and only if at most one character appears an odd number of times in it.
function solution(S) {
var retval = 0;
var sorted = S.split('').sort(); // sort the input characters and store in
// a char array
var array = new Array();
for (var i = 0; i < sorted.length; i++) {
// check if the 2 chars are the same, if so copy the 2 chars to the new
// array
// and additionally increment the counter to account for the second char
// position in the loop.
if ((sorted[i] === sorted[i + 1]) && (sorted[i + 1] != undefined)) {
array.push.apply(array, sorted.slice(i, i + 2));
i = i + 1;
}
}
// if the original string array's length is 1 or more than the length of the
// new array's length
if (sorted.length <= array.length + 1) {
retval = 1;
}
//console.log("new array-> " + array);
//console.log("sorted array-> " + sorted);
return retval;
}
I wrote this code in Java. I'm not sure it's a good one:
public static int isAnagramOfPalindrome(String str){
ArrayList<Character> a = new ArrayList<Character>();
for(int i = 0; i < str.length(); i++){
if(a.contains(str.charAt(i))){
a.remove((Object)str.charAt(i));
}
else{
a.add(str.charAt(i));
}
}
if(a.size() > 1)
return 0;
return 1;
}
Algorithm:
1. Count the number of occurrences of each character.
2. At most one character may occur an odd number of times, since a palindrome can contain at most one such character (the middle one).
3. All other characters must occur an even number of times.
If (2) or (3) fails, then the given string is not an anagram of a palindrome.
This adds to the other answers given. We want to keep track of the count of each letter seen. If we have more than one odd count for a letter then we will not be able to form a palindrome. The odd count would go in the middle, but only one odd count can do so.
We can use a hashmap to keep track of the counts. The lookup for a hashmap is O(1), so it is fast. We are able to run the whole algorithm in O(n). Here it is in code:
if __name__ == '__main__':
    line = input()
    dic = {}
    for i in range(len(line)):
        ch = line[i]
        if ch in dic:
            dic[ch] += 1
        else:
            dic[ch] = 1
    chars_whose_count_is_odd = 0
    for key, value in dic.items():
        if value % 2 == 1:
            chars_whose_count_is_odd += 1
    if chars_whose_count_is_odd > 1:
        print ("NO")
    else:
        print ("YES")
I have a neat solution in PHP posted in this question about complexities.
class Solution {
// Function to determine if the input string can make a palindrome by rearranging it
static public function isAnagramOfPalindrome($S) {
// here I am counting how many characters have odd number of occurrences
$odds = count(array_filter(count_chars($S, 1), function($var) {
return($var & 1);
}));
// If the string length is odd, then a palindrome would have 1 character with odd number occurrences
// If the string length is even, all characters should have even number of occurrences
return (int)($odds == (strlen($S) & 1));
}
}
echo Solution :: isAnagramOfPalindrome($_POST['input']);
It uses built-in PHP functions (why not), but you can make it yourself, as those functions are quite simple. First, the count_chars function generates a named array (dictionary in python) with all characters that appear in the string, and their number of occurrences. It can be substituted with a custom function like this:
$count_chars = array();
// a string cannot be iterated directly, so split it into characters first
foreach (str_split($S) as $char) {
    if (array_key_exists($char, $count_chars)) {
        $count_chars[$char]++;
    } else {
        $count_chars[$char] = 1;
    }
}
Then, an array_filter with a count function is applied to count how many chars have odd number of occurrences:
$odds = 0;
foreach($count_chars as $char) {
$odds += $char % 2;
}
And then you just apply the comparison in return (explained in the comments of the original function).
return ($odds == strlen($S) % 2);
This runs in O(n). Every character but one must occur an even number of times; the one optional odd character may occur any odd number of times.
e.g.
abababa
def anagram_of_pali(str):
    char_list = list(str)
    map = {}
    nb_of_odds = 0
    for char in char_list:
        if char in map:
            map[char] += 1
        else:
            map[char] = 1
    for char in map:
        if map[char] % 2 != 0:
            nb_of_odds += 1
    return True if nb_of_odds <= 1 else False
You just have to count all the letters and check if there are letters with odd counts. If there are more than one letter with odd counts the string does not satisfy the above palindrome condition.
Furthermore, a string with an even number of letters cannot have exactly one letter with an odd count, so it is not necessary to check whether the string length is even or not. The time complexity is O(n).
Here's the implementation in javascript:
function canRearrangeToPalindrome(str)
{
var letterCounts = {};
var letter;
var palindromeSum = 0;
for (var i = 0; i < str.length; i++) {
letter = str[i];
letterCounts[letter] = letterCounts[letter] || 0;
letterCounts[letter]++;
}
for (var letterCount in letterCounts) {
palindromeSum += letterCounts[letterCount] % 2;
}
return palindromeSum < 2;
}
All right, it's been a while, but as I was asked this question in a job interview I wanted to give it a try in a few lines of Python. The basic idea is that if a string with an even number of letters has an anagram that is a palindrome, each character occurs an even number of times (i.e. count%2==0). For an odd number of letters, exactly one character (the one in the middle) occurs an odd number of times (count%2==1).
I used a set in Python to get the unique characters, then simply count and break out of the loop once the condition can no longer be fulfilled. Example code (Python 3):
def is_palindrome(s):
    letters = set(s)
    oddc=0
    fail=False
    for c in letters:
        if s.count(c)%2==1:
            oddc = oddc+1
        if oddc>0 and len(s)%2==0:
            fail=True
            break
        elif oddc>1:
            fail=True
            break
    return(not fail)
def is_anagram_of_palindrome(S):
    L = [ 0 for _ in range(26) ]
    a = ord('a')
    length = 0
    for s in S:
        length += 1
        i = ord(s) - a
        L[i] = abs(L[i] - 1)
    return length > 0 and sum(L) < 2 and 1 or 0
While you can detect that the given string "S" is a candidate palindrome using the given techniques, it is still not very useful. According to the implementations given,
isAnagramOfPalindrome("rrss") would return true but there is no actual palindrome because:
A palindrome is a word, phrase, number, or other sequence of symbols or elements, whose meaning may be interpreted the same way in either forward or reverse direction. (Wikipedia)
And Rssr or Srrs is not an actual word or phrase that is interpretable. The same goes for its anagram. Aarrdd is not an anagram of radar because it is not interpretable.
So, the solutions given must be augmented with a heuristic check against the input to see if it's even a word, and then a verification (via the implementations given), that it is palindrome-able at all. Then there is a heuristic search through the collected buckets with n/2! permutations to search if those are ACTUALLY palindromes and not garbage. The search is only n/2! and not n! because you calculate all permutations of each repeated letter, and then you mirror those over (in addition to possibly adding the singular pivot letter) to create all possible palindromes.
I disagree that algorithm is too big of a word, because this search can be done pure recursively, or using dynamic programming (in the case of words with letters with occurrences greater than 2) and is non trivial.
Here's some code: This is same as the top answer that describes algorithm.
#include <algorithm>
#include <iostream>
#include <stack>
#include <string>
#include <vector>

using namespace std;

bool fun(string in)
{
    // Sort the characters so that equal letters become adjacent.
    vector<char> input(in.begin(), in.end());
    sort(input.begin(), input.end());

    // Cancel equal neighbours in pairs; whatever stays on the stack is unpaired.
    stack<char> ret;
    for (size_t i = 0; i < input.size(); i++)
    {
        if (!ret.empty() && ret.top() == input.at(i))
        {
            ret.pop();
        }
        else
        {
            ret.push(input.at(i));
        }
    }

    // At most one unpaired letter may remain (the middle of the palindrome).
    return ret.size() <= 1;
}

int main()
{
    string input;
    cout << "Enter word/number" << endl;
    cin >> input;
    cout << fun(input) << endl;

    return 0;
}

Distributed algorithm to compute the balance of the parentheses

This is an interview question: "How to build a distributed algorithm to compute the balance of the parentheses ?"
Usually the balance algorithm scans a string from left to right and uses a stack to make sure that the number of open parentheses seen so far is always >= the number of close parentheses, and that at the end the number of open parentheses == the number of close parentheses.
How would you make it distributed?
You can break the string into chunks and process each separately, assuming you can read and send to the other machines in parallel. You need two numbers for each chunk:
The minimum nesting depth achieved relative to the start of the string.
The total gain or loss in nesting depth across the whole string.
With these values, you can compute the values for the concatenation of many chunks as follows:
minNest = 0
totGain = 0
for p in chunkResults
    minNest = min(minNest, totGain + p.minNest)
    totGain += p.totGain
return new ChunkResult(minNest, totGain)
The parentheses are matched if the final values of totGain and minNest are zero.
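The combination rule above can be tried out locally with a short Python sketch. The function names summarize, combine and is_balanced, and the chunk_size parameter, are invented for this example; the map call stands in for the step that separate machines would run in parallel.

```python
from functools import reduce


def summarize(chunk: str):
    """Return (min_nest, tot_gain) for one chunk, scanning left to right."""
    depth = 0
    min_nest = 0
    for ch in chunk:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            min_nest = min(min_nest, depth)
    return min_nest, depth


def combine(left, right):
    """Merge the summaries of two adjacent chunks (left comes first)."""
    min_l, gain_l = left
    min_r, gain_r = right
    return min(min_l, gain_l + min_r), gain_l + gain_r


def is_balanced(text: str, chunk_size: int = 4) -> bool:
    chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    summaries = map(summarize, chunks)  # independent per chunk
    return reduce(combine, summaries, (0, 0)) == (0, 0)


print(is_balanced("(()())(())"))  # True
print(is_balanced(")("))          # False
```

Because combine is associative, the summaries can also be merged pairwise in a tree rather than strictly left to right.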
I would apply the map-reduce approach, in which the map function processes one part of the string and returns either an empty string, if its parentheses are balanced, or a string containing the parentheses that remain unmatched.
Then the reduce function would concatenate the two strings returned by map and process the result again, returning the same kind of result as map. At the end of all computations, you'd obtain either an empty string or a string containing the unbalanced parentheses.
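One way to read this string-based map-reduce idea, sketched in Python. The names cancel and balanced_by_reduction are hypothetical, and the sketch assumes that map keeps every unmatched parenthesis of its chunk, not only the last one.

```python
from functools import reduce


def cancel(s: str) -> str:
    """Drop every matched pair, leaving a residue of the form ')))((('."""
    unmatched_close = 0
    unmatched_open = 0
    for ch in s:
        if ch == "(":
            unmatched_open += 1
        elif ch == ")":
            if unmatched_open:
                unmatched_open -= 1  # closes an earlier '(' in this string
            else:
                unmatched_close += 1
    return ")" * unmatched_close + "(" * unmatched_open


def balanced_by_reduction(chunks) -> bool:
    residues = map(cancel, chunks)                          # map step
    final = reduce(lambda a, b: cancel(a + b), residues, "")  # reduce step
    return final == ""


print(balanced_by_reduction(["((", "))"]))  # True
print(balanced_by_reduction(["))", "(("]))  # False
```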
I'll try to give a more detailed explanation of @jonderry's answer. Code first, in Scala:
def parBalance(chars: Array[Char], chunkSize: Int): Boolean = {
require(chunkSize > 0, "chunkSize must be greater than 0")
def traverse(from: Int, until: Int): (Int, Int) = {
var count = 0
var stack = 0
var nest = 0
for (n <- from until until) {
val cur = chars(n)
if (cur == '(') {
count += 1
stack += 1
}
else if (cur == ')') {
count -= 1
if (stack > 0) stack -= 1
else nest -= 1
}
}
(nest, count)
}
def reduce(from: Int, until: Int): (Int, Int) = {
val m = (until + from) / 2
if (until - from <= chunkSize) {
traverse(from, until)
} else {
parallel(reduce(from, m), reduce(m, until)) match {
case ((minNestL, totGainL), (minNestR, totGainR)) => {
((minNestL min (minNestR + totGainL)), (totGainL + totGainR))
}
}
}
}
reduce(0, chars.length) == (0,0)
}
Given a string, if we remove all balanced parentheses, what is left has the form )))(((. Let n be the number of ) (stored as a non-positive value, for easier calculation) and m the number of (, so that m >= 0 and n <= 0. Here n is minNest and m+n is totGain. For a truly balanced string we need m+n == 0 && n == 0.
In a parallel operation, how do we derive these values for a node from its left and right children? For totGain we just add them up. When calculating n for the node, it is n(left) if the right side does not contribute, or n(right) + left.totGain, whichever is smaller.

Resources