Counting bigrams - string-comparison

I do already have a function that counts the number of times that a letter occurs in a text. Now I am trying to do it for bigrams. Until now, I have
CountDi(txt, L1, L2)={
local(ct,len,v);
ct=0;len=#txt;
v=Vecsmall(txt);
for(k=1,len, if(v[k] == Vecsmall(L1)[1] || v[k] == Vecsmall(L1)[1] + 32));
for(j=1, len, if(v[j] == Vecsmall(L2)[1] || v[j] == Vecsmall(L2)[1] && v[j] ==v[k] + 1 , ct = ct +
1));
return(ct)
};
This is in PARI/GP, and L1 and L2 are to be given in the for "A", for example. What happens is that I'm not being succeeded, because for every letters L1 and L2, the function returns always 0, but I'm not getting why. Thanks in advance.

Related

Edge case for finding strictly increasing squares of a number

I'm trying to solve this codewars kata, Square into Squares.
I'm passing most of the tests, but there are two inputs for which my algorithm exceeds the maximum call stack size.
I feel like I'm taking care of all the edge conditions, and I can't figure out what I'm missing.
function sumSquares (n) {
function decompose (num, whatsLeft, result) {
if (whatsLeft === 0) return result
else {
if (whatsLeft < 0 || num === 0) return null
else {
return decompose(num-1, whatsLeft - num * num, [num].concat(result)) || decompose(num-1, whatsLeft, result)
}
}
}
return decompose(n-1, n*n, [])
}
const resA = sumSquares(50) //[1,3,5,8,49]
console.log(resA)
const resB = sumSquares(7) //[2,3,6]
console.log(resB)
const resC = sumSquares(11) //[ 1, 2, 4, 10 ]
console.log(resC)
const res1 = sumSquares(90273)
console.log(res1)
const res2 = sumSquares(123456)
console.log(res2)
It looks like your code is correct, but has two problems: first, your call stack will eventually reach size "num" (which may be causing your failure for large inputs), and second, it may recompute the same values multiple times.
The first problem is easy to fix: you can skip num values which give a negative whatsLeft result. Like this:
while(num * num > whatsLeft) num = num - 1;
You can insert this after the first if statement. This also enables you to remove the check for negative whatsLeft. As a matter of style, I removed the else{} cases for your if statements after a return -- this reduces the indentation and (I think) makes the code easier to read. But that's just a matter of personal taste.
function sumSquares (n) {
function decompose (num, whatsLeft, result) {
if (whatsLeft === 0) return result;
while (num * num > whatsLeft) num -= 1;
if (num === 0) return null;
return decompose(num-1, whatsLeft - num * num, [num].concat(result)) || decompose(num-1, whatsLeft, result);
}
return decompose(n-1, n*n, []);
}
Your test cases run instantly for me with these changes, so the second problem (which would be solved by memoization) isn't necessary to address. I also tried submitting it on the codewars site, and with a little tweaking (the outer function needs to be called decompose, so both the outer and inner functions need renaming), all 113 test cases pass in 859ms.
#PaulHankin's answer offers good insight
Let's look at sumSquares (n) where n = 100000
decompose (1e5 - 1, 1e5 * 1e5, ...)
In the first frame,
num = 99999
whatsLeft = 10000000000
Which spawns
decompose (99999 - 1, 1e10 - 99999 * 99999, ...)
Where the second frame is
num = 99998
whatsLeft = 199999
And here's the problem: num * num above is significantly larger than whatsLeft and each time we recur to try a new num that first, we only decrease by -1 each frame. Without fixing anything, the next process spawned will be
decompose (99998 - 1, 199999 - 99998 * 99998, ...)
Where the third frame is
num = 99997
whatsLeft = -9999500005
See how whatsLeft is significantly negative? It means we'll have to decrease num by a lot before the next value doesn't cause whatsLeft to drop below zero
// [frame #4]
num = 99996
whatsLeft = -9999000017
// [frame #5]
num = 99995
whatsLeft = -9998800026
...
// [frame #99552]
num = 448
whatsLeft = -705
// [frame #99553]
num = 447
whatsLeft = 190
As we can see above, it would take almost 100000 frames just to guess the second digit of sumSquares (100000). This is exactly what Paul Hankin describes as your first problem.
We can also visualize it a little easer if we only look at decompose with num. Below, if a solution cannot be found, the stack will grow to size num and therefore cannot be used to compute solutions where num exceeds the stack limit
// imagine num = 100000
function decompose (num, ...) {
...
decompose (num - 1 ...) || decompose (num - 1, ...)
}
Paul's solution uses a while loop to decrement num using a loop until num is small enough. Another solution would involve calculating the next guess by finding the square root of the remaining whatsLeft
const sq = num * num
const next = whatsLeft - sq
const guess = Math.floor (Math.sqrt (next))
return decompose (guess, next, ...) || decompose (num - 1, whatsLeft, ...)
Now it can be used to calculate values where num is huge
console.log (sumSquares(123456))
// [ 1, 2, 7, 29, 496, 123455 ]
But notice there's a bug for certain inputs. The squares of the solution still sum to the correct amount, but it's allowing some numbers to be repeated
console.log (sumSquares(50))
// [ 1, 1, 4, 9, 49 ]
To enforce the strictly increasing requirement, we have to ensure that a calculated guess is still lower than the previous. We can do that using Math.min
const guess = Math.floor (Math.sqrt (next))
const guess = Math.min (num - 1, Math.floor (Math.sqrt (next)))
Now the bug is fixed
console.log (sumSquares(50))
// [ 1, 1, 4, 9, 49 ]
// [ 1, 3, 5, 8, 49 ]
Full program demonstration
function sumSquares (n) {
function decompose (num, whatsLeft, result) {
if (whatsLeft === 0)
return result;
if (whatsLeft < 0 || num === 0)
return null;
const sq = num * num
const next = whatsLeft - sq
const guess = Math.min (num - 1, Math.floor (Math.sqrt (next)))
return decompose(guess, next, [num].concat(result)) || decompose(num-1, whatsLeft, result);
}
return decompose(n-1, n*n, []);
}
console.log (sumSquares(50))
// [ 1, 3, 5, 8, 49 ]
console.log (sumSquares(123456))
// [ 1, 2, 7, 29, 496, 123455 ]

Remove redundant parentheses from an arithmetic expression

This is an interview question, for which I did not find any satisfactory answers on stackoverflow or outside. Problem statement:
Given an arithmetic expression, remove redundant parentheses. E.g.
((a*b)+c) should become a*b+c
I can think of an obvious way of converting the infix expression to post fix and converting it back to infix - but is there a better way to do this?
A pair of parentheses is necessary if and only if they enclose an unparenthesized expression of the form X % X % ... % X where X are either parenthesized expressions or atoms, and % are binary operators, and if at least one of the operators % has lower precedence than an operator attached directly to the parenthesized expression on either side of it; or if it is the whole expression. So e.g. in
q * (a * b * c * d) + c
the surrounding operators are {+, *} and the lowest precedence operator inside the parentheses is *, so the parentheses are unnecessary. On the other hand, in
q * (a * b + c * d) + c
there is a lower precedence operator + inside the parentheses than the surrounding operator *, so they are necessary. However, in
z * q + (a * b + c * d) + c
the parentheses are not necessary because the outer * is not attached to the parenthesized expression.
Why this is true is that if all the operators inside an expression (X % X % ... % X) have higher priority than a surrounding operator, then the inner operators are anyway calculated out first even if the parentheses are removed.
So, you can check any pair of matching parentheses directly for redundancy by this algorithm:
Let L be operator immediately left of the left parenthesis, or nil
Let R be operator immediately right of the right parenthesis, or nil
If L is nil and R is nil:
Redundant
Else:
Scan the unparenthesized operators between the parentheses
Let X be the lowest priority operator
If X has lower priority than L or R:
Not redundant
Else:
Redundant
You can iterate this, removing redundant pairs until all remaining pairs are non-redundant.
Example:
((a * b) + c * (e + f))
(Processing pairs from left to right):
((a * b) + c * (e + f)) L = nil R = nil --> Redundant
^ ^
(a * b) + c * (e + f) L = nil R = nil --> Redundant
^ ^ L = nil R = + X = * --> Redundant
a * b + c * (e + f) L = * R = nil X = + --> Not redundant
^ ^
Final result:
a * b + c * (e + f)
I just figured out an answer:
the premises are:
1. the expression has been tokenized
2. no syntax error
3. there are only binary operators
input:
list of the tokens, for example:
(, (, a, *, b, ), +, c, )
output:
set of the redundant parentheses pairs (the orders of the pairs are not important),
for example,
0, 8
1, 5
please be aware of that : the set is not unique, for instance, ((a+b))*c, we can remove outer parentheses or inner one, but the final expression is unique
the data structure:
a stack, each item records information in each parenthese pair
the struct is:
left_pa: records the position of the left parenthese
min_op: records the operator in the parentheses with minimum priority
left_op: records current operator
the algorithm
1.push one empty item in the stack
2.scan the token list
2.1 if the token is operand, ignore
2.2 if the token is operator, records the operator in the left_op,
if min_op is nil, set the min_op = this operator, if the min_op
is not nil, compare the min_op with this operator, set min_op as
one of the two operators with less priority
2.3 if the token is left parenthese, push one item in the stack,
with left_pa = position of the parenthese
2.4 if the token is right parenthese,
2.4.1 we have the pair of the parentheses(left_pa and the
right parenthese)
2.4.2 pop the item
2.4.3 pre-read next token, if it is an operator, set it
as right operator
2.4.4 compare min_op of the item with left_op and right operator
(if any of them exists), we can easily get to know if the pair
of the parentheses is redundant, and output it(if the min_op
< any of left_op and right operator, the parentheses are necessary,
if min_op = left_op, the parentheses are necessary, otherwise
redundant)
2.4.5 if there is no left_op and no right operator(which also means
min_op = nil) and the stack is not empty, set the min_op of top
item as the min_op of the popped-up item
examples
example one
((a*b)+c)
after scanning to b, we have stack:
index left_pa min_op left_op
0
1 0
2 1 * * <-stack top
now we meet the first ')'(at pos 5), we pop the item
left_pa = 1
min_op = *
left_op = *
and pre-read operator '+', since min_op priority '*' > '+', so the pair(1,5) is redundant, so output it.
then scan till we meet last ')', at the moment, we have stack
index left_pa min_op left_op
0
1 0 + +
we pop this item(since we meet ')' at pos 8), and pre-read next operator, since there is no operator and at index 0, there is no left_op, so output the pair(0, 8)
example two
a*(b+c)
when we meet the ')', the stack is like:
index left_pa min_op left_op
0 * *
1 2 + +
now, we pop the item at index = 1, compare the min_op '+' with the left_op '*' at index 0, we can find out the '(',')' are necessary
This solutions works if the expression is a valid. We need mapping of the operators to priority values.
a. Traverse from two ends of the array to figure out matching parenthesis from both ends.
Let the indexes be i and j respectively.
b. Now traverse from i to j and find out the lowest precedence operator which is not contained inside any parentheses.
c. Compare the priority of this operator with the operators to left of open parenthesis and right of closing parenthesis. If no such operator exists, treat its priority as -1. If the priority of the operator is higher than these two, remove the parenthesis at i and j.
d. Continue the steps a to c until i<=j.
Push one empty item in the stack
Scan the token list
2.1 if the token is operand, ignore.
2.2 if the token is operator, records the operator in the left_op,
if min_op is nil, set the min_op = this operator, if the min_op
is not nil, compare the min_op with this operator, set min_op as
one of the two operators with less priority.
2.3 if the token is left parenthese, push one item in the stack,
with left_pa = position of the parenthesis.
2.4 if the token is right parenthesis:
2.4.1 we have the pair of the parentheses(left_pa and the
right parenthesis)
2.4.2 pop the item
2.4.3 pre-read next token, if it is an operator, set it
as right operator
2.4.4 compare min_op of the item with left_op and right operator
(if any of them exists), we can easily get to know if the pair
of the parentheses is redundant, and output it(if the min_op
< any of left_op and right operator, the parentheses are necessary,
if min_op = left_op, the parentheses are necessary, otherwise
redundant)
2.4.5 if there is no left_op and no right operator(which also means
min_op = nil) and the stack is not empty, set the min_op of top
item as the min_op of the popped-up item
examples
The code below implements a straightforward solution. It is limited to +, -, *, and /, but it can be extended to handle other operators if needed.
#include <iostream>
#include <set>
#include <stack>
int loc;
std::string parser(std::string input, int _loc) {
std::set<char> support = {'+', '-', '*', '/'};
std::string expi;
std::set<char> op;
loc = _loc;
while (true) {
if (input[loc] == '(') {
expi += parser(input, loc + 1);
} else if (input[loc] == ')') {
if ((input[loc + 1] != '*') && (input[loc + 1] != '/')) {
return expi;
} else {
if ((op.find('+') == op.end()) && (op.find('-') == op.end())) {
return expi;
} else {
return '(' + expi + ')';
}
}
} else {
char temp = input[loc];
expi = expi + temp;
if (support.find(temp) != support.end()) {
op.insert(temp);
}
}
loc++;
if (loc >= input.size()) {
break;
}
}
return expi;
}
int main() {
std::string input("(((a)+((b*c)))+(d*(f*g)))");
std::cout << parser(input, 0);
return 0;
}
I coded it previously in https://calculation-test.211368e.repl.co/trim.html. This doesn't have some errors in other answers.
(6 / (-2454) ** (((234)))) + (-5435) --> 6 / (-2454) ** 234 + (-5435)
const format = expression => {
var change = [], result = expression.replace(/ /g, "").replace(/\*\*/g, "^"), _count;
function replace(index, string){result = result.slice(0, index) + string + result.slice(index + 1)}
function add(index, string){result = result.slice(0, index) + string + result.slice(index)}
for (var count = 0; count < result.length; count++){
if (result[count] == "-"){
if ("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890)".includes(result[count - 1])){
change.push(count);
}else if (result[count - 1] != "("){
add(count, "(");
count++;
_count = count + 1;
while ("1234567890.".includes(result[_count])) _count++;
if (_count < result.length - 1){
add(_count, ")");
}else{
add(_count + 2, ")");
}
}
}
}
change = change.sort(function(a, b){return a - b});
const len = change.length;
for (var count = 0; count < len; count++){replace(change[0] + count * 2, " - "); change.shift()}
return result.replace(/\*/g, " * ").replace(/\^/g, " ** ").replace(/\//g, " / ").replace(/\+/g, " + ");
}
const trim = expression => {
var result = format(expression).replace(/ /g, "").replace(/\*\*/g, "^"), deleting = [];
const brackets = bracket_pairs(result);
function bracket_pairs(){
function findcbracket(str, pos){
const rExp = /\(|\)/g;
rExp.lastIndex = pos + 1;
var depth = 1;
while ((pos = rExp.exec(str))) if (!(depth += str[pos.index] == "(" ? 1 : -1 )) {return pos.index}
}
function occurences(searchStr, str){
var startIndex = 0, index, indices = [];
while ((index = str.indexOf(searchStr, startIndex)) > -1){
indices.push(index);
startIndex = index + 1;
}
return indices;
}
const obrackets = occurences("(", result);
var cbrackets = [];
for (var count = 0; count < obrackets.length; count++) cbrackets.push(findcbracket(result, obrackets[count]));
return obrackets.map((e, i) => [e, cbrackets[i]]);
}
function remove(deleting){
function _remove(index){result = result.slice(0, index) + result.slice(index + 1)}
const len = deleting.length;
var deleting = deleting.sort(function(a, b){return a - b});
for (var count = 0; count < len; count++){
_remove(deleting[0] - count);
deleting.shift()
}
}
function precedence(operator, position){
if (!"^/*-+".includes(operator)) return "^/*-+";
if (position == "l" || position == "w") return {"^": "^", "/": "^", "*": "^/*", "-": "^/*", "+": "^/*-+"}[operator];
if (position == "r") return {"^": "^", "/": "^/*", "*": "^/*", "-": "^/*-+", "+": "^/*-+"}[operator];
}
function strip_bracket(string){
var result = "", level = 0;
for (var count = 0; count < string.length; count++){
if (string.charAt(count) == "(") level++;
if (level == 0) result += string.charAt(count);
if (string.charAt(count) == ")") level--;
}
return result.replace(/\s{2,}/g, " ");
}
for (var count = 0; count < brackets.length; count++){
const pair = brackets[count];
if (result[pair[0] - 1] == "(" && result[pair[1] + 1] == ")"){
deleting.push(...pair);
}else{
const left = precedence(result[pair[0] - 1], "l"), right = precedence(result[pair[1] + 1], "r");
var contents = strip_bracket(result.slice(pair[0] + 1, pair[1])), within = "+";
for (var _count = 0; _count < contents.length; _count++) if (precedence(contents[_count], "w").length < precedence(within, "w").length) within = contents[_count];
if (/^[0-9]+$/g.test(contents) || contents == ""){
deleting.push(...pair);
continue;
}
if (left.includes(within) && right.includes(within)){
if (!isNaN(result.slice(pair[0] + 1, pair[1]))){
if (Number(result.slice(pair[0] + 1, pair[1])) >= 0 && !"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890".includes(result[pair[0] - 1])) deleting.push(...pair);
}else if (!"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890".includes(result[pair[0] - 1])) deleting.push(...pair);
}
}
}
remove(deleting);
result = format(result);
return result;
}
<input id="input">
<button onclick="document.getElementById('result').innerHTML = trim(document.getElementById('input').value)">Remove and format</button>
<div id="result"></div>
I think that you are looking for kind of algorithm as seen in the following photo.
This algorithm is "almost" ready, since a lot of bugs arise once the more complex it becomes, the more complicated it gets. The way I work on this thing, is 'build-and-write-code-on-the-fly', which means that for up to 4 parentheses, things are easy. But after the expression goes more complex, there are things that I cannot predict while writing down thoughts on paper. And there comes the compiler to tell me what to correct. It would not be a lie if I state that it is not me to have written the algorithm, but the (C#) compiler instead! So far, it took me 1400 lines. It is not that the commands were difficult to write. It was their arrangement that was a real puzzle. This program you are looking for, is characterized by a really high grade of complexity. Well, if you need any primary ideas, please let me know and I will reply. Thanx!
Algorithm

Find all possible combinations of a String representation of a number

Given a mapping:
A: 1
B: 2
C: 3
...
...
...
Z: 26
Find all possible ways a number can be represented. E.g. For an input: "121", we can represent it as:
ABA [using: 1 2 1]
LA [using: 12 1]
AU [using: 1 21]
I tried thinking about using some sort of a dynamic programming approach, but I am not sure how to proceed. I was asked this question in a technical interview.
Here is a solution I could think of, please let me know if this looks good:
A[i]: Total number of ways to represent the sub-array number[0..i-1] using the integer to alphabet mapping.
Solution [am I missing something?]:
A[0] = 1 // there is only 1 way to represent the subarray consisting of only 1 number
for(i = 1:A.size):
A[i] = A[i-1]
if(input[i-1]*10 + input[i] < 26):
A[i] += 1
end
end
print A[A.size-1]
To just get the count, the dynamic programming approach is pretty straight-forward:
A[0] = 1
for i = 1:n
A[i] = 0
if input[i-1] > 0 // avoid 0
A[i] += A[i-1];
if i > 1 && // avoid index-out-of-bounds on i = 1
10 <= (10*input[i-2] + input[i-1]) <= 26 // check that number is 10-26
A[i] += A[i-2];
If you instead want to list all representations, dynamic programming isn't particularly well-suited for this, you're better off with a simple recursive algorithm.
First off, we need to find an intuitive way to enumerate all the possibilities. My simple construction, is given below.
let us assume a simple way to represent your integer in string format.
a1 a2 a3 a4 ....an, for instance in 121 a1 -> 1 a2 -> 2, a3 -> 1
Now,
We need to find out number of possibilities of placing a + sign in between two characters. + is to mean characters concatenation here.
a1 - a2 - a3 - .... - an, - shows the places where '+' can be placed. So, number of positions is n - 1, where n is the string length.
Assume a position may or may not have a + symbol shall be represented as a bit.
So, this boils down to how many different bit strings are possible with the length of n-1, which is clearly 2^(n-1). Now in order to enumerate the possibilities go through every bit string and place right + signs in respective positions to get every representations,
For your example, 121
Four bit strings are possible 00 01 10 11
1 2 1
1 2 + 1
1 + 2 1
1 + 2 + 1
And if you see a character followed by a +, just add the next char with the current one and do it sequentially to get the representation,
x + y z a + b + c d
would be (x+y) z (a+b+c) d
Hope it helps.
And you will have to take care of edge cases where the size of some integer > 26, of course.
I think, recursive traverse through all possible combinations would do just fine:
mapping = {"1":"A", "2":"B", "3":"C", "4":"D", "5":"E", "6":"F", "7":"G",
"8":"H", "9":"I", "10":"J",
"11":"K", "12":"L", "13":"M", "14":"N", "15":"O", "16":"P",
"17":"Q", "18":"R", "19":"S", "20":"T", "21":"U", "22":"V", "23":"W",
"24":"A", "25":"Y", "26":"Z"}
def represent(A, B):
if A == B == '':
return [""]
ret = []
if A in mapping:
ret += [mapping[A] + r for r in represent(B, '')]
if len(A) > 1:
ret += represent(A[:-1], A[-1]+B)
return ret
print represent("121", "")
Assuming you only need to count the number of combinations.
Assuming 0 followed by an integer in [1,9] is not a valid concatenation, then a brute-force strategy would be:
Count(s,n)
x=0
if (s[n-1] is valid)
x=Count(s,n-1)
y=0
if (s[n-2] concat s[n-1] is valid)
y=Count(s,n-2)
return x+y
A better strategy would be to use divide-and-conquer:
Count(s,start,n)
if (len is even)
{
//split s into equal left and right part, total count is left count multiply right count
x=Count(s,start,n/2) + Count(s,start+n/2,n/2);
y=0;
if (s[start+len/2-1] concat s[start+len/2] is valid)
{
//if middle two charaters concatenation is valid
//count left of the middle two characters
//count right of the middle two characters
//multiply the two counts and add to existing count
y=Count(s,start,len/2-1)*Count(s,start+len/2+1,len/2-1);
}
return x+y;
}
else
{
//there are three cases here:
//case 1: if middle character is valid,
//then count everything to the left of the middle character,
//count everything to the right of the middle character,
//multiply the two, assign to x
x=...
//case 2: if middle character concatenates the one to the left is valid,
//then count everything to the left of these two characters
//count everything to the right of these two characters
//multiply the two, assign to y
y=...
//case 3: if middle character concatenates the one to the right is valid,
//then count everything to the left of these two characters
//count everything to the right of these two characters
//multiply the two, assign to z
z=...
return x+y+z;
}
The brute-force solution has time complexity of T(n)=T(n-1)+T(n-2)+O(1) which is exponential.
The divide-and-conquer solution has time complexity of T(n)=3T(n/2)+O(1) which is O(n**lg3).
Hope this is correct.
Something like this?
Haskell code:
import qualified Data.Map as M
import Data.Maybe (fromJust)
combs str = f str [] where
charMap = M.fromList $ zip (map show [1..]) ['A'..'Z']
f [] result = [reverse result]
f (x:xs) result
| null xs =
case M.lookup [x] charMap of
Nothing -> ["The character " ++ [x] ++ " is not in the map."]
Just a -> [reverse $ a:result]
| otherwise =
case M.lookup [x,head xs] charMap of
Just a -> f (tail xs) (a:result)
++ (f xs ((fromJust $ M.lookup [x] charMap):result))
Nothing -> case M.lookup [x] charMap of
Nothing -> ["The character " ++ [x]
++ " is not in the map."]
Just a -> f xs (a:result)
Output:
*Main> combs "121"
["LA","AU","ABA"]
Here is the solution based on my discussion here:
private static int decoder2(int[] input) {
int[] A = new int[input.length + 1];
A[0] = 1;
for(int i=1; i<input.length+1; i++) {
A[i] = 0;
if(input[i-1] > 0) {
A[i] += A[i-1];
}
if (i > 1 && (10*input[i-2] + input[i-1]) <= 26) {
A[i] += A[i-2];
}
System.out.println(A[i]);
}
return A[input.length];
}
Just us breadth-first search.
for instance 121
Start from the first integer,
consider 1 integer character first, map 1 to a, leave 21
then 2 integer character map 12 to L leave 1.
This problem can be done in o(fib(n+2)) time with a standard DP algorithm.
We have exactly n sub problems and button up we can solve each problem with size i in o(fib(i)) time.
Summing the series gives fib (n+2).
If you consider the question carefully you see that it is a Fibonacci series.
I took a standard Fibonacci code and just changed it to fit our conditions.
The space is obviously bound to the size of all solutions o(fib(n)).
Consider this pseudo code:
Map<Integer, String> mapping = new HashMap<Integer, String>();
List<String > iterative_fib_sequence(string input) {
int length = input.length;
if (length <= 1)
{
if (length==0)
{
return "";
}
else//input is a-j
{
return mapping.get(input);
}
}
List<String> b = new List<String>();
List<String> a = new List<String>(mapping.get(input.substring(0,0));
List<String> c = new List<String>();
for (int i = 1; i < length; ++i)
{
int dig2Prefix = input.substring(i-1, i); //Get a letter with 2 digit (k-z)
if (mapping.contains(dig2Prefix))
{
String word2Prefix = mapping.get(dig2Prefix);
foreach (String s in b)
{
c.Add(s.append(word2Prefix));
}
}
int dig1Prefix = input.substring(i, i); //Get a letter with 1 digit (a-j)
String word1Prefix = mapping.get(dig1Prefix);
foreach (String s in a)
{
c.Add(s.append(word1Prefix));
}
b = a;
a = c;
c = new List<String>();
}
return a;
}
old question but adding an answer so that one can find help
It took me some time to understand the solution to this problem – I refer accepted answer and #Karthikeyan's answer and the solution from geeksforgeeks and written my own code as below:
To understand my code first understand below examples:
we know, decodings([1, 2]) are "AB" or "L" and so decoding_counts([1, 2]) == 2
And, decodings([1, 2, 1]) are "ABA", "AU", "LA" and so decoding_counts([1, 2, 1]) == 3
using the above two examples let's evaluate decodings([1, 2, 1, 4]):
case:- "taking next digit as single digit"
taking 4 as single digit to decode to letter 'D', we get decodings([1, 2, 1, 4]) == decoding_counts([1, 2, 1]) because [1, 2, 1, 4] will be decode as "ABAD", "AUD", "LAD"
case:- "combining next digit with the previous digit"
combining 4 with previous 1 as 14 as a single to decode to letter N, we get decodings([1, 2, 1, 4]) == decoding_counts([1, 2]) because [1, 2, 1, 4] will be decode as "ABN" or "LN"
Below is my Python code, read comments
def decoding_counts(digits):
# defininig count as, counts[i] -> decoding_counts(digits[: i+1])
counts = [0] * len(digits)
counts[0] = 1
for i in xrange(1, len(digits)):
# case:- "taking next digit as single digit"
if digits[i] != 0: # `0` do not have mapping to any letter
counts[i] = counts[i -1]
# case:- "combining next digit with the previous digit"
combine = 10 * digits[i - 1] + digits[i]
if 10 <= combine <= 26: # two digits mappings
counts[i] += (1 if i < 2 else counts[i-2])
return counts[-1]
for digits in "13", "121", "1214", "1234121":
print digits, "-->", decoding_counts(map(int, digits))
outputs:
13 --> 2
121 --> 3
1214 --> 5
1234121 --> 9
note: I assumed that input digits do not start with 0 and only consists of 0-9 and have a sufficent length
For Swift, this is what I came up with. Basically, I converted the string into an array and goes through it, adding a space into different positions of this array, then appending them to another array for the second part, which should be easy after this is done.
//test case
let input = [1,2,2,1]
func combination(_ input: String) {
var arr = Array(input)
var possible = [String]()
//... means inclusive range
for i in 2...arr.count {
var temp = arr
//basically goes through it backwards so
// adding the space doesn't mess up the index
for j in (1..<i).reversed() {
temp.insert(" ", at: j)
possible.append(String(temp))
}
}
print(possible)
}
combination(input)
//prints:
//["1 221", "12 21", "1 2 21", "122 1", "12 2 1", "1 2 2 1"]
def stringCombinations(digits, i=0, s=''):
if i == len(digits):
print(s)
return
alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
total = 0
for j in range(i, min(i + 1, len(digits) - 1) + 1):
total = (total * 10) + digits[j]
if 0 < total <= 26:
stringCombinations(digits, j + 1, s + alphabet[total - 1])
if __name__ == '__main__':
digits = list()
n = input()
n.split()
d = list(n)
for i in d:
i = int(i)
digits.append(i)
print(digits)
stringCombinations(digits)

how to match dna sequence pattern

I am getting a trouble finding an approach to solve this problem.
Input-output sequences are as follows :
**input1 :** aaagctgctagag
**output1 :** a3gct2ag2
**input2 :** aaaaaaagctaagctaag
**output2 :** a6agcta2ag
Input nsequence can be of 10^6 characters and largest continuous patterns will be considered.
For example for input2 "agctaagcta" output will not be "agcta2gcta" but it will be "agcta2".
Any help appreciated.
Explanation of the algorithm:
Having a sequence S with symbols s(1), s(2),…, s(N).
Let B(i) be the best compressed sequence with elements s(1), s(2),…,s(i).
So, for example, B(3) will be the best compressed sequence for s(1), s(2), s(3).
What we want to know is B(N).
To find it, we will proceed by induction. We want to calculate B(i+1), knowing B(i), B(i-1), B(i-2), …, B(1), B(0), where B(0) is empty sequence, and and B(1) = s(1). At the same time, this constitutes a proof that the solution is optimal. ;)
To calculate B(i+1), we will pick the best sequence among the candidates:
Candidate sequences where the last block has one element:
B(i )s(i+1)1
B(i-1)s(i+1)2 ; only if s(i) = s(i+1)
B(i-2)s(i+1)3 ; only if s(i-1) = s(i) and s(i) = s(i+1)
…
B(1)s(i+1)[i-1] ; only if s(2)=s(3) and s(3)=s(4) and … and s(i) = s(i+1)
B(0)s(i+1)i = s(i+1)i ; only if s(1)=s(2) and s(2)=s(3) and … and s(i) = s(i+1)
Candidate sequences where the last block has 2 elements:
B(i-1)s(i)s(i+1)1
B(i-3)s(i)s(i+1)2 ; only if s(i-2)s(i-1)=s(i)s(i+1)
B(i-5)s(i)s(i+1)3 ; only if s(i-4)s(i-3)=s(i-2)s(i-1) and s(i-2)s(i-1)=s(i)s(i+1)
…
Candidate sequences where the last block has 3 elements:
…
Candidate sequences where the last block has 4 elements:
…
…
Candidate sequences where last block has n+1 elements:
s(1)s(2)s(3)………s(i+1)
For each possibility, the algorithm stops when the sequence block is no longer repeated. And that’s it.
The algorithm will be some thing like this in psude-c code:
B(0) = “”
for (i=1; i<=N; i++) {
// Calculate all the candidates for B(i)
BestCandidate=null
for (j=1; j<=i; j++) {
Calculate all the candidates of length (i)
r=1;
do {
Candidadte = B([i-j]*r-1) s(i-j+1)…s(i-1)s(i) r
If ( (BestCandidate==null)
|| (Candidate is shorter that BestCandidate))
{
BestCandidate=Candidate.
}
r++;
} while ( ([i-j]*r <= i)
&&(s(i-j*r+1) s(i-j*r+2)…s(i-j*r+j) == s(i-j+1) s(i-j+2)…s(i-j+j))
}
B(i)=BestCandidate
}
Hope that this can help a little more.
The full C program performing the required task is given below. It runs in O(n^2). The central part is only 30 lines of code.
EDIT I have restructured a little bit the code, changed the names of the variables and added some comment in order to be more readable.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <limits.h>
// This struct represents a compressed segment like atg4, g3, agc1
struct Segment {
char *elements;
int nElements;
int count;
};
// As an example, for the segment agagt3 elements would be:
// {
// elements: "agagt",
// nElements: 5,
// count: 3
// }
struct Sequence {
struct Segment lastSegment;
struct Sequence *prev; // Points to a sequence without the last segment or NULL if it is the first segment
int totalLen; // Total length of the compressed sequence.
};
// as an example, for the sequence agt32ta5, the representation will be:
// {
// lastSegment:{"ta" , 2 , 5},
// prev: #A,
// totalLen: 8
// }
// and A will be
// {
// lastSegment{ "agt", 3, 32},
// prev: NULL,
// totalLen: 5
// }
// This function converts a sequence to a string.
// You have to free the string after using it.
// The strategy is to construct the string from right to left.
char *sequence2string(struct Sequence *S) {
char *Res=malloc(S->totalLen + 1);
char *digits="0123456789";
int p= S->totalLen;
Res[p]=0;
while (S!=NULL) {
// first we insert the count of the last element.
// We do digit by digit starting with the units.
int C = S->lastSegment.count;
while (C) {
p--;
Res[p] = digits[ C % 10 ];
C /= 10;
}
p -= S->lastSegment.nElements;
strncpy(Res + p , S->lastSegment.elements, S->lastSegment.nElements);
S = S ->prev;
}
return Res;
}
// Compresses a dna sequence.
// Returns a string with the in sequence compressed.
// The returned string must be freed after using it.
char *dnaCompress(char *in) {
int i,j;
int N = strlen(in);; // Number of elements of a in sequence.
// B is an array of N+1 sequences where B(i) is the best compressed sequence sequence of the first i characters.
// What we want to return is B[N];
struct Sequence *B;
B = malloc((N+1) * sizeof (struct Sequence));
// We first do an initialization for i=0
B[0].lastSegment.elements="";
B[0].lastSegment.nElements=0;
B[0].lastSegment.count=0;
B[0].prev = NULL;
B[0].totalLen=0;
// and set totalLen of all the sequences to a very HIGH VALUE in this case N*2 will be enougth, We will try different sequences and keep the minimum one.
for (i=1; i<=N; i++) B[i].totalLen = INT_MAX; // A very high value
for (i=1; i<=N; i++) {
// at this point we want to calculate B[i] and we know B[i-1], B[i-2], .... ,B[0]
for (j=1; j<=i; j++) {
// Here we will check all the candidates where the last segment has j elements
int r=1; // number of times the last segment is repeated
int rNDigits=1; // Number of digits of r
int rNDigitsBound=10; // We will increment r, so this value is when r will have an extra digit.
// when r = 0,1,...,9 => rNDigitsBound = 10
// when r = 10,11,...,99 => rNDigitsBound = 100
// when r = 100,101,.,999 => rNDigitsBound = 1000 and so on.
do {
// Here we analitze a candidate B(i).
// where the las segment has j elements repeated r times.
int CandidateLen = B[i-j*r].totalLen + j + rNDigits;
if (CandidateLen < B[i].totalLen) {
B[i].lastSegment.elements = in + i - j*r;
B[i].lastSegment.nElements = j;
B[i].lastSegment.count = r;
B[i].prev = &(B[i-j*r]);
B[i].totalLen = CandidateLen;
}
r++;
if (r == rNDigitsBound ) {
rNDigits++;
rNDigitsBound *= 10;
}
} while ( (i - j*r >= 0)
&& (strncmp(in + i -j, in + i - j*r, j)==0));
}
}
char *Res=sequence2string(&(B[N]));
free(B);
return Res;
}
int main(int argc, char** argv) {
char *compressedDNA=dnaCompress(argv[1]);
puts(compressedDNA);
free(compressedDNA);
return 0;
}
Forget Ukonnen. Dynamic programming it is. With 3-dimensional table:
sequence position
subsequence size
number of segments
TERMINOLOGY: For example, having a = "aaagctgctagag", sequence position coordinate would run from 1 to 13. At sequence position 3 (letter 'g'), having subsequence size 4, the subsequence would be "gctg". Understood? And as for the number of segments, then expressing a as "aaagctgctagag1" consists of 1 segment (the sequence itself). Expressing it as "a3gct2ag2" consists of 3 segments. "aaagctgct1ag2" consists of 2 segments. "a2a1ctg2ag2" would consist of 4 segments. Understood? Now, with this, you start filling a 3-dimensional array 13 x 13 x 13, so your time and memory complexity seems to be around n ** 3 for this. Are you sure you can handle it for million-bp sequences? I think that greedy approach would be better, because large DNA sequences are unlikely to repeat exactly. And, I would suggest that you widen your assignment to approximate matches, and you can publish it straight in a journal.
Anyway, you will start filling the table of compressing a subsequence starting at some position (dimension 1) with length equal to dimension 2 coordinate, having at most dimension 3 segments. So you first fill the first row, representing compressions of subsequences of length 1 consisting of at most 1 segment:
a a a g c t g c t a g a g
1(a1) 1(a1) 1(a1) 1(g1) 1(c1) 1(t1) 1(g1) 1(c1) 1(t1) 1(a1) 1(g1) 1(a1) 1(g1)
The number is the character cost (always 1 for these trivial 1-char sequences; number 1 does not count into the character cost), and in the parenthesis, you have the compression (also trivial for this simple case). The second row will be still simple:
2(a2) 2(a2) 2(ag1) 2(gc1) 2(ct1) 2(tg1) 2(gc1) 2(ct1) 2(ta1) 2(ag1) 2(ga1) 2(ag1)
There is only 1 way to decompose a 2-character sequence into 2 subsequences -- 1 character + 1 character. If they are identical, the result is like a + a = a2. If they are different, such as a + g, then, because only 1-segment sequences are admissible, the result cannot be a1g1, but must be ag1. The third row will be finally more interesting:
2(a3) 2(aag1) 3(agc1) 3(gct1) 3(ctg1) 3(tgc1) 3(gct1) 3(cta1) 3(tag1) 3(aga1) 3(gag1)
Here, you can always choose between 2 ways of composing the compressed string. For example, aag can be composed either as aa + g or a + ag. But again, we cannot have 2 segments, as in aa1g1 or a1ag1, so we must be satisfied with aag1, unless both components consist of the same character, as in aa + a => a3, with character cost 2. We can continue onto 4 th line:
4(aaag1) 4(aagc1) 4(agct1) 4(gctg1) 4(ctgc1) 4(tgct1) 4(gcta1) 4(ctag1) 4(taga1) 3(ag2)
Here, on the first position, we cannot use a3g1, because only 1 segment is allowed at this layer. But at the last position, compression to character cost 3 is agchieved by ag1 + ag1 = ag2. This way, one can fill the whole first-level table all the way up to the single subsequence of 13 characters, and each subsequence will have its optimal character cost and its compression under the first-level constraint of at most 1 segment associated with it.
Then you go to the 2nd level, where 2 segments are allowed... And again, from the bottom up, you identify the optimum cost and compression of each table coordinate under the given level's segment count constraint, by comparing all the possible ways to compose the subsequence using already computed positions, until you fill the table completely and thus compute the global optimum. There are some details to solve, but sorry, I'm not gonna code this for you.
After trying my own way for a while, my kudos to jbaylina for his beautiful algorithm and C implementation. Here's my attempted version of jbaylina's algorithm in Haskell, and below it further development of my attempt at a linear-time algorithm that attempts to compress segments that include repeated patterns in a one-by-one fashion:
import Data.Map (fromList, insert, size, (!))
compress s = (foldl f (fromList [(0,([],0)),(1,([s!!0],1))]) [1..n - 1]) ! n
where
n = length s
f b i = insert (size b) bestCandidate b where
add (sequence, sLength) (sequence', sLength') =
(sequence ++ sequence', sLength + sLength')
j' = [1..min 100 i]
bestCandidate = foldr combCandidates (b!i `add` ([s!!i,'1'],2)) j'
combCandidates j candidate' =
let nextCandidate' = comb 2 (b!(i - j + 1)
`add` ((take j . drop (i - j + 1) $ s) ++ "1", j + 1))
in if snd nextCandidate' <= snd candidate'
then nextCandidate'
else candidate' where
comb r candidate
| r > uBound = candidate
| not (strcmp r True) = candidate
| snd nextCandidate <= snd candidate = comb (r + 1) nextCandidate
| otherwise = comb (r + 1) candidate
where
uBound = div (i + 1) j
prev = b!(i - r * j + 1)
nextCandidate = prev `add`
((take j . drop (i - j + 1) $ s) ++ show r, j + length (show r))
strcmp 1 _ = True
strcmp num bool
| (take j . drop (i - num * j + 1) $ s)
== (take j . drop (i - (num - 1) * j + 1) $ s) =
strcmp (num - 1) True
| otherwise = False
Output:
*Main> compress "aaagctgctagag"
("a3gct2ag2",9)
*Main> compress "aaabbbaaabbbaaabbbaaabbb"
("aaabbb4",7)
Linear-time attempt:
import Data.List (sortBy)
group' xxs sAccum (chr, count)
| null xxs = if null chr
then singles
else if count <= 2
then reverse sAccum ++ multiples ++ "1"
else singles ++ if null chr then [] else chr ++ show count
| [x] == chr = group' xs sAccum (chr,count + 1)
| otherwise = if null chr
then group' xs (sAccum) ([x],1)
else if count <= 2
then group' xs (multiples ++ sAccum) ([x],1)
else singles
++ chr ++ show count ++ group' xs [] ([x],1)
where x:xs = xxs
singles = reverse sAccum ++ (if null sAccum then [] else "1")
multiples = concat (replicate count chr)
sequences ws strIndex maxSeqLen = repeated' where
half = if null . drop (2 * maxSeqLen - 1) $ ws
then div (length ws) 2 else maxSeqLen
repeated' = let (sequence,(sequenceStart, sequenceEnd'),notSinglesFlag) = repeated
in (sequence,(sequenceStart, sequenceEnd'))
repeated = foldr divide ([],(strIndex,strIndex),False) [1..half]
equalChunksOf t a = takeWhile(==t) . map (take a) . iterate (drop a)
divide chunkSize b#(sequence,(sequenceStart, sequenceEnd'),notSinglesFlag) =
let t = take (2*chunkSize) ws
t' = take chunkSize t
in if t' == drop chunkSize t
then let ts = equalChunksOf t' chunkSize ws
lenTs = length ts
sequenceEnd = strIndex + lenTs * chunkSize
newEnd = if sequenceEnd > sequenceEnd'
then sequenceEnd else sequenceEnd'
in if chunkSize > 1
then if length (group' (concat (replicate lenTs t')) [] ([],0)) > length (t' ++ show lenTs)
then (((strIndex,sequenceEnd,chunkSize,lenTs),t'):sequence, (sequenceStart,newEnd),True)
else b
else if notSinglesFlag
then b
else (((strIndex,sequenceEnd,chunkSize,lenTs),t'):sequence, (sequenceStart,newEnd),False)
else b
addOne a b
| null (fst b) = a
| null (fst a) = b
| otherwise =
let (((start,end,patLen,lenS),sequence):rest,(sStart,sEnd)) = a
(((start',end',patLen',lenS'),sequence'):rest',(sStart',sEnd')) = b
in if sStart' < sEnd && sEnd < sEnd'
then let c = ((start,end,patLen,lenS),sequence):rest
d = ((start',end',patLen',lenS'),sequence'):rest'
in (c ++ d, (sStart, sEnd'))
else a
segment xs baseIndex maxSeqLen = segment' xs baseIndex baseIndex where
segment' zzs#(z:zs) strIndex farthest
| null zs = initial
| strIndex >= farthest && strIndex > 0 = ([],(0,0))
| otherwise = addOne initial next
where
next#(s',(start',end')) = segment' zs (strIndex + 1) farthest'
farthest' | null s = farthest
| otherwise = if start /= end && end > farthest then end else farthest
initial#(s,(start,end)) = sequences zzs strIndex maxSeqLen
areExclusive ((a,b,_,_),_) ((a',b',_,_),_) = (a' >= b) || (b' <= a)
combs [] r = [r]
combs (x:xs) r
| null r = combs xs (x:r) ++ if null xs then [] else combs xs r
| otherwise = if areExclusive (head r) x
then combs xs (x:r) ++ combs xs r
else if l' > lowerBound
then combs xs (x: reduced : drop 1 r) ++ combs xs r
else combs xs r
where lowerBound = l + 2 * patLen
((l,u,patLen,lenS),s) = head r
((l',u',patLen',lenS'),s') = x
reduce = takeWhile (>=l') . iterate (\x -> x - patLen) $ u
lenReduced = length reduce
reduced = ((l,u - lenReduced * patLen,patLen,lenS - lenReduced),s)
buildString origStr sequences = buildString' origStr sequences 0 (0,"",0)
where
buildString' origStr sequences index accum#(lenC,cStr,lenOrig)
| null sequences = accum
| l /= index =
buildString' (drop l' origStr) sequences l (lenC + l' + 1, cStr ++ take l' origStr ++ "1", lenOrig + l')
| otherwise =
buildString' (drop u' origStr) rest u (lenC + length s', cStr ++ s', lenOrig + u')
where
l' = l - index
u' = u - l
s' = s ++ show lenS
(((l,u,patLen,lenS),s):rest) = sequences
compress [] _ accum = reverse accum ++ (if null accum then [] else "1")
compress zzs#(z:zs) maxSeqLen accum
| null (fst segment') = compress zs maxSeqLen (z:accum)
| (start,end) == (0,2) && not (null accum) = compress zs maxSeqLen (z:accum)
| otherwise =
reverse accum ++ (if null accum || takeWhile' compressedStr 0 /= 0 then [] else "1")
++ compressedStr
++ compress (drop lengthOriginal zzs) maxSeqLen []
where segment'#(s,(start,end)) = segment zzs 0 maxSeqLen
combinations = combs (fst $ segment') []
takeWhile' xxs count
| null xxs = 0
| x == '1' && null (reads (take 1 xs)::[(Int,String)]) = count
| not (null (reads [x]::[(Int,String)])) = 0
| otherwise = takeWhile' xs (count + 1)
where x:xs = xxs
f (lenC,cStr,lenOrig) (lenC',cStr',lenOrig') =
let g = compare ((fromIntegral lenC + if not (null accum) && takeWhile' cStr 0 == 0 then 1 else 0) / fromIntegral lenOrig)
((fromIntegral lenC' + if not (null accum) && takeWhile' cStr' 0 == 0 then 1 else 0) / fromIntegral lenOrig')
in if g == EQ
then compare (takeWhile' cStr' 0) (takeWhile' cStr 0)
else g
(lenCompressed,compressedStr,lengthOriginal) =
head $ sortBy f (map (buildString (take end zzs)) (map reverse combinations))
Output:
*Main> compress "aaaaaaaaabbbbbbbbbaaaaaaaaabbbbbbbbb" 100 []
"a9b9a9b9"
*Main> compress "aaabbbaaabbbaaabbbaaabbb" 100 []
"aaabbb4"

number which appears more than n/3 times in an array

I have read this problem
Find the most common entry in an array
and the answer from jon skeet is just mind blowing .. :)
Now I am trying to solve this problem find an element which occurs more than n/3 times in an array ..
I am pretty sure that we cannot apply the same method because there can be 2 such elements which will occur more than n/3 times and that gives false alarm of the count ..so is there any way we can tweak around jon skeet's answer to work for this ..?
Or is there any solution that will run in linear time ?
Jan Dvorak's answer is probably best:
Start with two empty candidate slots and two counters set to 0.
for each item:
if it is equal to either candidate, increment the corresponding count
else if there is an empty slot (i.e. a slot with count 0), put it in that slot and set the count to 1
else reduce both counters by 1
At the end, make a second pass over the array to check whether the candidates really do have the required count. This isn't allowed by the question you link to but I don't see how to avoid it for this modified version. If there is a value that occurs more than n/3 times then it will be in a slot, but you don't know which one it is.
If this modified version of the question guaranteed that there were two values with more than n/3 elements (in general, k-1 values with more than n/k) then we wouldn't need the second pass. But when the original question has k=2 and 1 guaranteed majority there's no way to know whether we "should" generalize it as guaranteeing 1 such element or guaranteeing k-1. The stronger the guarantee, the easier the problem.
Using Boyer-Moore Majority Vote Algorithm, we get:
vector<int> majorityElement(vector<int>& nums) {
int cnt1=0, cnt2=0;
int a,b;
for(int n: A){
if (n == a) cnt1++;
else if (n == b) cnt2++;
else if (cnt1 == 0){
cnt1++;
a = n;
}
else if (cnt2 == 0){
cnt2++;
b = n;
}
else{
cnt1--;
cnt2--;
}
}
cnt1=cnt2=0;
for(int n: nums){
if (n==a) cnt1++;
else if (n==b) cnt2++;
}
vector<int> result;
if (cnt1 > nums.size()/3) result.push_back(a);
if (cnt2 > nums.size()/3) result.push_back(b);
return result;
}
Updated, correction from #Vipul Jain
You can use Selection algorithm to find the number in the n/3 place and 2n/3.
n1=Selection(array[],n/3);
n2=Selection(array[],n2/3);
coun1=0;
coun2=0;
for(i=0;i<n;i++)
{
if(array[i]==n1)
count1++;
if(array[i]==n2)
count2++;
}
if(count1>n)
print(n1);
else if(count2>n)
print(n2);
else
print("no found!");
At line number five, the if statement should have one more check:
if(n!=b && (cnt1 == 0 || n == a))
I use the following Python solution to discuss the correctness of the algorithm:
class Solution:
"""
#param: nums: a list of integers
#return: The majority number that occurs more than 1/3
"""
def majorityNumber(self, nums):
if nums is None:
return None
if len(nums) == 0:
return None
num1 = None
num2 = None
count1 = 0
count2 = 0
# Loop 1
for i, val in enumerate(nums):
if count1 == 0:
num1 = val
count1 = 1
elif val == num1:
count1 += 1
elif count2 == 0:
num2 = val
count2 = 1
elif val == num2:
count2 += 1
else:
count1 -= 1
count2 -= 1
count1 = 0
count2 = 0
for val in nums:
if val == num1:
count1 += 1
elif val == num2:
count2 += 1
if count1 > count2:
return num1
return num2
First, we need to prove claim A:
Claim A: Consider a list C which contains a majority number m which occurs more floor(n/3) times. After 3 different numbers are removed from C, we have C'. m is the majority number of C'.
Proof: Use R to denote m's occurrence count in C. We have R > floor(n/3). R > floor(n/3) => R - 1 > floor(n/3) - 1 => R - 1 > floor((n-3)/3). Use R' to denote m's occurrence count in C'. And use n' to denote the length of C'. Since 3 different numbers are removed, we have R' >= R - 1. And n'=n-3 is obvious. We can have R' > floor(n'/3) from R - 1 > floor((n-3)/3). So m is the majority number of C'.
Now let's prove the correctness of the loop 1. Define L as count1 * [num1] + count2 * [num2] + nums[i:]. Use m to denote the majority number.
Invariant
The majority number m is in L.
Initialization
At the start of the first itearation, L is nums[0:]. So the invariant is trivially true.
Maintenance
if count1 == 0 branch: Before the iteration, L is count2 * [num2] + nums[i:]. After the iteration, L is 1 * [nums[i]] + count2 * [num2] + nums[i+1:]. In other words, L is not changed. So the invariant is maintained.
if val == num1 branch: Before the iteration, L is count1 * [nums[i]] + count2 * [num2] + nums[i:]. After the iteration, L is (count1+1) * [num[i]] + count2 * [num2] + nums[i+1:]. In other words, L is not changed. So the invariant is maintained.
f count2 == 0 branch: Similar to condition 1.
elif val == num2 branch: Similar to condition 2.
else branch: nums[i], num1 and num2 are different to each other in this case. After the iteration, L is (count1-1) * [num1] + (count2-1) * [num2] + nums[i+1:]. In other words, three different numbers are moved from count1 * [num1] + count2 * [num2] + nums[i:]. From claim A, we know m is the majority number of L.So the invariant is maintained.
Termination
When the loop terminates, nums[n:] is empty. L is count1 * [num1] + count2 * [num2].
So when the loop terminates, the majority number is either num1 or num2.
If there are n elements in the array , and suppose in the worst case only 1 element is repeated n/3 times , then the probability of choosing one number that is not the one which is repeated n/3 times will be (2n/3)/n that is 1/3 , so if we randomly choose N elements from the array of size ‘n’, then the probability that we end up choosing the n/3 times repeated number will be atleast 1-(2/3)^N . If we eqaute this to say 99.99 percent probability of getting success, we will get N=23 for any value of “n”.
Therefore just choose 23 numbers randomly from the list and count their occurrences , if we get count greater than n/3 , we will return that number and if we didn’t get any solution after checking for 23 numbers randomly , return -1;
The algorithm is essentially O(n) as the value 23 doesn’t depend on n(size of list) , so we have to just traverse array 23 times at worst case of algo.
Accepted Code on interviewbit(C++):
int n=A.size();
int ans,flag=0;
for(int i=0;i<23;i++)
{
int index=rand()%n;
int elem=A[index];
int count=0;
for(int i=0;i<n;i++)
{
if(A[i]==elem)
count++;
}
if(count>n/3)
{
flag=1;
ans=elem;
}
if(flag==1)
break;
}
if(flag==1)
return ans;
else return -1;
}

Resources