Counting Binary Strings - algorithm

This is in reference to this problem. We are required to calculate f(n, k), the number of binary strings of length n whose longest substring of ones has length k. I am having trouble coming up with a recursion.
I think I can handle the case where the i-th digit is a 0.
Specifically, I am unable to extend the solution of a subproblem f(i-1, j) when I consider the i-th digit to be a 1. How do I stitch the two together?
Sorry if I am a bit unclear. Any pointers would be a great help. Thanks.

I think you could build up a table using a variation of dynamic programming if you expand the state space. Suppose you calculate f(n,k,e), defined as the number of binary strings of length n whose longest substring of 1s has length at most k and which end with exactly e 1s in a row. If you have calculated f(n,k,e) for all possible values of k and e for a given n, then, because the values are split up by e, you can calculate f(n+1,k,e) for all k and e: what happens to an n-long string when you extend it with a 0 or a 1 depends only on how many 1s it currently ends with, and you know that from e. The count for a longest run of exactly k is then the sum over e of f(n,k,e) minus the sum over e of f(n,k-1,e).
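A minimal Python sketch of this expanded-state idea, assuming we fix k and keep only the counts split by e (the number of trailing 1s); `count_at_most` and `count_exact` are hypothetical names. The exact-k count is the difference of two at-most counts:

```python
def count_at_most(n: int, k: int) -> int:
    """Binary strings of length n whose longest run of 1s is at most k."""
    if k < 0:
        return 0
    # ways[e] = number of strings of the current length ending in exactly e 1s
    ways = [1] + [0] * k  # length 0: only the empty string, which ends in zero 1s
    for _ in range(n):
        new = [0] * (k + 1)
        new[0] = sum(ways)  # appending '0' resets the trailing run
        for e in range(k):
            new[e + 1] = ways[e]  # appending '1' extends the run; e = k cannot grow
        ways = new
    return sum(ways)


def count_exact(n: int, k: int) -> int:
    """f(n, k) from the question: the longest run of 1s is exactly k."""
    return count_at_most(n, k) - count_at_most(n, k - 1)


print([count_exact(4, k) for k in range(5)])  # [1, 7, 5, 2, 1]
```

Each step costs O(k), so a single f(n, k) takes O(n·k) time.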

Let s be the start index of the length-k pattern. Then s ranges from 1 to n-k+1.
For each s, we divide the string S into three strings:
PRE(s,k,n) = S[1:s-1]
POST(s,k,n) = S[s+k:n]
ONE(s,k,n), which is all 1s from S[s] to S[s+k-1]
The longest substring of 1s in PRE and in POST should be shorter than k.
Let
x = s-1 (the length of PRE)
y = n-(s+k)+1 (the length of POST)
Let NS(p,k) be the total number of strings of length p whose longest substring of 1s has length greater than or equal to k.
NS(p,k) = sum{f(p,k), f(p,k+1), ..., f(p,p)}
Terminating conditions:
NS(p,k) = 1 if p==k, 0 if k>p
f(n,k) = 1 if n==k, 0 if k>n.
For a string of length n, the number of binary strings whose longest substring of 1s is shorter than k is 2^n - NS(n,k).
f(n,k) = sum over all s = 1 to n-k+1 of
{2^x - NS(x,k)} * {2^y - NS(y,k)}
i.e. the product of the numbers of possible PRE and POST substrings whose longest substring of 1s is shorter than k.
So we have overlapping subproblems with a lot of reuse, which can be solved with dynamic programming.
Added later:
Based on the comment below, I guess we really don't need NS.
We can define S(p,k) as
S(p,k) = sum{f(p,0), f(p,1), ..., f(p,k-1)}
and
f(n,k) = sum over all s = 1 to n-k+1 of
S(x,k)*S(y,k)

I know this is quite an old question; if anyone wants, I can clarify my short answer.
Here is my code:
#include <iostream>
using namespace std;

long long DP[64][64]; // DP[i][j] = F(i, j); global, so zero-initialized

int main()
{
    ios::sync_with_stdio(false);
    cin.tie(nullptr);
    DP[0][0] = 1;
    DP[1][0] = 1;
    DP[1][1] = 1;
    cout << "1 1\n";
    for (int i = 2; i <= 63; i++)
    {
        DP[i][0] = 1; // all zeros
        DP[i][i] = 1; // all ones
        cout << "1 ";
        for (int j = 1; j < i; j++)
        {
            for (int k = 0; k <= j; k++)
                DP[i][j] += DP[i - k - 1][j] + DP[i - j - 1][k];
            DP[i][j] -= DP[i - j - 1][j]; // the k = j term was counted in both sums
            cout << DP[i][j] << " ";
        }
        cout << "1\n";
    }
    return 0;
}
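DP[i][j] represents F(i,j).
Transitions/recurrence (hard to think of):
Consider F(i,j) with 0 < j < i (F(i,0) = F(i,i) = 1 are set directly):
1) I can put k 1s at the right end and separate them from the rest with a 0, i.e.
String + '0' + k times '1', where k <= j and the left String of length i-k-1 already has longest run j:
F(i-k-1,j)
Note: k = 0 means I am only keeping a 0 at the right!
2) This misses the strings whose rightmost j+1 positions are a 0 followed by j 1s while the left part does not contain a run of length j, i.e. its longest run is some k < j:
F(i-j-1,k) (Note: I have used k in both cases just because I did so in my code; you can use separate variables.)
Because the code sums case 2 over k = 0..j in the same loop, its k = j term duplicates the k = j term of case 1, which is why DP[i-j-1][j] is subtracted once.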
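To sanity-check the recurrence, here is a Python port of the same table (the name `longest_run_table` is hypothetical) compared against brute-force enumeration for small lengths:

```python
from itertools import product


def longest_run_table(max_n: int) -> list[list[int]]:
    """dp[i][j] = F(i, j), filled with the same recurrence as the C++ code."""
    dp = [[0] * (max_n + 1) for _ in range(max_n + 1)]
    dp[0][0] = 1
    for i in range(1, max_n + 1):
        dp[i][0] = 1  # all zeros
        dp[i][i] = 1  # all ones
        for j in range(1, i):
            total = 0
            for k in range(j + 1):
                total += dp[i - k - 1][j] + dp[i - j - 1][k]
            dp[i][j] = total - dp[i - j - 1][j]  # k = j was counted twice
    return dp


def brute_force(n: int, k: int) -> int:
    """Count length-n strings whose longest run of 1s is exactly k."""
    return sum(
        1
        for bits in product("01", repeat=n)
        if max(len(run) for run in "".join(bits).split("0")) == k
    )


table = longest_run_table(10)
assert all(table[n][k] == brute_force(n, k) for n in range(11) for k in range(n + 1))
```

The brute force is exponential and only meant for checking small n.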


Data structure to check if a static array does not contain an element of a given range

I've been stuck for hours on the following homework question for a data-structures class:
You are given a static set S (i.e., S never changes) of n integers from {1, . . . , u}.
Describe a data structure of size O(n log u) that can answer the following queries in O(1) time:
Empty(i, j) - returns TRUE if and only if there is no element in S that is between i and j (where i and j are integers in {1, . . . , u}).
At first I thought of using a y-fast trie.
Using a y-fast trie we can achieve O(n) space and O(log log u) query time (by finding the successor of i and checking whether it's bigger than j).
But O(log log u) is not O(1)...
Then I thought maybe we could sort the array and create a second array of size n+1 holding the ranges that are not in the array; a query would then check whether [i, j] is a sub-range of one of those ranges. But I couldn't think of any way to do that in O(n log u) space with O(1) query time.
I have no idea how to solve this and I feel like I'm not even close to the solution; any help would be nice.
We can create an x-fast trie of S (which takes O(n log u) space) and store in each node the minimum and maximum leaf value in its subtree. We can then use that to answer the Empty query in O(1), like this:
Empty(i, j)
First compute xor(i, j). The number of leading zeros in the result is the number of leading bits that i and j share; call it k. Take the first k bits of i (or j, since they're equal) and look them up in the x-fast trie's hash table. If there is no such node, return TRUE: any number between i and j would also share those k leading bits, and no element of S does. Otherwise, call that node X.
If X->right->minimum > j and X->left->maximum < i, return TRUE; otherwise return FALSE (a missing child contributes no elements). At X, i goes left and j goes right, so every element in X's left subtree is smaller than j and every element in its right subtree is larger than i. If the check is true, no element lies in [i, j]; if it is false, some left-subtree element is at least i or some right-subtree element is at most j, so there is a number between i and j.
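A rough Python sketch of this query, assuming keys are w-bit integers (u < 2^w). Rather than a real x-fast trie, it stores two dicts keyed by (prefix length, prefix) with the min and max key under each trie node, which is the same O(n log u) information. `PrefixMinMax` is a hypothetical name, and `int.bit_length` stands in for the O(1) leading-zero count assumed on a word RAM:

```python
class PrefixMinMax:
    """Min/max key below every trie node (length, prefix); O(n * w) space."""

    def __init__(self, values, w: int):
        self.w = w
        self.lo = {}  # (length, prefix) -> minimum key in that subtree
        self.hi = {}  # (length, prefix) -> maximum key in that subtree
        for x in values:
            for length in range(w + 1):
                node = (length, x >> (w - length))
                self.lo[node] = min(self.lo.get(node, x), x)
                self.hi[node] = max(self.hi.get(node, x), x)

    def empty(self, i: int, j: int) -> bool:
        """True iff no key of S lies in [i, j]."""
        if i > j:
            i, j = j, i
        if i == j:
            return (self.w, i) not in self.lo
        k = self.w - (i ^ j).bit_length()  # number of leading bits i and j share
        if (k, i >> (self.w - k)) not in self.lo:
            return True  # nothing in S shares that prefix
        shift = self.w - k - 1
        left = (k + 1, i >> shift)   # i has a 0 at the branching bit
        right = (k + 1, j >> shift)  # j has a 1 at the branching bit
        left_hit = left in self.hi and self.hi[left] >= i
        right_hit = right in self.lo and self.lo[right] <= j
        return not (left_hit or right_hit)


s = PrefixMinMax([3, 10, 17], w=5)
print(s.empty(4, 9), s.empty(4, 10))  # True False
```

Each query does a constant number of dict lookups, mirroring the hash-table lookups of the x-fast trie.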
You haven't clarified whether the given numbers are sorted. If not, sort them, which takes O(n log n).
Find the upper bound of i, say x, and the lower bound of j, say y.
Now just check 4 numbers: the numbers at indexes x, x+1, y-1 and y. If any of them lies between i and j, the range is not empty (return FALSE); otherwise return TRUE.
If the given set/array is not sorted, this approach needs an additional O(n log n) to sort it. Memory is O(n). Each query needs two binary searches, so it takes O(log n) rather than O(1).
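The check can be reduced to a single lower-bound search: the smallest element that is at least i either does not exist or is greater than j. A minimal Python sketch using the standard `bisect` module (the class name `SortedRanges` is hypothetical); like the answer, it costs O(log n) per query:

```python
from bisect import bisect_left


class SortedRanges:
    """Sorted copy of S; Empty(i, j) via one binary search."""

    def __init__(self, values):
        self.sorted_values = sorted(values)  # O(n log n) preprocessing, O(n) memory

    def empty(self, i: int, j: int) -> bool:
        k = bisect_left(self.sorted_values, i)  # index of the first element >= i
        return k == len(self.sorted_values) or self.sorted_values[k] > j


r = SortedRanges([17, 3, 10])
print(r.empty(4, 9), r.empty(4, 10))  # True False
```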
Consider a data structure consisting of
an array A[1,...,u] of size u such that A[i]=1 if i is present in S, and A[i]=0 otherwise. This array can be constructed from set S in O(n).
an array B[1,...,u] of size u which stores cumulative sum of A i.e. B[i] = A[1]+...+A[i]. This array can be constructed in O(u) from A using the relation B[i] = B[i-1] + A[i] for all i>1.
a function empty(i,j) which returns the desired Boolean query. If i==1, then define count = B[j], otherwise take count = B[j]-B[i-1]. Note that count gives the number of distinct elements in S lying in range [i,j]. Once we have count, simply return count==0. Clearly, each query takes O(1).
Edit: As pointed out in the comments, the size of this data structure is O(u), which doesn't match the constraints. But I hope it gives others an approximate target to aim at.
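A short Python sketch of the prefix-count structure (arrays A and B from the answer; `PrefixCounts` is a hypothetical name). Keeping an extra slot B[0] = 0 removes the special case for i == 1:

```python
class PrefixCounts:
    """prefix[x] = number of elements of S that are <= x, for x in 0..u (O(u) space)."""

    def __init__(self, values, u: int):
        present = [0] * (u + 1)  # array A from the answer, with an unused slot 0
        for x in values:
            present[x] = 1
        self.prefix = [0] * (u + 1)  # array B, with prefix[0] = 0
        for x in range(1, u + 1):
            self.prefix[x] = self.prefix[x - 1] + present[x]

    def empty(self, i: int, j: int) -> bool:
        """O(1): count of elements in [i, j] is prefix[j] - prefix[i - 1]."""
        return self.prefix[j] - self.prefix[i - 1] == 0


p = PrefixCounts([3, 10, 17], u=20)
print(p.empty(4, 9), p.empty(1, 3))  # True False
```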
It isn't a solution, but it's impossible to write in a comment. Here is an idea for solving a more specific task that may help solve the general task from the question.
The specific task is the same except that u = 1024. Also, this isn't a final solution; it is a rough sketch (for the specific task).
Data structure creation:
Create a bitmask M = 0000.....100001 over U = { 1, ..., u }, where Mᵥ = 1 when Uᵥ ∊ S, otherwise 0.
Save the bitmask M as an array G of 32 unsigned 32-bit integers; each item of G holds 32 bits of M.
Build an integer H as a bitmask where Hᵣ = 0 when Gᵣ = 0, otherwise 1.
Convert G to G′, a HashMap from r to Gᵣ that contains records only for Gᵣ != 0.
The images for the following pseudocode used 8 bits instead of 32, just for simplicity.
Empty(i, j) {
    I = i / 32
    J = j / 32
    if I != J {
        if P == 0: return true
        if P(I) == 0: return true
        if P(J) == 0: return true
    } else {
        if P(J=I) == 0: return true
    }
    return false
}
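One possible reading of this sketch for u = 1024, written in Python: per-word masks (G) plus a summary mask (H) with one bit per non-zero word. The pseudocode's P is not defined in the text, so the masking below is an interpretation; a plain list replaces the HashMap, `TwoLevelBitmask` is a hypothetical name, and Python's unbounded ints stand in for 32-bit machine words:

```python
WORD = 32  # bits per word, as in the answer


class TwoLevelBitmask:
    """words[r] holds the bits of S in [32r, 32r + 31]; summary bit r is 1 iff words[r] != 0."""

    def __init__(self, values, u: int = 1024):
        self.words = [0] * (u // WORD + 1)  # one extra word so that u itself fits
        for x in values:
            self.words[x // WORD] |= 1 << (x % WORD)
        self.summary = 0
        for r, word in enumerate(self.words):
            if word:
                self.summary |= 1 << r

    def empty(self, i: int, j: int) -> bool:
        """True iff no element of S lies in [i, j] (assumes i <= j)."""
        I, J = i // WORD, j // WORD
        from_i = ~((1 << (i % WORD)) - 1)   # bits at or above i's offset
        upto_j = (1 << (j % WORD + 1)) - 1  # bits at or below j's offset
        if I == J:
            return self.words[I] & from_i & upto_j == 0
        if self.words[I] & from_i or self.words[J] & upto_j:
            return False
        between = ((1 << J) - 1) & ~((1 << (I + 1)) - 1)  # summary bits I+1 .. J-1
        return self.summary & between == 0


t = TwoLevelBitmask([3, 10, 17, 500, 1024])
print(t.empty(18, 499), t.empty(18, 500))  # True False
```

Every query touches at most two words and the summary, so it is O(1) word operations for fixed u.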

Find longest positive substrings in binary string

Let's assume I have a string like 100110001010001. I'd like to find substrings that:
are as long as possible
have a positive total sum (> 0)
That is, the longest substrings that have more 1s than 0s.
For example, for the string above, 100110001010001, it would be: [10011]000[101]000[1]
Actually, it would be enough to find the total length of those, in this case: 9.
Unfortunately I have no idea how to do it other than by brute force. Any ideas, please?
As posted now, your question seems a bit unclear. The total length of valid substrings that are "as long as possible" could mean different things: for example, among other options, it could be (1) a list of the longest valid extension to the left of each index (which would allow overlaps in the list), (2) the longest combination of non-overlapping such longest left-extensions, (3) the longest combination of non-overlapping, valid substrings (where each substring is not necessarily the longest possible).
I will outline a method for (3) since it easily transforms to (1) or (2). Finding the longest left-extension from each index with more ones than zeros can be done in O(n log n) time and O(n) additional space (for just the longest valid substring in O(n) time, see here: Finding the longest non-negative sub array). With that preprocessing, finding the longest combination of valid, non-overlapping substrings can be done with dynamic programming in somewhat optimized O(n^2) time and O(n) additional space.
We start by traversing the string, storing the partial sum up to and including s[i], counting zeros as -1. We insert each partial sum into a binary tree where each node also stores an array of the indexes at which that value occurs, and the leftmost index of a value less than the node's value. (A substring from s[a] to s[b] has more ones than zeros if the prefix sum up to b is greater than the prefix sum up to a-1.) If a value is already in the tree, we add the index to the node's index array.
Since we are traversing from left to right, only when a new lowest value is inserted into the tree is the leftmost-index-of-lower-value updated — and it's updated only for the node with the previous lowest value. This is because any nodes with a lower value would not need updating; and if any nodes with lower values were already in the tree, any nodes with higher values would already have stored the index of the earliest one inserted.
The longest valid substring to the left of each index extends to the leftmost index with a lower prefix sum, which can be easily looked up in the tree.
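For the left extensions alone there is a shortcut that avoids the tree, swapped in here for brevity: prefix sums move by ±1 starting from 0, so when the current prefix sum p is positive the whole prefix qualifies, and otherwise the leftmost earlier prefix below p is the first occurrence of p - 1. A Python sketch under that observation (`longest_left_extensions` is a hypothetical name):

```python
def longest_left_extensions(s: str) -> list[int]:
    """ext[t] = length of the longest substring ending at s[t] with more 1s than 0s."""
    prefix = 0
    first_seen = {0: 0}  # prefix-sum value -> first prefix index where it occurs
    ext = []
    for b, ch in enumerate(s, start=1):  # b = prefix index after reading ch
        prefix += 1 if ch == "1" else -1
        first_seen.setdefault(prefix, b)
        if prefix > 0:
            ext.append(b)  # the whole prefix s[:b] already qualifies
        elif prefix - 1 in first_seen:
            # the walk moves by +/-1 from 0, so the first prefix below `prefix`
            # is the first occurrence of prefix - 1
            ext.append(b - first_seen[prefix - 1])
        else:
            ext.append(0)  # no valid substring ends here
    return ext


print(longest_left_extensions("100110001010001"))
# [1, 0, 0, 1, 5, 3, 0, 0, 1, 0, 3, 0, 0, 0, 1]
```

This runs in O(n) time with a dictionary of at most n + 1 entries.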
To get the longest combination, let f(i) represent the longest combination up to index i. Then f(i) is the maximum, over every index j such that s[j..i] is a valid substring, of (i - j + 1) + f(j - 1), or f(i - 1) if s[i] is not used at all.
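A direct O(n^2) Python sketch of this recurrence for interpretation (3), assuming prefix sums with 0 counted as -1 (`max_total_valid_length` is a hypothetical name). `best[b]` plays the role of f for the first b characters:

```python
def max_total_valid_length(s: str) -> int:
    """Max total length of non-overlapping substrings, each with more 1s than 0s."""
    n = len(s)
    prefix = [0] * (n + 1)
    for i, ch in enumerate(s):
        prefix[i + 1] = prefix[i] + (1 if ch == "1" else -1)
    best = [0] * (n + 1)  # best[b] = answer for the prefix s[:b]
    for b in range(1, n + 1):
        best[b] = best[b - 1]  # option: s[b-1] is not covered
        for a in range(b):
            if prefix[b] > prefix[a]:  # s[a:b] has more 1s than 0s
                best[b] = max(best[b], best[a] + b - a)
    return best[n]


print(max_total_valid_length("100110001010001"))  # 9
```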
Dynamic programming.
We have a string. If its sum is positive, that's our answer. Otherwise we need to trim each end until it goes positive, and try each pattern of trims. So for each candidate length (N-1, N-2, N-3, etc.) there are several ways to split the trimmed characters between the two ends (trim from a, trim from b), each of which gives us a state. When the state goes positive, we've found our substring.
So we keep two lists of integers, representing what happens if we trim entirely from a or entirely from b. Then backtrack: if we trim 1 from a, we must trim all the rest from b; if we trim two from a, we must trim one fewer from b. Is there a split that lets us go positive?
We can eliminate candidates quickly, because the answer must be at a maximum of either the trim-from-a sums or the trim-from-b sums. If the other trim then takes us positive, that's the result.
pseudocode:
N = length(string);
Nones = countones(string);
Nzeros = N - Nones;
if(Nones > Nzeros)
    return string;
vector<int> cuta;   // cuta[t]: sum (ones - zeros) left after trimming t chars from the start
vector<int> cutb;   // cutb[t]: sum left after trimming t chars from the end
int besta = Nones - Nzeros;
int bestb = Nones - Nzeros;
cuta.push_back(besta);
cutb.push_back(bestb);
int bestia = 0;
int bestib = 0;
for(i=0;i<N;i++)
{
    cuta.push_back( string[i] == '1' ? cuta.back() - 1 : cuta.back() + 1);
    cutb.push_back( string[N-i-1] == '1' ? cutb.back() - 1 : cutb.back() + 1);
    if(cuta.back() > besta)
    {
        besta = cuta.back();
        bestia = i;
    }
    if(cutb.back() > bestb)
    {
        bestb = cutb.back();
        bestib = i;
    }
    // checks: is a cut wholly from a or from b going to send us positive?
    if(besta == 1)
        answer = substring(string, bestia, N);
    if(bestb == 1)
        answer = substring(string, 0, N - bestib);
    // if not, is a combined cut from the current position to
    // the peak in the other distribution going to send us positive?
    // (sum after trimming both ends = cuta + cutb - total)
    if(besta + cutb.back() - (Nones - Nzeros) == 1)
    {
        answer = substring(string, bestia, N - i);
    }
    if(cuta.back() + bestb - (Nones - Nzeros) == 1)
    {
        answer = substring(string, i, N - bestib);
    }
}
/*if we get here the string was all zeros and no positive substring */
This is untested, and the final checks are a bit fiddly, so I might have made an error somewhere, but the algorithm should work more or less as described.

Minimal number of swaps?

A string of N characters contains only As and Bs (the same number of each). What is the minimum number of swaps needed to make sure that no two adjacent characters are the same, if we can only swap two adjacent characters?
For example, input is:
AAAABBBB
The minimum number of swaps is 6, giving ABABABAB. But how would you solve it for any input? I can only think of an O(N^2) solution. Maybe some kind of sort?
If we just need to count the swaps, we can do it in O(N).
Let's assume for simplicity that the array X of N elements should become ABAB....
GetCount()
    swaps = 0, i = -1, j = -1
    for(k = 0; k < N; k++)
        if(k % 2 == 0)
            i = FindIndexOf(A, max(k, i))
            X[k] <-> X[i]
            swaps += i - k
        else
            j = FindIndexOf(B, max(k, j))
            X[k] <-> X[j]
            swaps += j - k
    return swaps

FindIndexOf(element, index)
    while(index < N)
        if(X[index] == element) return index
        index++
    return -1; // should never happen if count of As == count of Bs
Basically, we run from left to right, and if a misplaced element is found, it gets exchanged with the correct element (e.g. abBbbbA** --> abAbbbB**) in O(1). At the same time swaps are counted as if the sequence of adjacent elements would be swapped instead. Variables i and j are used to cache indices of next A and B respectively, to make sure that all calls together of FindIndexOf are done in O(N).
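A Python port of GetCount, run once for each target pattern (ABAB... and BABA...) and taking the smaller count; `swaps_to_pattern` and `min_adjacent_swaps` are hypothetical names, and the input is assumed to contain equally many As and Bs:

```python
def swaps_to_pattern(s: str, first: str) -> int:
    """Adjacent swaps needed to turn s into first, other, first, ... (GetCount)."""
    x = list(s)
    other = "B" if first == "A" else "A"
    swaps = 0
    i = j = 0  # cached scan positions for the next `first` and the next `other`
    for k in range(len(x)):
        if k % 2 == 0:
            i = max(i, k)
            while x[i] != first:  # FindIndexOf(first, max(k, i))
                i += 1
            x[k], x[i] = x[i], x[k]
            swaps += i - k  # cost of bubbling x[i] left to position k
        else:
            j = max(j, k)
            while x[j] != other:  # FindIndexOf(other, max(k, j))
                j += 1
            x[k], x[j] = x[j], x[k]
            swaps += j - k
    return swaps


def min_adjacent_swaps(s: str) -> int:
    return min(swaps_to_pattern(s, "A"), swaps_to_pattern(s, "B"))


print(min_adjacent_swaps("AAAABBBB"))  # 6
```

Because i and j only move forward, all the scans together take O(N).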
If we need to actually perform the swaps, we cannot do better than O(N^2).
The rough idea is the following. Consider your sample AAAABBBB: one of the Bs needs O(N) swaps to reach the A B ... position, another B needs O(N) swaps to reach the A B A B ... position, and so on, so we get O(N^2) swaps in total.
Observe that if any solution would swap two instances of the same letter, then we can find a better solution by dropping that swap, which necessarily has no effect. An optimal solution therefore only swaps differing letters.
Let's view the string of letters as an array of indices of one kind of letter (arbitrarily chosen, say A) into the string. So AAAABBBB would be represented as [0, 1, 2, 3] while ABABABAB would be [0, 2, 4, 6].
We know two instances of the same letter will never swap in an optimal solution. This lets us always safely identify the first (left-most) instance of A with the first element of our index array, the second instance with the second element, etc. It also tells us our array is always in sorted order at each step of an optimal solution.
Since each step of an optimal solution swaps differing letters, we know our index array evolves at each step only by incrementing or decrementing a single element at a time.
An initial string of length n = 2k has an array representation A of length k. An optimal solution transforms this array into either
ODDS = [1, 3, 5, ..., 2k - 1]
or
EVENS = [0, 2, 4, ..., 2k - 2]
Since we know in an optimal solution instances of a letter do not pass each other, we can conclude an optimal solution must spend min(abs(ODDS[0] - A[0]), abs(EVENS[0] - A[0])) swaps to put the first instance in correct position.
Since the EVENS-or-ODDS choice is made only once (not once per letter instance), summing across the array gives the minimum number of needed swaps as
define count_swaps(length, initial, goal)
    total = 0
    for i from 0 to length - 1
        total += abs(goal[i] - initial[i])
    end
    return total
end

define count_minimum_needed_swaps(k, A)
    return min(count_swaps(k, A, EVENS), count_swaps(k, A, ODDS))
end
Notice the number of loop iterations implied by count_minimum_needed_swaps is 2 * k = n; it runs in O(n) time.
By noting which term is smaller in count_minimum_needed_swaps, we can also tell which of the two goal states is optimal.
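The same computation as a runnable Python sketch, assuming the input string has equally many As and Bs; it builds the index array of the As and compares it against the EVENS and ODDS goals:

```python
def count_minimum_needed_swaps(s: str) -> int:
    """Minimum adjacent swaps to make s alternate (equal numbers of As and Bs)."""
    positions = [idx for idx, ch in enumerate(s) if ch == "A"]  # the array A
    k = len(positions)
    evens = range(0, 2 * k, 2)  # goal ABAB...
    odds = range(1, 2 * k, 2)   # goal BABA...

    def count_swaps(goal) -> int:
        return sum(abs(g - p) for g, p in zip(goal, positions))

    return min(count_swaps(evens), count_swaps(odds))


print(count_minimum_needed_swaps("AAAABBBB"))  # 6
```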
Since you know N, you can simply write a loop that generates the values with no swaps needed.
#define N 4
char array[N + N];
for (size_t z = 0; z < N + N; z++)
{
array[z] = 'B' - ((z & 1) == 0);
}
return 0; // The number of swaps
@Nemo and @AlexD are right: the algorithm is order n^2. @Nemo missed that we are looking for a reordering in which no two adjacent characters are the same, so we cannot simply treat every A that comes after a B as out of order.
Lets see the minimum number of swaps.
We don't care whether our first character is A or B, because we can apply the same algorithm with A and B swapped everywhere. So let's assume that the word WORD_N has length 2N, with N As and N Bs, and starts with an A. (I am using length 2N to simplify the calculations.)
What we will do is move the next B right after this A, without caring about the positions of the other characters, because then we will have reduced the problem to reordering a new word WORD_{N-1}. Let's also assume that the next B is not already right after the A (if the word has more than 2 characters), because then the first step is already done and we reduce the problem to the remaining characters, WORD_{N-1}.
In the worst case the next B is as far away as possible, i.e. after the first half of the word, so we need at most N-1 swaps to put this B after the A. Then our word can be reduced to WORD_N = [A B WORD_{N-1}].
We see that we have to perform this step at most N-1 times, because the last word (WORD_1) will already be ordered. Performing it N-1 times, with at most N-1, N-2, ..., 1 swaps, we make
N_swaps = (N-1)*N/2.
where N is half the length of the initial word.
Let's see why we can apply the same algorithm to WORD_{N-1}, again assuming that it starts with an A. Here it matters that WORD_{N-1} starts with the same character as the already-ordered pair. We can be sure its first character is an A because it is the character that was originally right after the first character of our word; if it was a B, the first step needed only one swap between these two characters (or none), so WORD_{N-1} again starts with the same character as WORD_N, while the first two characters of WORD_N differ, at a cost of at most 1 swap.
I think this answer is similar to the answer by phs, just in Haskell. The idea is that the resultant-indices for A's (or B's) are known so all we need to do is calculate how far each starting index has to move and sum the total.
Haskell code:
Prelude Data.List> let is = elemIndices 'B' "AAAABBBB"
in minimum
$ map (sum . zipWith ((abs .) . (-)) is) [[1,3..],[0,2..]]
6 --output

Minimum sum that can't be obtained from a set

Given a set S of positive integers whose elements need not be distinct, I need to find the minimal non-negative sum that can't be obtained from any subset of the given set.
Example: if S = {1, 1, 3, 7}, we can get 0 as (S' = {}), 1 as (S' = {1}), 2 as (S' = {1, 1}), 3 as (S' = {3}), 4 as (S' = {1, 3}), 5 as (S' = {1, 1, 3}), but we can't get 6.
Now we are given an array A consisting of N positive integers. There are M queries, each consisting of two integers Li and Ri that describe the i-th query: we need to find this sum for the elements {A[Li], A[Li+1], ..., A[Ri-1], A[Ri]}.
I know how to find it by brute force in O(2^n), but given 1 ≤ N, M ≤ 100,000, this can't be done.
So is there any efficient approach?
Concept
Suppose we had an array of bools, numbers, recording which numbers have been found so far (by way of summing).
For each number n we encounter in the ordered (increasing values) subset of S, we do the following:
For each existing True value at position i in numbers, we set numbers[i + n] to True
We set numbers[n] to True
With this sort of a sieve, we would mark all the found numbers as True, and iterating through the array when the algorithm finishes would find us the minimum unobtainable sum.
Refinement
Obviously, we can't have a solution like this because the array would have to be infinite in order to work for all sets of numbers.
The concept can be improved by making a few observations. With an input of 1, 1, 3, the array becomes, in sequence (listing the positions that are True):
after 1: 1
after 1: 1 2
after 3: 1 2 3 4 5
An important observation can be made:
(3) For each next number, if the previous numbers had already been found it will be added to all those numbers. This implies that if there were no gaps before a number, there will be no gaps after that number has been processed.
For the next input of 7 we can assert that:
(4) Since the input set is ordered, no remaining number is less than 7
(5) If no remaining number is less than 7, then 6 cannot be obtained
We can come to a conclusion that:
(6) the first gap represents the minimum unobtainable number.
Algorithm
Because of (3) and (6), we don't actually need the numbers array; we only need a single value, max, representing the maximum number found so far (every number from 1 to max has been found).
This way, if the next number n is greater than max + 1, then a gap would have been made, and max + 1 is the minimum unobtainable number.
Otherwise, max becomes max + n. If we've run through the entire S, the result is max + 1.
Actual code (C#, easily converted to C):
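static int Calculate(int[] S)
{
    // S must be sorted in ascending order
    int max = 0;  // every sum in 0..max is obtainable
    for (int i = 0; i < S.Length; i++)
    {
        if (S[i] <= max + 1)
            max = max + S[i];
        else
            return max + 1;  // gap: max + 1 cannot be formed
    }
    return max + 1;
}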
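The same greedy in Python, wrapped to answer the (Li, Ri) queries from the question by sorting each slice, as the answer does; `smallest_unobtainable` and `answer_queries` are hypothetical names, and queries are assumed to be 1-based and inclusive:

```python
def smallest_unobtainable(values) -> int:
    """Smallest non-negative sum that no sub-multiset of positive `values` can make."""
    reach = 0  # every sum in 0..reach is obtainable (`max` in the C# code)
    for v in sorted(values):
        if v > reach + 1:
            break  # reach + 1 is a gap
        reach += v
    return reach + 1


def answer_queries(a, queries):
    """queries are 1-based inclusive (L, R) pairs, as in the question."""
    return [smallest_unobtainable(a[l - 1:r]) for l, r in queries]


print(answer_queries([1, 1, 3, 7], [(1, 4), (3, 4), (1, 2)]))  # [6, 1, 3]
```

Each query costs O(len log len) for the sort, so the worst case is still O(M·N log N), which is why the answer adds the check for a 1.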
This should run pretty fast, since it's obviously linear time (O(n)). Since the input to the function must be sorted, with quicksort this becomes O(n log n). I've managed to get results for M = N = 100000 on 8 cores in just under 5 minutes.
With an upper limit of 10^9 on the numbers, a radix sort could be used to get roughly O(n) sorting time; however, this would still be way over 2 seconds because of the sheer number of sorts required.
But we can use the low probability of a randomly generated 1 to eliminate subsets before sorting. At the start, check whether 1 exists in S; if not, every query's result is 1, because 1 cannot be obtained.
Statistically, if we draw 10^5 random numbers from 10^9 possible values, there is about a 99.99% chance of not getting a single 1.
Before each sort, check whether that subset contains 1; if not, its result is 1.
With this modification, the code runs in 2 milliseconds on my machine. Here's that code: http://pastebin.com/rF6VddTx
This is a variation of the subset-sum problem, which is NP-complete, but there is a pseudo-polynomial dynamic programming solution you can adopt here, based on the recursive formula:
f(S,i) = f(S-arr[i],i-1) OR f(S,i-1)
f(0,i) = true (for every i, including i = -1)
f(S,i) = false if S < 0
f(S,-1) = false if S > 0
The recursive formula is basically an exhaustive search: each sum can be achieved either with element i or without element i.
The dynamic programming version builds a (SUM+1) x (n+1) table (where SUM is the sum of all elements and n is the number of elements) bottom-up.
Something like:
table <- (SUM+1) x (n+1) table    // table[i][j]: can sum i be made from arr[1..j]?
//init:
for each j from 0 to n:
    table[0][j] = true
for each i from 1 to SUM:
    table[i][0] = false
//fill the table:
for each i from 1 to SUM:
    for each j from 1 to n:
        if i < arr[j]:
            table[i][j] = table[i][j-1]
        else:
            table[i][j] = table[i-arr[j]][j-1] OR table[i][j-1]
Once you have the table, the answer is the smallest i such that table[i][n] = false (equivalently, table[i][j] = false for all j), or SUM+1 if every sum up to SUM is reachable.
The complexity of this solution is O(n*SUM), where SUM is the sum of all elements, but note that the algorithm can stop as soon as the required number is found, without computing the remaining rows, which aren't needed for the solution.
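A compact Python variant of the same table, assuming we only need its last column: a Python int serves as a bitset in which bit s means "sum s is reachable", and shifting by a value adds that element to every reachable sum. This swaps the 2-D table for shift-and-OR; `smallest_unobtainable_sum` is a hypothetical name:

```python
def smallest_unobtainable_sum(arr) -> int:
    """Subset-sum reachability as a bitset, then the first unset bit."""
    reachable = 1  # bit 0 set: the empty subset gives sum 0
    for a in arr:
        reachable |= reachable << a  # each old sum s also yields s + a
    s = 0
    while (reachable >> s) & 1:
        s += 1
    return s


print(smallest_unobtainable_sum([1, 1, 3, 7]))  # 6
```

It still costs O(n*SUM) bit operations, but Python processes many bits per machine word.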

Given an array of integers where some numbers repeat 1 time or 2 times but one number repeats 3 times, how do you find it?

Given an array of integers where some numbers appear once, some appear twice, and exactly one number appears 3 times, how do you find the number that appears 3 times? Using a hash was not allowed, and the algorithm had to run in O(n).
I assume the array is not sorted, or, similarly, that repeats of a number don't appear in one contiguous run. Otherwise the problem is trivial: just scan the array once with a window of size 3, and if every number in that window is the same, that's the number that repeats 3 times in one contiguous run.
If the repeats are scattered, then the problem becomes more interesting.
Since this is homework, I will only give you a hint.
This problem is a cousin of the one where you're given an unsorted array of integers in which all numbers appear an even number of times except one, which appears an odd number of times.
That number can be found quite easily in O(N) by performing an exclusive-or of all the numbers in the array; the result is the number that appears an odd number of times.
The reason why this works is that x xor x = 0.
So for example, 3 xor 4 xor 7 xor 0 xor 4 xor 0 xor 3 = 7.
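For the cousin problem just described, a one-line Python sketch using the standard `functools.reduce` and `operator.xor` (`odd_one_out` is a hypothetical name); on its own it does not solve the three-times question:

```python
from functools import reduce
from operator import xor


def odd_one_out(values) -> int:
    """XOR of all values: pairs cancel (x ^ x == 0), leaving the odd-count value."""
    return reduce(xor, values, 0)


print(odd_one_out([3, 4, 7, 0, 4, 0, 3]))  # 7
```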
Use radix sort (which takes O(n·w) time for w-bit integers, i.e. linear in n for a fixed word size), then scan the sorted array for the run of length 3.
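A Python sketch of this approach: an LSD radix sort over 8-bit digits, assuming non-negative integers, followed by a scan for a run of exactly three. `radix_sort` and `find_triple_sorted_scan` are hypothetical names:

```python
def radix_sort(values, digit_bits: int = 8) -> list[int]:
    """LSD radix sort for non-negative integers (stable bucket pass per digit)."""
    out = list(values)
    if not out:
        return out
    mask = (1 << digit_bits) - 1
    max_v = max(out)
    shift = 0
    while (max_v >> shift) > 0:
        buckets = [[] for _ in range(1 << digit_bits)]
        for v in out:
            buckets[(v >> shift) & mask].append(v)
        out = [v for bucket in buckets for v in bucket]
        shift += digit_bits
    return out


def find_triple_sorted_scan(values):
    """Return the value whose run in sorted order has length exactly 3, else None."""
    s = radix_sort(values)
    i = 0
    while i < len(s):
        j = i
        while j < len(s) and s[j] == s[i]:
            j += 1
        if j - i == 3:
            return s[i]
        i = j
    return None


print(find_triple_sorted_scan([4, 2, 9, 2, 4, 4, 7]))  # 4
```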
Here's an answer that assumes max(A) is reasonably small, where A is the input array:
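#include <stdlib.h>

#define ERROR_NO_ANSWER (-1) /* sentinel; assumes the inputs are non-negative */

/* Returns how many times a[i] has occurred in a[0..i]. b holds 3 slots per value,
   each the index of an earlier occurrence; slots may hold garbage, so each one is
   validated: it must lie in [0, i), increase slot by slot, and point at the same value. */
int ValidCount(const int *a, int *b, int i) {
    int num = a[i];
    int ret = 0;
    while (ret < 3) {
        int p = b[3*num + ret];
        if (p < 0 || p >= i || a[p] != num || (ret > 0 && p <= b[3*num + ret - 1]))
            break;
        ret++;
    }
    if (ret < 3) b[3*num + ret] = i;
    return ret + 1;
}

int threerep(const int *A, int aSize) {
    int maxA = 0;
    for (int i = 0; i < aSize; i++)
        if (A[i] > maxA) maxA = A[i];
    int *B = malloc(sizeof(int) * 3 * ((size_t)maxA + 1)); /* Problematic if max(A) is large */
    if (B == NULL) return ERROR_NO_ANSWER;
    /* Note that we don't rely on B being initialized before use */
    for (int i = 0; i < aSize; i++) {
        if (ValidCount(A, B, i) == 3) { free(B); return A[i]; }
    }
    free(B);
    return ERROR_NO_ANSWER;
}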
Essentially, the problem is to compute the mode of the array. This solution works ONLY if the values lie in the range [0, n-1]. I'm putting it here since the problem doesn't rule out such a range.
Assume that n is the size of the array.
Scan the array and mark A[A[i] % n] = A[A[i] % n] + n -----> 1st pass (the % n recovers the original value of an element that was already incremented)
Divide each array element by n, i.e. A[i] = A[i] / n (integer division) ----> 2nd pass; now A[i] is the number of occurrences of i
The index holding the maximum value after the 2nd pass is the answer.
This is O(n) time with O(1) extra space (but with a range restriction).
I am not aware of any algorithm that computes the mode in O(n) time and O(1) space without a restriction on the range.
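A Python sketch of the two passes, assuming every value is in 0..n-1; it works on a copy so the caller's list survives, and `three_times_in_range` is a hypothetical name:

```python
def three_times_in_range(values) -> int:
    """Index-marking mode finder; requires 0 <= v < len(values) for every v."""
    n = len(values)
    a = list(values)  # work on a copy
    for i in range(n):
        a[a[i] % n] += n  # % n undoes increments already applied to a[i]
    counts = [x // n for x in a]  # counts[v] = occurrences of v
    return max(range(n), key=counts.__getitem__)  # the index is the answer


print(three_times_in_range([1, 3, 1, 2, 1, 3]))  # 1
```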
All I can think of is this, though I'm sure your professor is looking for a tricky equation that solves it in 1 scan. You can do it in 2 scans, which is O(n), assuming you can create a 2nd array indexed from 0 to the maximum number in the 1st array. Scan once to find the maximum number in the array. Create the 2nd array of that size plus one. Iterate over the 1st array again, using the 2nd array as buckets to count each element. Once a bucket reaches 3, that's your solution. Not the best, but it works in some cases.
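The two scans as a short Python sketch, assuming non-negative integers and enough memory for max(values) + 1 counters; `find_triple_buckets` is a hypothetical name:

```python
def find_triple_buckets(values):
    """Scan 1: find the max. Scan 2: count into buckets until one reaches 3."""
    if not values:
        return None
    counts = [0] * (max(values) + 1)
    for v in values:
        counts[v] += 1
        if counts[v] == 3:
            return v
    return None


print(find_triple_buckets([4, 2, 9, 2, 4, 4, 7]))  # 4
```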
I don't see what all the fuss is about.
Using Python 2.6, a simple function goes over the list, counts occurrences, and returns a number as soon as it has occurred 3 times. (Note that defaultdict is a hash table, which the question rules out.)
from collections import defaultdict

def find3(l):
    d = defaultdict(int)
    for n in l:
        d[n] += 1
        if d[n] == 3:
            return n
    return None  # no number occurred 3 times

print(find3([1,1,1,2,3,4,5,6,7]))          # 1
print(find3([1,1,2,3,4,5,6,7,5]))          # None
print(find3([1,1,2,3,4,5,6,7,5,4,5,5]))    # 5
Bugaoo's algorithm (the mode answer above, which works only for values in [0, n-1]) looks neat. We can generalize it by doing an extra pass before the "1st pass" to find min(A) and max(A), and another extra pass to shift each element into a range starting at 0, i.e. A[i] - min(A). After the "1st pass" and "2nd pass" (note that we should mod the elements by max(A)-min(A) instead of n), we add min(A) back to the duplicated number found at the end. This still requires max(A) - min(A) < n, so that every shifted value is a valid index.
If you know the min and max of the integer sequence and min >= 0, create an array covering [min, max] filled with zeros.
Scan the given array and, whenever i occurs, increment the i-th position by one. Afterwards the second array is a frequency table, where the array position identifies the integer.
int count[2^32];
for x in input:
count[x] = 0; // delete this loop if you can assume ram is cleared to 0.
for x in input:
count[x]++;
for x in input:
if count[x] == 3:
return x
Please excuse the mix of languages :-) Also, it's admittedly silly to have an array that can be indexed with any 32-bit integer, but you can do it on a 64-bit system, and it does meet the requirements.
This algorithm looks pretty good, but I don't know how to implement it; I only have pseudocode. If anyone is good at C programming, please try your hand at the code and post it.
Pseudocode:
Take two bitset arrays of size n. Together they count up to three occurrences of each value: if array1[i] = 1 and array2[i] = 1, then i has occurred three times. (The arrays are indexed by value, so this assumes every value is in 0..n-1.)
for each value i in the input
    if(array2[i] == 1)
        array2[i] = 0, array1[i] = 1;
    else
        array2[i] = 1;
for each index k of the arrays
    if (array1[k] && array2[k])
        return k;
Complexity: O(n) time and 2n bits of space.
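A Python sketch of the two-bitset counter, assuming every value is in 0..n-1. For readability it uses one byte per flag (`bytearray`) instead of packed bits; `find_triple_two_bitsets` is a hypothetical name:

```python
def find_triple_two_bitsets(values, n: int):
    """array1/array2 form a 2-bit counter per value: 01 -> 10 -> 11 = three times."""
    array1 = bytearray(n)  # high bit of each counter
    array2 = bytearray(n)  # low bit of each counter
    for i in values:
        if array2[i] == 1:
            array2[i] = 0
            array1[i] = 1
        else:
            array2[i] = 1
    for k in range(n):
        if array1[k] and array2[k]:
            return k
    return None


print(find_triple_two_bitsets([2, 3, 2, 1, 3, 3], 4))  # 3
```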
I'll present a solution that works in general, where one number occurs m times and all the others occur n times.
We need an operator that cancels out n occurrences of an integer but keeps m occurrences. If we convert each number to its binary representation and, for each bit position, count how many times that bit is set, the count is a multiple of n from the numbers that occur n times, plus 0 or m from the corresponding bit of the lone integer.
If we then take each count modulo n and divide by m (this works when m < n), the result is the value of the corresponding bit of the lone integer. All that's left is to convert the binary result back to a decimal number.
For example, given array = [3, -4, 3, 3], m = 1 and n = 3:
Binary of 3 = 011
Binary of -4 (2s complement) = 11111111111111111111111111111100
Bit 0: 3 % 3 = 0
Bit 1: 3 % 3 = 0
Bits 2 through 31: 1 % 3 = 1
The result is -4
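A Python sketch of the bit-counting method, assuming 32-bit two's-complement integers. It tests `count % n != 0`, which matches "mod n, then divide by m" whenever m < n and also works for any m that is not a multiple of n; `find_lone` is a hypothetical name:

```python
def find_lone(values, m: int, n: int, bits: int = 32) -> int:
    """The value occurring m times when every other value occurs n times."""
    result = 0
    for b in range(bits):
        # Python's >> on negative ints is arithmetic, so this reads two's-complement bits
        count = sum((v >> b) & 1 for v in values)
        if count % n:  # nonzero only where the lone value has this bit set
            result |= 1 << b
    if result >= 1 << (bits - 1):  # reinterpret the top bit as the sign
        result -= 1 << bits
    return result


print(find_lone([3, -4, 3, 3], m=1, n=3))  # -4
```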
