Permutation for mostly balanced bipartitions - algorithm

I'm working on a tree search algorithm where I use bipartitions of elements represented via a bitset, i.e. the bitset 1000101 represents the bipartition {0,2,6} {1,3,4,5}.
At the moment, I iterate through all bipartitions simply by incrementing a bitset, i.e. to iterate through all bipartitions of the set {0,1,2,3}, I go from 0001 (inclusive) to 1000 (exclusive).
Since my algorithm sometimes allows me to 'fail fast' when I have found a suitable bipartition, I want to reorder them such that I look at more balanced bipartitions first.
Thus, I wanted to ask if someone knows of a permutation of the numbers from 1 to 2^k where min(#set bits, #unset bits) more or less only decreases, which can still be computed efficiently.
Since this is a heuristic, I'm not looking for exact results, only a way to speed up my algorithm a bit.

Rory's comment took me in the right direction:
If we start with a fixed number of ones in the bitset, we can simply iterate over all of them using some bit-twiddling hacks.
Start from 0...01...1 first with k/2 ones, then with k/2 - 1, k/2 - 2 and so on.
For each starting value, iterate over all possible permutations of the bitset using Gosper's Hack until we reach the boundary of our bitset.
A simple implementation might look like this (for k <= 63):
// 64-bit values (uint64_t from <stdint.h>) are needed once k exceeds 31
for (int i = k / 2; i > 0; --i) {
    // start with 0 ... 0 1 ... 1 (i ones)
    uint64_t v = (1ULL << i) - 1;
    // first bitset that doesn't represent a valid bipartition
    uint64_t end = 1ULL << k;
    // without this, we would count some bipartitions twice for k even
    if (k % 2 == 0 && i == k / 2) end >>= 1;
    while (v < end) {
        // do something with v...
        // iterate to the lexicographically next permutation
        uint64_t t = v | (v - 1);
        v = (t + 1) | (((~t & -~t) - 1) >> (__builtin_ctzll(v) + 1));
    }
}
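For comparison, here is a minimal Python sketch of the same enumeration order. The function name balanced_first_masks is made up for illustration, and the successor step uses the division form of Gosper's hack instead of the ctz form above, since Python integers have no fixed width.

```python
def balanced_first_masks(k):
    """Yield each bipartition of {0..k-1} once as a bitmask, most balanced first."""
    for ones in range(k // 2, 0, -1):
        v = (1 << ones) - 1            # smallest mask with `ones` bits set
        end = 1 << k                   # first value outside the k-bit range
        if k % 2 == 0 and ones == k // 2:
            end >>= 1                  # even split: keep the top bit 0 to skip complements
        while v < end:
            yield v
            # Gosper's hack: next larger integer with the same number of set bits
            c = v & -v                 # lowest set bit
            r = v + c                  # ripple the lowest run of ones upward
            v = (((r ^ v) >> 2) // c) | r


if __name__ == "__main__":
    for mask in balanced_first_masks(4):
        print(format(mask, "04b"))
```

For k = 4 this visits the three 2/2 splits first and the four 1/3 splits afterwards, so min(#set bits, #unset bits) never increases along the way.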

Related

Find minimum sum that cannot be formed

Given positive integers from 1 to N, where N can go up to 10^9. Some K of these integers are missing; K can be at most 10^5. I need to find, in an efficient way, the minimum sum that can't be formed from the remaining N-K elements.
Example: say we have N=5, which means we have {1,2,3,4,5}, and let K=2 with missing elements {3,5}. The remaining array is then {1,2,4}, and the minimum sum that can't be formed from these remaining elements is 8, because:
1=1
2=2
3=1+2
4=4
5=1+4
6=2+4
7=1+2+4
So how to find this un-summable minimum?
I know how to find this if I can store all the remaining elements, using this approach:
We can use something similar to Sieve of Eratosthenes, used to find primes. Same idea, but with different rules for a different purpose.
Store the numbers from 0 to the sum of all the numbers, and cross off 0.
Then take numbers, one at a time, without replacement.
When we take the number Y, then cross off every number that is Y plus some previously-crossed off number.
When we have done this for every number that is remaining, the smallest un-crossed-off number is our answer.
However, its space requirement is high. Can there be a better and faster way to do this?
Here's an O(sort(K))-time algorithm.
Let 1 ≤ x_1 ≤ x_2 ≤ … ≤ x_m be the integers not missing from the set. For all i from 0 to m, let y_i = x_1 + x_2 + … + x_i be the partial sum of the first i terms. If it exists, let j be the least index such that y_j + 1 < x_{j+1}; otherwise, let j = m. It is possible to show via induction that the minimum sum that cannot be made is y_j + 1 (the hypothesis is that, for all i from 0 to j, the numbers x_1, x_2, …, x_i can make all of the sums from 0 to y_i and no others).
To handle the fact that the missing numbers are specified, there is an optimization that handles several consecutive numbers in constant time. I'll leave it as an exercise.
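A minimal Python sketch of the scan described above, assuming the present numbers fit in a list (so without the constant-time handling of consecutive runs). The name min_unformable_sum is hypothetical.

```python
def min_unformable_sum(present):
    """Smallest positive integer that is not a subset sum of `present`."""
    reachable = 0                  # invariant: every sum in 0..reachable can be formed
    for x in sorted(present):
        if x > reachable + 1:      # gap found: reachable + 1 cannot be formed
            break
        reachable += x             # sums 0..reachable + x are now all formable
    return reachable + 1


if __name__ == "__main__":
    print(min_unformable_sum([1, 2, 4]))
```

With the question's example {1, 2, 4} the partial sums are 1, 3, 7 and no gap appears, so the result is 7 + 1 = 8.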
Let X be a bitvector initialized to zero, where bit i-1 (counting from the least significant bit) stands for the sum i. For each number Ni you set X = (X | X << Ni) | (1 << (Ni - 1)) (i.e. you can make Ni and you can increase any value you could make previously by Ni).
This will set a '1' for every value you can make.
Running time is linear in N, and bitvector operations are fast.
process 1: X = 00000001
process 2: X = (00000001 | 00000001 << 2) | (00000010) = 00000111
process 4: X = (00000111 | 00000111 << 4) | (00001000) = 01111111
First number you can't make is 8.
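As a rough sketch of this idea, Python's arbitrary-precision integers can serve as the bitvector. The convention used here is an assumption of the sketch: bit s stands for the sum s, and bit 0 is set up front for the empty sum, so one shift-and-OR per number is enough. The bitvector needs one bit per reachable sum, so this only suits small inputs.

```python
def min_unformable_sum_bits(numbers):
    """Smallest positive sum that cannot be formed, via a subset-sum bitmask."""
    reachable = 1                      # bit 0 set: the empty sum 0 is always formable
    for n in numbers:
        reachable |= reachable << n    # every old sum s also yields s + n
    s = 1
    while (reachable >> s) & 1:        # walk up to the first unset bit
        s += 1
    return s


if __name__ == "__main__":
    print(min_unformable_sum_bits([1, 2, 4]))
```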
Here is my O(K lg K) approach. I didn't test it very much because of lazy-overflow, sorry about that. If it works for you, I can explain the idea:
#include <algorithm>
#include <cstdlib>
#include <iostream>
using namespace std;
const int MAXK = 100003;
int n, k;
int a[MAXK];
long long sum(long long a, long long b) { // sum of elements from a to b
return max(0ll, b * (b + 1) / 2 - a * (a - 1) / 2);
}
void answer(long long ans) {
cout << ans << endl;
exit(0);
}
int main()
{
cin >> n >> k;
for (int i = 1; i <= k; ++i) {
cin >> a[i];
}
a[0] = 0;
a[k+1] = n+1;
sort(a, a+k+2);
long long ans = 0;
for (int i = 1; i <= k+1; ++i) {
// interval of existing numbers [lo, hi]
int lo = a[i-1] + 1;
int hi = a[i] - 1;
if (lo <= hi && lo > ans + 1)
break;
ans += sum(lo, hi);
}
answer(ans + 1);
}
EDIT: well, thank God @DavidEisenstat wrote the description of the approach I used in his answer, so I don't have to write it. Basically, what he mentions as an exercise is not adding the "existing numbers" one by one, but all at the same time. Before this, you just need to check whether one of them breaks the invariant, which can be done using binary search. Hope it helped.
EDIT2: as @DavidEisenstat pointed out in the comments, the binary search is not needed, since only the first number in every interval of existing numbers can break the invariant. Modified the code accordingly.

Counting bounded slice codility

I recently took a programming test on Codility, and the question was to find the number of bounded slices in an array.
Here is a brief explanation of the question.
A slice of an array is said to be a bounded slice if Max(SliceArray)-Min(SliceArray)<=K.
If the array is [3,5,6,7,3] and K=2, the number of bounded slices is 9:
first slice (0,0) in the array: Min(0,0)=3, Max(0,0)=3, Max-Min<=K gives 0<=2, so it is a bounded slice
second slice (0,1) in the array: Min(0,1)=3, Max(0,1)=5, Max-Min<=K gives 2<=2, so it is a bounded slice
third slice (0,2) in the array: Min(0,2)=3, Max(0,2)=6, Max-Min<=K gives 3<=2, so it is not a bounded slice
In this way you can find that there are nine bounded slices:
(0, 0), (0, 1), (1, 1), (1, 2), (1, 3), (2, 2), (2, 3), (3, 3), (4, 4).
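To make the definition concrete, here is a brute-force reference in Python. It is quadratic, so it is only useful for checking small cases, and the function name is made up here.

```python
def count_bounded_slices_bruteforce(A, K):
    """Count pairs (p, q), p <= q, with max(A[p..q]) - min(A[p..q]) <= K."""
    count = 0
    for p in range(len(A)):
        lo = hi = A[p]
        for q in range(p, len(A)):
            lo = min(lo, A[q])
            hi = max(hi, A[q])
            if hi - lo > K:
                break              # extending the slice can only widen the range
            count += 1
    return count


if __name__ == "__main__":
    print(count_bounded_slices_bruteforce([3, 5, 6, 7, 3], 2))
```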
Following is the solution I provided:
private int FindBoundSlice(int K, int[] A)
{
int BoundSlice=0;
Stack<int> MinStack = new Stack<int>();
Stack<int> MaxStack = new Stack<int>();
for (int p = 0; p < A.Length; p++)
{
MinStack.Push(A[p]);
MaxStack.Push(A[p]);
for (int q = p; q < A.Length; q++)
{
if (IsPairBoundedSlice(K, A[p], A[q], MinStack, MaxStack))
BoundSlice++;
else
break;
}
}
return BoundSlice;
}
private bool IsPairBoundedSlice(int K, int P, int Q,Stack<int> Min,Stack<int> Max)
{
if (Min.Peek() > P)
{
Min.Pop();
Min.Push(P);
}
if (Min.Peek() > Q)
{
Min.Pop();
Min.Push(Q);
}
if (Max.Peek() < P)
{
Max.Pop();
Max.Push(P);
}
if (Max.Peek() < Q)
{
Max.Pop();
Max.Push(Q);
}
if (Max.Peek() - Min.Peek() <= K)
return true;
else
return false;
}
But according to the Codility review, the above solution runs in O(N^2). Can anybody help me find a solution that runs in O(N)?
Maximum Time Complexity allowed O(N).
Maximum Space Complexity allowed O(N).
Disclaimer
It is possible and I demonstrate it here to write an algorithm that solves the problem you described in linear time in the worst case, visiting each element of the input sequence at a maximum of two times.
This answer is an attempt to deduce and describe the only algorithm I could find and then gives a quick tour through an implementation written in Clojure. I will probably write a Java implementation as well and update this answer, but as of now that task is left as an exercise to the reader.
EDIT: I have now added a working Java implementation. Please scroll down to the end.
EDIT: Noticed that PeterDeRivaz provided a sequence ([0 1 2 3 4], k=2) that makes the algorithm visit certain elements three times and probably falsifies it. I will update the answer at a later time regarding that issue.
Unless I have overlooked something trivial, I can hardly imagine significant further simplification. Feedback is highly welcome.
(I found your question here when googling for codility like exercises as a preparation for a job test there myself. I set myself aside half an hour to solve it and didn't come up with a solution, so I was unhappy and spent some dedicated hammock time - now that I have taken the test I must say found the presented exercises significantly less difficult than this problem).
Observations
For any valid bounded slice of size n we can say that it is divisible into the triangular number of n, i.e. n*(n+1)/2, bounded sub-slices with their individual bounds lying within the slice's bounds (including itself).
Ex. 1: [3 1 2] is a bounded slice for k=2, has a size of 3 and thus can be divided into (3*4)/2=6 sub-slices:
[3 1 2] ;; slice 1
[3 1] [1 2] ;; slices 2-3
[3] [1] [2] ;; slices 4-6
Naturally, all those slices are bounded slices for k.
When you have two overlapping slices that are both bounded slices for k but differ in their bounds, the amount of possible bounded sub-slices in the array can be calculated as the sum of the triangular numbers of those slices minus the triangular number of the count of elements they share.
Ex. 2: The bounded slices [3 2 1] and [2 1 0] for k=2 differ in bounds and overlap in the array [3 2 1 0]. They share the bounded slice [2 1] (notice that overlapping bounded slices always share a bounded slice, otherwise they could not overlap). For both slices the triangular number is 6, the triangular number of the shared slice is (2*3)/2=3. Thus the array can be divided into 6+6-3=9 slices:
[3 2 1] [2 1 0] ;; 1-2 the overlapping slices
[3 2] [2 1] [1 0] ;; 3-5 two slices and the overlapping slice
[3] [2] [1] [0] ;; 6-9 single-element slices
As observable, the triangle of the overlapping bounded slice is part of both triangles element count, so that is why it must be subtracted from the added triangles as it otherwise would be counted twice. Again, all counted slices are bounded slices for k=2.
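The counting rule can be checked numerically. This is only a sketch: the array [3, 2, 1, 0] with k=2 and the helper names are chosen here for illustration, with two overlapping bounded slices of length 3 that share 2 elements.

```python
def triangular(n):
    """Number of non-empty contiguous sub-slices of a slice with n elements."""
    return n * (n + 1) // 2


def count_bounded(a, k):
    """Brute-force count of slices whose max - min <= k."""
    return sum(
        1
        for p in range(len(a))
        for q in range(p, len(a))
        if max(a[p:q + 1]) - min(a[p:q + 1]) <= k
    )


a, k = [3, 2, 1, 0], 2
# [3 2 1] and [2 1 0] are the two largest bounded slices; they share [2 1]
by_formula = triangular(3) + triangular(3) - triangular(2)

if __name__ == "__main__":
    print(by_formula, count_bounded(a, k))
```

Both numbers come out as 9 for this input: the shared slice's sub-slices are counted once in each triangle, so one copy is subtracted.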
Approach
The approach is to find the largest possible bounded slices within the input sequence until all elements have been visited, then to sum them up using the technique described above.
A slice qualifies as one of the largest possible bounded slices (in the following text often referred as one largest possible bounded slice which shall then not mean the largest one, only one of them) if the following conditions are fulfilled:
It is bounded
It may share elements with two other slices to its left and right
It can not grow to the left or to the right without becoming unbounded - meaning: If it is possible, it has to contain so many elements that its maximum-minimum=k
By implication a bounded slice does not qualify as one of the largest possible bounded slices if there is a bounded slice with more elements that entirely encloses this slice
As a goal our algorithm must be capable to start at any element in the array and determine one largest possible bounded slice that contains that element and is the only one to contain it. It is then guaranteed that the next slice constructed from a starting point outside of it will not share the starting element of the previous slice because otherwise it would be one largest possible bounded slice with the previously found slice together (which now, by definition, is impossible). Once that algorithm has been found it can be applied sequentially from the beginning building such largest possible slices until no more elements are left. This would guarantee that each element is traversed two times in the worst case.
Algorithm
Start at the first element and find the largest possible bounded slice that includes said first element. Add the triangular number of its size to the counter.
Continue exactly one element after found slice and repeat. Subtract the triangular number of the count of elements shared with the previous slice (found searching backwards), add the triangular number of its total size (found searching forwards and backwards) until the sequence has been traversed. Repeat until no more elements can be found after a found slice, return the result.
Ex. 3: For the input sequence [4 3 1 2 0] with k=2 find the count of bounded slices.
Start at the first element, find the largest possible bounded slice:
[4 3], count=2, overlap=0, result=3
Continue after that slice, find the largest possible bounded slice:
[3 1 2], size=3, overlap=1, result=3-1+6=8
...
[1 2 0], size=3, overlap=2, result=8-3+6=11
result=11
Process behavior
In the worst case the process grows linearly in time and space. As proven above, elements are traversed two times at max. and per search for a largest possible bounded slice some locals need to be stored.
However, the process becomes dramatically faster as the array contains less largest possible bounded slices. For example, the array [4 4 4 4] with k>=0 has only one largest possible bounded slice (the array itself). The array will be traversed once and the triangular number of the count of its elements is returned as the correct result. Notice how this is complementary to solutions of worst case growth O((n * (n+1)) / 2). While they reach their worst case with only one largest possible bounded slice, for this algorithm such input would mean the best case (one visit per element in one pass from start to end).
Implementation
The most difficult part of the implementation is to find a largest bounded slice from one element scanning in two directions. When we search in one direction, we track the minimum and maximum bounds of our search and see how they compare to k. Once an element has been found that stretches the bounds so that maximum-minimum <= k does not hold anymore, we are done in that direction. Then we search into the other direction but use the last valid bounds of the backwards scan as starting bounds.
Ex.4: We start in the array [4 3 1 2 0] at the third element (1) after we have successfully found the largest bounded slice [4 3]. At this point we only know that our starting value 1 is the minimum, the maximum (of the searched largest bounded slice) or between those two. We scan backwards (exclusive) and stop after the second element (as 4 - 1 > k=2). The last valid bounds were 1 and 3. When we now scan forwards, we use the same algorithm but use 1 and 3 as bounds. Notice that even though in this example our starting element is one of the bounds, that is not always the case: Consider the same scenario with a 2 instead of the 3: Neither that 2 or the 1 would be determined to be a bound as we could find a 0 but also a 3 while scanning forwards - only then it could be decided which of 2 or 3 is a lower or upper bound.
To solve that problem here is a special counting algorithm. Don't worry if you don't understand Clojure yet, it does just what it says.
(defn scan-while-around
"Count numbers in `coll` until a number doesn't pass an (inclusive)
interval filter where said interval is guaranteed to contain
`around` and grows with each number to a maximum size of `size`.
Return count and the lower and upper bounds (inclusive) that were not
passed as [count lower upper]."
([around size coll]
(scan-while-around around around size coll))
([lower upper size coll]
(letfn [(step [[count lower upper :as result] elem]
(let [lower (min lower elem)
upper (max upper elem)]
(if (<= (- upper lower) size)
[(inc count) lower upper]
(reduced result))))]
(reduce step [0 lower upper] coll))))
Using this function we can search backwards, from before the starting element passing it our starting element as around and using k as the size.
Then we start a forward scan from the starting element with the same function, by passing it the previously returned bounds lower and upper.
We add their returned counts to the total count of the found largest possible slice and use the count of the backwards scan as the length of the overlap and subtract its triangular number.
Notice that in any case the forward scan is guaranteed to return a count of at least one. This is important for the algorithm for two reasons:
We use the resulting count of the forward scan to determine the starting point of the next search (and would loop infinitely with it being 0)
The algorithm would not be correct as for any starting element the smallest possible largest possible bounded slice always exists as an array of size 1 containing the starting element.
Assuming that triangular is a function returning the triangular number, here is the final algorithm:
(defn bounded-slice-linear
"Linear implementation"
[s k]
(loop [start-index 0
acc 0]
(if (< start-index (count s))
(let [start-elem (nth s start-index)
[backw lower upper] (scan-while-around start-elem
k
(rseq (subvec s 0
start-index)))
[forw _ _] (scan-while-around lower upper k
(subvec s start-index))]
(recur (+ start-index forw)
(-> acc
(+ (triangular (+ forw
backw)))
(- (triangular backw)))))
acc)))
(Notice that the creation of subvectors and their reverse sequences happens in constant time and that the resulting vectors share structure with the input vector so no "rest-size" depending allocation is happening (although it may look like it). This is one of the beautiful aspects of Clojure, that you can avoid tons of index-fiddling and usually work with elements directly.)
Here is a triangular implementation for comparison:
(defn bounded-slice-triangular
"O(n*(n+1)/2) implementation for testing."
[s k]
(reduce (fn [c [elem :as elems]]
(+ c (first (scan-while-around elem k elems))))
0
(take-while seq
(iterate #(subvec % 1) s))))
Both functions only accept vectors as input.
I have extensively tested their behavior for correctness using various strategies. Please try to prove them wrong anyway. Here is a link to a full file to hack on: https://www.refheap.com/32229
Here is the algorithm implemented in Java (not tested as extensively but seems to work, Java is not my first language. I'd be happy about feedback to learn)
public class BoundedSlices {
private static int triangular (int i) {
return ((i * (i+1)) / 2);
}
public static int solve (int[] a, int k) {
int i = 0;
int result = 0;
while (i < a.length) {
int lower = a[i];
int upper = a[i];
int countBackw = 0;
int countForw = 0;
for (int j = (i-1); j >= 0; --j) {
if (a[j] < lower) {
if (upper - a[j] > k)
break;
else
lower = a[j];
}
else if (a[j] > upper) {
if (a[j] - lower > k)
break;
else
upper = a[j];
}
countBackw++;
}
for (int j = i; j <a.length; j++) {
if (a[j] < lower) {
if (upper - a[j] > k)
break;
else
lower = a[j];
}
else if (a[j] > upper) {
if (a[j] - lower > k)
break;
else
upper = a[j];
}
countForw++;
}
result -= triangular(countBackw);
result += triangular(countForw + countBackw);
i+= countForw;
}
return result;
}
}
Codility has now released its golden solution, which runs in O(N) time and space:
https://codility.com/media/train/solution-count-bounded-slices.pdf
If you are still confused after reading the PDF, like me, here is a very nice explanation.
The solution from the pdf:
# maxINT is not defined in this excerpt; presumably it is the cap on the
# result given in the task statement
def boundedSlicesGolden(K, A):
    N = len(A)
    maxQ = [0] * (N + 1)
    posmaxQ = [0] * (N + 1)
    minQ = [0] * (N + 1)
    posminQ = [0] * (N + 1)
    firstMax, lastMax = 0, -1
    firstMin, lastMin = 0, -1
    j, result = 0, 0
    for i in range(N):
        while (j < N):
            # added new maximum element
            while (lastMax >= firstMax and maxQ[lastMax] <= A[j]):
                lastMax -= 1
            lastMax += 1
            maxQ[lastMax] = A[j]
            posmaxQ[lastMax] = j
            # added new minimum element
            while (lastMin >= firstMin and minQ[lastMin] >= A[j]):
                lastMin -= 1
            lastMin += 1
            minQ[lastMin] = A[j]
            posminQ[lastMin] = j
            if (maxQ[firstMax] - minQ[firstMin] <= K):
                j += 1
            else:
                break
        result += (j - i)
        if result >= maxINT:
            return maxINT
        if posminQ[firstMin] == i:
            firstMin += 1
        if posmaxQ[firstMax] == i:
            firstMax += 1
    return result
HINTS
Others have explained the basic algorithm which is to keep 2 pointers and advance the start or the end depending on the current difference between maximum and minimum.
It is easy to update the maximum and minimum when moving the end.
However, the main challenge of this problem is how to update when moving the start. Most heap or balanced tree structures will cost O(logn) to update, and will result in an overall O(nlogn) complexity which is too high.
To do this in time O(n):
Advance the end until you exceed the allowed threshold
Then loop backwards from this critical position storing a cumulative value in an array for the minimum and maximum at every location between the current end and the current start
You can now advance the start pointer and immediately lookup from the arrays the updated min/max values
You can carry on using these arrays to update start until start reaches the critical position. At this point return to step 1 and generate a new set of lookup values.
Overall this procedure will work backwards over every element exactly once, and so the total complexity is O(n).
EXAMPLE
For the sequence with K of 4:
4,1,2,3,4,5,6,10,12
Step 1 advances the end until we exceed the bound
start,4,1,2,3,4,5,end,6,10,12
Step 2 works backwards from end to start computing array MAX and MIN.
MAX[i] is maximum of all elements from i to end
Data = start,4,1,2,3,4,5,end,6,10,12
MAX = start,5,5,5,5,5,5,critical point=end -
MIN = start,1,1,2,3,4,5,critical point=end -
Step 3 can now advance start and immediately lookup the smallest values of max and min in the range start to critical point.
These can be combined with the max/min in the range critical point to end to find the overall max/min for the range start to end.
PYTHON CODE
def count_bounded_slices(A,k):
    if len(A)==0:
        return 0
    t=0
    inf = max(abs(a) for a in A)
    left=0
    right=0
    left_lows = [inf]*len(A)
    left_highs = [-inf]*len(A)
    critical = 0
    right_low = inf
    right_high = -inf
    # Loop invariant
    # t counts number of bounded slices A[a:b] with a<left
    # left_lows[i] is defined for values in range(left,critical)
    # and contains the min of A[left:critical]
    # left_highs[i] contains the max of A[left:critical]
    # right_low is the minimum of A[critical:right]
    # right_high is the maximum of A[critical:right]
    while left<len(A):
        # Extend right as far as possible
        while right<len(A) and max(left_highs[left],max(right_high,A[right]))-min(left_lows[left],min(right_low,A[right]))<=k:
            right_low = min(right_low,A[right])
            right_high = max(right_high,A[right])
            right+=1
        # Now we know that any slice starting at left and ending before right will satisfy the constraints
        t += right-left
        # If we are at the critical position we need to extend our left arrays
        if left==critical:
            critical=right
            left_low = inf
            left_high = -inf
            for x in range(critical-1,left,-1):
                left_low = min(left_low,A[x])
                left_high = max(left_high,A[x])
                left_lows[x] = left_low
                left_highs[x] = left_high
            right_low = inf
            right_high = -inf
        left+=1
    return t
A = [3,5,6,7,3]
print(count_bounded_slices(A,2))
Here is my attempt at solving this problem:
- you start with p and q at position 0, min = max = A[0];
- loop until p = q = N-1
- as long as max-min <= k, advance q and increment the number of bounded slices.
- if max-min > k, advance p
- you need to keep track of 2x min/max values because when you advance p, you might remove one or both of the min/max values
- each time you advance p or q, update min/max
I can write the code if you want, but I think the idea is explicit enough...
Hope it helps.
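One possible way to turn these bullets into code, as a sketch and not the poster's own implementation: the "keep track of min/max" step is done here with two monotonic deques (collections.deque), which is a swapped-in technique. The function name is hypothetical and the sketch assumes K >= 0.

```python
from collections import deque


def count_bounded_slices_deque(A, K):
    """Count slices with max - min <= K using a sliding window (assumes K >= 0)."""
    max_q = deque()    # indices whose values are decreasing; front is the window max
    min_q = deque()    # indices whose values are increasing; front is the window min
    count = 0
    p = 0              # left end of the current window
    for q, value in enumerate(A):
        # advancing q: drop candidates that the new value dominates
        while max_q and A[max_q[-1]] <= value:
            max_q.pop()
        max_q.append(q)
        while min_q and A[min_q[-1]] >= value:
            min_q.pop()
        min_q.append(q)
        # advancing p: shrink until the window A[p..q] is bounded again
        while A[max_q[0]] - A[min_q[0]] > K:
            p += 1
            if max_q[0] < p:
                max_q.popleft()
            if min_q[0] < p:
                min_q.popleft()
        count += q - p + 1     # every slice (p', q) with p <= p' <= q is bounded
    return count


if __name__ == "__main__":
    print(count_bounded_slices_deque([3, 5, 6, 7, 3], 2))
```

Each index enters and leaves each deque at most once, so the total work stays linear in the length of A.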
Finally a code that works according to the below mentioned idea. This outputs 9.
(The code is in C++. You can change it for Java)
#include <iostream>
using namespace std;
int main()
{
int A[] = {3,5,6,7,3};
int K = 2;
int i = 0;
int j = 0;
int minValue = A[0];
int maxValue = A[0];
int minIndex = 0;
int maxIndex = 0;
int length = sizeof(A)/sizeof(int);
int count = 0;
bool stop = false;
int prevJ = 0;
while ( (i < length || j < length) && !stop ) {
if ( maxValue - minValue <= K ) {
if ( j < length-1 ) {
j++;
if ( A[j] > maxValue ) {
maxValue = A[j];
maxIndex = j;
}
if ( A[j] < minValue ) {
minValue = A[j];
minIndex = j;
}
} else {
count += j - i + 1;
stop = true;
}
} else {
if ( j > 0 ) {
int range = j - i;
int count1 = range * (range + 1) / 2; // Choose 2 from range with repitition.
int rangeRep = prevJ - i; // We have to subtract already counted ones.
int count2 = rangeRep * (rangeRep + 1) / 2;
count += count1 - count2;
prevJ = j;
}
if ( A[j] == minValue ) {
// first reach the first maxima
while ( A[i] - minValue <= K )
i++;
// then come down to correct level.
while ( A[i] - minValue > K )
i++;
maxValue = A[i];
} else {//if ( A[j] == maxValue ) {
while ( maxValue - A[i] <= K )
i++;
while ( maxValue - A[i] > K )
i++;
minValue = A[i];
}
}
}
cout << count << endl;
return 0;
}
Algorithm (minor tweaking done in code):
Keep two pointers i & j and maintain two values minValue and maxValue..
1. Initialize i = 0, j = 0, and minValue = maxValue = A[0];
2. If maxValue - minValue <= K,
- Increment count.
- Increment j.
- if new A[j] > maxValue, maxValue = A[j].
- if new A[j] < minValue, minValue = A[j].
3. If maxValue - minValue > K, this can only happen iff
- the new A[j] is either maxValue or minValue.
- Hence keep incrementing i until abs(A[j] - A[i]) <= K.
- Then update the minValue and maxValue and proceed accordingly.
4. Goto step 2 if ( i < length-1 || j < length-1 )
I have provided the answer for the same question in different SO Question
(1) For an A[n] input , for sure you will have n slices , So add at first.
for example for {3,5,4,7,6,3} you will have for sure (0,0)(1,1)(2,2)(3,3)(4,4) (5,5).
(2) Then find the P and Q based on min max comparison.
(3) apply the Arithmetic series formula to find the number of combination between (Q-P) as a X . then it would be X ( X+1) /2 But we have considered "n" already so the formula would be (x ( x+1) /2) - x) which is x (x-1) /2 after basic arithmetic.
For example in the above example if P is 0 (3) and Q is 3 (7) we have Q-P is 3 . When apply the formula the value would be 3 (3-1)/2 = 3. Now add the 6 (length) + 3 .Then take care of Q- min or Q - max records.
Then check the Min and Max index .In this case Min as 0 Max as 3 (obivously any one of the would match with currentIndex (which ever used to loop). here we took care of (0,1)(0,2)(1,2) but we have not taken care of (1,3) (2,3) . Rather than start the hole process from index 1 , save this number (position 2,3 = 2) , then start same process from currentindex( assume min and max as A[currentIndex] as we did while starting). finaly multiply the number with preserved . in our case 2 * 2 ( A[7],A[6]) .
It runs in O(N) time with O(N) space.
I came up with a solution in Scala:
package test
import scala.collection.mutable.Queue
object BoundedSlice {
def apply(k:Int, a:Array[Int]):Int = {
var c = 0
var q:Queue[Int] = Queue()
a.map(i => {
if(!q.isEmpty && Math.abs(i-q.last) > k)
q.clear
else
q = q.dropWhile(j => (Math.abs(i-j) > k)).toQueue
q += i
c += q.length
})
c
}
def main(args: Array[String]): Unit = {
val a = Array[Int](3,5,6,7,3)
println(BoundedSlice(2, a))
}
}

Finding the missing number in an array

An array a[] contains all of the integers from 0 to N, except one. However, you cannot access an element with a single operation. Instead, you can call get(i, k) which returns the kth bit of a[i] or you can call swap(i, j) which swaps the ith and jth elements of a[]. Design a O(N) algorithm to find the missing integer.
(For simplicity, assume N is a power of 2.)
If N is a power of 2, it can be done in O(N) using divide and conquer.
Note that there are logN bits in the numbers. Now, using this information - you can use a combination of partition based selection algorithm and radix-sort.
Iterate over the numbers for the first bit, and divide the array into two halves: the first half has this bit as 0, the other half has it as 1. (Use swap() to partition the array.)
Note that one half has ceil(N/2) elements, and the other has floor(N/2) elements.
Repeat the process for the smaller array, until you find the missing number.
The complexity of this approach will be N + N/2 + N/4 + ... + 1 < 2N, so it is O(n)
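A minimal Python sketch of the partition idea. Assumptions to note: get and swap are simulated on a plain list, the partition starts from the least significant bit (the answer does not say which bit is "first"), and ties between the two halves are resolved towards the half whose bit is 0. The name find_missing is hypothetical.

```python
def find_missing(a):
    """a holds the integers 0..N except one; elements are touched only via get/swap."""

    def get(i, k):                 # k-th bit of a[i]
        return (a[i] >> k) & 1

    def swap(i, j):
        a[i], a[j] = a[j], a[i]

    lo, hi = 0, len(a)             # remaining candidates live in a[lo:hi]
    missing, k = 0, 0
    while lo < hi:
        # partition a[lo:hi] so that elements with bit k == 0 come first
        mid = lo
        for i in range(lo, hi):
            if get(i, k) == 0:
                swap(i, mid)
                mid += 1
        zeros, ones = mid - lo, hi - mid
        if zeros <= ones:          # a 0-bit number is missing: keep the zero half
            hi = mid
        else:                      # a 1-bit number is missing: keep the one half
            missing |= 1 << k
            lo = mid
        k += 1
    return missing


if __name__ == "__main__":
    print(find_missing([0, 1, 2, 3, 4, 5, 7, 8]))
```

The half that is kept never holds more than half of the current range, which is where the N + N/2 + N/4 + ... bound comes from.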
O(N*M), where M is the number of bits:
N is a power of 2, only one number is missing, so if you check each bit, and count the numbers where that bit is 0, and count where is 1, you'll get 2^(M-1) and 2^(M-1)-1, the shorter one belongs to the missing number. With this, you can get all the bits of the missing number.
There is really no need to use the swap operation at all!
Use XOR!
First, calculate the XOR of all numbers from 0 to N:
long nxor = 0;
for (long i = 0; i <= N; i++)
    nxor ^= i;
Then calculate the XOR of all numbers in the array, which is also simple. Let K be the maximal number of bits in any of the numbers.
long axor = 0;
long K = 0;
long H = N;
while (H > 0)
{
    H >>= 1; K++;
}
// the array holds N elements: the N + 1 values 0..N minus the missing one
for (long i = 0; i < N; i++)
    for (long j = 0; j < K; j++)
        axor ^= (long) get(i, j) << j;
Finally, XOR the two results:
long result = nxor ^ axor;
And by the way, if N is a power of 2 (and N >= 4), then nxor will be equal to N ;-)!
Suppose that the input is a[]=0,1,2,3,4,5,7,8, so that 6 is missing. The numbers are sorted for convenience only, because they don't have to be sorted for the solution to work.
Since N is 8 then the numbers are represented using 4 bits.
From 0000 to 1000.
First partition the array using the most significant bit.
You get 0,1,2,3,4,5,7 and 8. Since 8 is present, continue with the left partition.
Partition the sub array using the 2nd most significant bit.
You get 0,1,2,3 and 4,5,7. Now continue with the partition that has odd number of elements, which is 4,5,7.
Partition the sub array using the 3rd most significant bit.
You get 4,5 and 7. Again continue with the partition that has odd number of elements, which is 7.
Partition the sub array using the 4th most significant bit you get nothing and 7.
So the missing number is 6.
Another example:
a[]=0,1,3,4,5,6,7,8, so that 2 is missing.
1st bit partition: 0,1,3,4,5,6,7 and 8, continue with 0,1,3,4,5,6,7.
2nd bit partition: 0,1,3 and 4,5,6,7, continue with 0,1,3 (odd number of elements).
3rd bit partition: 0,1 and 3, continue with 3 (odd number of elements).
4th bit partition: nothing and 3, so 2 is missing.
Another example:
a[]=1,2,3,4,5,6,7,8, so that 0 is missing.
1st bit partition: 1,2,3,4,5,6,7 and 8, continue with 1,2,3,4,5,6,7.
2nd bit partition: 1,2,3 and 4,5,6,7, continue with 1,2,3 (odd number of elements).
3rd bit partition: 1 and 2,3, continue with 1 (odd number of elements).
4th bit partition: nothing and 1, so 0 is missing.
The 1st partition takes N operations.
The 2nd partition takes N operations.
The 3rd partition takes N/2 operations.
The 4th partition takes N/4 operations.
And so on.
So the running time is O(N+N+N/2+N/4+...)=O(N).
Here is another answer that uses a sum instead of XOR.
The code is below.
long allsum = N * (N + 1) / 2;
long sum = 0;
long K = 0;
long H = N;
while (H > 0)
{
    H >>= 1; K++;
}
// the array holds N elements: the N + 1 values 0..N minus the missing one
for (long i = 0; i < N; i++)
    for (long j = 0; j < K; j++)
        sum += (long) get(i, j) << j;
long result = allsum - sum;
Without the XOR operation, we can answer this question this way:
package missingnumberinarray;
public class MissingNumber
{
public static void main(String args[])
{
int array1[] = {1,2,3,4,6,7,8,9,10}; // we need sort the array first.
System.out.println(array1[array1.length-1]);
int n = array1[array1.length-1];
int total = (n*(n+1))/2;
System.out.println(total);
int arraysum = 0;
for(int i = 0; i < array1.length; i++)
{
arraysum += array1[i];
}
System.out.println(arraysum);
int mis = total-arraysum;
System.out.println("The missing number in array is "+mis);
}
}

There is an array having 1 to 100 numbers randomly placed. But two numbers are missing from the list. What are those two numbers? [duplicate]

I had an interesting job interview experience a while back. The question started really easy:
Q1: We have a bag containing numbers 1, 2, 3, …, 100. Each number appears exactly once, so there are 100 numbers. Now one number is randomly picked out of the bag. Find the missing number.
I've heard this interview question before, of course, so I very quickly answered along the lines of:
A1: Well, the sum of the numbers 1 + 2 + 3 + … + N is (N+1)(N/2) (see Wikipedia: sum of arithmetic series). For N = 100, the sum is 5050.
Thus, if all numbers are present in the bag, the sum will be exactly 5050. Since one number is missing, the sum will be less than this, and the difference is that number. So we can find that missing number in O(N) time and O(1) space.
At this point I thought I had done well, but all of a sudden the question took an unexpected turn:
Q2: That is correct, but now how would you do this if TWO numbers are missing?
I had never seen/heard/considered this variation before, so I panicked and couldn't answer the question. The interviewer insisted on knowing my thought process, so I mentioned that perhaps we can get more information by comparing against the expected product, or perhaps doing a second pass after having gathered some information from the first pass, etc, but I really was just shooting in the dark rather than actually having a clear path to the solution.
The interviewer did try to encourage me by saying that having a second equation is indeed one way to solve the problem. At this point I was kind of upset (for not knowing the answer beforehand), and asked if this is a general (read: "useful") programming technique, or if it's just a trick/gotcha answer.
The interviewer's answer surprised me: you can generalize the technique to find 3 missing numbers. In fact, you can generalize it to find k missing numbers.
Qk: If exactly k numbers are missing from the bag, how would you find them efficiently?
This was a few months ago, and I still couldn't figure out what this technique is. Obviously there's an Ω(N) time lower bound since we must scan all the numbers at least once, but the interviewer insisted that the TIME and SPACE complexity of the solving technique (minus the O(N) time input scan) is defined in k not N.
So the question here is simple:
How would you solve Q2?
How would you solve Q3?
How would you solve Qk?
Clarifications
Generally there are N numbers from 1..N, not just 1..100.
I'm not looking for the obvious set-based solution, e.g. using a bit set, encoding the presence/absence of each number by the value of a designated bit, therefore using O(N) bits in additional space. We can't afford any additional space proportional to N.
I'm also not looking for the obvious sort-first approach. This and the set-based approach are worth mentioning in an interview (they are easy to implement, and depending on N, can be very practical). I'm looking for the Holy Grail solution (which may or may not be practical to implement, but has the desired asymptotic characteristics nevertheless).
So again, of course you must scan the input in O(N), but you can only capture a small amount of information (defined in terms of k not N), and must then find the k missing numbers somehow.
Here's a summary of Dimitris Andreou's link.
Remember the sums of the i-th powers, where i = 1, 2, ..., k. This reduces the problem to solving the system of equations
a_1 + a_2 + ... + a_k = b_1
a_1^2 + a_2^2 + ... + a_k^2 = b_2
...
a_1^k + a_2^k + ... + a_k^k = b_k
Using Newton's identities, knowing the b_i allows us to compute
c_1 = a_1 + a_2 + ... + a_k
c_2 = a_1 a_2 + a_1 a_3 + ... + a_(k-1) a_k
...
c_k = a_1 a_2 ... a_k
If you expand the polynomial (x - a_1)...(x - a_k), the coefficients will be exactly c_1, ..., c_k up to sign - see Viète's formulas. Since every polynomial factors uniquely (the ring of polynomials is a Euclidean domain), this means the a_i are uniquely determined, up to permutation.
This ends a proof that remembering powers is enough to recover the numbers. For constant k, this is a good approach.
However, when k is varying, the direct approach of computing c_1, ..., c_k is prohibitively expensive, since e.g. c_k is the product of all missing numbers, of magnitude n!/(n-k)!. To overcome this, perform the computations in the field Z_q, where q is a prime such that n <= q < 2n - it exists by Bertrand's postulate. The proof doesn't need to be changed, since the formulas still hold, and factorization of polynomials is still unique. You also need an algorithm for factorization over finite fields, for example the one by Berlekamp or Cantor-Zassenhaus.
High level pseudocode for constant k:
Compute the sums of i-th powers of the given numbers, for i = 1, ..., k.
Subtract them from the corresponding sums for 1, ..., n to get the sums of i-th powers of the unknown numbers. Call the sums b_i.
Use Newton's identities to compute coefficients from b_i; call them c_i. Basically, c_1 = b_1; c_2 = (c_1 b_1 - b_2)/2; see Wikipedia for exact formulas.
Factor the polynomial x^k - c_1 x^(k-1) + ... + (-1)^k c_k.
The roots of the polynomial are the needed numbers a_1, ..., a_k.
For varying k, find a prime n <= q < 2n using e.g. Miller-Rabin, and perform the steps with all numbers reduced modulo q.
EDIT: The previous version of this answer stated that instead of Z_q, where q is prime, it is possible to use a finite field of characteristic 2 (q=2^(log n)). This is not the case, since Newton's formulas require division by numbers up to k.
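The constant-k steps above can be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the function name `find_missing_power_sums` is hypothetical, exact big integers are used instead of arithmetic modulo a prime q, and the polynomial factorization step is replaced by simply testing every candidate 1..n as a root (which costs extra time but keeps the sketch short).

```python
def find_missing_power_sums(bag, n, k):
    """Recover the k numbers missing from 1..n via power sums (illustrative sketch)."""
    # One pass over the bag: accumulate the sums of i-th powers, i = 1..k.
    seen = [0] * (k + 1)
    for x in bag:
        p = 1
        for i in range(1, k + 1):
            p *= x
            seen[i] += p

    # b[i] = sum of i-th powers of the missing numbers.
    b = [0] * (k + 1)
    for i in range(1, k + 1):
        b[i] = sum(v ** i for v in range(1, n + 1)) - seen[i]

    # Newton's identities: i * c[i] = sum_{j=1..i} (-1)^(j-1) * c[i-j] * b[j], with c[0] = 1.
    c = [1] + [0] * k
    for i in range(1, k + 1):
        acc = 0
        for j in range(1, i + 1):
            acc += (-1) ** (j - 1) * c[i - j] * b[j]
        c[i] = acc // i  # the division is exact over the integers

    # Coefficients of x^k - c1 x^(k-1) + ... + (-1)^k ck, highest degree first.
    coeffs = [(-1) ** i * c[i] for i in range(k + 1)]

    # Assumption: instead of factoring, test each candidate with Horner's rule.
    roots = []
    for x in range(1, n + 1):
        value = 0
        for a in coeffs:
            value = value * x + a
        if value == 0:
            roots.append(x)
    return roots
```

For example, with the numbers 1..100 and 3, 47 and 99 removed, calling the function with k = 3 recovers those three values in increasing order.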
You will find it by reading the couple of pages of Muthukrishnan - Data Stream Algorithms: Puzzle 1: Finding Missing Numbers. It shows exactly the generalization you are looking for. Probably this is what your interviewer read and why he posed these questions.
Also see sdcvvc's directly related answer, which also includes pseudocode (hurray! no need to read those tricky math formulations :)) (thanks, great work!).
We can solve Q2 by summing both the numbers themselves, and the squares of the numbers.
We can then reduce the problem to
k1 + k2 = x
k1^2 + k2^2 = y
Where x and y are how far the sums are below the expected values.
Substituting gives us:
(x-k2)^2 + k2^2 = y
Which we can then solve to determine our missing numbers.
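As a small sketch of this substitution (the function name `two_missing_by_squares` is made up for illustration), note that 2y - x^2 = (k1 - k2)^2, so the quadratic has the closed-form solutions (x ± sqrt(2y - x^2)) / 2:

```python
import math


def two_missing_by_squares(bag, n):
    """Find the two numbers missing from 1..n using the sum and the sum of squares."""
    # x and y are how far the observed sums fall below the expected values.
    x = n * (n + 1) // 2 - sum(bag)
    y = n * (n + 1) * (2 * n + 1) // 6 - sum(v * v for v in bag)
    # 2y - x^2 equals (k1 - k2)^2, so its integer square root is |k1 - k2|.
    diff = math.isqrt(2 * y - x * x)
    return (x - diff) // 2, (x + diff) // 2
```

Using `math.isqrt` (Python 3.8+) keeps everything in exact integer arithmetic, which avoids floating-point rounding for large n.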
As @j_random_hacker pointed out, this is quite similar to Finding duplicates in O(n) time and O(1) space, and an adaptation of my answer there works here too.
Assuming that the "bag" is represented by a 1-based array A[] of size N - k, we can solve Qk in O(N) time and O(k) additional space.
First, we extend our array A[] by k elements, so that it is now of size N. This is the O(k) additional space. We then run the following pseudo-code algorithm:
for i := N - k + 1 to N
    A[i] := A[1]
end for
for i := 1 to N - k
    while A[A[i]] != A[i]
        swap(A[i], A[A[i]])
    end while
end for
for i := 1 to N
    if A[i] != i then
        print i
    end if
end for
The first loop initialises the k extra entries to the same as the first entry in the array (this is just a convenient value that we know is already present in the array - after this step, any entries that were missing in the initial array of size N-k are still missing in the extended array).
The second loop permutes the extended array so that if element x is present at least once, then one of those entries will be at position A[x].
Note that although it has a nested loop, it still runs in O(N) time - a swap only occurs if there is an i such that A[i] != i, and each swap sets at least one element such that A[i] == i, where that wasn't true before. This means that the total number of swaps (and thus the total number of executions of the while loop body) is at most N-1.
The third loop prints those indexes of the array i that are not occupied by the value i - this means that i must have been missing.
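A runnable Python sketch of the three loops might look like the following. The function name `find_missing_by_swapping` is hypothetical, the bag is assumed to be non-empty, and for simplicity the sketch copies the bag into a new list (so it does not demonstrate the O(k) extra-space property, only the logic).

```python
def find_missing_by_swapping(bag, n):
    """Return the numbers of 1..n that are absent from bag (illustrative sketch)."""
    k = n - len(bag)
    # a[1..n] is the extended array; index 0 is unused padding for 1-based access.
    # The k extra slots are filled with a value known to be present (first loop).
    a = [0] + list(bag) + [bag[0]] * k

    # Second loop: move each value x to position x where possible.
    for i in range(1, n - k + 1):
        while a[a[i]] != a[i]:
            j = a[i]  # fix the target index before swapping
            a[i], a[j] = a[j], a[i]

    # Third loop: any index not holding its own value is a missing number.
    return [i for i in range(1, n + 1) if a[i] != i]
```

The target index is stored in `j` before the swap because a direct tuple assignment that indexes with `a[i]` on the left-hand side would re-read `a[i]` after it has already been overwritten.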
I asked a 4-year-old to solve this problem. He sorted the numbers and then counted along. This has a space requirement of O(kitchen floor), and it works just as easily however many balls are missing.
Not sure if it's the most efficient solution, but I would loop over all entries, use a bitset to remember which numbers are set, and then test for 0 bits.
I like simple solutions - and I even believe, that it might be faster than calculating the sum, or the sum of squares etc.
I haven't checked the maths, but I suspect that computing Σ(n^2) in the same pass as we compute Σ(n) would provide enough info to get two missing numbers. Do Σ(n^3) as well if there are three, and so on.
The problem with solutions based on sums of numbers is they don't take into account the cost of storing and working with numbers with large exponents... in practice, for it to work for very large n, a big numbers library would be used. We can analyse the space utilisation for these algorithms.
We can analyse the time and space complexity of sdcvvc and Dimitris Andreou's algorithms.
Storage:
l_j = ceil(log_2(\sum_{i=1}^n i^j))
l_j > log_2(n^j) (assuming n >= 0, k >= 0)
l_j > j log_2 n \in \Omega(j log n)
l_j < log_2((\sum_{i=1}^n i)^j) + 1
l_j < j log_2(n) + j log_2(n + 1) - j log_2(2) + 1
l_j < j log_2 n + j + c \in O(j log n)
So l_j \in \Theta(j log n)
Total storage used: \sum_{j=1}^k l_j \in \Theta(k^2 log n)
Time used: assuming that computing a^j takes ceil(log_2 j) time, total time:
t = k ceil(\sum_{i=1}^n log_2(i)) = k ceil(log_2(\prod_{i=1}^n i))
t > k log_2(n^n + O(n^(n-1)))
t > k log_2(n^n) = kn log_2(n) \in \Omega(kn log n)
t < k log_2(\prod_{i=1}^n i^i) + 1
t < kn log_2(n) + 1 \in O(kn log n)
Total time used: \Theta(kn log n)
If this time and space is satisfactory, you can use a simple recursive
algorithm. Let b!i be the ith entry in the bag, n the number of numbers before
removals, and k the number of removals. In Haskell syntax...
let
  -- O(1)
  isInRange low high v = (v >= low) && (v <= high)
  -- O(n - k): number of bag entries whose value lies in [low, high]
  countInRange low high = sum $ map (fromEnum . isInRange low high . (b !)) [1..(n-k)]
  -- l accumulates the missing numbers found so far
  findMissing l low high krange
    -- O(1) if there is nothing to find.
    | krange == 0 = l
    -- O(1) if there is only one possibility.
    | low == high = low:l
    -- Otherwise total of O(knlog(n)) time
    | otherwise =
      let
        mid = (low + high) `div` 2
        -- missing in [low, mid] = size of the range minus entries present in it
        klow = (mid - low + 1) - countInRange low mid
        khigh = krange - klow
      in
        findMissing (findMissing l low mid klow) (mid + 1) high khigh
in
  findMissing [] 1 n k
Storage used: O(k) for list, O(log(n)) for stack: O(k + log(n))
This algorithm is more intuitive, has the same time complexity, and uses less space.
A very simple solution to Q2 which I'm surprised nobody answered already. Use the method from Q1 to find the sum of the two missing numbers. Let's denote it by S; since the two numbers are distinct, one of them is smaller than S/2 and the other is bigger than S/2 (duh). Sum all the numbers in the bag from 1 to S/2 and compare it to the formula's result (similarly to the method in Q1) to find the smaller of the two missing numbers. Subtract it from S to find the bigger missing number.
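A minimal sketch of this two-pass idea, assuming the bag is a Python list that can be scanned twice (the function name `two_missing_by_halving` is hypothetical). The cut-off is taken as floor((S-1)/2), so that the smaller missing number is at most the cut-off and the larger one is strictly above it:

```python
def two_missing_by_halving(bag, n):
    """Find the two numbers missing from 1..n with two sum passes."""
    # Pass 1: S is the sum of the two missing numbers.
    s = n * (n + 1) // 2 - sum(bag)
    # The smaller missing number is <= half, the larger one is > half.
    half = (s - 1) // 2
    # Pass 2: apply the Q1 method to the range 1..half only.
    expected_low = half * (half + 1) // 2
    seen_low = sum(v for v in bag if v <= half)
    smaller = expected_low - seen_low
    return smaller, s - smaller
```

Both passes use O(1) extra space, so this matches the Q1 solution's resource profile at the cost of reading the input twice.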
Wait a minute. As the question is stated, there are 100 numbers in the bag. No matter how big k is, the problem can be solved in constant time because you can use a set and remove numbers from the set in at most 100 - k iterations of a loop. 100 is constant. The set of remaining numbers is your answer.
If we generalise the solution to the numbers from 1 to N, nothing changes except N is not a constant, so we are in O(N - k) = O(N) time. For instance, if we use a bit set, we set the bits to 1 in O(N) time, iterate through the numbers, setting the bits to 0 as we go (O(N-k) = O(N)) and then we have the answer.
It seems to me that the interviewer was asking you how to print out the contents of the final set in O(k) time rather than O(N) time. Clearly, with a bit set, you have to iterate through all N bits to determine whether you should print the number or not. However, if you change the way the set is implemented you can print out the numbers in k iterations. This is done by putting the numbers into an object to be stored in both a hash set and a doubly linked list. When you remove an object from the hash set, you also remove it from the list. The answers will be left in the list which is now of length k.
To solve the 2 (and 3) missing numbers question, you can modify quickselect, which on average runs in O(n) and uses constant memory if partitioning is done in-place.
Partition the set with respect to a random pivot p into partitions l, which contains numbers smaller than the pivot, and r, which contains numbers greater than the pivot.
Determine which partitions the 2 missing numbers are in by comparing the pivot value to the size of each partition (p - 1 - count(l) = count of missing numbers in l and n - count(r) - p = count of missing numbers in r).
a) If each partition is missing one number, then use the difference of sums approach to find each missing number.
(1 + 2 + ... + (p-1)) - sum(l) = missing #1 and
((p+1) + (p+2) ... + n) - sum(r) = missing #2
b) If one partition is missing both numbers and the partition is empty, then the missing numbers are either (p-1,p-2) or (p+1,p+2), depending on which partition is missing the numbers.
If one partition is missing 2 numbers but is not empty, then recurse onto that partition.
With only 2 missing numbers, this algorithm always discards at least one partition, so it retains O(n) average time complexity of quickselect. Similarly, with 3 missing numbers this algorithm also discards at least one partition with each pass (because as with 2 missing numbers, at most only 1 partition will contain multiple missing numbers). However, I'm not sure how much the performance decreases when more missing numbers are added.
Here's an implementation that does not use in-place partitioning, so this example does not meet the space requirement but it does illustrate the steps of the algorithm:
<?php
$list = range(1,100);
unset($list[3]);
unset($list[31]);
findMissing($list,1,100);
function findMissing($list, $min, $max) {
if(empty($list)) {
print_r(range($min, $max));
return;
}
$l = $r = [];
$pivot = array_pop($list);
foreach($list as $number) {
if($number < $pivot) {
$l[] = $number;
}
else {
$r[] = $number;
}
}
if(count($l) == $pivot - $min - 1) {
// only 1 missing number use difference of sums
print array_sum(range($min, $pivot-1)) - array_sum($l) . "\n";
}
else if(count($l) < $pivot - $min) {
// more than 1 missing number, recurse
findMissing($l, $min, $pivot-1);
}
if(count($r) == $max - $pivot - 1) {
// only 1 missing number use difference of sums
print array_sum(range($pivot + 1, $max)) - array_sum($r) . "\n";
} else if(count($r) < $max - $pivot) {
// more than 1 missing number, recurse
findMissing($r, $pivot+1, $max);
}
}
For Q2 this is a solution that is a bit less efficient than the others, but still has O(N) runtime and takes O(k) space.
The idea is to run the original algorithm two times. In the first pass you get the total of the missing numbers, which gives you an upper bound on each of them. Let's call this number N. You know that the two missing numbers are going to sum up to N, so the first number can only be in the interval [1, floor((N-1)/2)] while the second is going to be in [floor(N/2)+1,N-1].
Thus you loop on all numbers once again, discarding all numbers that are not included in the first interval. The ones that are, you keep track of their sum. Finally, you'll know one of the missing two numbers, and by extension the second.
I have a feeling that this method could be generalized and maybe multiple searches run in "parallel" during a single pass over the input, but I haven't yet figured out how.
Here's a solution that uses k bits of extra storage, without any clever tricks and just straightforward. Execution time O (n), extra space O (k). Just to prove that this can be solved without reading up on the solution first or being a genius:
void puzzle (int* data, int n, bool* extra, int k)
{
// data contains n distinct numbers from 1 to n + k, extra provides
// space for k extra bits.
// Rearrange the array so there are (even) even numbers at the start
// and (odd) odd numbers at the end.
int even = 0, odd = 0;
while (even + odd < n)
{
if (data [even] % 2 == 0) ++even;
else if (data [n - 1 - odd] % 2 == 1) ++odd;
else { int tmp = data [even]; data [even] = data [n - 1 - odd];
data [n - 1 - odd] = tmp; ++even; ++odd; }
}
// Erase the lowest bits of all numbers and set the extra bits to 0.
for (int i = even; i < n; ++i) data [i] -= 1;
for (int i = 0; i < k; ++i) extra [i] = false;
// Set a bit for every number that is present
for (int i = 0; i < n; ++i)
{
int tmp = data [i];
tmp -= (tmp % 2);
if (i >= even) ++tmp;
if (tmp <= n) data [tmp - 1] += 1; else extra [tmp - n - 1] = true;
}
// Print out the missing ones
for (int i = 1; i <= n; ++i)
if (data [i - 1] % 2 == 0) printf ("Number %d is missing\n", i);
for (int i = n + 1; i <= n + k; ++i)
if (! extra [i - n - 1]) printf ("Number %d is missing\n", i);
// Restore the lowest bits again.
for (int i = 0; i < n; ++i) {
if (i < even) { if (data [i] % 2 != 0) data [i] -= 1; }
else { if (data [i] % 2 == 0) data [i] += 1; }
}
}
Motivation
If you want to solve the general-case problem, and you can store and edit the array, then Caf's solution is by far the most efficient. If you can't store the array (streaming version), then sdcvvc's answer is the only type of solution currently suggested.
The solution I propose is the most efficient answer (so far on this thread) if you can store the array but can't edit it, and I got the idea from Svalorzen's solution, which solves for 1 or 2 missing items. This solution takes Θ(k*n) time and O(min(k,log(n))) and Ω(log(k)) space. It also works well with parallelism.
Concept
The idea is that if you use the original approach of comparing sums:
sum = SumOf(1,n) - SumOf(array)
... then you take the average of the missing numbers:
average = sum/n_missing_numbers
... which provides a boundary: Of the missing numbers, there's guaranteed to be at least one number less-or-equal to average, and at least one number greater than average. This means that we can split into sub problems that each scan the array [O(n)] and are only concerned with their respective sub-arrays.
Code
C-style solution (don't judge me for the global variables, I'm just trying to make the code readable for non-c folks):
#include "stdio.h"
// Example problem:
const int array [] = {0, 7, 3, 1, 5};
const int N = 8; // size of original array
const int array_size = 5;
int SumOneTo (int n)
{
return n*(n-1)/2; // non-inclusive
}
int MissingItems (const int begin, const int end, int & average)
{
// We consider only sub-array elements with values, v:
// begin <= v < end
// Initialise info about missing elements.
// First assume all are missing:
int n = end - begin;
int sum = SumOneTo(end) - SumOneTo(begin);
// Minus everything that we see (ie not missing):
for (int i = 0; i < array_size; ++i)
{
if ((begin <= array[i]) && (array[i] < end))
{
--n;
sum -= array[i];
}
}
// used by caller:
average = sum/n;
return n;
}
void Find (const int begin, const int end)
{
int average;
if (MissingItems(begin, end, average) == 1)
{
printf(" %d", average); // average(n) is same as n
return;
}
Find(begin, average + 1); // at least one missing here
Find(average + 1, end); // at least one here also
}
int main ()
{
printf("Missing items:");
Find(0, N);
printf("\n");
}
Analysis
Ignoring recursion for a moment, each function call clearly takes O(n) time and O(1) space. Note that sum can equal as much as n(n-1)/2, so it requires double the number of bits needed to store n-1. At most this means that we effectively need two extra elements' worth of space, regardless of the size of the array or k, hence it's still O(1) space under the normal conventions.
It's not so obvious how many function calls there are for k missing elements, so I'll provide a visual. Your original sub-array (connected array) is the full array, which has all k missing elements in it. We'll imagine them in increasing order, where -- represent connections (part of same sub-array):
m1 -- m2 -- m3 -- m4 -- (...) -- mk-1 -- mk
The effect of the Find function is to disconnect the missing elements into different non-overlapping sub-arrays. It guarantees that there's at least one missing element in each sub-array, which means breaking exactly one connection.
What this means is that regardless of how the splits occur, it will always take k-1 Find function calls to do the work of finding the sub-arrays that have only one missing element in them.
So the time complexity is Θ((k-1 + k) * n) = Θ(k*n).
For the space complexity, if we divide proportionally each time then we get O(log(k)) space complexity, but if we only separate one at a time it gives us O(k).
See here for a proof as to why the space complexity is O(log(n)). Given that above we've shown that it's also O(k), then we know that it's O(min(k,log(n))).
Maybe this algorithm can work for question 1:
Precompute xor of first 100 integers(val=1^2^3^4....100)
xor the elements as they keep coming from input stream ( val1=val1^next_input)
final answer=val^val1
Or even better:
def GetValue(A)
    val=0
    for i=1 to 100
    do
        val=val^i
    done
    for value in A:
    do
        val=val^value
    done
    return val
This algorithm can in fact be expanded for two missing numbers. The first step remains the same. When we call GetValue with two missing numbers, the result will be a1^a2, where a1 and a2 are the two missing numbers. Let's say
val = a1^a2
Now to sieve out a1 and a2 from val we take any set bit in val. Let's say the ith bit is set in val. That means that a1 and a2 differ at the ith bit position.
Now we do another iteration on the original array and keep two xor values: one for the numbers which have the ith bit set and another for those which don't have the ith bit set. We now have two buckets of numbers, and it's guaranteed that a1 and a2 will lie in different buckets. Now repeat, on each of the buckets, what we did for finding one missing element.
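The two-bucket idea can be sketched in Python as follows. The function name `two_missing_xor` is made up for illustration, the range is generalised from 1..100 to 1..n, and the lowest set bit of val is used as the distinguishing bit (any set bit would do). Only one bucket needs to be xor-ed explicitly, since the other missing number is val ^ a1.

```python
def two_missing_xor(bag, n):
    """Find the two numbers missing from 1..n using xor only."""
    # First pass: val = a1 ^ a2, the xor of the two missing numbers.
    val = 0
    for i in range(1, n + 1):
        val ^= i
    for v in bag:
        val ^= v

    # Pick a bit where a1 and a2 differ (here: the lowest set bit of val).
    bit = val & -val

    # Second pass: xor only the numbers that have this bit set.
    a1 = 0
    for i in range(1, n + 1):
        if i & bit:
            a1 ^= i
    for v in bag:
        if v & bit:
            a1 ^= v

    a2 = val ^ a1
    return tuple(sorted((a1, a2)))
```

Unlike the sum-based methods, the intermediate values here never exceed the bit width of n, so there is no risk of overflow in fixed-width languages.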
There is a general way to solve streaming problems like this.
The idea is to use a bit of randomization to hopefully 'spread' the k elements into independent sub problems, where our original algorithm solves the problem for us. This technique is used in sparse signal reconstruction, among other things.
Make an array, a, of size u = k^2.
Pick any universal hash function, h : {1,...,n} -> {1,...,u}. (Like multiply-shift)
For each i in 1, ..., n increase a[h(i)] += i
For each number x in the input stream, decrement a[h(x)] -= x.
If all of the missing numbers have been hashed to different buckets, the non-zero elements of the array will now contain the missing numbers.
The probability that a particular pair is sent to the same bucket, is less than 1/u by definition of a universal hash function. Since there are about k^2/2 pairs, we have that the error probability is at most k^2/2/u=1/2. That is, we succeed with probability at least 50%, and if we increase u we increase our chances.
Notice that this algorithm takes k^2 log n bits of space (we need log n bits per array bucket). This matches the space required by @Dimitris Andreou's answer (in particular the space requirement of polynomial factorization, which happens to also be randomized).
This algorithm also has constant time per update, rather than time k in the case of power-sums.
In fact, we can be even more efficient than the power sum method by using the trick described in the comments.
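Here is a rough Python sketch of the bucketing scheme, with several assumptions that are not part of the answer: the function name `find_missing_hashed` is hypothetical; the hash family is ((a*x + b) mod p) mod u rather than multiply-shift; a second array of counts is kept so that a collision between missing numbers can be detected; and on a collision the sketch simply re-reads the input with a fresh hash function, which a true one-pass streaming setting would replace by running several hash functions side by side.

```python
import random


def find_missing_hashed(bag, n, k, seed=0):
    """Find the k numbers missing from 1..n by hashing into k^2 buckets (sketch)."""
    rng = random.Random(seed)
    u = max(1, k * k)      # number of buckets
    p = 2 ** 61 - 1        # a prime assumed to be larger than any input value

    while True:
        # Draw a random member of the hash family.
        a = rng.randrange(1, p)
        b = rng.randrange(0, p)

        def h(x):
            return ((a * x + b) % p) % u

        sums = [0] * u     # per bucket: sum of the missing numbers hashed there
        counts = [0] * u   # per bucket: how many missing numbers hashed there
        for i in range(1, n + 1):
            sums[h(i)] += i
            counts[h(i)] += 1
        for x in bag:
            sums[h(x)] -= x
            counts[h(x)] -= 1

        # Success if no two missing numbers share a bucket; otherwise retry.
        if all(c <= 1 for c in counts):
            return sorted(s for s, c in zip(sums, counts) if c == 1)
```

Keeping the counts is what makes the result verifiable: a bucket whose count is exactly 1 holds a single missing number, so its sum is that number.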
Can you check if every number exists? If yes you may try this:
S = sum of all numbers in the bag (S < 5050)
Z = sum of the missing numbers = 5050 - S
if the missing numbers are x and y then:
x = Z - y and
max(x) = Z - 1
So you check the range from 1 to max(x) and find the number
You can solve Q2 if you have the sum of both lists and the product of both lists.
(l1 is the original, l2 is the modified list)
s = sum(l1) - sum(l2)
m = mul(l1) / mul(l2)
We can optimise this since the sum of an arithmetic series is n times the average of the first and last terms:
n = len(l1)
s = (n/2)*(n+1) - sum(l2)
Now we know that (if a and b are the removed numbers):
a + b = s
a * b = m
So we can rearrange to:
a = s - b
b * (s - b) = m
And multiply out:
-b^2 + s*b = m
And rearrange so the right side is zero:
-b^2 + s*b - m = 0
Then we can solve with the quadratic formula:
b = (-s + sqrt(s^2 - (4*-1*-m)))/-2
a = s - b
Sample Python 3 code:
from functools import reduce
import operator
import math
x = list(range(1,21))
sx = (len(x)/2)*(len(x)+1)
x.remove(15)
x.remove(5)
mul = lambda l: reduce(operator.mul,l)
s = sx - sum(x)
m = mul(range(1,21)) / mul(x)
b = (-s + math.sqrt(s**2 - (-4*(-m))))/-2
a = s - b
print(a,b) # prints 15.0 5.0
I do not know the complexity of the sqrt, reduce and sum functions so I cannot work out the complexity of this solution (if anyone does know please comment below.)
Here is a solution that doesn't rely on complex math as sdcvvc's/Dimitris Andreou's answers do, doesn't change the input array as caf and Colonel Panic did, and doesn't use the bitset of enormous size as Chris Lercher, JeremyP and many others did. Basically, I began with Svalorzen's/Gilad Deutch's idea for Q2, generalized it to the common case Qk and implemented in Java to prove that the algorithm works.
The idea
Suppose we have an arbitrary interval I of which we only know that it contains at least one of the missing numbers. After one pass through the input array, looking only at the numbers from I, we can obtain both the sum S and the quantity Q of missing numbers from I. We do this by simply decrementing I's length each time we encounter a number from I (for obtaining Q) and by decreasing pre-calculated sum of all numbers in I by that encountered number each time (for obtaining S).
Now we look at S and Q. If Q = 1, it means that I contains only one of the missing numbers, and this number is clearly S. We mark I as finished (it is called "unambiguous" in the program) and leave it out from further consideration. On the other hand, if Q > 1, we can calculate the average A = S / Q of missing numbers contained in I. As all numbers are distinct, at least one of such numbers is strictly less than A and at least one is strictly greater than A. Now we split I at A into two smaller intervals, each of which contains at least one missing number. Note that it doesn't matter to which of the intervals we assign A in case it is an integer.
We make the next array pass calculating S and Q for each of the intervals separately (but in the same pass) and after that mark intervals with Q = 1 and split intervals with Q > 1. We continue this process until there are no new "ambiguous" intervals, i.e. we have nothing to split because each interval contains exactly one missing number (and we always know this number because we know S). We start out from the sole "whole range" interval containing all possible numbers (like [1..N] in the question).
Time and space complexity analysis
The total number of passes p we need to make until the process stops is never greater than the missing numbers count k. The inequality p <= k can be proved rigorously. On the other hand, there is also an empirical upper bound p < log2N + 3 that is useful for large values of k. We need to make a binary search for each number of the input array to determine the interval to which it belongs. This adds the log k multiplier to the time complexity.
In total, the time complexity is O(N · min(k, log N) · log k). Note that for large k, this is significantly better than that of sdcvvc/Dimitris Andreou's method, which is O(N · k).
For its work, the algorithm requires O(k) additional space for storing at most k intervals, which is significantly better than O(N) in "bitset" solutions.
Java implementation
Here's a Java class that implements the above algorithm. It always returns a sorted array of missing numbers. Besides that, it doesn't require the missing numbers count k because it calculates it in the first pass. The whole range of numbers is given by the minNumber and maxNumber parameters (e.g. 1 and 100 for the first example in the question).
public class MissingNumbers {
private static class Interval {
boolean ambiguous = true;
final int begin;
int quantity;
long sum;
Interval(int begin, int end) { // begin inclusive, end exclusive
this.begin = begin;
quantity = end - begin;
sum = quantity * ((long)end - 1 + begin) / 2;
}
void exclude(int x) {
quantity--;
sum -= x;
}
}
public static int[] find(int minNumber, int maxNumber, NumberBag inputBag) {
Interval full = new Interval(minNumber, ++maxNumber);
for (inputBag.startOver(); inputBag.hasNext();)
full.exclude(inputBag.next());
int missingCount = full.quantity;
if (missingCount == 0)
return new int[0];
Interval[] intervals = new Interval[missingCount];
intervals[0] = full;
int[] dividers = new int[missingCount];
dividers[0] = minNumber;
int intervalCount = 1;
while (true) {
int oldCount = intervalCount;
for (int i = 0; i < oldCount; i++) {
Interval itv = intervals[i];
if (itv.ambiguous)
if (itv.quantity == 1) // number inside itv uniquely identified
itv.ambiguous = false;
else
intervalCount++; // itv will be split into two intervals
}
if (oldCount == intervalCount)
break;
int newIndex = intervalCount - 1;
int end = maxNumber;
for (int oldIndex = oldCount - 1; oldIndex >= 0; oldIndex--) {
// newIndex always >= oldIndex
Interval itv = intervals[oldIndex];
int begin = itv.begin;
if (itv.ambiguous) {
// split interval itv
// use floorDiv instead of / because input numbers can be negative
int mean = (int)Math.floorDiv(itv.sum, itv.quantity) + 1;
intervals[newIndex--] = new Interval(mean, end);
intervals[newIndex--] = new Interval(begin, mean);
} else
intervals[newIndex--] = itv;
end = begin;
}
for (int i = 0; i < intervalCount; i++)
dividers[i] = intervals[i].begin;
for (inputBag.startOver(); inputBag.hasNext();) {
int x = inputBag.next();
// find the interval to which x belongs
int i = java.util.Arrays.binarySearch(dividers, 0, intervalCount, x);
if (i < 0)
i = -i - 2;
Interval itv = intervals[i];
if (itv.ambiguous)
itv.exclude(x);
}
}
assert intervalCount == missingCount;
for (int i = 0; i < intervalCount; i++)
dividers[i] = (int)intervals[i].sum;
return dividers;
}
}
For fairness, this class receives input in form of NumberBag objects. NumberBag doesn't allow array modification and random access and also counts how many times the array was requested for sequential traversing. It is also more suitable for large array testing than Iterable<Integer> because it avoids boxing of primitive int values and allows wrapping a part of a large int[] for a convenient test preparation. It is not hard to replace, if desired, NumberBag by int[] or Iterable<Integer> type in the find signature, by changing two for-loops in it into foreach ones.
import java.util.*;
public abstract class NumberBag {
private int passCount;
public void startOver() {
passCount++;
}
public final int getPassCount() {
return passCount;
}
public abstract boolean hasNext();
public abstract int next();
// A lightweight version of Iterable<Integer> to avoid boxing of int
public static NumberBag fromArray(int[] base, int fromIndex, int toIndex) {
return new NumberBag() {
int index = toIndex;
public void startOver() {
super.startOver();
index = fromIndex;
}
public boolean hasNext() {
return index < toIndex;
}
public int next() {
if (index >= toIndex)
throw new NoSuchElementException();
return base[index++];
}
};
}
public static NumberBag fromArray(int[] base) {
return fromArray(base, 0, base.length);
}
public static NumberBag fromIterable(Iterable<Integer> base) {
return new NumberBag() {
Iterator<Integer> it;
public void startOver() {
super.startOver();
it = base.iterator();
}
public boolean hasNext() {
return it.hasNext();
}
public int next() {
return it.next();
}
};
}
}
Tests
Simple examples demonstrating the usage of these classes are given below.
import java.util.*;
public class SimpleTest {
public static void main(String[] args) {
int[] input = { 7, 1, 4, 9, 6, 2 };
NumberBag bag = NumberBag.fromArray(input);
int[] output = MissingNumbers.find(1, 10, bag);
System.out.format("Input: %s%nMissing numbers: %s%nPass count: %d%n",
Arrays.toString(input), Arrays.toString(output), bag.getPassCount());
List<Integer> inputList = new ArrayList<>();
for (int i = 0; i < 10; i++)
inputList.add(2 * i);
Collections.shuffle(inputList);
bag = NumberBag.fromIterable(inputList);
output = MissingNumbers.find(0, 19, bag);
System.out.format("%nInput: %s%nMissing numbers: %s%nPass count: %d%n",
inputList, Arrays.toString(output), bag.getPassCount());
// Sieve of Eratosthenes
final int MAXN = 1_000;
List<Integer> nonPrimes = new ArrayList<>();
nonPrimes.add(1);
int[] primes;
int lastPrimeIndex = 0;
while (true) {
primes = MissingNumbers.find(1, MAXN, NumberBag.fromIterable(nonPrimes));
int p = primes[lastPrimeIndex]; // guaranteed to be prime
int q = p;
for (int i = lastPrimeIndex++; i < primes.length; i++) {
q = primes[i]; // not necessarily prime
int pq = p * q;
if (pq > MAXN)
break;
nonPrimes.add(pq);
}
if (q == p)
break;
}
System.out.format("%nSieve of Eratosthenes. %d primes up to %d found:%n",
primes.length, MAXN);
for (int i = 0; i < primes.length; i++)
System.out.format(" %4d%s", primes[i], (i % 10) < 9 ? "" : "\n");
}
}
Large array testing can be performed this way:
import java.util.*;
public class BatchTest {
private static final Random rand = new Random();
public static int MIN_NUMBER = 1;
private final int minNumber = MIN_NUMBER;
private final int numberCount;
private final int[] numbers;
private int missingCount;
public long finderTime;
public BatchTest(int numberCount) {
this.numberCount = numberCount;
numbers = new int[numberCount];
for (int i = 0; i < numberCount; i++)
numbers[i] = minNumber + i;
}
private int passBound() {
int mBound = missingCount > 0 ? missingCount : 1;
int nBound = 34 - Integer.numberOfLeadingZeros(numberCount - 1); // ceil(log_2(numberCount)) + 2
return Math.min(mBound, nBound);
}
private void error(String cause) {
throw new RuntimeException("Error on '" + missingCount + " from " + numberCount + "' test, " + cause);
}
// returns the number of times the input array was traversed in this test
public int makeTest(int missingCount) {
this.missingCount = missingCount;
// numbers array is reused when numberCount stays the same,
// just Fisher–Yates shuffle it for each test
for (int i = numberCount - 1; i > 0; i--) {
int j = rand.nextInt(i + 1);
if (i != j) {
int t = numbers[i];
numbers[i] = numbers[j];
numbers[j] = t;
}
}
final int bagSize = numberCount - missingCount;
NumberBag inputBag = NumberBag.fromArray(numbers, 0, bagSize);
finderTime -= System.nanoTime();
int[] found = MissingNumbers.find(minNumber, minNumber + numberCount - 1, inputBag);
finderTime += System.nanoTime();
if (inputBag.getPassCount() > passBound())
error("too many passes (" + inputBag.getPassCount() + " while only " + passBound() + " allowed)");
if (found.length != missingCount)
error("wrong result length");
int j = bagSize; // "missing" part beginning in numbers
Arrays.sort(numbers, bagSize, numberCount);
for (int i = 0; i < missingCount; i++)
if (found[i] != numbers[j++])
error("wrong result array, " + i + "-th element differs");
return inputBag.getPassCount();
}
public static void strideCheck(int numberCount, int minMissing, int maxMissing, int step, int repeats) {
BatchTest t = new BatchTest(numberCount);
System.out.println("╠═══════════════════════╬═════════════════╬═════════════════╣");
for (int missingCount = minMissing; missingCount <= maxMissing; missingCount += step) {
int minPass = Integer.MAX_VALUE;
int passSum = 0;
int maxPass = 0;
t.finderTime = 0;
for (int j = 1; j <= repeats; j++) {
int pCount = t.makeTest(missingCount);
if (pCount < minPass)
minPass = pCount;
passSum += pCount;
if (pCount > maxPass)
maxPass = pCount;
}
System.out.format("║ %9d %9d ║ %2d %5.2f %2d ║ %11.3f ║%n", missingCount, numberCount, minPass,
(double)passSum / repeats, maxPass, t.finderTime * 1e-6 / repeats);
}
}
public static void main(String[] args) {
System.out.println("╔═══════════════════════╦═════════════════╦═════════════════╗");
System.out.println("║ Number count ║ Passes ║ Average time ║");
System.out.println("║ missing total ║ min avg max ║ per search (ms) ║");
long time = System.nanoTime();
strideCheck(100, 0, 100, 1, 20_000);
strideCheck(100_000, 2, 99_998, 1_282, 15);
MIN_NUMBER = -2_000_000_000;
strideCheck(300_000_000, 1, 10, 1, 1);
time = System.nanoTime() - time;
System.out.println("╚═══════════════════════╩═════════════════╩═════════════════╝");
System.out.format("%nSuccess. Total time: %.2f s.%n", time * 1e-9);
}
}
Try them out on Ideone
I think this can be done without any complex mathematical equations or theories. Below is a proposal for an in-place solution with O(n) time complexity (two passes over the array):
Input form assumptions :
# of numbers in bag = n
# of missing numbers = k
The numbers in the bag are represented by an array of length n
Length of input array for the algo = n
Missing entries in the array (numbers taken out of the bag) are replaced by the value of the first element in the array.
Eg. Initially bag looks like [2,9,3,7,8,6,4,5,1,10].
If 4 is taken out, value of 4 will become 2 (the first element of the array).
Therefore after taking 4 out the bag will look like [2,9,3,7,8,6,2,5,1,10]
The key to this solution is to tag the INDEX of a visited number by negating the value at that INDEX as the array is traversed.
IEnumerable<int> GetMissingNumbers(int[] arrayOfNumbers)
{
List<int> missingNumbers = new List<int>();
int arrayLength = arrayOfNumbers.Length;
//First Pass
for (int i = 0; i < arrayLength; i++)
{
int index = Math.Abs(arrayOfNumbers[i]) - 1;
if (index > -1)
{
arrayOfNumbers[index] = Math.Abs(arrayOfNumbers[index]) * -1; //Marking the visited indexes
}
}
//Second Pass to get missing numbers
for (int i = 0; i < arrayLength; i++)
{
//If this index is unvisited, means this is a missing number
if (arrayOfNumbers[i] > 0)
{
missingNumbers.Add(i + 1);
}
}
return missingNumbers;
}
Thanks for this very interesting question.
You reminded me of Newton's work, which really can solve this problem.
Please refer to Newton's identities.
The number of unknowns to find must equal the number of equations (needed for a unique solution).
I believe that for this we should raise the bag numbers to successive powers, so as to create that many different equations.
I'm not certain, but I believe there should be a function, say f, for which we add up f(x_i) over the numbers:
x_1 + x_2 + ... + x_k = z_1
x_1^2 + x_2^2 + ... + x_k^2 = z_2
...
x_1^k + x_2^k + ... + x_k^k = z_k
The rest is mathematical work. I'm not sure about the time and space complexity, but Newton's identities will surely play an important role.
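As a rough illustration of how such power sums could be turned into the missing numbers, here is a minimal Python sketch. It assumes the numbers are 1..n and that k is known in advance; the name find_missing_newton is made up for this example. It recovers the elementary symmetric functions of the missing numbers via Newton's identities and then simply tests every candidate as a root, which is probably not the fastest way to extract them.

```python
def find_missing_newton(bag, n, k):
    """Hypothetical helper: return the k numbers of 1..n absent from bag."""
    # p[j] = sum of j-th powers of the missing numbers
    #      = (sum over 1..n) - (sum over the bag), gathered in one pass.
    p = [0] * (k + 1)
    for j in range(1, k + 1):
        p[j] = sum(x ** j for x in range(1, n + 1))
    for x in bag:
        power = 1
        for j in range(1, k + 1):
            power *= x
            p[j] -= power
    # Newton's identities: j * e[j] = sum_{i=1..j} (-1)^(i-1) * e[j-i] * p[i]
    e = [1] + [0] * k
    for j in range(1, k + 1):
        acc = sum((-1) ** (i - 1) * e[j - i] * p[i] for i in range(1, j + 1))
        e[j] = acc // j  # the division is exact for integer inputs
    # The missing numbers are the roots of
    # z^k - e[1]*z^(k-1) + e[2]*z^(k-2) - ...
    def poly(z):
        return sum((-1) ** j * e[j] * z ** (k - j) for j in range(k + 1))
    return [z for z in range(1, n + 1) if poly(z) == 0]


print(find_missing_newton([1, 2, 3, 5, 6, 8, 10], 10, 3))
```

Python's arbitrary-precision integers hide the fact that the power sums grow quickly; in a fixed-width language the same idea would need big integers or modular arithmetic.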
Can't we use set theory, e.g. the .difference_update() method? Or is there any chance of using linear algebra for this question?
You'd probably need clarification on what O(k) means.
Here's a trivial solution for arbitrary k: for each v in your set of numbers, accumulate the sum of 2^v. At the end, loop i from 1 to N. If the sum bitwise ANDed with 2^i is zero, then i is missing. (Or numerically, if the floor of the sum divided by 2^i is even. Or if (sum modulo 2^(i+1)) < 2^i.)
Easy, right? O(N) time, O(1) storage, and it supports arbitrary k.
Except that you're computing enormous numbers that on a real computer would each require O(N) space. In fact, this solution is identical to a bit vector.
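To make the point concrete, here is a small Python sketch of that "sum of 2^v" idea. The function name is invented for the example, and the bag is assumed to contain distinct values in 1..n. Python's unbounded integers make it work, but the accumulator is exactly the bit vector described above.

```python
def missing_by_power_sum(bag, n):
    """Hypothetical helper: accumulate 2^v for every v, then test each bit."""
    total = 0
    for v in bag:
        total += 1 << v  # same as setting bit v, since values are distinct
    # i is missing exactly when bit i of the accumulated sum is zero
    return [i for i in range(1, n + 1) if (total & (1 << i)) == 0]


print(missing_by_power_sum([2, 9, 3, 7, 8, 6, 5, 1, 10], 10))
```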
So you could be clever and compute the sum and the sum of squares and the sum of cubes... up to the sum of v^k, and do the fancy math to extract the result. But those are big numbers too, which raises the question: what abstract model of operation are we talking about? How much fits in O(1) space, and how long does it take to sum up numbers of whatever size you need?
I have read all thirty answers and found the simplest one, i.e. using a bit array of size 100, to be the best. But since the question says we can't use an array of size N, I would use O(1) space and k iterations, i.e. O(NK) time, to solve this.
To make the explanation simpler, consider I have been given numbers from 1 to 15 and two of them are missing i.e 9 and 14 but I don't know. Let the bag look like this:
[8,1,2,12,4,7,5,10,11,13,15,3,6].
We know that each number is represented internally in the form of bits.
For numbers up to 15 we only need 4 bits. For numbers up to 10^9, we will need 30 bits (so a 32-bit integer suffices). But let's focus on 4 bits and then later we can generalize it.
Now, assume if we had all the numbers from 1 to 15, then internally, we would have numbers like this (if we had them ordered):
0001
0010
0011
0100
0101
0110
0111
1000
1001
1010
1011
1100
1101
1110
1111
But now we have two numbers missing. So our representation will look something like this (shown ordered for understanding but can be in any order):
(2MSD|2LSD)
00|01
00|10
00|11
-----
01|00
01|01
01|10
01|11
-----
10|00
missing=(10|01)
10|10
10|11
-----
11|00
11|01
missing=(11|10)
11|11
Now let's make an array of 4 counters (one for each value of the 2 most significant bits) that holds the count of numbers with the corresponding 2 most significant bits, i.e.
= [__,__,__,__]
00,01,10,11
Scan the bag from left to right and fill the above array such that each bin contains the count of numbers with that prefix. The result will be as follows:
= [ 3, 4, 3, 3]
00,01,10,11
If all the numbers would have been present, it would have looked like this:
= [ 3, 4, 4, 4]
00,01,10,11
Thus we know that there are two numbers missing: one whose 2 most significant bits are 10 and one whose 2 most significant bits are 11. Now scan the list again and fill out an array of 4 counters for the 2 least significant bits. This time, only consider elements whose 2 most significant bits are 10. We will have the array as:
= [ 1, 0, 1, 1]
00,01,10,11
If all numbers of MSD=10 were present, we would have 1 in all the bins but now we see that one is missing. Thus we have the number whose MSD=10 and LSD=01 is missing which is 1001 i.e 9.
Similarly, if we scan again but consider only elements whose MSD=11, we get the array below, so the number with MSD=11 and LSD=10 is missing, which is 1110, i.e. 14.
= [ 1, 1, 0, 1]
00,01,10,11
Thus, we can find the missing numbers in a constant amount of space. We can generalize this for 100, 1000 or 10^9 or any set of numbers.
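The two-level counting described above can be sketched in Python as follows. This is only an illustration under a few assumptions: the numbers are 1..n, every value fits in 2 * half_bits bits, and the function name is made up. The expected counts are built with a simple loop for clarity; they could equally be computed arithmetically.

```python
def find_missing_by_prefix(bag, n, half_bits):
    """Hypothetical helper: count by high bits, then rescan per deficient bin."""
    bins = 1 << half_bits
    low_mask = bins - 1
    # How many of 1..n fall into each high-bits bin if nothing is missing.
    expected = [0] * bins
    for v in range(1, n + 1):
        expected[v >> half_bits] += 1
    # Pass 1: count the bag by high bits.
    high_counts = [0] * bins
    for x in bag:
        high_counts[x >> half_bits] += 1
    missing = []
    for high in range(bins):
        if high_counts[high] == expected[high]:
            continue  # nothing missing under this prefix
        # Extra pass: count low bits among numbers sharing this prefix.
        low_counts = [0] * bins
        for x in bag:
            if x >> half_bits == high:
                low_counts[x & low_mask] += 1
        for low in range(bins):
            value = (high << half_bits) | low
            if 1 <= value <= n and low_counts[low] == 0:
                missing.append(value)
    return missing


bag = [8, 1, 2, 12, 4, 7, 5, 10, 11, 13, 15, 3, 6]
print(find_missing_by_prefix(bag, 15, 2))
```

The working memory is two arrays of 2^half_bits counters regardless of the bag size, at the cost of one extra pass per deficient prefix.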
References: Problem 1.6 in http://users.ece.utexas.edu/~adnan/afi-samples-new.pdf
Very nice problem. I'd go for using a set difference for Qk. A lot of programming languages even have support for it, like in Ruby:
missing = (1..100).to_a - bag
It's probably not the most efficient solution but it's one I would use in real life if I was faced with such a task in this case (known boundaries, low boundaries). If the set of number would be very large then I would consider a more efficient algorithm, of course, but until then the simple solution would be enough for me.
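You could try using a Bloom filter. Insert each number in the bag into the filter, then iterate over the complete range 1..N, reporting each number that is not found. Because a Bloom filter can return false positives, a missing number may occasionally be reported as present, so this may not find the answer in all scenarios, but it might be a good enough solution.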
I'd take a different approach to that question and probe the interviewer for more details about the larger problem he's trying to solve. Depending on the problem and the requirements surrounding it, the obvious set-based solution might be the right thing and the generate-a-list-and-pick-through-it-afterward approach might not.
For example, it might be that the interviewer is going to dispatch n messages and needs to know the k that didn't result in a reply and needs to know it in as little wall clock time as possible after the n-kth reply arrives. Let's also say that the message channel's nature is such that even running at full bore, there's enough time to do some processing between messages without having any impact on how long it takes to produce the end result after the last reply arrives. That time can be put to use inserting some identifying facet of each sent message into a set and deleting it as each corresponding reply arrives. Once the last reply has arrived, the only thing to be done is to remove its identifier from the set, which in typical implementations takes O(log k+1). After that, the set contains the list of k missing elements and there's no additional processing to be done.
This certainly isn't the fastest approach for batch processing pre-generated bags of numbers because the whole thing runs O((log 1 + log 2 + ... + log n) + (log n + log n-1 + ... + log k)). But it does work for any value of k (even if it's not known ahead of time) and in the example above it was applied in a way that minimizes the most critical interval.
This might sound stupid, but, in the first problem presented to you, you would have to see all the remaining numbers in the bag to actually add them up to find the missing number using that equation.
So, since you get to see all the numbers, just look for the number that's missing. The same goes for when two numbers are missing. Pretty simple I think. No point in using an equation when you get to see the numbers remaining in the bag.
You can motivate the solution by thinking about it in terms of symmetries (groups, in math language). No matter the order of the set of numbers, the answer should be the same. If you're going to use k functions to help determine the missing elements, you should be thinking about what functions have that property: symmetric. The function s_1(x) = x_1 + x_2 + ... + x_n is an example of a symmetric function, but there are others of higher degree. In particular, consider the elementary symmetric functions. The elementary symmetric function of degree 2 is s_2(x) = x_1 x_2 + x_1 x_3 + ... + x_1 x_n + x_2 x_3 + ... + x_(n-1) x_n, the sum of all products of two elements. Similarly for the elementary symmetric functions of degree 3 and higher. They are obviously symmetric. Furthermore, it turns out they are the building blocks for all symmetric functions.
You can build the elementary symmetric functions as you go by noting that s_2(x,x_(n+1)) = s_2(x) + s_1(x)(x_(n+1)). Further thought should convince you that s_3(x,x_(n+1)) = s_3(x) + s_2(x)(x_(n+1)) and so on, so they can be computed in one pass.
How do we tell which items were missing from the array? Think about the polynomial (z-x_1)(z-x_2)...(z-x_n). It evaluates to 0 if you put in any of the numbers x_i. Expanding the polynomial, you get z^n-s_1(x)z^(n-1)+ ... + (-1)^n s_n. The elementary symmetric functions appear here too, which is really no surprise, since the polynomial should stay the same if we apply any permutation to the roots.
So we can build the polynomial and try to factor it to figure out which numbers are not in the set, as others have mentioned.
Finally, if we are concerned about overflowing memory with large numbers (the nth symmetric polynomial will be of the order 100!), we can do these calculations mod p where p is a prime bigger than 100. In that case we evaluate the polynomial mod p and find that it again evaluates to 0 when the input is a number in the set, and it evaluates to a non-zero value when the input is a number not in the set. However, as others have pointed out, to get the values out of the polynomial in time that depends on k, not N, we have to factor the polynomial mod p.
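Here is a small Python sketch of the two ingredients above: the one-pass update rule for the elementary symmetric functions and the evaluation of the resulting polynomial mod p. The prime 101 and the function names are choices made for this example only, and the sketch keeps all n coefficients rather than just k of them, so it illustrates the algebra rather than a space-efficient solution.

```python
P = 101  # a prime bigger than 100, as suggested in the text


def elementary_symmetric_mod_p(xs, p=P):
    """e[j] = s_j(xs) mod p, built in one pass with s_j += s_(j-1) * x_new."""
    e = [1]
    for x in xs:
        e.append(0)
        # go from high degree to low so each x is used at most once per term
        for j in range(len(e) - 1, 0, -1):
            e[j] = (e[j] + e[j - 1] * x) % p
    return e


def evaluate_mod_p(e, z, p=P):
    """Value of z^n - s_1 z^(n-1) + ... + (-1)^n s_n modulo p."""
    n = len(e) - 1
    return sum((-1) ** j * e[j] * pow(z, n - j, p) for j in range(n + 1)) % p


present = [3, 8, 20, 55, 99]
coeffs = elementary_symmetric_mod_p(present)
# Numbers of 1..100 at which the polynomial does not vanish are not in the set.
absent = [z for z in range(1, 101) if evaluate_mod_p(coeffs, z) != 0]
print(len(absent))
```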
I believe I have a O(k) time and O(log(k)) space algorithm, given that you have the floor(x) and log2(x) functions for arbitrarily big integers available:
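You have a k-bit long integer (hence the log8(k) space) to which you add 2^x, where x is the next number you find in the bag: s=2^1+2^2+... This takes O(N) time (which is not a problem for the interviewer). Subtract the result from the same sum taken over all numbers from 1 to N, so that only the bits of the missing numbers remain in s. At the end you get j=floor(log2(s)), which is the biggest number you're looking for. Then s=s-2^j and you repeat the above:
for (i = 0 ; i < k ; i++)
{
j = floor(log2(s)); // position of the highest set bit
missing[i] = j;
s -= 2^j; // clear that bit (2^j means 2 to the power j here)
}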
Now, you usually don't have floor and log2 functions for 2756-bit integers but instead for doubles. So? Simply, for each 2 bytes (or 1, or 3, or 4) you can use these functions to get the desired numbers, but this adds an O(N) factor to time complexity
Try to find the product of numbers from 1 to 50:
Let product, P1 = 1 x 2 x 3 x ............. 50
When you take out numbers one by one, multiply them so that you get the product P2. But two numbers are missing here, hence P2 < P1.
The product of the two missing terms, a x b = P1 / P2.
You already know the sum, a + b = S1.
From the above two equations, solve for a and b through a quadratic equation. a and b are your missing numbers.
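A minimal Python sketch of this sum-and-product idea follows, assuming the numbers are 1..n with exactly two missing; the function name is invented for the example. It takes the product of the missing pair to be the quotient of the two products, and it relies on Python's big integers because 50! does not fit into a machine word.

```python
import math


def two_missing(bag, n):
    """Hypothetical helper: recover the two numbers of 1..n absent from bag."""
    s = n * (n + 1) // 2 - sum(bag)                  # a + b
    prod = math.factorial(n) // math.prod(bag)       # a * b (exact division)
    # a and b are the roots of t^2 - s*t + prod = 0
    d = math.isqrt(s * s - 4 * prod)
    return (s - d) // 2, (s + d) // 2


bag = [x for x in range(1, 51) if x not in (13, 37)]
print(two_missing(bag, 50))
```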

Counting the bits set in the Fibonacci number system?

We know that each non-negative integer can be represented uniquely as a sum of Fibonacci numbers (here we are concerned with the minimal representation, i.e. no consecutive Fibonacci numbers are used in the representation of a number, and each Fibonacci number is used at most once).
For example:
1-> 1
2-> 10
3->100
4->101, here f1=1 , f2=2 and f(n)=f(n-1)+f(n-2);
so each decimal number can be represented in the Fibonacci system as a binary sequence. If we write all natural numbers successively in Fibonacci system, we will obtain a sequence like this: 110100101… This is called “Fibonacci bit sequence of natural numbers”.
My task is to count the number of times that bit 1 appears in the first N bits of this sequence. Since N can take values from 1 to 10^15, can I do this without storing the Fibonacci sequence?
for example: if N is 5,the answer is 3.
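For reference, a brute-force Python sketch of the task as stated might look like the following. It is only meant for checking faster solutions on small N, since it builds the sequence explicitly; the function names are made up for this example.

```python
def zeckendorf(n):
    """Fibonacci-system digits of n >= 1 as a string, using 1, 2, 3, 5, ..."""
    fibs = [1, 2]
    while fibs[-1] <= n:
        fibs.append(fibs[-1] + fibs[-2])
    digits = []
    for f in reversed(fibs):
        if f <= n:
            digits.append("1")
            n -= f
        elif digits:  # skip leading zeros
            digits.append("0")
    return "".join(digits)


def count_ones_brute(n_bits):
    """Count the 1s among the first n_bits bits of the concatenated sequence."""
    parts, length, k = [], 0, 1
    while length < n_bits:
        word = zeckendorf(k)
        parts.append(word)
        length += len(word)
        k += 1
    return "".join(parts)[:n_bits].count("1")


print(count_ones_brute(5))  # the question's example, expected answer 3
```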
So this is just a preliminary sketch of an algorithm. It works when the upper bound is itself a Fibonacci number, but I'm not sure how to adapt it for general upper bounds. Hopefully someone can improve upon this.
The general idea is to look at the structure of the Fibonacci encodings. Here are the first few numbers:
0
1
10
100
101
1000
1001
1010
10000
10001
10010
10100
10101
100000
The invariant in each of these numbers is that there's never a pair of consecutive 1s. Given this invariant, we can increment from one number to the next using the following pattern:
1. If the last digit is 0, set it to 1.
2. If the last digit is 1, then since there aren't any consecutive 1s, set the last digit to 0 and the next digit to 1.
3. Eliminate any doubled 1s by setting them both to 0 and setting the next digit to a 1, repeating until all doubled 1s are eliminated.
The reason that this is important is that property (3) tells us something about the structure of these numbers. Let's revisit the first few Fibonacci-encoded numbers once more. Look, for example, at the first three numbers:
00
01
10
Now, look at all four-bit numbers:
1000
1001
1010
The next number will have five digits, as shown here:
1011 → 1100 → 10000
The interesting detail to notice is that the number of numbers with four digits is equal to the number of values with up to two digits. In fact, we get the four-digit numbers by just prefixing the at-most-two-digit-numbers with 10.
Now, look at three-digit numbers:
000
001
010
100
101
And look at five-digit numbers:
10000
10001
10010
10100
10101
Notice that the five-digit numbers are just the three-digit numbers with 10 prefixed.
This gives us a very interesting way of counting up how many 1s there are. Specifically, if you look at (k+2)-digit numbers, each of them is just a k-digit number with a 10 prefixed to it. This means that if there are B 1s total in all of the k-digit numbers, the number of 1s total in numbers that are exactly k+2 digits long is equal to B plus the number of k-digit numbers, since we're just replaying the sequence with an extra 1 prepended to each number.
We can exploit this to compute the number of 1s in the Fibonacci codings that have at most k digits in them. The trick is as follows - if for each number of digits we keep track of
How many numbers have at most that many digits (call this N(d)), and
How many 1s appear in the numbers with at most d digits (call this B(d)).
We can use this information to compute these two pieces of information for one more digit. It's a beautiful DP recurrence. Initially, we seed it as follows. For one digit, N(d) = 2 and B(d) is 1, since for one digit the numbers are 0 and 1. For two digits, N(d) = 3 (there's just one two-digit number, 10, and the two one-digit numbers 0 and 1) and B(d) is 2 (one from 1, one from 10). From there, we have that
N(d + 2) = N(d) + N(d + 1). This is because the number of numbers with up to d + 2 digits is the number of numbers with up to d + 1 digits (N(d + 1)), plus the numbers formed by prefixing 10 to numbers with d digits (N(d))
B(d + 2) = B(d + 1) + B(d) + N(d) (The number of total 1 bits in numbers of length at most d + 2 is the total number of 1 bits in numbers of length at most d + 1, plus the extra we get from numbers of just d + 2 digits)
For example, we get the following:
d    N(d)    B(d)
---------------------
1    2       1
2    3       2
3    5       5
4    8       10
5    13      20
We can actually check this. For 1-digit numbers, there are a total of 1 one bit used. For 2-digit numbers, there are two ones (1 and 10). For 3-digit numbers, there are five 1s (1, 10, 100, 101). For four-digit numbers, there are 10 ones (the five previous, plus 1000, 1001, 1010). Extending this outward gives us the sequence that we'd like.
This is extremely easy to compute - we can compute the value for k digits in time O(k) with just O(1) memory usage if we reuse space from before. Since the Fibonacci numbers grow exponentially quickly, this means that if we have some number N and want to find the sum of all 1s bits to the largest Fibonacci number smaller than N, we can do so in time O(log N) and space O(1).
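The recurrence above can be sketched in a few lines of Python. The function name is made up for the example, and for readability it keeps the whole table rather than the O(1) rolling variables mentioned above.

```python
def digit_tables(max_d):
    """Return (N, B) where N[d] and B[d] follow the recurrences in the text."""
    N = {1: 2, 2: 3}  # how many numbers have at most d digits
    B = {1: 1, 2: 2}  # how many 1 bits those numbers contain in total
    for d in range(3, max_d + 1):
        N[d] = N[d - 1] + N[d - 2]
        B[d] = B[d - 1] + B[d - 2] + N[d - 2]
    return N, B


N, B = digit_tables(5)
for d in range(1, 6):
    print(d, N[d], B[d])
```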
That said, I'm not sure how to adapt this to work with general upper bounds. However, I'm optimistic that there is some way to do it. This is a beautiful recurrence and there just has to be a nice way to generalize it.
Hope this helps! Thanks for an awesome problem!
Let's solve 3 problems. Each one is harder than the previous, and each one uses the result of the previous.
1. How many ones are set if you write down every number from 0 to fib[i]-1.
Call this dp[i]. Lets look at the numbers
0
1
10
100
101
1000
1001
1010 <-- we want to count ones up to here
10000
If you write all numbers up to fib[i]-1, first you write all numbers up to fib[i-1]-1 (dp[i-1]), then you write the last block of numbers. There are exactly fib[i-2] of those numbers, each has a one on the first position, so we add fib[i-2], and if you erase those ones
000
001
010
then remove leading zeros, you can see that each number from 0 to fib[i-2]-1 is written down. The number of ones there is equal to dp[i-2], which gives us:
dp[i] = fib[i-2] + dp[i-2] + dp[i-1];
2. How many ones are set if you write down every number from 0 to n.
0
1
10
100
101
1000
1001 <-- we want to count ones up to here
1010
Lets call this solNumber(n)
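Suppose that your number is f[i] + x, where f[i] is the largest possible Fibonacci number. Then the answer is dp[i] + (x + 1) + solNumber(x): dp[i] counts the ones in 0..f[i]-1, each of the x + 1 numbers from f[i] to f[i] + x contributes one leading one, and the remaining digits are those of 0..x. This can be proved in the same way as in point 1.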
3. How many ones are set in first n digits.
3a. How many numbers have representation length exactly l
If l <= 2 the answer is 1, else it's fib[l-1] (with fib[0] = fib[1] = 1).
You can note that if you erase the leading ones and then all leading zeros, you'll have each number from 0 to fib[l-1]-1: exactly fib[l-1] numbers.
//End of 3a
Now you can find the number m such that, if you write all numbers from 1 to m, their total length will be <= n, but if you write all numbers from 1 to m+1, the total length will be > n. Solve the problem manually for m+1 and add solNumber(m).
All 3 problems are solved in O(log n)
#include <iostream>
using namespace std;
#define FOR(i, a, b) for(int i = a; i < b; ++i)
#define RFOR(i, b, a) for(int i = b - 1; i >= a; --i)
#define REP(i, N) FOR(i, 0, N)
#define RREP(i, N) RFOR(i, N, 0)
typedef long long Long;
const int MAXL = 80; // 30 entries only reach fib values below 10^6; n up to 10^15 needs more
long long fib[MAXL];
//How much ones are if you write down the representation of first fib[i]-1 natural numbers
long long dp[MAXL];
void buildDP()
{
fib[0] = 1;
fib[1] = 1;
FOR(i,2,MAXL)
fib[i] = fib[i-1] + fib[i-2];
dp[0] = 0;
dp[1] = 0;
dp[2] = 1;
FOR(i,3,MAXL)
dp[i] = fib[i-2] + dp[i-2] + dp[i-1];
}
//How much ones are if you write down the representation of first n natural numbers
Long solNumber(Long n)
{
if(n == 0)
return n;
Long res = 0;
RREP(i,MAXL)
if(n>=fib[i])
{
n -= fib[i];
res += dp[i];
res += (n+1);
}
return res;
}
int solManual(Long num, Long n)
{
int cr = 0;
RREP(i,MAXL)
{
if(n == 0)
break;
if(num>=fib[i])
{
num -= fib[i];
++cr;
}
if(cr != 0)
--n;
}
return cr;
}
Long num(int l)
{
if(l<=2)
return 1;
return fib[l-1];
}
Long sol(Long n)
{
//length of fibonacci representation
int l = 1;
//total accumulated length (Long, since it can exceed the range of int)
Long cl = 0;
while(num(l)*l + cl <= n)
{
cl += num(l)*l;
++l;
}
//Number of digits, that represent numbers with maxlength
Long nn = n - cl;
//Number of full numbers;
Long t = nn/l;
//The last full number
n = fib[l] + t-1;
return solNumber(n) + solManual(n+1, nn%l);
}
int main(int argc, char** argv)
{
ios_base::sync_with_stdio(false);
buildDP();
Long n;
while(cin>>n)
cout<<"ANS: "<<sol(n)<<endl;
return 0;
}
Compute m, the number responsible for the (N+1)th bit of the sequence. Compute the contribution of m to the count.
We have reduced the problem to counting the number of one bits in the range [1, m). In the style of interval trees, partition this range into O(log N) subranges, each having an associated glob like 10100???? that matches the representations of exactly the numbers belonging to that range. It is easy to compute the contribution of the prefixes.
We have reduced the problem to counting the total number T(k) of one bits in all Fibonacci words of length k (i.e., the ???? part of the globs). T(k) is given by the following recurrence.
T(0) = 0
T(1) = 1
T(k) = T(k - 1) + T(k - 2) + F(k - 2)
Mathematica says there's a closed form solution, but it looks awful and isn't needed for this polylog(N)-time algorithm.
This is not a full answer but it does outline how you can do this calculation without using brute force.
The Fibonacci representation of Fn is a 1 followed by n-1 zeros.
For the numbers from Fn up to but not including F(n+1), the number of 1's consists of two parts:
There are F(n-1) such numbers, so there are F(n-1) leading 1's.
The digits after the leading 1's are just the Fibonacci representations of all numbers up to but not including F(n-1).
So, if we call the total number of 1 bits in the sequence up to but not including the nth Fibonacci number a(n), then we have the following recursion:
a(n+1) = a(n) + F(n-1) + a(n-1)
You can also easily get the number of bits in the sequence up to Fn.
If it takes k Fibonacci numbers to get to (but not pass) N, then you can count those bits with the above formula, and after some further manipulation reduce the problem to counting the number of bits in the remaining sequence.
[Edit] : Basically I have used the property that any number n which is to be represented in Fibonacci base can be broken up as n = x + (n - x), where x is the largest Fibonacci number not exceeding n. Using this property, any number can be broken into bit form.
First step is finding the decimal number such that Nth bit ends in it.
We can see that all numbers between fibonacci number F(n) and F(n+1) will have same number of bits. Using this, we can pre-calculate a table and find the appropriate number.
Lets say that you have the decimal number D at which there is the Nth bit.
Now, let X be the largest Fibonacci number less than or equal to D.
To find the set bits for all numbers from 1 to D, we represent the numbers from X to D as
X+0, X+1, X+2, ..., X+(D-X). Each of them has its X represented by a single leading 1, and we have broken the problem into a much smaller sub-problem. That is, we need to find all set bits up to D-X. We keep doing this recursively. Using the same logic, we can build a table which has the set bit count for all Fibonacci numbers (up to the limit). We use this table for finding the number of set bits from 1 to X-1.
So,
Findsetbits(D) { // finds number of set bits from 1 to D.
if (D == 0) return 0; // nothing left to count
find X; // largest Fibonacci number less than or equal to D
ans = tablesetbits[X]; // set bits from 1 to X-1, precomputed
ans += 1 * (D-X+1); // one leading 1 for each of X+0,X+1,...,D
ans += Findsetbits(D-X);
return ans;
}
I tried some examples by hand and saw the pattern.
I have coded a rough solution which I have checked by hand for N <= 35. It works pretty fast for large numbers, though I can't be sure that it is correct. If it is an online judge problem, please give the link to it.
#include<iostream>
#include<vector>
#include<map>
#include<algorithm>
using namespace std;
#define pb push_back
typedef long long LL;
vector<LL>numbits;
vector<LL>fib;
vector<LL>numones;
vector<LL>cfones;
void init() {
fib.pb(1);
fib.pb(2);
int i = 2;
LL c = 1;
while ( c < 100000000000000LL ) {
c = fib[i-1] + fib[i-2];
i++;
fib.pb(c);
}
}
LL answer(LL n) {
if (n <= 3) return n;
int a = (lower_bound(fib.begin(),fib.end(),n))-fib.begin();
int c = 1;
if (fib[a] == n) {
c = 0;
}
LL ans = cfones[a-1-c] ;
return ans + answer(n - fib[a-c]) + 1 * (n - fib[a-c] + 1);
}
int fillarr(vector<int>& a, LL n) {
if (n == 0)return -1;
if (n == 1) {
a[0] = 1;
return 0;
}
int in = lower_bound(fib.begin(),fib.end(),n) - fib.begin(),v=0;
if (fib[in] != n) v = 1;
LL c = n - fib[in-v];
a[in-v] = 1;
fillarr(a, c);
return in-v;
}
int main() {
init();
numbits.pb(1);
int b = 2;
LL c;
for (int i = 1; i < fib.size()-2; i++) {
c = fib[i+1] - fib[i] ;
c = c*(LL)b;
b++;
numbits.pb(c);
}
for (int i = 1; i < numbits.size(); i++) {
numbits[i] += numbits[i-1];
}
numones.pb(1);
cfones.pb(1);
numones.pb(1);
cfones.pb(2);
numones.pb(1);
cfones.pb(5);
for (int i = 3; i < fib.size(); i++ ) {
LL c = 0;
c += cfones[i-2]+ 1 * fib[i-1];
numones.pb(c);
cfones.pb(c + cfones[i-1]);
}
for (int i = 1; i < numones.size(); i++) {
numones[i] += numones[i-1];
}
LL N;
cin>>N;
if (N == 1) {
cout<<1<<"\n";
return 0;
}
// find the integer just before Nth bit
int pos;
for (int i = 0;; i++) {
if (numbits[i] >= N) {
pos = i;
break;
}
}
LL temp = (N-numbits[pos-1])/(pos+1);
LL temp1 = (N-numbits[pos-1]);
LL num = fib[pos]-1 + (temp1>0?temp+(temp1%(pos+1)?1:0):0);
temp1 -= temp*(pos+1);
if(!temp1) temp1 = pos+1;
vector<int>arr(70,0);
int in = fillarr(arr, num);
int sub = 0;
for (int i = in-(temp1); i >= 0; i--) {
if (arr[i] == 1)
sub += 1;
}
cout<<"\nNumber answer "<<num<<" "<<answer(num) - sub<<"\n";
return 0;
}
Here is O((log n)^3).
Let's compute how many numbers fit in the first N bits
Imagine that we have function:
long long number_of_all_bits_in_sequence(long long M);
It computes length of "Fibonacci bit sequence of natural numbers" created by all numbers that aren't greater than M.
With this function we could use binary search to find how many numbers fit in the first N bits.
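As a sketch of this step in Python: the function below stands in for number_of_all_bits_in_sequence and uses the fact that all numbers between two consecutive Fibonacci numbers have the same representation length. It is one possible way to fill in that function, not necessarily the one intended here, and both names are made up for the example.

```python
def total_length(m):
    """Total number of bits when writing 1..m in the Fibonacci system."""
    total = 0
    a, b, digits = 1, 2, 1  # numbers in [a, b) have exactly `digits` digits
    while a <= m:
        last = min(b - 1, m)
        total += (last - a + 1) * digits
        a, b, digits = b, a + b, digits + 1
    return total


def numbers_fitting(n_bits):
    """Largest m such that the numbers 1..m fit entirely in the first n_bits."""
    lo, hi = 0, n_bits  # every number takes at least one bit
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if total_length(mid) <= n_bits:
            lo = mid
        else:
            hi = mid - 1
    return lo


print(numbers_fitting(5), numbers_fitting(10 ** 15))
```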
How many bits are 1's in representation of first M numbers
Lets create function which calculates how many numbers <= M have 1 at k-th bit.
long long kth_bit_equal_1(long long M, int k);
First lets preprocess results of this function for all small values, lets say M <= 1000000.
Implementation for M > PREPROCESS_LIMIT:
long long kth_bit_equal_1(long long M, int k) {
if (M <= PREPROCESS_LIMIT) return preprocess_result[M][k];
long long fib_number = greatest_fib_which_isnt_greater_than(M);
int fib_index = index_of_fib_in_fibonnaci_sequence(fib_number);
if (fib_index < k) {
// all numbers are smaller than k-th fibbonacci number
return 0;
}
if (fib_index == k) {
// only numbers between [fib_number, M] have k-th bit set to 1
return M - fib_number + 1;
}
if (fib_index > k) {
long long result = 0;
// all numbers between [fib_number, M] have bit at fib_index set to 1
// so lets subtrack fib_number from all numbers in this interval
// now this interval is [0, M - fib_number]
// lets calculate how many numbers in this inteval have k-th bit set.
result += kth_bit_equal_1(M - fib_number, k);
// don't forget about remaining numbers (interval [1, fib_number - 1])
result += kth_bit_equal_1(fib_number - 1, k);
return result;
}
}
Complexity of this function is O(M / PREPROCESS_LIMIT).
Notice that in the recurrence one of the two calls always takes an argument that is one less than a Fibonacci number:
kth_bit_equal_1(fib_number - 1, k);
So if we memoize all computed results, then the complexity will improve to T(N) = T(N/2) + O(1), i.e. T(N) = O(log N).
Lets get back to number_of_all_bits_in_sequence
We can slightly modify kth_bit_equal_1 so that it also counts bits equal to 0.
Here's a way to count all the one digits in the set of numbers up to a given digit length bound. This seems to me to be a reasonable starting point for a solution
Consider 10 digits. Start by writing;
0000000000
Now we can turn some number of these zeros into ones, keeping the last digit always as a 0. Consider the possibilities case by case.
0 There's just one way to choose 0 of these to be ones. Summing the 1-bits in this one case gives 0.
1 There are {9 choose 1} ways to turn one of the zeros into a one. Each of these contributes 1.
2 There are {8 choose 2} ways to turn two of the zeros into ones. Each of these contributes 2.
...
5 There are {5 choose 5} ways to turn five of the zeros into ones. Each of these contributes 5 to the bit count.
It's easy to think of this as a tiling problem. The string of 10 zeros is a 10x1 board, which we want to tile with 1x1 squares and 2x1 dominoes. Choosing some number of the zeros to be ones is then the same as choosing some of the tiles to be dominoes. My solution is closely related to Identity 4 in "Proofs that really count" by Benjamin and Quinn.
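The case-by-case count above can be checked numerically. The following C++ sketch is only an illustration under stated assumptions: the function names `binomial`, `ones_by_tiling` and `ones_by_brute_force` are made up for this example, and the brute force simply enumerates every string with no two adjacent ones among the first n - 1 digits (the last digit being fixed to 0).

```cpp
#include <cassert>
#include <cstdint>

// C(n, k) via the multiplicative formula; every intermediate value is an
// exact binomial coefficient, so the division never truncates.
std::uint64_t binomial(int n, int k) {
    if (k < 0 || k > n) return 0;
    std::uint64_t r = 1;
    for (int i = 1; i <= k; ++i) {
        r = r * static_cast<std::uint64_t>(n - k + i) / static_cast<std::uint64_t>(i);
    }
    return r;
}

// Sum over k of k * C(n - k, k): tilings of an n x 1 board with k dominoes,
// each domino standing for a "10" pair and contributing one 1-bit.
std::uint64_t ones_by_tiling(int n) {
    assert(n >= 0 && n <= 60);
    std::uint64_t total = 0;
    for (int k = 0; 2 * k <= n; ++k) {
        total += static_cast<std::uint64_t>(k) * binomial(n - k, k);
    }
    return total;
}

// Same quantity by enumeration: n digits, last digit 0, no adjacent ones.
std::uint64_t ones_by_brute_force(int n) {
    assert(n >= 1 && n <= 25);
    std::uint64_t total = 0;
    for (std::uint32_t s = 0; s < (1u << (n - 1)); ++s) {
        if (s & (s >> 1)) continue;                  // two adjacent ones
        for (std::uint32_t x = s; x != 0; x &= x - 1) ++total;
    }
    return total;
}
```

For 10 digits the sum is 1*9 + 2*28 + 3*35 + 4*15 + 5*1 = 235, and the enumeration gives the same value.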
Second step: now try to use the above construction to solve the original problem.
Suppose we want to count the one bits in the first 100100010 bits (the number is in Fibonacci representation, of course). Start by overcounting the sum for all ways to replace the x's with zeros and ones in 10xxxxx0. To overcompensate for the overcounting, subtract the count for 10xxx0. Continue the procedure of overcounting and overcompensation.
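For comparison, here is a hedged C++ sketch that totals the one bits of the Fibonacci representations of 1..N. Note that it swaps in a different technique from the overcounting procedure described above: it splits the numbers below N by the highest digit in which they differ from N (a prefix decomposition) and adds the per-length totals. The name `total_ones_up_to`, the place values 1, 2, 3, 5, 8, ... and the bound N <= 10^15 are assumptions made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

long long total_ones_up_to(long long N) {
    assert(N >= 0 && N <= 1000000000000000LL);

    // fib[j] is the weight of digit j and also the number of valid j-digit strings.
    std::vector<long long> fib{1, 2};
    while (fib.back() <= N) {
        fib.push_back(fib[fib.size() - 1] + fib[fib.size() - 2]);
    }

    // ones[j] = total one bits over all valid j-digit strings (numbers 0 .. fib[j] - 1).
    // Top digit 0 gives ones[j-1]; top digits "10" give fib[j-2] + ones[j-2].
    std::vector<long long> ones(fib.size(), 0);
    ones[1] = 1;
    for (std::size_t j = 2; j < fib.size(); ++j) {
        ones[j] = ones[j - 1] + ones[j - 2] + fib[j - 2];
    }

    long long total = 0;
    long long seen = 0;   // one digits of N already passed
    long long rest = N;   // part of N not yet decomposed
    for (int j = static_cast<int>(fib.size()) - 1; j >= 0; --j) {
        if (fib[j] > rest) continue;   // digit j of N is 0
        // Numbers that agree with N above digit j and have a 0 at digit j:
        // fib[j] of them, each carrying the 'seen' ones of the shared prefix.
        total += seen * fib[j] + ones[j];
        rest -= fib[j];
        ++seen;
    }
    return total + seen;  // finally add the one bits of N itself
}
```

As a hand check, 12 = 8 + 3 + 1 gives 10 + (3 + 2) + (2 + 0) + 3 = 20, which matches adding up the set bits of 1 through 12 individually.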
This problem has a dynamic programming solution, as illustrated by the tested algorithm below.
Some points to keep in mind, which are evident in the code:
For each number i, take the greatest Fibonacci number f <= i. If f == i, then i has exactly one set bit.
Otherwise f is less than i, and i = f + n with n < f, so i has one more set bit than n.
Note that the Fibonacci sequence is memoized over the entire algorithm.
public static int[] fibonacciBitSequenceOfNaturalNumbers(int num) {
    int[] setBits = new int[num + 1]; // setBits[0] = 0 is the anchor case
    if (num >= 1) {
        setBits[1] = 1; // anchor case; the guard avoids an out-of-bounds write for num == 0
    }
    int a = 1, b = 1; // consecutive Fibonacci numbers, a <= b
    for (int i = 2; i <= num; i++) {
        // advance until b is the smallest Fibonacci number >= i
        while (b < i) {
            int c = a + b;
            a = b;
            b = c;
        }
        if (b == i) {
            setBits[i] = 1; // i is itself a Fibonacci number
            continue;
        }
        // a is now the greatest Fibonacci number below i, so i = a + (i - a)
        setBits[i] = 1 + setBits[i - a];
    }
    return setBits;
}
Test with:
public static void main(String... args) {
    int[] arr = fibonacciBitSequenceOfNaturalNumbers(23);
    // print result
    for (int i = 1; i < arr.length; i++)
        System.out.format("%d has %d%n", i, arr[i]);
}
RESULT OF TEST: i has x set bits
1 has 1
2 has 1
3 has 1
4 has 2
5 has 1
6 has 2
7 has 2
8 has 1
9 has 2
10 has 2
11 has 2
12 has 3
13 has 1
14 has 2
15 has 2
16 has 2
17 has 3
18 has 2
19 has 3
20 has 3
21 has 1
22 has 2
23 has 2
EDIT BASED ON COMMENT:
// to return the total number of set bits between 1 and n inclusive,
// replace the return statement of the original post with this code
// (the method's return type then has to be int instead of int[])
int total = 0;
for (int i : setBits)
    total += i;
return total;

Resources